Re: [PATCH] clocksource: sun4i: Clear interrupts after stopping timer in probe function

2016-07-25 Thread Maxime Ripard
On Tue, Jul 26, 2016 at 11:01:59AM +0800, Chen-Yu Tsai wrote:
> The bootloader (U-boot) sometimes uses this timer for various delays.
> It uses it as an ongoing counter, and does comparisons on the current
> counter value. The timer counter is never stopped.
> 
> In some cases when the user interacts with the bootloader, or lets
> it idle for some time before loading Linux, the timer may expire,
> and an interrupt will be pending. This results in an unexpected
> interrupt when the timer interrupt is enabled by the kernel, at
> which point the event_handler isn't set yet. This results in a NULL
> pointer dereference exception, panic, and no way to reboot.
> 
> Clear any pending interrupts after we stop the timer in the probe
> function to avoid this.
> 
> Signed-off-by: Chen-Yu Tsai 

Awesome, thanks!

You should put stable in Cc though for this kind of patch.

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com


signature.asc
Description: PGP signature
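
For readers without the driver in front of them, the shape of the fix being
discussed is roughly the following. This is an illustrative sketch only, not
the applied patch; the helper name and the TIMER_CTL_REG()/TIMER_CTL_ENABLE/
TIMER_IRQ_EN()/TIMER_IRQ_ST_REG macros are assumed from the sun4i timer
driver:

/*
 * Sketch: stop timer 0, then acknowledge any interrupt the bootloader left
 * pending (the status register is write-1-to-clear), before the interrupt
 * is enabled and before clockevents installs an event handler.
 */
static void sun4i_timer_quiesce(void __iomem *timer_base)
{
        u32 val;

        /* Stop timer 0. */
        val = readl(timer_base + TIMER_CTL_REG(0));
        writel(val & ~TIMER_CTL_ENABLE, timer_base + TIMER_CTL_REG(0));

        /* Clear a possibly pending timer 0 interrupt. */
        writel(TIMER_IRQ_EN(0), timer_base + TIMER_IRQ_ST_REG);
}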


[PATCH] USB: appledisplay: Remove deprecated create_singlethread_workqueue

2016-07-25 Thread Bhaktipriya Shridhar
The workqueue "wq" is involved in controlling the brightness of an
Apple Cinema Display over USB.

It has a single work item (&pdata->work) per appledisplay and hence
doesn't require ordering. Also, it is not being used on a memory
reclaim path.

Hence, the singlethreaded workqueue has been replaced with the use of
system_wq.

System workqueues have been able to handle a high level of concurrency
for a long time now and hence it's not required to have a singlethreaded
workqueue just to gain concurrency. Unlike a dedicated per-cpu workqueue
created with create_singlethread_workqueue(), system_wq allows multiple
work items to overlap executions even on the same CPU; however, a
per-cpu workqueue doesn't have any CPU locality or global ordering
guarantee unless the target CPU is explicitly specified and thus the
increase of local concurrency shouldn't make any difference.

The work item is self-requeueing and needs to wait for the in-flight
work item to finish before proceeding with destruction.
Hence, it has been sync cancelled in appledisplay_disconnect().
This also ensures that there are no pending tasks while disconnecting the
driver.

Signed-off-by: Bhaktipriya Shridhar 
---
 drivers/usb/misc/appledisplay.c | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/usb/misc/appledisplay.c b/drivers/usb/misc/appledisplay.c
index a0a3827..c760455 100644
--- a/drivers/usb/misc/appledisplay.c
+++ b/drivers/usb/misc/appledisplay.c
@@ -85,7 +85,6 @@ struct appledisplay {
 };

 static atomic_t count_displays = ATOMIC_INIT(0);
-static struct workqueue_struct *wq;

 static void appledisplay_complete(struct urb *urb)
 {
@@ -122,7 +121,7 @@ static void appledisplay_complete(struct urb *urb)
case ACD_BTN_BRIGHT_UP:
case ACD_BTN_BRIGHT_DOWN:
pdata->button_pressed = 1;
-   queue_delayed_work(wq, &pdata->work, 0);
+   schedule_delayed_work(&pdata->work, 0);
break;
case ACD_BTN_NONE:
default:
@@ -159,7 +158,7 @@ static int appledisplay_bl_update_status(struct backlight_device *bd)
pdata->msgdata, 2,
ACD_USB_TIMEOUT);
mutex_unlock(&pdata->sysfslock);
-
+
return retval;
 }

@@ -344,7 +343,7 @@ static void appledisplay_disconnect(struct usb_interface *iface)

if (pdata) {
usb_kill_urb(pdata->urb);
-   cancel_delayed_work(&pdata->work);
+   cancel_delayed_work_sync(&pdata->work);
backlight_device_unregister(pdata->bd);
usb_free_coherent(pdata->udev, ACD_URB_BUFFER_LEN,
pdata->urbdata, pdata->urb->transfer_dma);
@@ -365,19 +364,11 @@ static struct usb_driver appledisplay_driver = {

 static int __init appledisplay_init(void)
 {
-   wq = create_singlethread_workqueue("appledisplay");
-   if (!wq) {
-   printk(KERN_ERR "appledisplay: Could not create work queue\n");
-   return -ENOMEM;
-   }
-
return usb_register(&appledisplay_driver);
 }

 static void __exit appledisplay_exit(void)
 {
-   flush_workqueue(wq);
-   destroy_workqueue(wq);
usb_deregister(&appledisplay_driver);
 }

--
2.1.4
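
The lifecycle described in the commit message (a self-requeueing delayed work
item scheduled on system_wq and cancelled synchronously on teardown) is a
common pattern. A minimal, generic module sketch, with names invented for
illustration rather than taken from the driver:

#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/jiffies.h>

static struct delayed_work poll_work;

/* Self-requeueing work item: do some polling, then reschedule ourselves. */
static void poll_fn(struct work_struct *work)
{
        pr_info("example: polling\n");
        schedule_delayed_work(&poll_work, msecs_to_jiffies(100));
}

static int __init example_init(void)
{
        INIT_DELAYED_WORK(&poll_work, poll_fn);
        schedule_delayed_work(&poll_work, 0);   /* queued on system_wq */
        return 0;
}

static void __exit example_exit(void)
{
        /*
         * cancel_delayed_work_sync() cancels a pending item and waits for a
         * running one, so a self-requeueing item cannot race with unload the
         * way a plain cancel_delayed_work() can.
         */
        cancel_delayed_work_sync(&poll_work);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");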



Re: [PATCH] xen/x86: Define stubs for xen_smp_intr_init/xen_smp_intr_free

2016-07-25 Thread Juergen Gross
On 25/07/16 23:14, Boris Ostrovsky wrote:
> xen_smp_intr_init() and xen_smp_intr_free() are now called from
> enlighten.c and therefore not guaranteed to have CONFIG_SMP.
> 
> Instead of adding multiple ifdefs there provide stubs in smp.h
> 
> Signed-off-by: Boris Ostrovsky 

Reviewed-by: Juergen Gross 


Juergen


Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces

2016-07-25 Thread Andrew Vagin
On Mon, Jul 25, 2016 at 09:59:43AM -0500, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)"  writes:

[snip]

> [snip]
> >>> So, from my point of view, the important piece that was missing from
> >>> your commit message was the note to use readlink("/proc/self/fd/%d")
> >>> on the returned FDs. I think that detail needs to be part of the
> >>> commit message (and also the man page text). I think it even be
> >>> helpful to include the above program as part of the commit message:
> >>> it helps people more quickly grasp the API.
> >>
> >> Please, please make the standard way to compare these things fstat.
> >> That is much less magic than a symlink, and a little more future proof.
> >> Possibly even kcmp.

I like the idea of using kcmp to compare namespaces. I am going to add this
functionality to kcmp and describe all of this in the man page.

> >
> > As in fstat() to get the st_ino field, right?
> 
> Both the st_ino and st_dev fields.
> 
> The most likely change to support checkpoint/restart in the future is to
> preserve st_ino across migrations and instantiate a different instance
> of nsfs to hold the inode numbers from the previous machine.

It sounds tricky. BTW: actually this is not the only place where we have
this sort of problem. For example, mount IDs are currently not preserved
when a container is migrated. The same problem applies to tmpfs, where
inode numbers are not preserved for files.

> 
> We would need to handle the preservation carefully or else there is
> a chance that two namespace file descriptors (collected from different
> sources) with different st_dev and st_ino fields may actually refer to
> the same object.
> 
> Which is a long way of saying we have the st_dev field please use it,
> it may matter at some point.
> 
> Eric
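
To make the fstat()-based comparison discussed above concrete, a small
userspace sketch (paths chosen for illustration, minimal error handling)
could look like this:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

/* Returns 1 if both fds refer to the same namespace, 0 if not, -1 on error. */
static int same_ns(int fd1, int fd2)
{
        struct stat st1, st2;

        if (fstat(fd1, &st1) < 0 || fstat(fd2, &st2) < 0)
                return -1;
        /* Compare both st_dev and st_ino, as suggested above. */
        return st1.st_dev == st2.st_dev && st1.st_ino == st2.st_ino;
}

int main(void)
{
        int a = open("/proc/self/ns/net", O_RDONLY);
        int b = open("/proc/1/ns/net", O_RDONLY);

        if (a < 0 || b < 0) {
                perror("open");
                return 1;
        }
        printf("same net namespace as pid 1: %d\n", same_ns(a, b));
        close(a);
        close(b);
        return 0;
}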


RE: [PATCH v18 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-25 Thread Dexuan Cui
> From: David Miller [mailto:da...@davemloft.net]
> ...
> From: Dexuan Cui 
> Date: Tue, 26 Jul 2016 03:09:16 +
> 
> > BTW, during the past month, at least 7 other people also reviewed
> > the patch and gave me quite a few good comments, which have
> > been addressed.
> 
> Correction: Several people gave coding style and simple corrections
> to your patch.
> 
> Very few gave any review of the _SUBSTANCE_ of your changes.
> 
> And the one of the few who did, and suggested you build your
> facilities using the existing S390 hypervisor socket infrastructure,
> you brushed off _IMMEDIATELY_.
>
> That drives me crazy.  The one person who gave you real feedback
> you basically didn't consider seriously at all.

Hi David,
I'm very sorry -- I guess I must have missed something here -- I don't
remember anybody replying about the S390 hypervisor socket
infrastructure... I'm re-reading all the replies, trying to locate that
reply, and I'll find out why I didn't take it seriously. Sorry in advance.

> I know why you don't want to consider alternative implementations,
> and it's because you guys have so much invested in what you've
> implemented already.
This is not true. I'm absolutely open to the possibility of a better
alternative implementation.
Please allow me to find the "S390 hypervisor socket infrastructure" reply
first and I'll report back ASAP.
 
> But that's tough and not our problem.
> 
> And until this changes, yes, this submission will be stuck in the
> mud and continue slogging on like this.

I definitely agree and understand.

Thanks,
-- Dexuan


[PATCH v2 3/3] xen-blkfront: dynamic configuration of per-vbd resources

2016-07-25 Thread Bob Liu
The current VBD layer reserves buffer space for each attached device based on
three statically configured settings which are read at boot time.
 * max_indirect_segs: Maximum number of segments.
 * max_ring_page_order: Maximum order of pages to be used for the shared ring.
 * max_queues: Maximum number of queues (rings) to be used.

But the storage backend, workload, and guest memory result in very different
tuning requirements. It's impossible to predict application characteristics
centrally, so it's best to allow the settings to be adjusted dynamically
based on the workload inside the guest.

Usage:
Show current values:
cat /sys/devices/vbd-xxx/max_indirect_segs
cat /sys/devices/vbd-xxx/max_ring_page_order
cat /sys/devices/vbd-xxx/max_queues

Write new values:
echo  > /sys/devices/vbd-xxx/max_indirect_segs
echo  > /sys/devices/vbd-xxx/max_ring_page_order
echo  > /sys/devices/vbd-xxx/max_queues

Signed-off-by: Bob Liu 
--
v2: Rename to max_ring_page_order and rm the waiting code suggested by Roger.
---
 drivers/block/xen-blkfront.c |  275 +-
 1 file changed, 269 insertions(+), 6 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 1b4c380..ff5ebe5 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -212,6 +212,11 @@ struct blkfront_info
/* Save uncomplete reqs and bios for migration. */
struct list_head requests;
struct bio_list bio_list;
+   /* For dynamic configuration. */
+   unsigned int reconfiguring:1;
+   int new_max_indirect_segments;
+   int max_ring_page_order;
+   int max_queues;
 };
 
 static unsigned int nr_minors;
@@ -1350,6 +1355,31 @@ static void blkif_free(struct blkfront_info *info, int suspend)
for (i = 0; i < info->nr_rings; i++)
blkif_free_ring(&info->rinfo[i]);
 
+   /* Remove old xenstore nodes. */
+   if (info->nr_ring_pages > 1)
+   xenbus_rm(XBT_NIL, info->xbdev->nodename, "ring-page-order");
+
+   if (info->nr_rings == 1) {
+   if (info->nr_ring_pages == 1) {
+   xenbus_rm(XBT_NIL, info->xbdev->nodename, "ring-ref");
+   } else {
+   for (i = 0; i < info->nr_ring_pages; i++) {
+   char ring_ref_name[RINGREF_NAME_LEN];
+
+   snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
+   xenbus_rm(XBT_NIL, info->xbdev->nodename, ring_ref_name);
+   }
+   }
+   } else {
+   xenbus_rm(XBT_NIL, info->xbdev->nodename, "multi-queue-num-queues");
+
+   for (i = 0; i < info->nr_rings; i++) {
+   char queuename[QUEUE_NAME_LEN];
+
+   snprintf(queuename, QUEUE_NAME_LEN, "queue-%u", i);
+   xenbus_rm(XBT_NIL, info->xbdev->nodename, queuename);
+   }
+   }
kfree(info->rinfo);
info->rinfo = NULL;
info->nr_rings = 0;
@@ -1763,15 +1793,21 @@ static int talk_to_blkback(struct xenbus_device *dev,
const char *message = NULL;
struct xenbus_transaction xbt;
int err;
-   unsigned int i, max_page_order = 0;
+   unsigned int i, backend_max_order = 0;
unsigned int ring_page_order = 0;
 
err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
-  "max-ring-page-order", "%u", _page_order);
+  "max-ring-page-order", "%u", _max_order);
if (err != 1)
info->nr_ring_pages = 1;
else {
-   ring_page_order = min(xen_blkif_max_ring_order, max_page_order);
+   if (info->max_ring_page_order) {
+   /* Dynamic configured through /sys. */
+   BUG_ON(info->max_ring_page_order > backend_max_order);
+   ring_page_order = info->max_ring_page_order;
+   } else
+   /* Default. */
+   ring_page_order = min(xen_blkif_max_ring_order, backend_max_order);
info->nr_ring_pages = 1 << ring_page_order;
}
 
@@ -1894,7 +1930,14 @@ static int negotiate_mq(struct blkfront_info *info)
if (err < 0)
backend_max_queues = 1;
 
-   info->nr_rings = min(backend_max_queues, xen_blkif_max_queues);
+   if (info->max_queues) {
+   /* Dynamic configured through /sys */
+   BUG_ON(info->max_queues > backend_max_queues);
+   info->nr_rings = info->max_queues;
+   } else
+   /* Default. */
+   info->nr_rings = min(backend_max_queues, xen_blkif_max_queues);
+
/* We need at least one ring. */
if (!info->nr_rings)
info->nr_rings = 1;
@@ -2352,11 +2395,197 @@ static void blkfront_gather_backend_features(struct blkfront_info *info)
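
The remainder of the diff (the sysfs show/store handlers behind the
attributes listed under "Usage") is truncated in this archive. For
orientation only, a device attribute of this kind generally takes the
following shape; the direct field access and the absence of locking and
backend re-negotiation are simplifications for the sketch, not the actual
patch code:

static ssize_t max_queues_show(struct device *dev,
                               struct device_attribute *attr, char *buf)
{
        struct blkfront_info *info = dev_get_drvdata(dev);

        return sprintf(buf, "%d\n", info->max_queues);
}

static ssize_t max_queues_store(struct device *dev,
                                struct device_attribute *attr,
                                const char *buf, size_t count)
{
        struct blkfront_info *info = dev_get_drvdata(dev);
        int ret, val;

        ret = kstrtoint(buf, 10, &val);
        if (ret < 0)
                return ret;

        info->max_queues = val;
        /* A real implementation would re-negotiate with the backend here. */
        return count;
}
static DEVICE_ATTR_RW(max_queues);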

[PATCH v2 2/3] xen-blkfront: introduce blkif_set_queue_limits()

2016-07-25 Thread Bob Liu
blk_mq_update_nr_hw_queues() resets all queue limits to their defaults,
which is not what xen-blkfront expects. Introduce blkif_set_queue_limits()
to restore the limits to their correct initial values.

Signed-off-by: Bob Liu 
---
v2: Move blkif_set_queue_limits() after blkfront_gather_backend_features.
---
 drivers/block/xen-blkfront.c |   87 +++---
 1 file changed, 48 insertions(+), 39 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 032fc94..1b4c380 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -189,6 +189,8 @@ struct blkfront_info
struct mutex mutex;
struct xenbus_device *xbdev;
struct gendisk *gd;
+   u16 sector_size;
+   unsigned int physical_sector_size;
int vdevice;
blkif_vdev_t handle;
enum blkif_state connected;
@@ -913,9 +915,45 @@ static struct blk_mq_ops blkfront_mq_ops = {
.map_queue = blk_mq_map_queue,
 };
 
+static void blkif_set_queue_limits(struct blkfront_info *info)
+{
+   struct request_queue *rq = info->rq;
+   struct gendisk *gd = info->gd;
+   unsigned int segments = info->max_indirect_segments ? :
+   BLKIF_MAX_SEGMENTS_PER_REQUEST;
+
+   queue_flag_set_unlocked(QUEUE_FLAG_VIRT, rq);
+
+   if (info->feature_discard) {
+   queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, rq);
+   blk_queue_max_discard_sectors(rq, get_capacity(gd));
+   rq->limits.discard_granularity = info->discard_granularity;
+   rq->limits.discard_alignment = info->discard_alignment;
+   if (info->feature_secdiscard)
+   queue_flag_set_unlocked(QUEUE_FLAG_SECDISCARD, rq);
+   }
+
+   /* Hard sector size and max sectors impersonate the equiv. hardware. */
+   blk_queue_logical_block_size(rq, info->sector_size);
+   blk_queue_physical_block_size(rq, info->physical_sector_size);
+   blk_queue_max_hw_sectors(rq, (segments * XEN_PAGE_SIZE) / 512);
+
+   /* Each segment in a request is up to an aligned page in size. */
+   blk_queue_segment_boundary(rq, PAGE_SIZE - 1);
+   blk_queue_max_segment_size(rq, PAGE_SIZE);
+
+   /* Ensure a merged request will fit in a single I/O ring slot. */
+   blk_queue_max_segments(rq, segments / GRANTS_PER_PSEG);
+
+   /* Make sure buffer addresses are sector-aligned. */
+   blk_queue_dma_alignment(rq, 511);
+
+   /* Make sure we don't use bounce buffers. */
+   blk_queue_bounce_limit(rq, BLK_BOUNCE_ANY);
+}
+
 static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
-   unsigned int physical_sector_size,
-   unsigned int segments)
+   unsigned int physical_sector_size)
 {
struct request_queue *rq;
struct blkfront_info *info = gd->private_data;
@@ -947,37 +985,11 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
}
 
rq->queuedata = info;
-   queue_flag_set_unlocked(QUEUE_FLAG_VIRT, rq);
-
-   if (info->feature_discard) {
-   queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, rq);
-   blk_queue_max_discard_sectors(rq, get_capacity(gd));
-   rq->limits.discard_granularity = info->discard_granularity;
-   rq->limits.discard_alignment = info->discard_alignment;
-   if (info->feature_secdiscard)
-   queue_flag_set_unlocked(QUEUE_FLAG_SECDISCARD, rq);
-   }
-
-   /* Hard sector size and max sectors impersonate the equiv. hardware. */
-   blk_queue_logical_block_size(rq, sector_size);
-   blk_queue_physical_block_size(rq, physical_sector_size);
-   blk_queue_max_hw_sectors(rq, (segments * XEN_PAGE_SIZE) / 512);
-
-   /* Each segment in a request is up to an aligned page in size. */
-   blk_queue_segment_boundary(rq, PAGE_SIZE - 1);
-   blk_queue_max_segment_size(rq, PAGE_SIZE);
-
-   /* Ensure a merged request will fit in a single I/O ring slot. */
-   blk_queue_max_segments(rq, segments / GRANTS_PER_PSEG);
-
-   /* Make sure buffer addresses are sector-aligned. */
-   blk_queue_dma_alignment(rq, 511);
-
-   /* Make sure we don't use bounce buffers. */
-   blk_queue_bounce_limit(rq, BLK_BOUNCE_ANY);
-
-   gd->queue = rq;
-
+   info->rq = gd->queue = rq;
+   info->gd = gd;
+   info->sector_size = sector_size;
+   info->physical_sector_size = physical_sector_size;
+   blkif_set_queue_limits(info);
return 0;
 }
 
@@ -1142,16 +1154,11 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
gd->driverfs_dev = &(info->xbdev->dev);
set_capacity(gd, capacity);
 
-   if (xlvbd_init_blk_queue(gd, sector_size, physical_sector_size,
-info->max_indirect_segments ? :
-  


[PATCH 1/3] xen-blkfront: fix places not updated after introducing 64KB page granularity

2016-07-25 Thread Bob Liu
Two places didn't get updated when 64KB page granularity was introduced;
this patch fixes them.

Signed-off-by: Bob Liu 
Acked-by: Roger Pau Monné 
---
 drivers/block/xen-blkfront.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index fcc5b4e..032fc94 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1321,7 +1321,7 @@ free_shadow:
rinfo->ring_ref[i] = GRANT_INVALID_REF;
}
}
-   free_pages((unsigned long)rinfo->ring.sring, get_order(info->nr_ring_pages * PAGE_SIZE));
+   free_pages((unsigned long)rinfo->ring.sring, get_order(info->nr_ring_pages * XEN_PAGE_SIZE));
rinfo->ring.sring = NULL;
 
if (rinfo->irq)
@@ -2013,7 +2013,7 @@ static int blkif_recover(struct blkfront_info *info)
 
blkfront_gather_backend_features(info);
segs = info->max_indirect_segments ? : BLKIF_MAX_SEGMENTS_PER_REQUEST;
-   blk_queue_max_segments(info->rq, segs);
+   blk_queue_max_segments(info->rq, segs / GRANTS_PER_PSEG);
 
for (r_index = 0; r_index < info->nr_rings; r_index++) {
struct blkfront_ring_info *rinfo = &info->rinfo[r_index];
-- 
1.7.10.4



[PATCH] usb: ftdi-elan: Remove deprecated create_singlethread_workqueue

2016-07-25 Thread Bhaktipriya Shridhar
The status workqueue is involved in initializing the Uxxx and polling
the Uxxx until a supported PCMCIA CardBus device is detected.
It then starts the command and respond workqueues and then loads the
module that handles the device, after which it just polls the Uxxx
looking for card ejects.

The command and respond workqueues are involved in implementing a command
sequencer for communicating with the firmware on the other side of
the FTDI chip in the Uxxx.

These workqueues have only a single work item each and hence they do not
require ordering. Also, none of the above workqueues are being used on a
memory reclaim path. Hence, the singlethreaded workqueues have been
replaced with the use of system_wq.

System workqueues have been able to handle a high level of concurrency
for a long time now and hence it's not required to have a singlethreaded
workqueue just to gain concurrency. Unlike a dedicated per-cpu workqueue
created with create_singlethread_workqueue(), system_wq allows multiple
work items to overlap executions even on the same CPU; however, a
per-cpu workqueue doesn't have any CPU locality or global ordering
guarantee unless the target CPU is explicitly specified and thus the
increase of local concurrency shouldn't make any difference.

The work items have been sync cancelled because they are self-requeueing
and need to wait for the in-flight work item to finish before proceeding
with destruction. Hence, they have been sync cancelled in
ftdi_status_cancel_work(), ftdi_command_cancel_work() and
ftdi_response_cancel_work(). These functions are called in
ftdi_elan_exit() to ensure that there are no pending work items while
disconnecting the driver.

Signed-off-by: Bhaktipriya Shridhar 
---
 drivers/usb/misc/ftdi-elan.c | 53 +---
 1 file changed, 10 insertions(+), 43 deletions(-)

diff --git a/drivers/usb/misc/ftdi-elan.c b/drivers/usb/misc/ftdi-elan.c
index 52c27ca..59031dc 100644
--- a/drivers/usb/misc/ftdi-elan.c
+++ b/drivers/usb/misc/ftdi-elan.c
@@ -61,9 +61,6 @@ module_param(distrust_firmware, bool, 0);
 MODULE_PARM_DESC(distrust_firmware,
 "true to distrust firmware power/overcurrent setup");
 extern struct platform_driver u132_platform_driver;
-static struct workqueue_struct *status_queue;
-static struct workqueue_struct *command_queue;
-static struct workqueue_struct *respond_queue;
 /*
  * ftdi_module_lock exists to protect access to global variables
  *
@@ -228,56 +225,56 @@ static void ftdi_elan_init_kref(struct usb_ftdi *ftdi)

 static void ftdi_status_requeue_work(struct usb_ftdi *ftdi, unsigned int delta)
 {
-   if (!queue_delayed_work(status_queue, &ftdi->status_work, delta))
+   if (!schedule_delayed_work(&ftdi->status_work, delta))
kref_put(&ftdi->kref, ftdi_elan_delete);
 }

 static void ftdi_status_queue_work(struct usb_ftdi *ftdi, unsigned int delta)
 {
-   if (queue_delayed_work(status_queue, &ftdi->status_work, delta))
+   if (schedule_delayed_work(&ftdi->status_work, delta))
kref_get(&ftdi->kref);
 }

 static void ftdi_status_cancel_work(struct usb_ftdi *ftdi)
 {
-   if (cancel_delayed_work(&ftdi->status_work))
+   if (cancel_delayed_work_sync(&ftdi->status_work))
kref_put(&ftdi->kref, ftdi_elan_delete);
 }

 static void ftdi_command_requeue_work(struct usb_ftdi *ftdi, unsigned int delta)
 {
-   if (!queue_delayed_work(command_queue, &ftdi->command_work, delta))
+   if (!schedule_delayed_work(&ftdi->command_work, delta))
kref_put(&ftdi->kref, ftdi_elan_delete);
 }

 static void ftdi_command_queue_work(struct usb_ftdi *ftdi, unsigned int delta)
 {
-   if (queue_delayed_work(command_queue, &ftdi->command_work, delta))
+   if (schedule_delayed_work(&ftdi->command_work, delta))
kref_get(&ftdi->kref);
 }

 static void ftdi_command_cancel_work(struct usb_ftdi *ftdi)
 {
-   if (cancel_delayed_work(&ftdi->command_work))
+   if (cancel_delayed_work_sync(&ftdi->command_work))
kref_put(&ftdi->kref, ftdi_elan_delete);
 }

 static void ftdi_response_requeue_work(struct usb_ftdi *ftdi,
   unsigned int delta)
 {
-   if (!queue_delayed_work(respond_queue, &ftdi->respond_work, delta))
+   if (!schedule_delayed_work(&ftdi->respond_work, delta))
kref_put(&ftdi->kref, ftdi_elan_delete);
 }

 static void ftdi_respond_queue_work(struct usb_ftdi *ftdi, unsigned int delta)
 {
-   if (queue_delayed_work(respond_queue, &ftdi->respond_work, delta))
+   if (schedule_delayed_work(&ftdi->respond_work, delta))
kref_get(&ftdi->kref);
 }

 static void ftdi_response_cancel_work(struct usb_ftdi *ftdi)
 {
-   if (cancel_delayed_work(&ftdi->respond_work))
+   if (cancel_delayed_work_sync(&ftdi->respond_work))
kref_put(&ftdi->kref, ftdi_elan_delete);
 }

@@ -2823,9 +2820,6 @@ static void ftdi_elan_disconnect(struct usb_interface *interface)
ftdi->initialized = 0;
ftdi->registered = 0;
  


[PATCH 1/1] socfpga: defconfig: Enable Altera GPIO driver as module

2016-07-25 Thread thloh
From: Tien Hock Loh 

This patch enables the Altera GPIO driver as a module in socfpga_defconfig

Signed-off-by: Tien Hock Loh 
---
 arch/arm/configs/socfpga_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/configs/socfpga_defconfig 
b/arch/arm/configs/socfpga_defconfig
index 753f1a5..241ce4ca 100644
--- a/arch/arm/configs/socfpga_defconfig
+++ b/arch/arm/configs/socfpga_defconfig
@@ -108,3 +108,4 @@ CONFIG_DETECT_HUNG_TASK=y
 # CONFIG_SCHED_DEBUG is not set
 CONFIG_ENABLE_DEFAULT_TRACERS=y
 CONFIG_DEBUG_USER=y
+CONFIG_GPIO_ALTERA=m
-- 
1.7.11.GIT




Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

2016-07-25 Thread Alan Curry
Al Viro wrote:
> On Sun, Jul 24, 2016 at 07:45:13PM +0200, Christian Lamparter wrote:
> 
> > > The symptom is that downloaded files (http, ftp, and probably other
> > > protocols) have small corrupted segments (about 1-2 kilobytes long) in
> > > random locations. Only downloads that sustain a high speed for at least a
> > > few seconds are corrupted. Anything small enough to be received in less
> > > than about 5 seconds is not affected.
> 
> Can that sucker be reproduced with netcat?  That would eliminate all issues
> with multi-iovec recvmsg(2), narrowing things down quite a bit.

netcat seems to be immune. Comparing strace results, I didn't see any
recvmsg() calls in the other programs that have had the problem, but there
is an interesting difference: netcat calls select() to wait for the socket
to be ready for reading, where my other test programs just call read() and
let it block until ready.

So I wrote a small test program to isolate that difference. It downloads
a file using only read() and write() and a hardcoded HTTP request. It has
a select mode (main loop alternates read() and select() on the TCP socket)
and a noselect mode (main loop just read()s the TCP socket).

The program is included at the bottom of this message.

I ran it several times in both modes and got corruption if and only if the
noselect mode was used.

> 
> Another thing (and if that works, it's *NOT* a proper fix - it would be
> papering over the problem, but at least it would show where to look for
> it) - try (on top of mainline) the following delta:
> 
> diff --git a/net/core/datagram.c b/net/core/datagram.c

Will try that patch soon. Meanwhile, here's my test:

/* Demonstration program "dlbug".
   Usage: dlbug select > outfile
  or
  dlbug noselect > outfile
   outfile will contain the full HTTP response. Edit out the HTTP headers
   and what's left should be a valid gzip if the download worked. */

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/select.h>
#include <netinet/in.h>
#include <netdb.h>

int main(int argc, char **argv)
{
  const char *request =
"GET /debian/dists/stable/main/Contents-amd64.gz HTTP/1.0\r\n"
"Host: ftp.us.debian.org\r\n"
"\r\n";
  ssize_t request_len = strlen(request), w, r, copied;
  struct addrinfo hints, *host;
  int sock, err, doselect;
  char buf[10240];

  if(argc!=2 || (strcmp(argv[1], "select") && strcmp(argv[1], "noselect"))) {
fprintf(stderr, "Usage: %s {select|noselect}\n", argv[0]);
return 1;
  }

  doselect = !strcmp(argv[1], "select");

  memset(&hints, 0, sizeof hints);
  hints.ai_family = AF_INET;
  hints.ai_socktype = SOCK_STREAM;

  err = getaddrinfo("ftp.us.debian.org", 0, &hints, &host);
  if(err) {
fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(err));
return 1;
  }

  sock = socket(host->ai_family, host->ai_socktype, host->ai_protocol);
  if(sock < 0) {
perror("socket");
return 1;
  }

  ((struct sockaddr_in *)host->ai_addr)->sin_port = htons(80);

  if(connect(sock, host->ai_addr, host->ai_addrlen) < 0) {
perror("connect");
return 1;
  }

  while(request_len) {
w = write(sock, request, request_len);
if(w < 0) {
  perror("write to socket");
  return 1;
}
request += w;
request_len -= w;
  }

  while((r = read(sock, buf, sizeof buf))) {
if(r < 0) {
  perror("read from socket");
  return 1;
}

copied = 0;
while(copied < r) {
  w = write(1, buf+copied, r-copied);
  if(w < 0) {
perror("write to stdout");
return 1;
  }
  copied += w;
}

if(doselect) {
  fd_set rfds;
  FD_ZERO(&rfds);
  FD_SET(sock, &rfds);
  select(sock+1, &rfds, 0, 0, 0);
}
  }

  return 0;
}

-- 
Alan Curry


Re: linux-next: manual merge of the xen-tip tree with the block tree

2016-07-25 Thread Stephen Rothwell
Hi Boris,

On Mon, 25 Jul 2016 18:25:00 -0400 Boris Ostrovsky  wrote:
>
> > Jeremy Fitzhardinge   
> 
> Jeremy is no longer involved with Xen. However,
> 
> Juergen Gross 
> 
> is also Linux Xen/x86 maintainer.

I have replaced Jeremy with Juergen.

-- 
Cheers,
Stephen Rothwell


linux-next: manual merge of the random tree with the kspp tree

2016-07-25 Thread Stephen Rothwell
Hi Theodore,

Today's linux-next merge of the random tree got a conflict in:

  drivers/char/random.c

between commit:

  8c6a68e9eaa5 ("latent_entropy: Mark functions with __latent_entropy")

from the kspp tree and commit:

  e192be9d9a30 ("random: replace non-blocking pool with a Chacha20-based CRNG")

from the random tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/char/random.c
index 6cca3ed45817,8d0af74f6569..
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@@ -442,10 -471,15 +471,15 @@@ struct entropy_store 
__u8 last_data[EXTRACT_SIZE];
  };
  
+ static ssize_t extract_entropy(struct entropy_store *r, void *buf,
+  size_t nbytes, int min, int rsvd);
+ static ssize_t _extract_entropy(struct entropy_store *r, void *buf,
+   size_t nbytes, int fips);
+ 
+ static void crng_reseed(struct crng_state *crng, struct entropy_store *r);
  static void push_to_pool(struct work_struct *work);
 -static __u32 input_pool_data[INPUT_POOL_WORDS];
 -static __u32 blocking_pool_data[OUTPUT_POOL_WORDS];
 +static __u32 input_pool_data[INPUT_POOL_WORDS] __latent_entropy;
 +static __u32 blocking_pool_data[OUTPUT_POOL_WORDS] __latent_entropy;
- static __u32 nonblocking_pool_data[OUTPUT_POOL_WORDS] __latent_entropy;
  
  static struct entropy_store input_pool = {
.poolinfo = &poolinfo_table[0],


Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

2016-07-25 Thread alexmcwhirter
Thanks for the detailed bug-report. I looked around the web to see if it
was already reported or not. I found that this issue was reported before:

[0], [1] and [2] by the same person (CC'ed). One difference is that the
reporter had this issue with rsync on multiple SPARC systems. I ran a
git grep on a 4.7.0-rc7+ (wt-2016-07-21-15-g97bd3b0). But it didn't find
any patches directly referencing the commit. I'm not sure if this issue
has been fixed by now or not. I would greatly appreciate any comment
about this from the "people of netdev" (Al Viro? Alex Mcwhirter?).


I can confirm the issue I was having with this commit still exists on
sparc with the latest mainline kernel.


Re: [PATCH v3 02/11] mm: Hardened usercopy

2016-07-25 Thread Kees Cook
On Mon, Jul 25, 2016 at 7:03 PM, Michael Ellerman  wrote:
> Josh Poimboeuf  writes:
>
>> On Thu, Jul 21, 2016 at 11:34:25AM -0700, Kees Cook wrote:
>>> On Wed, Jul 20, 2016 at 11:52 PM, Michael Ellerman  
>>> wrote:
>>> > Kees Cook  writes:
>>> >
>>> >> diff --git a/mm/usercopy.c b/mm/usercopy.c
>>> >> new file mode 100644
>>> >> index ..e4bf4e7ccdf6
>>> >> --- /dev/null
>>> >> +++ b/mm/usercopy.c
>>> >> @@ -0,0 +1,234 @@
>>> > ...
>>> >> +
>>> >> +/*
>>> >> + * Checks if a given pointer and length is contained by the current
>>> >> + * stack frame (if possible).
>>> >> + *
>>> >> + *   0: not at all on the stack
>>> >> + *   1: fully within a valid stack frame
>>> >> + *   2: fully on the stack (when can't do frame-checking)
>>> >> + *   -1: error condition (invalid stack position or bad stack frame)
>>> >> + */
>>> >> +static noinline int check_stack_object(const void *obj, unsigned long 
>>> >> len)
>>> >> +{
>>> >> + const void * const stack = task_stack_page(current);
>>> >> + const void * const stackend = stack + THREAD_SIZE;
>>> >
>>> > That allows access to the entire stack, including the struct thread_info,
>>> > is that what we want - it seems dangerous? Or did I miss a check
>>> > somewhere else?
>>>
>>> That seems like a nice improvement to make, yeah.
>>>
>>> > We have end_of_stack() which computes the end of the stack taking
>>> > thread_info into account (end being the opposite of your end above).
>>>
>>> Amusingly, the object_is_on_stack() check in sched.h doesn't take
>>> thread_info into account either. :P Regardless, I think using
>>> end_of_stack() may not be best. To tighten the check, I think we could
>>> add this after checking that the object is on the stack:
>>>
>>> #ifdef CONFIG_STACK_GROWSUP
>>> stackend -= sizeof(struct thread_info);
>>> #else
>>> stack += sizeof(struct thread_info);
>>> #endif
>>>
>>> e.g. then if the pointer was in the thread_info, the second test would
>>> fail, triggering the protection.
>>
>> FWIW, this won't work right on x86 after Andy's
>> CONFIG_THREAD_INFO_IN_TASK patches get merged.
>
> Yeah. I wonder if it's better for the arch helper to just take the obj and 
> len,
> and work out its own bounds for the stack using current and whatever makes
> sense on that arch.
>
> It would avoid too much ifdefery in the generic code, and also avoid any
> confusion about whether stackend is the high or low address.
>
> eg. on powerpc we could do:
>
> int noinline arch_within_stack_frames(const void *obj, unsigned long len)
> {
> void *stack_low  = end_of_stack(current);
> void *stack_high = task_stack_page(current) + THREAD_SIZE;
>
>
> Whereas arches with STACK_GROWSUP=y could do roughly the reverse, and x86 can 
> do
> whatever it needs to depending on whether the thread_info is on or off stack.
>
> cheers

Yeah, I agree: this should be in the arch code. If the arch can
actually do frame checking, the thread_info (if it exists on the
stack) would already be excluded. But it'd be a nice tightening of the
check.
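
For illustration, a minimal sketch of the kind of arch helper being
discussed, assuming a downward-growing stack with struct thread_info at
the low end; the return convention follows the quoted
check_stack_object(), and none of this is the code that was eventually
merged:

#include <linux/sched.h>
#include <linux/thread_info.h>

static int arch_within_stack_frames(const void *obj, unsigned long len)
{
	/* usable stack range, excluding the on-stack thread_info */
	const void *low  = (const void *)end_of_stack(current);
	const void *high = (const void *)task_stack_page(current) + THREAD_SIZE;

	if (obj < low || obj + len > high)
		return -1;	/* overlaps thread_info or leaves the stack */

	return 2;		/* fully on stack; no frame walking done here */
}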

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security


Re: [PATCH 1/3] net: asix: Add in_pm parameter

2016-07-25 Thread David Miller

Please correct the problems Grant Grundler mentioned in all of these
patches, and resubmit this entire series freshly.

Also, please include a proper "[PATCH 0/3] ..." introduction posting
for the series which explains what this series is about, how it
implements what it is doing, and why it is doing things that way.

Thanks.


[PATCH] powerpc: sgy_cts1000: Fix gpio_halt_cb()'s signature

2016-07-25 Thread Andrey Smirnov
Halt callback in struct machdep_calls is declared with __noreturn
attribute, so omitting that attribute in gpio_halt_cb()'s signature
results in a compilation error.

Change the signature to address the problem as well as change the code
of the function to avoid ever returning from the function.

Signed-off-by: Andrey Smirnov 
---
 arch/powerpc/platforms/85xx/sgy_cts1000.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/sgy_cts1000.c 
b/arch/powerpc/platforms/85xx/sgy_cts1000.c
index 79fd0df..21d6aaa 100644
--- a/arch/powerpc/platforms/85xx/sgy_cts1000.c
+++ b/arch/powerpc/platforms/85xx/sgy_cts1000.c
@@ -38,18 +38,18 @@ static void gpio_halt_wfn(struct work_struct *work)
 }
 static DECLARE_WORK(gpio_halt_wq, gpio_halt_wfn);
 
-static void gpio_halt_cb(void)
+static void __noreturn gpio_halt_cb(void)
 {
enum of_gpio_flags flags;
int trigger, gpio;
 
if (!halt_node)
-   return;
+   panic("No reset GPIO information was provided in DT\n");
 
gpio = of_get_gpio_flags(halt_node, 0, &flags);
 
if (!gpio_is_valid(gpio))
-   return;
+   panic("Provided GPIO is invalid\n");
 
trigger = (flags == OF_GPIO_ACTIVE_LOW);
 
@@ -57,6 +57,8 @@ static void gpio_halt_cb(void)
 
/* Probably wont return */
gpio_set_value(gpio, trigger);
+
+   panic("Halt failed\n");
 }
 
 /* This IRQ means someone pressed the power button and it is waiting for us
-- 
2.5.5



Re: [RFC patch 1/6] random: Simplify API for random address requests

2016-07-25 Thread Kees Cook
On Mon, Jul 25, 2016 at 8:01 PM, Jason Cooper  wrote:
> To date, all callers of randomize_range() have set the length to 0, and
> check for a zero return value.  For the current callers, the only way
> to get zero returned is if end <= start.  Since they are all adding a
> constant to the start address, this is unnecessary.
>
> We can remove a bunch of needless checks by simplifying the API to do
> just what everyone wants, return an address between [start, start +
> range].
>
> While we're here, s/get_random_int/get_random_long/.  No current call
> site is adversely affected by get_random_int(), since all current range
> requests are < MAX_UINT.  However, we should match caller expectations
> to avoid coming up short (ha!) in the future.
>
> Signed-off-by: Jason Cooper 
> ---
>  drivers/char/random.c  | 17 -
>  include/linux/random.h |  2 +-
>  2 files changed, 5 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 0158d3bff7e5..1251cb2cbab2 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -1822,22 +1822,13 @@ unsigned long get_random_long(void)
>  EXPORT_SYMBOL(get_random_long);
>
>  /*
> - * randomize_range() returns a start address such that
> - *
> - *[.. <range> ..]
> - *  start  end
> - *
> - * a <range> with size "len" starting at the return value is inside in the
> - * area defined by [start, end], but is otherwise randomized.
> + * randomize_addr() returns a page aligned address within [start, start +
> + * range]
>   */
>  unsigned long
> -randomize_range(unsigned long start, unsigned long end, unsigned long len)
> +randomize_addr(unsigned long start, unsigned long range)

Also, this series isn't bisectable since randomize_range gets removed
here before the callers are updated. Perhaps add a macro that calls
randomize_addr with a BUG_ON for len != 0? (And then remove it in the
last patch?)
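
One possible shape for such a transitional shim, assuming randomize_addr()
is introduced first and the shim is deleted again by the series' last
patch (a sketch, not something actually posted):

#define randomize_range(start, end, len) ({				\
	BUG_ON((len) != 0);						\
	randomize_addr((start), (end) - (start));			\
})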

-Kees

>  {
> -   unsigned long range = end - len - start;
> -
> -   if (end <= start + len)
> -   return 0;
> -   return PAGE_ALIGN(get_random_int() % range + start);
> +   return PAGE_ALIGN(get_random_long() % range + start);
>  }
>
>  /* Interface for in-kernel drivers of true hardware RNGs.
> diff --git a/include/linux/random.h b/include/linux/random.h
> index e47e533742b5..1ad877a98186 100644
> --- a/include/linux/random.h
> +++ b/include/linux/random.h
> @@ -34,7 +34,7 @@ extern const struct file_operations random_fops, 
> urandom_fops;
>
>  unsigned int get_random_int(void);
>  unsigned long get_random_long(void);
> -unsigned long randomize_range(unsigned long start, unsigned long end, 
> unsigned long len);
> +unsigned long randomize_addr(unsigned long start, unsigned long range);
>
>  u32 prandom_u32(void);
>  void prandom_bytes(void *buf, size_t nbytes);
> --
> 2.9.2
>



-- 
Kees Cook
Chrome OS & Brillo Security


Re: [kbuild-all] arch/xtensa/include/asm/initialize_mmu.h:55: Error: invalid register 'atomctl' for 'wsr' instruction

2016-07-25 Thread Fengguang Wu

Hi Max,

On Tue, Jul 26, 2016 at 02:20:25AM +0300, Max Filippov wrote:

Hi Fengguang,

On Fri, Jul 22, 2016 at 3:44 PM, Fengguang Wu  wrote:

On Fri, Jul 22, 2016 at 06:32:28PM +0800, kbuild test robot wrote:

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
master
head:   47ef4ad2684d380dd6d596140fb79395115c3950
commit: 9da8320bb97768e35f2e64fa7642015271d672eb xtensa: add
test_kc705_hifi variant
date:   4 months ago
config: xtensa-audio_kc705_defconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 4.9.0



All errors (new ones prefixed by >>):

  arch/xtensa/include/asm/initialize_mmu.h: Assembler messages:


arch/xtensa/include/asm/initialize_mmu.h:55: Error: invalid register
'atomctl' for 'wsr' instruction


--
  arch/xtensa/kernel/coprocessor.S: Assembler messages:


arch/xtensa/kernel/coprocessor.S:93: Error: unknown opcode or format
name 'rur.ae_ovf_sar'
arch/xtensa/kernel/coprocessor.S:93: Error: unknown opcode or format
name 'rur.ae_bithead'
arch/xtensa/kernel/coprocessor.S:93: Error: unknown opcode or format
name 'rur.ae_ts_fts_bu_bp'
arch/xtensa/kernel/coprocessor.S:93: Error: unknown opcode or format
name 'rur.ae_cw_sd_no'
arch/xtensa/kernel/coprocessor.S:93: Error: unknown opcode or format
name 'rur.ae_cbegin0'
arch/xtensa/kernel/coprocessor.S:93: Error: unknown opcode or format
name 'rur.ae_cend0'
arch/xtensa/kernel/coprocessor.S:93: Error: unknown opcode or format
name 'ae_s64.i'
arch/xtensa/kernel/coprocessor.S:93: Error: unknown opcode or format
name 'ae_s64.i'


Do they really matter? Or can I shut these errors up?


Looks like I haven't supplied you with the compiler for test_kc705_hifi, for
which these errors are reported. I've built it and put it here:

 
http://jcmvbkbc.spb.ru/~jcmvbkbc/tmp/201604261801/x86_64-gcc-5.3.0-nolibc-xtensa-test_kc705_hifi-elf.tar.xz

Please integrate it into your system along with other xtensa compilers.


OK, done. :)

Thanks,
Fengguang


Re: [PATCH] caif-hsi: Remove deprecated create_singlethread_workqueue

2016-07-25 Thread David Miller
From: Bhaktipriya Shridhar 
Date: Mon, 25 Jul 2016 18:40:57 +0530

> alloc_workqueue replaces deprecated create_singlethread_workqueue().
> 
> A dedicated workqueue has been used since the workitems are being used
> on a packet tx/rx path. Hence, WQ_MEM_RECLAIM has been set to guarantee
> forward progress under memory pressure.
> 
> An ordered workqueue has been used since workitems &cfhsi->wake_up_work
> and &cfhsi->wake_down_work cannot be run concurrently.
> 
> Calls to flush_workqueue() before destroy_workqueue() have been dropped
> since destroy_workqueue() itself calls drain_workqueue() which flushes
> repeatedly till the workqueue becomes empty.
> 
> Signed-off-by: Bhaktipriya Shridhar 

Applied.
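
The conversion described in the quoted commit message boils down to a
pattern roughly like the following; the queue name and error handling
here are placeholders, not the driver's exact code:

	struct workqueue_struct *wq;

	/* ordered: work items never run concurrently;
	 * WQ_MEM_RECLAIM: forward progress on the tx/rx path is
	 * guaranteed even under memory pressure */
	wq = alloc_ordered_workqueue("cfhsi", WQ_MEM_RECLAIM);
	if (!wq)
		return -ENOMEM;

	/* teardown: destroy_workqueue() drains pending work itself, so
	 * the old flush_workqueue() call before it can simply be dropped */
	destroy_workqueue(wq);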


Re: [RFC patch 1/6] random: Simplify API for random address requests

2016-07-25 Thread Kees Cook
On Mon, Jul 25, 2016 at 8:30 PM, Jason Cooper  wrote:
> All,
>
> On Tue, Jul 26, 2016 at 03:01:55AM +, Jason Cooper wrote:
>> To date, all callers of randomize_range() have set the length to 0, and
>> check for a zero return value.  For the current callers, the only way
>> to get zero returned is if end <= start.  Since they are all adding a
>> constant to the start address, this is unnecessary.
>>
>> We can remove a bunch of needless checks by simplifying the API to do
>> just what everyone wants, return an address between [start, start +
>> range].
>>
>> While we're here, s/get_random_int/get_random_long/.  No current call
>> site is adversely affected by get_random_int(), since all current range
>> requests are < MAX_UINT.  However, we should match caller expectations
>> to avoid coming up short (ha!) in the future.
>>
>> Signed-off-by: Jason Cooper 
>> ---
>>  drivers/char/random.c  | 17 -
>>  include/linux/random.h |  2 +-
>>  2 files changed, 5 insertions(+), 14 deletions(-)
>>
>> diff --git a/drivers/char/random.c b/drivers/char/random.c
>> index 0158d3bff7e5..1251cb2cbab2 100644
>> --- a/drivers/char/random.c
>> +++ b/drivers/char/random.c
>> @@ -1822,22 +1822,13 @@ unsigned long get_random_long(void)
>>  EXPORT_SYMBOL(get_random_long);
>>
>>  /*
>> - * randomize_range() returns a start address such that
>> - *
>> - *[.. <range> ..]
>> - *  start  end
>> - *
>> - * a <range> with size "len" starting at the return value is inside in the
>> - * area defined by [start, end], but is otherwise randomized.
>> + * randomize_addr() returns a page aligned address within [start, start +
>> + * range]
>>   */
>>  unsigned long
>> -randomize_range(unsigned long start, unsigned long end, unsigned long len)
>> +randomize_addr(unsigned long start, unsigned long range)
>>  {
>> - unsigned long range = end - len - start;
>> -
>> - if (end <= start + len)
>> - return 0;
>> - return PAGE_ALIGN(get_random_int() % range + start);
>> + return PAGE_ALIGN(get_random_long() % range + start);
>>  }
>
> bah!  old patch file.  This should have been:
>
> if (range == 0)
> return start;
> else
> return PAGE_ALIGN(get_random_long() % range + start);

I think range should be limited to start + range < UINTMAX, and it
should be very clear if the range is inclusive or exclusive.  start =
0, range = 4096: does this mean 1 page, or 2 pages possible?
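
To make that concrete against the quoted sketch (assuming PAGE_SIZE ==
4096 and start == 0):

	/* get_random_long() % 4096 gives an offset in [0, 4095];
	 * PAGE_ALIGN() rounds up, so:
	 *   offset 0       -> 0
	 *   offset 1..4095 -> 4096
	 * i.e. two distinct page addresses are possible even though the
	 * nominal range covers a single page. */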

-Kees

>
> sorry,
>
> Jason.
>
>>
>>  /* Interface for in-kernel drivers of true hardware RNGs.
>> diff --git a/include/linux/random.h b/include/linux/random.h
>> index e47e533742b5..1ad877a98186 100644
>> --- a/include/linux/random.h
>> +++ b/include/linux/random.h
>> @@ -34,7 +34,7 @@ extern const struct file_operations random_fops, 
>> urandom_fops;
>>
>>  unsigned int get_random_int(void);
>>  unsigned long get_random_long(void);
>> -unsigned long randomize_range(unsigned long start, unsigned long end, 
>> unsigned long len);
>> +unsigned long randomize_addr(unsigned long start, unsigned long range);
>>
>>  u32 prandom_u32(void);
>>  void prandom_bytes(void *buf, size_t nbytes);
>> --
>> 2.9.2
>>



-- 
Kees Cook
Chrome OS & Brillo Security


Re: [PATCH v18 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-25 Thread David Miller
From: Dexuan Cui 
Date: Tue, 26 Jul 2016 03:09:16 +

> BTW, during the past month, at least 7 other people also reviewed
> the patch and gave me quite a few good comments, which have
> been addressed.

Correction: Several people gave coding style and simple corrections
to your patch.

Very few gave any review of the _SUBSTANCE_ of your changes.

And the one of the few who did, and suggested you build your
facilities using the existing S390 hypervisor socket infrastructure,
you brushed off _IMMEDIATELY_.

That drives me crazy.  The one person who gave you real feedback
you basically didn't consider seriously at all.

I know why you don't want to consider alternative implementations,
and it's because you guys have so much invested in what you've
implemented already.

But that's tough and not our problem.

And until this changes, yes, this submission will be stuck in the
mud and continue slogging on like this.

Sorry.


Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

2016-07-25 Thread Alan Curry
Christian Lamparter wrote:
> 
> As for carl9170: I'm not sure what the driver or firmware can do about
> this at this time. You can try to disable the hardware crypto by setting
> nohwcrypt via the module option. However, this might not do anything at all.

The nohwcrypt parameter didn't make any difference.

> > 
> > lsusb identifies my network device as:
> > 
> > Bus 005 Device 004: ID 0cf3:1002 Atheros Communications, Inc. TP-Link 
> > TL-WN821N v2 802.11n [Atheros AR9170]
> > 
> > I have version 1.9.9 of carl9170-1.fw in /lib/firmware
> Just one additional question: Is the TL-WN821N connected to a USB3 port?

It never has been before. I tried it today and it made no difference.

-- 
Alan Curry


[PATCH 1/2] powerpc: mpc85xx_mds: Select PHYLIB only if NETDEVICES is enabled

2016-07-25 Thread Andrey Smirnov
PHYLIB depends on NETDEVICES, so to avoid unmet dependencies warning
from Kconfig it needs to be selected conditionally.

Also add checks if PHYLIB is built-in to avoid undefined references to
PHYLIB's symbols.

Signed-off-by: Andrey Smirnov 
---
 arch/powerpc/platforms/85xx/Kconfig   | 2 +-
 arch/powerpc/platforms/85xx/mpc85xx_mds.c | 9 -
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/Kconfig 
b/arch/powerpc/platforms/85xx/Kconfig
index e626461..3da35bc 100644
--- a/arch/powerpc/platforms/85xx/Kconfig
+++ b/arch/powerpc/platforms/85xx/Kconfig
@@ -72,7 +72,7 @@ config MPC85xx_CDS
 config MPC85xx_MDS
bool "Freescale MPC85xx MDS"
select DEFAULT_UIMAGE
-   select PHYLIB
+   select PHYLIB if NETDEVICES
select HAS_RAPIDIO
select SWIOTLB
help
diff --git a/arch/powerpc/platforms/85xx/mpc85xx_mds.c 
b/arch/powerpc/platforms/85xx/mpc85xx_mds.c
index dbcb467..71aff5e 100644
--- a/arch/powerpc/platforms/85xx/mpc85xx_mds.c
+++ b/arch/powerpc/platforms/85xx/mpc85xx_mds.c
@@ -63,6 +63,8 @@
 #define DBG(fmt...)
 #endif
 
+#if IS_BUILTIN(CONFIG_PHYLIB)
+
 #define MV88E_SCR  0x10
 #define MV88E_SCR_125CLK   0x0010
 static int mpc8568_fixup_125_clock(struct phy_device *phydev)
@@ -152,6 +154,8 @@ static int mpc8568_mds_phy_fixups(struct phy_device *phydev)
return err;
 }
 
+#endif
+
 /* 
  *
  * Setup the architecture
@@ -313,6 +317,7 @@ static void __init mpc85xx_mds_setup_arch(void)
swiotlb_detect_4g();
 }
 
+#if IS_BUILTIN(CONFIG_PHYLIB)
 
 static int __init board_fixups(void)
 {
@@ -342,9 +347,12 @@ static int __init board_fixups(void)
 
return 0;
 }
+
 machine_arch_initcall(mpc8568_mds, board_fixups);
 machine_arch_initcall(mpc8569_mds, board_fixups);
 
+#endif
+
 static int __init mpc85xx_publish_devices(void)
 {
if (machine_is(mpc8568_mds))
@@ -435,4 +443,3 @@ define_machine(p1021_mds) {
.pcibios_fixup_phb  = fsl_pcibios_fixup_phb,
 #endif
 };
-
-- 
2.5.5



[PATCH 2/2] powerpc: e8248e: Select PHYLIB only if NETDEVICES is enabled

2016-07-25 Thread Andrey Smirnov
Select PHYLIB only if NETDEVICES is enabled and MDIO_BITBANG only if
PHYLIB is present to avoid warnings from Kconfig.

To prevent undefined references during linking register MDIO driver only
if CONFIG_MDIO_BITBANG is enabled.

Signed-off-by: Andrey Smirnov 
---
 arch/powerpc/platforms/82xx/Kconfig   | 4 ++--
 arch/powerpc/platforms/82xx/ep8248e.c | 4 +++-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/82xx/Kconfig 
b/arch/powerpc/platforms/82xx/Kconfig
index 7c7df400..994d1a9 100644
--- a/arch/powerpc/platforms/82xx/Kconfig
+++ b/arch/powerpc/platforms/82xx/Kconfig
@@ -30,8 +30,8 @@ config EP8248E
select 8272
select 8260
select FSL_SOC
-   select PHYLIB
-   select MDIO_BITBANG
+   select PHYLIB if NETDEVICES
+   select MDIO_BITBANG if PHYLIB
help
  This enables support for the Embedded Planet EP8248E board.
 
diff --git a/arch/powerpc/platforms/82xx/ep8248e.c 
b/arch/powerpc/platforms/82xx/ep8248e.c
index cdab847..8fec050 100644
--- a/arch/powerpc/platforms/82xx/ep8248e.c
+++ b/arch/powerpc/platforms/82xx/ep8248e.c
@@ -298,7 +298,9 @@ static const struct of_device_id of_bus_ids[] __initconst = 
{
 static int __init declare_of_platform_devices(void)
 {
of_platform_bus_probe(NULL, of_bus_ids, NULL);
-   platform_driver_register(&ep8248e_mdio_driver);
+
+   if (IS_ENABLED(CONFIG_MDIO_BITBANG))
+   platform_driver_register(&ep8248e_mdio_driver);
 
return 0;
 }
-- 
2.5.5



[PATCH 2/3] powerpc: Call chained reset handlers during reset

2016-07-25 Thread Andrey Smirnov
Call out to all restart handlers that were added via
register_restart_handler() API when restarting the machine.

Signed-off-by: Andrey Smirnov 
---
 arch/powerpc/kernel/setup-common.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index 5cd3283..205d073 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -145,6 +145,10 @@ void machine_restart(char *cmd)
ppc_md.restart(cmd);
 
smp_send_stop();
+
+   do_kernel_restart(cmd);
+   mdelay(1000);
+
machine_hang();
 }
 
-- 
2.5.5
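
For reference, the notifier interface that do_kernel_restart() walks is
used by drivers roughly as follows; the handler body, priority and names
are placeholders, not code from this series:

#include <linux/notifier.h>
#include <linux/reboot.h>

static int my_restart_handler(struct notifier_block *nb,
			      unsigned long action, void *cmd)
{
	/* trigger the board-specific reset mechanism here */
	return NOTIFY_DONE;
}

static struct notifier_block my_restart_nb = {
	.notifier_call	= my_restart_handler,
	.priority	= 128,	/* default priority */
};

static int __init my_restart_init(void)
{
	return register_restart_handler(&my_restart_nb);
}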



[PATCH 3/3] powerpc: Convert fsl_rstcr_restart to a reset handler

2016-07-25 Thread Andrey Smirnov
Convert fsl_rstcr_restart into a function to be registered with the
register_restart_handler() API and introduce fsl_rstcr_restart_register(),
a function that can be added as an initcall to do the aforementioned
registration.

Signed-off-by: Andrey Smirnov 
---
 arch/powerpc/platforms/85xx/bsc913x_qds.c |  2 +-
 arch/powerpc/platforms/85xx/bsc913x_rdb.c |  2 +-
 arch/powerpc/platforms/85xx/c293pcie.c|  2 +-
 arch/powerpc/platforms/85xx/corenet_generic.c |  2 +-
 arch/powerpc/platforms/85xx/ge_imp3a.c|  2 +-
 arch/powerpc/platforms/85xx/mpc8536_ds.c  |  2 +-
 arch/powerpc/platforms/85xx/mpc85xx_ads.c |  2 +-
 arch/powerpc/platforms/85xx/mpc85xx_cds.c | 26 +++---
 arch/powerpc/platforms/85xx/mpc85xx_ds.c  |  7 ---
 arch/powerpc/platforms/85xx/mpc85xx_mds.c |  7 ---
 arch/powerpc/platforms/85xx/mpc85xx_rdb.c | 21 +++--
 arch/powerpc/platforms/85xx/mvme2500.c|  2 +-
 arch/powerpc/platforms/85xx/p1010rdb.c|  2 +-
 arch/powerpc/platforms/85xx/p1022_ds.c|  2 +-
 arch/powerpc/platforms/85xx/p1022_rdk.c   |  3 ++-
 arch/powerpc/platforms/85xx/p1023_rdb.c   |  2 +-
 arch/powerpc/platforms/85xx/ppa8548.c |  2 +-
 arch/powerpc/platforms/85xx/qemu_e500.c   |  2 +-
 arch/powerpc/platforms/85xx/sbc8548.c |  2 +-
 arch/powerpc/platforms/85xx/socrates.c|  2 +-
 arch/powerpc/platforms/85xx/stx_gp3.c |  2 +-
 arch/powerpc/platforms/85xx/tqm85xx.c |  2 +-
 arch/powerpc/platforms/85xx/twr_p102x.c   |  2 +-
 arch/powerpc/platforms/85xx/xes_mpc85xx.c |  7 ---
 arch/powerpc/platforms/86xx/gef_ppc9a.c   |  2 +-
 arch/powerpc/platforms/86xx/gef_sbc310.c  |  2 +-
 arch/powerpc/platforms/86xx/gef_sbc610.c  |  2 +-
 arch/powerpc/platforms/86xx/mpc8610_hpcd.c|  2 +-
 arch/powerpc/platforms/86xx/mpc86xx_hpcn.c|  2 +-
 arch/powerpc/platforms/86xx/sbc8641d.c|  2 +-
 arch/powerpc/sysdev/fsl_soc.c | 22 +-
 arch/powerpc/sysdev/fsl_soc.h |  2 +-
 32 files changed, 86 insertions(+), 57 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/bsc913x_qds.c 
b/arch/powerpc/platforms/85xx/bsc913x_qds.c
index 07dd6ae..14ea7a0 100644
--- a/arch/powerpc/platforms/85xx/bsc913x_qds.c
+++ b/arch/powerpc/platforms/85xx/bsc913x_qds.c
@@ -53,6 +53,7 @@ static void __init bsc913x_qds_setup_arch(void)
 }
 
 machine_arch_initcall(bsc9132_qds, mpc85xx_common_publish_devices);
+machine_arch_initcall(bsc9133_qds, fsl_rstcr_restart_register);
 
 /*
  * Called very early, device-tree isn't unflattened
@@ -72,7 +73,6 @@ define_machine(bsc9132_qds) {
.pcibios_fixup_bus  = fsl_pcibios_fixup_bus,
 #endif
.get_irq= mpic_get_irq,
-   .restart= fsl_rstcr_restart,
.calibrate_decr = generic_calibrate_decr,
.progress   = udbg_progress,
 };
diff --git a/arch/powerpc/platforms/85xx/bsc913x_rdb.c 
b/arch/powerpc/platforms/85xx/bsc913x_rdb.c
index e48f671..cd4e717 100644
--- a/arch/powerpc/platforms/85xx/bsc913x_rdb.c
+++ b/arch/powerpc/platforms/85xx/bsc913x_rdb.c
@@ -43,6 +43,7 @@ static void __init bsc913x_rdb_setup_arch(void)
 }
 
 machine_device_initcall(bsc9131_rdb, mpc85xx_common_publish_devices);
+machine_arch_initcall(bsc9131_rdb, fsl_rstcr_restart_register);
 
 /*
  * Called very early, device-tree isn't unflattened
@@ -59,7 +60,6 @@ define_machine(bsc9131_rdb) {
.setup_arch = bsc913x_rdb_setup_arch,
.init_IRQ   = bsc913x_rdb_pic_init,
.get_irq= mpic_get_irq,
-   .restart= fsl_rstcr_restart,
.calibrate_decr = generic_calibrate_decr,
.progress   = udbg_progress,
 };
diff --git a/arch/powerpc/platforms/85xx/c293pcie.c 
b/arch/powerpc/platforms/85xx/c293pcie.c
index 3b9e3f0..fbd63f9 100644
--- a/arch/powerpc/platforms/85xx/c293pcie.c
+++ b/arch/powerpc/platforms/85xx/c293pcie.c
@@ -48,6 +48,7 @@ static void __init c293_pcie_setup_arch(void)
 }
 
 machine_arch_initcall(c293_pcie, mpc85xx_common_publish_devices);
+machine_arch_initcall(c293_pcie, fsl_rstcr_restart_register);
 
 /*
  * Called very early, device-tree isn't unflattened
@@ -65,7 +66,6 @@ define_machine(c293_pcie) {
.setup_arch = c293_pcie_setup_arch,
.init_IRQ   = c293_pcie_pic_init,
.get_irq= mpic_get_irq,
-   .restart= fsl_rstcr_restart,
.calibrate_decr = generic_calibrate_decr,
.progress   = udbg_progress,
 };
diff --git a/arch/powerpc/platforms/85xx/corenet_generic.c 
b/arch/powerpc/platforms/85xx/corenet_generic.c
index 3a6a84f..297379b 100644
--- a/arch/powerpc/platforms/85xx/corenet_generic.c
+++ b/arch/powerpc/platforms/85xx/corenet_generic.c
@@ -225,7 +225,6 @@ define_machine(corenet_generic) {
 

[PATCH 1/3] powerpc: Factor out common code in setup-common.c

2016-07-25 Thread Andrey Smirnov
Factor out a small bit of common code in machine_restart(),
machine_power_off() and machine_halt().

Signed-off-by: Andrey Smirnov 
---
 arch/powerpc/kernel/setup-common.c | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index 714b4ba..5cd3283 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -130,15 +130,22 @@ void machine_shutdown(void)
ppc_md.machine_shutdown();
 }
 
+static void machine_hang(void)
+{
+   pr_emerg("System Halted, OK to turn off power\n");
+   local_irq_disable();
+   while (1)
+   ;
+}
+
 void machine_restart(char *cmd)
 {
machine_shutdown();
if (ppc_md.restart)
ppc_md.restart(cmd);
+
smp_send_stop();
-   printk(KERN_EMERG "System Halted, OK to turn off power\n");
-   local_irq_disable();
-   while (1) ;
+   machine_hang();
 }
 
 void machine_power_off(void)
@@ -146,10 +153,9 @@ void machine_power_off(void)
machine_shutdown();
if (pm_power_off)
pm_power_off();
+
smp_send_stop();
-   printk(KERN_EMERG "System Halted, OK to turn off power\n");
-   local_irq_disable();
-   while (1) ;
+   machine_hang();
 }
 /* Used by the G5 thermal driver */
 EXPORT_SYMBOL_GPL(machine_power_off);
@@ -162,10 +168,9 @@ void machine_halt(void)
machine_shutdown();
if (ppc_md.halt)
ppc_md.halt();
+
smp_send_stop();
-   printk(KERN_EMERG "System Halted, OK to turn off power\n");
-   local_irq_disable();
-   while (1) ;
+   machine_hang();
 }
 
 
-- 
2.5.5



Re: [PATCH v2 02/10] userns: Add per user namespace sysctls.

2016-07-25 Thread Eric W. Biederman
David Miller  writes:

> From: ebied...@xmission.com (Eric W. Biederman)
> Date: Mon, 25 Jul 2016 19:44:50 -0500
>
>> User namespaces have enabled unprivileged users access to a lot more
>> data structures and so to catch programs that go crazy we need a lot
>> more limits.  I believe some of those limits make sense per namespace.
>> As it is easy in some cases to say any more than Y number of those
>> per namespace is excessive.   For example a limit of 1,000,000 ipv4
>> routes per network namespaces is a sanity check as there are
>> currently 621,649 ipv4 prefixes advertized in bgp.
>
> When we give a new namespace to unprivileged users, we honestly should
> make the sysctl settings we give to them become "limits".  They can
> further constrain the sysctl settings but may not raise them.

I won't disagree.  I was thinking in terms of global setting that
hold the limits for per namespace counters.  As we are talking sanity
check limits.

Perhaps we could get sophisticated and do something more but the simpler
we can make things and get the job done the better.

Eric



[PATCH 3/3] mm/duet: framework code

2016-07-25 Thread George Amvrosiadis
The Duet framework code:

- bittree.c: red-black bitmap tree that keeps track of items of interest
- debug.c: functions used to print information used to debug Duet
- hash.c: implementation of the global hash table where page events are stored
  for all tasks
- hook.c: the function invoked by the page cache hooks when Duet is online
- init.c: routines used to bring Duet online or offline
- path.c: routines performing resolution of UUIDs to paths using d_path
- task.c: implementation of Duet task fd operations

Signed-off-by: George Amvrosiadis 
---
 init/Kconfig  |   2 +
 mm/Makefile   |   1 +
 mm/duet/Kconfig   |  31 +++
 mm/duet/Makefile  |   7 +
 mm/duet/bittree.c | 537 +
 mm/duet/common.h  | 211 
 mm/duet/debug.c   |  98 +
 mm/duet/hash.c| 315 +
 mm/duet/hook.c|  81 
 mm/duet/init.c| 172 
 mm/duet/path.c| 184 +
 mm/duet/syscall.h |  61 ++
 mm/duet/task.c| 584 ++
 13 files changed, 2284 insertions(+)
 create mode 100644 mm/duet/Kconfig
 create mode 100644 mm/duet/Makefile
 create mode 100644 mm/duet/bittree.c
 create mode 100644 mm/duet/common.h
 create mode 100644 mm/duet/debug.c
 create mode 100644 mm/duet/hash.c
 create mode 100644 mm/duet/hook.c
 create mode 100644 mm/duet/init.c
 create mode 100644 mm/duet/path.c
 create mode 100644 mm/duet/syscall.h
 create mode 100644 mm/duet/task.c

diff --git a/init/Kconfig b/init/Kconfig
index c02d897..6f94b5a 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -294,6 +294,8 @@ config USELIB
  earlier, you may need to enable this syscall.  Current systems
  running glibc can safely disable this.
 
+source mm/duet/Kconfig
+
 config AUDIT
bool "Auditing support"
depends on NET
diff --git a/mm/Makefile b/mm/Makefile
index 78c6f7d..074c15f 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -99,3 +99,4 @@ obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
 obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
 obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
 obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
+obj-$(CONFIG_DUET) += duet/
diff --git a/mm/duet/Kconfig b/mm/duet/Kconfig
new file mode 100644
index 000..2f3a0c5
--- /dev/null
+++ b/mm/duet/Kconfig
@@ -0,0 +1,31 @@
+config DUET
+   bool "Duet framework support"
+
+   help
+ Duet is a framework aiming to reduce the IO footprint of analytics
+ and maintenance work. By exposing page cache events to these tasks,
+ it allows them to adapt their data processing order, in order to
+ benefit from data available in the page cache. Duet's operation is
+ based on hooks into the page cache.
+
+ To compile support for Duet, say Y.
+
+config DUET_STATS
+   bool "Duet statistics collection"
+   depends on DUET
+   help
+ This option enables support for the collection of statistics on the
+ operation of Duet. It will print information about the data structures
+ used internally, and profiling information about the framework.
+
+ If unsure, say N.
+
+config DUET_DEBUG
+   bool "Duet debugging support"
+   depends on DUET
+   help
+ Enable runtime debugging support for the Duet framework. This may
+ enable additional and expensive checks with negative impact on
+ performance.
+
+ To compile debugging support for Duet, say Y. If unsure, say N.
diff --git a/mm/duet/Makefile b/mm/duet/Makefile
new file mode 100644
index 000..c0c9e11
--- /dev/null
+++ b/mm/duet/Makefile
@@ -0,0 +1,7 @@
+#
+# Makefile for the linux Duet framework.
+#
+
+obj-$(CONFIG_DUET) += duet.o
+
+duet-y := init.o hash.o hook.o task.o bittree.o path.o debug.o
diff --git a/mm/duet/bittree.c b/mm/duet/bittree.c
new file mode 100644
index 000..3b20c35
--- /dev/null
+++ b/mm/duet/bittree.c
@@ -0,0 +1,537 @@
+/*
+ * Copyright (C) 2016 George Amvrosiadis.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+
+#include "common.h"
+
+#define BMAP_READ  0x01/* Read bmaps (overrides other flags) */
+#define BMAP_CHECK 0x02/* Check given bmap value expression */
+   /* Sets bmaps to match expression if not set */
+
+/* Bmap expressions can be formed using the following flags: */
+#define BMAP_DONE_SET  0x04/* Set done bmap values */
+#define BMAP_DONE_RST  0x08/* Reset done bmap values */
+#define 

[PATCH 1/3] mm: support for duet hooks

2016-07-25 Thread George Amvrosiadis
Adds the Duet hooks in the page cache. In filemap.c, two hooks are added at the
time of addition and removal of a page descriptor. In page-flags.h, two more
hooks are added to track page dirtying and flushing.

The hooks are inactive while Duet is offline.

Signed-off-by: George Amvrosiadis 
---
 include/linux/duet.h   | 43 +
 include/linux/page-flags.h | 53 ++
 mm/filemap.c   | 11 ++
 3 files changed, 107 insertions(+)
 create mode 100644 include/linux/duet.h

diff --git a/include/linux/duet.h b/include/linux/duet.h
new file mode 100644
index 000..80491e2
--- /dev/null
+++ b/include/linux/duet.h
@@ -0,0 +1,43 @@
+/*
+ * Defs necessary for Duet hooks
+ *
+ * Author: George Amvrosiadis 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+#ifndef _DUET_H
+#define _DUET_H
+
+/*
+ * Duet hooks into the page cache to monitor four types of events:
+ *   ADDED:a page __descriptor__ was inserted into the page cache
+ *   REMOVED:  a page __describptor__ was removed from the page cache
+ *   DIRTY:page's dirty bit was set
+ *   FLUSHED:  page's dirty bit was cleared
+ */
+#define DUET_PAGE_ADDED0x0001
+#define DUET_PAGE_REMOVED  0x0002
+#define DUET_PAGE_DIRTY0x0004
+#define DUET_PAGE_FLUSHED  0x0008
+
+#define DUET_HOOK(funp, evt, data) \
+   do { \
+   rcu_read_lock(); \
+   funp = rcu_dereference(duet_hook_fp); \
+   if (funp) \
+   funp(evt, (void *)data); \
+   rcu_read_unlock(); \
+   } while (0)
+
+/* Hook function pointer initialized by the Duet framework */
+typedef void (duet_hook_t) (__u16, void *);
+extern duet_hook_t *duet_hook_fp;
+
+#endif /* _DUET_H */
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index e5a3244..53be4a0 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -12,6 +12,9 @@
 #include 
 #include 
 #endif /* !__GENERATING_BOUNDS_H */
+#ifdef CONFIG_DUET
+#include 
+#endif /* CONFIG_DUET */
 
 /*
  * Various page->flags bits:
@@ -254,8 +257,58 @@ PAGEFLAG(Error, error, PF_NO_COMPOUND) 
TESTCLEARFLAG(Error, error, PF_NO_COMPOUN
 PAGEFLAG(Referenced, referenced, PF_HEAD)
TESTCLEARFLAG(Referenced, referenced, PF_HEAD)
__SETPAGEFLAG(Referenced, referenced, PF_HEAD)
+#ifdef CONFIG_DUET
+TESTPAGEFLAG(Dirty, dirty, PF_HEAD)
+
+static inline void SetPageDirty(struct page *page)
+{
+   duet_hook_t *dhfp = NULL;
+
+   if (!test_and_set_bit(PG_dirty, >flags))
+   DUET_HOOK(dhfp, DUET_PAGE_DIRTY, page);
+}
+
+static inline void __ClearPageDirty(struct page *page)
+{
+   duet_hook_t *dhfp = NULL;
+
+   if (__test_and_clear_bit(PG_dirty, >flags))
+   DUET_HOOK(dhfp, DUET_PAGE_FLUSHED, page);
+}
+
+static inline void ClearPageDirty(struct page *page)
+{
+   duet_hook_t *dhfp = NULL;
+
+   if (test_and_clear_bit(PG_dirty, >flags))
+   DUET_HOOK(dhfp, DUET_PAGE_FLUSHED, page);
+}
+
+static inline int TestSetPageDirty(struct page *page)
+{
+   duet_hook_t *dhfp = NULL;
+
+   if (!test_and_set_bit(PG_dirty, >flags)) {
+   DUET_HOOK(dhfp, DUET_PAGE_DIRTY, page);
+   return 0;
+   }
+   return 1;
+}
+
+static inline int TestClearPageDirty(struct page *page)
+{
+   duet_hook_t *dhfp = NULL;
+
+   if (test_and_clear_bit(PG_dirty, >flags)) {
+   DUET_HOOK(dhfp, DUET_PAGE_FLUSHED, page);
+   return 1;
+   }
+   return 0;
+}
+#else
 PAGEFLAG(Dirty, dirty, PF_HEAD) TESTSCFLAG(Dirty, dirty, PF_HEAD)
__CLEARPAGEFLAG(Dirty, dirty, PF_HEAD)
+#endif /* CONFIG_DUET */
 PAGEFLAG(LRU, lru, PF_HEAD) __CLEARPAGEFLAG(LRU, lru, PF_HEAD)
 PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
TESTCLEARFLAG(Active, active, PF_HEAD)
diff --git a/mm/filemap.c b/mm/filemap.c
index 20f3b1f..f06ebc0 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -166,6 +166,11 @@ static void page_cache_tree_delete(struct address_space 
*mapping,
 void __delete_from_page_cache(struct page *page, void *shadow)
 {
struct address_space *mapping = page->mapping;
+#ifdef CONFIG_DUET
+   duet_hook_t *dhfp = NULL;
+
+   DUET_HOOK(dhfp, DUET_PAGE_REMOVED, page);
+#endif /* CONFIG_DUET */
 
trace_mm_filemap_delete_from_page_cache(page);
/*
@@ -628,6 +633,9 @@ static int __add_to_page_cache_locked(struct page *page,
int huge = 

[PATCH 2/3] mm/duet: syscall wiring

2016-07-25 Thread George Amvrosiadis
Usual syscall wiring for the four Duet syscalls.

Signed-off-by: George Amvrosiadis 
---
 arch/x86/entry/syscalls/syscall_32.tbl |  4 
 arch/x86/entry/syscalls/syscall_64.tbl |  4 
 include/linux/syscalls.h   |  8 
 include/uapi/asm-generic/unistd.h  | 12 +++-
 kernel/sys_ni.c|  6 ++
 5 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl 
b/arch/x86/entry/syscalls/syscall_32.tbl
index 4cddd17..f34ff94 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -386,3 +386,7 @@
 377i386copy_file_range sys_copy_file_range
 378i386preadv2 sys_preadv2 
compat_sys_preadv2
 379i386pwritev2sys_pwritev2
compat_sys_pwritev2
+380i386duet_status sys_duet_status
+381i386duet_init   sys_duet_init
+382i386duet_bmap   sys_duet_bmap
+383i386duet_get_path   sys_duet_get_path
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl 
b/arch/x86/entry/syscalls/syscall_64.tbl
index 555263e..d04efaa 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -335,6 +335,10 @@
 326common  copy_file_range sys_copy_file_range
 32764  preadv2 sys_preadv2
 32864  pwritev2sys_pwritev2
+329common  duet_status sys_duet_status
+330common  duet_init   sys_duet_init
+331common  duet_bmap   sys_duet_bmap
+332common  duet_get_path   sys_duet_get_path
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index d022390..da1049e 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -65,6 +65,8 @@ struct old_linux_dirent;
 struct perf_event_attr;
 struct file_handle;
 struct sigaltstack;
+struct duet_status_args;
+struct duet_uuid_arg;
 union bpf_attr;
 
 #include 
@@ -898,4 +900,10 @@ asmlinkage long sys_copy_file_range(int fd_in, loff_t 
__user *off_in,
 
 asmlinkage long sys_mlock2(unsigned long start, size_t len, int flags);
 
+asmlinkage long sys_duet_status(u16 flags, struct duet_status_args __user 
*arg);
+asmlinkage long sys_duet_init(const char __user *taskname, u32 regmask,
+ const char __user *pathname);
+asmlinkage long sys_duet_bmap(u16 flags, struct duet_uuid_arg __user *arg);
+asmlinkage long sys_duet_get_path(struct duet_uuid_arg __user *uarg,
+ char __user *pathbuf, int pathbufsize);
 #endif
diff --git a/include/uapi/asm-generic/unistd.h 
b/include/uapi/asm-generic/unistd.h
index a26415b..7c287c0 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -725,8 +725,18 @@ __SC_COMP(__NR_preadv2, sys_preadv2, compat_sys_preadv2)
 #define __NR_pwritev2 287
 __SC_COMP(__NR_pwritev2, sys_pwritev2, compat_sys_pwritev2)
 
+/* mm/duet/syscall.c */
+#define __NR_duet_status 288
+__SYSCALL(__NR_duet_status, sys_duet_status)
+#define __NR_duet_init 289
+__SYSCALL(__NR_duet_init, sys_duet_init)
+#define __NR_duet_bmap 290
+__SYSCALL(__NR_duet_bmap, sys_duet_bmap)
+#define __NR_duet_get_path 291
+__SYSCALL(__NR_duet_get_path, sys_duet_get_path)
+
 #undef __NR_syscalls
-#define __NR_syscalls 288
+#define __NR_syscalls 292
 
 /*
  * All syscalls below here should go away really,
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 2c5e3a8..3d4c53a 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -176,6 +176,12 @@ cond_syscall(sys_capget);
 cond_syscall(sys_capset);
 cond_syscall(sys_copy_file_range);
 
+/* Duet syscall entries */
+cond_syscall(sys_duet_status);
+cond_syscall(sys_duet_init);
+cond_syscall(sys_duet_bmap);
+cond_syscall(sys_duet_get_path);
+
 /* arch-specific weak syscall entries */
 cond_syscall(sys_pciconfig_read);
 cond_syscall(sys_pciconfig_write);
-- 
2.7.4



[PATCH 0/3] new feature: monitoring page cache events

2016-07-25 Thread George Amvrosiadis
I'm attaching a patch set implementing a mechanism we call Duet, which allows
applications to monitor events at the page cache level: page additions,
removals, dirtying, and flushing. Using such events, applications can identify
and prioritize processing of cached data, thereby reducing their I/O footprint.

One user of these events is maintenance tasks that scan large amounts of data
(e.g., backup, defrag, scrubbing). Knowing what is currently cached allows them
to piggy-back on each other and other applications running in the system. I've
managed to run up to 3 such applications together (backup, scrubbing, defrag)
and have them finish their work with 1/3rd of the I/O by using Duet. In this
case, the task that traversed the data the fastest (scrubber) allowed the rest
of the tasks to piggyback on the data brought into the cache. I.e., a file that
was read to be backed up was also picked up by the scrubber and defrag process.

I've found adapting applications to be straightforward. Although I don't
include examples in this patch set, I've adapted btrfs scrubbing, btrfs send
(backup), btrfs defrag, rsync, and f2fs garbage collection in a few hundred
lines of code each (basically just had to add an event handler and wire it up
to the task's processing loop). You can read more about this in our full paper:
http://dl.acm.org/citation.cfm?id=2815424. I'd be happy to generate subsequent
patch sets for individual tasks if there's interest in this one. We've also
used Duet to speed up Hadoop and Spark by up to 54%, depending on the overlap
in the data processed, by taking into account the cache residency of HDFS
blocks across the cluster when scheduling tasks:
https://www.usenix.org/conference/hotstorage16/workshop-program/presentation/deslauriers


Syscall interface (and how it works): Duet uses hooks into the page cache (see
the "mm: support for duet hooks" patch). These hooks inform Duet of page events,
which are stored in a hash table. Only events that are of interest to running
tasks are stored, and only one copy of each event is stored for all interested
tasks. To register for events, the following syscalls are used (see the
"mm/duet: syscall wiring" patch for prototypes):

- sys_duet_init(char *taskname, u32 regmask, char *path): returns an fd that
  watches for events under PATH (e.g. '/home') which are also described by the
  REGMASK (e.g. DUET_PAGE_ADDED | DUET_PAGE_REMOVED). TASKNAME is an optional,
  human-readable name for the task.

- sys_duet_bmap(u16 flags, struct duet_uuid_arg *uuid): Duet allows applications
  to track processed items on an internal bitmap (which improves performance by
  being used to filter unnecessary events). The specified UUID is what read()
  returns on the fd created with sys_duet_init(), and uniquely identifies a
  file. FLAGS allow the bitmap to be set, reset, or have its state checked.

- sys_duet_get_path(struct duet_uuid_arg *uuid, char *buf, int bufsize):
  Applications running with Duet work with pathnames, not UUIDs. This
  syscall traverses the dentry cache and returns the corresponding path in BUF.

- sys_duet_status(u16 flags, struct duet_status_args *arg): Currently, the Duet
  framework can be turned on/off manually. This allows the admin to specify the
  maximum number of applications that may be registered concurrently, which lets
  us size the internal hash table nodes appropriately (and bound the performance
  and memory overhead). The syscall is also used for debugging purposes. I think
  this functionality should probably be exposed through ioctl()s to a device,
  and I'm open to suggestions on how to improve the current implementation.

The framework itself (a bit less than 2300 LoC) is currently placed under
mm/duet and the code is included in the "mm/duet: framework code" patch.
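
For illustration, a minimal userspace sketch of driving these syscalls directly
follows (hypothetical code, not part of the patch set: it assumes the x86_64
syscall number wired up in the "mm/duet: syscall wiring" patch, the event flags
from include/linux/duet.h, and a kernel built with CONFIG_DUET; the record
format returned by read() is defined in mm/duet/common.h and is treated as an
opaque byte stream here):

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define __NR_duet_init		330	/* assumed: x86_64 entry from patch 2/3 */

#define DUET_PAGE_ADDED		0x0001	/* from include/linux/duet.h */
#define DUET_PAGE_REMOVED	0x0002

int main(void)
{
	char buf[4096];
	ssize_t n;

	/* Register a task watching page additions/removals under /home. */
	int fd = syscall(__NR_duet_init, "example-task",
			 DUET_PAGE_ADDED | DUET_PAGE_REMOVED, "/home");
	if (fd < 0) {
		perror("duet_init");
		return EXIT_FAILURE;
	}

	/* Each read() drains pending (UUID, event mask) records for this
	 * task; their exact layout comes from mm/duet/common.h. */
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		printf("drained %zd bytes of duet events\n", n);

	close(fd);
	return EXIT_SUCCESS;
}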


Application interface: Applications interface with Duet through a user library,
which is available at https://github.com/gamvrosi/duet-tools. In the same repo,
I have included a dummy_task application which provides an example of how Duet
can be used.


Changelog: The patches are based on Linus' v4.7 tag, and touch on the following
parts of the kernel:

- mm/filemap.c and include/linux/page-flags.h: hooks in the page cache to track
  page events on page addition, removal, dirtying, and flushing.

- arch/x86/*, include/linux/syscalls.h, kernel/sys_ni.h: wiring the 4 syscalls

- mm/duet/*: framework code



George Amvrosiadis (3):
  mm: support for duet hooks
  mm/duet: syscall wiring
  mm/duet: framework code

 arch/x86/entry/syscalls/syscall_32.tbl |   4 +
 arch/x86/entry/syscalls/syscall_64.tbl |   4 +
 include/linux/duet.h   |  43 +++
 include/linux/page-flags.h |  53 +++
 include/linux/syscalls.h   |   8 +
 include/uapi/asm-generic/unistd.h  |  12 +-
 init/Kconfig   |   2 +
 kernel/sys_ni.c|   6 +
 mm/Makefile|   1 +
 mm/duet/Kconfig   

Re: [PATCH v2 3/3] x86/apic: Improved the setting of interrupt mode for bsp

2016-07-25 Thread Eric W. Biederman
Wei Jiangang  writes:

> If we specify the 'notsc' parameter for the dump-capture kernel,
> and then trigger a crash(panic) by using "ALT-SysRq-c" or
> "echo c > /proc/sysrq-trigger", the dump-capture kernel will
> hang in calibrate_delay_converge() and wait for jiffies changes.
> serial log as follows:
>
> tsc: Fast TSC calibration using PIT
> tsc: Detected 2099.947 MHz processor
> Calibrating delay loop...
>
> The reason for jiffies not changes is there's no timer interrupt
> passed to dump-capture kernel.
>
> In fact, once kernel panic occurs, the local APIC is disabled
> by lapic_shutdown() in reboot path.
> generly speaking, local APIC state can be initialized by BIOS
> after Power-Up or Reset, which doesn't apply to kdump case.
> so the kernel has to be responsible for initialize the interrupt
> mode properly according the latest status of APIC in bootup path.
>
> An MP operating system is booted under either PIC mode or
> virtual wire mode. Later, the operating system switches to
> symmetric I/O mode as it enters multiprocessor mode.
> Two kinds of virtual wire mode are defined in Intel MP spec:
> virtual wire mode via local APIC or via I/O APIC.
>
> Now we determine the mode of APIC only through a SMP BIOS(MP table).
> That's not enough. It's better to do further check if APIC works
> with effective interrupt mode, and then, do some proper setting.

Reading through the code, let me pause a moment and say:
"Yowzers, the interrupt initialization code has gotten hard to follow.  It
is now full of indirection with ill-defined semantics."  pre_vector_init
indeed.

I will argue this is the wrong fix.

We really should not have to worry about getting the system functional
in virtual wire mode on a modern system.  And looking at the code
someone has done half the work and made it conditional under
acpi_gbl_reduced_hardware.

Now reduced hardware implies a bit more than we are talking about, but
if there is ACPI apic information we should not need to worry about
external interrupts and can just enable the apics.

In fact I think having MPtable information is enough for that.

So I think what needs to happen is for the apic initialization to get
an overhaul that makes apic initialization the happy path and the other
irq controllers the odd backwards compatibility path.  And when we
are done we never run in anything except full apic mode unless the
hardware doesn't support it.

I think that will leave things more robust as we don't need to set up
and then re-set up the interrupts during boot.

Eric


> Signed-off-by: Cao jin 
> Signed-off-by: Wei Jiangang 
> ---
>  arch/x86/include/asm/io_apic.h |  5 
>  arch/x86/kernel/apic/apic.c| 60 
> +-
>  arch/x86/kernel/apic/io_apic.c | 28 
>  3 files changed, 92 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
> index 6cbf2cfb3f8a..a3257366bf7f 100644
> --- a/arch/x86/include/asm/io_apic.h
> +++ b/arch/x86/include/asm/io_apic.h
> @@ -190,6 +190,7 @@ static inline unsigned int io_apic_read(unsigned int 
> apic, unsigned int reg)
>  }
>  
>  extern void setup_IO_APIC(void);
> +extern bool virt_wire_through_ioapic(void);
>  extern void enable_IO_APIC(void);
>  extern void disable_IO_APIC(void);
>  extern void setup_ioapic_dest(void);
> @@ -231,6 +232,10 @@ static inline void io_apic_init_mappings(void) { }
>  #define native_disable_io_apic   NULL
>  
>  static inline void setup_IO_APIC(void) { }
> +static inline bool virt_wire_through_ioapic(void)
> +{
> + return false;
> +}
>  static inline void enable_IO_APIC(void) { }
>  static inline void setup_ioapic_dest(void) { }
>  
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index 8e25b9b2d351..a3939fb130cc 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -1124,6 +1124,58 @@ void __init sync_Arb_IDs(void)
>  }
>  
>  /*
> + * Check APIC enable/disable flag
> + */
> +static bool check_apic_enabled(void)
> +{
> + unsigned int value;
> +
> + /*
> +  * If APIC is disabled globally (IA32_APIC_BASE[11] == 0)
> +  * the boot cpu hasn't X86_FEATURE_APIC,
> +  * and init_bsp_APIC() has already checked it before.
> +  * so no need to check global enable/disable flag here
> +  */
> +
> + /* Check the software enable/disable flag */
> + value = apic_read(APIC_SPIV);
> + if (!(value & APIC_SPIV_APIC_ENABLED))
> + return false;
> +
> + return true;
> +}
> +
> +/*
> + * Return false means the through-local-APIC virtual wire mode is inactive
> + */
> +static bool virt_wire_through_lapic(void)
> +{
> + unsigned int value;
> +
> + /*
> +  * The through-local-APIC virtual wire mode requests
> +  * local APIC to enable LINT0 for ExtINT delivery mode
> +  * and LINT1 for NMI 

linux-next: manual merge of the xen-tip tree with the tip tree

2016-07-25 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the xen-tip tree got a conflict in:

  arch/x86/xen/smp.c

between commit:

  4c9075835511 ("xen/x86: Move irq allocation from Xen smp_op.cpu_up()")

from the tip tree and commit:

  ad5475f9faf5 ("x86/xen: use xen_vcpu_id mapping for HYPERVISOR_vcpu_op")

from the xen-tip tree.

I fixed it up (I think - see below) and can carry the fix as
necessary. This is now fixed as far as linux-next is concerned, but any
non trivial conflicts should be mentioned to your upstream maintainer
when your tree is submitted for merging.  You may also want to consider
cooperating with the maintainer of the conflicting tree to minimise any
particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/x86/xen/smp.c
index 09d5cc062dbe,0b4d04c8ab4d..
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@@ -486,7 -495,11 +493,7 @@@ static int xen_cpu_up(unsigned int cpu
  
xen_pmu_init(cpu);
  
-   rc = HYPERVISOR_vcpu_op(VCPUOP_up, cpu, NULL);
 -  rc = xen_smp_intr_init(cpu);
 -  if (rc)
 -  return rc;
 -
+   rc = HYPERVISOR_vcpu_op(VCPUOP_up, xen_vcpu_nr(cpu), NULL);
BUG_ON(rc);
  
while (cpu_report_state(cpu) != CPU_ONLINE)


linux-next: manual merge of the xen-tip tree with the tip tree

2016-07-25 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the xen-tip tree got a conflict in:

  arch/x86/xen/enlighten.c

between commit:

  4c9075835511 ("xen/x86: Move irq allocation from Xen smp_op.cpu_up()")

from the tip tree and commit:

  88e957d6e47f ("xen: introduce xen_vcpu_id mapping")

from the xen-tip tree.

I fixed it up (I think - see below) and can carry the fix as necessary.
This is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/x86/xen/enlighten.c
index dc96f939af88,85ef4c0442e0..
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@@ -1803,49 -1823,21 +1824,53 @@@ static void __init init_hvm_pv_info(voi
xen_domain_type = XEN_HVM_DOMAIN;
  }
  
 -static int xen_hvm_cpu_notify(struct notifier_block *self, unsigned long 
action,
 -void *hcpu)
 +static int xen_cpu_notify(struct notifier_block *self, unsigned long action,
 +void *hcpu)
  {
int cpu = (long)hcpu;
 +  int rc;
 +
switch (action) {
case CPU_UP_PREPARE:
 -  if (cpu_acpi_id(cpu) != U32_MAX)
 -  per_cpu(xen_vcpu_id, cpu) = cpu_acpi_id(cpu);
 -  else
 -  per_cpu(xen_vcpu_id, cpu) = cpu;
 -  xen_vcpu_setup(cpu);
 -  if (xen_have_vector_callback) {
 -  if (xen_feature(XENFEAT_hvm_safe_pvclock))
 -  xen_setup_timer(cpu);
 +  if (xen_hvm_domain()) {
 +  /*
 +   * This can happen if CPU was offlined earlier and
 +   * offlining timed out in common_cpu_die().
 +   */
 +  if (cpu_report_state(cpu) == CPU_DEAD_FROZEN) {
 +  xen_smp_intr_free(cpu);
 +  xen_uninit_lock_cpu(cpu);
 +  }
 +
++  if (cpu_acpi_id(cpu) != U32_MAX)
++  per_cpu(xen_vcpu_id, cpu) = cpu_acpi_id(cpu);
++  else
++  per_cpu(xen_vcpu_id, cpu) = cpu;
 +  xen_vcpu_setup(cpu);
}
 +
 +  if (xen_pv_domain() ||
 +  (xen_have_vector_callback &&
 +   xen_feature(XENFEAT_hvm_safe_pvclock)))
 +  xen_setup_timer(cpu);
 +
 +  rc = xen_smp_intr_init(cpu);
 +  if (rc) {
 +  WARN(1, "xen_smp_intr_init() for CPU %d failed: %d\n",
 +   cpu, rc);
 +  return NOTIFY_BAD;
 +  }
 +
 +  break;
 +  case CPU_ONLINE:
 +  xen_init_lock_cpu(cpu);
 +  break;
 +  case CPU_UP_CANCELED:
 +  xen_smp_intr_free(cpu);
 +  if (xen_pv_domain() ||
 +  (xen_have_vector_callback &&
 +   xen_feature(XENFEAT_hvm_safe_pvclock)))
 +  xen_teardown_timer(cpu);
break;
default:
break;


Re: [PATCH v9 0/7] Make cpuid <-> nodeid mapping persistent

2016-07-25 Thread Dou Liyang



On 2016/07/26 07:20, Andrew Morton wrote:

On Mon, 25 Jul 2016 16:35:42 +0800 Dou Liyang  wrote:


[Problem]

cpuid <-> nodeid mapping is firstly established at boot time. And workqueue 
caches
the mapping in wq_numa_possible_cpumask in wq_numa_init() at boot time.

When doing node online/offline, cpuid <-> nodeid mapping is 
established/destroyed,
which means, cpuid <-> nodeid mapping will change if node hotplug happens. But
workqueue does not update wq_numa_possible_cpumask.

So here is the problem:

Assume we have the following cpuid <-> nodeid in the beginning:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 2 | 30-44, 90-104
node 3 | 45-59, 105-119

and we hot-remove node2 and node3, it becomes:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89

and we hot-add node4 and node5, it becomes:

  Node | CPU

node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 4 | 30-59
node 5 | 90-119

But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and the like.

When a pool workqueue is initialized, if its cpumask belongs to a node, its
pool->node will be mapped to that node. And memory used by this workqueue will
also be allocated on that node.


Plan B is to hunt down and fix up all the workqueue structures at
hotplug-time.  Has that option been evaluated?



Yes, the option has been evaluated in this patch:
http://www.gossamer-threads.com/lists/linux/kernel/2116748



Your fix is x86-only and this bug presumably affects other
architectures, yes?  I think a "Plan B" would fix all architectures?



Yes, the bug presumably affects the few architectures which support both
CPU hotplug and NUMA.


We posted "Plan B" to the community earlier and got a lot of advice and 
ideas. Based on these suggestions, we carefully weighed the two plans and 
chose the first.




Thirdly, what is the merge path for these patches?  Is an x86
or ACPI maintainer working with you on them?


Yes, we got a lot of guidance and help from RJ, who is an ACPI maintainer.


Thanks,

Dou




[e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-25 Thread Fengguang Wu
Greetings,

This BUG message can be found in recent kernels as well as v4.4 and
linux-stable. It happens when running

modprobe netconsole netconsole=@/,$port@$server/ 

[   39.937534] 22 Jul 13:30:40 ntpdate[440]: step time server 192.168.1.1 
offset -673.833841 sec
[   39.943285] netpoll: netconsole: local port 6665
[   39.943436] netpoll: netconsole: local IPv4 address 0.0.0.0
[   39.943609] netpoll: netconsole: interface 'eth0'
[   39.943756] netpoll: netconsole: remote port 6672
[   39.943913] netpoll: netconsole: remote IPv4 address 192.168.1.1
[   39.944099] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
[   39.944311] netpoll: netconsole: local IP 192.168.1.193
[   39.944514] BUG: sleeping function called from invalid context at 
kernel/irq/manage.c:110
[   39.944515] in_atomic(): 1, irqs_disabled(): 1, pid: 448, name: modprobe
[   39.944517] CPU: 6 PID: 448 Comm: modprobe Not tainted 
4.7.0-rc7-wt-ath-10122-gf9b5ec2 #102
[   39.944518] Hardware name:  /DZ77BH-55K, BIOS 
BHZ7710H.86A.0097.2012.1228.1346 12/28/2012
[   39.944522]   c90001f2f9e8 813417d9 
88007faba5c0
[   39.944524]  006e c90001f2fa00 810aec03 
81a25948
[   39.944525]  c90001f2fa28 810aec9a 8803e5bd9400 
8803e50fbd68
[   39.944526] Call Trace:
[   39.944533]  [] dump_stack+0x63/0x8a
[   39.944536]  [] ___might_sleep+0xd3/0x120
[   39.944537]  [] __might_sleep+0x4a/0x80
[   39.944541]  [] synchronize_irq+0x38/0xa0
[   39.944543]  [] ? __irq_put_desc_unlock+0x1e/0x40
[   39.944545]  [] ? __disable_irq_nosync+0x43/0x60
[   39.944547]  [] disable_irq+0x1c/0x20
[   39.944559]  [] e1000_netpoll+0xf2/0x120 [e1000e]
[   39.944563]  [] netpoll_poll_dev+0x5c/0x1a0
[   39.944567]  [] ? __kmalloc_reserve+0x31/0x90
[   39.944569]  [] netpoll_send_skb_on_dev+0x16b/0x250
[   39.944572]  [] netpoll_send_udp+0x2ec/0x450
[   39.944576]  [] write_msg+0xb2/0xf0 [netconsole]
[   39.944578]  [] call_console_drivers+0x115/0x120
[   39.944580]  [] console_unlock+0x333/0x5c0
[   39.944583]  [] register_console+0x1c4/0x380
[   39.944586]  [] init_netconsole+0x1c5/0x1000 [netconsole]
[   39.944588]  [] ? 0xa004f000
[   39.944591]  [] do_one_initcall+0x3d/0x150
[   39.944592]  [] ? __might_sleep+0x4a/0x80
[   39.944596]  [] ? kmem_cache_alloc_trace+0x188/0x1e0
[   39.944598]  [] do_init_module+0x5f/0x1d8
[   39.944602]  [] load_module+0x1429/0x1b40
[   39.944604]  [] ? __symbol_put+0x40/0x40
[   39.944607]  [] ? kernel_read_file+0x178/0x1a0
[   39.944608]  [] ? kernel_read_file_from_fd+0x49/0x80
[   39.944611]  [] SYSC_finit_module+0xc3/0xf0
[   39.944614]  [] SyS_finit_module+0xe/0x10
[   39.944617]  [] entry_SYSCALL_64_fastpath+0x1a/0xa9
[   39.946384] console [netcon0] enabled
[   39.946514] netconsole: network logging started

Can this possibly be fixed?

Thanks,
Fengguang



Re: [PATCH v3 3/3] mac80211: mesh: fixed HT ies in beacon template

2016-07-25 Thread Masashi Honma

On 2016-07-22 14:26, Masashi Honma wrote:
> On 2016-07-14 05:07, Yaniv Machani wrote:
>> +
>> +/* if channel width is 20MHz - configure HT capab accordingly*/
>> +if (sdata->vif.bss_conf.chandef.width == NL80211_CHAN_WIDTH_20) {
>> +cap &= ~IEEE80211_HT_CAP_SUP_WIDTH_20_40;
>> +cap &= ~IEEE80211_HT_CAP_DSSSCCK40;
>> +}
>
> I have tested this part of your patch and this works for me.
>
> Previously, "Supported Channel Width Set bit" in HT Capabilities element
> was 1 even though disable_ht40=1 existed in wpa_supplicant.conf.
> After application of the patch, the bit was 0.
>
>

# I retransmit this because of mail delivery errors.

I forgot to mention I have used this patch to test.
http://lists.infradead.org/pipermail/hostap/2016-July/036029.html



Re: [RFC patch 1/6] random: Simplify API for random address requests

2016-07-25 Thread Jason Cooper
All,

On Tue, Jul 26, 2016 at 03:01:55AM +, Jason Cooper wrote:
> To date, all callers of randomize_range() have set the length to 0, and
> check for a zero return value.  For the current callers, the only way
> to get zero returned is if end <= start.  Since they are all adding a
> constant to the start address, this is unnecessary.
> 
> We can remove a bunch of needless checks by simplifying the API to do
> just what everyone wants, return an address between [start, start +
> range].
> 
> While we're here, s/get_random_int/get_random_long/.  No current call
> site is adversely affected by get_random_int(), since all current range
> requests are < MAX_UINT.  However, we should match caller expectations
> to avoid coming up short (ha!) in the future.
> 
> Signed-off-by: Jason Cooper 
> ---
>  drivers/char/random.c  | 17 -
>  include/linux/random.h |  2 +-
>  2 files changed, 5 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 0158d3bff7e5..1251cb2cbab2 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -1822,22 +1822,13 @@ unsigned long get_random_long(void)
>  EXPORT_SYMBOL(get_random_long);
>  
>  /*
> - * randomize_range() returns a start address such that
> - *
> - *    [...... <range> .....]
> - *  start                  end
> - *
> - * a <range> with size "len" starting at the return value is inside in the
> - * area defined by [start, end], but is otherwise randomized.
> + * randomize_addr() returns a page aligned address within [start, start +
> + * range]
>   */
>  unsigned long
> -randomize_range(unsigned long start, unsigned long end, unsigned long len)
> +randomize_addr(unsigned long start, unsigned long range)
>  {
> - unsigned long range = end - len - start;
> -
> - if (end <= start + len)
> - return 0;
> - return PAGE_ALIGN(get_random_int() % range + start);
> + return PAGE_ALIGN(get_random_long() % range + start);
>  }

bah!  old patch file.  This should have been:

if (range == 0)
return start;
else
return PAGE_ALIGN(get_random_long() % range + start);

sorry,

Jason.

>  
>  /* Interface for in-kernel drivers of true hardware RNGs.
> diff --git a/include/linux/random.h b/include/linux/random.h
> index e47e533742b5..1ad877a98186 100644
> --- a/include/linux/random.h
> +++ b/include/linux/random.h
> @@ -34,7 +34,7 @@ extern const struct file_operations random_fops, 
> urandom_fops;
>  
>  unsigned int get_random_int(void);
>  unsigned long get_random_long(void);
> -unsigned long randomize_range(unsigned long start, unsigned long end, 
> unsigned long len);
> +unsigned long randomize_addr(unsigned long start, unsigned long range);
>  
>  u32 prandom_u32(void);
>  void prandom_bytes(void *buf, size_t nbytes);
> -- 
> 2.9.2
> 
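
For completeness, the corrected logic from the follow-up above can be
exercised in userspace with stand-ins for get_random_long() and
PAGE_ALIGN(); this is a sketch of the intended behaviour, not the kernel
implementation:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define PAGE_SIZE       4096UL
/* Userspace stand-in for the kernel macro: round up to a page boundary. */
#define PAGE_ALIGN(x)   (((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

/* Stand-in for the kernel's get_random_long(). */
static unsigned long get_random_long(void)
{
        return ((unsigned long)rand() << 31) ^ (unsigned long)rand();
}

/* range == 0 returns start unchanged, otherwise pick a page aligned
 * address within [start, start + range]. */
static unsigned long randomize_addr(unsigned long start, unsigned long range)
{
        if (range == 0)
                return start;
        return PAGE_ALIGN(get_random_long() % range + start);
}

int main(void)
{
        unsigned long brk = 0x2000000UL;

        srand(time(NULL));
        printf("range 0:     %#lx (start returned unchanged)\n",
               randomize_addr(brk, 0));
        printf("range 32MiB: %#lx\n", randomize_addr(brk, 0x02000000UL));
        return 0;
}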



Re: [PATCH] iio: adc: rockchip_saradc: Explicitly disable ADC on probe

2016-07-25 Thread Guenter Roeck

On 07/25/2016 07:51 PM, Caesar Wang wrote:

Hi Guenter,

Thanks for fixing it.

On 2016-07-26 03:39, Guenter Roeck wrote:

If the ADC is read for the first time, the caller gets a timeout error,
and the kernel log shows

read channel() error: -110

The ADC may be enabled on boot, and needs to be explicitly disabled
for a read sequence to work (otherwise there is no completion interrupt).
Disable it explicitly in the probe function.

Fixes: 44d6f2ef94f9 ("iio: adc: add driver for Rockchip saradc")
Signed-off-by: Guenter Roeck 
---
  drivers/iio/adc/rockchip_saradc.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/iio/adc/rockchip_saradc.c 
b/drivers/iio/adc/rockchip_saradc.c
index f9ad6c2d6821..6aa3271d86b5 100644
--- a/drivers/iio/adc/rockchip_saradc.c
+++ b/drivers/iio/adc/rockchip_saradc.c
@@ -280,6 +280,9 @@ static int rockchip_saradc_probe(struct platform_device 
*pdev)
  goto err_pclk;
  }
+/* Make sure ADC is disabled */
+writel_relaxed(0, info->regs + SARADC_CTRL);


I think we should reset the saradc controller.
That makes sure all registers are at their reset values; state carried over 
from the loader into the kernel may even cause harm, as in my experience with 
the tsadc (drivers/thermal/rockchip_thermal.c).


e.g.:
/**
* Reset SARADC Controller, reset all saradc registers.
*/
static void rockchip_saradc_reset_controller(struct reset_control *reset)
{
reset_control_assert(reset);
usleep_range(10, 20);
reset_control_deassert(reset);
}

..probe()
{
...
rockchip_saradc_reset_controller();
...
}



Ok, I'll give it a try.

Guenter



-
Caesar


+
  platform_set_drvdata(pdev, indio_dev);
  indio_dev->name = dev_name(>dev);







Re: [PATCH 04/32] x86/intel_rdt: Add L3 cache capacity bitmask management

2016-07-25 Thread Luck, Tony
You must specify a mask for each L3 cache. So you can achieve your 80/80 split 
either with one rdtgroup that has an 80% mask on each of the sockets and using 
affinity to make one VM run only on CPUs on one socket and the second VM on the 
other. 

Or separate rdtgroups for each VM that give them the 80% when they are on their 
own socket and the spare 20% if they wander off to the other socket.

Sent from my iPhone

> On Jul 25, 2016, at 19:13, Marcelo Tosatti  wrote:
> 
>> On Fri, Jul 22, 2016 at 02:43:23PM -0700, Luck, Tony wrote:
>>> On Fri, Jul 22, 2016 at 04:12:04AM -0300, Marcelo Tosatti wrote:
>>> How does this patchset handle the following condition:
>>> 
>>> 6) Create reservations in such a way that the sum is larger than
>>> total amount of cache, and CPU pinning (example from Karen Noel):
>>> 
>>> VM-1 on socket-1 with 80% of reservation.
>>> VM-2 on socket-2 with 80% of reservation.
>>> VM-1 pinned to socket-1.
>>> VM-2 pinned to socket-2.
>> 
>> That's legal, but perhaps we need a description of
>> overlapping cache reservations.
>> 
>> Hardware tells you how finely you can divide the cache (and this
>> information is shown in /sys/fs/resctrl/info/l3/max_cbm_len to save
>> you from digging in CPUID leaves).  E.g. on Broadwell the value is
>> 20, so you can control cache allocations in 5% slices.
>> 
>> A bitmask defines which slices you can use (and h/w has the restriction
>> that you must have contiguous '1' bits in any mask).  So you can pick
>> your 80% using 0x0ffff, 0x1fffe, 0x3fffc, 0x7fff8 or 0xffff0.
>> 
>> There is no requirement that masks be exclusive of each other. So
>> you might pick the two extremes: 0x0ffff and 0xffff0 for your two
>> VM's in this example. Each would be allowed to allocate up to 80%,
>> but with a big overlap in the middle. Each has 20% exclusive, but
>> there is a 60% range in the middle that they would compete for.
> 
> These are different sockets, so there is no competing/sharing of L3 cache
> here: the question is about whether the interface allows the
> user to specify that 80/80 reservation without complaining:
> because the VM's are pinned, they will never actually
> share the same L3 cache.
> 
> (haven't finished reading the patchset to be certain).
> 
>> Is this specific case useful? Possibly not.  I think the more common
>> overlap cases might be between processes that you know have shared
>> code/data. Also the case where some rdtgroup has access to allocate
>> in the entire cache (mask 0xfffff on Broadwell) and some other
>> rdtgroups
>> have limited cache allocation with less bits in the mask.
>> 
>> -Tony
> 
> All you have to do is to build the bitmask for a given processor
> from the union of the tasks which have been scheduled on that
> processor.
> 
> 
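
As a quick check of the arithmetic above, a few lines of C enumerate every
contiguous mask that covers 80% of a cache with max_cbm_len = 20 (the
Broadwell value quoted above; just an illustration, not resctrl code):

#include <stdio.h>

int main(void)
{
        const int cbm_len = 20;              /* /sys/fs/resctrl/info/l3/max_cbm_len */
        const int bits = cbm_len * 80 / 100; /* 16 contiguous bits == 80% */
        unsigned int mask = (1u << bits) - 1;

        for (int shift = 0; shift + bits <= cbm_len; shift++)
                printf("%#07x\n", mask << shift);
        return 0;
}

This prints the five possible placements, 0x0ffff through 0xffff0; the two
extremes still share 12 of the 20 slices, which is where the "60% range in
the middle" above comes from.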



RE: [PATCH v18 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-25 Thread Dexuan Cui
> From: David Miller [mailto:da...@davemloft.net]
> 
> From: Dexuan Cui 
> Date: Sat, 23 Jul 2016 01:35:51 +
> 
> > +static struct sock *hvsock_create(struct net *net, struct socket *sock,
> > + gfp_t priority, unsigned short type)
> > +{
> > +   struct hvsock_sock *hvsk;
> > +   struct sock *sk;
> > +
> > +   sk = sk_alloc(net, AF_HYPERV, priority, _proto, 0);
> > +   if (!sk)
> > +   return NULL;
>  ...
> > +   /* Looks stream-based socket doesn't need this. */
> > +   sk->sk_backlog_rcv = NULL;
> > +
> > +   sk->sk_state = 0;
> > +   sock_reset_flag(sk, SOCK_DONE);
> 
> All of these are unnecessary initializations, since sk_alloc() zeroes
> out the 'sk' object for you.

Hi David,
Thanks for the comment!  I'll remove the 3 lines.

May I know if you have more comments?

BTW, during the past month, at least 7 other people also reviewed
the patch and gave me quite a few good comments, which have
been addressed. Though only one of them gave the Reviewed-by
line for now, I guess I would get more if I ping them to have a look
at the latest version of the patch, i.e., v19 -- I'm going to post it
with the aforementioned 3 lines removed and if you've more 
comments, I'm ready to address them too. :-)

Thanks,
-- Dexuan



Re: [PATCH -next] drm/hisilicon: Fix error handling of ade_power_up()

2016-07-25 Thread Xinliang Liu
On 19 July 2016 at 19:30, Wei Yongjun  wrote:
> From: Wei Yongjun 
>
> Fix the reset_control_deassert() fail and clk_prepare_enable() fail
> error handling of ade_power_up().
>
> Signed-off-by: Wei Yongjun 

Applied, thanks.

-xinliang

> ---
>  drivers/gpu/drm/hisilicon/kirin/kirin_drm_ade.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/hisilicon/kirin/kirin_drm_ade.c 
> b/drivers/gpu/drm/hisilicon/kirin/kirin_drm_ade.c
> index c3707d4..e2bd1e6 100644
> --- a/drivers/gpu/drm/hisilicon/kirin/kirin_drm_ade.c
> +++ b/drivers/gpu/drm/hisilicon/kirin/kirin_drm_ade.c
> @@ -258,18 +258,24 @@ static int ade_power_up(struct ade_hw_ctx *ctx)
> ret = reset_control_deassert(ctx->reset);
> if (ret) {
> DRM_ERROR("failed to deassert reset\n");
> -   return ret;
> +   goto err_reset;
> }
>
> ret = clk_prepare_enable(ctx->ade_core_clk);
> if (ret) {
> DRM_ERROR("failed to enable ade_core_clk (%d)\n", ret);
> -   return ret;
> +   goto err_prepare_enable;
> }
>
> ade_init(ctx);
> ctx->power_on = true;
> return 0;
> +
> +err_prepare_enable:
> +   reset_control_assert(ctx->reset);
> +err_reset:
> +   clk_disable_unprepare(ctx->media_noc_clk);
> +   return ret;
>  }
>
>  static void ade_power_down(struct ade_hw_ctx *ctx)
>
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel



[RFC patch 6/6] unicore32: Use simpler API for random address requests

2016-07-25 Thread Jason Cooper
Currently, all callers to randomize_range() set the length to 0 and
calculate end by adding a constant to the start address.  We can
simplify the API to remove a bunch of needless checks and variables.

Use the new randomize_addr(start, range) call to set the requested
address.

Signed-off-by: Jason Cooper 
---
 arch/unicore32/kernel/process.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/unicore32/kernel/process.c b/arch/unicore32/kernel/process.c
index 00299c927852..b856178cf167 100644
--- a/arch/unicore32/kernel/process.c
+++ b/arch/unicore32/kernel/process.c
@@ -295,8 +295,7 @@ unsigned long get_wchan(struct task_struct *p)
 
 unsigned long arch_randomize_brk(struct mm_struct *mm)
 {
-   unsigned long range_end = mm->brk + 0x0200;
-   return randomize_range(mm->brk, range_end, 0) ? : mm->brk;
+   return randomize_addr(mm->brk, 0x0200);
 }
 
 /*
-- 
2.9.2




[RFC patch 2/6] x86: Use simpler API for random address requests

2016-07-25 Thread Jason Cooper
Currently, all callers to randomize_range() set the length to 0 and
calculate end by adding a constant to the start address.  We can
simplify the API to remove a bunch of needless checks and variables.

Use the new randomize_addr(start, range) call to set the requested
address.

Signed-off-by: Jason Cooper 
---
 arch/x86/kernel/process.c| 3 +--
 arch/x86/kernel/sys_x86_64.c | 5 +
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 96becbbb52e0..a083a2c0744e 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -507,8 +507,7 @@ unsigned long arch_align_stack(unsigned long sp)
 
 unsigned long arch_randomize_brk(struct mm_struct *mm)
 {
-   unsigned long range_end = mm->brk + 0x0200;
-   return randomize_range(mm->brk, range_end, 0) ? : mm->brk;
+   return randomize_addr(mm->brk, 0x0200);
 }
 
 /*
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index 10e0272d789a..f9cad22808fc 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -101,7 +101,6 @@ static void find_start_end(unsigned long flags, unsigned 
long *begin,
   unsigned long *end)
 {
if (!test_thread_flag(TIF_ADDR32) && (flags & MAP_32BIT)) {
-   unsigned long new_begin;
/* This is usually used needed to map code in small
   model, so it needs to be in the first 31bit. Limit
   it to that.  This means we need to move the
@@ -112,9 +111,7 @@ static void find_start_end(unsigned long flags, unsigned 
long *begin,
*begin = 0x4000;
*end = 0x8000;
if (current->flags & PF_RANDOMIZE) {
-   new_begin = randomize_range(*begin, *begin + 
0x0200, 0);
-   if (new_begin)
-   *begin = new_begin;
+   *begin = randomize_addr(*begin, 0x0200);
}
} else {
*begin = current->mm->mmap_legacy_base;
-- 
2.9.2



[RFC patch 4/6] arm64: Use simpler API for random address requests

2016-07-25 Thread Jason Cooper
Currently, all callers to randomize_range() set the length to 0 and
calculate end by adding a constant to the start address.  We can
simplify the API to remove a bunch of needless checks and variables.

Use the new randomize_addr(start, range) call to set the requested
address.

Signed-off-by: Jason Cooper 
---
 arch/arm64/kernel/process.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 6cd2612236dc..11bf454baf86 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -374,12 +374,8 @@ unsigned long arch_align_stack(unsigned long sp)
 
 unsigned long arch_randomize_brk(struct mm_struct *mm)
 {
-   unsigned long range_end = mm->brk;
-
if (is_compat_task())
-   range_end += 0x0200;
+   return randomize_addr(mm->brk, 0x0200);
else
-   range_end += 0x4000;
-
-   return randomize_range(mm->brk, range_end, 0) ? : mm->brk;
+   return randomize_addr(mm->brk, 0x4000);
 }
-- 
2.9.2





[RFC patch 3/6] ARM: Use simpler API for random address requests

2016-07-25 Thread Jason Cooper
Currently, all callers to randomize_range() set the length to 0 and
calculate end by adding a constant to the start address.  We can
simplify the API to remove a bunch of needless checks and variables.

Use the new randomize_addr(start, range) call to set the requested
address.

Signed-off-by: Jason Cooper 
---
 arch/arm/kernel/process.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 4a803c5a1ff7..02dee671cded 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -314,8 +314,7 @@ unsigned long get_wchan(struct task_struct *p)
 
 unsigned long arch_randomize_brk(struct mm_struct *mm)
 {
-   unsigned long range_end = mm->brk + 0x0200;
-   return randomize_range(mm->brk, range_end, 0) ? : mm->brk;
+   return randomize_addr(mm->brk, 0x0200);
 }
 
 #ifdef CONFIG_MMU
-- 
2.9.2



[RFC patch 5/6] tile: Use simpler API for random address requests

2016-07-25 Thread Jason Cooper
Currently, all callers to randomize_range() set the length to 0 and
calculate end by adding a constant to the start address.  We can
simplify the API to remove a bunch of needless checks and variables.

Use the new randomize_addr(start, range) call to set the requested
address.

Signed-off-by: Jason Cooper 
---
 arch/tile/mm/mmap.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/tile/mm/mmap.c b/arch/tile/mm/mmap.c
index 851a94e6ae58..50f6a693a2b6 100644
--- a/arch/tile/mm/mmap.c
+++ b/arch/tile/mm/mmap.c
@@ -88,6 +88,5 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
 
 unsigned long arch_randomize_brk(struct mm_struct *mm)
 {
-   unsigned long range_end = mm->brk + 0x0200;
-   return randomize_range(mm->brk, range_end, 0) ? : mm->brk;
+   return randomize_addr(mm->brk, 0x0200);
 }
-- 
2.9.2







[RFC patch 1/6] random: Simplify API for random address requests

2016-07-25 Thread Jason Cooper
To date, all callers of randomize_range() have set the length to 0, and
check for a zero return value.  For the current callers, the only way
to get zero returned is if end <= start.  Since they are all adding a
constant to the start address, this is unnecessary.

We can remove a bunch of needless checks by simplifying the API to do
just what everyone wants, return an address between [start, start +
range].

While we're here, s/get_random_int/get_random_long/.  No current call
site is adversely affected by get_random_int(), since all current range
requests are < MAX_UINT.  However, we should match caller expectations
to avoid coming up short (ha!) in the future.

Signed-off-by: Jason Cooper 
---
 drivers/char/random.c  | 17 -
 include/linux/random.h |  2 +-
 2 files changed, 5 insertions(+), 14 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 0158d3bff7e5..1251cb2cbab2 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1822,22 +1822,13 @@ unsigned long get_random_long(void)
 EXPORT_SYMBOL(get_random_long);
 
 /*
- * randomize_range() returns a start address such that
- *
- *    [...... <range> .....]
- *  start                  end
- *
- * a <range> with size "len" starting at the return value is inside in the
- * area defined by [start, end], but is otherwise randomized.
+ * randomize_addr() returns a page aligned address within [start, start +
+ * range]
  */
 unsigned long
-randomize_range(unsigned long start, unsigned long end, unsigned long len)
+randomize_addr(unsigned long start, unsigned long range)
 {
-   unsigned long range = end - len - start;
-
-   if (end <= start + len)
-   return 0;
-   return PAGE_ALIGN(get_random_int() % range + start);
+   return PAGE_ALIGN(get_random_long() % range + start);
 }
 
 /* Interface for in-kernel drivers of true hardware RNGs.
diff --git a/include/linux/random.h b/include/linux/random.h
index e47e533742b5..1ad877a98186 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -34,7 +34,7 @@ extern const struct file_operations random_fops, urandom_fops;
 
 unsigned int get_random_int(void);
 unsigned long get_random_long(void);
-unsigned long randomize_range(unsigned long start, unsigned long end, unsigned 
long len);
+unsigned long randomize_addr(unsigned long start, unsigned long range);
 
 u32 prandom_u32(void);
 void prandom_bytes(void *buf, size_t nbytes);
-- 
2.9.2


