date:20151214

Re: [PATCH] i2c: allow building emev2 without slave mode again

2015-12-14 Thread Arnd Bergmann

On Monday 14 December 2015 14:52:06 Wolfram Sang wrote:
> > > What about not ifdeffing the inline function and keep the build error
> > > whenever someone uses it without I2C_SLAVE being selected?
> > 
> > The inline function is only added there for the case that I2C_SLAVE is
> > disabled, so that would be pointless.
> > 
> > However, what we could do is move the extern declaration outside of
> > the #ifdef to make it always visible. The if(IS_ENABLED(CONFIG_I2C_SLAVE))
> > check should then ensure that it never actually gets called, and we
> > get a link error if some driver gets it wrong.
> 
> Yes, that's what I meant: move the whole function (as it was before your
> patch) out of the CONFIG_I2C_SLAVE block. We should get a compiler error
> even, because for !I2C_SLAVE, the client struct will not have the
> slave_cb member.
> 

But we don't want a compile error in randconfig builds, and we don't
want an unnecessary #ifdef in the driver.

This change on top of my earlier patch should do what I meant:

diff --git a/include/linux/i2c.h b/include/linux/i2c.h
index 0236e5f2b5be..536641bad92d 100644
--- a/include/linux/i2c.h
+++ b/include/linux/i2c.h
@@ -265,15 +265,15 @@ enum i2c_slave_event {
 extern int i2c_slave_register(struct i2c_client *client, i2c_slave_cb_t slave_cb);
 extern int i2c_slave_unregister(struct i2c_client *client);
 
+#if IS_ENABLED(CONFIG_I2C_SLAVE)
 static inline int i2c_slave_event(struct i2c_client *client,
  enum i2c_slave_event event, u8 *val)
 {
-#if IS_ENABLED(CONFIG_I2C_SLAVE)
return client->slave_cb(client, event, val);
+}
 #else
-   return 0;
+extern int i2c_slave_event(struct i2c_client *client, enum i2c_slave_event event, u8 *val);
 #endif
-}
 
 /**
  * struct i2c_board_info - template for device creation
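
The header pattern proposed here can be modelled outside the kernel. This is a minimal userspace sketch, assuming a hypothetical `CONFIG_DEMO_SLAVE` macro set to 0 or 1 in place of `IS_ENABLED(CONFIG_I2C_SLAVE)` and hypothetical `demo_*` names in place of the i2c API. With the option on, callers get the inline forwarding helper. With it off, they see only an extern declaration that is never defined, so a call the compiler cannot prove dead becomes a link error instead of silently returning 0.

```c
/* Hypothetical stand-in for IS_ENABLED(CONFIG_I2C_SLAVE): 1 = enabled. */
#define CONFIG_DEMO_SLAVE 1

struct demo_client {
	/* Kept unconditional here for simplicity; in the real header the
	 * slave_cb member only exists when the option is enabled. */
	int (*slave_cb)(struct demo_client *client, int event, unsigned char *val);
};

#if CONFIG_DEMO_SLAVE
static inline int demo_slave_event(struct demo_client *client, int event,
				   unsigned char *val)
{
	return client->slave_cb(client, event, val);
}
#else
/* Declared but deliberately never defined: a surviving call fails to link. */
extern int demo_slave_event(struct demo_client *client, int event,
			    unsigned char *val);
#endif

/* Driver-side caller with no #ifdef: the condition is a compile-time
 * constant, so with optimization enabled the call is discarded when the
 * option is off and no undefined reference remains. */
static int demo_handle_irq(struct demo_client *client, unsigned char *val)
{
	if (CONFIG_DEMO_SLAVE)
		return demo_slave_event(client, 0, val);
	return -1;
}

/* Example slave callback used to exercise the enabled path. */
static int demo_echo_cb(struct demo_client *client, int event, unsigned char *val)
{
	(void)client;
	*val = (unsigned char)(event + 1);
	return 0;
}
```

Setting `CONFIG_DEMO_SLAVE` to 0 and building with `-O2` should still link, because the only call sits behind a constant-false condition; calling `demo_slave_event()` unconditionally would instead fail with an undefined reference.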



Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH RESEND 27/27] tools: hv: vss: fix the write()'s argument: error -> vss_msg

2015-12-14 Thread K. Y. Srinivasan

From: Dexuan Cui 

Fix the write()'s argument in the daemon code.

Cc: Vitaly Kuznetsov 
Cc: "K. Y. Srinivasan" 
Signed-off-by: Dexuan Cui 
Cc: sta...@vger.kernel.org
Signed-off-by: K. Y. Srinivasan 
---
 tools/hv/hv_vss_daemon.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tools/hv/hv_vss_daemon.c b/tools/hv/hv_vss_daemon.c
index 96234b6..5d51d6f 100644
--- a/tools/hv/hv_vss_daemon.c
+++ b/tools/hv/hv_vss_daemon.c
@@ -254,7 +254,7 @@ int main(int argc, char *argv[])
syslog(LOG_ERR, "Illegal op:%d\n", op);
}
vss_msg->error = error;
-   len = write(vss_fd, , sizeof(struct hv_vss_msg));
+   len = write(vss_fd, vss_msg, sizeof(struct hv_vss_msg));
if (len != sizeof(struct hv_vss_msg)) {
syslog(LOG_ERR, "write failed; error: %d %s", errno,
   strerror(errno));
-- 
1.7.4.1
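
Per the subject line, the old call wrote from `error` instead of from `vss_msg`, so the `sizeof(struct hv_vss_msg)` bytes sent were not the message the daemon had just filled in. A minimal userspace sketch of the corrected pattern, assuming a simplified, hypothetical `struct demo_vss_msg` rather than the real `struct hv_vss_msg` layout:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical, simplified stand-in for struct hv_vss_msg. */
struct demo_vss_msg {
	unsigned int op;
	int error;
	unsigned char payload[64];
};

/*
 * Send the reply: hand write() the message pointer itself and treat
 * anything short of a full-size write as a failure, mirroring the
 * daemon's length check.
 */
static int demo_send_reply(int fd, struct demo_vss_msg *msg, int error)
{
	ssize_t len;

	msg->error = error;
	len = write(fd, msg, sizeof(*msg));
	if (len != (ssize_t)sizeof(*msg)) {
		fprintf(stderr, "write failed; error: %d %s\n", errno, strerror(errno));
		return -1;
	}
	return 0;
}
```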


Re: [POWERPC] bootwrapper: One check less in fsl_get_immr() after error detection

2015-12-14 Thread Scott Wood

On Mon, 2015-12-14 at 23:10 +0100, SF Markus Elfring wrote:
> From: Markus Elfring 
> Date: Mon, 14 Dec 2015 23:01:32 +0100
> 
> A status check was performed by the fsl_get_immr() function even if it
> was already known that a system setting did not meet expectations.
> 
> This implementation detail could be improved by adjusting a jump
> label according to the Linux coding style convention.

What is the actual problem you're trying to solve?  Cluttering the code to
micro-optimize an error path is not an improvement.

-Scott


[PATCH RESEND 11/27] Drivers: hv: vss: run only on supported host versions

2015-12-14 Thread K. Y. Srinivasan

From: Olaf Hering 

The Backup integration service on WS2012 apparently has trouble
negotiating with a guest that does not support the provided util version.
Currently the VSS driver supports only version 5/0. A WS2012 host offers
only versions 1/x and 3/x, and vmbus_prep_negotiate_resp correctly returns
an empty icframe_vercnt/icmsg_vercnt. But the host ignores that and
continues to send ICMSGTYPE_NEGOTIATE messages. The result is weird
errors during boot and general misbehaviour.

Work around the host bug by checking the Windows version and skipping
hv_vss_init on WS2012 and older.

Signed-off-by: Olaf Hering 
Signed-off-by: K. Y. Srinivasan 
---
 drivers/hv/hv_snapshot.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/hv/hv_snapshot.c b/drivers/hv/hv_snapshot.c
index a548ae4..81882d4 100644
--- a/drivers/hv/hv_snapshot.c
+++ b/drivers/hv/hv_snapshot.c
@@ -331,6 +331,11 @@ static void vss_on_reset(void)
 int
 hv_vss_init(struct hv_util_service *srv)
 {
+   if (vmbus_proto_version < VERSION_WIN8_1) {
+   pr_warn("Integration service 'Backup (volume snapshot)'"
+   " not supported on this host version.\n");
+   return -ENOTSUPP;
+   }
recv_buffer = srv->recv_buffer;
 
/*
-- 
1.7.4.1
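
The gate added to hv_vss_init() is a plain ordered comparison on the negotiated vmbus protocol version. A userspace sketch follows. It assumes the version constants pack the major number in the high 16 bits and the minor in the low 16; the `DEMO_*` values are illustrative stand-ins for VERSION_WIN8 and VERSION_WIN8_1. It also uses userspace ENOTSUP, since ENOTSUPP is kernel-internal.

```c
#include <errno.h>
#include <stdio.h>

/* Assumed packing: major version in the high 16 bits, minor in the low 16. */
#define DEMO_VERSION(major, minor) (((unsigned int)(major) << 16) | (unsigned int)(minor))
#define DEMO_VERSION_WIN8    DEMO_VERSION(2, 4)  /* illustrative value */
#define DEMO_VERSION_WIN8_1  DEMO_VERSION(3, 0)  /* illustrative value */

/* Hypothetical stand-in for hv_vss_init(): refuse to start the service when
 * the negotiated protocol predates the WIN8_1 level (WS2012 and older hosts,
 * per the patch description). */
static int demo_vss_init(unsigned int proto_version)
{
	if (proto_version < DEMO_VERSION_WIN8_1) {
		fprintf(stderr, "Integration service 'Backup (volume snapshot)'"
				" not supported on this host version.\n");
		return -ENOTSUP;
	}
	return 0;
}
```

Packing major over minor keeps the versions ordered, so a single `<` comparison covers every older host.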


[PATCH RESEND 14/27] Drivers: hv: vmbus: Get rid of the unused macro

2015-12-14 Thread K. Y. Srinivasan

The macro VMBUS_DEVICE() is unused; get rid of it.

Signed-off-by: K. Y. Srinivasan 
---
 include/linux/hyperv.h |   13 -
 1 files changed, 0 insertions(+), 13 deletions(-)

diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index b9f3bb2..f773a68 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -986,19 +986,6 @@ int vmbus_allocate_mmio(struct resource **new, struct hv_device *device_obj,
 int vmbus_cpu_number_to_vp_number(int cpu_number);
 u64 hv_do_hypercall(u64 control, void *input, void *output);
 
-/**
- * VMBUS_DEVICE - macro used to describe a specific hyperv vmbus device
- *
- * This macro is used to create a struct hv_vmbus_device_id that matches a
- * specific device.
- */
-#define VMBUS_DEVICE(g0, g1, g2, g3, g4, g5, g6, g7,   \
-g8, g9, ga, gb, gc, gd, ge, gf)\
-   .guid = { g0, g1, g2, g3, g4, g5, g6, g7,   \
- g8, g9, ga, gb, gc, gd, ge, gf },
-
-
-
 /*
  * GUID definitions of various offer types - services offered to the guest.
  */
-- 
1.7.4.1


[PATCH RESEND 25/27] Drivers: hv: vmbus: Force all channel messages to be delivered on CPU 0

2015-12-14 Thread K. Y. Srinivasan

Force all channel messages to be delivered on CPU0. These messages are not
performance critical and are used during the setup and teardown of the
channel.

Signed-off-by: K. Y. Srinivasan 
---
 drivers/hv/connection.c |   11 +++
 1 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 521f48e..3dc5a9c 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -83,10 +83,13 @@ static int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo,
msg->interrupt_page = virt_to_phys(vmbus_connection.int_page);
msg->monitor_page1 = virt_to_phys(vmbus_connection.monitor_pages[0]);
msg->monitor_page2 = virt_to_phys(vmbus_connection.monitor_pages[1]);
-   if (version >= VERSION_WIN8_1) {
-   msg->target_vcpu = hv_context.vp_index[get_cpu()];
-   put_cpu();
-   }
+   /*
+* We want all channel messages to be delivered on CPU 0.
+* This has been the behavior pre-win8. This is not
+* perf issue and having all channel messages delivered on CPU 0
+* would be ok.
+*/
+   msg->target_vcpu = 0;
 
/*
 * Add to list before we send the request since we may
-- 
1.7.4.1


Re: [PATCH] kobject: Ensure child's resources get released before parent's resources

2015-12-14 Thread Rajat Jain

On Mon, Dec 14, 2015 at 1:40 PM, Greg Kroah-Hartman
 wrote:
> On Mon, Dec 14, 2015 at 11:02:46AM -0800, Rajat Jain wrote:
>> If the only remaining reference to a parent is the one taken by
>> the child (in kobject_add_internal()), then when the last
>> reference to the child goes away, both the child and its parent
>> are released. However, currently the parent's resources
>> get released first, followed by the child's resources:
>>
>> kobject_cleanup(child)
>> 
>> kobject_del(child)
>> 
>> kobject_put(child->parent) -> results in parent's release()
>> ...
>> child->kobj_type->release() -> Child's release()
>>
>> This is a problem because the child's release() method may still
>> need to use parent resources or memory for its own cleanup. E.g.
>> child may need parent pointer for dma_free_coherent() etc.
>>
>> Signed-off-by: Rajat Jain 
>> Signed-off-by: Rajat Jain 
>
> Why are you listed twice here?

Ah, sorry, I'll remove that.

>
> Where in the kernel is the parent being freed before the child that is
> causing this issue to happen?  We should fix that root cause first...

Umm, are you saying that it is a bug to reach a scenario where all
references to a parent, except the ones made by the child, are gone?

Sorry, I should have given more context here. Here is the scenario
where I came across this situation, and I'd appreciate any suggestions
on how to better deal with this situation:

I have 2 modules (random names here):

user_interface.ko <--- pci_driver.ko

1) user_interface.ko
 - exports some interfaces (char driver etc) to the userspace,
 - allows low-level device drivers to register devices via some
API (user_interface_add() / user_interface_del())
 - Userspace can issue some transactions. Each transaction results
in a child kobject being attached to the device's kobject.
 - Low level drivers also provide a release() function that can
get called AFTER user_interface_del() if there are transactions
in-flight.
 - Low level drivers should allow operation of the device until
release() gets called.

2) Low level drivers such as pci_driver.ko:
- attach to the actual physical devices (PCI device in this case)
- create a custom device (that has an embedded "struct device")
and register this new custom device with the user_interface.ko.
- also attaches a release() function to the device. This release()
would get called when all references to the device are dropped.
- The entities holding the reference to the device are:
  * 1 reference by the pci_driver.ko itself (when it did
device_initialize())
  * 1 reference by the user_interface.ko (During user_interface_add())
  * 1 reference for each transaction in-flight (a child
kobject under the device)

3) Now, we want to allow removing (rmmod) the low level driver pci_driver.ko.
- Before returning from the PCI remove method, we need to ensure that
release() has been called.
- So we call user_interface_del(dev), which drops the reference that
user_interface.ko was holding.
- pci_driver.ko gives up its own reference so that the release()
method can get called.
- At this point, the device is just waiting for in-flight
transactions to complete, i.e. only the child kobjects hold
references.

When the last transaction completes, I end up in the situation
described in the patch commit log. I'd be very glad of any
suggestions on how to achieve this, or pointers to anything I am
missing.

On a side note, my (admittedly limited) understanding of the device
model was that it does (or maybe should) guarantee that all children
are freed before the parent is freed. Is that not the case?

Thanks,

Rajat

>
> thanks,
>
> greg k-h
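
The ordering the patch asks for can be modelled with a toy refcount. In this sketch (hypothetical `demo_*` names, not the kobject API), the last put on an object saves its parent pointer, runs the object's own release() first, and only then drops the reference the object held on its parent. A child release() that still dereferences its parent, as in the dma_free_coherent() example, therefore sees a live parent.

```c
#include <assert.h>
#include <stddef.h>

/* Toy refcounted object; hypothetical names, not the kobject API. */
struct demo_obj {
	const char *name;
	int refcount;
	struct demo_obj *parent;
	void (*release)(struct demo_obj *obj);
};

/* Order in which objects were released, for inspection. */
static const char *demo_release_log[8];
static int demo_release_count;

static void demo_log_release(struct demo_obj *obj)
{
	assert(demo_release_count < 8);
	demo_release_log[demo_release_count++] = obj->name;
}

/* A child release() that still needs its parent, e.g. for DMA cleanup. */
static void demo_child_release(struct demo_obj *obj)
{
	assert(obj->parent && obj->parent->refcount > 0); /* parent still alive */
	demo_log_release(obj);
}

/* The child pins its parent, as kobject_add_internal() does per the patch text. */
static void demo_add_child(struct demo_obj *child, struct demo_obj *parent)
{
	child->parent = parent;
	parent->refcount++;
}

/* On the last put: remember the parent, run this object's release() first,
 * and only then drop the reference it held on the parent. */
static void demo_put(struct demo_obj *obj)
{
	struct demo_obj *parent;

	if (--obj->refcount > 0)
		return;
	parent = obj->parent;
	obj->release(obj);
	if (parent)
		demo_put(parent);
}
```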

[PATCH RESEND 18/27] Drivers: hv: vmbus: fix rescind-offer handling for device without a driver

2015-12-14 Thread K. Y. Srinivasan

From: Dexuan Cui 

In the path vmbus_onoffer_rescind() -> vmbus_device_unregister() ->
device_unregister() -> ... -> __device_release_driver(), we can see that for
a device without a driver loaded, dev->driver is NULL, so
dev->bus->remove(dev), namely vmbus_remove(), isn't invoked.

As a result, vmbus_remove() -> hv_process_channel_removal() isn't invoked
and some cleanups (like sending a CHANNELMSG_RELID_RELEASED message to the
host) aren't done.

We can demonstrate the issue this way:
1. rmmod hv_utils;
2. disable the Heartbeat Integration Service in Hyper-V Manager; lsvmbus
shows that the device disappears.
3. re-enable the Heartbeat service in Hyper-V Manager and modprobe hv_utils;
lsvmbus shows that the device does not appear again.
This is because the host thinks the VM hasn't released the relid, so it
can't re-offer the device to the VM.

We can fix the issue by moving hv_process_channel_removal()
from vmbus_close_internal() to vmbus_device_release(), since the latter is
always invoked on device_unregister(), whether or not the dev has a driver
loaded.

Signed-off-by: Dexuan Cui 
Signed-off-by: K. Y. Srinivasan 
---
 drivers/hv/channel.c  |6 --
 drivers/hv/channel_mgmt.c |6 +++---
 drivers/hv/vmbus_drv.c|   15 +++
 3 files changed, 6 insertions(+), 21 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 00e1be7..77d2579 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -575,12 +575,6 @@ static int vmbus_close_internal(struct vmbus_channel *channel)
free_pages((unsigned long)channel->ringbuffer_pages,
get_order(channel->ringbuffer_pagecount * PAGE_SIZE));
 
-   /*
-* If the channel has been rescinded; process device removal.
-*/
-   if (channel->rescind)
-   hv_process_channel_removal(channel,
-  channel->offermsg.child_relid);
 out:
tasklet_enable(tasklet);
 
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index dc4fb0b..7903acc 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -191,6 +191,8 @@ void hv_process_channel_removal(struct vmbus_channel *channel, u32 relid)
if (channel == NULL)
return;
 
+   BUG_ON(!channel->rescind);
+
if (channel->target_cpu != get_cpu()) {
put_cpu();
smp_call_function_single(channel->target_cpu,
@@ -230,9 +232,7 @@ void vmbus_free_channels(void)
 
list_for_each_entry_safe(channel, tmp, _connection.chn_list,
listentry) {
-   /* if we don't set rescind to true, vmbus_close_internal()
-* won't invoke hv_process_channel_removal().
-*/
+   /* hv_process_channel_removal() needs this */
channel->rescind = true;
 
vmbus_device_unregister(channel->device_obj);
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index ab888a1..f123bca 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -601,23 +601,11 @@ static int vmbus_remove(struct device *child_device)
 {
struct hv_driver *drv;
struct hv_device *dev = device_to_hv_device(child_device);
-   u32 relid = dev->channel->offermsg.child_relid;
 
if (child_device->driver) {
drv = drv_to_hv_drv(child_device->driver);
if (drv->remove)
drv->remove(dev);
-   else {
-   hv_process_channel_removal(dev->channel, relid);
-   pr_err("remove not set for driver %s\n",
-   dev_name(child_device));
-   }
-   } else {
-   /*
-* We don't have a driver for this device; deal with the
-* rescind message by removing the channel.
-*/
-   hv_process_channel_removal(dev->channel, relid);
}
 
return 0;
@@ -652,7 +640,10 @@ static void vmbus_shutdown(struct device *child_device)
 static void vmbus_device_release(struct device *device)
 {
struct hv_device *hv_dev = device_to_hv_device(device);
+   struct vmbus_channel *channel = hv_dev->channel;
 
+   hv_process_channel_removal(channel,
+  channel->offermsg.child_relid);
kfree(hv_dev);
 
 }
-- 
1.7.4.1
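
The fix turns on one difference between the two callbacks described above: the bus remove() callback only runs when a driver is bound, while release() runs on device_unregister() regardless. A toy model with hypothetical `demo_*` names (not the driver-core API) makes the consequence concrete. Cleanup placed in remove() is skipped for a driverless device; cleanup placed in release() is not.

```c
#include <stdbool.h>

/* Toy device that records which teardown steps ran. */
struct demo_dev {
	bool has_driver;      /* is a driver bound? */
	bool remove_called;   /* bus remove() ran */
	bool relid_released;  /* cleanup that tells the host the relid is free */
};

/* Old placement: the cleanup lived in the bus remove() path. */
static void demo_bus_remove_old(struct demo_dev *d)
{
	d->remove_called = true;
	d->relid_released = true;
}

/* New placement: remove() only unbinds; the cleanup moves to release(). */
static void demo_bus_remove_new(struct demo_dev *d)
{
	d->remove_called = true;
}

static void demo_release_new(struct demo_dev *d)
{
	d->relid_released = true;
}

/* remove() only if a driver is bound; release() (if any) unconditionally. */
static void demo_unregister(struct demo_dev *d,
			    void (*remove)(struct demo_dev *),
			    void (*release)(struct demo_dev *))
{
	if (d->has_driver)
		remove(d);
	if (release)
		release(d);
}
```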


Re: [RFC PATCH 3/3] x86: Create dma_mark_dirty to dirty pages used for DMA by VM guest

2015-12-14 Thread Alexander Duyck

On Mon, Dec 14, 2015 at 12:52 PM, Michael S. Tsirkin  wrote:
> On Mon, Dec 14, 2015 at 09:59:13AM -0800, Alexander Duyck wrote:
>> On Mon, Dec 14, 2015 at 9:20 AM, Michael S. Tsirkin  wrote:
>> > On Mon, Dec 14, 2015 at 08:34:00AM -0800, Alexander Duyck wrote:
>> >> > This way distro can use a guest agent to disable
>> >> > dirtying until before migration starts.
>> >>
>> >> Right.  For a v2 version I would definitely want to have some way to
>> >> limit the scope of this.  My main reason for putting this out here is
>> >> to start altering the course of discussions since it seems like we
>> >> weren't getting anywhere with the ixgbevf migration changes that were
>> >> being proposed.
>> >
>> > Absolutely, thanks for working on this.
>> >
>> >> >> + unsigned long pg_addr, start;
>> >> >> +
>> >> >> + start = (unsigned long)addr;
>> >> >> + pg_addr = PAGE_ALIGN(start + size);
>> >> >> + start &= ~(sizeof(atomic_t) - 1);
>> >> >> +
>> >> >> + /* trigger a write fault on each page, excluding first page */
>> >> >> + while ((pg_addr -= PAGE_SIZE) > start)
>> >> >> + atomic_add(0, (atomic_t *)pg_addr);
>> >> >> +
>> >> >> + /* trigger a write fault on first word of DMA */
>> >> >> + atomic_add(0, (atomic_t *)start);
>
> Actually, I have second thoughts about using atomic_add here,
> especially for _sync.
>
> Many architectures do
>
> #define ATOMIC_OP_RETURN(op, c_op)  \
> static inline int atomic_##op##_return(int i, atomic_t *v)  \
> {   \
> unsigned long flags;\
> int ret;\
> \
> raw_local_irq_save(flags);  \
> ret = (v->counter = v->counter c_op i); \
> raw_local_irq_restore(flags);   \
> \
> return ret; \
> }
>
> and this is not safe if device is still doing DMA to/from
> this memory.
>
> Generally, atomic_t is there for SMP effects, not for sync
> with devices.
>
> This is why I said you should do
> cmpxchg(pg_addr, 0xdead, 0xdead);
>
> Yes, we probably never actually want to run m68k within a VM,
> but let's not misuse interfaces like this.

Right now this implementation is for x86 only.  Any other architecture
currently defines dma_mark_dirty as an empty inline function.  The
reason why I chose atomic_add for x86 is simply that it is
guaranteed to dirty the cache line with relatively few instructions and
operands, as all I need is the pointer and 0.

For the m68k we could implement it as a cmpxchg instead.  The general
thought here is that each architecture is probably going to have to do
it a little bit differently.

>> >> >
>> >> > start might not be aligned correctly for a cast to atomic_t.
>> >> > It's harmless to do this for any memory, so I think you should
>> >> > just do this for 1st byte of all pages including the first one.
>> >>
>> >> You may not have noticed it but I actually aligned start in the line
>> >> after pg_addr.
>> >
>> > Yes you did. alignof would make it a bit more noticeable.
>> >
>> >>  However instead of aligning to the start of the next
>> >> atomic_t I just masked off the lower bits so that we start at the
>> >> DWORD that contains the first byte of the starting address.  The
>> >> assumption here is that I cannot trigger any sort of fault since if I
>> >> have access to a given byte within a DWORD I will have access to the
>> >> entire DWORD.
>> >
>> > I'm curious where does this come from.  Isn't it true that access is
>> > controlled at page granularity normally, so you can touch beginning of
>> > page just as well?
>>
>> Yeah, I am pretty sure it probably is page granularity.  However my
>> thought was to try and stick to the start of the DMA as the last
>> access.  That way we don't pull in any more cache lines than we need
>> to in order to dirty the pages.  Usually the start of the DMA region
>> will contain some sort of headers or something that needs to be
>> accessed with the highest priority so I wanted to make certain that we
>> were forcing usable data into the L1 cache rather than just the first
>> cache line of the page where the DMA started.  If however the start of
>> a DMA was the start of the page there is nothing there to prevent
>> that.
>
> OK, maybe this helps. You should document all these tricks
> in code comments.

I'll try to get that taken care of for v2.

>> >>  I coded this up so that the spots where we touch the
>> >> memory should match up with addresses provided by the hardware to
>> >> perform the DMA over the PCI bus.
>> >
>> > Yes
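
The address walk in the quoted hunk is easier to check when separated from the atomics. This sketch assumes 4 KiB pages and a 4-byte atomic_t (an `int`), and records the addresses that would be written rather than writing them. It reproduces only the arithmetic and says nothing about whether atomic_add() or cmpxchg() is the right primitive. The `demo_*` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096UL  /* assumed 4 KiB pages, as on x86 */
#define DEMO_PAGE_ALIGN(x) (((x) + DEMO_PAGE_SIZE - 1) & ~(DEMO_PAGE_SIZE - 1))

/*
 * Model of the quoted dma_mark_dirty() walk: store, in order, the addresses
 * that would receive a write.  Returns how many entries were stored in out[].
 */
static size_t demo_dirty_addrs(uintptr_t addr, size_t size,
			       uintptr_t *out, size_t max)
{
	uintptr_t start = addr;
	uintptr_t pg_addr = DEMO_PAGE_ALIGN(start + size);
	size_t n = 0;

	/* Round down to the int-sized word holding the first byte. */
	start &= ~(uintptr_t)(sizeof(int) - 1);

	/* Touch every later page at its first byte, walking downward. */
	while ((pg_addr -= DEMO_PAGE_SIZE) > start && n < max)
		out[n++] = pg_addr;

	/* Touch the first word of the buffer last. */
	if (n < max)
		out[n++] = start;
	return n;
}
```

For a buffer at 0x1010 of length 0x2000, the walk yields 0x3000, 0x2000 and finally 0x1010. That is one touch per page, with the start of the buffer touched last, matching the cache-line ordering described above.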

[PATCH RESEND 21/27] drivers:hv: Allow for MMIO claims that span ACPI _CRS records

2015-12-14 Thread K. Y. Srinivasan

From: Jake Oshins 

This patch makes 16GB GPUs work in Hyper-V VMs. For compatibility
reasons, the Hyper-V BIOS lists MMIO ranges in 2GB chunks in its root
bus's _CRS object, so a large MMIO claim has to span several adjacent
records; adjacent ranges are now merged to allow this.

Signed-off-by: Jake Oshins 
Signed-off-by: K. Y. Srinivasan 
---
 drivers/hv/vmbus_drv.c |   16 
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index f123bca..328e4c3 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1063,12 +1063,28 @@ static acpi_status vmbus_walk_resources(struct acpi_resource *res, void *ctx)
new_res->start = start;
new_res->end = end;
 
+   /*
+* Stick ranges from higher in address space at the front of the list.
+* If two ranges are adjacent, merge them.
+*/
do {
if (!*old_res) {
*old_res = new_res;
break;
}
 
+   if (((*old_res)->end + 1) == new_res->start) {
+   (*old_res)->end = new_res->end;
+   kfree(new_res);
+   break;
+   }
+
+   if ((*old_res)->start == new_res->end + 1) {
+   (*old_res)->start = new_res->start;
+   kfree(new_res);
+   break;
+   }
+
if ((*old_res)->end < new_res->start) {
new_res->sibling = *old_res;
if (prev_res)
-- 
1.7.4.1
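
The merging logic in the hunk can be exercised in isolation. The sketch below is a userspace model, not vmbus_walk_resources() itself. It uses a hypothetical `struct demo_range` with inclusive bounds in place of struct resource and keeps the list in descending address order. A new range that is exactly adjacent to an existing entry is merged into it and then freed; malloc/free stand in for kmalloc/kfree.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct resource: inclusive [start, end]. */
struct demo_range {
	unsigned long long start, end;
	struct demo_range *next;
};

/* Insert new_res, keeping higher ranges at the front and merging exact
 * neighbours.  Takes ownership of new_res (freed when merged). */
static void demo_insert_range(struct demo_range **head, struct demo_range *new_res)
{
	struct demo_range **link = head;

	if (!new_res)
		return;
	for (;;) {
		struct demo_range *old = *link;

		if (!old) {                            /* end of list: append */
			new_res->next = NULL;
			*link = new_res;
			return;
		}
		if (old->end + 1 == new_res->start) {  /* new sits just above old */
			old->end = new_res->end;
			free(new_res);
			return;
		}
		if (old->start == new_res->end + 1) {  /* new sits just below old */
			old->start = new_res->start;
			free(new_res);
			return;
		}
		if (old->end < new_res->start) {       /* new is higher: insert before */
			new_res->next = old;
			*link = new_res;
			return;
		}
		link = &old->next;
	}
}

static struct demo_range *demo_new_range(unsigned long long start,
					 unsigned long long end)
{
	struct demo_range *r = malloc(sizeof(*r));

	if (r) {
		r->start = start;
		r->end = end;
		r->next = NULL;
	}
	return r;
}
```

Inserting 2GB chunks that each touch the range already collected collapses them into a single entry. Like the hunk, the sketch does not re-merge two existing entries that a later range happens to bridge.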


Re: [RFC/PATCHSET 0/6] perf tools: Support dynamic sort keys for tracepoints (v1)

2015-12-14 Thread David Ahern


On 12/14/15 10:47 AM, Arnaldo Carvalho de Melo wrote:

With dynamic sort keys, you can use  as a sort key.  Those
dynamic keys are checked and created on demand.  For instance, below is
to sort by next_pid field on the same data file.

   $ perf report -s comm,sched:sched_switch.next_pid --stdio
   ...
   # Overhead  Command          next_pid
   # ........  ...............  ........
   #
       21.23%  transmission-gt         0
       20.86%  swapper             17773
        6.62%  netctl-auto             0
        5.25%  swapper               109
        5.21%  kworker/0:1H            0
        1.98%  Xephyr                  0
        1.98%  swapper              6524
        1.98%  swapper             27478
        1.37%  swapper             27476
        1.17%  swapper               233

Multiple dynamic sort keys are also supported:

   $ perf report -s comm,sched:sched_switch.next_pid,sched:sched_switch.next_comm --stdio
   ...
   # Overhead  Command          next_pid  next_comm
   # ........  ...............  ........  ................
   #
       20.86%  swapper             17773  transmission-gt
        9.64%  transmission-gt         0  swapper/0
        9.16%  transmission-gt         0  swapper/2
        5.25%  swapper               109  kworker/0:1H
        5.21%  kworker/0:1H            0  swapper/0
        2.14%  netctl-auto             0  swapper/2
        1.98%  netctl-auto             0  swapper/0
        1.98%  swapper              6524  Xephyr
        1.98%  swapper             27478  netctl-auto
        1.78%  transmission-gt         0  swapper/3
        1.53%  Xephyr                  0  swapper/0
        1.29%  netctl-auto             0  swapper/1
        1.29%  swapper             27476  netctl-auto
        1.21%  netctl-auto             0  swapper/3
        1.17%  swapper               233  irq/33-iwlwifi

Note that pid 0 exists on each CPU, so its comm is shown as 'swapper/N'.



This is available on 'perf/dynamic-sort-v1' branch in my tree

   git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Any comments are welcome, thanks!
Namhyung


I'll look at the patches for style, but the idea is so nice and natural
I thought about blind merging it :-)



yes, that is a cool feature.

For scheduling tracepoints the analysis could be added to perf-sched to 
ease the burden of the command line syntax.
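
Purely as an illustration of the key format used in the examples above (`system:event.field`, e.g. `sched:sched_switch.next_pid`), here is a hypothetical C helper that splits such a key into its parts. It is not perf's parser, and the fixed buffer sizes are arbitrary. A plain key such as `comm` has no `:` and is rejected, which is one way a caller could tell built-in keys from dynamic ones.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical parsed form of a dynamic sort key. */
struct demo_sort_key {
	char system[64];
	char event[64];
	char field[64];
};

/* Split "system:event.field"; returns false if a separator is missing,
 * a component is empty, or a component does not fit. */
static bool demo_parse_sort_key(const char *key, struct demo_sort_key *out)
{
	const char *colon = strchr(key, ':');
	const char *dot;
	size_t sys_len, ev_len, field_len;

	if (!colon)
		return false;
	dot = strchr(colon + 1, '.');
	if (!dot)
		return false;

	sys_len = (size_t)(colon - key);
	ev_len = (size_t)(dot - (colon + 1));
	field_len = strlen(dot + 1);
	if (!sys_len || !ev_len || !field_len ||
	    sys_len >= sizeof(out->system) || ev_len >= sizeof(out->event) ||
	    field_len >= sizeof(out->field))
		return false;

	memcpy(out->system, key, sys_len);
	out->system[sys_len] = '\0';
	memcpy(out->event, colon + 1, ev_len);
	out->event[ev_len] = '\0';
	memcpy(out->field, dot + 1, field_len + 1); /* copies the NUL too */
	return true;
}
```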




[PATCH RESEND 20/27] Drivers: hv: vmbus: change vmbus_connection.channel_lock to mutex

2015-12-14 Thread K. Y. Srinivasan

From: Dexuan Cui 

A spinlock is unnecessary here; a mutex is enough.

Signed-off-by: Dexuan Cui 
Signed-off-by: K. Y. Srinivasan 
---
 drivers/hv/channel_mgmt.c |   12 ++--
 drivers/hv/connection.c   |7 +++
 drivers/hv/hyperv_vmbus.h |2 +-
 3 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 9c9da3a..d013171 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -206,9 +206,9 @@ void hv_process_channel_removal(struct vmbus_channel *channel, u32 relid)
}
 
if (channel->primary_channel == NULL) {
-   spin_lock_irqsave(_connection.channel_lock, flags);
+   mutex_lock(_connection.channel_mutex);
list_del(>listentry);
-   spin_unlock_irqrestore(_connection.channel_lock, flags);
+   mutex_unlock(_connection.channel_mutex);
 
primary_channel = channel;
} else {
@@ -253,7 +253,7 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
unsigned long flags;
 
/* Make sure this is a new offer */
-   spin_lock_irqsave(_connection.channel_lock, flags);
+   mutex_lock(_connection.channel_mutex);
 
list_for_each_entry(channel, _connection.chn_list, listentry) {
if (!uuid_le_cmp(channel->offermsg.offer.if_type,
@@ -269,7 +269,7 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
list_add_tail(>listentry,
  _connection.chn_list);
 
-   spin_unlock_irqrestore(_connection.channel_lock, flags);
+   mutex_unlock(_connection.channel_mutex);
 
if (!fnew) {
/*
@@ -341,9 +341,9 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
 err_deq_chan:
vmbus_release_relid(newchannel->offermsg.child_relid);
 
-   spin_lock_irqsave(_connection.channel_lock, flags);
+   mutex_lock(_connection.channel_mutex);
list_del(>listentry);
-   spin_unlock_irqrestore(_connection.channel_lock, flags);
+   mutex_unlock(_connection.channel_mutex);
 
if (newchannel->target_cpu != get_cpu()) {
put_cpu();
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 4fc2e88..521f48e 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -146,7 +146,7 @@ int vmbus_connect(void)
spin_lock_init(_connection.channelmsg_lock);
 
INIT_LIST_HEAD(_connection.chn_list);
-   spin_lock_init(_connection.channel_lock);
+   mutex_init(_connection.channel_mutex);
 
/*
 * Setup the vmbus event connection for channel interrupt
@@ -282,11 +282,10 @@ struct vmbus_channel *relid2channel(u32 relid)
 {
struct vmbus_channel *channel;
struct vmbus_channel *found_channel  = NULL;
-   unsigned long flags;
struct list_head *cur, *tmp;
struct vmbus_channel *cur_sc;
 
-   spin_lock_irqsave(_connection.channel_lock, flags);
+   mutex_lock(_connection.channel_mutex);
list_for_each_entry(channel, _connection.chn_list, listentry) {
if (channel->offermsg.child_relid == relid) {
found_channel = channel;
@@ -305,7 +304,7 @@ struct vmbus_channel *relid2channel(u32 relid)
}
}
}
-   spin_unlock_irqrestore(_connection.channel_lock, flags);
+   mutex_unlock(_connection.channel_mutex);
 
return found_channel;
 }
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 9beeb14..4d67e98 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -683,7 +683,7 @@ struct vmbus_connection {
 
/* List of channels */
struct list_head chn_list;
-   spinlock_t channel_lock;
+   struct mutex channel_mutex;
 
struct workqueue_struct *work_queue;
 };
-- 
1.7.4.1


[PATCH RESEND 16/27] Drivers: hv: vmbus: serialize process_chn_event() and vmbus_close_internal()

2015-12-14 Thread K. Y. Srinivasan

From: Dexuan Cui 

process_chn_event(), running in the tasklet, can race with
vmbus_close_internal() on an SMP guest: e.g., while the former is
accessing channel->inbound.ring_buffer, the latter could be freeing the
ring_buffer pages.

Resolve the race by serializing them: disable the tasklet while
vmbus_close_internal() is running.

Signed-off-by: Dexuan Cui 
Signed-off-by: K. Y. Srinivasan 
---
 drivers/hv/channel.c |   21 +++--
 1 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index c4dcab0..f7f3d5c 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "hyperv_vmbus.h"
 
@@ -496,8 +497,21 @@ static void reset_channel_cb(void *arg)
 static int vmbus_close_internal(struct vmbus_channel *channel)
 {
struct vmbus_channel_close_channel *msg;
+   struct tasklet_struct *tasklet;
int ret;
 
+   /*
+* process_chn_event(), running in the tasklet, can race
+* with vmbus_close_internal() in the case of SMP guest, e.g., when
+* the former is accessing channel->inbound.ring_buffer, the latter
+* could be freeing the ring_buffer pages.
+*
+* To resolve the race, we can serialize them by disabling the
+* tasklet when the latter is running here.
+*/
+   tasklet = hv_context.event_dpc[channel->target_cpu];
+   tasklet_disable(tasklet);
+
channel->state = CHANNEL_OPEN_STATE;
channel->sc_creation_callback = NULL;
/* Stop callback and cancel the timer asap */
@@ -525,7 +539,7 @@ static int vmbus_close_internal(struct vmbus_channel *channel)
 * If we failed to post the close msg,
 * it is perhaps better to leak memory.
 */
-   return ret;
+   goto out;
}
 
/* Tear down the gpadl for the channel's ring buffer */
@@ -538,7 +552,7 @@ static int vmbus_close_internal(struct vmbus_channel *channel)
 * If we failed to teardown gpadl,
 * it is perhaps better to leak memory.
 */
-   return ret;
+   goto out;
}
}
 
@@ -555,6 +569,9 @@ static int vmbus_close_internal(struct vmbus_channel *channel)
if (channel->rescind)
hv_process_channel_removal(channel,
   channel->offermsg.child_relid);
+out:
+   tasklet_enable(tasklet);
+
return ret;
 }
 
-- 
1.7.4.1
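
The shape of the fix is the usual single-exit cleanup: disable the tasklet on entry and turn every early `return ret;` into `goto out;` so that tasklet_enable() runs on all paths. A minimal userspace model, with a hypothetical counter standing in for tasklet_disable()/tasklet_enable() and hypothetical `demo_*` names:

```c
#include <stdbool.h>

/* Hypothetical stand-in for a tasklet: positive count = handler held off. */
struct demo_tasklet {
	int disable_count;
};

static void demo_tasklet_disable(struct demo_tasklet *t) { t->disable_count++; }
static void demo_tasklet_enable(struct demo_tasklet *t) { t->disable_count--; }

/* Models the patched vmbus_close_internal(): each failure jumps to one exit
 * label, so the tasklet is re-enabled no matter which step fails. */
static int demo_close(struct demo_tasklet *t, int post_msg_ret,
		      int teardown_ret, bool *freed_ring)
{
	int ret;

	*freed_ring = false;
	demo_tasklet_disable(t);

	ret = post_msg_ret;      /* result of posting the close message */
	if (ret)
		goto out;            /* leak rather than free, as the patch does */

	ret = teardown_ret;      /* result of tearing down the GPADL */
	if (ret)
		goto out;

	*freed_ring = true;      /* ring pages freed while the handler is held off */
out:
	demo_tasklet_enable(t);
	return ret;
}
```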


[PATCH RESEND 26/27] Drivers: hv: utils: Invoke the poll function after handshake

2015-12-14 Thread K. Y. Srinivasan

When the handshake with the daemon is complete, we should poll the channel,
since no messages are processed during the handshake. Not doing so is a
potential bug if the host is waiting for a response from the guest.
I would like to thank Dexuan for pointing this out.

Signed-off-by: K. Y. Srinivasan 
---
 drivers/hv/hv_kvp.c  |2 +-
 drivers/hv/hv_snapshot.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c
index 2a3420c..d4ab81b 100644
--- a/drivers/hv/hv_kvp.c
+++ b/drivers/hv/hv_kvp.c
@@ -154,7 +154,7 @@ static int kvp_handle_handshake(struct hv_kvp_msg *msg)
pr_debug("KVP: userspace daemon ver. %d registered\n",
 KVP_OP_REGISTER);
kvp_register(dm_reg_value);
-   kvp_transaction.state = HVUTIL_READY;
+   hv_poll_channel(kvp_transaction.recv_channel, kvp_poll_wrapper);
 
return 0;
 }
diff --git a/drivers/hv/hv_snapshot.c b/drivers/hv/hv_snapshot.c
index 81882d4..67def4a 100644
--- a/drivers/hv/hv_snapshot.c
+++ b/drivers/hv/hv_snapshot.c
@@ -113,7 +113,7 @@ static int vss_handle_handshake(struct hv_vss_msg *vss_msg)
default:
return -EINVAL;
}
-   vss_transaction.state = HVUTIL_READY;
+   hv_poll_channel(vss_transaction.recv_channel, vss_poll_wrapper);
pr_debug("VSS: userspace daemon ver. %d registered\n", dm_reg_value);
return 0;
 }
-- 
1.7.4.1


[PATCH RESEND 23/27] Drivers: hv: vmbus: Fix a Host signaling bug

2015-12-14 Thread K. Y. Srinivasan

Currently we have two policies for deciding when to signal the host:
one based on the ring buffer state and the other based on what the
VMBUS client driver wants to do. Consider the case when the client
wants to explicitly control when to signal the host. In this case,
if the client defers signaling, we will not be able to signal the
host later, when the client does want to signal, because the ring
buffer state will prevent the signaling. Implement logic to have
only one signaling policy in force for a given channel.

Signed-off-by: K. Y. Srinivasan 
Reviewed-by: Haiyang Zhang 
Tested-by: Haiyang Zhang 
Cc:  # v4.2+
---
 drivers/hv/channel.c   |   18 ++
 include/linux/hyperv.h |   18 ++
 2 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 77d2579..2889d97 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -653,10 +653,19 @@ int vmbus_sendpacket_ctl(struct vmbus_channel *channel, void *buffer,
 *on the ring. We will not signal if more data is
 *to be placed.
 *
+* Based on the channel signal state, we will decide
+* which signaling policy will be applied.
+*
 * If we cannot write to the ring-buffer; signal the host
 * even if we may not have written anything. This is a rare
 * enough condition that it should not matter.
 */
+
+   if (channel->signal_policy)
+   signal = true;
+   else
+   kick_q = true;
+
if (((ret == 0) && kick_q && signal) || (ret))
vmbus_setevent(channel);
 
@@ -756,10 +765,19 @@ int vmbus_sendpacket_pagebuffer_ctl(struct vmbus_channel *channel,
 *on the ring. We will not signal if more data is
 *to be placed.
 *
+* Based on the channel signal state, we will decide
+* which signaling policy will be applied.
+*
 * If we cannot write to the ring-buffer; signal the host
 * even if we may not have written anything. This is a rare
 * enough condition that it should not matter.
 */
+
+   if (channel->signal_policy)
+   signal = true;
+   else
+   kick_q = true;
+
if (((ret == 0) && kick_q && signal) || (ret))
vmbus_setevent(channel);
 
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index f773a68..acd995b 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -630,6 +630,11 @@ struct hv_input_signal_event_buffer {
struct hv_input_signal_event event;
 };
 
+enum hv_signal_policy {
+   HV_SIGNAL_POLICY_DEFAULT = 0,
+   HV_SIGNAL_POLICY_EXPLICIT,
+};
+
 struct vmbus_channel {
/* Unique channel id */
int id;
@@ -757,8 +762,21 @@ struct vmbus_channel {
 * link up channels based on their CPU affinity.
 */
struct list_head percpu_list;
+   /*
+* Host signaling policy: The default policy will be
+* based on the ring buffer state. We will also support
+* a policy where the client driver can have explicit
+* signaling control.
+*/
+   enum hv_signal_policy  signal_policy;
 };
 
+static inline void set_channel_signal_state(struct vmbus_channel *c,
+   enum hv_signal_policy policy)
+{
+   c->signal_policy = policy;
+}
+
 static inline void set_channel_read_state(struct vmbus_channel *c, bool state)
 {
c->batched_reading = state;
-- 
1.7.4.1

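The decision this patch adds to vmbus_sendpacket_ctl() and vmbus_sendpacket_pagebuffer_ctl() can be read as a small truth table. A sketch that isolates it in a pure function; should_signal_host() is a hypothetical helper for illustration, not something in drivers/hv, and the enum simply repeats the one the patch adds:

```c
#include <stdbool.h>

/* Same two values as the patch's enum hv_signal_policy. */
enum hv_signal_policy {
	HV_SIGNAL_POLICY_DEFAULT = 0,
	HV_SIGNAL_POLICY_EXPLICIT,
};

/*
 * ret:    result of writing to the ring buffer (nonzero = failure)
 * kick_q: the client driver's request to signal now
 * signal: what the ring-buffer state logic concluded
 * Only one of kick_q/signal is honoured, depending on the policy.
 */
static bool should_signal_host(enum hv_signal_policy policy, int ret,
			       bool kick_q, bool signal)
{
	if (policy)		/* explicit: ignore the ring-buffer state */
		signal = true;
	else			/* default: ignore the client's kick_q */
		kick_q = true;

	/* A failed write still signals the host, as before the patch. */
	return (ret == 0 && kick_q && signal) || ret;
}
```

With HV_SIGNAL_POLICY_EXPLICIT (set through set_channel_signal_state()), only the client's kick_q decides; with the default policy only the ring-buffer result does.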

[PATCH RESEND 24/27] drivers/hv: correct tsc page sequence invalid value

2015-12-14 Thread K. Y. Srinivasan

From: Andrey Smetanin 

The Hypervisor Top Level Functional Specification v3/4 says that a
TSC page sequence value of -1 (0xFFFFFFFF) indicates that the TSC
page is no longer a reliable source of reference time.
Unfortunately, we found that the Windows Hyper-V guest-side
implementation uses a sequence value of 0 to indicate that the
TSC page is no longer valid. This is clearly visible in the
disassembly of the HvlGetReferenceTime() function in
Windows 2012R2 ntoskrnl.exe:

HvlGetReferenceTime proc near
                xchg    ax, ax
loc_1401C3132:
                mov     rax, cs:HvlpReferenceTscPage
                mov     r9d, [rax]
                test    r9d, r9d
                jz      short loc_1401C3176
                rdtsc
                mov     rcx, cs:HvlpReferenceTscPage
                shl     rdx, 20h
                or      rdx, rax
                mov     rax, [rcx+8]
                mov     rcx, cs:HvlpReferenceTscPage
                mov     r8, [rcx+10h]
                mul     rdx
                mov     rax, cs:HvlpReferenceTscPage
                add     rdx, r8
                mov     ecx, [rax]
                cmp     ecx, r9d
                jnz     short loc_1401C3132
                jmp     short loc_1401C3184
loc_1401C3176:
                mov     ecx, 4020h
                rdmsr
                shl     rdx, 20h
                or      rdx, rax
loc_1401C3184:
                mov     rax, rdx
                retn
HvlGetReferenceTime endp

This patch aligns the TSC page invalid sequence value with the
Windows Hyper-V guest implementation, which is more compatible
with both the Hyper-V and KVM hypervisors.

Signed-off-by: Andrey Smetanin 
Signed-off-by: Denis V. Lunev 
CC: "K. Y. Srinivasan" 
CC: Haiyang Zhang 
CC: Vitaly Kuznetsov 

Signed-off-by: Denis V. Lunev 
Signed-off-by: K. Y. Srinivasan 
---
 drivers/hv/hv.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 7a06933..1db9556 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -140,7 +140,7 @@ static cycle_t read_hv_clock_tsc(struct clocksource *arg)
cycle_t current_tick;
struct ms_hyperv_tsc_page *tsc_pg = hv_context.tsc_page;
 
-   if (tsc_pg->tsc_sequence != -1) {
+   if (tsc_pg->tsc_sequence != 0) {
/*
 * Use the tsc page to compute the value.
 */
@@ -162,7 +162,7 @@ static cycle_t read_hv_clock_tsc(struct clocksource *arg)
if (tsc_pg->tsc_sequence == sequence)
return current_tick;
 
-   if (tsc_pg->tsc_sequence != -1)
+   if (tsc_pg->tsc_sequence != 0)
continue;
/*
 * Fallback using MSR method.
-- 
1.7.4.1

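Read together with the disassembly above, the guest-side protocol is: read the 32-bit sequence, treat 0 as invalid and fall back to the reference-time MSR, otherwise take the high 64 bits of tsc * scale, add the offset, and retry if the sequence changed in the meantime. A userspace sketch under these assumptions: the struct layout is inferred from the [rax], [rcx+8] and [rcx+10h] loads, read_tsc and read_msr_time are hypothetical callbacks standing in for rdtsc and rdmsr, and unsigned __int128 is a GCC/Clang extension available on 64-bit targets:

```c
#include <stdint.h>

/* Hypothetical model of the fields HvlGetReferenceTime() reads. */
struct tsc_page_sketch {
	volatile uint32_t tsc_sequence;	/* [rax]      */
	uint32_t reserved;
	volatile uint64_t tsc_scale;	/* [rcx+8]    */
	volatile int64_t tsc_offset;	/* [rcx+10h]  */
};

/* Demo stand-ins for rdtsc and the MSR read (hypothetical values). */
static uint64_t demo_tsc(void) { return 1000; }
static uint64_t demo_msr_time(void) { return 42; }

/* Reference time, treating sequence == 0 as "TSC page invalid". */
static uint64_t read_ref_time_sketch(const struct tsc_page_sketch *pg,
				     uint64_t (*read_tsc)(void),
				     uint64_t (*read_msr_time)(void))
{
	for (;;) {
		uint32_t seq = pg->tsc_sequence;

		if (seq == 0)		/* invalid: use the MSR instead */
			return read_msr_time();

		/* time = high 64 bits of (tsc * scale) + offset */
		uint64_t tsc = read_tsc();
		uint64_t now = (uint64_t)(((unsigned __int128)tsc *
					   pg->tsc_scale) >> 64) +
			       (uint64_t)pg->tsc_offset;

		if (pg->tsc_sequence == seq)	/* page unchanged: done */
			return now;
		/* The sequence changed while reading: retry. */
	}
}
```

With scale = 2^63 (0.5 in 64.64 fixed point), a TSC of 1000 and an offset of 5, the sketch yields 505; with sequence 0 it returns the MSR value instead.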

[PATCH RESEND 05/27] Drivers: hv: util: catch allocation errors

2015-12-14 Thread K. Y. Srinivasan

From: Olaf Hering 

Catch allocation errors in hvutil_transport_send.

Fixes: 14b50f80c32d ('Drivers: hv: util: introduce hv_utils_transport abstraction')

Signed-off-by: Olaf Hering 
Signed-off-by: K. Y. Srinivasan 
---
 drivers/hv/hv_utils_transport.c |9 ++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/hv/hv_utils_transport.c b/drivers/hv/hv_utils_transport.c
index 6a9d80a..1505ee6 100644
--- a/drivers/hv/hv_utils_transport.c
+++ b/drivers/hv/hv_utils_transport.c
@@ -204,9 +204,12 @@ int hvutil_transport_send(struct hvutil_transport *hvt, void *msg, int len)
goto out_unlock;
}
hvt->outmsg = kzalloc(len, GFP_KERNEL);
-   memcpy(hvt->outmsg, msg, len);
-   hvt->outmsg_len = len;
-   wake_up_interruptible(&hvt->outmsg_q);
+   if (hvt->outmsg) {
+   memcpy(hvt->outmsg, msg, len);
+   hvt->outmsg_len = len;
+   wake_up_interruptible(&hvt->outmsg_q);
+   } else
+   ret = -ENOMEM;
 out_unlock:
mutex_unlock(&hvt->outmsg_lock);
return ret;
-- 
1.7.4.1

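The fix is the usual allocate-then-check pattern. A userspace sketch of the patched logic, assuming a hypothetical transport_sketch struct with an injectable allocator so the failure path can be exercised (the kernel code uses kzalloc() and also wakes up the reader on success):

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for the outgoing-message fields. */
struct transport_sketch {
	void *outmsg;
	int outmsg_len;
	void *(*alloc)(size_t n);	/* injectable to force failures */
};

static void *sketch_zalloc(size_t n) { return calloc(1, n); }
static void *sketch_failing_alloc(size_t n) { (void)n; return NULL; }

/*
 * Same shape as the patched hvutil_transport_send(): only copy into
 * the buffer if the allocation succeeded, otherwise report -ENOMEM.
 */
static int transport_send_sketch(struct transport_sketch *t,
				 const void *msg, int len)
{
	int ret = 0;

	t->outmsg = t->alloc((size_t)len);
	if (t->outmsg) {
		memcpy(t->outmsg, msg, (size_t)len);
		t->outmsg_len = len;
		/* The kernel code wakes up the reader here. */
	} else {
		ret = -ENOMEM;
	}
	return ret;
}
```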

[PATCH RESEND 07/27] drivers/hv: cleanup synic msrs if vmbus connect failed

2015-12-14 Thread K. Y. Srinivasan

From: Andrey Smetanin 

Before vmbus_connect() is called, the SynIC is set up on each vCPU;
this means the hypervisor receives writes to the SynIC MSRs and
probably allocates hypervisor resources for each SynIC setup.

If vmbus_connect() fails for some reason, it is necessary to clean up
the SynIC setup by calling hv_synic_cleanup() on each vCPU, so that the
hypervisor gets a chance to free the resources it allocated per SynIC.

This patch does appropriate cleanup in case of vmbus_connect() failure.

Signed-off-by: Andrey Smetanin 
Signed-off-by: Denis V. Lunev 
Reviewed-by: Vitaly Kuznetsov 
CC: "K. Y. Srinivasan" 
CC: Haiyang Zhang 
CC: Vitaly Kuznetsov 
Signed-off-by: K. Y. Srinivasan 
---
 drivers/hv/vmbus_drv.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index f19b6f7..3297731 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -867,7 +867,7 @@ static int vmbus_bus_init(int irq)
on_each_cpu(hv_synic_init, NULL, 1);
ret = vmbus_connect();
if (ret)
-   goto err_alloc;
+   goto err_connect;
 
if (vmbus_proto_version > VERSION_WIN7)
cpu_hotplug_disable();
@@ -885,6 +885,8 @@ static int vmbus_bus_init(int irq)
 
return 0;
 
+err_connect:
+   on_each_cpu(hv_synic_cleanup, NULL, 1);
 err_alloc:
hv_synic_free();
hv_remove_vmbus_irq();
-- 
1.7.4.1

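The new label relies on the kernel's reverse-order unwinding convention: each error label undoes one setup step and then falls through to the labels below it. A sketch with hypothetical counters standing in for the allocation released by hv_synic_free(), the per-CPU hv_synic_init() calls, and vmbus_connect():

```c
/* Hypothetical state standing in for the real setup steps. */
struct bus_state_sketch {
	int allocated;		/* released by hv_synic_free() */
	int synic_active;	/* per-CPU hv_synic_init() done */
};

/*
 * Unwind order of vmbus_bus_init() after the patch: a connect
 * failure lands on err_connect, undoes the SynIC setup, and then
 * falls through to err_alloc to release the allocations as well.
 */
static int bus_init_sketch(struct bus_state_sketch *s, int alloc_ret,
			   int connect_ret)
{
	int ret;

	ret = alloc_ret;
	if (ret)
		goto err_alloc;
	s->allocated = 1;

	s->synic_active = 1;		/* on_each_cpu(hv_synic_init, ...) */

	ret = connect_ret;		/* vmbus_connect() */
	if (ret)
		goto err_connect;

	return 0;

err_connect:
	s->synic_active = 0;		/* on_each_cpu(hv_synic_cleanup, ...) */
err_alloc:
	s->allocated = 0;		/* hv_synic_free() */
	return ret;
}
```

Before the patch, a connect failure jumped straight to err_alloc, which in this model would leave synic_active set.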

[PATCH RESEND 03/27] tools: hv: report ENOSPC errors in hv_fcopy_daemon

2015-12-14 Thread K. Y. Srinivasan

From: Olaf Hering 

Currently a generic "Unspecified error 0x80004005" is reported on the
Windows side if something fails. Handle the ENOSPC case and return
ERROR_DISK_FULL, which at least allows Copy-VMFile to report a
meaningful error.

Signed-off-by: Olaf Hering 
Signed-off-by: K. Y. Srinivasan 
---
 include/uapi/linux/hyperv.h |1 +
 tools/hv/hv_fcopy_daemon.c  |   20 +---
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/hyperv.h b/include/uapi/linux/hyperv.h
index e4c0a35..e347b24 100644
--- a/include/uapi/linux/hyperv.h
+++ b/include/uapi/linux/hyperv.h
@@ -313,6 +313,7 @@ enum hv_kvp_exchg_pool {
 #define HV_INVALIDARG  0x80070057
 #define HV_GUID_NOTFOUND   0x80041002
 #define HV_ERROR_ALREADY_EXISTS0x80070050
+#define HV_ERROR_DISK_FULL 0x80070070
 
 #define ADDR_FAMILY_NONE   0x00
 #define ADDR_FAMILY_IPV4   0x01
diff --git a/tools/hv/hv_fcopy_daemon.c b/tools/hv/hv_fcopy_daemon.c
index 5480e4e..f1d7426 100644
--- a/tools/hv/hv_fcopy_daemon.c
+++ b/tools/hv/hv_fcopy_daemon.c
@@ -37,12 +37,14 @@
 
 static int target_fd;
 static char target_fname[W_MAX_PATH];
+static unsigned long long filesize;
 
 static int hv_start_fcopy(struct hv_start_fcopy *smsg)
 {
int error = HV_E_FAIL;
char *q, *p;
 
+   filesize = 0;
p = (char *)smsg->path_name;
snprintf(target_fname, sizeof(target_fname), "%s/%s",
 (char *)smsg->path_name, (char *)smsg->file_name);
@@ -98,14 +100,26 @@ done:
 static int hv_copy_data(struct hv_do_fcopy *cpmsg)
 {
ssize_t bytes_written;
+   int ret = 0;
 
bytes_written = pwrite(target_fd, cpmsg->data, cpmsg->size,
cpmsg->offset);
 
-   if (bytes_written != cpmsg->size)
-   return HV_E_FAIL;
+   filesize += cpmsg->size;
+   if (bytes_written != cpmsg->size) {
+   switch (errno) {
+   case ENOSPC:
+   ret = HV_ERROR_DISK_FULL;
+   break;
+   default:
+   ret = HV_E_FAIL;
+   break;
+   }
+   syslog(LOG_ERR, "pwrite failed to write %llu bytes: %ld (%s)",
+  filesize, (long)bytes_written, strerror(errno));
+   }
 
-   return 0;
+   return ret;
 }
 
 static int hv_copy_finished(void)
-- 
1.7.4.1

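After a short pwrite(), the daemon now translates errno into a Windows-style status for the host. A sketch of that mapping in isolation; fcopy_write_error() is a hypothetical helper (the patch does this inline in hv_copy_data()), and the codes are returned as uint32_t here to keep the 0x8... constants unsigned:

```c
#include <errno.h>
#include <stdint.h>

/* HV_E_FAIL is the "Unspecified error" value quoted above;
 * HV_ERROR_DISK_FULL is the constant this patch adds. */
#define HV_E_FAIL		0x80004005u
#define HV_ERROR_DISK_FULL	0x80070070u

/* Map the errno left by a failed pwrite() to the code sent to the host. */
static uint32_t fcopy_write_error(int err)
{
	switch (err) {
	case ENOSPC:
		return HV_ERROR_DISK_FULL;
	default:
		return HV_E_FAIL;
	}
}
```

errno has to be sampled before any other library call can overwrite it; in the patch, strerror(errno) is evaluated as a syslog() argument, after the switch has already consumed errno.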

[PATCH RESEND 19/27] Drivers: hv: vmbus: release relid on error in vmbus_process_offer()

2015-12-14 Thread K. Y. Srinivasan

From: Dexuan Cui 

We want to simplify vmbus_onoffer_rescind() by not invoking
hv_process_channel_removal(NULL, ...).

Signed-off-by: Dexuan Cui 
Signed-off-by: K. Y. Srinivasan 
---
 drivers/hv/channel_mgmt.c |   21 +++--
 1 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 7903acc..9c9da3a 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -177,19 +177,22 @@ static void percpu_channel_deq(void *arg)
 }
 
 
-void hv_process_channel_removal(struct vmbus_channel *channel, u32 relid)
+static void vmbus_release_relid(u32 relid)
 {
struct vmbus_channel_relid_released msg;
-   unsigned long flags;
-   struct vmbus_channel *primary_channel;
 
memset(&msg, 0, sizeof(struct vmbus_channel_relid_released));
msg.child_relid = relid;
msg.header.msgtype = CHANNELMSG_RELID_RELEASED;
vmbus_post_msg(&msg, sizeof(struct vmbus_channel_relid_released));
+}
 
-   if (channel == NULL)
-   return;
+void hv_process_channel_removal(struct vmbus_channel *channel, u32 relid)
+{
+   unsigned long flags;
+   struct vmbus_channel *primary_channel;
+
+   vmbus_release_relid(relid);
 
BUG_ON(!channel->rescind);
 
@@ -336,6 +339,8 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
return;
 
 err_deq_chan:
+   vmbus_release_relid(newchannel->offermsg.child_relid);
+
spin_lock_irqsave(&vmbus_connection.channel_lock, flags);
list_del(&newchannel->listentry);
spin_unlock_irqrestore(&vmbus_connection.channel_lock, flags);
@@ -587,7 +592,11 @@ static void vmbus_onoffer_rescind(struct vmbus_channel_message_header *hdr)
channel = relid2channel(rescind->child_relid);
 
if (channel == NULL) {
-   hv_process_channel_removal(NULL, rescind->child_relid);
+   /*
+* This is very impossible, because in
+* vmbus_process_offer(), we have already invoked
+* vmbus_release_relid() on error.
+*/
return;
}
 
-- 
1.7.4.1


[PATCH RESEND 06/27] Drivers: hv: utils: use memdup_user in hvt_op_write

2015-12-14 Thread K. Y. Srinivasan

From: Olaf Hering 

Use memdup_user to handle OOM.

Fixes: 14b50f80c32d ('Drivers: hv: util: introduce hv_utils_transport abstraction')

Signed-off-by: Olaf Hering 
Signed-off-by: K. Y. Srinivasan 
---
 drivers/hv/hv_utils_transport.c |9 -
 1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/hv/hv_utils_transport.c b/drivers/hv/hv_utils_transport.c
index 1505ee6..24b2766 100644
--- a/drivers/hv/hv_utils_transport.c
+++ b/drivers/hv/hv_utils_transport.c
@@ -80,11 +80,10 @@ static ssize_t hvt_op_write(struct file *file, const char __user *buf,
 
hvt = container_of(file->f_op, struct hvutil_transport, fops);
 
-   inmsg = kzalloc(count, GFP_KERNEL);
-   if (copy_from_user(inmsg, buf, count)) {
-   kfree(inmsg);
-   return -EFAULT;
-   }
+   inmsg = memdup_user(buf, count);
+   if (IS_ERR(inmsg))
+   return PTR_ERR(inmsg);
+
if (hvt->on_msg(inmsg, count))
return -EFAULT;
kfree(inmsg);
-- 
1.7.4.1

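memdup_user() returns either a valid pointer or an error code encoded in the pointer itself, which callers test with IS_ERR() and decode with PTR_ERR(). A userspace re-creation of that idiom, assuming a NULL source pointer as a stand-in for a faulting user address (userspace cannot detect a real fault) and a hypothetical memdup_sketch() in place of memdup_user():

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Userspace copies of the ERR_PTR helpers: error codes are encoded
 * in the top MAX_ERRNO addresses, which never hold valid objects. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical memdup_user() analogue: allocate and copy in one step,
 * reporting failure through the returned pointer. */
static void *memdup_sketch(const void *src, size_t len)
{
	void *p;

	if (!src)
		return ERR_PTR(-EFAULT);
	p = malloc(len);
	if (!p)
		return ERR_PTR(-ENOMEM);
	memcpy(p, src, len);
	return p;
}
```

The caller pattern is then the two lines the patch adds: if (IS_ERR(inmsg)) return PTR_ERR(inmsg);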

Re: [PATCH v2] extcon: add Maxim MAX3355 driver

2015-12-14 Thread Sergei Shtylyov


Hello.

On 12/15/2015 01:21 AM, Rob Herring wrote:


>>>> Maxim  Integrated MAX3355E chip integrates a  charge pump and comparators to
>>>> enable a system with an integrated USB OTG dual-role transceiver to function
>>>> as  an USB  OTG dual-role device.  In addition  to sensing/controlling Vbus,
>>>> the chip also passes thru the ID signal  from the USB  OTG connector.
>>>> On some Renesas boards,  this signal is  just fed into the SoC thru a GPIO
>>>> pin --  there's no real  OTG controller, only host and gadget USB controllers
>>>> sharing the same USB bus; however, we'd  like to allow host or gadget drivers
>>>> to be loaded depending on the cable type,  hence the need for the MAX3355
>>>> extcon driver. The Vbus status signals are also  wired to GPIOs (however, we
>>>> aren't currently interested in them),  the OFFVBUS# signal is controlled by
>>>> the host controllers, there's  also the SHDN# signal wired to a GPIO, it
>>>> should be driven high for the  normal operation.
>>>
>>> As multiple people have said, fix the spacing here.
>>
>>    You are the first to complain about _this_ patch. If you don't have other
>> issues with this driver (in which case you should have trimmed the reply at
>> this point), I'd like to keep my spacing as is. Thank you.
>
> Your previous version was not "extcon-usb-gpio: add enable pin
> support"[1] which has now been re-written to be max3355 specific?

   No, the MAX3355 driver pre-dates that version. First there was a driver,
then I tried to re-use the existing stuff (there was no extcon-usb-gpio at the
time of writing my driver), then had to return to the separate driver idea...

> "So
> what" and "I'd like to keep my spacing as is" aren't valid reasons.
> Fix it, then I'll look at the rest again.

   I'll consider doing that if you care to explain what's the problem with my
spacing. TIA.

> Rob


MBR, Sergei


[PATCH RESEND 04/27] tools: hv: remove repeated HV_FCOPY string

2015-12-14 Thread K. Y. Srinivasan

From: Olaf Hering 

HV_FCOPY is already used as identifier in syslog.

Signed-off-by: Olaf Hering 
Signed-off-by: K. Y. Srinivasan 
---
 tools/hv/hv_fcopy_daemon.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/hv/hv_fcopy_daemon.c b/tools/hv/hv_fcopy_daemon.c
index f1d7426..fdc9ca4 100644
--- a/tools/hv/hv_fcopy_daemon.c
+++ b/tools/hv/hv_fcopy_daemon.c
@@ -179,7 +179,7 @@ int main(int argc, char *argv[])
}
 
openlog("HV_FCOPY", 0, LOG_USER);
-   syslog(LOG_INFO, "HV_FCOPY starting; pid is:%d", getpid());
+   syslog(LOG_INFO, "starting; pid is:%d", getpid());
 
fcopy_fd = open("/dev/vmbus/hv_fcopy", O_RDWR);
 
@@ -215,7 +215,7 @@ int main(int argc, char *argv[])
}
kernel_modver = *(__u32 *)buffer;
in_handshake = 0;
-   syslog(LOG_INFO, "HV_FCOPY: kernel module version: %d",
+   syslog(LOG_INFO, "kernel module version: %d",
   kernel_modver);
continue;
}
-- 
1.7.4.1


[PATCH RESEND 02/27] Drivers: hv: utils: run polling callback always in interrupt context

2015-12-14 Thread K. Y. Srinivasan

From: Olaf Hering 

All channel interrupts are bound to specific VCPUs in the guest
at the point the channel is created. While we currently invoke the
polling function on the correct CPU (the CPU to which the channel
is bound), in some cases we may run the polling function in
a non-interrupt context. This can potentially cause an issue, as the
polling function can be interrupted by the channel callback function.
Fix the issue by running the polling function on the appropriate CPU
at interrupt level. Additional details of the issue being addressed by
this patch are given below:

Currently hv_fcopy_onchannelcallback is called from interrupts and also
via the ->write function of hv_utils. Since the global variables used to
maintain state are not thread-safe, the state can get out of sync.
This affects the state variable as well as the channel inbound buffer.

As suggested by KY, adjust hv_poll_channel to always run the given
callback on the CPU to which the channel is bound. This avoids the need
for locking, because all the util services are single-threaded and only
one transaction is active at any given point in time.

Additionally, remove the context variable; it is always the same as
recv_channel.

Signed-off-by: Olaf Hering 
Signed-off-by: K. Y. Srinivasan 
---
 drivers/hv/hv_fcopy.c |   34 --
 drivers/hv/hv_kvp.c   |   28 ++--
 drivers/hv/hv_snapshot.c  |   29 +++--
 drivers/hv/hyperv_vmbus.h |6 +-
 4 files changed, 34 insertions(+), 63 deletions(-)

diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
index bbdec50..c37a71e 100644
--- a/drivers/hv/hv_fcopy.c
+++ b/drivers/hv/hv_fcopy.c
@@ -51,7 +51,6 @@ static struct {
struct hv_fcopy_hdr  *fcopy_msg; /* current message */
struct vmbus_channel *recv_channel; /* chn we got the request */
u64 recv_req_id; /* request ID. */
-   void *fcopy_context; /* for the channel callback */
 } fcopy_transaction;
 
 static void fcopy_respond_to_host(int error);
@@ -67,6 +66,13 @@ static struct hvutil_transport *hvt;
  */
 static int dm_reg_value;
 
+static void fcopy_poll_wrapper(void *channel)
+{
+   /* Transaction is finished, reset the state here to avoid races. */
+   fcopy_transaction.state = HVUTIL_READY;
+   hv_fcopy_onchannelcallback(channel);
+}
+
 static void fcopy_timeout_func(struct work_struct *dummy)
 {
/*
@@ -74,13 +80,7 @@ static void fcopy_timeout_func(struct work_struct *dummy)
 * process the pending transaction.
 */
fcopy_respond_to_host(HV_E_FAIL);
-
-   /* Transaction is finished, reset the state. */
-   if (fcopy_transaction.state > HVUTIL_READY)
-   fcopy_transaction.state = HVUTIL_READY;
-
-   hv_poll_channel(fcopy_transaction.fcopy_context,
-   hv_fcopy_onchannelcallback);
+   hv_poll_channel(fcopy_transaction.recv_channel, fcopy_poll_wrapper);
 }
 
 static int fcopy_handle_handshake(u32 version)
@@ -108,9 +108,7 @@ static int fcopy_handle_handshake(u32 version)
return -EINVAL;
}
pr_debug("FCP: userspace daemon ver. %d registered\n", version);
-   fcopy_transaction.state = HVUTIL_READY;
-   hv_poll_channel(fcopy_transaction.fcopy_context,
-   hv_fcopy_onchannelcallback);
+   hv_poll_channel(fcopy_transaction.recv_channel, fcopy_poll_wrapper);
return 0;
 }
 
@@ -227,15 +225,8 @@ void hv_fcopy_onchannelcallback(void *context)
int util_fw_version;
int fcopy_srv_version;
 
-   if (fcopy_transaction.state > HVUTIL_READY) {
-   /*
-* We will defer processing this callback once
-* the current transaction is complete.
-*/
-   fcopy_transaction.fcopy_context = context;
+   if (fcopy_transaction.state > HVUTIL_READY)
return;
-   }
-   fcopy_transaction.fcopy_context = NULL;
 
vmbus_recvpacket(channel, recv_buffer, PAGE_SIZE * 2, &recvlen,
 &requestid);
@@ -305,9 +296,8 @@ static int fcopy_on_msg(void *msg, int len)
if (cancel_delayed_work_sync(&fcopy_timeout_work)) {
fcopy_transaction.state = HVUTIL_USERSPACE_RECV;
fcopy_respond_to_host(*val);
-   fcopy_transaction.state = HVUTIL_READY;
-   hv_poll_channel(fcopy_transaction.fcopy_context,
-   hv_fcopy_onchannelcallback);
+   hv_poll_channel(fcopy_transaction.recv_channel,
+   fcopy_poll_wrapper);
}
 
return 0;
diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c
index e6aa33a..2a3420c 100644
--- a/drivers/hv/hv_kvp.c
+++ b/drivers/hv/hv_kvp.c
@@ -66,7 +66,6 @@ static struct {
struct hv_kvp_msg  *kvp_msg; /* current message */
struct vmbus_channel *recv_channel; /* chn we got the request */
u64 recv_req_id; /*

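The key change is that resetting the transaction state and re-running the channel callback now happen together in fcopy_poll_wrapper(), which hv_poll_channel() runs on the channel's CPU. A single-threaded sketch of that handoff, with hypothetical types and names; it models only the state logic, not the CPU binding that makes it race-free in the kernel:

```c
/* Hypothetical model of the util-driver transaction state. */
enum util_state_sketch { UTIL_READY, UTIL_BUSY };

struct fcopy_sketch {
	enum util_state_sketch state;
	int processed;		/* packets handled by the callback */
};

/* Like hv_fcopy_onchannelcallback(): bail out while a transaction is
 * in flight; the packet stays in the ring buffer for later. */
static void onchannelcallback_sketch(struct fcopy_sketch *f)
{
	if (f->state > UTIL_READY)
		return;
	f->state = UTIL_BUSY;	/* a new transaction starts */
	f->processed++;
}

/* Like fcopy_poll_wrapper(): reset the state and re-run the callback
 * in one place, so nothing else observes the intermediate state. */
static void poll_wrapper_sketch(struct fcopy_sketch *f)
{
	f->state = UTIL_READY;
	onchannelcallback_sketch(f);
}
```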
[PATCH RESEND 01/27] Drivers: hv: util: Increase the timeout for util services

2015-12-14 Thread K. Y. Srinivasan

Util services such as KVP and FCOPY need assistance from daemons running
in user space. Increase the timeout so we don't prematurely terminate
the transaction in the kernel. The host sets up a 60-second timeout for
all util driver transactions and will retry the transaction if it
times out. Set the guest timeout to 30 seconds.

Signed-off-by: K. Y. Srinivasan 
---
 drivers/hv/hv_fcopy.c |3 ++-
 drivers/hv/hv_kvp.c   |3 ++-
 drivers/hv/hyperv_vmbus.h |5 +
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
index db4b887..bbdec50 100644
--- a/drivers/hv/hv_fcopy.c
+++ b/drivers/hv/hv_fcopy.c
@@ -275,7 +275,8 @@ void hv_fcopy_onchannelcallback(void *context)
 * Send the information to the user-level daemon.
 */
schedule_work(&fcopy_send_work);
-   schedule_delayed_work(&fcopy_timeout_work, 5*HZ);
+   schedule_delayed_work(&fcopy_timeout_work,
+ HV_UTIL_TIMEOUT * HZ);
return;
}
icmsghdr->icflags = ICMSGHDRFLAG_TRANSACTION | ICMSGHDRFLAG_RESPONSE;
diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c
index 74c38a9..e6aa33a 100644
--- a/drivers/hv/hv_kvp.c
+++ b/drivers/hv/hv_kvp.c
@@ -668,7 +668,8 @@ void hv_kvp_onchannelcallback(void *context)
 * user-mode not responding.
 */
schedule_work(&kvp_sendkey_work);
-   schedule_delayed_work(&kvp_timeout_work, 5*HZ);
+   schedule_delayed_work(&kvp_timeout_work,
+ HV_UTIL_TIMEOUT * HZ);
 
return;
 
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 3782636..225b96b 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -31,6 +31,11 @@
 #include 
 
 /*
+ * Timeout for services such as KVP and fcopy.
+ */
+#define HV_UTIL_TIMEOUT 30
+
+/*
  * The below CPUID leaves are present if VersionAndFeatures.HypervisorPresent
  * is set by CPUID(HVCPUID_VERSION_FEATURES).
  */
-- 
1.7.4.1


Re: [PATCHV2 1/3] x86, ras: Add new infrastructure for machine check fixup tables

2015-12-14 Thread Borislav Petkov

On Mon, Dec 14, 2015 at 10:58:45AM -0700, Ross Zwisler wrote:
> With this code if CONFIG_MCE_KERNEL_RECOVERY isn't defined you'll get
> a compiler error that the function doesn't have a return statement,
> right?  I think we need an #else to return NULL, or to have the #ifdef
> encompass the whole function definition as it was in Tony's version.

Right, correct.

Thanks.

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
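Ross's point concerns an #ifdef inside the body of a function that returns a value: if the guarded block holds the only return statement, the function falls off its end when the option is disabled, and the compiler complains (GCC via -Wreturn-type). A sketch of the #else fix he suggests, with a hypothetical fixup table and search function standing in for the code under review:

```c
#include <stddef.h>

/* Hypothetical fixup entry; only the #ifdef shape matters here. */
struct fixup_sketch {
	unsigned long insn, fixup;
};

#ifdef CONFIG_MCE_KERNEL_RECOVERY
static const struct fixup_sketch fixups[] = { { 0x1000, 0x2000 } };
#endif

static const struct fixup_sketch *search_fixups_sketch(unsigned long addr)
{
#ifdef CONFIG_MCE_KERNEL_RECOVERY
	for (size_t i = 0; i < sizeof(fixups) / sizeof(fixups[0]); i++)
		if (fixups[i].insn == addr)
			return &fixups[i];
	return NULL;
#else
	(void)addr;	/* feature compiled out: nothing can match */
	return NULL;
#endif
}
```

The alternative Ross mentions, wrapping the whole function definition in the #ifdef, avoids the #else branch but then needs a stub or an #ifdef at every call site.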

Re: 4.4-rc5: ugly warn on: 5 W+X pages found

2015-12-14 Thread Andy Lutomirski

On Mon, Dec 14, 2015 at 1:24 PM, Arjan van de Ven  wrote:
>
>> That's weird.  The only API to do that seems to be manually setting
>> kmap_prot to _PAGE_KERNEL_EXEC, and nothing does that.  (Why is
>> kmap_prot a variable on x86 at all?  It has exactly one writer, and
>> that's the code that initializes it in the first place.  Shouldn't we
>> #define kmap_prot _PAGE_KERNEL?
>
>
> iirc it changes based on runtime detection of NX capability
>

Maybe it did, but if it still does, I can't find the code.

What *does* change is __supported_pte_mask.  If we're willing to make
disable_nx work a little less well, we could try to initialize
__supported_pte_mask from the very beginning.  (We currently seem to
detect and enable NX even before we enable paging.)  I suspect that
Pavel is seeing a kmap mapping left over from so early that it didn't
have NX set (killed by massage_pgprot).

Borislav, could we do that?  (Why do we have disable_nx at all?  I
suspect it was for debugging a long, long time ago.)

Alternatively, we could go through and set NX everywhere after we
decide we have NX, but that seems rather error-prone.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC
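Andy's theory turns on massage_pgprot() clearing bits that are not (yet) set in __supported_pte_mask. A sketch of that masking step using the x86-64 present, RW and NX bit positions; massage_prot_sketch() is a hypothetical re-creation that takes the mask as a parameter instead of reading the global:

```c
#include <stdint.h>

/* x86-64 PTE bits: present is bit 0, RW is bit 1, NX is bit 63. */
#define PAGE_PRESENT_BIT	(UINT64_C(1) << 0)
#define PAGE_RW_BIT		(UINT64_C(1) << 1)
#define PAGE_NX_BIT		(UINT64_C(1) << 63)

/*
 * For present mappings, drop every bit the supported mask does not
 * allow. If NX is not yet in the mask when a mapping is created,
 * the mapping silently loses NX and stays executable.
 */
static uint64_t massage_prot_sketch(uint64_t prot, uint64_t supported_mask)
{
	if (prot & PAGE_PRESENT_BIT)
		prot &= supported_mask;
	return prot;
}
```

This is why initializing the mask with NX from the very beginning, as Andy suggests, would keep such early kmap mappings non-executable.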

Re: [PATCH RESEND] misc/bmp085: Enable building as a module

2015-12-14 Thread Arnd Bergmann

On Monday 14 December 2015 14:29:23 Ben Hutchings wrote:
> Commit 985087dbcb02 'misc: add support for bmp18x chips to the bmp085
> driver' changed the BMP085 config symbol to a boolean.  I see no
> reason why the shared code cannot be built as a module, so change it
> back to tristate.
> 
> Fixes: 985087dbcb02 ("misc: add support for bmp18x chips to the bmp085 driver")
> Cc: Eric Andersson 
> Signed-off-by: Ben Hutchings 
> 

Looks good to me. There are often subtle bugs in configurations like these
where a driver can depend on either SPI or I2C, but I don't see a problem
here, because the common code still only gets built as a module if neither
front-end is built-in.

Acked-by: Arnd Bergmann 

Re: [PATCH v2] extcon: add Maxim MAX3355 driver

2015-12-14 Thread Rob Herring

On Mon, Dec 14, 2015 at 11:36 AM, Sergei Shtylyov
 wrote:
> Hello.
>
> On 12/14/2015 04:24 AM, Rob Herring wrote:
>
>>> Maxim  Integrated MAX3355E chip integrates a  charge pump and comparators to
>>> enable a system with an integrated USB OTG dual-role transceiver to function
>>> as  an USB  OTG dual-role device.  In addition  to sensing/controlling Vbus,
>>> the chip also passes thru the ID signal  from the USB  OTG connector.
>>> On some Renesas boards,  this signal is  just fed into the SoC thru a GPIO
>>> pin --  there's no real  OTG controller, only host and gadget USB controllers
>>> sharing the same USB bus; however, we'd  like to allow host or gadget drivers
>>> to be loaded depending on the cable type,  hence the need for the MAX3355
>>> extcon driver. The Vbus status signals are also  wired to GPIOs (however, we
>>> aren't currently interested in them),  the OFFVBUS# signal is controlled by
>>> the host controllers, there's  also the SHDN# signal wired to a GPIO, it
>>> should be driven high for the  normal operation.
>>
>>
>> As multiple people have said, fix the spacing here.
>
>
>    You are the first to complain about _this_ patch. If you don't have other
> issues with this driver (in which case you should have trimmed the reply at
> this point), I'd like to keep my spacing as is. Thank you.

Your previous version was not "extcon-usb-gpio: add enable pin
support"[1] which has now been re-written to be max3355 specific? "So
what" and "I'd like to keep my spacing as is" aren't valid reasons.
Fix it, then I'll look at the rest again.

Rob

[1] https://patchwork.ozlabs.org/patch/555378/

RE: [PATCH RESEND 00/27] Drivers: hv: Miscellaneous fixes.

2015-12-14 Thread KY Srinivasan



> -----Original Message-----
> From: Greg KH [mailto:gre...@linuxfoundation.org]
> Sent: Monday, December 14, 2015 10:59 AM
> To: KY Srinivasan 
> Cc: linux-kernel@vger.kernel.org; de...@linuxdriverproject.org;
> o...@aepfle.de; a...@canonical.com; vkuzn...@redhat.com;
> jasow...@redhat.com
> Subject: Re: [PATCH RESEND 00/27] Drivers: hv: Miscellaneous fixes.
> 
> On Fri, Dec 11, 2015 at 08:21:24PM -0800, K. Y. Srinivasan wrote:
> > Most of the patches in this set are being resent.
> 
> Why?  What changed?

Since many of these patches had been in your queue since October,
I thought I should resend them.

> 
> Also, your series can't be sorted by subject at all, so I can't apply
> them in the correct order (some have RESEND in the subject, some do
> not...)
> 
> Please resend them so that I can apply them...

I will resend them all now.

Thanks,

K. Y

Re: [PATCH v4] mlock.2: mlock2.2: Add entry to for new mlock2 syscall

2015-12-14 Thread Michael Kerrisk (man-pages)

On 11/09/2015 07:27 PM, Eric B Munson wrote:
> Update the mlock.2 man page with information on mlock2() and the new
> mlockall() flag MCL_ONFAULT.

Hello Eric,

Thanks for the nicely written patch. I've applied.

Cheers,

Michael


> Signed-off-by: Eric B Munson 
> Acked-by: Michal Hocko 
> Acked-by: Vlastimil Babka 
> Cc: Michal Hocko 
> Cc: Vlastimil Babka 
> Cc: Jonathan Corbet 
> Cc: linux-...@vger.kernel.org
> Cc: linux...@kvack.org
> Cc: linux-kernel@vger.kernel.org
> ---
> Changes from V3:
>  Add note about not having a glibc wrapper for mlock2
> 
> Changes from V2:
>  Update available from kernel version
> 
>  man2/mlock.2  | 114 +++---
>  man2/mlock2.2 |   1 +
>  2 files changed, 102 insertions(+), 13 deletions(-)
>  create mode 100644 man2/mlock2.2
> 
> diff --git a/man2/mlock.2 b/man2/mlock.2
> index 79c544d..0ad580c 100644
> --- a/man2/mlock.2
> +++ b/man2/mlock.2
> @@ -23,21 +23,23 @@
>  .\" .
>  .\" %%%LICENSE_END
>  .\"
> -.TH MLOCK 2 2015-07-23 "Linux" "Linux Programmer's Manual"
> +.TH MLOCK 2 2015-08-28 "Linux" "Linux Programmer's Manual"
>  .SH NAME
> -mlock, munlock, mlockall, munlockall \- lock and unlock memory
> +mlock, mlock2, munlock, mlockall, munlockall \- lock and unlock memory
>  .SH SYNOPSIS
>  .nf
>  .B #include 
>  .sp
>  .BI "int mlock(const void *" addr ", size_t " len );
> +.BI "int mlock2(const void *" addr ", size_t " len ", int " flags );
>  .BI "int munlock(const void *" addr ", size_t " len );
>  .sp
>  .BI "int mlockall(int " flags );
>  .B int munlockall(void);
>  .fi
>  .SH DESCRIPTION
> -.BR mlock ()
> +.BR mlock (),
> +.BR mlock2 (),
>  and
>  .BR mlockall ()
>  respectively lock part or all of the calling process's virtual address
> @@ -51,7 +53,7 @@ respectively unlocking part or all of the calling process's virtual
>  address space, so that pages in the specified virtual address range may
>  once more to be swapped out if required by the kernel memory manager.
>  Memory locking and unlocking are performed in units of whole pages.
> -.SS mlock() and munlock()
> +.SS mlock(), mlock2(), and munlock()
>  .BR mlock ()
>  locks pages in the address range starting at
>  .I addr
> @@ -62,6 +64,39 @@ All pages that contain a part of the specified address range are
>  guaranteed to be resident in RAM when the call returns successfully;
>  the pages are guaranteed to stay in RAM until later unlocked.
>  
> +.BR mlock2 ()
> +also locks pages in the specified range starting at
> +.I addr
> +and continuing for
> +.I len
> +bytes.
> +However, the state of the pages contained in that range after the call
> +returns successfully will depend on the value in the
> +.I flags
> +argument.
> +
> +The
> +.I flags
> +argument can be either 0 or the following constant:
> +.TP 1.2i
> +.B MLOCK_ONFAULT
> +Lock pages that are currently resident and mark the entire range to have
> +pages locked when they are populated by the page fault.
> +.PP
> +
> +If
> +.I flags
> +is 0,
> +.BR mlock2 ()
> +will function exactly as
> +.BR mlock ()
> +would.
> +
> +Note: Currently, there is not a glibc wrapper for
> +.BR mlock2 ()
> +so it will need to be invoked using
> +.BR syscall (2)
> +
>  .BR munlock ()
>  unlocks pages in the address range starting at
>  .I addr
> @@ -93,9 +128,33 @@ the process.
>  .B MCL_FUTURE
>  Lock all pages which will become mapped into the address space of the
>  process in the future.
> -These could be for instance new pages required
> +These could be, for instance, new pages required
>  by a growing heap and stack as well as new memory-mapped files or
>  shared memory regions.
> +.TP
> +.BR MCL_ONFAULT " (since Linux 4.4)"
> +Used together with
> +.BR MCL_CURRENT ,
> +.BR MCL_FUTURE ,
> +or both.  Mark all current (with
> +.BR MCL_CURRENT )
> +or future (with
> +.BR MCL_FUTURE )
> +mappings to lock pages when they are faulted in.  When used with
> +.BR MCL_CURRENT ,
> +all present pages are locked, but
> +.BR mlockall ()
> +will not fault in non-present pages.  When used with
> +.BR MCL_FUTURE ,
> +all future mappings will be marked to lock pages when they are faulted
> +in, but they will not be populated by the lock when the mapping is
> +created.
> +.B MCL_ONFAULT
> +must be used with either
> +.B MCL_CURRENT
> +or
> +.B MCL_FUTURE
> +or both.
>  .PP
>  If
>  .B MCL_FUTURE
> @@ -148,7 +207,8 @@ to perform the requested operation.
>  .\"SVr4 documents an additional EAGAIN error code.
>  .LP
>  For
> -.BR mlock ()
> +.BR mlock (),
> +.BR mlock2 (),
>  and
>  .BR munlock ():
>  .TP
> @@ -157,9 +217,9 @@ Some or all of the specified address range could not be locked.
>  .TP
>  .B EINVAL
>  The result of the addition
> -.IR start + len
> +.IR addr + len
>  was less than
> -.IR start
> +.IR addr
>  (e.g., the addition may have resulted in an overflow).
>  .TP
>  .B EINVAL
> @@ -181,12 +241,23 @@ mapping would result in three mappings:
>  two locked mappings at each end and an

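Since the note added to the man page says there is no glibc wrapper for mlock2() yet, callers have to go through syscall(2). The following is a minimal sketch, not part of the patch, assuming the installed kernel headers define SYS_mlock2 (Linux 4.4 and later). The wrapper name mlock2_syscall() and the helper map_locked_on_fault() are hypothetical, and MLOCK_ONFAULT is defined locally only if the C library does not already provide it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MLOCK_ONFAULT
#define MLOCK_ONFAULT 0x01  /* value from the kernel's uapi mman-common.h */
#endif

/* Hypothetical wrapper name: avoids clashing with the mlock2() that newer
 * C libraries may declare in <sys/mman.h>. */
static int mlock2_syscall(const void *addr, size_t len, int flags)
{
#ifdef SYS_mlock2
    return (int)syscall(SYS_mlock2, addr, len, flags);
#else
    (void)addr;
    (void)len;
    (void)flags;
    errno = ENOSYS;  /* headers predate mlock2() */
    return -1;
#endif
}

/* Hypothetical helper: map len bytes of anonymous memory and request
 * lock-on-fault for the range. Returns the mapping, or NULL with *err set
 * to the errno of the step that failed. */
static void *map_locked_on_fault(size_t len, int *err)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED) {
        *err = errno;
        return NULL;
    }
    if (mlock2_syscall(p, len, MLOCK_ONFAULT) != 0) {
        *err = errno;
        munmap(p, len);  /* may clobber errno, hence the copy above */
        return NULL;
    }
    *err = 0;
    return p;
}
```

Passing 0 instead of MLOCK_ONFAULT should make the call behave like mlock(), as the new man page text states, while any unknown flag bit is rejected with EINVAL. On an unprivileged account the call may also fail with EPERM or ENOMEM because of RLIMIT_MEMLOCK, so callers should be prepared for that.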
Re: WARNING: CPU: 0 PID: 913 at fs/inode.c:275 drop_nlink+0x4b/0x50()

2015-12-14 Thread Andrew Morton

On Sun, 13 Dec 2015 23:19:31 +0100 Vegard Nossum wrote:

> On 11/26/2015 09:30 AM, OGAWA Hirofumi wrote:
> > Vegard Nossum  writes:
> >> On 11/25/2015 10:54 PM, OGAWA Hirofumi wrote:
> >>> Vegard Nossum  writes:
>  On 11/23/2015 11:21 PM, Richard Weinberger wrote:
> > On 23.11.2015 at 08:55, Vegard Nossum wrote:
> >> With the attached vfat disk image (fuzzed), I get the
> >> following WARNING:
> >>
> >> WARNING: CPU: 0 PID: 913 at fs/inode.c:275
> >> drop_nlink+0x4b/0x50()
> >> [...]
> >>> Can you try this one?
> >> That seems to fix the problem here, thanks!
> > Andrew, please queue this up for next chance.
> 
> Sorry to bug you; I haven't seen this merged yet and just want to make sure
> it doesn't slip through the cracks. Or should I expect it only in the next
> merge window?

I merged this into -mm on Nov 11.  I'd assumed it was 4.5 material.  Was
that wrong?


Re: question about cpusets vs sched_setaffinity()

2015-12-14 Thread Jason Baron



On 12/11/2015 06:26 PM, Chris Friesen wrote:
> On 12/11/2015 04:15 PM, Jason Baron wrote:
>> On 12/10/2015 04:30 PM, Chris Friesen wrote:
> 
>>> If I put a task into a cpuset and then call sched_setaffinity() on it,
>>> it will be affined to the intersection of the two sets of cpus.  (Those
>>> specified on the set, and those specified in the syscall.)
>>>
>>> However, if I then change the cpus in the cpuset the process affinity
>>> will simply be overwritten by the new cpuset affinity.  It does not seem
>>> to take into account any restrictions from the original
>>> sched_setaffinity() call.
>>>
>>> Wouldn't it make more sense to affine the process to the intersection
>>> between the new set of cpus from the cpuset, and the current process
>>> affinity?  That way if I explicitly masked out certain CPUs in the
>>> original sched_setaffinity() call then they would remain masked out
>>> regardless of changes to the set of cpus assigned to the cpuset.
> 
> 
> 
>> Adding the behavior you are describing would, I think, require another
>> cpumask_t field in the task_struct, where we could store the last
>> requested mask value for sched_setaffinity() and use it when updating
>> the cpus for a cpuset via an intersection as you described. I think
>> adding a task to a cpuset still should wipe out any sched_setaffinity()
>> settings - but that would depend on the desired semantics here. It would
>> also require a knob so as not to break existing behavior by default.
> 
> Agreed, the additional field in the task_struct makes sense.  Personally
> I don't think that adding a task to a cpuset should wipe out any
> previously-set affinity, I think it should take the intersection for
> that case as well.
> 
> In this environment it might make sense to have separate queries to
> return the requested and actual affinity.
>

Because cpumask_t is sized by NR_CPUS, I think the new field would need
to be a pointer to a cpumask_t, which we could allocate only when we
want the cpus set by sched_setaffinity() to persist across cgroup cpuset
cpu changes. I think you are right that a flag to
sched_[set|get]affinity() for this case might be nice - but that would
require a new syscall...

>> You could also create a child cgroup for the process that you don't want
>> to change and set the cpus on that cgroup instead of using
>> sched_setaffinity(). Then you change the cpus for the parent cgroup and
>> that shouldn't affect the child as long as the child cgroup is a subset.
>> But it's not entirely clear to me if that addresses your use-case?
> 
> I ended up doing something like this where I had a top-level cpuset and
> a number of child cpusets, each with an exclusive subset of the CPUs
> assigned to it.  But it meant that I needed more complicated code to
> figure out which tasks needed to go into which child cpusets, and more
> complicated code to handle removing a CPU from the top-level cpuset
> (since you have to remove it from any children first).
> 
> Chris

I agree that it would be nice to improve this interface, since you are
creating extra cgroups here just to sort of work around this.

Thanks,

-Jason
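To make the semantics Chris proposes concrete, here is a small userspace model. It is an illustration only: CPU sets are plain 64-bit masks rather than cpumask_t, the struct and function names are hypothetical, and falling back to the cpuset's CPUs when the intersection is empty is an assumption the thread does not settle.

```c
#include <stdint.h>

typedef uint64_t cpumask64;  /* hypothetical: bit n set => CPU n allowed */

struct task_affinity {
    cpumask64 requested;  /* last mask from sched_setaffinity() (the proposed extra field) */
    cpumask64 effective;  /* mask the scheduler actually uses */
};

/* Proposed behaviour: when the cpuset changes, keep the user's restriction
 * by intersecting; if nothing overlaps, fall back to the cpuset (assumed). */
static void cpuset_changed(struct task_affinity *t, cpumask64 cpuset_cpus)
{
    cpumask64 both = t->requested & cpuset_cpus;

    t->effective = both ? both : cpuset_cpus;
}

/* Behaviour described in the thread today: the cpuset simply overwrites
 * whatever sched_setaffinity() had restricted. */
static void cpuset_changed_current(struct task_affinity *t, cpumask64 cpuset_cpus)
{
    t->effective = cpuset_cpus;
}
```

With requested = CPUs {0,1} and a cpuset change to {1,2,3}, the proposed rule leaves the task on CPU 1 only, whereas the current rule would spread it over all three CPUs.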

[POWERPC] bootwrapper: One check less in fsl_get_immr() after error detection

2015-12-14 Thread SF Markus Elfring

From: Markus Elfring 
Date: Mon, 14 Dec 2015 23:01:32 +0100

The fsl_get_immr() function performed a status check even when it was
already known that a system setting did not match the expectations.

Avoid this by adjusting a jump label, following the Linux coding style
convention.

Signed-off-by: Markus Elfring 
---
 arch/powerpc/boot/fsl-soc.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/boot/fsl-soc.c b/arch/powerpc/boot/fsl-soc.c
index b835ed6..ff1dae3 100644
--- a/arch/powerpc/boot/fsl-soc.c
+++ b/arch/powerpc/boot/fsl-soc.c
@@ -34,24 +34,24 @@ u32 *fsl_get_immr(void)
naddr = 2;
 
if (naddr != 1 && naddr != 2)
-   goto err;
+   goto report_failure;
 
size = getprop(soc, "ranges", prop_buf, MAX_PROP_LEN);
 
if (size < 12)
-   goto err;
+   goto report_failure;
if (prop_buf[0] != 0)
-   goto err;
+   goto report_failure;
if (naddr == 2 && prop_buf[1] != 0)
-   goto err;
+   goto report_failure;
 
if (!dt_xlate_addr(soc, prop_buf + naddr, 8, ))
ret = 0;
}
 
-err:
-   if (!ret)
+   if (!ret) {
+report_failure:
printf("fsl_get_immr: Failed to find immr base\r\n");
-
+   }
return (u32 *)ret;
 }
-- 
2.6.3


Re: [lkp] [mm] 3f5d849cb0: page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))

2015-12-14 Thread Stephen Rothwell

Hi all,

On Mon, 14 Dec 2015 10:26:53 -0500 Johannes Weiner  wrote:
>
> I'm fairly certain this is this unlocked page issue:
> http://www.spinics.net/lists/kernel/msg2142719.html
> 
> The fix went into Linus's tree only yesterday:
> dfd01f0 ("sched/wait: Fix the signal handling fix")

And will be in linux-next today.

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au

Re: [PATCH v2 2/2] iio: health: Add driver for the TI AFE4404 heart monitor

2015-12-14 Thread Andrew F. Davis


On 12/12/2015 09:41 AM, Jonathan Cameron wrote:

On 07/12/15 17:26, Andrew F. Davis wrote:

On 12/05/2015 12:21 PM, Jonathan Cameron wrote:

On 02/12/15 19:57, Andrew F. Davis wrote:

Add driver for the TI AFE4404 heart rate monitor and pulse oximeter.
This device detects reflected LED light fluctuations and presents an ADC
value to the user space for further signal processing.

Datasheet: http://www.ti.com/product/AFE4404/datasheet

Signed-off-by: Andrew F. Davis 

I like this a lot.  Seems much simpler to me.

Various bits and bobs inline though.  You quite rightly stated there
was too much to describe in the change log so I've kind of started
from scratch on the review as well.

Thanks for your hard work on this.

Jonathan

Responses inline.

---
   .../ABI/testing/sysfs-bus-iio-health-afe4404   |  20 +
   drivers/iio/Kconfig|   1 +
   drivers/iio/Makefile   |   1 +
   drivers/iio/health/Kconfig |  25 +
   drivers/iio/health/Makefile|   6 +
   drivers/iio/health/afe4404.c   | 619 +
   drivers/iio/health/afe440x.h   | 168 ++
   7 files changed, 840 insertions(+)
   create mode 100644 Documentation/ABI/testing/sysfs-bus-iio-health-afe4404
   create mode 100644 drivers/iio/health/Kconfig
   create mode 100644 drivers/iio/health/Makefile
   create mode 100644 drivers/iio/health/afe4404.c
   create mode 100644 drivers/iio/health/afe440x.h

diff --git a/Documentation/ABI/testing/sysfs-bus-iio-health-afe4404 b/Documentation/ABI/testing/sysfs-bus-iio-health-afe4404
new file mode 100644
index 000..c104d66
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-iio-health-afe4404
@@ -0,0 +1,20 @@
+What:/sys/bus/iio/devices/iio:deviceX/tia_resistanceY
+/sys/bus/iio/devices/iio:deviceX/tia_capacitanceY
+Date:December 2015
+KernelVersion:
+Contact:Andrew F. Davis 
+Description:
+Get and set the resistance and the capacitance settings for the
+Transimpedance Amplifier. Y is 1 for Rf1 and Cf1, Y is 2 for
+Rf2 and Cf2 values.
+Resistance setting is from 0 -> 7
+Capcitance setting is from 0 -> 15

No magic numbers if at all possible.  These correspond to real resistances
and capacitances.

You also want an _available attribute for each of them for the different
possible values.

I'm not overly keen on the naming still but it's part specific enough that
if you fix the units I'll let it go ;)


I'm not sure there is a clean way to do this. These are not channels, so
parsing input will need checks based on what kind of register and value
we send in; these are raw register inputs, as for the other registers. I
changed the name to try to make it clear to users that these are not
channels but raw registers (out_resistance1_raw -> tia_resistance1).

I could still make this more clear in the ABI doc:

 Valid capacitance settings are 0 -> 7 which correspond to
 5pF, 2.5pF, 10pF, 7.5pF, 20pF, 17.5pF, 25pF, and 22.5pF
 respectively.

Just do a simple mapping in the driver from magic number to real value
and the other way around + provide the available attribute.  Rule 2 of sysfs
interfaces: no magic values where a real meaningful one can be provided for
the cost of a few more lines of code.

You can use the existing floating point to value pair functions from IIO
to get you to a couple of values that can be trivially matched against
a const table.

It's not free, but probably only 20 lines including the static const table
of values.


Took a bit more work than expected but done for the next version.



or something similar.
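A minimal sketch of the table-based mapping Jonathan suggests, using the eight capacitance values Andrew lists above. It is plain C rather than driver code and all names are hypothetical. Values are kept as an integer part plus a micro part, the same shape IIO uses for IIO_VAL_INT_PLUS_MICRO, so a parsed sysfs value can be matched exactly against a const table and the same table can produce the _available list.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical register-code -> capacitance table, in pF, stored as an
 * integer part plus a micro part (2.5 pF == { 2, 500000 }). */
struct afe_val {
    int integer;
    int micro;
};

static const struct afe_val cf_table[] = {
    { 5, 0 },  { 2, 500000 },  { 10, 0 }, { 7, 500000 },
    { 20, 0 }, { 17, 500000 }, { 25, 0 }, { 22, 500000 },
};

#define CF_TABLE_LEN (sizeof(cf_table) / sizeof(cf_table[0]))

/* Register code -> real value. Returns 0, or -1 for an unknown code. */
static int cf_code_to_val(unsigned int code, int *integer, int *micro)
{
    if (code >= CF_TABLE_LEN)
        return -1;
    *integer = cf_table[code].integer;
    *micro = cf_table[code].micro;
    return 0;
}

/* Real value -> register code. Exact match only; -1 if unsupported. */
static int cf_val_to_code(int integer, int micro)
{
    for (size_t i = 0; i < CF_TABLE_LEN; i++)
        if (cf_table[i].integer == integer && cf_table[i].micro == micro)
            return (int)i;
    return -1;
}

/* Fill buf with the space-separated list an _available attribute could
 * expose. Returns the length written, or -1 if buf is too small. */
static int cf_format_available(char *buf, size_t len)
{
    size_t off = 0;

    for (size_t i = 0; i < CF_TABLE_LEN; i++) {
        int n = snprintf(buf + off, len - off, "%d.%06d%c",
                         cf_table[i].integer, cf_table[i].micro,
                         i + 1 < CF_TABLE_LEN ? ' ' : '\n');
        if (n < 0 || (size_t)n >= len - off)
            return -1;
        off += (size_t)n;
    }
    return (int)off;
}
```

Keeping a single const table for both directions means the _available output and the accepted inputs cannot drift apart.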


+
+What:/sys/bus/iio/devices/iio:deviceX/tia_separate_en
+Date:December 2015
+KernelVersion:
+Contact:Andrew F. Davis 
+Description:
+Enable or disable separate settings for the TransImpedance
+Amplifier above, when disabled both values are set by the
+first channel.

Weird and wonderful but fine for a part specific attribute!

As noted below, I think we need to document all the new ABI even though
much of it is just extended names on standard channels.



Works for me, I'll add them back.


diff --git a/drivers/iio/Kconfig b/drivers/iio/Kconfig
index 66792e7..ac085ab 100644
--- a/drivers/iio/Kconfig
+++ b/drivers/iio/Kconfig
@@ -52,6 +52,7 @@ source "drivers/iio/common/Kconfig"
   source "drivers/iio/dac/Kconfig"
   source "drivers/iio/frequency/Kconfig"
   source "drivers/iio/gyro/Kconfig"
+source "drivers/iio/health/Kconfig"
   source "drivers/iio/humidity/Kconfig"
   source "drivers/iio/imu/Kconfig"
   source "drivers/iio/light/Kconfig"
diff --git a/drivers/iio/Makefile b/drivers/iio/Makefile
index aeca726..6c5eb2a 100644
--- a/drivers/iio/Makefile
+++ b/drivers/iio/Makefile
@@ -18,6 +18,7 @@ obj-y += common/
   obj-y += dac/
   obj-y += gyro/
   obj-y += frequency/
+obj-y += health/
   obj-y += humidity/

Re: [PATCH] arm64: add HAVE_LATENCYTOP_SUPPORT config

2015-12-14 Thread Shi, Yang


Hi folks,

I tried to enable latencytop for arm64 and came across this discussion.
Is there any plan for when this will be merged into mainline? The 4.5 merge window?


Thanks,
Yang


On 11/10/2015 3:34 AM, Heiko Carstens wrote:

From: Will Deacon 
Date: Tue, 10 Nov 2015 11:10:04 +
Subject: [PATCH] Kconfig: remove HAVE_LATENCYTOP_SUPPORT

As illustrated by a3afe70b83fd ("[S390] latencytop s390 support."),
HAVE_LATENCYTOP_SUPPORT is defined by an architecture to advertise an
implementation of save_stack_trace_tsk.

However, as of 9212ddb5eada ("stacktrace: provide save_stack_trace_tsk()
weak alias") a dummy implementation is provided if STACKTRACE=y.
Given that LATENCYTOP already depends on STACKTRACE_SUPPORT and selects
STACKTRACE, we can remove HAVE_LATENCYTOP_SUPPORT altogether.

Signed-off-by: Will Deacon 
---
  arch/arc/Kconfig| 3 ---
  arch/arm/Kconfig| 5 -
  arch/metag/Kconfig  | 3 ---
  arch/microblaze/Kconfig | 3 ---
  arch/parisc/Kconfig | 3 ---
  arch/powerpc/Kconfig| 3 ---
  arch/s390/Kconfig   | 3 ---
  arch/sh/Kconfig | 3 ---
  arch/sparc/Kconfig  | 4 
  arch/unicore32/Kconfig  | 3 ---
  arch/x86/Kconfig| 3 ---
  lib/Kconfig.debug   | 1 -
  12 files changed, 37 deletions(-)


Acked-by: Heiko Carstens 



Re: [PATCH 1/2] ARM: dts: modify rk3288 jaq backlight-level

2015-12-14 Thread Heiko Stübner

Hi Caesar,

On Monday, 7 December 2015, 21:11:08, Caesar Wang wrote:
> The panel used on jaq requires a PWM duty cycle larger than 3% when the
> backlight goes from power-off to power-on; otherwise the backlight will
> flicker. So we change the second brightness-level to 8, so that the PWM
> duty cycle is larger than 3% when the backlight goes from power-off to
> power-on.
> 
> Signed-off-by: Caesar Wang 

I've merged the two patches and included them in my series for the veyron edp 
devicetree changes [0].


Heiko

[0] 
https://github.com/mmind/linux-rockchip/commit/45e3abed0a86c7ee5cea0b35492b65890b928175
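The 3% requirement can be checked with a little arithmetic. This sketch assumes pwm-backlight style scaling, duty = period * level / max_level, with no low-threshold offset, and, purely for illustration, a brightness-levels table whose maximum is 255 and a 1 ms PWM period; neither value is stated in the patch, and the helper names are hypothetical. Under those assumptions level 7 gives about 2.7% and level 8 about 3.1%.

```c
#include <stdbool.h>
#include <stdint.h>

/* Duty cycle in ns for a brightness level, assuming pwm-backlight style
 * scaling with no low-threshold offset: duty = period * level / max. */
static uint64_t backlight_duty_ns(uint64_t period_ns, unsigned int level,
                                  unsigned int max_level)
{
    if (max_level == 0)
        return 0;
    return period_ns * level / max_level;
}

/* True if the duty cycle for this level is strictly above `percent`. */
static bool duty_exceeds_percent(uint64_t period_ns, unsigned int level,
                                 unsigned int max_level, unsigned int percent)
{
    return backlight_duty_ns(period_ns, level, max_level) * 100 >
           (uint64_t)percent * period_ns;
}

/* Smallest level whose duty cycle is strictly above `percent`, or -1. */
static int min_level_above_percent(uint64_t period_ns, unsigned int max_level,
                                   unsigned int percent)
{
    for (unsigned int level = 0; level <= max_level; level++)
        if (duty_exceeds_percent(period_ns, level, max_level, percent))
            return (int)level;
    return -1;
}
```

With a different maximum level the threshold moves accordingly, so the right second entry depends on the actual table in the jaq devicetree.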


Re: [PATCH] ARM: multi_v7_defconfig: Enable Rockchip generic power domain.

2015-12-14 Thread Heiko Stübner

Hi Enric,

defconfig changes are supposed to be applied by the armsoc maintainers, so you
should probably also include a...@kernel.org as a real "To" recipient.


On Monday, 14 December 2015, 18:22:15, you wrote:
> cc'ing: Heiko Stuebner (rockchip maintainer)
> 
> 2015-12-14 18:17 GMT+01:00 Enric Balletbo i Serra :
> > In order to meet high performance an low power requirement for Rockchip
> > enable the power domain support.
> > 

I guess this should also mention the drm/kms issue (driver deferring until 
power-domains are available) as further justification.


Otherwise, as I had experienced this myself while testing after we talked 
about it:
Reviewed-by: Heiko Stuebner 

> > Signed-off-by: Enric Balletbo i Serra 
> > ---
> > 
> >  arch/arm/configs/multi_v7_defconfig | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/arch/arm/configs/multi_v7_defconfig b/arch/arm/configs/multi_v7_defconfig
> > index 69a22fd..81314ec 100644
> > --- a/arch/arm/configs/multi_v7_defconfig
> > +++ b/arch/arm/configs/multi_v7_defconfig
> > @@ -655,6 +655,7 @@ CONFIG_QCOM_PM=y
> > 
> >  CONFIG_QCOM_SMD=y
> >  CONFIG_QCOM_SMD_RPM=y
> >  CONFIG_QCOM_SMEM=y
> > 
> > +CONFIG_ROCKCHIP_PM_DOMAINS=y
> > 
> >  CONFIG_COMMON_CLK_QCOM=y
> >  CONFIG_CHROME_PLATFORMS=y
> >  CONFIG_CROS_EC_CHARDEV=m
> > 
> > --
> > 2.1.0


Re: [PATCH] Fix int1 recursion with unregistered breakpoints

2015-12-14 Thread Jeff Merkey

On 12/14/15, H. Peter Anvin  wrote:
> On 12/14/15 13:03, Jeff Merkey wrote:
>> Please consider the attached patch.
>>
>> I have reviewed all the code that touches this patch and have
>> determined it will function and support all of the software that
>> depends on this handler properly.  I have compiled and tested this
>> patch with a test harness that tests the robustness of the linux
>> breakpoint API and handlers in the following ways:
>>
>> 1.  Setting multiple conditional breakpoints through
>> arch_install_hw_breakpoint API across four processors to test the rate
>> at which the interface can handle breakpoint exceptions
>>
>> 2.  Setting unregistered breakpoints to test the handlers robustness
>> in dealing with error handling conditions and errant or spurious
>> hardware conditions and to simulate actual "lazy debug register
>> switching" (which does not work BTW) with null bp handlers to test the
>> robustness of the handlers.
>>
>> 3.  Clearing and setting breakpoints across multiple processors then
>> triggering concurrent exceptions in both interrupt and process
>> contexts.
>>
>> This patch improves robustness in several ways in the linux kernel:
>>
>> 1.  Corrects bug in handling unregistered breakpoints.
>>
>> 2.  Provides hardware check of dr7 to determine source of breakpoint
>> if OS cannot ascertain the int1 source from its own state and
>> variables.
>>
>> 3.  Actually allows "lazy debug register switching" to function, which
>> until recently has apparently never been actually seen on live
>> hardware or actually tested.
>>
>
> This is all fine and good, but you are missing one of the most important
> parts of a patch: a patch description, describing in detail the problem
> that it solves and why.  This description needs to be comprehensible not
> just for people already initiated but for someone doing code archaeology
> a decade from now.
>
> Thanks,
>
>   -hpa
>
>
>

Yes sir, I'll get that added right away.

Jeff

Re: [PATCH] tty/n_gsm.c: fix false positive WARN_ON and do some codes improvement

2015-12-14 Thread Greg Kroah-Hartman

On Wed, Nov 25, 2015 at 07:18:37PM +0800, xinhui wrote:
> From: xinhui 
> 
> If the gsm driver fails to activate a mux, and that mux is not stored in
> gsm_mux[], there would be a warning in gsm_cleanup_mux(). This is actually
> a legal case, so just do a simple check instead of WARN_ON.
> 
> There is a field, gsm->num, that stores the mux's index in gsm_mux[]. So use
> gsm->num to remove the mux from gsm_mux[] instead of traversing the array
> in a for-loop.
> 
> Reported-by: Dmitry Vyukov 
> Fixes: 5a64096700dc ("tty/n_gsm.c: fix a memory leak in gsmld_open")
> Signed-off-by: Pan Xinhui 

the signed-off-by name has to match your from: name :(
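The two changes the commit message describes can be sketched in plain C. This is an illustration under simplifying assumptions: struct mux, mux_table and MAX_MUX are hypothetical stand-ins for struct gsm_mux and gsm_mux[], and any locking around the table is omitted.

```c
#include <stddef.h>

#define MAX_MUX 4  /* hypothetical table size */

struct mux {
    int num;  /* index this mux would occupy in mux_table */
};

static struct mux *mux_table[MAX_MUX];

/* Remove a mux using its stored index instead of scanning the table.
 * A mux that was never stored (activation failed) is a legal case: the
 * guard simply finds nothing to clear, so no warning is raised. */
static void mux_unregister(struct mux *gsm)
{
    if (gsm->num >= 0 && gsm->num < MAX_MUX && mux_table[gsm->num] == gsm)
        mux_table[gsm->num] = NULL;
}
```

Comparing the slot's contents with the mux itself is what makes the unregistered case harmless: another mux that happens to share the index is left in place.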

Re: [PATCH] Fix int1 recursion with unregistered breakpoints

2015-12-14 Thread H. Peter Anvin

On 12/14/15 13:03, Jeff Merkey wrote:
> Please consider the attached patch.
> 
> I have reviewed all the code that touches this patch and have
> determined it will function and support all of the software that
> depends on this handler properly.  I have compiled and tested this
> patch with a test harness that tests the robustness of the linux
> breakpoint API and handlers in the following ways:
> 
> 1.  Setting multiple conditional breakpoints through
> arch_install_hw_breakpoint API across four processors to test the rate
> at which the interface can handle breakpoint exceptions
> 
> 2.  Setting unregistered breakpoints to test the handlers robustness
> in dealing with error handling conditions and errant or spurious
> hardware conditions and to simulate actual "lazy debug register
> switching" (which does not work BTW) with null bp handlers to test the
> robustness of the handlers.
> 
> 3.  Clearing and setting breakpoints across multiple processors then
> triggering concurrent exceptions in both interrupt and process
> contexts.
> 
> This patch improves robustness in several ways in the linux kernel:
> 
> 1.  Corrects bug in handling unregistered breakpoints.
> 
> 2.  Provides hardware check of dr7 to determine source of breakpoint
> if OS cannot ascertain the int1 source from its own state and
> variables.
> 
> 3.  Actually allows "lazy debug register switching" to function, which
> until recently has apparently never been actually seen on live
> hardware or actually tested.
> 

This is all fine and good, but you are missing one of the most important
parts of a patch: a patch description, describing in detail the problem
that it solves and why.  This description needs to be comprehensible not
just for people already initiated but for someone doing code archaeology
a decade from now.

Thanks,

-hpa



Re: [PATCH] kobject: Ensure child's resources get released before parent's resources

2015-12-14 Thread Greg Kroah-Hartman

On Mon, Dec 14, 2015 at 11:02:46AM -0800, Rajat Jain wrote:
> If the only remaining reference to a parent is the one taken by
> the child (in kobject_add_internal()), then when the last
> reference to the child goes away, both the child and its parent
> will be released. However, currently the parent's resources
> get released first, followed by the child's resources:
> 
> kobject_cleanup(child)
> 
> kobject_del(child)
> 
> kobject_put(child->parent) -> results in parent's release()
> ...
> child->kobj_type->release() -> Child's release()
> 
> This is a problem because the child's release() method may still
> need to use parent resources or memory for its own cleanup. E.g.
> child may need parent pointer for dma_free_coherent() etc.
> 
> Signed-off-by: Rajat Jain 
> Signed-off-by: Rajat Jain 

Why are you listed twice here?

Where in the kernel is the parent being freed before the child that is
causing this issue to happen?  We should fix that root cause first...

thanks,

greg k-h
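The ordering Rajat's patch asks for can be modelled outside the kernel. The sketch below is a single-threaded userspace stand-in, not the kobject API: struct obj, obj_new(), obj_put(), obj_cleanup() and record_release() are hypothetical names, and there is no locking. The key step is saving the parent pointer, running the child's release callback while the parent is still alive, and only then dropping the child's reference on the parent.

```c
#include <stdlib.h>

/* Hypothetical, single-threaded stand-in for a refcounted hierarchy. */
struct obj {
    int id;
    int refcount;
    struct obj *parent;
    void (*release)(struct obj *o);
};

static void obj_put(struct obj *o);

/* Child's release() runs first, while the parent is still alive; only
 * then is the child's reference on the parent dropped. */
static void obj_cleanup(struct obj *o)
{
    struct obj *parent = o->parent;  /* save it: release() may free o */

    o->release(o);
    if (parent)
        obj_put(parent);
}

static void obj_put(struct obj *o)
{
    if (--o->refcount == 0)
        obj_cleanup(o);
}

/* Create an object holding one reference; a child pins its parent. */
static struct obj *obj_new(int id, struct obj *parent,
                           void (*release)(struct obj *))
{
    struct obj *o = calloc(1, sizeof(*o));

    if (!o)
        return NULL;
    o->id = id;
    o->refcount = 1;
    o->parent = parent;
    o->release = release;
    if (parent)
        parent->refcount++;
    return o;
}

/* Demo release callback: records the release order, then frees. */
static int release_log[8];
static int release_count;

static void record_release(struct obj *o)
{
    if (release_count < 8)
        release_log[release_count++] = o->id;
    free(o);
}
```

Dropping the creator's reference on the parent first leaves it alive (the child still pins it); dropping the last child reference then releases the child before the parent.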

Re: [PATCH RESEND 00/27] Drivers: hv: Miscellaneous fixes.

2015-12-14 Thread Greg KH

On Fri, Dec 11, 2015 at 08:21:24PM -0800, K. Y. Srinivasan wrote:
> Most of the patches in this set are being resent.

Why?  What changed?

Also, your series can't be sorted by subject at all, so I can't apply
them in the correct order (some have RESEND in the subject, some do
not...)

Please resend them so that I can apply them...

thanks,

greg k-h

Re: [PATCH] [PATCH 1/3] staging: dgnc: Patch includes the checkpatch fixes

2015-12-14 Thread Greg KH

On Sat, Dec 12, 2015 at 02:58:50AM -0500, Sanidhya Solanki wrote:
> >From a1635ea5e75cb2f10728ae4ddf3a21567958e98f Mon Sep 17 00:00:00 2001
> From: Sanidhya Solanki 
> Date: Sat, 12 Dec 2015 02:20:03 -0500
> Subject: [PATCH] [PATCH 1/3] staging: dgnc: Patch includes the checkpatch
>  fixes
> 
> Patch contains the spacing fixes that checkpatch prompted for,
> as asked by the TODO.
> 
> Signed-off-by: Sanidhya Solanki 

Please resend all of these patches properly, without the embedded email
headers so that I have a chance to apply them.

thanks,

greg k-h

Re: [PATCH 1/2] Staging: dgnc: dgnc_neo.c: usleep_range is preferred over udelay

2015-12-14 Thread Greg KH

On Fri, Nov 13, 2015 at 04:48:10PM +0530, Nizam Haider wrote:
> removed heckpatch warning

heckpatch?  And what warning is that?


[PATCH] Fix int1 recursion with unregistered breakpoints

2015-12-14 Thread Jeff Merkey

Please consider the attached patch.

I have reviewed all the code that touches this patch and have
determined it will function and support all of the software that
depends on this handler properly.  I have compiled and tested this
patch with a test harness that tests the robustness of the Linux
breakpoint API and handlers in the following ways:

1.  Setting multiple conditional breakpoints through
arch_install_hw_breakpoint API across four processors to test the rate
at which the interface can handle breakpoint exceptions.

2.  Setting unregistered breakpoints to test the handler's robustness
in dealing with error handling conditions and errant or spurious
hardware conditions and to simulate actual "lazy debug register
switching" (which does not work BTW) with null bp handlers to test the
robustness of the handlers.

3.  Clearing and setting breakpoints across multiple processors then
triggering concurrent exceptions in both interrupt and process
contexts.

This patch improves robustness in several ways in the Linux kernel:

1.  Corrects a bug in handling unregistered breakpoints.

2.  Provides a hardware check of dr7 to determine the source of a breakpoint
if the OS cannot ascertain the int1 source from its own state and
variables.

3.  Allows "lazy debug register switching" to actually function; until
recently this has apparently never been seen on live hardware or
tested.

Signed-off-by: Jeff Merkey 
---
 arch/x86/kernel/hw_breakpoint.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/hw_breakpoint.c b/arch/x86/kernel/hw_breakpoint.c
index 50a3fad..ca13db0 100644
--- a/arch/x86/kernel/hw_breakpoint.c
+++ b/arch/x86/kernel/hw_breakpoint.c
@@ -444,7 +444,7 @@ EXPORT_SYMBOL_GPL(hw_breakpoint_restore);
 static int hw_breakpoint_handler(struct die_args *args)
 {
int i, cpu, rc = NOTIFY_STOP;
-   struct perf_event *bp;
+   struct perf_event *bp = NULL;
unsigned long dr7, dr6;
unsigned long *dr6_p;
 
@@ -475,6 +475,13 @@ static int hw_breakpoint_handler(struct die_args *args)
for (i = 0; i < HBP_NUM; ++i) {
if (likely(!(dr6 & (DR_TRAP0 << i))))
continue;
+   /*
+   * check if we got an execute breakpoint
+   * from the dr7 register.  if we did, set
+   * the resume flag to avoid int1 recursion.
+   */
+   if ((dr7 & (3 << ((i * 4) + 16))) == 0)
+   args->regs->flags |= X86_EFLAGS_RF;
 
/*
 * The counter may be concurrently released but that can only
@@ -503,7 +510,9 @@ static int hw_breakpoint_handler(struct die_args *args)
 
/*
 * Set up resume flag to avoid breakpoint recursion when
-* returning back to origin.
+* returning back to origin.  Perform the check
+   * twice in case the event handler altered the
+   * system flags.
 */
if (bp->hw.info.type == X86_BREAKPOINT_EXECUTE)
args->regs->flags |= X86_EFLAGS_RF;
@@ -519,6 +528,18 @@ static int hw_breakpoint_handler(struct die_args *args)
(dr6 & (~DR_TRAP_BITS)))
rc = NOTIFY_DONE;
 
+   /*
+   * if we are about to signal to
+   * do_debug() to stop further processing
+   * and we have not ascertained the source
+   * of the breakpoint, log it as spurious.
+   */
+   if (rc == NOTIFY_STOP && !bp) {
+   printk_ratelimited(KERN_INFO
+   "INFO: spurious INT1 exception dr6: 0x%lX dr7: 0x%lX\n",
+   dr6, dr7);
+   }
+
set_debugreg(dr7, 7);
put_cpu();
 
-- 
1.8.3.1
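The new check in the second hunk reads the R/W field of DR7 for breakpoint slot i. As a hedged userspace sketch of that bit layout (x86 debug-register encoding: local/global enable bits at 2*i and 2*i+1, R/W field at bits 16+4*i and 17+4*i, where 00 means instruction execution), the helpers below decode the same fields. The function names are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helpers mirroring the x86 DR7 layout for slots 0..3. */

/* R/W field of slot i: 0 = execute, 1 = data write, 2 = I/O, 3 = data read/write. */
static unsigned int dr7_slot_rw(unsigned long dr7, unsigned int i)
{
	return (unsigned int)((dr7 >> (16 + 4 * i)) & 0x3UL);
}

/* Same test as the patch: both R/W bits clear means an execute breakpoint. */
static bool dr7_slot_is_exec(unsigned long dr7, unsigned int i)
{
	return (dr7 & (3UL << ((i * 4) + 16))) == 0;
}

/* Local (bit 2*i) or global (bit 2*i+1) enable set for slot i. */
static bool dr7_slot_enabled(unsigned long dr7, unsigned int i)
{
	return ((dr7 >> (2 * i)) & 0x3UL) != 0;
}
```

For example, DR7 = (1UL << 2) | (1UL << 20) describes slot 1 locally enabled as a data-write breakpoint, so dr7_slot_rw(dr7, 1) is 1. Note that the disabled slot 0 also reads R/W == 00 and therefore passes dr7_slot_is_exec(); in the patch the test only runs for slots whose DR6 trap bit is set, which is why the enable check is shown separately here.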

Re: [PATCH 0/3] cpuidle: avoid module usage in non-modular code

2015-12-14 Thread Paul Gortmaker

[Re: [PATCH 0/3] cpuidle: avoid module usage in non-modular code] On 14/12/2015 (Mon 22:31) Rafael J. Wysocki wrote:

> On Sunday, December 13, 2015 06:57:09 PM Paul Gortmaker wrote:
> > This series of commits is a part of a larger project to ensure
> > people don't reference modular support functions in non-modular
> > code.  Overall there was roughly 5k lines of dead code in the
> > kernel due to this.  So far we've fixed several areas, like tty,
> > x86, net, ... and we continue to work on other areas.

[...]

> 
> If no one objects, I can queue up this series for 4.5 unless you have other
> plans with respect to it.

Please do.

I was hoping to spread as many of these around as possible so I don't
end up with a giant pull request to Linus.  There is code out there
without a clear maintainership path, so eventually I'll have to send
some his way (or via akpm) but the less that end up in that pile, the
better IMHO.

Thanks,
Paul.
--
> 
> Thanks,
> Rafael
> 
Re: [RFCv6 PATCH 09/10] sched: deadline: use deadline bandwidth in scale_rt_capacity

2015-12-14 Thread Luca Abeni

On Mon, 14 Dec 2015 17:51:28 +0100
Peter Zijlstra  wrote:

> On Mon, Dec 14, 2015 at 04:56:17PM +0100, Vincent Guittot wrote:
> > I agree that if the WCET is far from reality, we will underestimate
> > available capacity for CFS. Have you got some use case in mind which
> > overestimates the WCET ?
> 
> Pretty much any 'correct' WCET is pessimistic. There's heaps of smart
> people working on improving WCET bounds, but they're still out there.
> This is mostly because of the .1% tail cases that 'never' happen
> but would make your tokamak burn a hole just when you're outside.
As I mentioned in a previous email, you do not even need to consider
these extreme cases... If a task has a highly variable execution time
(I always mention video players and compressed video processing, but
colleagues working on computer vision told me that some video tracking
algorithms have similar characteristics) you might want to allocate the
runtime based on the maximum execution time (or a time close to the
maximum)... But the task will often consume much less than that.
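To make the gap concrete, here is a small sketch with invented frame times (not measurements): the runtime is reserved for the slowest frame, as described above, and the reserved bandwidth (runtime/period) is compared with the average bandwidth actually consumed. It uses floating point for readability, which in-kernel code would avoid; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one task's reservation vs. its actual use. */
struct demo_usage {
	double reserved_bw; /* max frame time / period: what the reservation holds */
	double average_bw;  /* mean frame time / period: what is used on average */
};

/* frame_us: per-frame execution times in microseconds (n must be > 0). */
static struct demo_usage demo_frame_usage(const unsigned int *frame_us,
					  size_t n, unsigned int period_us)
{
	unsigned int max = 0;
	unsigned long long sum = 0;
	struct demo_usage u;
	size_t i;

	for (i = 0; i < n; i++) {
		if (frame_us[i] > max)
			max = frame_us[i];
		sum += frame_us[i];
	}
	u.reserved_bw = (double)max / period_us;
	u.average_bw = (double)sum / (double)n / period_us;
	return u;
}
```

With one heavy 20 ms frame and three light ones (5, 5 and 6 ms) at a 40 ms period, the reservation holds 50% of the CPU while the task uses about 22.5% on average, so a capacity estimate built on the reserved value underestimates what is left for CFS by roughly that difference.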


> > If we can't rely on this parameters to evaluate the amount of
> > capacity used by deadline scheduler on a core, this will imply that
> > we can't also use it for requesting capacity to cpufreq and we
> > should fallback on a monitoring mechanism which reacts to a change
> > instead of anticipating it.
> 
> No, since the WCET can and _will_ happen, it's the best you can do with
> cpufreq. If you were to set it lower you would not be able to execute
> correctly in your 'never' tail cases.
> 
> There 'might' be smart pants ways around this, where you run part of
> the execution at lower speed and switch to a higher speed to 'catch'
> up if you exceed some boundary, such that, on average, you run at the
> same speed the WCET mandates, but I'm not sure that's worth it.
> Juri/Luca might know.
Some previous works (see for example
https://www.researchgate.net/profile/Giuseppe_Lipari/publication/220800940_Using_resource_reservation_techniques_for_power-aware_scheduling/links/09e41513639b2703fc00.pdf
) investigated the usage of the "active utilisation" for switching the
CPU frequency. This "active utilisation tracking" mechanism is the same
one I mentioned in the previous email, and it is implemented here:
https://github.com/lucabe72/linux-reclaiming/commit/49fc786a1c453148625f064fa38ea538470df55b

I suspect the "inactive timer" I used to decrease the utilisation at
the so-called 0-lag time might be problematic, but I did not find any
way to implement (or approximate) the active utilisation tracking
without this timer... Anyway, if there is interest I am willing to
adapt/rework/modify my patches as needed.
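The 0-lag time mentioned above can be written down directly. The sketch below is a hedged illustration, not code from the linked patches: it assumes the usual definition for a reservation with runtime Q and period P, zero-lag time = d - q*P/Q (d = current deadline, q = remaining runtime), and models a task that blocks. If that instant has already passed, its utilisation Q/P leaves the active sum immediately; otherwise the caller would arm the "inactive timer" for that instant. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reservation state; all times in nanoseconds. */
struct demo_dl {
	int64_t dl_runtime; /* Q: runtime reserved per period */
	int64_t dl_period;  /* P */
	int64_t deadline;   /* d: current absolute deadline */
	int64_t runtime;    /* q: runtime left in the current period */
};

/* Instant at which the remaining runtime would be used up at rate Q/P. */
static int64_t demo_zerolag_time(const struct demo_dl *dl)
{
	return dl->deadline - dl->runtime * dl->dl_period / dl->dl_runtime;
}

/*
 * Called when the task blocks at time now. Returns -1 if its utilisation
 * was removed from *active_util immediately, otherwise the 0-lag time at
 * which an inactive timer should remove it.
 */
static int64_t demo_task_blocks(int64_t now, const struct demo_dl *dl,
				double *active_util)
{
	int64_t zerolag = demo_zerolag_time(dl);

	if (zerolag <= now) {
		*active_util -= (double)dl->dl_runtime / (double)dl->dl_period;
		return -1;
	}
	return zerolag;
}
```

For a 25 ms / 100 ms reservation with 10 ms left and a deadline at t = 1 s, the 0-lag time is 1 s - 10 ms * 100 / 25 = 960 ms: blocking at 950 ms keeps the 0.25 utilisation active until then, while blocking at 970 ms removes it at once.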


Luca
Re: [PATCH] cpufreq: powernv: Redesign the presentation of throttle notification

2015-12-14 Thread Paul Clarke


On 12/13/2015 12:17 PM, Shilpasri G Bhat wrote:

Replace the throttling event console messages to perf trace event
"power:powernv_throttle" and throttle counter stats which are
exported in sysfs. The newly added sysfs files are as follows:

1) /sys/devices/system/node/node0/throttle_frequencies
   This gives the throttle stats for each of the available frequencies.
   The throttle stat of a frequency is the total number of times the max
   frequency was reduced to that frequency.
   # cat /sys/devices/system/node/node0/throttle_frequencies
   4023000 0
   3990000 0
   3956000 1
   3923000 0
   3890000 0
   3857000 2
   3823000 0
   3790000 0
   3757000 2
   3724000 1
   3690000 1
   ...


Is this data useful?  It seems like "elapsed time" at each frequency might be more useful, if any.



2) /sys/devices/system/node/node0/throttle_reasons
   This gives the stats for each of the supported throttle reasons.
   This gives the total number of times the frequency was throttled due
   to each of the reasons.
   # cat /sys/devices/system/node/node0/throttle_reasons
   No throttling 7
   Power Cap 0
   Processor Over Temperature 7
   Power Supply Failure 0
   Over Current 0
   OCC Reset 0

3) /sys/devices/system/node/node0/throttle_stat
   This gives the total number of throttle events that occurred in the
   turbo range of frequencies and in the non-turbo (below nominal) range
   of frequencies.


non-turbo should read "at or below nominal".  Maybe "sub-turbo" is a better term (?)



   # cat /sys/devices/system/node/node0/throttle_stat
   Turbo 7
   Nominal 0


Should this read "Non-turbo" or "Sub-turbo" instead of "Nominal", since the events could well occur when already operating below nominal.



Signed-off-by: Shilpasri G Bhat 
---
  drivers/cpufreq/powernv-cpufreq.c | 186 +-
  include/trace/events/power.h  |  22 +
  2 files changed, 166 insertions(+), 42 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index cb50138..bdde9d6 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -28,6 +28,9 @@
  #include 
  #include 
  #include 
+#include 
+#include 
+#include 

  #include 
  #include 
@@ -43,12 +46,27 @@
  static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
  static bool rebooting, throttled, occ_reset;

+static char throttle_reason[][30] = {
+   "No throttling",
+   "Power Cap",
+   "Processor Over Temperature",
+   "Power Supply Failure",
+   "Over Current",
+   "OCC Reset"
+};


I'm curious if this would be slightly more efficiently implemented as:
static const char *throttle_reason[] = { ... };

Do you need 30 characters per string for a reason?

Regardless, it should be const.
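The review suggestion can be checked in a few lines of userspace C. This hedged sketch keeps the strings from the patch, shows both layouts side by side, and adds a hypothetical formatter that prints one "<reason> <count>" line per entry in the style of the proposed throttle_reasons file; the counters and function name are illustrative, not the driver's. The second const in `const char *const` goes slightly beyond the review comment and also makes the pointer table itself read-only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Layout used by the patch: every row padded to 30 bytes. */
static const char throttle_reason_2d[][30] = {
	"No throttling", "Power Cap", "Processor Over Temperature",
	"Power Supply Failure", "Over Current", "OCC Reset",
};

/* Layout suggested in review: a table of pointers to string literals. */
static const char *const throttle_reason[] = {
	"No throttling", "Power Cap", "Processor Over Temperature",
	"Power Supply Failure", "Over Current", "OCC Reset",
};

/*
 * Hypothetical formatter: writes "<reason> <count>\n" for each entry.
 * Returns the number of bytes written, or -1 if buf is too small.
 */
static int format_reasons(char *buf, size_t len, const unsigned int *count)
{
	size_t n = 0;
	size_t i;

	for (i = 0; i < ARRAY_SIZE(throttle_reason); i++) {
		int w = snprintf(buf + n, len - n, "%s %u\n",
				 throttle_reason[i], count[i]);
		if (w < 0 || (size_t)w >= len - n)
			return -1; /* output would be truncated */
		n += (size_t)w;
	}
	return (int)n;
}
```

The two-dimensional form always occupies 6 * 30 = 180 bytes, whereas the pointer form stores each literal at its own length (the longest, "Processor Over Temperature", needs 27 bytes including the terminator) plus one pointer per entry.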

[...]
--
PC

RE: Season Greetings

2015-12-14 Thread Edpsych

From: Edpsych
Sent: 14 December 2015 19:21
To: Edpsych
Subject: Season Greetings

You Have Been Picked , Email:(mr_sgloria_macken...@email.com) For More Info

Re: 4.4-rc5: ugly warn on: 5 W+X pages found

2015-12-14 Thread Arjan van de Ven

That's weird.  The only API to do that seems to be manually setting
kmap_prot to _PAGE_KERNEL_EXEC, and nothing does that.  (Why is
kmap_prot a variable on x86 at all?  It has exactly one writer, and
that's the code that initializes it in the first place.  Shouldn't we
#define kmap_prot _PAGE_KERNEL?)


iirc it changes based on runtime detection of NX capability
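To illustrate why such a value might be computed at runtime rather than #defined, here is a hedged userspace sketch. It assumes the x86-64 page-table encoding (bit 0 present, bit 1 writable, bit 63 NX) and a hypothetical mask that drops NX when the CPU lacks the feature; it also shows the test a W+X checker applies: present, writable and not NX. None of the names are kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86-64 page-table entry bits used here. */
#define DEMO_PAGE_PRESENT (UINT64_C(1) << 0)
#define DEMO_PAGE_RW      (UINT64_C(1) << 1)
#define DEMO_PAGE_NX      (UINT64_C(1) << 63)

/* Hypothetical runtime filter: without NX support the bit must not be set. */
static uint64_t demo_supported_mask(bool cpu_has_nx)
{
	return cpu_has_nx ? ~UINT64_C(0) : ~DEMO_PAGE_NX;
}

/* A mapping counts as W+X if it is present, writable and executable. */
static bool demo_is_wx(uint64_t pte)
{
	return (pte & DEMO_PAGE_PRESENT) && (pte & DEMO_PAGE_RW) &&
	       !(pte & DEMO_PAGE_NX);
}
```

The same nominal protection (present, writable, NX) therefore yields different PTE bits depending on a CPU feature that is only known at boot.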

Re: BUG: KASAN: use-after-free in xfs_iflush_cluster+0x9d7/0xaf0

2015-12-14 Thread Dave Chinner

On Mon, Dec 14, 2015 at 09:15:26PM +0100, Andrea Gelmini wrote:
> On Tue, Dec 15, 2015 at 06:54:22AM +1100, Dave Chinner wrote:
> > What line of code does this address correspond to in your kernel?
> > 
> > xfs_iflush_cluster+0x9d7
> 
> gelma@glen:~/dev/kernel/v4.4.x$ git grep -Iin xfs_iflush_cluster
> fs/xfs/xfs_inode.c:3179:xfs_iflush_cluster(
> fs/xfs/xfs_inode.c:3414:  error = xfs_iflush_cluster(ip, bp);

If that was what I needed, I wouldn't have needed to ask. :/

I need the translation of the memory address to line number, not the
line number of function call. This requires translation from your
built kernel object file. e.g. on a kernel I just built:

$ gdb vmlinux

(gdb) l *(xfs_iflush_cluster+0x9d7)
0xffffffff814df647 is in xfs_bulkstat_one_int (fs/xfs/xfs_itable.c:110).
105 buf->bs_dmevmask = dic->di_dmevmask;
106 buf->bs_dmstate = dic->di_dmstate;
107 buf->bs_aextents = dic->di_anextents;
108 buf->bs_forkoff = XFS_IFORK_BOFF(ip);
109
110 switch (dic->di_format) {
111 case XFS_DINODE_FMT_DEV:
112 buf->bs_rdev = ip->i_df.if_u2.if_rdev;
113 buf->bs_blksize = BLKDEV_IOSIZE;
114 buf->bs_blocks = 0;

That's clearly not code in xfs_iflush_cluster() or any function that
xfs_iflush_cluster() calls. Indeed, xfs_iflush_cluster() is only
0x411 bytes long on that kernel, so there's more than 2x the amount
of code in that function in your instrumented kernel than mine.

Hence I need the address-to-line number translation from your kernel
to tell me what line of code is being tripped over.
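The report in the subject line already carries a hint: stack-trace symbols are printed as symbol+offset/size, so "/0xaf0" is the length of xfs_iflush_cluster() in the reporter's KASAN-instrumented kernel. A small hedged sketch (hypothetical helper, userspace C) that splits such a string so it can be compared with the 0x411-byte function on the kernel described above:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Split "func+0xOFF/0xSIZE" into its parts. name must hold 64 bytes.
 * %lx accepts an optional 0x prefix, like strtoul() with base 16.
 * Returns 0 on success, -1 if the string does not match.
 */
static int demo_parse_sym(const char *s, char name[64],
			  unsigned long *off, unsigned long *size)
{
	return sscanf(s, "%63[^+]+%lx/%lx", name, off, size) == 3 ? 0 : -1;
}
```

Parsing "xfs_iflush_cluster+0x9d7/0xaf0" gives an offset of 0x9d7, which lies past the end of a 0x411-byte function (consistent with gdb resolving it into a different function on the other build), and a size of 0xaf0, about 2.7 times 0x411.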

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
Re: [PATCH] wan: wanxl: add pci_disable_device in case of error

2015-12-14 Thread David Miller

From: Saurabh Sengar 
Date: Sat, 12 Dec 2015 00:58:19 +0530

> If there is 'no suitable DMA available' error, device should be disabled
> before returning
> 
> Signed-off-by: Saurabh Sengar 

Applied to net-next, thanks.
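The fix follows the usual probe error-unwinding rule: every step that succeeded is undone, in reverse order, before an error is returned. A minimal hedged sketch with hypothetical stand-ins for pci_enable_device()/pci_disable_device() and the DMA-mask check (the -EIO value is chosen for illustration):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state standing in for the PCI enable state. */
static bool demo_dev_enabled;

static int demo_enable_device(void)
{
	demo_dev_enabled = true;
	return 0;
}

static void demo_disable_device(void)
{
	demo_dev_enabled = false;
}

/* dma_ok simulates whether a usable DMA mask could be set. */
static int demo_probe(bool dma_ok)
{
	int err;

	err = demo_enable_device();
	if (err)
		return err;

	if (!dma_ok) {
		err = -EIO; /* "no suitable DMA available" */
		goto err_disable;
	}
	return 0;

err_disable:
	demo_disable_device(); /* undo the enable before returning */
	return err;
}
```

With more setup steps, each one gets its own label further up, so a failure jumps to the label that undoes exactly what has been done so far.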
Re: [PATCH 4/8 v4] thermal: rcar: return error from rcar_thermal_get_temp() if no ctemp update

2015-12-14 Thread Eduardo Valentin

On Tue, Dec 08, 2015 at 05:28:13AM +, Kuninori Morimoto wrote:
> From: Kuninori Morimoto 
> 
> The current rcar_thermal_get_temp() returns the latest temperature, but it
> might not have been updated if some HW issue happened. This means the user
> might get a wrong temperature. This patch solves this issue.
> 
> Signed-off-by: Kuninori Morimoto 
> ---
> v3 -> v4
> 
>  - "happend" -> "happened"
> 
>  drivers/thermal/rcar_thermal.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/thermal/rcar_thermal.c b/drivers/thermal/rcar_thermal.c
> index 67b5216..40c3ba5 100644
> --- a/drivers/thermal/rcar_thermal.c
> +++ b/drivers/thermal/rcar_thermal.c
> @@ -199,9 +199,9 @@ static int rcar_thermal_update_temp(struct rcar_thermal_priv *priv)
>  
>   dev_dbg(dev, "thermal%d  %d -> %d\n", priv->id, priv->ctemp, ctemp);
>  
> - priv->ctemp = ctemp;
>   ret = 0;
>  err_out_unlock:
> + priv->ctemp = ctemp;
>   mutex_unlock(&priv->lock);
>   return ret;

I believe the problem here is actually the lack of error
handling/propagation. Are you sure you want to write to the parameter
in the failure path?

rcar_thermal_update_temp already returns error code when it fails
to read temperature. Don't you think it would make more sense to fix the
places that call rcar_thermal_update_temp to properly handle its return
value and propagate that error code when necessary?
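The alternative suggested here can be sketched as follows (userspace C, hypothetical names, error code chosen for illustration): the update helper leaves the cached value alone on failure, and the get_temp path returns that error instead of a possibly stale temperature.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical driver state. */
struct demo_priv {
	int ctemp;    /* last temperature successfully read */
	int hw_ok;    /* simulates whether the sensor produced a fresh value */
	int hw_value; /* value the sensor would report */
};

/* Update the cached temperature; leave it untouched on failure. */
static int demo_update_temp(struct demo_priv *priv)
{
	if (!priv->hw_ok)
		return -EIO;
	priv->ctemp = priv->hw_value;
	return 0;
}

/* Propagate the update error instead of returning a stale ctemp. */
static int demo_get_temp(struct demo_priv *priv, int *temp)
{
	int ret = demo_update_temp(priv);

	if (ret < 0)
		return ret;
	*temp = priv->ctemp;
	return 0;
}
```

The caller's output is only written on success, so a failed read cannot be mistaken for a valid temperature.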

BR,
Re: [PATCH 0/4] gpio: pxa: integrate with pincontrol

2015-12-14 Thread Robert Jarzmik

Linus Walleij  writes:

> On Thu, Dec 10, 2015 at 6:31 PM, Robert Jarzmik  
> wrote:
>> Linus Walleij  writes:
>>
>>>> - the GPDR (gpio direction register) shared access bothers me a bit
>>>
>>> How is it shared and between what users?
>>
>> It's shared between the pin controller and the gpio controller.
>
> OK then it may be one of these cases where we should jit the pin controller
> and the GPIO controller together in the same file (under drivers/pinctrl)
> to simplify the mess. We can do that in the NEXT merge window because
> right now I don't want any more crisscross between gpio and pin control
> as there are refactorings I'm piling up.

Well, maybe, but I don't know how to do it, due to the number of possibilities,
given that:
 - gpio-pxa.c should work for pxa27x device-tree
   (this would be possible with a pinctrl+gpio fusion)
 - gpio-pxa.c should work for pxa3xx device-tree
   (I don't know how to do this, pxa3xx uses pinctrl-single AFAIK)
 - gpio-pxa.c should work for pxa168 + mmp* device-tree
   (same as for pxa3xx)
 - gpio-pxa.c should work with pxa27x platform_data
   (this doesn't fully work yet, the wake-up pin gives me headaches)
 - gpio-pxa.c should work with pxa25x platform_data
   (I don't see how a merged pinctrl+gpio driver could address this either)

All of this to say it looks a bit complicated to me to have a gpio+pinctrl in
the same file, but I might be missing something obvious.
 
> Another option is e.g. accessing the registers through regmap-mmio but
> it feels a bit like overkill for this...
Yes, overkill maybe, but maybe an idea. The nice thing about regmap is its debug
capabilities; the less nice thing is the permanent need for locks...

>> The odd thing with the pxa architecture is that the GPDR bit selects between
>> 2 different alternate functions, even when the pin is not a GPIO. Strange
>> design, isn't it ?
>
> Probably just unfortunate naming.
>
> In my presentation "building GPIO and pin control from the ground up" I
> try to explain a bit how hardware engineers design these things...
> http://dflund.se/~triad/papers/pincontrol.pdf
Nice, I had already read it :)

>> As a consequence, both the gpio driver and pinctrl have to modify it, for
>> different purposes :
>>  - pinctrl will modify it to select a specific alternate function
>>  - gpio driver will modify it when the pin is a GPIO, to modify its direction.
>
> OK. Solutions per above, I guess it currently just optimistically hopes
> we do not fiddle the same bit in parallel from the two drivers (which
> is maybe even possible to prove to be true).
Ok. I will think about it again in the next few days.
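The hazard discussed here is a lost update: pinctrl and gpio-pxa each do a read-modify-write of the same GPDR word, so one driver can overwrite a bit the other just changed unless the two updates are serialised. As a hedged userspace model only (an in-memory C11 atomic standing in for the register; real MMIO accesses would instead need a lock shared by both drivers, or a shared regmap), atomic OR/AND make each bit update indivisible:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* In-memory stand-in for the 32-bit GPDR word (hypothetical model). */
static _Atomic uint32_t demo_gpdr;

/* Set one direction bit without disturbing bits owned by the other driver. */
static void demo_gpdr_set(unsigned int bit)
{
	atomic_fetch_or(&demo_gpdr, (uint32_t)1 << bit);
}

/* Clear one direction bit, again as a single indivisible update. */
static void demo_gpdr_clear(unsigned int bit)
{
	atomic_fetch_and(&demo_gpdr, ~((uint32_t)1 << bit));
}

/* Read back one direction bit (bit must be below 32). */
static int demo_gpdr_test(unsigned int bit)
{
	return (int)((atomic_load(&demo_gpdr) >> bit) & 1u);
}
```

A plain "read, modify, write back" sequence in each driver would not give this guarantee: if both read the old word before either writes, the second write silently discards the first driver's change.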

Cheers.

-- 
Robert
Re: [RFCv6 PATCH 09/10] sched: deadline: use deadline bandwidth in scale_rt_capacity

2015-12-14 Thread Luca Abeni

On Mon, 14 Dec 2015 16:07:59 +
Juri Lelli  wrote:
[...]
> > I agree that if the WCET is far from reality, we will underestimate
> > available capacity for CFS. Have you got some use case in mind which
> > overestimates the WCET ?
> 
> I guess simply the fact that one task can be admitted to the system,
> but then in practice sleep, waiting for some event to happen.
My favourite example (since 1998 :) is a video player (but every task
processing compressed video should work as an example): there is a
noticeable difference between the time needed to process large I frames
with a lot of movement (that is about the WCET) and the time needed to
process small B frames with not much movement. And if we want to avoid
too much jitter in the video playback we have to allocate the runtime
based on the maximum time needed to process a video frame.


> > If we can't rely on this parameters to evaluate the amount of
> > capacity used by deadline scheduler on a core, this will imply that
> > we can't also use it for requesting capacity to cpufreq and we
> > should fallback on a monitoring mechanism which reacts to a change
> > instead of anticipating it.
> > 
> 
> There is at least one way in the middle: use utilization of active
> servers (as I think Luca was already mentioning). This solution should
> remove some of the pessimism, but still be safe for our needs.
If you track the active utilisation as done by the GRUB algorithm
( http://retis.sssup.it/~lipari/papers/lipariBaruah2000.pdf ) and by my
patches, you can remove _all_ the pessimism :)


Luca
Re: [PATCH 0/3] cpuidle: avoid module usage in non-modular code

2015-12-14 Thread Daniel Lezcano


On 12/14/2015 12:57 AM, Paul Gortmaker wrote:

This series of commits is a part of a larger project to ensure
people don't reference modular support functions in non-modular
code.  Overall there were roughly 5k lines of dead code in the
kernel due to this.  So far we've fixed several areas, like tty,
x86, net, ... and we continue to work on other areas.

There are several reasons to not use module support for code that
can never be built as a module, but the big ones are:

  (1) it is easy to accidentally code up unused module_exit and remove code
  (2) it can be misleading when reading the source, thinking it can be
   modular when the Makefile and/or Kconfig prohibit it
  (3) it requires the include of the module.h header file which in turn
  includes nearly everything else.

Fortunately for cpuidle, the changes are largely trivial and change
zero runtime.  All the changes here just remap the modular functions
onto the non-modular ones that they would be remapped onto anyway.

Changes are against linux-next and compile tested on ARM allmodconfig.
I've Cc'd ARM list because all of these are used on ARM, but I'm
thinking these probably can go in via the PM tree.


Acked-by: Daniel Lezcano 


--
  Linaro.org │ Open source software for ARM SoCs

[PATCH net-next v5 02/19] test_bitmap: unit tests for lib/bitmap.c

2015-12-14 Thread David Decotigny

From: David Decotigny 

This is mainly testing bitmap construction and conversion to/from u32[]
for now.

Tested:
  qemu i386, x86_64, ppc, ppc64 BE and LE, ARM.

Signed-off-by: David Decotigny 
---
 lib/Kconfig.debug |   8 +
 lib/Makefile  |   1 +
 lib/test_bitmap.c | 343 ++
 tools/testing/selftests/lib/Makefile  |   2 +-
 tools/testing/selftests/lib/bitmap.sh |  10 +
 5 files changed, 363 insertions(+), 1 deletion(-)
 create mode 100644 lib/test_bitmap.c
 create mode 100644 tools/testing/selftests/lib/bitmap.sh

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 0d76ecc..3d25bdf 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1721,6 +1721,14 @@ config TEST_KSTRTOX
 config TEST_PRINTF
tristate "Test printf() family of functions at runtime"
 
+config TEST_BITMAP
+   tristate "Test bitmap_*() family of functions at runtime"
+   default n
+   help
+ Enable this option to test the bitmap functions at boot.
+
+ If unsure, say N.
+
 config TEST_RHASHTABLE
tristate "Perform selftest on resizable hash table"
default n
diff --git a/lib/Makefile b/lib/Makefile
index 180dd4d..ba6b7fe 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -43,6 +43,7 @@ obj-$(CONFIG_TEST_USER_COPY) += test_user_copy.o
 obj-$(CONFIG_TEST_STATIC_KEYS) += test_static_keys.o
 obj-$(CONFIG_TEST_STATIC_KEYS) += test_static_key_base.o
 obj-$(CONFIG_TEST_PRINTF) += test_printf.o
+obj-$(CONFIG_TEST_BITMAP) += test_bitmap.o
 
 ifeq ($(CONFIG_DEBUG_KOBJECT),y)
 CFLAGS_kobject.o += -DDEBUG
diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
new file mode 100644
index 0000000..33e572a
--- /dev/null
+++ b/lib/test_bitmap.c
@@ -0,0 +1,343 @@
+/*
+ * Test cases for bitmap API.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static unsigned total_tests __initdata;
+static unsigned failed_tests __initdata;
+
+static char pbl_buffer[PAGE_SIZE] __initdata;
+
+
+static bool __init
+__check_eq_bitmap(const unsigned long *exp_bmap, unsigned int exp_nbits,
+ const unsigned long *bmap, unsigned int nbits)
+{
+   if (exp_nbits != nbits) {
+   pr_warn("bitmap length mismatch: expected %u, got %u\n",
+   exp_nbits, nbits);
+   return false;
+   }
+
+   if (!bitmap_equal(exp_bmap, bmap, nbits)) {
+   pr_warn("bitmaps contents differ: expected \"%*pbl\", got \"%*pbl\"\n",
+   exp_nbits, exp_bmap, nbits, bmap);
+   return false;
+   }
+   return true;
+}
+
+static int __init
+expect_eq_bitmap(const unsigned long *exp_bmap, unsigned int exp_nbits,
+  const unsigned long *bmap, unsigned int nbits)
+{
+   total_tests++;
+   if (!__check_eq_bitmap(exp_bmap, exp_nbits, bmap, nbits)) {
+   failed_tests++;
+   return 1;
+   }
+   return 0;
+}
+
+static bool __init
+__check_eq_pbl(const char *expected_pbl,
+  const unsigned long *bitmap, unsigned int nbits)
+{
+   snprintf(pbl_buffer, sizeof(pbl_buffer), "%*pbl", nbits, bitmap);
+   if (strcmp(expected_pbl, pbl_buffer)) {
+   pr_warn("expected \"%s\", got \"%s\"\n",
+   expected_pbl, pbl_buffer);
+   return false;
+   }
+   return true;
+}
+
+static int __init
+expect_eq_pbl(const char *expected_pbl,
+   const unsigned long *bitmap, unsigned int nbits)
+{
+   total_tests++;
+   if (!__check_eq_pbl(expected_pbl, bitmap, nbits)) {
+   failed_tests++;
+   return 1;
+   }
+   return 0;
+}
+
+static bool __init
+__check_eq_u32_array(const u32 *exp_arr, unsigned int exp_len,
+const u32 *arr, unsigned int len)
+{
+   if (exp_len != len) {
+   pr_warn("array length differ: expected %u, got %u\n",
+   exp_len, len);
+   return false;
+   }
+
+   if (memcmp(exp_arr, arr, len*sizeof(*arr))) {
+   pr_warn("array contents differ\n");
+   print_hex_dump(KERN_WARNING, "  exp:  ", DUMP_PREFIX_OFFSET,
+  32, 4, exp_arr, exp_len*sizeof(*exp_arr), false);
+   print_hex_dump(KERN_WARNING, "  got:  ", DUMP_PREFIX_OFFSET,
+  32, 4, arr, len*sizeof(*arr), false);
+   return false;
+   }
+
+   return true;
+}
+
+static int __init
+expect_eq_u32_array(const u32 *exp_arr, unsigned int exp_len,
+ const u32 *arr, unsigned int len)
+{
+   total_tests++;
+   if (!__check_eq_u32_array(exp_arr, exp_len, arr, len)) {
+   failed_tests++;
+   return 1;
+   }
+   return 0;
+}
+
+static void __init test_zero_fill_copy(void)
+{
+   DECLARE_BITMAP(bmap1, 1024);
+

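The conversion being tested has to cope with unsigned long being 32 or 64 bits wide. A hedged, deliberately simple bit-by-bit sketch of both directions follows; the helper names are hypothetical and not taken from lib/bitmap.c. Bit k of the bitmap maps to bit k % 32 of u32 word k / 32, and bits beyond nbits are left clear.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BITS_PER_UL (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_UL_WORDS(nbits) (((nbits) + DEMO_BITS_PER_UL - 1) / DEMO_BITS_PER_UL)

/* Copy nbits from an unsigned long bitmap into ceil(nbits/32) u32 words. */
static void demo_bitmap_to_u32(uint32_t *dst, unsigned int nbits,
			       const unsigned long *src)
{
	unsigned int k;

	memset(dst, 0, ((nbits + 31) / 32) * sizeof(*dst));
	for (k = 0; k < nbits; k++)
		if (src[k / DEMO_BITS_PER_UL] & (1UL << (k % DEMO_BITS_PER_UL)))
			dst[k / 32] |= (uint32_t)1 << (k % 32);
}

/* Inverse direction: rebuild an unsigned long bitmap from u32 words. */
static void demo_bitmap_from_u32(unsigned long *dst, unsigned int nbits,
				 const uint32_t *src)
{
	unsigned int k;

	memset(dst, 0, DEMO_UL_WORDS(nbits) * sizeof(*dst));
	for (k = 0; k < nbits; k++)
		if (src[k / 32] & ((uint32_t)1 << (k % 32)))
			dst[k / DEMO_BITS_PER_UL] |= 1UL << (k % DEMO_BITS_PER_UL);
}
```

For a 70-bit map with bits 0, 33 and 69 set, the u32 form is { 0x1, 0x2, 0x20 } on both 32-bit and 64-bit hosts, which is the kind of layout-independence such tests need to pin down.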
[PATCH net-next v5 03/19] net: usnic: remove unused call to ethtool_ops::get_settings

2015-12-14 Thread David Decotigny

From: David Decotigny 

Signed-off-by: David Decotigny 
---
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
index f8e3211..5b60579 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
@@ -269,7 +269,6 @@ int usnic_ib_query_device(struct ib_device *ibdev,
struct usnic_ib_dev *us_ibdev = to_usdev(ibdev);
union ib_gid gid;
struct ethtool_drvinfo info;
-   struct ethtool_cmd cmd;
int qp_per_vf;
 
usnic_dbg("\n");
@@ -278,7 +277,6 @@ int usnic_ib_query_device(struct ib_device *ibdev,
 
mutex_lock(&us_ibdev->usdev_lock);
us_ibdev->netdev->ethtool_ops->get_drvinfo(us_ibdev->netdev, &info);
-   us_ibdev->netdev->ethtool_ops->get_settings(us_ibdev->netdev, &cmd);
memset(props, 0, sizeof(*props));
usnic_mac_ip_to_gid(us_ibdev->ufdev->mac, us_ibdev->ufdev->inaddr,
&gid.raw[0]);
-- 
2.6.0.rc2.230.g3dd15c0

Re: [RFCv6 PATCH 09/10] sched: deadline: use deadline bandwidth in scale_rt_capacity

2015-12-14 Thread Luca Abeni

On Mon, 14 Dec 2015 16:56:17 +0100
Vincent Guittot  wrote:
[...]
> >> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> >> index 08858d1..e44c6be 100644
> >> --- a/kernel/sched/sched.h
> >> +++ b/kernel/sched/sched.h
> >> @@ -519,6 +519,8 @@ struct dl_rq {
> >>  #else
> >>   struct dl_bw dl_bw;
> >>  #endif
> >> + /* This is the "average utilization" for this runqueue */
> >> + s64 avg_bw;
> >>  };
> >
> > So I don't think this is right. AFAICT this projects the WCET as the
> > amount of time actually used by DL. This will, under many
> > circumstances, vastly overestimate the amount of time actually
> > spent on it. Therefore it unduly pessimises the fair capacity of this
> > CPU.
> 
> I agree that if the WCET is far from reality, we will underestimate
> available capacity for CFS. Have you got some use case in mind which
> overestimates the WCET ?
> If we can't rely on this parameters to evaluate the amount of capacity
> used by deadline scheduler on a core, this will imply that we can't
> also use it for requesting capacity to cpufreq and we should fallback
> on a monitoring mechanism which reacts to a change instead of
> anticipating it.
I think a more "theoretically sound" approach would be to track the
_active_ utilisation (informally speaking, the sum of the utilisations
of the tasks that are actually active on a core - the exact definition
of "active" is the trick here).
As done, for example, here:
https://github.com/lucabe72/linux-reclaiming/tree/track-utilisation-v2
(in particular, see
https://github.com/lucabe72/linux-reclaiming/commit/49fc786a1c453148625f064fa38ea538470df55b
)
I understand this approach might look too complex... But I think it is
much less pessimistic while still being "safe".
If there is something that I can do to make that code more acceptable,
let me know.


Luca
Re: [PATCH net] stmmac: dwmac-sunxi: Call exit cleanup function in probe error path

2015-12-14 Thread David Miller

From: Chen-Yu Tsai 
Date: Fri, 11 Dec 2015 18:03:49 +0800

> dwmac-sunxi has 2 callbacks that were called from stmmac_platform as
> part of the probe and remove sequences.
> 
> After the conversion of dwmac-sunxi into a standalone platform driver,
> the .init function is called before calling into the stmmac driver
> core, but .exit is not called to clean up if stmmac returns an error.
> 
> This patch fixes the probe error path. This properly cleans up and
> releases resources when the driver core fails to probe.
> 
> Cc: Joachim Eastwood 
> Fixes: 9a9e9a1edee8 ("stmmac: dwmac-sunxi: turn setup callback into a probe function")
> Signed-off-by: Chen-Yu Tsai 

Applied, thanks.
Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu

2015-12-14 Thread Sander Eikelenboom


On 2015-12-14 20:48, Eric Shelton wrote:

Please note that the same issue appears to have been introduced in the
recent 4.2.7 kernel.  It perhaps has to do
with b4ff8389ed14b849354b59ce9b360bdefcdbf99c having a matching
commit e8d097151d309eb71f750bbf34e6a7ef6256da7e in linux-stable.git.
The patch below to arch/x86/kernel/rtc.c was also effective for 4.2.7.

Eric


Hi Eric,

Yeah, it's unfortunate that the patch fixing the other patches destined for
stable didn't make it in time for stable :(.
Anyhow, the chosen solution wasn't ideal, so there is now a V2 patch by
Boris. It hasn't been picked up yet, but hopefully will be soon
(for the patch see http://lkml.iu.edu/hypermail/linux/kernel/1512.1/03504.html)


--
Sander


On 2015-12-02 18:30, Sander Eikelenboom wrote:

On 2015-12-02 15:55, David Vrabel wrote:
> On 28/11/15 15:47, Sander Eikelenboom wrote:
>> genirq: Flags mismatch irq 8. 00000000 (hvc_console) vs. 00000000 (rtc0)
>
> We shouldn't register an rtc_cmos device because its legacy irq
> conflicts with the irq needed for hvc0.  For a multi-VCPU guest irq 8
> is in use for the pv spinlocks and this gets requested first, preventing
> the rtc device from probing.
>
> Does this patch fix it for you?
>
> David

It does, thanks.

Reported-and-tested-by: Sander Eikelenboom 

--
Sander

> 8<
> x86: rtc_cmos platform device requires legacy irqs
>
> Adding the rtc platform device when there are no legacy irqs (no
> legacy PIC) causes a conflict with other devices that end up using the
> same irq number.
>
> In a single VCPU PV guest we should have:
>
> /proc/interrupts:
>              CPU0
>   0:         4934  xen-percpu-virq   timer0
>   1:            0  xen-percpu-ipi    spinlock0
>   2:            0  xen-percpu-ipi    resched0
>   3:            0  xen-percpu-ipi    callfunc0
>   4:            0  xen-percpu-virq   debug0
>   5:            0  xen-percpu-ipi    callfuncsingle0
>   6:            0  xen-percpu-ipi    irqwork0
>   7:          321  xen-dyn-event     xenbus
>   8:           90  xen-dyn-event     hvc_console
>   ...
>
> But hvc_console cannot get its interrupt because it is already in use
> by rtc0 and the console does not work.
>
>   genirq: Flags mismatch irq 8.  (hvc_console) vs. 
> (rtc0)
>
> The rtc_cmos device requires a particular legacy irq so don't add it
> if there are no legacy irqs.
>
> Signed-off-by: David Vrabel 
> ---
>  arch/x86/kernel/rtc.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/arch/x86/kernel/rtc.c b/arch/x86/kernel/rtc.c
> index cd96852..07c70f1 100644
> --- a/arch/x86/kernel/rtc.c
> +++ b/arch/x86/kernel/rtc.c
> @@ -14,6 +14,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #ifdef CONFIG_X86_32
>  /*
> @@ -200,6 +201,10 @@ static __init int add_rtc_cmos(void)
>   }
>  #endif
>
> + /* RTC uses legacy IRQs. */
> + if (!nr_legacy_irqs())
> + return -ENODEV;
> +
>   platform_device_register(_device);
>   dev_info(_device.dev,
>	"registered platform RTC device (no PNP device found)\n");


Re: [PATCH] sched/wait: Fix the signal handling fix

2015-12-14 Thread Peter Zijlstra

On Mon, Dec 14, 2015 at 07:50:04PM +0100, Oleg Nesterov wrote:
> > +   ret = (*action)(>key, mode);
> 
> And every action() should check signal_pending_state()...
> 
> So why can't we change __wait_on_bit/etc instead and remove all the
> signal-pending checks from the callbacks? It seems that we can just check
> signal_pending_state() before prepare_to_wait(). Or perhaps we can add
> another helper which acts like prepare_to_wait_event().
> 
> Yes, some callers want -EINTR, some -ERESTARTSYS, but this shouldn't be a
> problem.
> 
> And sorry if this was already discussed; this is another case of me trying
> to return to lkml with a lot of unread emails.

Yes, that looks like a viable cleanup. But at least now we have a base
that's working for everyone.

I'll try and do some patches tomorrow.
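A rough user-space model of the cleanup Oleg proposes: the generic wait loop checks for a pending signal once, before each sleep, and the caller picks the error code, so the action callbacks no longer need their own signal checks. `struct model_wait`, `model_wait_on_bit()` and `model_sleep_action()` are hypothetical; this is not the kernel's __wait_on_bit(), and it ignores real sleeping, wakeups and memory ordering.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_EINTR        4	/* Linux errno values, used as plain constants */
#define MODEL_ERESTARTSYS  512

/* Hypothetical model state: the bit being waited on, a pending-signal
 * flag, and a fake "waker" that clears the bit after wake_after sleeps. */
struct model_wait {
	bool bit_set;
	bool signal_pending;
	int sleeps;
	int wake_after;
};

typedef int (*model_action_t)(struct model_wait *w);

/* The action only sleeps; it no longer checks for signals itself. */
static int model_sleep_action(struct model_wait *w)
{
	if (++w->sleeps >= w->wake_after)
		w->bit_set = false;	/* simulated wakeup by the bit owner */
	return 0;
}

/* Generic loop: one signal check before each sleep; the caller chooses
 * whether an interrupted wait reports -EINTR or -ERESTARTSYS. */
static int model_wait_on_bit(struct model_wait *w, model_action_t action,
			     int intr_err)
{
	assert(w && action);
	while (w->bit_set) {
		if (w->signal_pending)
			return intr_err;
		int ret = action(w);
		if (ret)
			return ret;
	}
	return 0;
}
```

The point of the design is that the `intr_err` choice lives in one place per caller rather than being duplicated in every action callback.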

[PATCH net-next v5 05/19] net: ethtool: add new ETHTOOL_GSETTINGS/SSETTINGS API

2015-12-14 Thread David Decotigny

From: David Decotigny 

This patch defines a new ETHTOOL_GSETTINGS/SSETTINGS API, handled by
the new get_ksettings/set_ksettings callbacks. This API provides
support for most legacy ethtool_cmd fields, adds support for larger
link mode masks (up to 4064 bits, variable length), and removes
ethtool_cmd deprecated fields (transceiver/maxrxpkt/maxtxpkt).

This API deprecates the legacy ETHTOOL_GSET/SSET API and provides
the following backward compatibility properties:
 - legacy ethtool with legacy drivers: no change, still using the
   get_settings/set_settings callbacks.
 - legacy ethtool with new get/set_ksettings drivers: the new driver
   callbacks are used, and the data is internally converted to the
   legacy ethtool_cmd. ETHTOOL_GSET will return only the first 32 bits
   of each link mode mask. ETHTOOL_SSET will fail if the user tries to
   set the ethtool_cmd deprecated fields (transceiver/maxrxpkt/maxtxpkt)
   to non-zero values. A kernel warning is logged if the driver sets
   higher bits.
 - future ethtool with legacy drivers: no change, still using the
   get_settings/set_settings callbacks, internally converted to the new
   data structure. Deprecated fields (transceiver/maxrxpkt/maxtxpkt) will
   be ignored and seen as 0 from user space. Note that the "future"
   ethtool will not allow changes to these deprecated fields.
 - future ethtool with new drivers: direct call to the new callbacks.

By "future" ethtool, what is meant is:
 - query: first try ETHTOOL_GSETTINGS, and fall back to ETHTOOL_GSET if
   that fails
 - set: query first and remember which of ETHTOOL_GSETTINGS or
   ETHTOOL_GSET was successful
   - if ETHTOOL_GSETTINGS was successful, then change the config with
     ETHTOOL_SSETTINGS. A failure there is final (do not try ETHTOOL_SSET).
   - otherwise ETHTOOL_GSET was successful; change the config with
     ETHTOOL_SSET. A failure there is final (do not try ETHTOOL_SSETTINGS).
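The query/set rules above can be sketched as a small simulation. The `do_*` helpers are hypothetical stand-ins for issuing the respective ethtool commands against a simulated kernel, and the error values are placeholders, not a claim about what the real kernel returns.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_EOPNOTSUPP 95	/* placeholder error values for the model */
#define MODEL_EINVAL     22

enum model_api { MODEL_API_NONE, MODEL_API_SETTINGS, MODEL_API_LEGACY };

/* Simulated kernel: whether it knows the new commands, whether a set
 * should fail, and counters for which set command was issued. */
struct model_kernel {
	bool has_settings;
	bool set_fails;
	int ssettings_calls;
	int sset_calls;
};

static int do_gsettings(struct model_kernel *k) { return k->has_settings ? 0 : -MODEL_EOPNOTSUPP; }
static int do_gset(struct model_kernel *k)      { (void)k; return 0; }
static int do_ssettings(struct model_kernel *k) { k->ssettings_calls++; return k->set_fails ? -MODEL_EINVAL : 0; }
static int do_sset(struct model_kernel *k)      { k->sset_calls++; return k->set_fails ? -MODEL_EINVAL : 0; }

/* Query: try ETHTOOL_GSETTINGS first, fall back to ETHTOOL_GSET. */
static enum model_api model_query(struct model_kernel *k)
{
	if (do_gsettings(k) == 0)
		return MODEL_API_SETTINGS;
	if (do_gset(k) == 0)
		return MODEL_API_LEGACY;
	return MODEL_API_NONE;
}

/* Set: use the family whose query succeeded; a set failure is final. */
static int model_set_config(struct model_kernel *k)
{
	assert(k);
	switch (model_query(k)) {
	case MODEL_API_SETTINGS:
		return do_ssettings(k);	/* never retried with ETHTOOL_SSET */
	case MODEL_API_LEGACY:
		return do_sset(k);	/* never retried with ETHTOOL_SSETTINGS */
	default:
		return -MODEL_EOPNOTSUPP;
	}
}
```

Keeping the set path tied to whichever query succeeded avoids mixing the two APIs on one device.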

The interaction between user space and the kernel via the new API
requires a small ETHTOOL_GSETTINGS handshake first to agree on the
length of the link mode bitmaps. If the kernel doesn't agree with user
space, it returns the bitmap length it expects from user space as a
negative length (and the cmd field is 0). When kernel and user space
agree, the kernel returns valid info in all fields (i.e. link mode
length > 0 and cmd is ETHTOOL_GSETTINGS).
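A minimal simulation of this handshake, assuming a kernel that uses two 32-bit words per link mode mask. The struct layout, the `MODEL_CMD_GSETTINGS` value and the word count are hypothetical; only the protocol itself (negative nwords with cmd cleared on mismatch, valid data once both sides agree) follows the description above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_CMD_GSETTINGS 0x1	/* hypothetical command value */
#define KERNEL_NWORDS 2		/* u32 words per mask the "kernel" expects */

struct model_req {
	uint32_t cmd;
	int8_t link_mode_masks_nwords;
	uint32_t masks[3 * 8];	/* supported, advertising, lp_advertising */
};

/* "Kernel" side: on a length mismatch, report the expected length as a
 * negative number and clear cmd; otherwise fill in the masks. */
static void model_kernel_gsettings(struct model_req *r)
{
	if (r->link_mode_masks_nwords != KERNEL_NWORDS) {
		r->link_mode_masks_nwords = -KERNEL_NWORDS;
		r->cmd = 0;
		return;
	}
	memset(r->masks, 0, sizeof(r->masks));
	r->masks[0] = 0x1;	/* pretend one supported link mode */
	r->cmd = MODEL_CMD_GSETTINGS;
}

/* "User" side: handshake, then the real request. Returns agreed nwords. */
static int model_user_query(struct model_req *r)
{
	memset(r, 0, sizeof(*r));
	r->cmd = MODEL_CMD_GSETTINGS;
	r->link_mode_masks_nwords = 0;	/* first call: ask the kernel */
	model_kernel_gsettings(r);
	if (r->cmd != 0 || r->link_mode_masks_nwords >= 0)
		return -1;		/* no handshake reply */

	int nwords = -r->link_mode_masks_nwords;
	assert(3 * nwords <= (int)(sizeof(r->masks) / sizeof(r->masks[0])));
	r->cmd = MODEL_CMD_GSETTINGS;
	r->link_mode_masks_nwords = (int8_t)nwords;
	model_kernel_gsettings(r);
	return r->cmd == MODEL_CMD_GSETTINGS ? nwords : -1;
}
```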

The data structure crossing the user/kernel boundary is 32/64-bit
agnostic; it is converted internally to a legal kernel bitmap.

The internal __ethtool_get_settings kernel helper will gradually be
replaced by __ethtool_get_ksettings by the time the first ksettings
drivers start to appear. This patch therefore doesn't change it; it
will be removed before it needs to be changed.

Signed-off-by: David Decotigny 
---
 include/linux/ethtool.h  |  88 +-
 include/uapi/linux/ethtool.h | 320 +++---
 net/core/ethtool.c   | 407 ++-
 3 files changed, 735 insertions(+), 80 deletions(-)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 653dc9c..6077cbb 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -12,6 +12,7 @@
 #ifndef _LINUX_ETHTOOL_H
 #define _LINUX_ETHTOOL_H
 
+#include 
 #include 
 #include 
 
@@ -40,9 +41,6 @@ struct compat_ethtool_rxnfc {
 
 #include 
 
-extern int __ethtool_get_settings(struct net_device *dev,
- struct ethtool_cmd *cmd);
-
 /**
  * enum ethtool_phys_id_state - indicator state for physical identification
  * @ETHTOOL_ID_INACTIVE: Physical ID indicator should be deactivated
@@ -97,13 +95,72 @@ static inline u32 ethtool_rxfh_indir_default(u32 index, u32 n_rx_rings)
return index % n_rx_rings;
 }
 
+/* number of link mode bits/ulongs handled internally by kernel */
+#define __ETHTOOL_LINK_MODE_MASK_NBITS \
+   (__ETHTOOL_LINK_MODE_LAST + 1)
+
+/* declare a link mode bitmap */
+#define __ETHTOOL_DECLARE_LINK_MODE_MASK(name) \
+   DECLARE_BITMAP(name, __ETHTOOL_LINK_MODE_MASK_NBITS)
+
+/* drivers must ignore parent.cmd and parent.link_mode_masks_nwords
+ * fields, but they are allowed to overwrite them (will be ignored).
+ */
+struct ethtool_ksettings {
+   struct ethtool_settings parent;
+   struct {
+   __ETHTOOL_DECLARE_LINK_MODE_MASK(supported);
+   __ETHTOOL_DECLARE_LINK_MODE_MASK(advertising);
+   __ETHTOOL_DECLARE_LINK_MODE_MASK(lp_advertising);
+   } link_modes;
+};
+
+/**
+ * ethtool_ksettings_zero_link_mode - clear ksettings link mode mask
+ *   @ptr : pointer to struct ethtool_ksettings
+ *   @name : one of supported/advertising/lp_advertising
+ */
+#define ethtool_ksettings_zero_link_mode(ptr, name)\
+   bitmap_zero((ptr)->link_modes.name, __ETHTOOL_LINK_MODE_MASK_NBITS)
+
+/**
+ * ethtool_ksettings_add_link_mode - set bit in ksettings link mode mask
+ *   @ptr : pointer to struct ethtool_ksettings
+ *   @name : one of supported/advertising/lp_advertising
+ *   @mode : one of the

Re: [RFC] mm: change find_vma() function

2015-12-14 Thread Kirill A. Shutemov

On Mon, Dec 14, 2015 at 06:55:09PM +0100, Oleg Nesterov wrote:
> On 12/14, Kirill A. Shutemov wrote:
> >
> > On Mon, Dec 14, 2015 at 07:02:25PM +0800, yalin wang wrote:
> > > change find_vma() to break earlier when the address is found not
> > > to be in any vma, so it does not need to loop over all vmas.
> > >
> > > Signed-off-by: yalin wang 
> > > ---
> > >  mm/mmap.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > index b513f20..8294c9b 100644
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -2064,6 +2064,9 @@ struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
> > >   vma = tmp;
> > >   if (tmp->vm_start <= addr)
> > >   break;
> > > + if (!tmp->vm_prev || tmp->vm_prev->vm_end <= addr)
> > > + break;
> > > +
> >
> > This 'break' would return 'tmp' as found vma.
> 
> But this would be right?

Hm. Right. Sorry for my tone.

I think the right condition is 'tmp->vm_prev->vm_end < addr', not '<=' as
vm_end is the first byte after the vma. But it's equivalent in practice
here.

Anyway, I don't think it's possible to gain anything measurable from this
optimization.
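To make the semantics concrete: find_vma() returns the first vma whose vm_end is above addr, which is either the vma containing addr or the next one above it, and vm_end is exclusive. The sketch below models this with a binary search over a sorted array instead of the rbtree, including the proposed early exit when the predecessor ends at or below addr; `struct model_vma` and `model_find_vma()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* User-space model: areas sorted by address, non-overlapping, each
 * covering [start, end); end is the first byte after the area. */
struct model_vma { unsigned long start, end; };

/* Return the index of the first area with end > addr (the area that
 * contains addr, or the next one above it), or -1 if there is none. */
static long model_find_vma(const struct model_vma *v, size_t n,
			   unsigned long addr)
{
	size_t lo = 0, hi = n;	/* search window [lo, hi) */
	long found = -1;

	assert(v != NULL || n == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (v[mid].end > addr) {
			found = (long)mid;
			if (v[mid].start <= addr)
				break;		/* addr lies inside this area */
			/* proposed early exit: predecessor ends at or below addr */
			if (mid == 0 || v[mid - 1].end <= addr)
				break;
			hi = mid;		/* keep looking to the left */
		} else {
			lo = mid + 1;		/* look to the right */
		}
	}
	return found;
}
```

An address in a gap (e.g. exactly at one area's end) yields the next area up, not the one below it.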

> 
> Not that I think this optimization makes sense, I simply do not know,
> but to me this change looks technically correct at first glance...
> 
> But the changelog is wrong or I missed something. This change can stop
> the main loop earlier; if "tmp" is the first vma,

For the first vma, we don't gain anything compared to what we have now:
the check for !rb_node on the next iteration would have the same
trade-off and effect as the proposed check.

> or if the previous one is below the address.

Yes, but would that compensate for the additional check on each
'tmp->vm_end > addr' iteration? That's not obvious.

> Or perhaps I just misread that "not in any vma" note in the changelog.
> 
> No?
> 
> Oleg.
> 

-- 
 Kirill A. Shutemov

Re: [PATCH 8/8 v4] thermal: of-thermal: of_thermal_set_trip_temp() call thermal_zone_device_update()

2015-12-14 Thread Eduardo Valentin


Hey!

On Tue, Dec 08, 2015 at 05:30:00AM +, Kuninori Morimoto wrote:
> From: Kuninori Morimoto 
> 
> of_thermal_set_trip_temp() updates trip temperature. It should call
> thermal_zone_device_update() immediately.
> 
> Signed-off-by: Kuninori Morimoto 
> ---
> v3 -> v4
> 
>  - no change
> 
>  drivers/thermal/of-thermal.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/thermal/of-thermal.c b/drivers/thermal/of-thermal.c
> index 42b7d42..a1dd7b1 100644
> --- a/drivers/thermal/of-thermal.c
> +++ b/drivers/thermal/of-thermal.c
> @@ -334,6 +334,8 @@ static int of_thermal_set_trip_temp(struct thermal_zone_device *tz, int trip,
>   /* thermal framework should take care of data->mask & (1 << trip) */
>   data->trips[trip].temperature = temp;
>  
> + thermal_zone_device_update(tz);
> +

Although I understand the need for this, I would prefer that you move
this change to thermal_core.c. The reasoning is to keep the same behavior
for thermal zones created via of-thermal and for regular thermal zones.
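One way to read this suggestion, as a user-space sketch: a core-level wrapper calls the backend's set_trip_temp op and then re-evaluates the zone, so OF-based and regular zones behave the same. All names here are hypothetical models, not the thermal framework's actual API.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_EINVAL 22	/* placeholder error value */

struct model_zone;

struct model_zone_ops {
	int (*set_trip_temp)(struct model_zone *tz, int trip, int temp);
};

struct model_zone {
	const struct model_zone_ops *ops;
	int trips[4];
	int updates;	/* counts re-evaluations of the zone */
};

/* Stands in for thermal_zone_device_update(). */
static void model_zone_update(struct model_zone *tz)
{
	tz->updates++;
}

/* Backend op (e.g. an OF-style driver): only stores the value. */
static int model_of_set_trip_temp(struct model_zone *tz, int trip, int temp)
{
	if (trip < 0 || trip >= (int)(sizeof(tz->trips) / sizeof(tz->trips[0])))
		return -MODEL_EINVAL;
	tz->trips[trip] = temp;
	return 0;
}

/* Core helper: every backend goes through here, so the update happens
 * uniformly instead of inside one particular driver. */
static int model_core_set_trip_temp(struct model_zone *tz, int trip, int temp)
{
	int ret;

	assert(tz && tz->ops && tz->ops->set_trip_temp);
	ret = tz->ops->set_trip_temp(tz, trip, temp);
	if (!ret)
		model_zone_update(tz);
	return ret;
}
```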

BR,

>   return 0;
>  }
>  
> -- 
> 1.9.1
> 

[PATCH net-next v5 06/19] tx4939: use __ethtool_get_ksettings

2015-12-14 Thread David Decotigny

From: David Decotigny 

Signed-off-by: David Decotigny 
---
 arch/mips/txx9/generic/setup_tx4939.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/mips/txx9/generic/setup_tx4939.c b/arch/mips/txx9/generic/setup_tx4939.c
index e3733cd..4a3ebf6 100644
--- a/arch/mips/txx9/generic/setup_tx4939.c
+++ b/arch/mips/txx9/generic/setup_tx4939.c
@@ -320,11 +320,12 @@ void __init tx4939_sio_init(unsigned int sclk, unsigned int cts_mask)
 #if IS_ENABLED(CONFIG_TC35815)
 static u32 tx4939_get_eth_speed(struct net_device *dev)
 {
-   struct ethtool_cmd cmd;
-   if (__ethtool_get_settings(dev, ))
+   struct ethtool_ksettings cmd;
+
+   if (__ethtool_get_ksettings(dev, ))
return 100; /* default 100Mbps */
 
-   return ethtool_cmd_speed();
+   return cmd.parent.speed;
 }
 
 static int tx4939_netdev_event(struct notifier_block *this,
-- 
2.6.0.rc2.230.g3dd15c0


[PATCH net-next v5 07/19] net: usnic: use __ethtool_get_ksettings

2015-12-14 Thread David Decotigny

From: David Decotigny 

Signed-off-by: David Decotigny 
---
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
index e082170..e0d12d4 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
@@ -324,12 +324,12 @@ int usnic_ib_query_port(struct ib_device *ibdev, u8 port,
struct ib_port_attr *props)
 {
struct usnic_ib_dev *us_ibdev = to_usdev(ibdev);
-   struct ethtool_cmd cmd;
+   struct ethtool_ksettings cmd;
 
usnic_dbg("\n");
 
mutex_lock(_ibdev->usdev_lock);
-   __ethtool_get_settings(us_ibdev->netdev, );
+   __ethtool_get_ksettings(us_ibdev->netdev, );
memset(props, 0, sizeof(*props));
 
props->lid = 0;
@@ -353,8 +353,8 @@ int usnic_ib_query_port(struct ib_device *ibdev, u8 port,
props->pkey_tbl_len = 1;
props->bad_pkey_cntr = 0;
props->qkey_viol_cntr = 0;
-   eth_speed_to_ib_speed(cmd.speed, >active_speed,
-   >active_width);
+   eth_speed_to_ib_speed(cmd.parent.speed, >active_speed,
+ >active_width);
props->max_mtu = IB_MTU_4096;
props->active_mtu = iboe_get_mtu(us_ibdev->ufdev->mtu);
/* Userspace will adjust for hdrs */
-- 
2.6.0.rc2.230.g3dd15c0


[PATCH net-next v5 01/19] lib/bitmap.c: conversion routines to/from u32 array

2015-12-14 Thread David Decotigny

From: David Decotigny 

Aimed at transferring bitmaps to/from user-space in a 32/64-bit agnostic
way.

Tested:
  unit tests (next patch) on qemu i386, x86_64, ppc, ppc64 BE and LE,
  ARM.

Signed-off-by: David Decotigny 
---
 include/linux/bitmap.h |  6 
 lib/bitmap.c   | 86 ++
 2 files changed, 92 insertions(+)

diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index 9653fdb..f7dc158 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -59,6 +59,8 @@
 * bitmap_find_free_region(bitmap, bits, order)  Find and allocate bit region
  * bitmap_release_region(bitmap, pos, order)   Free specified bit region
  * bitmap_allocate_region(bitmap, pos, order)  Allocate specified bit region
+ * bitmap_from_u32array(dst, nbits, buf, nwords) *dst = *buf (nwords 32b words)
+ * bitmap_to_u32array(buf, nwords, src, nbits) *buf = *src (nwords 32b words)
  */
 
 /*
@@ -163,6 +165,10 @@ extern void bitmap_fold(unsigned long *dst, const unsigned long *orig,
 extern int bitmap_find_free_region(unsigned long *bitmap, unsigned int bits, int order);
 extern void bitmap_release_region(unsigned long *bitmap, unsigned int pos, int order);
 extern int bitmap_allocate_region(unsigned long *bitmap, unsigned int pos, int order);
+extern void bitmap_from_u32array(unsigned long *bitmap, unsigned int nbits,
+const u32 *buf, unsigned int nwords);
+extern void bitmap_to_u32array(u32 *buf, unsigned int nwords,
+  const unsigned long *bitmap, unsigned int nbits);
 #ifdef __BIG_ENDIAN
 extern void bitmap_copy_le(unsigned long *dst, const unsigned long *src, unsigned int nbits);
 #else
diff --git a/lib/bitmap.c b/lib/bitmap.c
index 8148143..e1cc648 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -12,6 +12,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -1060,6 +1062,90 @@ int bitmap_allocate_region(unsigned long *bitmap, unsigned int pos, int order)
 EXPORT_SYMBOL(bitmap_allocate_region);
 
 /**
+ * bitmap_from_u32array - copy the contents of a u32 array of bits to bitmap
+ * @bitmap: array of unsigned longs, the destination bitmap, non NULL
+ * @nbits: number of bits in @bitmap
+ * @buf: array of u32 (in host byte order), the source bitmap, non NULL
+ * @nwords: number of u32 words in @buf
+ *
+ * Copy min(nbits, 32*nwords) bits from @buf to @bitmap; the remaining
+ * bits between 32*nwords and nbits in @bitmap (if any) are cleared. In
+ * the last word of @bitmap, the bits beyond nbits (if any) are kept
+ * unchanged.
+ */
+void bitmap_from_u32array(unsigned long *bitmap, unsigned int nbits,
+ const u32 *buf, unsigned int nwords)
+{
+   unsigned int k;
+   const u32 *src = buf;
+
+   for (k = 0; k < BITS_TO_LONGS(nbits); ++k) {
+   unsigned long part = 0;
+
+   if (nwords) {
+   part = *src++;
+   nwords--;
+   }
+
+#if BITS_PER_LONG == 64
+   if (nwords) {
+   part |= ((unsigned long) *src++) << 32;
+   nwords--;
+   }
+#endif
+
+   if (k < nbits/BITS_PER_LONG)
+   bitmap[k] = part;
+   else {
+   unsigned long mask = BITMAP_LAST_WORD_MASK(nbits);
+
+   bitmap[k] = (bitmap[k] & ~mask) | (part & mask);
+   }
+   }
+}
+EXPORT_SYMBOL(bitmap_from_u32array);
+
+/**
+ * bitmap_to_u32array - copy the contents of bitmap to a u32 array of bits
+ * @buf: array of u32 (in host byte order), the dest bitmap, non NULL
+ * @nwords: number of u32 words in @buf
+ * @bitmap: array of unsigned longs, the source bitmap, non NULL
+ * @nbits: number of bits in @bitmap
+ *
+ * copy min(nbits, 32*nwords) bits from @bitmap to @buf. Remaining
+ * bits after nbits in @buf (if any) are cleared.
+ */
+void bitmap_to_u32array(u32 *buf, unsigned int nwords,
+   const unsigned long *bitmap, unsigned int nbits)
+{
+   unsigned int k = 0;
+   u32 *dst = buf;
+
+   while (nwords) {
+   unsigned long part = 0;
+
+   if (k < BITS_TO_LONGS(nbits)) {
+   part = bitmap[k];
+   if (k >= nbits/BITS_PER_LONG)
+   part &= BITMAP_LAST_WORD_MASK(nbits);
+   k++;
+   }
+
+   *dst++ = part & 0xUL;
+   nwords--;
+
+#if BITS_PER_LONG == 64
+   if (nwords) {
+   part >>= 32;
+   *dst++ = part & 0xUL;
+   nwords--;
+   }
+#endif
+   }
+}
+EXPORT_SYMBOL(bitmap_to_u32array);
+
+/**
  * bitmap_copy_le - copy a bitmap, putting the bits into little-endian order.
  * @dst:   destination buffer
  * @src:   bitmap to copy
--
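A bit-at-a-time reference model of the semantics documented for bitmap_from_u32array() and bitmap_to_u32array() can be handy for cross-checking the word-at-a-time implementations above. This is a user-space sketch, not the patch's code; the `model_*` names are hypothetical, and it behaves the same whether unsigned long is 32 or 64 bits.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define MODEL_BPL (sizeof(unsigned long) * CHAR_BIT)	/* bits per long */

static int model_test_bit(const unsigned long *map, unsigned int i)
{
	return (map[i / MODEL_BPL] >> (i % MODEL_BPL)) & 1UL;
}

static void model_assign_bit(unsigned long *map, unsigned int i, int val)
{
	unsigned long m = 1UL << (i % MODEL_BPL);

	if (val)
		map[i / MODEL_BPL] |= m;
	else
		map[i / MODEL_BPL] &= ~m;
}

/* Bits below min(nbits, 32*nwords) come from buf, the rest up to nbits
 * are cleared, and bits >= nbits in the last long are left untouched. */
static void model_bitmap_from_u32array(unsigned long *bitmap, unsigned int nbits,
				       const uint32_t *buf, unsigned int nwords)
{
	assert(bitmap && (buf || nwords == 0));
	for (unsigned int i = 0; i < nbits; i++) {
		int bit = 0;

		if (i / 32 < nwords)
			bit = (buf[i / 32] >> (i % 32)) & 1u;
		model_assign_bit(bitmap, i, bit);
	}
}

/* All 32*nwords bits of buf are written; bits at or above nbits become 0. */
static void model_bitmap_to_u32array(uint32_t *buf, unsigned int nwords,
				     const unsigned long *bitmap, unsigned int nbits)
{
	for (unsigned int w = 0; w < nwords; w++) {
		uint32_t word = 0;

		for (unsigned int b = 0; b < 32; b++) {
			unsigned int i = w * 32 + b;

			if (i < nbits && model_test_bit(bitmap, i))
				word |= (uint32_t)1 << b;
		}
		buf[w] = word;
	}
}
```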

[PATCH net-next v5 10/19] net: macvlan: use __ethtool_get_ksettings

2015-12-14 Thread David Decotigny

From: David Decotigny 

Signed-off-by: David Decotigny 
---
 drivers/net/macvlan.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 06c8bfe..a95b793 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -940,12 +940,12 @@ static void macvlan_ethtool_get_drvinfo(struct net_device *dev,
strlcpy(drvinfo->version, "0.1", sizeof(drvinfo->version));
 }
 
-static int macvlan_ethtool_get_settings(struct net_device *dev,
-   struct ethtool_cmd *cmd)
+static int macvlan_ethtool_get_ksettings(struct net_device *dev,
+struct ethtool_ksettings *cmd)
 {
const struct macvlan_dev *vlan = netdev_priv(dev);
 
-   return __ethtool_get_settings(vlan->lowerdev, cmd);
+   return __ethtool_get_ksettings(vlan->lowerdev, cmd);
 }
 
 static netdev_features_t macvlan_fix_features(struct net_device *dev,
@@ -1020,7 +1020,7 @@ static int macvlan_dev_get_iflink(const struct net_device *dev)
 
 static const struct ethtool_ops macvlan_ethtool_ops = {
.get_link   = ethtool_op_get_link,
-   .get_settings   = macvlan_ethtool_get_settings,
+   .get_ksettings  = macvlan_ethtool_get_ksettings,
.get_drvinfo= macvlan_ethtool_get_drvinfo,
 };
 
-- 
2.6.0.rc2.230.g3dd15c0


[PATCH net-next v5 09/19] net: ipvlan: use __ethtool_get_ksettings

2015-12-14 Thread David Decotigny

From: David Decotigny 

Signed-off-by: David Decotigny 
---
 drivers/net/ipvlan/ipvlan_main.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index a9268db..63b3aa5 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -346,12 +346,12 @@ static const struct header_ops ipvlan_header_ops = {
.cache_update   = eth_header_cache_update,
 };
 
-static int ipvlan_ethtool_get_settings(struct net_device *dev,
-  struct ethtool_cmd *cmd)
+static int ipvlan_ethtool_get_ksettings(struct net_device *dev,
+   struct ethtool_ksettings *cmd)
 {
const struct ipvl_dev *ipvlan = netdev_priv(dev);
 
-   return __ethtool_get_settings(ipvlan->phy_dev, cmd);
+   return __ethtool_get_ksettings(ipvlan->phy_dev, cmd);
 }
 
 static void ipvlan_ethtool_get_drvinfo(struct net_device *dev,
@@ -377,7 +377,7 @@ static void ipvlan_ethtool_set_msglevel(struct net_device *dev, u32 value)
 
 static const struct ethtool_ops ipvlan_ethtool_ops = {
.get_link   = ethtool_op_get_link,
-   .get_settings   = ipvlan_ethtool_get_settings,
+   .get_ksettings  = ipvlan_ethtool_get_ksettings,
.get_drvinfo= ipvlan_ethtool_get_drvinfo,
.get_msglevel   = ipvlan_ethtool_get_msglevel,
.set_msglevel   = ipvlan_ethtool_set_msglevel,
-- 
2.6.0.rc2.230.g3dd15c0


[PATCH net-next v5 04/19] net: usnic: use __ethtool_get_settings

2015-12-14 Thread David Decotigny

From: David Decotigny 

Signed-off-by: David Decotigny 
---
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
index 5b60579..e082170 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
@@ -329,7 +329,7 @@ int usnic_ib_query_port(struct ib_device *ibdev, u8 port,
usnic_dbg("\n");
 
mutex_lock(_ibdev->usdev_lock);
-   us_ibdev->netdev->ethtool_ops->get_settings(us_ibdev->netdev, );
+   __ethtool_get_settings(us_ibdev->netdev, );
memset(props, 0, sizeof(*props));
 
props->lid = 0;
-- 
2.6.0.rc2.230.g3dd15c0


[PATCH net-next v5 08/19] net: bonding: use __ethtool_get_ksettings

2015-12-14 Thread David Decotigny

From: David Decotigny 

Signed-off-by: David Decotigny 
---
 drivers/net/bonding/bond_main.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index fe0e7a6..ce8c026 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -374,22 +374,20 @@ down:
 static void bond_update_speed_duplex(struct slave *slave)
 {
struct net_device *slave_dev = slave->dev;
-   struct ethtool_cmd ecmd;
-   u32 slave_speed;
+   struct ethtool_ksettings ecmd;
int res;
 
slave->speed = SPEED_UNKNOWN;
slave->duplex = DUPLEX_UNKNOWN;
 
-   res = __ethtool_get_settings(slave_dev, );
+   res = __ethtool_get_ksettings(slave_dev, );
if (res < 0)
return;
 
-   slave_speed = ethtool_cmd_speed();
-   if (slave_speed == 0 || slave_speed == ((__u32) -1))
+   if (ecmd.parent.speed == 0 || ecmd.parent.speed == ((__u32)-1))
return;
 
-   switch (ecmd.duplex) {
+   switch (ecmd.parent.duplex) {
case DUPLEX_FULL:
case DUPLEX_HALF:
break;
@@ -397,8 +395,8 @@ static void bond_update_speed_duplex(struct slave *slave)
return;
}
 
-   slave->speed = slave_speed;
-   slave->duplex = ecmd.duplex;
+   slave->speed = ecmd.parent.speed;
+   slave->duplex = ecmd.parent.duplex;
 
return;
 }
-- 
2.6.0.rc2.230.g3dd15c0


[PATCH net-next v5 15/19] net: bridge: use __ethtool_get_ksettings

2015-12-14 Thread David Decotigny

From: David Decotigny 

Signed-off-by: David Decotigny 
---
 net/bridge/br_if.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 8d1d4a2..d1022fd 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -36,10 +36,10 @@
  */
 static int port_cost(struct net_device *dev)
 {
-   struct ethtool_cmd ecmd;
+   struct ethtool_ksettings ecmd;
 
-   if (!__ethtool_get_settings(dev, )) {
-   switch (ethtool_cmd_speed()) {
+   if (!__ethtool_get_ksettings(dev, )) {
+   switch (ecmd.parent.speed) {
case SPEED_1:
return 2;
case SPEED_1000:
-- 
2.6.0.rc2.230.g3dd15c0


Re: [PATCH v12] Add Mediatek thermal support

2015-12-14 Thread Eduardo Valentin

On Mon, Dec 14, 2015 at 11:37:39AM +0100, Sascha Hauer wrote:
> Eduardo,
> 
> Ok, to apply this? There seem to be no further comments.

Yeah, sorry for the delay. I will do one more review round, but I don't
see much. So hopefully it should be applied in the coming days.


BR

[PATCH net-next v5 12/19] net: fcoe: use __ethtool_get_ksettings

2015-12-14 Thread David Decotigny

From: David Decotigny 

Signed-off-by: David Decotigny 
---
 drivers/scsi/fcoe/fcoe_transport.c | 36 
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/drivers/scsi/fcoe/fcoe_transport.c b/drivers/scsi/fcoe/fcoe_transport.c
index d7597c0..2d5909f 100644
--- a/drivers/scsi/fcoe/fcoe_transport.c
+++ b/drivers/scsi/fcoe/fcoe_transport.c
@@ -93,36 +93,40 @@ static struct notifier_block libfcoe_notifier = {
 int fcoe_link_speed_update(struct fc_lport *lport)
 {
struct net_device *netdev = fcoe_get_netdev(lport);
-   struct ethtool_cmd ecmd;
+   struct ethtool_ksettings ecmd;
 
-   if (!__ethtool_get_settings(netdev, )) {
+   if (!__ethtool_get_ksettings(netdev, )) {
lport->link_supported_speeds &= ~(FC_PORTSPEED_1GBIT  |
  FC_PORTSPEED_10GBIT |
  FC_PORTSPEED_20GBIT |
  FC_PORTSPEED_40GBIT);
 
-   if (ecmd.supported & (SUPPORTED_1000baseT_Half |
- SUPPORTED_1000baseT_Full |
- SUPPORTED_1000baseKX_Full))
+   if (ecmd.link_modes.supported[0] & (
+   SUPPORTED_1000baseT_Half |
+   SUPPORTED_1000baseT_Full |
+   SUPPORTED_1000baseKX_Full))
lport->link_supported_speeds |= FC_PORTSPEED_1GBIT;
 
-   if (ecmd.supported & (SUPPORTED_1baseT_Full   |
- SUPPORTED_1baseKX4_Full |
- SUPPORTED_1baseKR_Full  |
- SUPPORTED_1baseR_FEC))
+   if (ecmd.link_modes.supported[0] & (
+   SUPPORTED_1baseT_Full   |
+   SUPPORTED_1baseKX4_Full |
+   SUPPORTED_1baseKR_Full  |
+   SUPPORTED_1baseR_FEC))
lport->link_supported_speeds |= FC_PORTSPEED_10GBIT;
 
-   if (ecmd.supported & (SUPPORTED_2baseMLD2_Full |
- SUPPORTED_2baseKR2_Full))
+   if (ecmd.link_modes.supported[0] & (
+   SUPPORTED_2baseMLD2_Full |
+   SUPPORTED_2baseKR2_Full))
lport->link_supported_speeds |= FC_PORTSPEED_20GBIT;
 
-   if (ecmd.supported & (SUPPORTED_4baseKR4_Full |
- SUPPORTED_4baseCR4_Full |
- SUPPORTED_4baseSR4_Full |
- SUPPORTED_4baseLR4_Full))
+   if (ecmd.link_modes.supported[0] & (
+   SUPPORTED_4baseKR4_Full |
+   SUPPORTED_4baseCR4_Full |
+   SUPPORTED_4baseSR4_Full |
+   SUPPORTED_4baseLR4_Full))
lport->link_supported_speeds |= FC_PORTSPEED_40GBIT;
 
-   switch (ethtool_cmd_speed()) {
+   switch (ecmd.parent.speed) {
case SPEED_1000:
lport->link_speed = FC_PORTSPEED_1GBIT;
break;
-- 
2.6.0.rc2.230.g3dd15c0


[PATCH net-next v5 16/19] net: core: use __ethtool_get_ksettings

2015-12-14 Thread David Decotigny

From: David Decotigny 

Signed-off-by: David Decotigny 
---
 net/core/net-sysfs.c   | 15 +--
 net/packet/af_packet.c | 11 +--
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index f88a62a..3dd4bb1 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -199,9 +199,10 @@ static ssize_t speed_show(struct device *dev,
return restart_syscall();
 
if (netif_running(netdev)) {
-   struct ethtool_cmd cmd;
-   if (!__ethtool_get_settings(netdev, ))
-   ret = sprintf(buf, fmt_dec, ethtool_cmd_speed());
+   struct ethtool_ksettings cmd;
+
+   if (!__ethtool_get_ksettings(netdev, ))
+   ret = sprintf(buf, fmt_dec, cmd.parent.speed);
}
rtnl_unlock();
return ret;
@@ -218,10 +219,12 @@ static ssize_t duplex_show(struct device *dev,
return restart_syscall();
 
if (netif_running(netdev)) {
-   struct ethtool_cmd cmd;
-   if (!__ethtool_get_settings(netdev, )) {
+   struct ethtool_ksettings cmd;
+
+   if (!__ethtool_get_ksettings(netdev, )) {
const char *duplex;
-   switch (cmd.duplex) {
+
+   switch (cmd.parent.duplex) {
case DUPLEX_HALF:
duplex = "half";
break;
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 992396a..626dae0 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -557,9 +557,8 @@ static int prb_calc_retire_blk_tmo(struct packet_sock *po,
 {
struct net_device *dev;
unsigned int mbits = 0, msec = 0, div = 0, tmo = 0;
-   struct ethtool_cmd ecmd;
+   struct ethtool_ksettings ecmd;
int err;
-   u32 speed;
 
rtnl_lock();
dev = __dev_get_by_index(sock_net(>sk), po->ifindex);
@@ -567,19 +566,19 @@ static int prb_calc_retire_blk_tmo(struct packet_sock *po,
rtnl_unlock();
return DEFAULT_PRB_RETIRE_TOV;
}
-   err = __ethtool_get_settings(dev, );
-   speed = ethtool_cmd_speed();
+   err = __ethtool_get_ksettings(dev, );
rtnl_unlock();
if (!err) {
/*
 * If the link speed is so slow you don't really
 * need to worry about perf anyways
 */
-   if (speed < SPEED_1000 || speed == SPEED_UNKNOWN) {
+   if (ecmd.parent.speed < SPEED_1000 ||
+   ecmd.parent.speed == SPEED_UNKNOWN) {
return DEFAULT_PRB_RETIRE_TOV;
} else {
msec = 1;
-   div = speed / 1000;
+   div = ecmd.parent.speed / 1000;
}
}
 
-- 
2.6.0.rc2.230.g3dd15c0


[PATCH net-next v5 11/19] net: team: use __ethtool_get_ksettings

2015-12-14 Thread David Decotigny

From: David Decotigny 

Signed-off-by: David Decotigny 
---
 drivers/net/team/team.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 059c0f6..7cc98a7 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -2799,12 +2799,12 @@ static void __team_port_change_send(struct team_port *port, bool linkup)
port->state.linkup = linkup;
team_refresh_port_linkup(port);
if (linkup) {
-   struct ethtool_cmd ecmd;
+   struct ethtool_ksettings ecmd;
 
-   err = __ethtool_get_settings(port->dev, &ecmd);
+   err = __ethtool_get_ksettings(port->dev, &ecmd);
if (!err) {
-   port->state.speed = ethtool_cmd_speed(&ecmd);
-   port->state.duplex = ecmd.duplex;
+   port->state.speed = ecmd.parent.speed;
+   port->state.duplex = ecmd.parent.duplex;
goto send_event;
}
}
-- 
2.6.0.rc2.230.g3dd15c0


[PATCH net-next v5 17/19] net: ethtool: remove unused __ethtool_get_settings

2015-12-14 Thread David Decotigny

From: David Decotigny 

replaced by __ethtool_get_ksettings.

Signed-off-by: David Decotigny 
---
 include/linux/ethtool.h |  4 
 net/core/ethtool.c  | 45 ++---
 2 files changed, 14 insertions(+), 35 deletions(-)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 6077cbb..05d4f0e 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -148,10 +148,6 @@ struct ethtool_ksettings {
 extern int __ethtool_get_ksettings(struct net_device *dev,
   struct ethtool_ksettings *ksettings);
 
-/* DEPRECATED, use __ethtool_get_ksettings */
-extern int __ethtool_get_settings(struct net_device *dev,
- struct ethtool_cmd *cmd);
-
 /**
  * struct ethtool_ops - optional netdev operations
  * @get_settings: DEPRECATED, use %get_ksettings/%set_ksettings
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 4865031..84dca87 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -495,7 +495,12 @@ int __ethtool_get_ksettings(struct net_device *dev,
 * legacy %ethtool_cmd API, unless it's not supported either.
 * TODO: remove when ethtool_ops::get_settings disappears internally
 */
-   err = __ethtool_get_settings(dev, &cmd);
+   if (!dev->ethtool_ops->get_settings)
+   return -EOPNOTSUPP;
+
+   memset(&cmd, 0, sizeof(cmd));
+   cmd.cmd = ETHTOOL_GSET;
+   err = dev->ethtool_ops->get_settings(dev, &cmd);
if (err < 0)
return err;
 
@@ -652,30 +657,6 @@ static int ethtool_set_ksettings(struct net_device *dev, void __user *useraddr)
return dev->ethtool_ops->set_ksettings(dev, );
 }
 
-/* Internal kernel helper to query a device ethtool_cmd settings.
- *
- * Note about transition to ethtool_settings API: We do not need (or
- * want) this function to support "dev" instances that implement the
- * ethtool_settings API as we will update the drivers calling this
- * function to call __ethtool_get_ksettings instead, before the first
- * drivers implement ethtool_ops::get_ksettings.
- *
- * TODO 1: at least make this function static when no driver is using it
- * TODO 2: remove when ethtool_ops::get_settings disappears internally
- */
-int __ethtool_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
-{
-   ASSERT_RTNL();
-
-   if (!dev->ethtool_ops->get_settings)
-   return -EOPNOTSUPP;
-
-   memset(cmd, 0, sizeof(struct ethtool_cmd));
-   cmd->cmd = ETHTOOL_GSET;
-   return dev->ethtool_ops->get_settings(dev, cmd);
-}
-EXPORT_SYMBOL(__ethtool_get_settings);
-
 static void
 warn_incomplete_ethtool_legacy_settings_conversion(const char *details)
 {
@@ -717,16 +698,18 @@ static int ethtool_get_settings(struct net_device *dev, void __user *useraddr)
/* send a sensible cmd tag back to user */
cmd.cmd = ETHTOOL_GSET;
} else {
-   int err;
-   /* TODO: return -EOPNOTSUPP when
-* ethtool_ops::get_settings disappears internally
-*/
-
/* driver doesn't support %ethtool_ksettings
 * API. revert to legacy %ethtool_cmd API, unless it's
 * not supported either.
 */
-   err = __ethtool_get_settings(dev, &cmd);
+   int err;
+
+   if (!dev->ethtool_ops->get_settings)
+   return -EOPNOTSUPP;
+
+   memset(&cmd, 0, sizeof(cmd));
+   cmd.cmd = ETHTOOL_GSET;
+   err = dev->ethtool_ops->get_settings(dev, &cmd);
if (err < 0)
return err;
}
-- 
2.6.0.rc2.230.g3dd15c0

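With __ethtool_get_settings() gone, its body is open-coded as the legacy fallback: refuse with -EOPNOTSUPP when the driver has no get_settings callback, otherwise pass the driver a zeroed command tagged ETHTOOL_GSET. A userspace sketch of that control flow follows; the struct layouts, the fake driver callback and the helper name legacy_get_settings() are hypothetical stand-ins, not kernel code.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define ETHTOOL_GSET 0x00000001 /* legacy "get settings" command number */

/* Hypothetical, trimmed-down stand-ins for the kernel types. */
struct legacy_cmd {
	uint32_t cmd;
	uint32_t speed;
};

struct net_device;

struct ops {
	int (*get_settings)(struct net_device *dev, struct legacy_cmd *cmd);
};

struct net_device {
	const struct ops *ethtool_ops;
};

/* The open-coded fallback: no legacy callback means "not supported";
 * otherwise the driver fills in a zeroed command tagged ETHTOOL_GSET. */
static int legacy_get_settings(struct net_device *dev, struct legacy_cmd *cmd)
{
	if (!dev->ethtool_ops->get_settings)
		return -EOPNOTSUPP;

	memset(cmd, 0, sizeof(*cmd));
	cmd->cmd = ETHTOOL_GSET;
	return dev->ethtool_ops->get_settings(dev, cmd);
}

/* A fake driver callback that reports a 10 Gb/s link. */
static int fake_get_settings(struct net_device *dev, struct legacy_cmd *cmd)
{
	(void)dev;
	cmd->speed = 10000;
	return 0;
}

static const struct ops fake_ops = { .get_settings = fake_get_settings };
static const struct ops empty_ops = { .get_settings = NULL };
```

Note that the caller's buffer is left untouched on the -EOPNOTSUPP path, since the check happens before the memset().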

[PATCH net-next v5 18/19] net: mlx4: convenience predicate for debug messages

2015-12-14 Thread David Decotigny

From: David Decotigny 

Signed-off-by: David Decotigny 
---
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 35de7d2..b04054d 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -740,9 +740,11 @@ __printf(3, 4)
 void en_print(const char *level, const struct mlx4_en_priv *priv,
  const char *format, ...);
 
+#define en_dbg_enabled(mlevel, priv)   \
+   (NETIF_MSG_##mlevel & (priv)->msg_enable)
 #define en_dbg(mlevel, priv, format, ...)  \
 do {   \
-   if (NETIF_MSG_##mlevel & (priv)->msg_enable)\
+   if (en_dbg_enabled(mlevel, priv))   \
en_print(KERN_DEBUG, priv, format, ##__VA_ARGS__);  \
 } while (0)
 #define en_warn(priv, format, ...) \
-- 
2.6.0.rc2.230.g3dd15c0

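Splitting the test out of en_dbg() as en_dbg_enabled() lets a caller skip work that has to happen in separate statements before the message is printed (for example, formatting a link-mode bitmap into a buffer). A userspace sketch of the two macros follows; the NETIF_MSG_* values match the kernel's enum netif_msg, but struct en_priv, format_link_modes() and report_link() are hypothetical, and printf() stands in for en_print().

```c
#include <stdint.h>
#include <stdio.h>

/* A few NETIF_MSG_* bits, with the values used by the kernel. */
#define NETIF_MSG_DRV	0x0001
#define NETIF_MSG_PROBE	0x0002
#define NETIF_MSG_LINK	0x0004

/* Hypothetical stand-in for struct mlx4_en_priv. */
struct en_priv {
	uint32_t msg_enable;
};

#define en_dbg_enabled(mlevel, priv) \
	(NETIF_MSG_##mlevel & (priv)->msg_enable)

#define en_dbg(mlevel, priv, format, ...)		\
do {							\
	if (en_dbg_enabled(mlevel, priv))		\
		printf(format, ##__VA_ARGS__);		\
} while (0)

static int expensive_calls;	/* counts how often the formatting ran */

static void format_link_modes(char *buf, size_t len)
{
	expensive_calls++;
	snprintf(buf, len, "supported: 1000baseT/Full 10000baseT/Full");
}

static void report_link(const struct en_priv *priv)
{
	char buf[64];

	/* Filling buf takes its own statement, so it cannot live inside
	 * en_dbg()'s argument list; the predicate skips it when LINK
	 * messages are disabled. */
	if (en_dbg_enabled(LINK, priv)) {
		format_link_modes(buf, sizeof(buf));
		en_dbg(LINK, priv, "%s\n", buf);
	}
}
```

The ##__VA_ARGS__ form is a GNU extension, as in the original macro.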

[PATCH net-next v5 13/19] net: rdma: use __ethtool_get_ksettings

2015-12-14 Thread David Decotigny

From: David Decotigny 

Signed-off-by: David Decotigny 
---
 include/rdma/ib_addr.h | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h
index 1152859..1820f26 100644
--- a/include/rdma/ib_addr.h
+++ b/include/rdma/ib_addr.h
@@ -254,24 +254,22 @@ static inline enum ib_mtu iboe_get_mtu(int mtu)
 
 static inline int iboe_get_rate(struct net_device *dev)
 {
-   struct ethtool_cmd cmd;
-   u32 speed;
+   struct ethtool_ksettings cmd;
int err;
 
rtnl_lock();
-   err = __ethtool_get_settings(dev, &cmd);
+   err = __ethtool_get_ksettings(dev, &cmd);
rtnl_unlock();
if (err)
return IB_RATE_PORT_CURRENT;
 
-   speed = ethtool_cmd_speed(&cmd);
-   if (speed >= 40000)
+   if (cmd.parent.speed >= 40000)
return IB_RATE_40_GBPS;
-   else if (speed >= 30000)
+   else if (cmd.parent.speed >= 30000)
return IB_RATE_30_GBPS;
-   else if (speed >= 20000)
+   else if (cmd.parent.speed >= 20000)
return IB_RATE_20_GBPS;
-   else if (speed >= 10000)
+   else if (cmd.parent.speed >= 10000)
return IB_RATE_10_GBPS;
else
return IB_RATE_PORT_CURRENT;
-- 
2.6.0.rc2.230.g3dd15c0

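iboe_get_rate() buckets the Ethernet link speed, reported in Mb/s, into the highest listed InfiniBand rate that does not exceed it, falling back to "port current" below 10 Gb/s. The sketch below reproduces that bucketing in userspace; the enum is a hypothetical stand-in for the IB_RATE_* values.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the IB_RATE_* values used by iboe_get_rate(). */
enum rate {
	RATE_PORT_CURRENT,
	RATE_10_GBPS,
	RATE_20_GBPS,
	RATE_30_GBPS,
	RATE_40_GBPS,
};

/* speed_mbps is the link speed in Mb/s, as found in parent.speed. */
static enum rate speed_to_rate(uint32_t speed_mbps)
{
	if (speed_mbps >= 40000)
		return RATE_40_GBPS;
	else if (speed_mbps >= 30000)
		return RATE_30_GBPS;
	else if (speed_mbps >= 20000)
		return RATE_20_GBPS;
	else if (speed_mbps >= 10000)
		return RATE_10_GBPS;
	return RATE_PORT_CURRENT;
}
```

For example, a 25 Gb/s link (25000) lands in the 20 Gb/s bucket.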

[PATCH net-next v5 14/19] net: 8021q: use __ethtool_get_ksettings

2015-12-14 Thread David Decotigny

From: David Decotigny 

Signed-off-by: David Decotigny 
---
 net/8021q/vlan_dev.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index fded865..e607fee 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -620,12 +620,12 @@ static netdev_features_t vlan_dev_fix_features(struct net_device *dev,
return features;
 }
 
-static int vlan_ethtool_get_settings(struct net_device *dev,
-struct ethtool_cmd *cmd)
+static int vlan_ethtool_get_ksettings(struct net_device *dev,
+ struct ethtool_ksettings *cmd)
 {
const struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
 
-   return __ethtool_get_settings(vlan->real_dev, cmd);
+   return __ethtool_get_ksettings(vlan->real_dev, cmd);
 }
 
 static void vlan_ethtool_get_drvinfo(struct net_device *dev,
@@ -740,7 +740,7 @@ static int vlan_dev_get_iflink(const struct net_device *dev)
 }
 
 static const struct ethtool_ops vlan_ethtool_ops = {
-   .get_settings   = vlan_ethtool_get_settings,
+   .get_ksettings  = vlan_ethtool_get_ksettings,
.get_drvinfo= vlan_ethtool_get_drvinfo,
.get_link   = ethtool_op_get_link,
.get_ts_info= vlan_ethtool_get_ts_info,
-- 
2.6.0.rc2.230.g3dd15c0

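A VLAN device has no link of its own, so vlan_ethtool_get_ksettings() simply forwards the query to the underlying real_dev. The userspace sketch below shows that delegation through an ops table; all types and names are hypothetical stand-ins, and the assumption that the generic helper zeroes the output before calling the driver is this sketch's, not something stated in the patch.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-ins for the kernel types. */
struct ksettings {
	uint32_t speed;		/* Mb/s */
	uint8_t duplex;		/* 1 = full duplex (DUPLEX_FULL) */
};

struct dev;

struct ops {
	int (*get_ksettings)(struct dev *d, struct ksettings *ks);
};

struct dev {
	const struct ops *ops;
	struct dev *real_dev;	/* only used by the VLAN device */
};

/* Plays the role of __ethtool_get_ksettings() (zeroing is an assumption). */
static int get_ksettings(struct dev *d, struct ksettings *ks)
{
	if (!d->ops->get_ksettings)
		return -EOPNOTSUPP;
	memset(ks, 0, sizeof(*ks));
	return d->ops->get_ksettings(d, ks);
}

/* Physical NIC driver: reports a fixed 10 Gb/s full-duplex link. */
static int phys_get_ksettings(struct dev *d, struct ksettings *ks)
{
	(void)d;
	ks->speed = 10000;
	ks->duplex = 1;
	return 0;
}

/* VLAN device: asks the underlying device instead. */
static int vlan_get_ksettings(struct dev *d, struct ksettings *ks)
{
	return get_ksettings(d->real_dev, ks);
}

static const struct ops phys_ops = { .get_ksettings = phys_get_ksettings };
static const struct ops vlan_ops = { .get_ksettings = vlan_get_ksettings };
```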

[PATCH net-next v5 19/19] net: mlx4: use new ETHTOOL_G/SSETTINGS API

2015-12-14 Thread David Decotigny

From: David Decotigny 

Signed-off-by: David Decotigny 
---
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 344 
 drivers/net/ethernet/mellanox/mlx4/en_main.c|   1 +
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h|   1 +
 3 files changed, 177 insertions(+), 169 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index dd84cab..f33f27b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -501,34 +501,30 @@ static u32 mlx4_en_autoneg_get(struct net_device *dev)
return autoneg;
 }
 
-static u32 ptys_get_supported_port(struct mlx4_ptys_reg *ptys_reg)
+static void ptys2ethtool_update_supported_port(unsigned long *mask,
+  struct mlx4_ptys_reg *ptys_reg)
 {
u32 eth_proto = be32_to_cpu(ptys_reg->eth_proto_cap);
 
if (eth_proto & (MLX4_PROT_MASK(MLX4_10GBASE_T)
 | MLX4_PROT_MASK(MLX4_1000BASE_T)
 | MLX4_PROT_MASK(MLX4_100BASE_TX))) {
-   return SUPPORTED_TP;
-   }
-
-   if (eth_proto & (MLX4_PROT_MASK(MLX4_10GBASE_CR)
+   __set_bit(ETHTOOL_LINK_MODE_TP_BIT, mask);
+   } else if (eth_proto & (MLX4_PROT_MASK(MLX4_10GBASE_CR)
 | MLX4_PROT_MASK(MLX4_10GBASE_SR)
 | MLX4_PROT_MASK(MLX4_56GBASE_SR4)
 | MLX4_PROT_MASK(MLX4_40GBASE_CR4)
 | MLX4_PROT_MASK(MLX4_40GBASE_SR4)
 | MLX4_PROT_MASK(MLX4_1000BASE_CX_SGMII))) {
-   return SUPPORTED_FIBRE;
-   }
-
-   if (eth_proto & (MLX4_PROT_MASK(MLX4_56GBASE_KR4)
+   __set_bit(ETHTOOL_LINK_MODE_FIBRE_BIT, mask);
+   } else if (eth_proto & (MLX4_PROT_MASK(MLX4_56GBASE_KR4)
 | MLX4_PROT_MASK(MLX4_40GBASE_KR4)
 | MLX4_PROT_MASK(MLX4_20GBASE_KR2)
 | MLX4_PROT_MASK(MLX4_10GBASE_KR)
 | MLX4_PROT_MASK(MLX4_10GBASE_KX4)
 | MLX4_PROT_MASK(MLX4_1000BASE_KX))) {
-   return SUPPORTED_Backplane;
+   __set_bit(ETHTOOL_LINK_MODE_Backplane_BIT, mask);
}
-   return 0;
 }
 
 static u32 ptys_get_active_port(struct mlx4_ptys_reg *ptys_reg)
@@ -574,122 +570,111 @@ static u32 ptys_get_active_port(struct mlx4_ptys_reg *ptys_reg)
 enum ethtool_report {
SUPPORTED = 0,
ADVERTISED = 1,
-   SPEED = 2
 };
 
+struct ptys2ethtool_config {
+   __ETHTOOL_DECLARE_LINK_MODE_MASK(supported);
+   __ETHTOOL_DECLARE_LINK_MODE_MASK(advertised);
+   u32 speed;
+};
+
+static unsigned long *ptys2ethtool_link_mode(struct ptys2ethtool_config *cfg,
+enum ethtool_report report)
+{
+   switch (report) {
+   case SUPPORTED:
+   return cfg->supported;
+   case ADVERTISED:
+   return cfg->advertised;
+   }
+   return NULL;
+}
+
+#define MLX4_BUILD_PTYS2ETHTOOL_CONFIG(reg_, speed_, ...)  \
+   ({  \
+   struct ptys2ethtool_config *cfg;\
+   const unsigned int modes[] = { __VA_ARGS__ };   \
+   unsigned int i; \
+   cfg = &ptys2ethtool_map[reg_];  \
+   cfg->speed = speed_;\
+   bitmap_zero(cfg->supported, \
+   __ETHTOOL_LINK_MODE_MASK_NBITS);\
+   bitmap_zero(cfg->advertised,\
+   __ETHTOOL_LINK_MODE_MASK_NBITS);\
+   for (i = 0 ; i < ARRAY_SIZE(modes) ; ++i) { \
+   __set_bit(modes[i], cfg->supported);\
+   __set_bit(modes[i], cfg->advertised);   \
+   }   \
+   })
+
 /* Translates mlx4 link mode to equivalent ethtool Link modes/speed */
-static u32 ptys2ethtool_map[MLX4_LINK_MODES_SZ][3] = {
-   [MLX4_100BASE_TX] = {
-   SUPPORTED_100baseT_Full,
-   ADVERTISED_100baseT_Full,
-   SPEED_100
-   },
-
-   [MLX4_1000BASE_T] = {
-   SUPPORTED_1000baseT_Full,
-   ADVERTISED_1000baseT_Full,
-   SPEED_1000
-   },
-   [MLX4_1000BASE_CX_SGMII] = {
-   SUPPORTED_1000baseKX_Full,
-   ADVERTISED_1000baseKX_Full,
-   SPEED_1000
-   },
-   [MLX4_1000BASE_KX] = {
-   SUPPORTED_1000baseKX_Full,
-   ADVERTISED_1000baseKX_Full,
-
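
The MLX4_BUILD_PTYS2ETHTOOL_CONFIG macro fills one table entry per hardware link mode: it records the speed and sets the same list of ethtool link-mode bits in both the supported and advertised bitmaps. A userspace sketch of that idea follows. The register indices, bit numbers and helper names are illustrative stand-ins, and it uses a do/while (0) block where the patch uses a GNU statement expression.

```c
#include <limits.h>
#include <stdint.h>
#include <string.h>

#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define NBITS		64	/* hypothetical mask width */
#define NLONGS		((NBITS + BITS_PER_LONG - 1) / BITS_PER_LONG)
#define ARRAY_SIZE(a)	(sizeof(a) / sizeof((a)[0]))

/* Illustrative link-mode bit numbers (stand-ins for ETHTOOL_LINK_MODE_*_BIT). */
enum {
	MODE_100baseT_Full = 3,
	MODE_1000baseT_Full = 5,
	MODE_1000baseKX_Full = 17,
};

/* Hypothetical hardware link-mode indices (stand-ins for MLX4_*). */
enum {
	REG_100BASE_TX,
	REG_1000BASE_T,
	REG_1000BASE_KX,
	REG_MODES_SZ,
};

struct link_cfg {
	unsigned long supported[NLONGS];
	unsigned long advertised[NLONGS];
	uint32_t speed;
};

static struct link_cfg cfg_map[REG_MODES_SZ];

static void set_bit_ul(unsigned int nr, unsigned long *map)
{
	map[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

static int test_bit_ul(unsigned int nr, const unsigned long *map)
{
	return (map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
}

/* Record the speed and set every listed mode bit in both bitmaps. */
#define BUILD_CFG(reg_, speed_, ...)					\
	do {								\
		const unsigned int modes_[] = { __VA_ARGS__ };		\
		struct link_cfg *c_ = &cfg_map[reg_];			\
		size_t i_;						\
		c_->speed = (speed_);					\
		memset(c_->supported, 0, sizeof(c_->supported));	\
		memset(c_->advertised, 0, sizeof(c_->advertised));	\
		for (i_ = 0; i_ < ARRAY_SIZE(modes_); ++i_) {		\
			set_bit_ul(modes_[i_], c_->supported);		\
			set_bit_ul(modes_[i_], c_->advertised);		\
		}							\
	} while (0)

static void build_cfg_map(void)
{
	BUILD_CFG(REG_100BASE_TX, 100, MODE_100baseT_Full);
	BUILD_CFG(REG_1000BASE_T, 1000, MODE_1000baseT_Full);
	BUILD_CFG(REG_1000BASE_KX, 1000, MODE_1000baseKX_Full);
}
```

Because the bitmaps are arrays of unsigned long sized at compile time, widening the mask only means changing the width constant, not the table-building code.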

[PATCH net-next v5 00/19] new ETHTOOL_GSETTINGS/SSETTINGS API

2015-12-14 Thread David Decotigny

From: David Decotigny 


History:
 v5
 note: please see v4 bullets for a question regarding bitmap.c
 - minor fix for allyesconfig/allmodconfig builds
 v4
 - removed typedef for link mode bitmaps
 - moved bitmap<->u32[] conversion routines to bitmap.c. This is the
   naive implementation. I have an endian-aware version that uses
   memcpy/memset as much as possible, but I find it harder to follow
   (see http://paste.ubuntu.com/13863722/). Please let me know if I
   should use it instead.
 - fixes suggested by Ben Hutchings
 v3
 - rebased v2 on top of latest net-next, minor checkpatch/printf %*pb
   updates
 v2
 - keep return 0 in get_settings when successful, instead of
   propagating positive result from driver's get_settings callback.
 v1
 - original submission


The main goal of this series is to support ethtool link mode masks
larger than 32 bits. It implements a new ioctl pair
(ETHTOOL_GSETTINGS/SSETTINGS), its associated callbacks
(get/set_ksettings) and a new struct ethtool_settings, which should
eventually replace legacy ethtool_cmd. Internally, the kernel uses
fixed length link mode masks defined at compilation time in ethtool.h
(for now: 31 bits), that can be increased by changing
__ETHTOOL_LINK_MODE_LAST in ethtool.h (absolute max is 4064 bits,
checked at compile time), and the user/kernel interface allows this
length to be arbitrary within 1..4064. This should allow some
flexibility without using too much heap/stack space, at the cost of
a small kernel/user handshake for the user to determine the sizes of
those bitmaps.
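
The kernel keeps link-mode masks as unsigned long bitmaps, while the user/kernel interface exchanges them as arrays of 32-bit words, so each direction needs a conversion. The series adds such routines to lib/bitmap.c; the sketch below is only a naive, bit-by-bit illustration of the idea with hypothetical function names, not the code from that patch. Copying one bit at a time makes the result independent of the host's word size and endianness, at the cost of speed.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Bit i of the bitmap goes to bit (i % 32) of dst[i / 32]; bits beyond
 * nbits are left cleared. */
static void bitmap_to_u32_words(uint32_t *dst, size_t nwords,
				const unsigned long *src, size_t nbits)
{
	size_t i;

	for (i = 0; i < nwords; i++)
		dst[i] = 0;
	for (i = 0; i < nbits && i < nwords * 32; i++)
		if ((src[i / BITS_PER_LONG] >> (i % BITS_PER_LONG)) & 1UL)
			dst[i / 32] |= (uint32_t)1 << (i % 32);
}

/* Inverse direction: rebuild an nbits-wide bitmap from 32-bit words. */
static void bitmap_from_u32_words(unsigned long *dst, size_t nbits,
				  const uint32_t *src, size_t nwords)
{
	size_t i;

	for (i = 0; i < (nbits + BITS_PER_LONG - 1) / BITS_PER_LONG; i++)
		dst[i] = 0;
	for (i = 0; i < nbits && i < nwords * 32; i++)
		if ((src[i / 32] >> (i % 32)) & 1u)
			dst[i / BITS_PER_LONG] |= 1UL << (i % BITS_PER_LONG);
}
```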

Along the way, I chose to drop from the new structure the 3 ethtool_cmd
fields marked "deprecated" (transceiver/maxrxpkt/maxtxpkt). They are
still available for old drivers via the (old) ETHTOOL_GSET/SSET API,
but are not available to drivers that switch to the new API. Of those 3
fields, ethtool_cmd::transceiver seems to still be actively used by
several drivers; maybe we should not consider this field deprecated?
The 2 other fields are basically unused. This transition requires
some care in the way old and new ethtool talk to the kernel.

More technical details provided in the description for main patch. In
particular details about backward compatibility properties.

Some questions for people more expert than me:
 - the kernel/user interface multiplexes the "tell me the bitmap length"
   handshake and the "give me the settings" request inside the new
   ETHTOOL_GSETTINGS cmd. I was thinking of splitting this into 2
   separate cmds: 1 cmd ETHTOOL_GKERNELPROPERTIES which would be
   kernel-wide rather than device-specific, would return properties
   like "length of the link mode bitmaps", and possibly others. And
   ETHTOOL_GSETTINGS would expect the proper bitmaps.
 - the link mode bitmaps are piggybacked at tail of the new struct
   ethtool_settings. Since its user-visible definition does not assume
   specific bitmap width, I am using a 0-length array as the publicly
   visible placeholder. But then, the kernel needs to specialize it
   (struct ethtool_ksettings) to specify its current link mode
   masks. This means that kernel code is "littered" with
   "ksettings->parent.field" to access "field" inside
   ethtool_settings:
   + I don't like the field name "parent"; any suggestions welcome
   + and/or: I could use ethtool_settings everywhere (instead of a new
 ethtool_ksettings) and an accessor to retrieve the link mode
 masks?
   + or: we could decide to make the link mode masks statically
 bounded again, ie. make their width public, but larger than
 current 32, and unchangeable forever. This would make everything
 straightforward, but we might hit limits later, or have an
 unneeded memory/stack usage for unused bits.
   any preference?
 - I foresee bugs where people use the legacy/deprecated SUPPORTED_x
   macros instead of the new ETHTOOL_LINK_MODE_x_BIT enums in the new
   get/set_ksettings callbacks. Not sure how to prevent problems with
   this.

The only driver converted so far is mlx4. I do not consider fcoe
fully converted, but I updated it minimally so that
__ethtool_get_settings could be removed in favor of
__ethtool_get_ksettings.

Tested with legacy and "future" ethtool on 64b x86 kernel and 32+64b
ethtool, and on a 32b x86 kernel + 32b ethtool.


# Patch Set Summary:

David Decotigny (19):
  lib/bitmap.c: conversion routines to/from u32 array
  test_bitmap: unit tests for lib/bitmap.c
  net: usnic: remove unused call to ethtool_ops::get_settings
  net: usnic: use __ethtool_get_settings
  net: ethtool: add new ETHTOOL_GSETTINGS/SSETTINGS API
  tx4939: use __ethtool_get_ksettings
  net: usnic: use __ethtool_get_ksettings
  net: bonding: use __ethtool_get_ksettings
  net: ipvlan: use __ethtool_get_ksettings
  net: macvlan: use __ethtool_get_ksettings
  net: team: use __ethtool_get_ksettings
  net: fcoe: use __ethtool_get_ksettings
  net: rdma: use __ethtool_get_ksettings
  net: 8021q: use

[PATCH] Fix int1 recursion with unregistered breakpoints

2015-12-14 Thread Jeff Merkey

Please consider the attached patch.

I have reviewed all the code that touches this patch and have
determined it will function and support all of the software that
depends on this handler properly.  I have compiled and tested this
patch with a test harness that tests the robustness of the linux
breakpoint API and handlers in the following ways:

1.  Setting multiple conditional breakpoints through
arch_install_hw_breakpoint API across four processors to test the rate
at which the interface can handle breakpoint exceptions

2.  Setting unregistered breakpoints to test the handler's robustness
in dealing with error handling conditions and errant or spurious
hardware conditions, and to simulate actual "lazy debug register
switching" (which does not work BTW) with null bp handlers to test the
robustness of the handlers.

3.  Clearing and setting breakpoints across multiple processors then
triggering concurrent exceptions in both interrupt and process
contexts.

This patch improves the robustness of the Linux kernel in several ways:

1.  Corrects a bug in handling unregistered breakpoints.

2.  Provides a hardware check of dr7 to determine the source of the
breakpoint if the OS cannot ascertain the int1 source from its own
state and variables.

3.  Actually allows "lazy debug register switching" to function, which
until recently has apparently never been seen on live hardware or
tested.

Signed-off-by: Jeff Merkey 
---
 arch/x86/kernel/hw_breakpoint.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/hw_breakpoint.c b/arch/x86/kernel/hw_breakpoint.c
index 50a3fad..ca13db0 100644
--- a/arch/x86/kernel/hw_breakpoint.c
+++ b/arch/x86/kernel/hw_breakpoint.c
@@ -444,7 +444,7 @@ EXPORT_SYMBOL_GPL(hw_breakpoint_restore);
 static int hw_breakpoint_handler(struct die_args *args)
 {
int i, cpu, rc = NOTIFY_STOP;
-   struct perf_event *bp;
+   struct perf_event *bp = NULL;
unsigned long dr7, dr6;
unsigned long *dr6_p;
 
@@ -475,6 +475,13 @@ static int hw_breakpoint_handler(struct die_args *args)
for (i = 0; i < HBP_NUM; ++i) {
if (likely(!(dr6 & (DR_TRAP0 << i))))
continue;
+   /*
+   * check if we got an execute breakpoint
+   * from the dr7 register.  if we did, set
+   * the resume flag to avoid int1 recursion.
+   */
+   if ((dr7 & (3 << ((i * 4) + 16))) == 0)
+   args->regs->flags |= X86_EFLAGS_RF;
 
/*
 * The counter may be concurrently released but that can only
@@ -503,7 +510,9 @@ static int hw_breakpoint_handler(struct die_args *args)
 
/*
 * Set up resume flag to avoid breakpoint recursion when
-* returning back to origin.
+* returning back to origin.  Perform the check
+   * twice in case the event handler altered the
+   * system flags.
 */
if (bp->hw.info.type == X86_BREAKPOINT_EXECUTE)
args->regs->flags |= X86_EFLAGS_RF;
@@ -519,6 +528,18 @@ static int hw_breakpoint_handler(struct die_args *args)
(dr6 & (~DR_TRAP_BITS)))
rc = NOTIFY_DONE;
 
+   /*
+   * if we are about to signal to
+   * do_debug() to stop further processing
+   * and we have not ascertained the source
+   * of the breakpoint, log it as spurious.
+   */
+   if (rc == NOTIFY_STOP && !bp) {
+   printk_ratelimited(KERN_INFO
+   "INFO: spurious INT1 exception dr6: 0x%lX dr7: 0x%lX\n",
+   dr6, dr7);
+   }
+
set_debugreg(dr7, 7);
put_cpu();
 
-- 
1.8.3.1

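The new dr7 test relies on the x86 DR7 layout: the two-bit R/W field for breakpoint i sits at bits 16 + 4*i and 17 + 4*i, and the value 00 means "break on instruction execution", which is the case where the resume flag (RF) must be set to avoid re-triggering. The userspace sketch below decodes that field with hypothetical helper names and compares it with the expression used in the patch.

```c
/* Values of the two-bit R/W field for one breakpoint in DR7. */
enum dr7_rw {
	DR7_RW_EXEC  = 0,	/* instruction execution */
	DR7_RW_WRITE = 1,	/* data writes */
	DR7_RW_IO    = 2,	/* I/O reads or writes */
	DR7_RW_RDWR  = 3,	/* data reads or writes */
};

/* Extract R/W for breakpoint i (0..3) from bits 16+4i and 17+4i. */
static unsigned int dr7_rw(unsigned long dr7, unsigned int i)
{
	return (dr7 >> (16 + i * 4)) & 3;
}

static int dr7_is_exec_bp(unsigned long dr7, unsigned int i)
{
	return dr7_rw(dr7, i) == DR7_RW_EXEC;
}

/* The expression used by the patch, for comparison. */
static int patch_is_exec_bp(unsigned long dr7, unsigned int i)
{
	return (dr7 & (3 << ((i * 4) + 16))) == 0;
}
```

An unconfigured slot also reads as R/W = 00, so this test is only meaningful for slots whose DR6 trap bit is set, which is where the patch applies it.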

Re: 4.4-rc5: ugly warn on: 5 W+X pages found

2015-12-14 Thread Andy Lutomirski

On Mon, Dec 14, 2015 at 12:26 PM, Pavel Machek  wrote:
> Hi!
>
>> > I know. But either someone cares, and it should be fixed, or no one
>> > cares, and the check should be removed.
>>
>> Someone cares, and it should be scheduled to be fixed for 4.5. The EFI
>> mapping changes that were required to avoid the warning were much too
>> big and late to make 4.4.
>>
>> So for now, don't enable CONFIG_DEBUG_WX. Unless you want to
>> actively debug the EFI mapping changes, that is. Which I heartily
>> recommend people doing.
>
> Ok, good, except... This is thinkpad X60. Good old BIOS. It should
> have no EFI.
>
> pavel@duo:~$ dmesg | grep EFI
> pavel@duo:~$
>
> From the messages I got:
>
>> [3.285993] x86/mm: Found insecure W+X mapping at address
>> ffe69000/0xffe69000
>
> ---[ Persisent kmap() Area ]---
> 0xffc00000-0xffd28000 1184K   pte
> 0xffd28000-0xffddd000 724K RW GLB NX pte
> 0xffddd000-0xffe69000 560K   pte
> 0xffe69000-0xffe6e000  20K RW GLB x  pte
> 0xffe6e000-0xffe6f000   4K   pte
> ---[ Fixmap Area ]---
>
> That is not EFI, right?

That's weird.  The only API to do that seems to be manually setting
kmap_prot to _PAGE_KERNEL_EXEC, and nothing does that.  (Why is
kmap_prot a variable on x86 at all?  It has exactly one writer, and
that's the code that initializes it in the first place.  Shouldn't we
#define kmap_prot _PAGE_KERNEL?)

--Andy

Re: [PATCH 0/3] cpuidle: avoid module usage in non-modular code

2015-12-14 Thread Rafael J. Wysocki

On Sunday, December 13, 2015 06:57:09 PM Paul Gortmaker wrote:
> This series of commits is a part of a larger project to ensure
> people don't reference modular support functions in non-modular
> code.  Overall there were roughly 5k lines of dead code in the
> kernel due to this.  So far we've fixed several areas, like tty,
> x86, net, ... and we continue to work on other areas.
> 
> There are several reasons to not use module support for code that
> can never be built as a module, but the big ones are:
> 
>  (1) it is easy to accidentally code up unused module_exit and remove code
>  (2) it can be misleading when reading the source, thinking it can be
>   modular when the Makefile and/or Kconfig prohibit it
>  (3) it requires the include of the module.h header file which in turn
>  includes nearly everything else.
> 
> Fortunately for cpuidle, the changes are largely trivial and change
> zero runtime.  All the changes here just remap the modular functions
> onto the non-modular ones that they would be remapped onto anyway.
> 
> Changes are against linux-next and compile tested on ARM allmodconfig.
> I've Cc'd ARM list because all of these are used on ARM, but I'm
> thinking these probably can go in via the PM tree.

If no one objects, I can queue up this series for 4.5 unless you have other
plans with respect to it.

Thanks,
Rafael


Re: [PATCH v7 1/4] acpi: pci: Setup MSI domain for ACPI based pci devices

2015-12-14 Thread Rafael J. Wysocki

On Monday, December 14, 2015 03:13:48 PM Marc Zyngier wrote:
> On 10/12/15 16:55, Suravee Suthikulpanit wrote:
> > This patch introduces pci_msi_register_fwnode_provider() for irqchip
> > to register a callback, to provide a way to determine appropriate MSI
> > domain for a pci device.
> > 
> > It also introduces pci_host_bridge_acpi_msi_domain(), which returns
> > the MSI domain of the specified PCI host bridge with DOMAIN_BUS_PCI_MSI
> > bus token. Then, it is assigned to pci device.
> > 
> > Reviewed-by: Marc Zyngier 
> > Cc: Bjorn Helgaas 
> > Cc: Rafael J. Wysocki 
> > Signed-off-by: Suravee Suthikulpanit 
> 
> Bjorn, Rafael,
> 
> Do you have any comment on this?
> 
> I was hoping to queue this work (and the 3 patches that depend on it)
> for 4.5, but if you don't have the bandwidth to review it, I'll postpone
> it to the following merge window.

How much time do we have to look at it before it is postponed?

Thanks,
Rafael


Re: [RFC PATCH 3/3] x86: Create dma_mark_dirty to dirty pages used for DMA by VM guest

2015-12-14 Thread Michael S. Tsirkin

On Mon, Dec 14, 2015 at 09:59:13AM -0800, Alexander Duyck wrote:
> On Mon, Dec 14, 2015 at 9:20 AM, Michael S. Tsirkin  wrote:
> > On Mon, Dec 14, 2015 at 08:34:00AM -0800, Alexander Duyck wrote:
> >> > This way distro can use a guest agent to disable
> >> > dirtying until before migration starts.
> >>
> >> Right.  For a v2 version I would definitely want to have some way to
> >> limit the scope of this.  My main reason for putting this out here is
> >> to start altering the course of discussions since it seems like were
> >> weren't getting anywhere with the ixgbevf migration changes that were
> >> being proposed.
> >
> > Absolutely, thanks for working on this.
> >
> >> >> + unsigned long pg_addr, start;
> >> >> +
> >> >> + start = (unsigned long)addr;
> >> >> + pg_addr = PAGE_ALIGN(start + size);
> >> >> + start &= ~(sizeof(atomic_t) - 1);
> >> >> +
> >> >> + /* trigger a write fault on each page, excluding first page */
> >> >> + while ((pg_addr -= PAGE_SIZE) > start)
> >> >> + atomic_add(0, (atomic_t *)pg_addr);
> >> >> +
> >> >> + /* trigger a write fault on first word of DMA */
> >> >> + atomic_add(0, (atomic_t *)start);

Actually, I have second thoughts about using atomic_add here,
especially for _sync.

Many architectures do

#define ATOMIC_OP_RETURN(op, c_op)  \
static inline int atomic_##op##_return(int i, atomic_t *v)  \
{   \
unsigned long flags;\
int ret;\
\
raw_local_irq_save(flags);  \
ret = (v->counter = v->counter c_op i); \
raw_local_irq_restore(flags);   \
\
return ret; \
}

and this is not safe if device is still doing DMA to/from
this memory.

Generally, atomic_t is there for SMP effects, not for sync
with devices.

This is why I said you should do
cmpxchg(pg_addr, 0xdead, 0xdead); 

Yes, we probably never actually want to run m68k within a VM,
but let's not misuse interfaces like this.


> >> >
> >> > start might not be aligned correctly for a cast to atomic_t.
> >> > It's harmless to do this for any memory, so I think you should
> >> > just do this for 1st byte of all pages including the first one.
> >>
> >> You may not have noticed it but I actually aligned start in the line
> >> after pg_addr.
> >
> > Yes you did. alignof would make it a bit more noticeable.
> >
> >>  However instead of aligning to the start of the next
> >> atomic_t I just masked off the lower bits so that we start at the
> >> DWORD that contains the first byte of the starting address.  The
> >> assumption here is that I cannot trigger any sort of fault since if I
> >> have access to a given byte within a DWORD I will have access to the
> >> entire DWORD.
> >
> > I'm curious where does this come from.  Isn't it true that access is
> > controlled at page granularity normally, so you can touch beginning of
> > page just as well?
> 
> Yeah, I am pretty sure it probably is page granularity.  However my
> thought was to try and stick to the start of the DMA as the last
> access.  That way we don't pull in any more cache lines than we need
> to in order to dirty the pages.  Usually the start of the DMA region
> will contain some sort of headers or something that needs to be
> accessed with the highest priority so I wanted to make certain that we
> were forcing usable data into the L1 cache rather than just the first
> cache line of the page where the DMA started.  If however the start of
> a DMA was the start of the page there is nothing there to prevent
> that.

OK, maybe this helps. You should document all these tricks
in code comments.

> >>  I coded this up so that the spots where we touch the
> >> memory should match up with addresses provided by the hardware to
> >> perform the DMA over the PCI bus.
> >
> > Yes but there's no requirement to do it like this from
> > virt POV. You just need to touch each page.
> 
> I know, but at the same time if we match up with the DMA then it is
> more likely that we avoid grabbing unneeded cache lines.  In the case
> of most drivers the data for headers and start is at the start of the
> DMA.  So if we dirty the cache line associated with the start of the
> DMA it will be pulled into the L1 cache and there is a greater chance
> that it may already be prefetched as well.
> 
> >> Also I intentionally ran from highest address to lowest since that way
> >> we don't risk pushing the first cache line of the DMA buffer out of
> >> the L1 cache due to the PAGE_SIZE stride.
> >
> > Interesting.
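
To make the access pattern under discussion concrete, the sketch below only records the addresses the quoted dma_mark_dirty() loop would write to, instead of writing to them. It assumes 4 KiB pages and uses sizeof(int) in place of sizeof(atomic_t); the function name is hypothetical, and the write itself (atomic_add() in the patch, cmpxchg() as suggested above) is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE	4096UL	/* assumption: 4 KiB pages */
#define PAGE_ALIGN(x)	(((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

/* Store up to max touch addresses in out[], return how many were stored. */
static size_t dirty_touch_points(uintptr_t addr, size_t size,
				 uintptr_t *out, size_t max)
{
	uintptr_t start = addr;
	uintptr_t pg_addr = PAGE_ALIGN(start + size);
	size_t n = 0;

	/* Round down to the containing int-sized word (atomic_t in the patch). */
	start &= ~(uintptr_t)(sizeof(int) - 1);

	/* One write per page, highest page first, excluding the first page. */
	while ((pg_addr -= PAGE_SIZE) > start && n < max)
		out[n++] = pg_addr;

	/* Last, the word holding the first byte of the DMA buffer. */
	if (n < max)
		out[n++] = start;
	return n;
}
```

For a buffer at 0x1010 spanning 0x2000 bytes this yields 0x3000, 0x2000 and finally 0x1010: every page is covered, and the start of the buffer is touched last, matching the stated goal of leaving it hot in the L1 cache.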

Re: [PATCH v6 4/4] x86: mm: support ARCH_MMAP_RND_BITS.

2015-12-14 Thread Daniel Cashman

On 12/14/2015 10:58 AM, H. Peter Anvin wrote:
> On 12/11/15 09:52, Daniel Cashman wrote:
>> diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
>> index 844b06d..647fecf 100644
>> --- a/arch/x86/mm/mmap.c
>> +++ b/arch/x86/mm/mmap.c
>> @@ -69,14 +69,14 @@ unsigned long arch_mmap_rnd(void)
>>  {
>>  unsigned long rnd;
>>  
>> -/*
>> - *  8 bits of randomness in 32bit mmaps, 20 address space bits
>> - * 28 bits of randomness in 64bit mmaps, 40 address space bits
>> - */
>>  if (mmap_is_ia32())
>> -rnd = (unsigned long)get_random_int() % (1<<8);
>> +#ifdef CONFIG_COMPAT
>> +rnd = (unsigned long)get_random_int() % (1 << 
>> mmap_rnd_compat_bits);
>> +#else
>> +rnd = (unsigned long)get_random_int() % (1 << mmap_rnd_bits);
>> +#endif
>>  else
>> -rnd = (unsigned long)get_random_int() % (1<<28);
>> +rnd = (unsigned long)get_random_int() % (1 << mmap_rnd_bits);
>>  
>>  return rnd << PAGE_SHIFT;
>>  }
>>
> 
> Now, you and I know that both variants can be implemented with a simple
> AND, but I have a strong suspicion that once this is turned into a
> variable, this will in fact be changed from an AND to a divide.
> 
> So I'd prefer to use the
> "get_random_int() & ((1UL << mmap_rnd_bits) - 1)" construct instead.

Good point.  Will change in v7 across patch-set.

Thank You,
Dan

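The point being agreed on: for unsigned values and a power-of-two modulus, "x % (1 << n)" and "x & ((1 << n) - 1)" give the same result, but once n is a runtime variable the compiler may not be able to prove the divisor is a power of two and could emit a real division, while the mask form never needs one. A small userspace sketch (hypothetical helper names) checking the equivalence:

```c
#include <stdint.h>

/* Both helpers assume bits is smaller than the width of unsigned long. */
static unsigned long rnd_mod(uint32_t r, unsigned int bits)
{
	/* Divisor known only at run time: may compile to a division. */
	return (unsigned long)r % (1UL << bits);
}

static unsigned long rnd_mask(uint32_t r, unsigned int bits)
{
	/* Keeps the low "bits" bits; no division needed. */
	return (unsigned long)r & ((1UL << bits) - 1);
}
```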

[PATCH V03 3/5] dmaengine: core: Introduce new, universal API to request a channel

2015-12-14 Thread Peter Ujfalusi

The two API functions can cover most, if not all, current APIs used to
request a channel. With minimal effort, dmaengine drivers, platforms and
dmaengine user drivers can be converted to use the two functions.

struct dma_chan *dma_request_chan_by_mask(const dma_cap_mask_t *mask);

Requests any channel matching the requested capabilities. It can be used
to request a channel for memcpy, memset, xor, etc., where no hardware
synchronization is needed.

struct dma_chan *dma_request_chan(struct device *dev, const char *name);

Requests a slave channel. dma_request_chan() tries to find the channel via
DT or ACPI; if the kernel booted in non-DT/ACPI mode, it uses a filter
lookup table and retrieves the needed information from the dma_slave_map
provided by the DMA drivers.
This legacy mode needs changes in platform code and in dmaengine drivers;
after that, the dmaengine user drivers can be converted:

For each dmaengine driver, an array of entries (DMA client device name,
slave channel name, filter function parameter) needs to be added:

static const struct dma_slave_map da830_edma_map[] = {
{ "davinci-mcasp.0", "rx", EDMA_FILTER_PARAM(0, 0) },
{ "davinci-mcasp.0", "tx", EDMA_FILTER_PARAM(0, 1) },
{ "davinci-mcasp.1", "rx", EDMA_FILTER_PARAM(0, 2) },
{ "davinci-mcasp.1", "tx", EDMA_FILTER_PARAM(0, 3) },
{ "davinci-mcasp.2", "rx", EDMA_FILTER_PARAM(0, 4) },
{ "davinci-mcasp.2", "tx", EDMA_FILTER_PARAM(0, 5) },
{ "spi_davinci.0", "rx", EDMA_FILTER_PARAM(0, 14) },
{ "spi_davinci.0", "tx", EDMA_FILTER_PARAM(0, 15) },
{ "da830-mmc.0", "rx", EDMA_FILTER_PARAM(0, 16) },
{ "da830-mmc.0", "tx", EDMA_FILTER_PARAM(0, 17) },
{ "spi_davinci.1", "rx", EDMA_FILTER_PARAM(0, 18) },
{ "spi_davinci.1", "tx", EDMA_FILTER_PARAM(0, 19) },
};

This information is needed by the dmaengine driver, so the platform_data
has to be modified and the driver map added to the pdata of the DMA
driver:

da8xx_edma0_pdata.slave_map = da830_edma_map;
da8xx_edma0_pdata.slavecnt = ARRAY_SIZE(da830_edma_map);

The DMA driver then needs to configure the needed device -> filter_fn
mapping before it registers with dma_async_device_register():

ecc->dma_slave.filter.map = info->slave_map;
ecc->dma_slave.filter.mapcnt = info->slavecnt;
ecc->dma_slave.filter.fn = edma_filter_fn;

When neither DT nor ACPI lookup is available, dma_request_chan() tries to
match the requester's device name against the device names in the filter
map; when a match is found, it uses the information from the dma_slave_map
to get the channel with the find_candidate() internal function.

Signed-off-by: Peter Ujfalusi 
Reviewed-by: Arnd Bergmann 
---
 Documentation/dmaengine/client.txt | 23 +++---
 drivers/dma/dmaengine.c| 89 +-
 include/linux/dmaengine.h  | 51 +++---
 3 files changed, 127 insertions(+), 36 deletions(-)

diff --git a/Documentation/dmaengine/client.txt b/Documentation/dmaengine/client.txt
index d9f9f461102a..9e33189745f0 100644
--- a/Documentation/dmaengine/client.txt
+++ b/Documentation/dmaengine/client.txt
@@ -22,25 +22,14 @@ The slave DMA usage consists of following steps:
Channel allocation is slightly different in the slave DMA context,
client drivers typically need a channel from a particular DMA
controller only and even in some cases a specific channel is desired.
-   To request a channel dma_request_channel() API is used.
+   To request a channel dma_request_chan() API is used.
 
Interface:
-   struct dma_chan *dma_request_channel(dma_cap_mask_t mask,
-   dma_filter_fn filter_fn,
-   void *filter_param);
-   where dma_filter_fn is defined as:
-   typedef bool (*dma_filter_fn)(struct dma_chan *chan, void *filter_param);
-
-   The 'filter_fn' parameter is optional, but highly recommended for
-   slave and cyclic channels as they typically need to obtain a specific
-   DMA channel.
-
-   When the optional 'filter_fn' parameter is NULL, dma_request_channel()
-   simply returns the first channel that satisfies the capability mask.
-
-   Otherwise, the 'filter_fn' routine will be called once for each free
-   channel which has a capability in 'mask'.  'filter_fn' is expected to
-   return 'true' when the desired DMA channel is found.
+   struct dma_chan *dma_request_chan(struct device *dev, const char *name);
+
+   Which will find and return the 'name' DMA channel associated with the 'dev'
+   device. The association is done via DT, ACPI or board file based
+   dma_slave_map matching table.
 
A channel allocated via this interface is exclusive to the caller,
until dma_release_channel() is called.
diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index ea9d66982d40..c50a247be2e0 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -43,6 +43,7 @@
 
 #define

[PATCH V03 5/5] dmaengine: omap-dma: Add support for DMA filter mapping to slave devices

2015-12-14 Thread Peter Ujfalusi

Add support for providing a device-to-filter_fn mapping so client drivers
can switch to the dma_request_chan() API.

Signed-off-by: Peter Ujfalusi 
Reviewed-by: Arnd Bergmann 
---
 drivers/dma/omap-dma.c   | 4 
 include/linux/omap-dma.h | 6 ++
 2 files changed, 10 insertions(+)

diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c
index f86827ac0c8a..9794b073d7d7 100644
--- a/drivers/dma/omap-dma.c
+++ b/drivers/dma/omap-dma.c
@@ -1153,6 +1153,10 @@ static int omap_dma_probe(struct platform_device *pdev)
return rc;
}
 
+   od->ddev.filter.map = od->plat->slave_map;
+   od->ddev.filter.mapcnt = od->plat->slavecnt;
+   od->ddev.filter.fn = omap_dma_filter_fn;
+
rc = dma_async_device_register(&od->ddev);
if (rc) {
pr_warn("OMAP-DMA: failed to register slave DMA engine device: %d\n",
diff --git a/include/linux/omap-dma.h b/include/linux/omap-dma.h
index 88fa8af2b937..1d99b61adc65 100644
--- a/include/linux/omap-dma.h
+++ b/include/linux/omap-dma.h
@@ -267,6 +267,9 @@ struct omap_dma_reg {
u8  type;
 };
 
+#define SDMA_FILTER_PARAM(hw_req)  ((int[]) { (hw_req) })
+struct dma_slave_map;
+
 /* System DMA platform data structure */
 struct omap_system_dma_plat_info {
const struct omap_dma_reg *reg_map;
@@ -278,6 +281,9 @@ struct omap_system_dma_plat_info {
void (*clear_dma)(int lch);
void (*dma_write)(u32 val, int reg, int lch);
u32 (*dma_read)(int reg, int lch);
+
+   const struct dma_slave_map *slave_map;
+   int slavecnt;
 };
 
 #ifdef CONFIG_ARCH_OMAP2PLUS
-- 
2.6.4


[PATCH V03 1/5] dmaengine: core: Skip mask matching when it is not provided to private_candidate

2015-12-14 Thread Peter Ujfalusi

If mask is NULL, skip matching the mask against the DMA device capabilities.

Signed-off-by: Peter Ujfalusi 
Reviewed-by: Andy Shevchenko 
Reviewed-by: Arnd Bergmann 
---
 drivers/dma/dmaengine.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index daf54a39bcc7..6311e1fc80be 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -515,7 +515,7 @@ static struct dma_chan *private_candidate(const dma_cap_mask_t *mask,
 {
struct dma_chan *chan;
 
-   if (!__dma_device_satisfies_mask(dev, mask)) {
+   if (mask && !__dma_device_satisfies_mask(dev, mask)) {
pr_debug("%s: wrong capabilities\n", __func__);
return NULL;
}
-- 
2.6.4


[PATCH V03 2/5] dmaengine: core: Move and merge the code paths using private_candidate

2015-12-14 Thread Peter Ujfalusi

Channel matching with private_candidate() is used in two paths; the error
checking differs slightly between them, and they also duplicate code.
Move the code under find_candidate() to provide consistent execution and to
allow this mode of channel lookup to be reused later.

Signed-off-by: Peter Ujfalusi 
Reviewed-by: Andy Shevchenko 
Reviewed-by: Arnd Bergmann 
---
 drivers/dma/dmaengine.c | 81 +
 1 file changed, 42 insertions(+), 39 deletions(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 6311e1fc80be..ea9d66982d40 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -546,6 +546,42 @@ static struct dma_chan *private_candidate(const dma_cap_mask_t *mask,
return NULL;
 }
 
+static struct dma_chan *find_candidate(struct dma_device *device,
+  const dma_cap_mask_t *mask,
+  dma_filter_fn fn, void *fn_param)
+{
+   struct dma_chan *chan = private_candidate(mask, device, fn, fn_param);
+   int err;
+
+   if (chan) {
+   /* Found a suitable channel, try to grab, prep, and return it.
+* We first set DMA_PRIVATE to disable balance_ref_count as this
+* channel will not be published in the general-purpose
+* allocator
+*/
+   dma_cap_set(DMA_PRIVATE, device->cap_mask);
+   device->privatecnt++;
+   err = dma_chan_get(chan);
+
+   if (err) {
+   if (err == -ENODEV) {
+   pr_debug("%s: %s module removed\n", __func__,
+dma_chan_name(chan));
+   list_del_rcu(&device->global_node);
+   } else
+   pr_debug("%s: failed to get %s: (%d)\n",
+__func__, dma_chan_name(chan), err);
+
+   if (--device->privatecnt == 0)
+   dma_cap_clear(DMA_PRIVATE, device->cap_mask);
+
+   chan = ERR_PTR(err);
+   }
+   }
+
+   return chan ? chan : ERR_PTR(-EPROBE_DEFER);
+}
+
 /**
  * dma_get_slave_channel - try to get specific channel exclusively
  * @chan: target channel
@@ -584,7 +620,6 @@ struct dma_chan *dma_get_any_slave_channel(struct dma_device *device)
 {
dma_cap_mask_t mask;
struct dma_chan *chan;
-   int err;
 
dma_cap_zero(mask);
dma_cap_set(DMA_SLAVE, mask);
@@ -592,23 +627,11 @@ struct dma_chan *dma_get_any_slave_channel(struct dma_device *device)
/* lock against __dma_request_channel */
mutex_lock(&dma_list_mutex);
 
-   chan = private_candidate(&mask, device, NULL, NULL);
-   if (chan) {
-   dma_cap_set(DMA_PRIVATE, device->cap_mask);
-   device->privatecnt++;
-   err = dma_chan_get(chan);
-   if (err) {
-   pr_debug("%s: failed to get %s: (%d)\n",
-   __func__, dma_chan_name(chan), err);
-   chan = NULL;
-   if (--device->privatecnt == 0)
-   dma_cap_clear(DMA_PRIVATE, device->cap_mask);
-   }
-   }
+   chan = find_candidate(device, &mask, NULL, NULL);
 
mutex_unlock(&dma_list_mutex);
 
-   return chan;
+   return IS_ERR(chan) ? NULL : chan;
 }
 EXPORT_SYMBOL_GPL(dma_get_any_slave_channel);
 
@@ -625,35 +648,15 @@ struct dma_chan *__dma_request_channel(const dma_cap_mask_t *mask,
 {
struct dma_device *device, *_d;
struct dma_chan *chan = NULL;
-   int err;
 
/* Find a channel */
mutex_lock(&dma_list_mutex);
list_for_each_entry_safe(device, _d, &dma_device_list, global_node) {
-   chan = private_candidate(mask, device, fn, fn_param);
-   if (chan) {
-   /* Found a suitable channel, try to grab, prep, and
-* return it.  We first set DMA_PRIVATE to disable
-* balance_ref_count as this channel will not be
-* published in the general-purpose allocator
-*/
-   dma_cap_set(DMA_PRIVATE, device->cap_mask);
-   device->privatecnt++;
-   err = dma_chan_get(chan);
+   chan = find_candidate(device, mask, fn, fn_param);
+   if (!IS_ERR(chan))
+   break;
 
-   if (err == -ENODEV) {
-   pr_debug("%s: %s module removed\n",
-__func__, dma_chan_name(chan));
-   list_del_rcu(&device->global_node);
-   } else if (err)
-   pr_debug("%s: failed to get %s: (%d)\n",
-

[PATCH V03 4/5] dmaengine: edma: Add support for DMA filter mapping to slave devices

2015-12-14 Thread Peter Ujfalusi

Add support for providing a device-to-filter_fn mapping so client drivers
can switch to the dma_request_chan() API.

Signed-off-by: Peter Ujfalusi 
Reviewed-by: Arnd Bergmann 
---
 drivers/dma/edma.c | 4 
 include/linux/platform_data/edma.h | 7 +++
 2 files changed, 11 insertions(+)

diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c
index 16fe773fb846..2e8acde6b134 100644
--- a/drivers/dma/edma.c
+++ b/drivers/dma/edma.c
@@ -2314,6 +2314,10 @@ static int edma_probe(struct platform_device *pdev)
edma_set_chmap(&ecc->slave_chans[i], ecc->dummy_slot);
}
 
+   ecc->dma_slave.filter.map = info->slave_map;
+   ecc->dma_slave.filter.mapcnt = info->slavecnt;
+   ecc->dma_slave.filter.fn = edma_filter_fn;
+
ret = dma_async_device_register(&ecc->dma_slave);
if (ret) {
dev_err(dev, "slave ddev registration failed (%d)\n", ret);
diff --git a/include/linux/platform_data/edma.h b/include/linux/platform_data/edma.h
index 4299f4ba03bd..0a533f94438f 100644
--- a/include/linux/platform_data/edma.h
+++ b/include/linux/platform_data/edma.h
@@ -53,12 +53,16 @@ enum dma_event_q {
 #define EDMA_CTLR(i)   ((i) >> 16)
 #define EDMA_CHAN_SLOT(i)  ((i) & 0xffff)
 
+#define EDMA_FILTER_PARAM(ctlr, chan)  ((int[]) { EDMA_CTLR_CHAN(ctlr, chan) })
+
 struct edma_rsv_info {
 
const s16   (*rsv_chans)[2];
const s16   (*rsv_slots)[2];
 };
 
+struct dma_slave_map;
+
 /* platform_data for EDMA driver */
 struct edma_soc_info {
/*
@@ -76,6 +80,9 @@ struct edma_soc_info {
 
s8  (*queue_priority_mapping)[2];
const s16   (*xbar_chans)[2];
+
+   const struct dma_slave_map *slave_map;
+   int slavecnt;
 };
 
 #endif
-- 
2.6.4


[PATCH V03 0/5] dmaengine: New 'universal' API for requesting channel

2015-12-14 Thread Peter Ujfalusi

Hi,

Changes since v2:
- in patch 3 some tabs got replaced by spaces, now they are fixed
- added Reviewed-by from Arnd

Changes since v1:
- Added Reviewed-by from Andy for patches 1-2; I decided not to add the
  Reviewed-by to patch 3 due to the changes since v1
- patch for omap-dma to support passing the filter setup to the core
- dma_request_slave_channel_reason() removed as a function; it is now
  defined as dma_request_chan()
- The warning printed when the DT or ACPI lookup fails and we fall back to
  the legacy lookup has been removed
- members of struct dma_filter have been revised for simplicity.

Changes since RFC v03:
- No longer RFC
- Dropped the arch/arm/mach-davinci and daVinci MMC and SPI patches so we
  don't have inter-subsystem issues.
- Comments from Andy on patch 3 have been addressed, with the exception of
  moving code over to device_property
- 'struct dma_filter_map' renamed to 'struct dma_slave_map'
- Code documentation added

Changes since RFC v02:
- Using has_acpi_companion() instead of ACPI_HANDLE()
- mask matching change within private_candidate()
- Fallback in dma_request_chan() when DT/ACPI lookup fails.
- Rename dma_get_channel() -> find_candidate()
- Arch code changes as suggested by Arnd
- Some documentation updated; more needs to be done.

Changes since RFC v01:
- dma_request_chan(); lost the mask parameter
- The new API does not rely on RESOURCE_DMA, instead the dma_slave_map table
  will be used to provide the needed information to the filter function in
  legacy mode
- Extended the example patches to convert most of daVinci to use the new API to
  request the DMA channels.

As it has been discussed in the following thread:
http://www.gossamer-threads.com/lists/linux/kernel/2181487#2181487

With this series I have taken a path that results in two new APIs, which can
already be used to convert most of the current users; with some work, all
users might be able to move to this set.
With this set, the filter_fn used for legacy (non-DT/ACPI) channel requests
no longer needs to be exported to client drivers, since the selection of the
correct filter_fn is done in the core.

So, the first proposal is to have:

struct dma_chan *dma_request_chan(struct device *dev, const char *name);
struct dma_chan *dma_request_chan_by_mask(const dma_cap_mask_t *mask);

dma_request_chan_by_mask() requests any channel matching the requested
capabilities. It can be used to request a channel for memcpy, memset, xor,
etc., where no hardware synchronization is needed.

dma_request_chan() requests a slave channel. It tries to find the channel
via DT or ACPI; if the kernel booted in non-DT/ACPI mode, it uses a filter
lookup table and retrieves the needed information from the dma_slave_map
provided by the DMA drivers.
This legacy mode needs changes in platform code and in dmaengine drivers;
after that, the dmaengine user drivers can be converted:

For each dmaengine driver, an array of entries (DMA client device name,
slave channel name, filter function parameter) needs to be added:

static const struct dma_slave_map da830_edma_map[] = {
{ "davinci-mcasp.0", "rx", EDMA_FILTER_PARAM(0, 0) },
{ "davinci-mcasp.0", "tx", EDMA_FILTER_PARAM(0, 1) },
{ "davinci-mcasp.1", "rx", EDMA_FILTER_PARAM(0, 2) },
{ "davinci-mcasp.1", "tx", EDMA_FILTER_PARAM(0, 3) },
{ "davinci-mcasp.2", "rx", EDMA_FILTER_PARAM(0, 4) },
{ "davinci-mcasp.2", "tx", EDMA_FILTER_PARAM(0, 5) },
{ "spi_davinci.0", "rx", EDMA_FILTER_PARAM(0, 14) },
{ "spi_davinci.0", "tx", EDMA_FILTER_PARAM(0, 15) },
{ "da830-mmc.0", "rx", EDMA_FILTER_PARAM(0, 16) },
{ "da830-mmc.0", "tx", EDMA_FILTER_PARAM(0, 17) },
{ "spi_davinci.1", "rx", EDMA_FILTER_PARAM(0, 18) },
{ "spi_davinci.1", "tx", EDMA_FILTER_PARAM(0, 19) },
};

This information is used by the dmaengine driver, so the platform_data has
to be modified and the driver map added to the pdata of the DMA driver:

da8xx_edma0_pdata.slave_map = da830_edma_map;
da8xx_edma0_pdata.slavecnt = ARRAY_SIZE(da830_edma_map);

The DMA driver then needs to configure the needed device -> filter_fn
mapping before it registers with dma_async_device_register():

ecc->dma_slave.filter.map = info->slave_map;
ecc->dma_slave.filter.mapcnt = info->slavecnt;
ecc->dma_slave.filter.fn = edma_filter_fn;

When neither DT nor ACPI lookup is available, dma_request_chan() tries to
match the requester's device name against the device names in the filter
map; when a match is found, it uses the information from the dma_slave_map
to get the channel with the find_candidate() internal function.

Tested on OMAP-L138 (dm850) EVM, with updated patches from RFC v03 [1].
Both legacy and DT boot works fine.

[1] https://www.mail-archive.com/linux-omap@vger.kernel.org/msg122016.html

Regards,
Peter
---
Peter Ujfalusi (5):
  dmaengine: core: Skip mask matching when it is not provided to

Re: [RFC/RFT PATCH] watchdog: Move watchdog device creation to watchdog_dev.c

2015-12-14 Thread Wim Van Sebroeck

On Sun, Dec 13, 2015 at 10:24:35PM -0800, Guenter Roeck wrote:

> On 12/13/2015 02:02 PM, Damien Riegel wrote:
> >On Mon, Dec 07, 2015 at 09:41:03PM +0100, Wim Van Sebroeck wrote:
> >>Hi All,
> >>
> >>>On 12/07/2015 08:15 AM, Damien Riegel wrote:
> On Sun, Dec 06, 2015 at 11:51:41AM -0800, Guenter Roeck wrote:
> >The watchdog character device is currently created in
> >watchdog_dev.c, and the watchdog device in watchdog_core.c. This
> >results in cross-dependencies, as the device creation needs to
> >know the watchdog character device number.
> >
> >On top of that, the watchdog character device is created before
> >the watchdog device is created. This can result in race conditions
> >if the watchdog device node is accessed before the watchdog device
> >has been created.
> >
> >To solve the problem, move watchdog device creation into
> >watchdog_dev.c, and create the watchdog device prior to creating
> >its device node. Also move device class creation into
> >watchdog_dev.c, since this is now the only place where the
> >watchdog class is needed.
> >
> >Inspired by an earlier patch set from Damien Riegel.
> >
> >Cc: Damien Riegel 
> >Signed-off-by: Guenter Roeck
> >---
> >Hi Damien,
> >
> >I think this approach would be a bit better. The watchdog device
> >isn't really used in the watchdog core code, so it is better
> >created in watchdog_dev.c. That also fits well with other pending
> >changes, such as sysfs attribute support, and my attempts to move
> >the ref/unref functions completely into the watchdog core. As a
> >side effect, it also cleans up the error path in
> >__watchdog_register_device().
> >
> >What do you think ?
> 
> Hi Guenter,
> 
> Like the idea, but I don't really get the separation. For instance,
> you move watchdog_class in watchdog_dev.c but you keep watchdog_ida
> in watchdog_core.c whereas it is only used for device
> creation/deletion.
> 
> >>>The class is watchdog driver internal, and it is device related, so
> >>>I think it made sense to move it to watchdog_dev.c. On top of that,
> >>>it will be needed there if/when we introduce sysfs attributes.
> >>>
> >>>The watchdog id can be determined by obtaining an id using ida, or
> >>>it can be provided through the watchdog alias. The operation to get
> >>>it is not device related, and it is not straightforward to obtain
> >>>it, so I thought it makes sense to keep the code in watchdog_core.c.
> >>>
> >>>Of course a lot of it is personal preference.
> >>>
> >>
> >>Let me go back to how I saw the design when I created the generic
> >>watchdog framework: When using watchdog device drivers we need to be
> >>able to support the /dev/watchdog system. I also foresaw that we
> >>should have a sysfs interface and I saw the future for watchdog
> >>devices that you should be able to choose between the 2 different
> >>systems. You should be able to use only the /dev/watchdog interfacing,
> >>but you should also be able to use both a sysfs interface and a
> >>/dev/watchdog interface and it should even be possible to have only a
> >>sysfs interface in certain embedded devices. So that's why I split the
> >>watchdog framework over 3 files: core code, the /dev/watchdog
> >>interfacing and the sysfs code. Since I want the compiled code to stay
> >>small when choosing /dev/watchdog, sysfs, or both, this sounded like
> >>the most logical thing to do (unless you want a single file full of
> >>#ifdef-ery that becomes unreadable).
> >>
> >>So I do not agree with having sysfs code in watchdog_dev.c. It belongs in
> >>watchdog_sysfs.c imho. If someone has a better idea, I'll be glad to
> >>listen to it and see what the benefits are. But I want a clean system
> >>for excluding both /dev/ (current watchdog_dev.c) and/or sysfs
> >>(watchdog_sysfs.c) in the future. Of course the current behaviour is
> >>to have the /dev/ interface and have the option to add sysfs
> >>attributes.
> >
> >I agree that keeping sysfs code separate makes sense, as someone might
> >want to not use it.
> >
> I am not really sure about that. I don't recall a similar concern with
> any other subsystem.
> 
> Anyway, sure, we can move the code to another file. Sure, we can add a
> configuration option. That means we'll also need to make several functions
> non-static, and possibly move some functions out of watchdog_dev.c
> into yet another file. But we'll need some guidance for that and an idea
> what is going to be acceptable.
> 
> >The question is: can we make the /dev/watchdog entries optional? That
> >would break the compatibility, right? Imho, it would be saner to keep
> >only one way to interact with watchdogs (ie. keep /dev/watchdog as is
> >and don't make it optional, and sysfs read-only and eventually
> >optional). I think that question should be answered before we can decide
> >how we want to split the code between watchdog_dev.c

Re: [PATCH v6 3/4] arm64: mm: support ARCH_MMAP_RND_BITS.

2015-12-14 Thread Daniel Cashman

On 12/14/2015 03:19 AM, Will Deacon wrote:
>> +# max bits determined by the following formula:
>> +#  VA_BITS - PAGE_SHIFT - 3
> 
> Now that we have this comment, I think we can drop the unsupported
> combinations from the list below. That means we just end up with:
> 
>> +config ARCH_MMAP_RND_BITS_MAX
>> +   default 19 if ARM64_VA_BITS=36
>> +   default 24 if ARM64_VA_BITS=39
>> +   default 27 if ARM64_VA_BITS=42
>> +   default 30 if ARM64_VA_BITS=47
>> +   default 29 if ARM64_VA_BITS=48 && ARM64_64K_PAGES
>> +   default 31 if ARM64_VA_BITS=48 && ARM64_16K_PAGES
>> +   default 33 if ARM64_VA_BITS=48

Unless you object, I'd like to keep the last 3 as well, to mirror the
min bits, should any new configurations be added but not reflected here:
+   default 15 if ARM64_64K_PAGES
+   default 17 if ARM64_16K_PAGES
+   default 18

The first two of these three should be changed as well to 14 and 16.

Thanks,
Dan

Re: Is PROT_SOCK still relevant?

2015-12-14 Thread Jason Newton

On Mon, Dec 14, 2015 at 2:39 PM, One Thousand Gnomes
 wrote:
>> Perhaps lets consider this in another way if it is strongly held that
>> this is worth while in the default configuration: can it default off
>> in the context of selinux / other security frameworks (preferably
>> based on their detection and/or controllably settable at runtime)?
>> Those allow more powerful and finer grain control and don't need this
>> to be there as they already provide auditing on what operations and
>> port numbers should be allowed by what programs.
>
> That would be a regression and a very very bad one to have. The defaults
> need to always be the same as before - or stronger and never go back
> towards insecurity, otherwise they could make things less safe.

Even if you don't think it should be the default, there's still a case
for having a knob that leaves it to the auditing framework to deal with
it, or perhaps sysctl tunable ranges like on FreeBSD.  That way none
of the workarounds mentioned have to be invoked and tuned, which
increases maintenance and setup burden.  On some systems, these
methods may not be available, too.  Android is one that comes to mind.

I openly stated this issue has been brought up for me *this time* due
to Android, but it still does keep coming up.  It's on my Linux Kernel
bucket list to get it addressed/tunable.  This isn't going to be
changed and reach the places that matter to me in time for this
occurrence, but I'm trying to prevent the next occurrence I'll have
with it, and I don't expect that it'll be Android at that point.

>
>> Or how about letting port number concerns be handled by those security
>> frameworks all together considering it is limited security?
>
> There are already half a dozen different ways to handle it from xinetd
> through setcap, to systemd spawning it, to iptables.

Most (all?) of those methods have drawbacks, as previously noted:
systemd still isn't everywhere and may never be, setcap doesn't work
with java/python and the like, and iptables carries a significant
performance loss when scalability is important, plus more configuration
detail... I've never tried xinetd.  Is one of these the surefire way,
or should we be happy that we have so many choices, each with its own
caveats?

-Jason