Re: [PATCH v4 1/3] perf: add PERF_RECORD_NAMESPACES to include namespaces related info

2016-12-21 Thread Peter Zijlstra
On Thu, Dec 22, 2016 at 08:21:23PM +1300, Eric W. Biederman wrote:
> 
> And please make the array the last item in the structure so that
> expanding or contracting it does not affect the ability to read the rest
> of the structure.

Sorry, sample_id must be last, because hysterical crud :/

(basically because that was the only way to add a field to records like
PERF_RECORD_MMAP which used the record length to determine the
filename[] length, yes I know, we won't ever do that again).


Re: [PATCH 2/4] vfio-mdev: de-polute the namespace, rename parent_device & parent_ops

2016-12-21 Thread Jike Song
Not sure if this is appropriate, but leaving the Documentation changes aside,
for patches 2-4:

Reviewed-by: Jike Song 

--
Thanks,
Jike

On 12/22/2016 07:27 AM, Alex Williamson wrote:
> From: Alex Williamson 
> 
> Add an mdev_ prefix so we're not poluting the namespace so much.
> 
> Cc: Kirti Wankhede 
> Cc: Zhenyu Wang 
> Cc: Zhi Wang 
> Cc: Jike Song 
> Signed-off-by: Alex Williamson 
> ---
>  drivers/gpu/drm/i915/gvt/kvmgt.c |2 +-
>  drivers/vfio/mdev/mdev_core.c|   28 ++--
>  drivers/vfio/mdev/mdev_private.h |6 +++---
>  drivers/vfio/mdev/mdev_sysfs.c   |8 
>  drivers/vfio/mdev/vfio_mdev.c|   12 ++--
>  include/linux/mdev.h |   16 
>  samples/vfio-mdev/mtty.c |2 +-
>  7 files changed, 37 insertions(+), 37 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c 
> b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index 4dd6722..081ada2 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -1089,7 +1089,7 @@ static long intel_vgpu_ioctl(struct mdev_device *mdev, 
> unsigned int cmd,
>   return 0;
>  }
>  
> -static const struct parent_ops intel_vgpu_ops = {
> +static const struct mdev_parent_ops intel_vgpu_ops = {
>   .supported_type_groups  = intel_vgpu_type_groups,
>   .create = intel_vgpu_create,
>   .remove = intel_vgpu_remove,
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index be1ee89..4a140e0 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -42,7 +42,7 @@ static int _find_mdev_device(struct device *dev, void *data)
>   return 0;
>  }
>  
> -static bool mdev_device_exist(struct parent_device *parent, uuid_le uuid)
> +static bool mdev_device_exist(struct mdev_parent *parent, uuid_le uuid)
>  {
>   struct device *dev;
>  
> @@ -56,9 +56,9 @@ static bool mdev_device_exist(struct parent_device *parent, 
> uuid_le uuid)
>  }
>  
>  /* Should be called holding parent_list_lock */
> -static struct parent_device *__find_parent_device(struct device *dev)
> +static struct mdev_parent *__find_parent_device(struct device *dev)
>  {
> - struct parent_device *parent;
> + struct mdev_parent *parent;
>  
>   list_for_each_entry(parent, &parent_list, next) {
>   if (parent->dev == dev)
> @@ -69,8 +69,8 @@ static struct parent_device *__find_parent_device(struct 
> device *dev)
>  
>  static void mdev_release_parent(struct kref *kref)
>  {
> - struct parent_device *parent = container_of(kref, struct parent_device,
> - ref);
> + struct mdev_parent *parent = container_of(kref, struct mdev_parent,
> +   ref);
>   struct device *dev = parent->dev;
>  
>   kfree(parent);
> @@ -78,7 +78,7 @@ static void mdev_release_parent(struct kref *kref)
>  }
>  
>  static
> -inline struct parent_device *mdev_get_parent(struct parent_device *parent)
> +inline struct mdev_parent *mdev_get_parent(struct mdev_parent *parent)
>  {
>   if (parent)
>   kref_get(&parent->ref);
> @@ -86,7 +86,7 @@ inline struct parent_device *mdev_get_parent(struct 
> parent_device *parent)
>   return parent;
>  }
>  
> -static inline void mdev_put_parent(struct parent_device *parent)
> +static inline void mdev_put_parent(struct mdev_parent *parent)
>  {
>   if (parent)
>   kref_put(&parent->ref, mdev_release_parent);
> @@ -95,7 +95,7 @@ static inline void mdev_put_parent(struct parent_device 
> *parent)
>  static int mdev_device_create_ops(struct kobject *kobj,
> struct mdev_device *mdev)
>  {
> - struct parent_device *parent = mdev->parent;
> + struct mdev_parent *parent = mdev->parent;
>   int ret;
>  
>   ret = parent->ops->create(kobj, mdev);
> @@ -122,7 +122,7 @@ static int mdev_device_create_ops(struct kobject *kobj,
>   */
>  static int mdev_device_remove_ops(struct mdev_device *mdev, bool 
> force_remove)
>  {
> - struct parent_device *parent = mdev->parent;
> + struct mdev_parent *parent = mdev->parent;
>   int ret;
>  
>   /*
> @@ -153,10 +153,10 @@ static int mdev_device_remove_cb(struct device *dev, 
> void *data)
>   * Add device to list of registered parent devices.
>   * Returns a negative value on error, otherwise 0.
>   */
> -int mdev_register_device(struct device *dev, const struct parent_ops *ops)
> +int mdev_register_device(struct device *dev, const struct mdev_parent_ops 
> *ops)
>  {
>   int ret;
> - struct parent_device *parent;
> + struct mdev_parent *parent;
>  
>   /* check for mandatory ops */
>   if (!ops || !ops->create || !ops->remove || !ops->supported_type_groups)
> @@ -229,7 +229,7 @@ int mdev_register_device(struct device *dev, const struct 
> parent_ops *ops)
>  
>  void mdev_unregister_device(struct device *dev)
>  

Re: [PATCH] ARM64: zynqmp: Fix i2c node's compatible string

2016-12-21 Thread Michal Simek
On 22.12.2016 06:49, Moritz Fischer wrote:
> From: Moritz Fischer 
> 
> The Zynq Ultrascale MP uses version 1.4 of the Cadence IP core
> which fixes some silicon bugs that needed software workarounds
> in Version 1.0 that was used on Zynq systems.
> 
> Signed-off-by: Moritz Fischer 
> Cc: Michal Simek 
> Cc: Sören Brinkmann 
> Cc: U-Boot List 
> Cc: Rob Herring 
> ---
> 
> Hi Michal,
> 
> I think this is a slip up and should be r1p14 for
> Ultrascale ZynqMP. drivers/i2c/i2c-cadence.c already uses this.
> I Cc'd the u-boot list, because the same change would be required there.
> 
> Cheers,
> 
> Moritz
> 
> ---
>  arch/arm64/boot/dts/xilinx/zynqmp.dtsi | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi 
> b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
> index 68a90833..a5a5f91 100644
> --- a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
> +++ b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
> @@ -175,7 +175,7 @@
>   };
>  
>   i2c0: i2c@ff02 {
> - compatible = "cdns,i2c-r1p10";
> + compatible = "cdns,i2c-r1p14";

I checked this internally: p10 does something that p14 doesn't need
to do. That's why this should be

compatible = "cdns,i2c-r1p14", "cdns,i2c-r1p10";

The same applies to u-boot, of course, where p14 should also be added to the driver.

Thanks,
Michal
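
The fallback list suggested above works because a driver can bind to any
entry of the node's compatible list, with the most specific entry preferred.
A minimal user-space sketch of that lookup follows; it is not the kernel's
actual matcher, and driver_matches() and the two driver tables are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Compatible strings listed by the device node, most specific first. */
static const char *const zynqmp_i2c_compat[] = {
	"cdns,i2c-r1p14", "cdns,i2c-r1p10", NULL
};

/* A driver that only knows the older r1p10 core (hypothetical table). */
static const char *const old_driver_ids[] = { "cdns,i2c-r1p10", NULL };

/* A driver that knows both revisions (hypothetical table). */
static const char *const new_driver_ids[] = {
	"cdns,i2c-r1p10", "cdns,i2c-r1p14", NULL
};

/*
 * Return the first (most specific) node string the driver supports,
 * or NULL if the driver supports none of them.
 */
static const char *driver_matches(const char *const *node,
				  const char *const *ids)
{
	assert(node && ids);
	for (size_t i = 0; node[i]; i++)
		for (size_t j = 0; ids[j]; j++)
			if (strcmp(node[i], ids[j]) == 0)
				return node[i];
	return NULL;
}
```

With the two-entry list, a driver that only knows r1p10 still binds through
the fallback, while a driver that knows r1p14 gets the more specific match.
With "cdns,i2c-r1p14" alone, the older driver would not bind at all.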


Re: [PATCH 2/2] nsfs: Add an ioctl() to return creator UID of a userns

2016-12-21 Thread Eric W. Biederman
Andrei Vagin  writes:

> On Mon, Dec 19, 2016 at 03:38:35PM +0100, Michael Kerrisk (man-pages) wrote:
>> @@ -174,6 +175,11 @@ static long ns_ioctl(struct file *filp, unsigned int 
>> ioctl,
>>  return open_related_ns(ns, ns->ops->get_parent);
>>  case NS_GET_NSTYPE:
>>  return ns->ops->type;
>> +case NS_GET_CREATOR_UID:
>> +if (ns->ops->type != CLONE_NEWUSER)
>> +return -EINVAL;
>> +user_ns = container_of(ns, struct user_namespace, ns);
>> +return from_kuid_munged(current_user_ns(), user_ns->owner);
>
> uid_t is "unsigned int", ioctl() returns long, so it may be hard to
> distinguish user id-s from errors on x32.

Very good point.

> off-topic: What is about user_ns->group? I can't find where it is
> used...

Over design. I put it in because I thought it might be useful.  It turns
out it never was used so we can clean things up and remove it.  The
group has never been exposed to userspace so no one will care.

Eric


Re: [PATCH v4 1/3] perf: add PERF_RECORD_NAMESPACES to include namespaces related info

2016-12-21 Thread Eric W. Biederman
Hari Bathini  writes:

> On Wednesday 21 December 2016 06:54 PM, Peter Zijlstra wrote:
>> On Wed, Dec 21, 2016 at 06:39:01PM +0530, Hari Bathini wrote:
>>> Hi Peter,
>>>> I don't see how the tool can parse old records (with NAMESPACES_MAX ==
>>>> 7) if you set its NAMESPACES_MAX to say 10.
>>>>
>>>> Then it will expect the link_info array to be 10 entries and either read
>>>> past the end of the record (if !sample_all) or try and interpret
>>>> sample_id as link_info records.

>>> Right. There will be inconsistency with data the perf tool tries to read
>>> beyond
>>> what the kernel supports. IIUC, you mean, include nr_namespaces field in the
>>> record and warn the user if it doesn't match with the one perf-tool supports
>>> before proceeding..?
>> Yes, if you add a nr_namespaces field its always parsable. If an old
>> tool finds more namespace than it has 'names' for it can always display
>> the raw index number. If a new tool finds the array short, it will not
>> display the missing ones.
>>
>
> Sure, Peter. Will post the next version as soon as
> I am back from vacation..

And please make the array the last item in the structure so that
expanding or contracting it does not affect the ability to read the rest
of the structure.

Eric
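
The nr_namespaces idea discussed in this thread can be sketched as follows.
This is only an illustration under stated assumptions: the record layout,
the order of ns_names[] and both helper functions are hypothetical and are
not the final perf ABI. The point is that an explicit count lets a reader
cope with a longer or shorter array and still compute where any trailing
data (such as sample_id) begins.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical layout, for illustration only; not the final perf ABI. */
struct ns_link_info {
	uint64_t dev;
	uint64_t ino;
};

struct ns_record {
	uint32_t pid;
	uint32_t tid;
	uint64_t nr_namespaces;          /* entries the kernel actually wrote */
	struct ns_link_info link_info[]; /* nr_namespaces entries follow */
};

/* Names this (hypothetical) tool build knows about. */
static const char *const ns_names[] = {
	"net", "uts", "ipc", "pid", "user", "mnt", "cgroup"
};
#define KNOWN_NS (sizeof(ns_names) / sizeof(ns_names[0]))

/*
 * Format a label for entry idx: a name when the tool knows one, the raw
 * index otherwise. Returns 0 when idx is beyond what the record carries.
 */
static int ns_entry_label(const struct ns_record *rec, uint64_t idx,
			  char *buf, size_t len)
{
	assert(rec && buf && len);
	if (idx >= rec->nr_namespaces)
		return 0;                /* short array: nothing to show */
	if (idx < KNOWN_NS)
		snprintf(buf, len, "%s", ns_names[idx]);
	else
		snprintf(buf, len, "ns#%llu", (unsigned long long)idx);
	return 1;
}

/* Byte offset from the start of the record body where trailing data begins. */
static size_t ns_trailer_offset(const struct ns_record *rec)
{
	return sizeof(*rec) + rec->nr_namespaces * sizeof(struct ns_link_info);
}
```

An old tool reading a record with more entries than it has names for falls
into the raw-index branch; a new tool reading a short record simply stops at
nr_namespaces.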



[PATCH 2/3] xen: return xenstore command failures via response instead of rc

2016-12-21 Thread Juergen Gross
When the xenbus driver does some special handling for a Xenstore
command, any error condition related to the command should be returned
via an error response instead of letting the related write operation
fail. Otherwise the userland handler might make wrong decisions,
assuming the connection to Xenstore is broken.

While at it try to return the same error values xenstored would
return for those cases.

Signed-off-by: Juergen Gross 
---
 drivers/xen/xenbus/xenbus_dev_frontend.c | 47 ++--
 1 file changed, 27 insertions(+), 20 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_dev_frontend.c 
b/drivers/xen/xenbus/xenbus_dev_frontend.c
index a068281..79130b3 100644
--- a/drivers/xen/xenbus/xenbus_dev_frontend.c
+++ b/drivers/xen/xenbus/xenbus_dev_frontend.c
@@ -302,6 +302,29 @@ static void watch_fired(struct xenbus_watch *watch,
mutex_unlock(&adap->dev_data->reply_mutex);
 }
 
+static int xenbus_command_reply(struct xenbus_file_priv *u,
+   unsigned int msg_type, const char *reply)
+{
+   struct {
+   struct xsd_sockmsg hdr;
+   const char body[16];
+   } msg;
+   int rc;
+
+   msg.hdr = u->u.msg;
+   msg.hdr.type = msg_type;
+   msg.hdr.len = strlen(reply) + 1;
+   if (msg.hdr.len > sizeof(msg.body))
+   return -E2BIG;
+
+   mutex_lock(&u->reply_mutex);
+   rc = queue_reply(&u->read_buffers, &msg, sizeof(msg.hdr) + msg.hdr.len);
+   wake_up(&u->read_waitq);
+   mutex_unlock(&u->reply_mutex);
+
+   return rc;
+}
+
 static int xenbus_write_transaction(unsigned msg_type,
struct xenbus_file_priv *u)
 {
@@ -321,7 +344,7 @@ static int xenbus_write_transaction(unsigned msg_type,
if (trans->handle.id == u->u.msg.tx_id)
break;
if (&trans->list == &u->transactions)
-   return -ESRCH;
+   return xenbus_command_reply(u, XS_ERROR, "ENOENT");
}
 
reply = xenbus_dev_request_and_reply(&u->u.msg);
@@ -372,12 +395,12 @@ static int xenbus_write_watch(unsigned msg_type, struct 
xenbus_file_priv *u)
path = u->u.buffer + sizeof(u->u.msg);
token = memchr(path, 0, u->u.msg.len);
if (token == NULL) {
-   rc = -EILSEQ;
+   rc = xenbus_command_reply(u, XS_ERROR, "EINVAL");
goto out;
}
token++;
if (memchr(token, 0, u->u.msg.len - (token - path)) == NULL) {
-   rc = -EILSEQ;
+   rc = xenbus_command_reply(u, XS_ERROR, "EINVAL");
goto out;
}
 
@@ -411,23 +434,7 @@ static int xenbus_write_watch(unsigned msg_type, struct 
xenbus_file_priv *u)
}
 
/* Success.  Synthesize a reply to say all is OK. */
-   {
-   struct {
-   struct xsd_sockmsg hdr;
-   char body[3];
-   } __packed reply = {
-   {
-   .type = msg_type,
-   .len = sizeof(reply.body)
-   },
-   "OK"
-   };
-
-   mutex_lock(&u->reply_mutex);
-   rc = queue_reply(&u->read_buffers, &reply, sizeof(reply));
-   wake_up(&u->read_waitq);
-   mutex_unlock(&u->reply_mutex);
-   }
+   rc = xenbus_command_reply(u, msg_type, "OK");
 
 out:
return rc;
-- 
2.10.2
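
For readers following the idea of xenbus_command_reply(), here is a small
user-space model of building such a bounded reply. It is a sketch under
assumptions, not the driver code: struct msg_hdr stands in for
struct xsd_sockmsg, build_reply() is a hypothetical name, and queueing and
locking are left out. Unlike the quoted hunk, the sketch copies the reply
text into the body explicitly.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct xsd_sockmsg (four 32-bit header fields). */
struct msg_hdr {
	uint32_t type;
	uint32_t req_id;
	uint32_t tx_id;
	uint32_t len;   /* length of the body that follows */
};

struct reply {
	struct msg_hdr hdr;
	char body[16];
};

/*
 * Fill *out with a reply to *req carrying the NUL-terminated text.
 * Returns the number of bytes to queue, or -E2BIG if text does not fit.
 */
static int build_reply(struct reply *out, const struct msg_hdr *req,
		       uint32_t type, const char *text)
{
	size_t len = strlen(text) + 1;

	assert(out && req);
	if (len > sizeof(out->body))
		return -E2BIG;

	out->hdr = *req;          /* keep req_id and tx_id of the request */
	out->hdr.type = type;
	out->hdr.len = (uint32_t)len;
	memcpy(out->body, text, len);
	return (int)(sizeof(out->hdr) + len);
}
```

The caller then gets either a byte count to hand to the reply queue or a
negative errno, so an oversized error string is rejected before anything is
queued.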



[PATCH 3/3] xen: remove stale xs_input_avail() from header

2016-12-21 Thread Juergen Gross
In drivers/xen/xenbus/xenbus_comms.h there is a stale declaration of
xs_input_avail(). Remove it.

Signed-off-by: Juergen Gross 
---
 drivers/xen/xenbus/xenbus_comms.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/xen/xenbus/xenbus_comms.h 
b/drivers/xen/xenbus/xenbus_comms.h
index e74f9c1..867a2e4 100644
--- a/drivers/xen/xenbus/xenbus_comms.h
+++ b/drivers/xen/xenbus/xenbus_comms.h
@@ -42,7 +42,6 @@ int xb_write(const void *data, unsigned len);
 int xb_read(void *data, unsigned len);
 int xb_data_to_read(void);
 int xb_wait_for_data_to_read(void);
-int xs_input_avail(void);
 extern struct xenstore_domain_interface *xen_store_interface;
 extern int xen_store_evtchn;
 extern enum xenstore_init xen_store_domain_type;
-- 
2.10.2



[PATCH 1/3] xen: xenbus driver must not accept invalid transaction ids

2016-12-21 Thread Juergen Gross
When accessing Xenstore in a transaction the user is specifying a
transaction id which he normally obtained from Xenstore when starting
the transaction. Xenstore is validating a transaction id against all
known transaction ids of the connection the request came in. As all
requests of a domain not being the one where Xenstore lives share
one connection, validation of transaction ids of different users of
Xenstore in that domain should be done by the kernel of that domain
being the multiplexer between the Xenstore users in that domain and
Xenstore.

To keep one Xenstore user from "hijacking" a transaction of another
user, the xenbus driver has to verify a given transaction id against
all known transaction ids of that user before forwarding it to
Xenstore.

Signed-off-by: Juergen Gross 
---
 drivers/xen/xenbus/xenbus_dev_frontend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/xen/xenbus/xenbus_dev_frontend.c 
b/drivers/xen/xenbus/xenbus_dev_frontend.c
index 6c0ead4..a068281 100644
--- a/drivers/xen/xenbus/xenbus_dev_frontend.c
+++ b/drivers/xen/xenbus/xenbus_dev_frontend.c
@@ -316,7 +316,7 @@ static int xenbus_write_transaction(unsigned msg_type,
rc = -ENOMEM;
goto out;
}
-   } else if (msg_type == XS_TRANSACTION_END) {
+   } else if (u->u.msg.tx_id != 0) {
list_for_each_entry(trans, &u->transactions, list)
if (trans->handle.id == u->u.msg.tx_id)
break;
-- 
2.10.2
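
The check described in this patch can be modelled in a few lines. This is a
simplified sketch, not the driver code: struct user_ctx and tx_id_allowed()
are hypothetical, and a plain array replaces the driver's transaction list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Transactions one user (one open file handle) has started. */
struct user_ctx {
	const uint32_t *tx_ids;
	size_t nr_tx;
};

/*
 * tx_id 0 means "no transaction" and is always allowed. Any other id
 * must be one that was handed to this particular user earlier.
 */
static bool tx_id_allowed(const struct user_ctx *u, uint32_t tx_id)
{
	assert(u);
	if (tx_id == 0)
		return true;
	for (size_t i = 0; i < u->nr_tx; i++)
		if (u->tx_ids[i] == tx_id)
			return true;
	return false;
}
```

Because the lookup is per user, an id that is valid for one user of the
shared connection is still rejected when another user presents it.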



[PATCH 0/3] xen: fix some minor bugs and cleanup of xenbus

2016-12-21 Thread Juergen Gross
Do some minor bug fixes and cleanup of xenbus driver.

Juergen Gross (3):
  xen: xenbus driver must not accept invalid transaction ids
  xen: return xenstore command failures via response instead of rc
  xen: remove stale xs_input_avail() from header

 drivers/xen/xenbus/xenbus_comms.h|  1 -
 drivers/xen/xenbus/xenbus_dev_frontend.c | 49 ++--
 2 files changed, 28 insertions(+), 22 deletions(-)

-- 
2.10.2



Re: [PATCH 0/2] Add further ioctl() operations for namespace discovery

2016-12-21 Thread Michael Kerrisk (man-pages)
Hi Eric,

On 12/22/2016 01:27 AM, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)"  writes:
> 
>> Hi Eric,
>>
>> On 12/21/2016 01:17 AM, Eric W. Biederman wrote:
>>> "Michael Kerrisk (man-pages)"  writes:
>>>
>>>> Hi Eric,
>>>>
>>>> On 12/20/2016 09:22 PM, Eric W. Biederman wrote:
>>>>> "Michael Kerrisk (man-pages)"  writes:
>>>>>
>>>>>> Hello Eric,
>>>>>>
>>>>>> On 12/19/2016 11:53 PM, Eric W. Biederman wrote:
>>>>>>> "Michael Kerrisk (man-pages)"  writes:
>>>>>>>
> 
>>> Now the question becomes who are the users of this?  Because it just
>>> occurred to me that we now have an interesting complication.  Userspace
>>> extending the meaning of the capability bits, and using to protect
>>> additional things.  Ugh.  That could be a maintenance problem of another
>>> flavor.  Definitely not my favorite.
>>
>> I don't follow you here. Could you say some more about what you mean?
> 
> I have seen userspace do things such as extend CAP_SYS_REBOOT
> to things such as permission to invoke "shutdown -r now".  Which
> depending on what a clean reboot entails could be greatly increasing
> the scope of CAP_SYS_REBOOT.
> 
> I am concerned for that and similar situations that userspace
> applications could lead us into a situation that one wrong decision could
> wind up being an unfixable mistake because fixing the mistake would
> break userspace.

Okay.

>>> So why are we asking the questions about what permissions a process has?
>>
>> My main interest here is monitoring/discovery/debugging on a running
>> system. NS_GET_PARENT, NS_GET_USERNS, NS_GET_CREATOR_UID, and 
>> NS_GET_NSTYPE provide most of what I'd like to see. Being able to ask
>> "does this process have permissions in that namespace?" would be nice 
>> to have in terms of understanding/debugging a system.
> 
> If we are just looking at explanations then I seem to have been
> over-engineering things.  So let's just aim at the two ioctls.
> Or at least the information in those ioctls.

Okay.

> With at least a comment on the ioctl returning the OWNER_UID that
> describes why it is not a problem if the owner's uid is something like
> ((uid_t)-3), which overlaps with the space for error return codes.
>
> I don't know if we are fine or not, but that review comment definitely
> deserves some consideration.


See my reply just sent to Andrei. We should instead then just return 
the UID via a buffer pointed to by the ioctl() argument:

ioctl(fd, NS_GET_OWNER_UID, &uid);

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
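
The ambiguity raised in this thread, and the pointer-based alternative, can
be shown with a small model. This is a sketch under assumptions: long32 and
uid32 are stand-ins for a 32-bit long and a 32-bit uid_t, the helper names
are hypothetical, and no real ioctl() is issued.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

typedef uint32_t uid32;          /* stand-in for a 32-bit uid_t */
typedef int32_t  long32;         /* stand-in for "long" on a 32-bit ABI */

static_assert(sizeof(long32) == sizeof(uid32),
	      "model assumes the return type is no wider than the UID");

#define MAX_ERRNO 4095           /* kernel convention: -4095..-1 are errors */

/* How a raw 32-bit return value would be classified. */
static int looks_like_error(long32 ret)
{
	return ret < 0 && ret >= -MAX_ERRNO;
}

/* Variant 1: UID squeezed into the return value. */
static long32 get_uid_by_return(uid32 owner)
{
	return (long32)owner;
}

/* Variant 2: UID stored through a pointer; the return value is only status. */
static long32 get_uid_by_pointer(uid32 owner, uid32 *out)
{
	if (!out)
		return -EFAULT;
	*out = owner;
	return 0;
}
```

With variant 1 an owner of (uid_t)-3 is indistinguishable from an error
return; with variant 2 the same owner comes back intact and the return
value stays free for status.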


RE: ATH9 driver issues on ARM64

2016-12-21 Thread Bharat Kumar Gogada
Hi All,

After further debugging we know the place it hangs.

In function:
static int ath_reset_internal(struct ath_softc *sc, struct ath9k_channel *hchan)
{
disable_irq(sc->irq);
tasklet_disable(&sc->intr_tq);
tasklet_disable(&sc->bcon_tasklet);
spin_lock_bh(&sc->sc_pcu_lock);



if (!ath_complete_reset(sc, true))  -> This function enables hardware interrupts
r = -EIO;

out:
enable_irq(sc->irq);  -> Here the IRQ line state is changed to the enabled state
spin_unlock_bh(&sc->sc_pcu_lock);
tasklet_enable(&sc->bcon_tasklet);
tasklet_enable(&sc->intr_tq);

}

static bool ath_complete_reset(struct ath_softc *sc, bool start)
{
struct ath_hw *ah = sc->sc_ah;
struct ath_common *common = ath9k_hw_common(ah);
unsigned long flags;

ath9k_calculate_summary_state(sc, sc->cur_chan);
ath_startrecv(sc);


  
sc->gtt_cnt = 0;

ath9k_hw_set_interrupts(ah);  -> Here hardware interrupts are being enabled
ath9k_hw_enable_interrupts(ah); -> We see hang after this line
ieee80211_wake_queues(sc->hw);
ath9k_p2p_ps_timer(sc);

return true;
}

Hardware interrupts are enabled before the IRQ line is changed to the enabled
state. Won't this cause a race condition? If the hardware raises an interrupt
within this window, while the IRQ line is still in the disabled state, the
following condition is hit and the EP handler is not invoked.

void handle_simple_irq(struct irq_desc *desc)
{
raw_spin_lock(&desc->lock);
   ... 
if (unlikely(!desc->action || irqd_irq_disabled(&desc->irq_data))) {
// This condition is reached and evaluates to true.
desc->istate |= IRQS_PENDING;
goto out_unlock;
}

kstat_incr_irqs_this_cpu(desc);
handle_irq_event(desc);

out_unlock:
raw_spin_unlock(&desc->lock);
}

We see the hang at that statement, without getting back to enable_irq(); it
looks like the CPU is stalled by this time.

Can anyone tell why hardware interrupts are enabled before the kernel changes
the IRQ line state?


Regards,
Bharat
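
The window described above can be illustrated with a toy model of a single
interrupt line. This is only a sketch and not kernel code: struct toy_irq
and the toy_*() helpers are hypothetical, and the replay in toy_enable() is
an assumption of the model. Whether a pending interrupt is actually
delivered again on real hardware depends on the irqchip and the trigger
type.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one interrupt line; not the kernel's struct irq_desc. */
struct toy_irq {
	bool disabled;      /* line disabled by software */
	bool pending;       /* interrupt arrived while disabled */
	int  handled;       /* how often the handler ran */
};

static void toy_handler(struct toy_irq *irq)
{
	irq->handled++;
}

/* Models the check in the flow handler quoted above. */
static void toy_raise(struct toy_irq *irq)
{
	assert(irq);
	if (irq->disabled) {
		irq->pending = true;   /* remembered, handler not called */
		return;
	}
	toy_handler(irq);
}

/* Assumption of this model: enabling the line replays a pending interrupt. */
static void toy_enable(struct toy_irq *irq)
{
	irq->disabled = false;
	if (irq->pending) {
		irq->pending = false;
		toy_handler(irq);
	}
}
```

In this model an interrupt raised while the line is disabled is only marked
pending, and the handler runs once the line is enabled again. If that replay
does not happen on the platform in question, the device would keep waiting
for service.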

> 
> > On Sat, Dec 10, 2016 at 02:40:48PM +, Bharat Kumar Gogada wrote:
> > > Hi,
> > >
> > > After taking some more lecroy traces, we see that after 2nd ASSERT
> > > from EP
> > on ARM64 we see continuous data movement of 32 dwords or 12 dwords and
> > never sign of DEASSERT.
> > > Comparatively on working traces (x86) after 2nd assert there are
> > > only BAR
> > register reads and writes and then DEASSERT, for almost most of the
> > interrupts and we haven't seen 12 or 32 dwords data movement on this trace.
> > >
> > > I did not work on EP wifi/network drivers, any help why EP needs
> > > those many
> > number of data at scan time ?
> >
> > The device doesn't know whether it's in an x86 or an arm64 system.  If
> > it works differently, it must be because the PCI core or the driver is
> > programming the device differently.
> >
> > You should be able to match up Memory transactions from the host in
> > the trace with things the driver does.  For example, if you see an
> > Assert_INTx message from the device, you should eventually see a
> > Memory Read from the host to get the ISR, i.e., some read done in the bowels
> of ath9k_hw_getisr().
> >
> > I don't know how the ath9k device works, but there must be some Memory
> > Read or Write done by the driver that tells the device "we've handled this
> interrupt".
> > The device should then send a Deassert_INTx; of course, if the device
> > still requires service, e.g., because it has received more packets, it
> > might leave the INTx asserted.
> >
> > I doubt you'd see exactly the same traces on x86 and arm64 because
> > they aren't seeing the same network packets and the driver is executing at
> different rates.
> > But you should at least be able to identify interrupt assertion and
> > the actions of the driver's interrupt service routine.
> 
> 
> Thanks Bjorn.
> 
> As you mentioned we did try to debug in that path. After we start scan after 
> 2nd
> ASSERT we see lots of 32 and 12 dword data, and in function void
> ath9k_hw_enable_interrupts(struct ath_hw *ah) {
>   ...
>   ..
>   REG_WRITE(ah, AR_IER, AR_IER_ENABLE);
>   // EP driver hangs at this
> position after 2nd ASSERT
>   // The following writes are not
> happening
> if (!AR_SREV_9100(ah)) {
> REG_WRITE(ah, AR_INTR_ASYNC_ENABLE, async_mask);
> REG_WRITE(ah, AR_INTR_ASYNC_MASK, async_mask);
> 
> REG_WRITE(ah, AR_INTR_SYNC_ENABLE, sync_default);
> REG_WRITE(ah, AR_INTR_SYNC_MASK, sync_default);
> }
> ath_dbg(common, INTERRUPT, "AR_IMR 0x%x IER 0x%x\n",
> REG_R

Re: [PATCH 2/2] nsfs: Add an ioctl() to return creator UID of a userns

2016-12-21 Thread Michael Kerrisk (man-pages)
Hi Andrei,

On 12/21/2016 04:13 AM, Andrei Vagin wrote:
> On Mon, Dec 19, 2016 at 03:38:35PM +0100, Michael Kerrisk (man-pages) wrote:
>> # Some open questions about this patch below.
>> #
>> One of the rules regarding capabilities is:
>>
>> A process that resides in the parent of the user namespace and
>> whose effective user ID matches the owner of the namespace has
>> all capabilities in the namespace.
>>
>> Therefore, in order to write code that discovers whether process X has
>> capabilities in namespace Y, we need a way to find out who the creator
>> of a user namespace is. This patch adds an NS_GET_CREATOR_UID ioctl()
>> that returns the (munged) UID of the creator of the user namespace
>> referred to by the specified file descriptor.
>>
>> If the supplied file descriptor does not refer to a user namespace,
>> the operation fails with the error EINVAL.
>>
>> Signed-off-by: Michael Kerrisk 
>> ---
>>  fs/nsfs.c | 6 ++
>>  include/uapi/linux/nsfs.h | 8 +---
>>  2 files changed, 11 insertions(+), 3 deletions(-)
>>
>> Open questions:
>>
>> * Would it be preferable to separate the logic for NS_GET_CREATOR_UID
>>   into a small helper function?
>> * Is this a correct use of container_of()? I did not immediately
>>   see another way to get to the user_namespace struct, but I
>>   may well have missed something.
>>
>> diff --git a/fs/nsfs.c b/fs/nsfs.c
>> index 5d53476..26f6d94 100644
>> --- a/fs/nsfs.c
>> +++ b/fs/nsfs.c
>> @@ -163,6 +163,7 @@ int open_related_ns(struct ns_common *ns,
>>  static long ns_ioctl(struct file *filp, unsigned int ioctl,
>>  unsigned long arg)
>>  {
>> +struct user_namespace *user_ns;
>>  struct ns_common *ns = get_proc_ns(file_inode(filp));
>>  
>>  switch (ioctl) {
>> @@ -174,6 +175,11 @@ static long ns_ioctl(struct file *filp, unsigned int 
>> ioctl,
>>  return open_related_ns(ns, ns->ops->get_parent);
>>  case NS_GET_NSTYPE:
>>  return ns->ops->type;
>> +case NS_GET_CREATOR_UID:
>> +if (ns->ops->type != CLONE_NEWUSER)
>> +return -EINVAL;
>> +user_ns = container_of(ns, struct user_namespace, ns);
>> +return from_kuid_munged(current_user_ns(), user_ns->owner);
> 
> uid_t is "unsigned int", ioctl() returns long, so it may be hard to
> distinguish user id-s from errors on x32.

Good point. So, we could instead return the UID via a buffer pointed to 
by the ioctl() arg. That would seem better, right?

> off-topic: What is about user_ns->group? I can't find where it is used...

I've no idea. Like you, I can't see any place where it's being used.

Cheers,

Michael


>>  default:
>>  return -ENOTTY;
>>  }
>> diff --git a/include/uapi/linux/nsfs.h b/include/uapi/linux/nsfs.h
>> index 2b48df1..b3c6c78 100644
>> --- a/include/uapi/linux/nsfs.h
>> +++ b/include/uapi/linux/nsfs.h
>> @@ -6,11 +6,13 @@
>>  #define NSIO0xb7
>>  
>>  /* Returns a file descriptor that refers to an owning user namespace */
>> -#define NS_GET_USERNS   _IO(NSIO, 0x1)
>> +#define NS_GET_USERNS   _IO(NSIO, 0x1)
>>  /* Returns a file descriptor that refers to a parent namespace */
>> -#define NS_GET_PARENT   _IO(NSIO, 0x2)
>> +#define NS_GET_PARENT   _IO(NSIO, 0x2)
>>  /* Returns the type of namespace (CLONE_NEW* value) referred to by
>> file descriptor */
>> -#define NS_GET_NSTYPE   _IO(NSIO, 0x3)
>> +#define NS_GET_NSTYPE   _IO(NSIO, 0x3)
>> +/* Get creator UID for a user namespace */
>> +#define NS_GET_CREATOR_UID  _IO(NSIO, 0x4)
>>  
>>  #endif /* __LINUX_NSFS_H */
>> -- 
>> 2.5.5
>>
> 


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
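
On the container_of() question in the quoted patch, the pattern itself can
be sketched in user space as below. The macro here is a simplified version
without the kernel's type checking, and struct ns_common_s, struct user_ns_s
and to_user_ns() are hypothetical stand-ins, not the real kernel structures.

```c
#include <assert.h>
#include <stddef.h>

/* Same idea as the kernel macro, without its type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins; the real structures have many more fields. */
struct ns_common_s {
	int inum;
};

struct user_ns_s {
	unsigned int owner;
	struct ns_common_s ns;   /* embedded, deliberately not first */
};

/* Recover the enclosing structure from a pointer to its embedded member. */
static struct user_ns_s *to_user_ns(struct ns_common_s *ns)
{
	assert(ns);
	return container_of(ns, struct user_ns_s, ns);
}
```

The conversion is only valid when the pointer really refers to a member
embedded in that outer type, which is why a type check such as the
CLONE_NEWUSER test in the patch has to come first.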


Re: [PATCH v18 0/4] Introduce usb charger framework to deal with the usb gadget power negotation

2016-12-21 Thread Baolin Wang
On 22 December 2016 at 07:47, NeilBrown  wrote:
> On Wed, Dec 21 2016, Baolin Wang wrote:
>
>> On 21 December 2016 at 11:48, NeilBrown  wrote:
>>> On Wed, Dec 21 2016, Baolin Wang wrote:
>>>
 Hi,

 On 21 December 2016 at 06:07, NeilBrown  wrote:
> On Tue, Dec 20 2016, Baolin Wang wrote:
>
>> Hi Neil,
>>
>> On 3 November 2016 at 09:25, NeilBrown  wrote:
>>> On Tue, Nov 01 2016, Baolin Wang wrote:
>>>
>>>
> So I won't be responding on this topic any further until I see a 
> genuine
> attempt to understand and resolve the inconsistencies with
> usb_register_notifier().

 Any better solution?
>>>
>>> I'm not sure exactly what you are asking, so I'll assume you are asking
>>> the question I want to answer :-)
>>>
>>> 1/ Liaise with the extcon developers to resolve the inconsistencies
>>>   with USB connector types.
>>>   e.g. currently there are both "EXTCON_USB" and "EXTCON_CHG_USB_SDP"
>>>   which both seem to suggest a standard downstream port.  There is no
>>>   documentation describing how these relate, and no consistent practice
>>>   to copy.
>>>   I suspect the intention is that
>>> EXTCON_USB and EXTCON_USB_HOST indicate the data capabilities of
>>> the cable, while EXTCON_CHG_USB* indicate the power capabilities of
>>> the cable.
>>> So EXTCON_CHG_USB_SDP should always appear together with EXTCON_USB
>>> while EXTCON_CHG_USB_DCP would not, and EXTCON_CHG_USB_ACA
>>> would normally appear with EXTCON_USB_HOST (I think).
>>>   Some drivers follow this model, particularly extcon-max14577.c
>>>   but it is not consistent.
>>>
>>>   This policy should be well documented and possibly existing drivers
>>>   should be updated to follow it.
>>>
>>>   At the same time it would make sense to resolve EXTCON_CHG_USB_SLOW
>>>   and EXTCON_CHG_USB_FAST.  These names don't mean much.
>>>   They were recently removed from drivers/power/axp288_charger.c
>>>   which is good, but are still used in drivers/extcon/extcon-max*
>>>   Possibly they should be changed to names from the standard, or
>>>   possibly they should be renamed to identify the current they are
>>>   expected to provide. e.g. EXTCON_CHG_USB_500MA and EXTCON_CHG_USB_1A
>>
>> I am now creating a new patchset that fixes and converts the existing
>> drivers.
>
> Thanks!
>
>>
>> I did some investigation about EXTCON subsystem. From your suggestion:
>> 1. EXTCON_CHG_USB_SDP should always appear together with EXTCON_USB.
>>  After checking, now all extcon drivers were following this rule.
>
> what about extcon-axp288.c ?
> axp288_handle_chrg_det_event() sets or clears EXTCON_CHG_USB_SDP but
> never sets EXTCON_USB.
> Similarly phy-rockchip-inno-usb2.c never sets EXTCON_USB.

 Ha, sorry, I missed these 2 files, and I will fix them.

>
>>
>> 2. EXTCON_CHG_USB_ACA would normally appear with EXTCON_USB_HOST.
>>  Now no extcon drivers used EXTCON_CHG_USB_ACA, then no need to
>> change.
>
> Agreed.
>
>>
>> 3. Change EXTCON_CHG_USB_SLOW/FAST to EXTCON_CHG_USB_500MA/1A.
>>  There is no model that shows the slow charger should be 500mA
>> and the fast charger 1A. (In the extcon-max77693.c file, the fast charger
>> can provide 2A), so changing to EXTCON_CHG_USB_500MA/1A is not useful
>> I think.
>
> Leaving the names as SLOW/FAST is less useful as those names don't *mean*
> anything.
> The only place where the cable types are registered are in
>  extcon-max{14577,77693,77843,8997}.c
>
> In each case, the code strongly suggests that the meaning is that "slow"
> means "500mA" and that "fast" means "1A" (or sometimes 1A-2A).
>
> With names like "fast" and "slow", any common charger framework cannot
> make use of these cable types as the name doesn't mean anything.
> If the names were changed to 500MA/1A, then common code could reasonably
> assume how much current can safely be drawn from each.

 As far as I know, some fast chargers can provide 5A; do we then need another
 macro named 5A? That will introduce more macros in future, and I am not
 sure this is helpful.
>>>
>>> It isn't really a question of what the charger can provide.  It is a
>>> question of what the cable reports to the phy.
>>
>> Yes, there is no spec to describe fast/slow charger type and how much
>> current fast/slow charger can provide. Maybe some fast charger can
>> provide 1A/2A, others can provide 5A, which depends on users'
>> platform. If we change to EXTCON_CHG_USB_500MA/1A and some fast
>> charger can provide 1.5A on user's platform, will it report the fast
>> charger type by EXTCON_CHG_USB_1A on user's platform (but it can
>> provide 1.5A)? So what I mean, can we keep EXTCON_CHG

Re: [PATCH 2/2] net: wireless: fix to uses struct

2016-12-21 Thread kbuild test robot
Hi Ozgur,

[auto build test ERROR on mac80211-next/master]
[also build test ERROR on v4.9 next-20161221]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Ozgur-Karatas/net-wireless-fixed-to-checkpatch-errors/20161222-125128
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next.git 
master
config: i386-allmodconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   net/wireless/reg.c: In function 'reg_query_builtin':
>> net/wireless/reg.c:493:28: error: 'reg_regdb_apply_request' undeclared 
>> (first use in this function)
 request = kzalloc(sizeof(*reg_regdb_apply_request), GFP_KERNEL);
   ^~~
   net/wireless/reg.c:493:28: note: each undeclared identifier is reported only 
once for each function it appears in
   net/wireless/reg.c: In function 'regulatory_hint_core':
   net/wireless/reg.c:2294:28: error: 'regulatory_request' undeclared (first 
use in this function)
 request = kzalloc(sizeof(*regulatory_request), GFP_KERNEL);
   ^~
   net/wireless/reg.c: In function 'regulatory_hint_user':
   net/wireless/reg.c:2316:28: error: 'regulatory_request' undeclared (first 
use in this function)
 request = kzalloc(sizeof(*regulatory_request), GFP_KERNEL);
   ^~
   net/wireless/reg.c: In function 'regulatory_hint':
   net/wireless/reg.c:2388:28: error: 'regulatory_request' undeclared (first 
use in this function)
 request = kzalloc(sizeof(*regulatory_request), GFP_KERNEL);
   ^~

vim +/reg_regdb_apply_request +493 net/wireless/reg.c

   487  }
   488  }
   489  
   490  if (!regdom)
   491  return -ENODATA;
   492  
 > 493  request = kzalloc(sizeof(*reg_regdb_apply_request), GFP_KERNEL);
   494  if (!request)
   495  return -ENOMEM;
   496  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




RE: [PATCH] acpi: Fix format string type mistakes

2016-12-21 Thread Zheng, Lv
The original change doesn't handle ACPI_SIZE (which can be UINT64 or UINT32
depending on the architecture) correctly.
I changed it when it was backported.
This sounds like a different problem.

Thanks and best regards
Lv
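
To make the ACPI_SIZE point concrete, here is a minimal user-space sketch, not ACPICA code. It assumes a hosted C library and uses a hypothetical helper name (format_size); size_t stands in for a type whose width differs between 32-bit and 64-bit builds. Whether ACPICA's own printf supports the same specifiers is not something this sketch claims.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of ACPICA: format a size value whose
 * width depends on the architecture using a single format string.
 * Casting to unsigned long long and printing with %llu avoids a
 * specifier/argument mismatch on both 32-bit and 64-bit builds.
 */
static int format_size(char *buf, size_t buflen, size_t value)
{
	return snprintf(buf, buflen, "%llu", (unsigned long long)value);
}
```

The cast costs nothing at run time and keeps the format string identical across compilers, which is the property the thread is concerned about.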

> From: Moore, Robert
> Subject: RE: [PATCH] acpi: Fix format string type mistakes
> 
> These formatting changes will not compile under:
> 
> Gcc 4.4.5
> Gcc 5.4.0
> 
> The printf formatting stuff is very delicate, as ACPICA has to be compiled 
> under many different
> compilers.
> 
> Bob
> 
> > From: Zheng, Lv
> > Subject: RE: [PATCH] acpi: Fix format string type mistakes
> >
> > Hi, Kees and Emese
> >
> > The pull request is under rebasing.
> > So if you cannot reach the URL, find the commit here:
> > https://github.com/acpica/acpica/pull/196
> >
> > Thanks and best regards
> > Lv
> >
> > > From: Zheng, Lv
> > > Subject: RE: [PATCH] acpi: Fix format string type mistakes
> > >
> > > Hi, Kees and Emese
> > >
> > > I just helped to back port the commit here:
> > > https://github.com/acpica/acpica/pull/196/commits/5e64857f
> > > If you can see something wrong in it, please let me know.
> > >
> > > Thanks and best regards
> > > Lv
> > >
> > > > From: Devel [mailto:devel-boun...@acpica.org] On Behalf Of Zheng, Lv
> > > > Subject: Re: [Devel] [PATCH] acpi: Fix format string type mistakes
> > > >
> > > > Hi,
> > > >
> > > > > From: Kees Cook [mailto:keesc...@chromium.org]
> > > > > Subject: [PATCH] acpi: Fix format string type mistakes
> > > > >
> > > > > From: Emese Revfy 
> > > > >
> > > > > This adds the missing __printf attribute which allows compile time
> > > > > format string checking (and will be used by the coming initify gcc
> > > > > plugin). Additionally, this fixes the warnings exposed by the
> > attribute.
> > > > >
> > > > > Signed-off-by: Emese Revfy 
> > > > > [kees: split scsi/acpi, merged attr and fix, new commit messages]
> > > > > Signed-off-by: Kees Cook 
> > > > > ---
> > > > >  drivers/acpi/acpica/dbhistry.c |  2 +-
> > > > > drivers/acpi/acpica/dbinput.c  | 10 ++---
> > > > > drivers/acpi/acpica/dbstats.c  | 88
> > > > > +-
> > > > >  drivers/acpi/acpica/utdebug.c  |  2 +-
> > > > >  include/acpi/acpiosxf.h|  3 +-
> > > > >  5 files changed, 53 insertions(+), 52 deletions(-)
> > > > >
> > > > > diff --git a/drivers/acpi/acpica/dbhistry.c
> > > > > b/drivers/acpi/acpica/dbhistry.c index 46bd65d38df9..ec9da4830f6a
> > > > > 100644
> > > > > --- a/drivers/acpi/acpica/dbhistry.c
> > > > > +++ b/drivers/acpi/acpica/dbhistry.c
> > > > > @@ -155,7 +155,7 @@ void acpi_db_display_history(void)
> > > > >
> > > > >   for (i = 0; i < acpi_gbl_num_history; i++) {
> > > > >   if (acpi_gbl_history_buffer[history_index].command) {
> > > > > - acpi_os_printf("%3ld %s\n",
> > > > > + acpi_os_printf("%3u %s\n",
> > > > >
> > acpi_gbl_history_buffer[history_index].
> > > > >  cmd_num,
> > > > >
> > acpi_gbl_history_buffer[history_index].
> > > > > diff --git a/drivers/acpi/acpica/dbinput.c
> > > > > b/drivers/acpi/acpica/dbinput.c index 068214f9cc9d..43be06bdb790
> > > > > 100644
> > > > > --- a/drivers/acpi/acpica/dbinput.c
> > > > > +++ b/drivers/acpi/acpica/dbinput.c
> > > > > @@ -608,7 +608,7 @@ static u32 acpi_db_get_line(char
> > *input_buffer)
> > > > >   (acpi_gbl_db_parsed_buf, sizeof(acpi_gbl_db_parsed_buf),
> > > > >input_buffer)) {
> > > > >   acpi_os_printf
> > > > > - ("Buffer overflow while parsing input line (max %u
> > characters)\n",
> > > > > + ("Buffer overflow while parsing input line (max %lu
> > > > > +characters)\n",
> > > > >sizeof(acpi_gbl_db_parsed_buf));
> > > > >   return (0);
> > > > >   }
> > > > > @@ -864,24 +864,24 @@ acpi_db_command_dispatch(char *input_buffer,
> > > > >
> > > > >   if (param_count == 0) {
> > > > >   acpi_os_printf
> > > > > - ("Current debug level for file output is:
> > %8.8lX\n",
> > > > > + ("Current debug level for file output is:
> > %8.8X\n",
> > > > >acpi_gbl_db_debug_level);
> > > > >   acpi_os_printf
> > > > > - ("Current debug level for console output is:
> > %8.8lX\n",
> > > > > + ("Current debug level for console output is:
> > %8.8X\n",
> > > > >acpi_gbl_db_console_debug_level);
> > > > >   } else if (param_count == 2) {
> > > > >   temp = acpi_gbl_db_console_debug_level;
> > > > >   acpi_gbl_db_console_debug_level =
> > > > >   strtoul(acpi_gbl_db_args[1], NULL, 16);
> > > > >   acpi_os_printf
> > > > > - ("Debug Level for console output was %8.8lX,
> > now %8.8lX\n",
> > > 

Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-21 Thread Dave Chinner
On Wed, Dec 21, 2016 at 09:46:37PM -0800, Linus Torvalds wrote:
> On Wed, Dec 21, 2016 at 9:13 PM, Dave Chinner  wrote:
> >
> > There may be deeper issues. I just started running scalability tests
> > (e.g. 16-way fsmark create tests) and about a minute in I got a
> > directory corruption reported - something I hadn't seen in the dev
> > cycle at all.
> 
> By "in the dev cycle", do you mean your XFS changes, or have you been
> tracking the merge cycle at least for some testing?

I mean the three months leading up to the 4.10 merge, when all the
XFS changes were being tested against 4.9-rc kernels.

The iscsi problem showed up when I updated the base kernel from
4.9 to 4.10-current last week to test the pullreq I was going to
send you. I've been busy with other stuff until now, so I didn't
upgrade my working trees again until today in the hope the iscsi
problem had already been found and fixed.

> > I unmounted the fs, mkfs'd it again, ran the
> > workload again and about a minute in this fired:
> >
> > [628867.607417] ------------[ cut here ]------------
> > [628867.608603] WARNING: CPU: 2 PID: 16925 at mm/workingset.c:461 
> > shadow_lru_isolate+0x171/0x220
> 
> Well, part of the changes during the merge window were the shadow
> entry tracking changes that came in through Andrew's tree. Adding
> Johannes Weiner to the participants.
> 
> > Now, this workload does not touch the page cache at all - it's
> > entirely an XFS metadata workload, so it should not really be
> > affecting the working set code.
> 
> Well, I suspect that anything that creates memory pressure will end up
> triggering the working set code, so ..
> 
> That said, obviously memory corruption could be involved and result in
> random issues too, but I wouldn't really expect that in this code.
> 
> It would probably be really useful to get more data points - is the
> problem reliably in this area, or is it going to be random and all
> over the place.

The iscsi problem is 100% reproducible. Create a pair of iscsi luns,
mkfs, run xfstests on them. iscsi fails a second after xfstests mounts
the filesystems.

The test machine I'm having all these other problems on? Stable and
steady as a rock using PMEM devices. The moment I go to use /dev/vdc
(i.e. run load/perf benchmarks) it starts falling over left, right
and center.

And I just smacked into this in the bulkstat phase of the benchmark
(mkfs, fsmark, xfs_repair, mount, bulkstat, find, grep, rm):

[ 2729.750563] BUG: Bad page state in process bstat  pfn:14945
[ 2729.751863] page:ea525140 count:-1 mapcount:0 mapping:  
(null) index:0x0
[ 2729.753763] flags: 0x4000()
[ 2729.754671] raw: 4000   

[ 2729.756469] raw: dead0100 dead0200  

[ 2729.758276] page dumped because: nonzero _refcount
[ 2729.759393] Modules linked in:
[ 2729.760137] CPU: 7 PID: 25902 Comm: bstat Tainted: GB   
4.9.0-dgc #18
[ 2729.761888] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Debian-1.8.2-1 04/01/2014
[ 2729.763943] Call Trace:
[ 2729.764523]  
[ 2729.765004]  dump_stack+0x63/0x83
[ 2729.765784]  bad_page+0xc4/0x130
[ 2729.766552]  free_pages_check_bad+0x4f/0x70
[ 2729.767531]  free_pcppages_bulk+0x3c5/0x3d0
[ 2729.768513]  ? page_alloc_cpu_dead+0x30/0x30
[ 2729.769510]  drain_pages_zone+0x41/0x60
[ 2729.770417]  drain_pages+0x3e/0x60
[ 2729.771215]  drain_local_pages+0x24/0x30
[ 2729.772138]  flush_smp_call_function_queue+0x88/0x160
[ 2729.773317]  generic_smp_call_function_single_interrupt+0x13/0x30
[ 2729.774742]  smp_call_function_single_interrupt+0x27/0x40
[ 2729.776000]  smp_call_function_interrupt+0xe/0x10
[ 2729.777102]  call_function_interrupt+0x8e/0xa0
[ 2729.778147] RIP: 0010:delay_tsc+0x41/0x90
[ 2729.779085] RSP: 0018:c9000f0cf500 EFLAGS: 0202 ORIG_RAX: 
ff03
[ 2729.780852] RAX: 77541291 RBX: 88008b5efe40 RCX: 002e
[ 2729.782514] RDX: 0577 RSI: 05541291 RDI: 0001
[ 2729.784167] RBP: c9000f0cf500 R08: 0007 R09: c9000f0cf678
[ 2729.785818] R10: 0006 R11: 1000 R12: 0061
[ 2729.787480] R13: 0001 R14: 83214e30 R15: 0080
[ 2729.789124]  
[ 2729.789626]  __delay+0xf/0x20
[ 2729.790333]  do_raw_spin_lock+0x8c/0x160
[ 2729.791255]  _raw_spin_lock+0x15/0x20
[ 2729.792112]  list_lru_add+0x1a/0x70
[ 2729.792932]  xfs_buf_rele+0x3e7/0x410
[ 2729.793792]  xfs_buftarg_shrink_scan+0x6b/0x80
[ 2729.794841]  shrink_slab.part.65.constprop.86+0x1dc/0x410
[ 2729.796099]  shrink_node+0x57/0x90
[ 2729.796905]  do_try_to_free_pages+0xdd/0x230
[ 2729.797914]  try_to_free_pages+0xce/0x1a0
[ 2729.798852]  __alloc_pages_slowpath+0x2df/0x960
[ 2729.799908]  __alloc_pages_nodemask+0x24b/0x290
[ 2729.800963]  new_slab+0x2ac/0x380
[ 2729.801743]  ___slab_alloc.constprop.82+0x336/0x440
[ 2729.802890]  ? kmem_zone_alloc


[PATCH] ib umem: bug: put pid back before return from error path

2016-12-21 Thread Kenneth Lee
I caught this bug when reading the code. I'm sorry I have no hardware to test
it. But it is obviously a bug.

Signed-off-by: Kenneth Lee 
---
 drivers/infiniband/core/umem.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 1e62a5f..4609b92 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -134,6 +134,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, 
unsigned long addr,
 IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND));
 
if (access & IB_ACCESS_ON_DEMAND) {
+   put_pid(umem->pid);
ret = ib_umem_odp_get(context, umem);
if (ret) {
kfree(umem);
@@ -149,6 +150,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, 
unsigned long addr,
 
page_list = (struct page **) __get_free_page(GFP_KERNEL);
if (!page_list) {
+   put_pid(umem->pid);
kfree(umem);
return ERR_PTR(-ENOMEM);
}
-- 
1.9.1



Detecting kprobes generated code addresses

2016-12-21 Thread Josh Poimboeuf
Hi Masami,

I would like to make __kernel_text_address() be able to detect whether
an address belongs to code which was generated by kprobes.  As far as I
can tell, that information seems to be in the 'pages' lists of
kprobe_insn_slots and kprobe_optinsn_slots.  But they seem to be
protected by mutexes.  Do you know if there's a sleep-free way to access
that information?

-- 
Josh


Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-21 Thread Christoph Hellwig
On Thu, Dec 22, 2016 at 05:30:46PM +1100, Dave Chinner wrote:
> > For "normal" bios the for_each_segment loop iterates over bi_vcnt,
> > so it will be ignored anyway.  That being said both I and the lists
> > got CCed halfway through the thread and I haven't seen the original
> > report, so I'm not really sure what's going on here anyway.
> 
> http://www.gossamer-threads.com/lists/linux/kernel/2587485

This doesn't look like the discard changes, but if Chris wants to test
without them f9d03f96b988 reverts cleanly.


Re: [PATCH v5 0/6] inherit dma configuration from parent dev

2016-12-21 Thread Vivek Gautam
On Thu, Nov 17, 2016 at 5:13 PM, Sriram Dash  wrote:
> For xhci-hcd platform device, all the DMA parameters are not
> configured properly, notably dma ops for dwc3 devices.
>
> The idea here is that you pass in the parent of_node along
> with the child device pointer, so it would behave exactly
> like the parent already does. The difference is that it also
> handles all the other attributes besides the mask.
>
> Arnd Bergmann (6):
>   usb: separate out sysdev pointer from usb_bus
>   usb: chipidea: use bus->sysdev for DMA configuration
>   usb: ehci: fsl: use bus->sysdev for DMA configuration
>   usb: xhci: use bus->sysdev for DMA configuration
>   usb: dwc3: use bus->sysdev for DMA configuration
>   usb: dwc3: Do not set dma coherent mask

Tested patches 1, 4 & 5 on db820c platform with required set of patches [1] for
phy.

Tested-by: Vivek Gautam 
for the above mentioned patches.

[1] https://lkml.org/lkml/2016/12/20/392


-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


[GIT PULL] SELinux fix for 4.10

2016-12-21 Thread James Morris
Please pull.

From Paul: "A small SELinux patch to fix some clang/llvm compiler warnings 
and ensure the tools under scripts work well in the face of kernel 
changes."


The following changes since commit 52bce91165e5f2db422b2b972e83d389e5e4725c:

  splice: reinstate SIGPIPE/EPIPE handling (2016-12-21 10:59:34 -0800)

are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git 
for-linus

James Morris (1):
  Merge branch 'stable-4.10' of 
git://git.infradead.org/users/pcmoore/selinux into for-linus

Paul Moore (1):
  selinux: use the kernel headers when building scripts/selinux

 scripts/selinux/genheaders/Makefile |4 +++-
 scripts/selinux/genheaders/genheaders.c |4 
 scripts/selinux/mdp/Makefile|4 +++-
 scripts/selinux/mdp/mdp.c   |4 
 security/selinux/include/classmap.h |2 ++
 5 files changed, 16 insertions(+), 2 deletions(-)

---

commit bfc5e3a6af397dcf9c99a6c1872458e7867c4680
Author: Paul Moore 
Date:   Wed Dec 21 10:39:25 2016 -0500

selinux: use the kernel headers when building scripts/selinux

Commit 3322d0d64f4e ("selinux: keep SELinux in sync with new capability
definitions") added a check on the defined capabilities without
explicitly including the capability header file which caused problems
when building genheaders for users of clang/llvm.  Resolve this by
using the kernel headers when building genheaders, which is arguably
the right thing to do regardless, and explicitly including the
kernel's capability.h header file in classmap.h.  We also update the
mdp build, even though it wasn't causing an error we really should
be using the headers from the kernel we are building.

Reported-by: Nicolas Iooss 
Signed-off-by: Paul Moore 

diff --git a/scripts/selinux/genheaders/Makefile 
b/scripts/selinux/genheaders/Makefile
index 1d1ac51..6fc2b87 100644
--- a/scripts/selinux/genheaders/Makefile
+++ b/scripts/selinux/genheaders/Makefile
@@ -1,4 +1,6 @@
 hostprogs-y:= genheaders
-HOST_EXTRACFLAGS += -Isecurity/selinux/include
+HOST_EXTRACFLAGS += \
+   -I$(srctree)/include/uapi -I$(srctree)/include \
+   -I$(srctree)/security/selinux/include
 
 always := $(hostprogs-y)
diff --git a/scripts/selinux/genheaders/genheaders.c 
b/scripts/selinux/genheaders/genheaders.c
index 539855f..f4dd41f 100644
--- a/scripts/selinux/genheaders/genheaders.c
+++ b/scripts/selinux/genheaders/genheaders.c
@@ -1,3 +1,7 @@
+
+/* NOTE: we really do want to use the kernel headers here */
+#define __EXPORTED_HEADERS__
+
 #include 
 #include 
 #include 
diff --git a/scripts/selinux/mdp/Makefile b/scripts/selinux/mdp/Makefile
index dba7eff..d6a83ca 100644
--- a/scripts/selinux/mdp/Makefile
+++ b/scripts/selinux/mdp/Makefile
@@ -1,5 +1,7 @@
 hostprogs-y:= mdp
-HOST_EXTRACFLAGS += -Isecurity/selinux/include
+HOST_EXTRACFLAGS += \
+   -I$(srctree)/include/uapi -I$(srctree)/include \
+   -I$(srctree)/security/selinux/include
 
 always := $(hostprogs-y)
 clean-files:= policy.* file_contexts
diff --git a/scripts/selinux/mdp/mdp.c b/scripts/selinux/mdp/mdp.c
index e10beb1..c29fa4a 100644
--- a/scripts/selinux/mdp/mdp.c
+++ b/scripts/selinux/mdp/mdp.c
@@ -24,6 +24,10 @@
  * Authors: Serge E. Hallyn 
  */
 
+
+/* NOTE: we really do want to use the kernel headers here */
+#define __EXPORTED_HEADERS__
+
 #include 
 #include 
 #include 
diff --git a/security/selinux/include/classmap.h 
b/security/selinux/include/classmap.h
index e2d4ad3..13ae49b 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -1,3 +1,5 @@
+#include 
+
 #define COMMON_FILE_SOCK_PERMS "ioctl", "read", "write", "create", \
 "getattr", "setattr", "lock", "relabelfrom", "relabelto", "append"
 


Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-21 Thread Dave Chinner
On Thu, Dec 22, 2016 at 07:18:27AM +0100, Christoph Hellwig wrote:
> On Wed, Dec 21, 2016 at 03:19:15PM -0800, Linus Torvalds wrote:
> > Looking around a bit, the only even halfway suspicious scatterlist
> > initialization thing I see is commit f9d03f96b988 ("block: improve
> > handling of the magic discard payload") which used to have a magic
> > hack wrt !bio->bi_vcnt, and that got removed. See __blk_bios_map_sg(),
> > now it does __blk_bvec_map_sg() instead.
> 
> But that check was only for discard (and discard-like) bios which
> had the magic single page that sometimes was unused attached.
> 
> For "normal" bios the for_each_segment loop iterates over bi_vcnt,
> so it will be ignored anyway.  That being said both I and the lists
> got CCed halfway through the thread and I haven't seen the original
> report, so I'm not really sure what's going on here anyway.

http://www.gossamer-threads.com/lists/linux/kernel/2587485

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: Patch to include/linux/kernel.h breaks 3rd party modules.

2016-12-21 Thread Christoph Hellwig
On Wed, Dec 21, 2016 at 03:42:05PM -0500, Valdis Kletnieks wrote:
> Yes, I know that usually out-of-tree modules are on their own.
> However, this one may require a rethink..
> 
> (Sorry for not catching this sooner, I hadn't tried to deal with the
> affected module since this patch hit linux-next in next-20161128)

So fix your out of tree module.


Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-21 Thread Dave Chinner
On Thu, Dec 22, 2016 at 04:13:22PM +1100, Dave Chinner wrote:
> On Wed, Dec 21, 2016 at 04:13:03PM -0800, Chris Leech wrote:
> > On Wed, Dec 21, 2016 at 03:19:15PM -0800, Linus Torvalds wrote:
> > > Hi,
> > > 
> > > On Wed, Dec 21, 2016 at 2:16 PM, Dave Chinner  wrote:
> > > > On Fri, Dec 16, 2016 at 10:59:06AM -0800, Chris Leech wrote:
> > > >> Thanks Dave,
> > > >>
> > > >> I'm hitting a bug at scatterlist.h:140 before I even get any iSCSI
> > > >> modules loaded (virtio block) so there's something else going on in the
> > > >> current merge window.  I'll keep an eye on it and make sure there's
> > > >> nothing iSCSI needs fixing for.
> > > >
> > > > OK, so before this slips through the cracks.
> > > >
> > > > Linus - your tree as of a few minutes ago still panics immediately
> > > > when starting xfstests on iscsi devices. It appears to be a
> > > > scatterlist corruption and not an iscsi problem, so the iscsi guys
> > > > seem to have bounced it and no-one is looking at it.
> > > 
> > > Hmm. There's not much to go by.
> > > 
> > > Can somebody in iscsi-land please try to just bisect it - I'm not
> > > seeing a lot of clues to where this comes from otherwise.
> > 
> > Yeah, my hopes of this being quickly resolved by someone else didn't
> > work out and whatever is going on in that test VM is looking like a
> > different kind of odd.  I'm saving that off for later, and seeing if I
> > can't be a bisect on the iSCSI issue.
> 
> There may be deeper issues. I just started running scalability tests
> (e.g. 16-way fsmark create tests) and about a minute in I got a
> directory corruption reported - something I hadn't seen in the dev
> cycle at all. I unmounted the fs, mkfs'd it again, ran the
> workload again and about a minute in this fired:
> 
> [628867.607417] ------------[ cut here ]------------
> [628867.608603] WARNING: CPU: 2 PID: 16925 at mm/workingset.c:461 
> shadow_lru_isolate+0x171/0x220
> [628867.610702] Modules linked in:
> [628867.611375] CPU: 2 PID: 16925 Comm: kworker/2:97 Tainted: GW  
>  4.9.0-dgc #18
> [628867.613382] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> Debian-1.8.2-1 04/01/2014
> [628867.616179] Workqueue: events rht_deferred_worker
> [628867.632422] Call Trace:
> [628867.634691]  dump_stack+0x63/0x83
> [628867.637937]  __warn+0xcb/0xf0
> [628867.641359]  warn_slowpath_null+0x1d/0x20
> [628867.643362]  shadow_lru_isolate+0x171/0x220
> [628867.644627]  __list_lru_walk_one.isra.11+0x79/0x110
> [628867.645780]  ? __list_lru_init+0x70/0x70
> [628867.646628]  list_lru_walk_one+0x17/0x20
> [628867.647488]  scan_shadow_nodes+0x34/0x50
> [628867.648358]  shrink_slab.part.65.constprop.86+0x1dc/0x410
> [628867.649506]  shrink_node+0x57/0x90
> [628867.650233]  do_try_to_free_pages+0xdd/0x230
> [628867.651157]  try_to_free_pages+0xce/0x1a0
> [628867.652342]  __alloc_pages_slowpath+0x2df/0x960
> [628867.653332]  ? __might_sleep+0x4a/0x80
> [628867.654148]  __alloc_pages_nodemask+0x24b/0x290
> [628867.655237]  kmalloc_order+0x21/0x50
> [628867.656016]  kmalloc_order_trace+0x24/0xc0
> [628867.656878]  __kmalloc+0x17d/0x1d0
> [628867.657644]  bucket_table_alloc+0x195/0x1d0
> [628867.658564]  ? __might_sleep+0x4a/0x80
> [628867.659449]  rht_deferred_worker+0x287/0x3c0
> [628867.660366]  ? _raw_spin_unlock_irq+0xe/0x30
> [628867.661294]  process_one_work+0x1de/0x4d0
> [628867.662208]  worker_thread+0x4b/0x4f0
> [628867.662990]  kthread+0x10c/0x140
> [628867.663687]  ? process_one_work+0x4d0/0x4d0
> [628867.664564]  ? kthread_create_on_node+0x40/0x40
> [628867.665523]  ret_from_fork+0x25/0x30
> [628867.666317] ---[ end trace 7c38634006a9955e ]---
> 
> Now, this workload does not touch the page cache at all - it's
> entirely an XFS metadata workload, so it should not really be
> affecting the working set code.

The system is back up, and I haven't reproduced this problem yet.
However, benchmark results are way off where they should be, and at
times the performance is utterly abysmal. The XFS for-next tree
based on the 4.9 kernel shows none of these problems, so I don't
think there's an XFS problem here. Workload is the same 16-way
fsmark workload that I've been using for years as a performance
regression test.

The workload normally averages around 230k files/s - I'm seeing
an average of ~175k files/s on your current kernel. And there are
periods where performance just completely tanks:

#  ./fs_mark  -D  1  -S0  -n  10  -s  0  -L  32  -d  /mnt/scratch/0  -d 
 /mnt/scratch/1  -d  /mnt/scratch/2  -d  /mnt/scratch/3  -d  /mnt/scratch/4  -d 
 /mnt/scratch/5  -d  /mnt/scratch/6  -d  /mnt/scratch/7  -d  /mnt/scratch/8  -d 
 /mnt/scratch/9  -d  /mnt/scratch/10  -d  /mnt/scratch/11  -d  /mnt/scratch/12  
-d  /mnt/scratch/13  -d  /mnt/scratch/14  -d  /mnt/scratch/15
#   Version 3.3, 16 thread(s) starting at Thu Dec 22 16:29:20 2016
#   Sync method: NO SYNC: Test does not issue sync() or fsync() calls.
#   Directories:  Time based hash between directories acr

Re: [PATCH v1] security: Add a new hook: inode_touch_atime

2016-12-21 Thread Christoph Hellwig
On Thu, Dec 22, 2016 at 12:15:06AM +0100, Mickaël Salaün wrote:
> Add a new LSM hook named inode_touch_atime which is needed to deny
> indirect update of extended file attributes (i.e. access time) which are
> not caught by the inode_setattr hook. By creating a new hook instead of
> calling inode_setattr, we avoid having to simulate a useless struct iattr.
> 
> This hook allows creating read-only environments as with read-only
> mount points. It can also take care of anonymous inodes.

An LSM has absolutely no business doing that - that's what the mount
code is for.


Re: [PATCH] Btrfs: add another missing end_page_writeback on submit_extent_page failure

2016-12-21 Thread Liu Bo
On Fri, Dec 16, 2016 at 03:41:50PM +0900, Takafumi Kubota wrote:
> This is actually inspired by Filipe's patch(55e3bd2e0c2e1).
> 
> When submit_extent_page() in __extent_writepage_io() fails,
> Btrfs misses clearing a writeback bit of the failed page.
> This leaves the page falsely marked as under writeback.
> Then, another sync task hangs in filemap_fdatawait_range(),
> because it waits on the falsely under-writeback page.
> 
> CPU0CPU1
> 
> __extent_writepage_io()
>   ret = submit_extent_page() // fail
> 
>   if (ret)
> SetPageError(page)
> // miss clearing the writeback bit
> 
> sync()
>   ...
>   filemap_fdatawait_range()
> wait_on_page_writeback(page);
> // wait the false under-writeback page
> 
> Signed-off-by: Takafumi Kubota 
> ---
>  fs/btrfs/extent_io.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 1e67723..ef9793b 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3443,8 +3443,10 @@ static noinline_for_stack int 
> __extent_writepage_io(struct inode *inode,
>bdev, &epd->bio, max_nr,
>end_bio_extent_writepage,
>0, 0, 0, false);
> - if (ret)
> + if (ret) {
>   SetPageError(page);
> + end_page_writeback(page);
> + }

OK...this could be complex as we don't know which part of
submit_extent_page gets the error. If the page has already been added to
the bio, bio_end would call end_page_writeback(page) as well, so whichever
call comes later would trip the BUG() in end_page_writeback().

Looks like commit 55e3bd2e0c2e1 also has the same problem although I
gave it my reviewed-by.

Thanks,

-liubo
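
The double-completion concern can be illustrated with a toy model. This is a user-space sketch under stated assumptions, not btrfs or mm code: struct toy_page and toy_end_page_writeback() are hypothetical names, and the -1 return stands in for the point where the kernel's end_page_writeback() would hit its BUG().

```c
#include <stdbool.h>

/* Toy stand-in for a page's writeback bit; not the kernel's struct page. */
struct toy_page {
	bool writeback;
};

/*
 * Clear the writeback bit. Returns 0 on success, or -1 if the bit was
 * already clear, which models the case where a second caller completes
 * writeback on a page that has already been completed.
 */
static int toy_end_page_writeback(struct toy_page *page)
{
	if (!page->writeback)
		return -1;
	page->writeback = false;
	return 0;
}
```

If both the error path and the bio completion path call this on the same page, the first call succeeds and the second returns -1, whichever order they run in.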

>  
>   cur = cur + iosize;
>   pg_offset += iosize;
> -- 
> 1.9.3
> 


Re: [PATCH 1/2] arm64: setup: introduce kaslr_offset()

2016-12-21 Thread Yury Norov
On Sun, Dec 11, 2016 at 03:50:55AM +0300, Alexander Popov wrote:
> Introduce kaslr_offset() similarly to x86_64 for fixing kcov.
> 
> Signed-off-by: Alexander Popov 
> ---
>  arch/arm64/include/asm/setup.h  | 19 +++
>  arch/arm64/include/uapi/asm/setup.h |  4 ++--
>  arch/arm64/kernel/setup.c   |  8 
>  3 files changed, 25 insertions(+), 6 deletions(-)
>  create mode 100644 arch/arm64/include/asm/setup.h
> 
> diff --git a/arch/arm64/include/asm/setup.h b/arch/arm64/include/asm/setup.h
> new file mode 100644
> index 000..e7b59b9
> --- /dev/null
> +++ b/arch/arm64/include/asm/setup.h
> @@ -0,0 +1,19 @@
> +/*
> + * arch/arm64/include/asm/setup.h
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#ifndef __ASM_SETUP_H
> +#define __ASM_SETUP_H
> +
> +#include 
> +
> +static inline unsigned long kaslr_offset(void)
> +{
> + return kimage_vaddr - KIMAGE_VADDR;
> +}
> +
> +#endif

Hi Alexander,

I found today's linux-next master broken:
In file included from init/main.c:88:0:
./arch/arm64/include/asm/setup.h:14:100: error: redefinition of ‘kaslr_offset’
In file included from ./arch/arm64/include/asm/page.h:54:0,
   from ./include/linux/mm_types.h:16,
   from ./include/linux/sched.h:27,
   from ./arch/arm64/include/asm/compat.h:25,
   from ./arch/arm64/include/asm/stat.h:23,
   from ./include/linux/stat.h:5,
   from ./include/linux/module.h:10,
   from init/main.c:15:
/arch/arm64/include/asm/memory.h:168:100: note: previous definition of ‘kaslr_offset’ was here
scripts/Makefile.build:293: recipe for target 'init/main.o' failed
make[1]: *** [init/main.o] Error 1

It looks like you define kaslr_offset() twice - in this patch, and in 7ede8665f
(arm64: setup: introduce kaslr_offset()).

Yury
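
For reference, the helper in question computes the distance between where the image runs and where it was linked. The following is a user-space sketch only, assuming nothing about the arm64 headers: kaslr_offset_sketch() and its two parameters are illustrative stand-ins for kaslr_offset(), kimage_vaddr and KIMAGE_VADDR.

```c
#include <stdint.h>

/*
 * User-space sketch, not the arm64 implementation: the KASLR offset is
 * the runtime virtual address of the image minus its link-time address.
 * Both parameters are hypothetical stand-ins for kimage_vaddr and
 * KIMAGE_VADDR.
 */
static inline uint64_t kaslr_offset_sketch(uint64_t runtime_vaddr,
					   uint64_t link_vaddr)
{
	return runtime_vaddr - link_vaddr;
}
```

A static inline like this must be defined in exactly one header reachable from a given translation unit; defining it in two headers that both end up included produces the redefinition error shown above.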


Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-21 Thread Christoph Hellwig
On Wed, Dec 21, 2016 at 03:19:15PM -0800, Linus Torvalds wrote:
> Looking around a bit, the only even halfway suspicious scatterlist
> initialization thing I see is commit f9d03f96b988 ("block: improve
> handling of the magic discard payload") which used to have a magic
> hack wrt !bio->bi_vcnt, and that got removed. See __blk_bios_map_sg(),
> now it does __blk_bvec_map_sg() instead.

But that check was only for discard (and discard-like) bios which
had the magic single page that sometimes was unused attached.

For "normal" bios the for_each_segment loop iterates over bi_vcnt,
so it will be ignored anyway.  That being said both I and the lists
got CCed halfway through the thread and I haven't seen the original
report, so I'm not really sure what's going on here anyway.


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-21 Thread Jason A. Donenfeld
Hi Ted,

On Thu, Dec 22, 2016 at 6:41 AM, Theodore Ts'o  wrote:
> The bottom line is that I think we're really "pixel peeping" at this
> point --- which is what obsessed digital photographers will do when
> debating the quality of a Canon vs Nikon DSLR by blowing up a photo by
> a thousand times, and then trying to claim that this is visible to the
> human eye.  Or people obsessing over the frequency response curves
> of TH-X00 headphones with Mahogany vs Purpleheart wood, when it's
> likely that in a blind head-to-head comparison, most people wouldn't
> be able to tell the difference

This is hilarious, thanks for the laugh. I believe you're right about this...

>
> I think the main argument for using the batched getrandom approach is
> that it is, I would argue, simpler than introducing siphash into the
> picture.  On 64-bit platforms it is faster and more consistent, so
> it's basically that versus the complexity of having to add siphash to
> the things that people have to analyze when considering random number
> security on Linux.   But it's a close call either way, I think.

I find this compelling. We'll have one csprng for both
get_random_int/long and for /dev/urandom, and we'll be able to update
that silly warning on the comment of get_random_int/long to read
"gives output of either rdrand quality or of /dev/urandom quality",
which makes it more useful for other things. It introduces less
error-prone code, and it lets the RNG analysis be spent on just one RNG, not
two.
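
As a rough sketch of what "batched" means here, and not the code planned for v8: random words are generated in bulk and handed out one at a time, so the generator only runs when the batch is exhausted. All names below (rand_batch, batch_get, counter_fill, BATCH_WORDS) are hypothetical, and counter_fill is a deterministic placeholder, not a random source.

```c
#include <stddef.h>
#include <stdint.h>

#define BATCH_WORDS 8

/* Hypothetical batch of pre-generated words. */
struct rand_batch {
	uint64_t words[BATCH_WORDS];
	size_t pos;		/* next unused word; BATCH_WORDS means empty */
	unsigned int refills;	/* number of generator calls so far */
};

/* Generator callback: fills n words. Stands in for the real CSPRNG. */
typedef void (*fill_fn)(uint64_t *out, size_t n);

/* Placeholder generator for illustration only: NOT random. */
static void counter_fill(uint64_t *out, size_t n)
{
	static uint64_t counter;
	size_t i;

	for (i = 0; i < n; i++)
		out[i] = counter++;
}

/* Start with an empty batch so the first request triggers a refill. */
static void batch_init(struct rand_batch *b)
{
	b->pos = BATCH_WORDS;
	b->refills = 0;
}

/* Return one word, calling the generator only when the batch is used up. */
static uint64_t batch_get(struct rand_batch *b, fill_fn fill)
{
	if (b->pos == BATCH_WORDS) {
		fill(b->words, BATCH_WORDS);
		b->pos = 0;
		b->refills++;
	}
	return b->words[b->pos++];
}
```

With a batch of eight words, eight consecutive requests cost one generator call; the ninth triggers the second. A real version would also need per-CPU state or locking, which this sketch leaves out.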

So, with your blessing, I'm going to move ahead with implementing a
pretty version of this for v8.

Regards,
Jason


[PATCH 3/3] perf sched timehist: Show total scheduling time

2016-12-21 Thread Namhyung Kim
Show length of analyzed sample time and rate of idle task running.
This also takes care of time range given by --time option.

  $ perf sched timehist -sI | tail
  Samples do not have callchains.
  Idle stats:
  CPU  0 idle for930.316  msec  ( 92.93%)
  CPU  1 idle for963.614  msec  ( 96.25%)
  CPU  2 idle for885.482  msec  ( 88.45%)
  CPU  3 idle for938.635  msec  ( 93.76%)

  Total number of unique tasks: 118
  Total number of context switches: 2337
 Total run time (msec): 3718.048
  Total scheduling time (msec): 1001.131  (x 4)
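
As a rough illustration of how the percentages above relate to the
scheduling time, here is a minimal standalone sketch.  The helper names
(window_length, idle_percent) are made up for this example and are not
part of perf; the zero check is an assumption added here to keep the
sketch safe for an empty window.

```c
#include <stdint.h>

/* Hypothetical helper: length of the analyzed window in nanoseconds.
 * Returns 0 when the end does not lie after the start. */
static uint64_t window_length(uint64_t start_ns, uint64_t end_ns)
{
	return end_ns > start_ns ? end_ns - start_ns : 0;
}

/* Hypothetical helper: share of the window a CPU spent idle, in percent.
 * An empty window yields 0.0 instead of dividing by zero. */
static double idle_percent(uint64_t idle_ns, uint64_t window_ns)
{
	if (window_ns == 0)
		return 0.0;
	return 100.0 * (double)idle_ns / (double)window_ns;
}
```

With the numbers from the output above, 930.316 msec of idle time in a
1001.131 msec window gives roughly 92.93%.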

Suggested-by: David Ahern 
Signed-off-by: Namhyung Kim 
---
 tools/perf/builtin-sched.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index d53e706a6f17..5b134b0d1ff3 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -209,6 +209,7 @@ struct perf_sched {
u64 skipped_samples;
const char  *time_str;
struct perf_time_interval ptime;
+   struct perf_time_interval hist_time;
 };
 
 /* per thread run time data */
@@ -2460,6 +2461,11 @@ static int timehist_sched_change_event(struct perf_tool *tool,
timehist_print_sample(sched, sample, &al, thread, t);
 
 out:
+   if (sched->hist_time.start == 0 && t >= ptime->start)
+   sched->hist_time.start = t;
+   if (ptime->end == 0 || t <= ptime->end)
+   sched->hist_time.end = t;
+
if (tr) {
/* time of this sched_switch event becomes last time task seen */
tr->last_time = sample->time;
@@ -2624,6 +2630,7 @@ static void timehist_print_summary(struct perf_sched *sched,
struct thread *t;
struct thread_runtime *r;
int i;
+   u64 hist_time = sched->hist_time.end - sched->hist_time.start;
 
memset(&totals, 0, sizeof(totals));
 
@@ -2665,7 +2672,7 @@ static void timehist_print_summary(struct perf_sched *sched,
totals.sched_count += r->run_stats.n;
printf("CPU %2d idle for ", i);
print_sched_time(r->total_run_time, 6);
-   printf(" msec\n");
+   printf(" msec  (%6.2f%%)\n", 100.0 * r->total_run_time / hist_time);
} else
printf("CPU %2d idle entire time window\n", i);
}
@@ -2701,12 +2708,16 @@ static void timehist_print_summary(struct perf_sched *sched,
 
printf("\n"
   "Total number of unique tasks: %" PRIu64 "\n"
-  "Total number of context switches: %" PRIu64 "\n"
-  "   Total run time (msec): ",
+  "Total number of context switches: %" PRIu64 "\n",
   totals.task_count, totals.sched_count);
 
+   printf("   Total run time (msec): ");
print_sched_time(totals.total_run_time, 2);
printf("\n");
+
+   printf("Total scheduling time (msec): ");
+   print_sched_time(hist_time, 2);
+   printf(" (x %d)\n", sched->max_cpu);
 }
 
 typedef int (*sched_handler)(struct perf_tool *tool,
-- 
2.10.0



[PATCH 2/3] perf sched timehist: Fix invalid period calculation

2016-12-21 Thread Namhyung Kim
When the --time option is given with a value outside the recorded time,
the last sample time (tprev) was set to that value and the run time
calculation might be incorrect.  This is a problem for the first sample
on each cpu, since it should skip the runtime update when tprev is 0.
But with the --time option tprev had a non-zero (and invalid) value, so
the calculation was also incorrect.

For example, consider the following:

  $ perf sched timehist
             time    cpu  task name                       wait time  sch delay   run time
                          [tid/pid]                          (msec)     (msec)     (msec)
  --------------- ------  ------------------------------  ---------  ---------  ---------
      3195.968367 [0003]  <idle>                              0.000      0.000      0.000
      3195.968386 [0002]  Timer[4306/4277]                    0.000      0.000      0.018
      3195.968397 [0002]  Web Content[4277]                   0.000      0.000      0.000
      3195.968595 [0001]  JS Helper[4302/4277]                0.000      0.000      0.000
      3195.969217 [0000]  <idle>                              0.000      0.000      0.621
      3195.969251 [0001]  kworker/1:1H[291]                   0.000      0.000      0.033

The samples start at 3195.968367, but when I gave a time interval from
3194 to 3196 (in sec), it calculated the whole 2 seconds as runtime.
Below, 2 cpus accounted it as runtime and the other 2 cpus accounted it
as idle time.

Before:

  $ perf sched timehist --time 3194,3196 -s | tail
  Idle stats:
  CPU  0 idle for   1995.991  msec
  CPU  1 idle for 20.793  msec
  CPU  2 idle for 30.191  msec
  CPU  3 idle for   1999.852  msec

  Total number of unique tasks: 23
  Total number of context switches: 128
     Total run time (msec): 3724.940

After:

  $ perf sched timehist --time 3194,3196 -s | tail
  Idle stats:
  CPU  0 idle for 10.811  msec
  CPU  1 idle for 20.793  msec
  CPU  2 idle for 30.191  msec
  CPU  3 idle for 18.337  msec

  Total number of unique tasks: 23
  Total number of context switches: 128
     Total run time (msec): 18.139
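
The clamping rule described above can be sketched in isolation as
follows.  This is a simplified model, not the perf code itself: the
function name run_time() and its flat parameter list are assumptions
made for illustration.

```c
#include <stdint.h>

/*
 * Simplified model of the run time calculation (hypothetical helper).
 *   t      - timestamp of the current sample
 *   tprev  - previous sample time on this cpu, 0 if none was seen yet
 *   start  - start of the --time window, 0 if no window was given
 */
static uint64_t run_time(uint64_t t, uint64_t tprev, uint64_t start)
{
	/* Clamp only a valid previous timestamp to the window start.
	 * Without the tprev check, a first sample (tprev == 0) would be
	 * treated as if the task had been running since 'start'. */
	if (tprev && start > tprev)
		tprev = start;

	/* First sample on this cpu: nothing to account yet. */
	if (tprev == 0 || t < tprev)
		return 0;

	return t - tprev;
}
```

In this model the first sample on a cpu contributes no run time even
when a window start is given, which matches the "After" numbers.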

Cc: David Ahern 
Fixes: 853b74071110 ("perf sched timehist: Add option to specify time window of interest")
Signed-off-by: Namhyung Kim 
---
 tools/perf/builtin-sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 5052caa91caa..d53e706a6f17 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -2405,7 +2405,7 @@ static int timehist_sched_change_event(struct perf_tool *tool,
if (ptime->start && ptime->start > t)
goto out;
 
-   if (ptime->start > tprev)
+   if (tprev && ptime->start > tprev)
tprev = ptime->start;
 
/*
-- 
2.10.0



[PATCH 1/3] perf sched timehist: Enlarge default comm_width

2016-12-21 Thread Namhyung Kim
The current default value is 20, but it easily grows to a bigger value
when a task has a long name and different tid and pid, which makes the
output misaligned.  So change the default to the larger value that the
summary already uses.

Signed-off-by: Namhyung Kim 
---
 tools/perf/builtin-sched.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index c1c07bfe132c..5052caa91caa 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -1775,7 +1775,7 @@ static u64 perf_evsel__get_time(struct perf_evsel *evsel, u32 cpu)
return r->last_time[cpu];
 }
 
-static int comm_width = 20;
+static int comm_width = 30;
 
 static char *timehist_get_commstr(struct thread *thread)
 {
@@ -1817,7 +1817,7 @@ static void timehist_header(struct perf_sched *sched)
printf(" ");
}
 
-   printf(" %-20s  %9s  %9s  %9s",
+   printf(" %-*s  %9s  %9s  %9s", comm_width,
"task name", "wait time", "sch delay", "run time");
 
printf("\n");
@@ -1830,7 +1830,8 @@ static void timehist_header(struct perf_sched *sched)
if (sched->show_cpu_visual)
printf(" %*s ", ncpus, "");
 
-   printf(" %-20s  %9s  %9s  %9s\n", "[tid/pid]", "(msec)", "(msec)", "(msec)");
+   printf(" %-*s  %9s  %9s  %9s\n", comm_width,
+  "[tid/pid]", "(msec)", "(msec)", "(msec)");
 
/*
 * separator
@@ -1840,7 +1841,7 @@ static void timehist_header(struct perf_sched *sched)
if (sched->show_cpu_visual)
printf(" %.*s ", ncpus, graph_dotted_line);
 
-   printf(" %.20s  %.9s  %.9s  %.9s",
+   printf(" %.*s  %.9s  %.9s  %.9s", comm_width,
graph_dotted_line, graph_dotted_line, graph_dotted_line,
graph_dotted_line);
 
@@ -2626,9 +2627,6 @@ static void timehist_print_summary(struct perf_sched *sched,
 
memset(&totals, 0, sizeof(totals));
 
-   if (comm_width < 30)
-   comm_width = 30;
-
if (sched->idle_hist) {
printf("\nIdle-time summary\n");
printf("%*s  parent  sched-out  ", comm_width, "comm");
-- 
2.10.0



Re: [PATCH] pci: add kernel config option for disabling common PCI quirks

2016-12-21 Thread John Crispin


On 21/12/2016 15:26, Christoph Hellwig wrote:
> On Wed, Dec 21, 2016 at 02:11:25PM +0100, John Crispin wrote:
>> I can turn it into an enable patch that is selected by default.
>>
>> The current patch disables all those quirks that are used for x86/PC
>> style machines and hence are not required in the embedded world.
> 
> Maybe we'll just need to reorganize the quirks so that most of them
> are in arch code or the affected drivers?
> 

Hi Christoph

To be honest, I have no opinion on this. I am currently trying to reduce
the number of patches that we have inside the LEDE tree. The patches
were written by other people and then dumped on us. Obviously I am
interested in getting this upstream with the least amount of effort. I am
quite aware, though, that some patches will need an overhaul to be
applicable for upstream. It's not really my call whether it is enough to
make this an enable patch and review the quirks enabled by it, or whether
the code needs to be moved.

John


Fwd: [PATCH 1/1] of/fdt: failed to mark hotplug range message

2016-12-21 Thread Heinrich Schuchardt
scripts/get_maintainer.pl did not show the people involved in creating
the code to be changed.

On 12/22/2016 06:34 AM, Heinrich Schuchardt wrote:
> If marking a hotplug range fails a message
> "failed to mark hotplug range" is written.
> 
> The end address is base + size - 1.
> 
> Signed-off-by: Heinrich Schuchardt 
> ---
>  drivers/of/fdt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index c9b5cac03b36..fd129b6e5396 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -1057,7 +1057,7 @@ int __init early_init_dt_scan_memory(unsigned long node, const char *uname,
>  
>   if (early_init_dt_mark_hotplug_memory_arch(base, size))
>   pr_warn("failed to mark hotplug range 0x%llx - 0x%llx\n",
> - base, base + size);
> + base, base + size - 1);
>   }
>  
>   return 0;
> 
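
For reference, the off-by-one that the quoted patch fixes can be shown
with a tiny standalone sketch.  The helper name range_last() is made up
for this example; it only illustrates that the last address inside
[base, base + size) is base + size - 1.

```c
#include <stdint.h>

/* Hypothetical helper: last address covered by a range of 'size' bytes
 * starting at 'base'.  Assumes size > 0. */
static uint64_t range_last(uint64_t base, uint64_t size)
{
	return base + size - 1;
}
```

For a range starting at 0x1000 with a size of 0x1000, this yields
0x1fff rather than 0x2000, which is the first address of the next range.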



[PATCH v5 14/14] irqchip: mbigen: Add ACPI support

2016-12-21 Thread Hanjun Guo
From: Hanjun Guo 

With the preparation of platform msi support and interrupt producer
in DSDT, we can add mbigen ACPI support now.

We are using the _PRS method to indicate the number of irq pins instead
of num_pins in DT, to avoid _DSD usage in this case.

For mbi-gen,
Device(MBI0) {
  Name(_HID, "HISI0152")
  Name(_UID, Zero)
  Name(_CRS, ResourceTemplate() {
  Memory32Fixed(ReadWrite, 0xa008, 0x1)
  })

  Name (_PRS, ResourceTemplate() {
  Interrupt(ResourceProducer,...) {12,14,}
  })
}

For devices,

   Device(COM0) {
  Name(_HID, "ACPIIDxx")
  Name(_UID, Zero)
  Name(_CRS, ResourceTemplate() {
 Memory32Fixed(ReadWrite, 0xb003, 0x1)
 Interrupt(ResourceConsumer,..., "\_SB.MBI0") {12}
  })
}

With the help of platform msi and the interrupt producer, devices
will get their virq from mbi-gen's irqdomain.
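
The pin counting described above, summing the interrupt count of every
Interrupt() descriptor found in _PRS, can be modelled outside the kernel
roughly like this.  The types and names below (sketch_resource,
count_irq_pins) are stand-ins invented for the example and are not the
ACPICA structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for ACPI resource descriptors. */
enum sketch_res_type { SKETCH_RES_MEMORY, SKETCH_RES_EXTENDED_IRQ };

struct sketch_resource {
	enum sketch_res_type type;
	uint32_t interrupt_count;	/* only meaningful for EXTENDED_IRQ */
};

/* Sum the interrupts of all extended irq descriptors; other resource
 * types are ignored, as in the walker callback of this patch. */
static uint32_t count_irq_pins(const struct sketch_resource *res, size_t n)
{
	uint32_t total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (res[i].type == SKETCH_RES_EXTENDED_IRQ)
			total += res[i].interrupt_count;
	}
	return total;
}
```

A result of 0 corresponds to the case the driver rejects with -EINVAL.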

Signed-off-by: Hanjun Guo 
Cc: Marc Zyngier 
Cc: Thomas Gleixner 
Cc: Ma Jun 
---
 drivers/irqchip/irq-mbigen.c | 70 ++--
 1 file changed, 67 insertions(+), 3 deletions(-)

diff --git a/drivers/irqchip/irq-mbigen.c b/drivers/irqchip/irq-mbigen.c
index 4e11da5..17d35fa 100644
--- a/drivers/irqchip/irq-mbigen.c
+++ b/drivers/irqchip/irq-mbigen.c
@@ -16,6 +16,7 @@
  * along with this program.  If not, see .
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -180,7 +181,7 @@ static int mbigen_domain_translate(struct irq_domain *d,
unsigned long *hwirq,
unsigned int *type)
 {
-   if (is_of_node(fwspec->fwnode)) {
+   if (is_of_node(fwspec->fwnode) || is_acpi_device_node(fwspec->fwnode)) {
if (fwspec->param_count != 2)
return -EINVAL;
 
@@ -271,6 +272,54 @@ static int mbigen_of_create_domain(struct platform_device *pdev,
return 0;
 }
 
+#ifdef CONFIG_ACPI
+static acpi_status mbigen_acpi_process_resource(struct acpi_resource *ares,
+void *context)
+{
+   struct acpi_resource_extended_irq *ext_irq;
+   u32 *num_irqs = context;
+
+   switch (ares->type) {
+   case ACPI_RESOURCE_TYPE_EXTENDED_IRQ:
+   ext_irq = &ares->data.extended_irq;
+   *num_irqs += ext_irq->interrupt_count;
+   break;
+   default:
+   break;
+   }
+
+   return AE_OK;
+}
+
+static int mbigen_acpi_create_domain(struct platform_device *pdev,
+struct mbigen_device *mgn_chip)
+{
+   struct irq_domain *domain;
+   u32 num_msis = 0;
+   acpi_status status;
+
+   status = acpi_walk_resources(ACPI_HANDLE(&pdev->dev), METHOD_NAME__PRS,
+mbigen_acpi_process_resource, &num_msis);
+if (ACPI_FAILURE(status) || num_msis == 0)
+   return -EINVAL;
+
+   domain = platform_msi_create_device_domain(&pdev->dev, num_msis,
+  mbigen_write_msg,
+  &mbigen_domain_ops,
+  mgn_chip);
+   if (!domain)
+   return -ENOMEM;
+
+   return 0;
+}
+#else
+static int mbigen_acpi_create_domain(struct platform_device *pdev,
+struct mbigen_device *mgn_chip)
+{
+   return -ENODEV;
+}
+#endif
+
 static int mbigen_device_probe(struct platform_device *pdev)
 {
struct mbigen_device *mgn_chip;
@@ -288,9 +337,17 @@ static int mbigen_device_probe(struct platform_device *pdev)
if (IS_ERR(mgn_chip->base))
return PTR_ERR(mgn_chip->base);
 
-   err = mbigen_of_create_domain(pdev, mgn_chip);
-   if (err)
+   if (IS_ENABLED(CONFIG_OF) && pdev->dev.of_node)
+   err = mbigen_of_create_domain(pdev, mgn_chip);
+   else if (ACPI_COMPANION(&pdev->dev))
+   err = mbigen_acpi_create_domain(pdev, mgn_chip);
+   else
+   err = -EINVAL;
+
+   if (err) {
+   dev_err(&pdev->dev, "Failed to create mbi-gen@%p irqdomain", mgn_chip->base);
return err;
+   }
 
platform_set_drvdata(pdev, mgn_chip);
return 0;
@@ -302,10 +359,17 @@ static int mbigen_device_probe(struct platform_device *pdev)
 };
 MODULE_DEVICE_TABLE(of, mbigen_of_match);
 
+static const struct acpi_device_id mbigen_acpi_match[] = {
+{ "HISI0152", 0 },
+   {}
+};
+MODULE_DEVICE_TABLE(acpi, mbigen_acpi_match);
+
 static struct platform_driver mbigen_platform_driver = {
.driver = {
.name   = "Hisilicon MBIGEN-V2",
.of_match_table = mbigen_of_match,
+   .acpi_match_table = ACPI_PTR(mbigen_acpi_match),
},
.probe  = mbigen_d

[PATCH] ARM64: zynqmp: Fix i2c node's compatible string

2016-12-21 Thread Moritz Fischer
From: Moritz Fischer 

The Zynq Ultrascale MP uses version 1.4 of the Cadence IP core
which fixes some silicon bugs that needed software workarounds
in Version 1.0 that was used on Zynq systems.

Signed-off-by: Moritz Fischer 
Cc: Michal Simek 
Cc: Sören Brinkmann 
Cc: U-Boot List 
Cc: Rob Herring 
---

Hi Michal,

I think this is a slip up and should be r1p14 for
Ultrascale ZynqMP. drivers/i2c/i2c-cadence.c already uses this.
I Cc'd the u-boot list, because the same change would be required there.

Cheers,

Moritz

---
 arch/arm64/boot/dts/xilinx/zynqmp.dtsi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
index 68a90833..a5a5f91 100644
--- a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
+++ b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi
@@ -175,7 +175,7 @@
};
 
i2c0: i2c@ff02 {
-   compatible = "cdns,i2c-r1p10";
+   compatible = "cdns,i2c-r1p14";
status = "disabled";
interrupt-parent = <&gic>;
interrupts = <0 17 4>;
@@ -185,7 +185,7 @@
};
 
i2c1: i2c@ff03 {
-   compatible = "cdns,i2c-r1p10";
+   compatible = "cdns,i2c-r1p14";
status = "disabled";
interrupt-parent = <&gic>;
interrupts = <0 18 4>;
-- 
2.4.11



Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-21 Thread Linus Torvalds
On Wed, Dec 21, 2016 at 9:13 PM, Dave Chinner  wrote:
>
> There may be deeper issues. I just started running scalability tests
> (e.g. 16-way fsmark create tests) and about a minute in I got a
> directory corruption reported - something I hadn't seen in the dev
> cycle at all.

By "in the dev cycle", do you mean your XFS changes, or have you been
tracking the merge cycle at least for some testing?

> I unmounted the fs, mkfs'd it again, ran the
> workload again and about a minute in this fired:
>
> [628867.607417] ------------[ cut here ]------------
> [628867.608603] WARNING: CPU: 2 PID: 16925 at mm/workingset.c:461 shadow_lru_isolate+0x171/0x220

Well, part of the changes during the merge window were the shadow
entry tracking changes that came in through Andrew's tree. Adding
Johannes Weiner to the participants.

> Now, this workload does not touch the page cache at all - it's
> entirely an XFS metadata workload, so it should not really be
> affecting the working set code.

Well, I suspect that anything that creates memory pressure will end up
triggering the working set code, so ..

That said, obviously memory corruption could be involved and result in
random issues too, but I wouldn't really expect that in this code.

It would probably be really useful to get more data points - is the
problem reliably in this area, or is it going to be random and all
over the place.

That said:

> And worse, on that last error, the /host/ is now going into meltdown
> (running 4.7.5) with 32 CPUs all burning down in ACPI code:

The obvious question here is how much you trust the environment if the
host ends up also showing problems. Maybe you do end up having hw
issues pop up too.

The primary suspect would presumably be the development kernel you're
testing triggering something, but it has to be asked..

 Linus


[PATCH v5 00/14] ACPI platform MSI support and its example mbigen

2016-12-21 Thread Hanjun Guo
From: Hanjun Guo 

v4 -> v5:
- Add mbigen support back, tested with Agustin's patchset;
  it's a good example of how ACPI platform MSI works
- rebased on top of latest Linus tree (commit 52bce91 splice:
  reinstate SIGPIPE/EPIPE handling)

v3 -> v4:
- Drop mbi-gen patches and submit only platform msi support, because
  the mbi-gen patches will be rebased on top of Agustin's patchset, and
  the discussion is going on there.
- Add a patch to support device topologies such as NC (named component,
  platform device)->SMMU->ITS, as suggested by Lorenzo;
- rebased on top of Lorenzo's v9 of ACPI IORT ARM SMMU support;
- rebased on top of 4.9-rc7

v2 -> v3:
- Drop RFC tag
- Rebase against v4.9-rc2 and Lorenzo's v6 of ACPI IORT ARM SMMU
  support [1]
- Add 3 cleanup patches (patch 1, 2, 3)
- Drop arch_init call patch from last version
- Introduce a callback for platform device to set msi domain
- Introduce a new API to get platform device's domain instead of
  reusing the PCI one in previous version
- Add a patch to rework iort_node_get_id()

[1]: http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1251993.html

v1 -> v2:
- Fix the bug where, with multiple Interrupt() resources in a single
  _PRS, we need to count all the irq numbers (I missed it in the
  previous version);
- Rebased on Marc's irq/irqchip-4.9 branch and Lorenzo's v5
  SMMU patches (also Robin's SMMU patches)
- Add patch irqchip: mbigen: promote mbigen init.

With platform msi support landed in the kernel, and with the introduction
of IORT for GICv3 ITS (PCI MSI) and SMMU, the framework for platform msi
is ready.  This patch set adds a few patches to enable ACPI platform
msi support.

For a platform device connecting to ITS on an arm platform, the IORT
table has a named component node describing the mapping between the
platform device and ITS, so we can retrieve the dev id and find its parent
irqdomain (ITS) from the IORT table (similar to the ACPI ITS support).

The first 3 patches are cleanups;

Patches 4,5 refactor its_pmsi_prepare() for both DT and ACPI,
then retrieve the dev id from IORT;

Patches 6,7 create the platform msi domain for the ACPI case by scanning
the MADT table;

Patches 8,9,10,11 set up the msi domain for a platform device based
on the IORT table.

Patches 12,13,14 convert the DT-based mbigen driver to support ACPI.

Tested on Hisilicon D03/D05.

Happy holidays!

Thanks
Hanjun

Hanjun Guo (12):
  ACPI: ARM64: IORT: minor cleanup for iort_match_node_callback()
  irqchip: gic-v3-its: keep the head file include in alphabetic order
  ACPI: ARM64: IORT: add missing comment for iort_dev_find_its_id()
  irqchip: gicv3-its: platform-msi: refactor its_pmsi_prepare()
  ACPI: platform-msi: retrieve dev id from IORT
  irqchip: gicv3-its: platform-msi: refactor its_pmsi_init() to prepare
for ACPI
  irqchip: gicv3-its: platform-msi: scan MADT to create platform msi
domain
  ACPI: ARM64: IORT: rework iort_node_get_id()
  ACPI: platform: setup MSI domain for ACPI based platform device
  ACPI: ARM64: IORT: rework iort_node_get_id() for NC->SMMU->ITS case
  msi: platform: make platform_msi_create_device_domain() ACPI aware
  irqchip: mbigen: Add ACPI support

Kefeng Wang (2):
  irqchip: mbigen: drop module owner
  irqchip: mbigen: introduce mbigen_of_create_domain()

 drivers/acpi/acpi_platform.c  |  11 ++
 drivers/acpi/arm64/iort.c | 138 --
 drivers/base/platform-msi.c   |   3 +-
 drivers/base/platform.c   |   3 +
 drivers/irqchip/irq-gic-v3-its-platform-msi.c | 106 +++-
 drivers/irqchip/irq-gic-v3-its.c  |   3 +-
 drivers/irqchip/irq-mbigen.c  | 109 
 include/linux/acpi_iort.h |  11 ++
 include/linux/platform_device.h   |   3 +
 9 files changed, 309 insertions(+), 78 deletions(-)

-- 
1.7.12.4



[PATCH v5 09/14] ACPI: platform: setup MSI domain for ACPI based platform device

2016-12-21 Thread Hanjun Guo
From: Hanjun Guo 

With the platform msi domain created, we can set up the msi domain
for a platform device when it's probed.

In order to do that, we need to get the domain that the platform
device connects to, so iort_get_platform_device_domain() is
introduced to retrieve the domain from IORT.

After the domain is retrieved, we need a proper way to set the
domain on the platform device.  Some platform devices, such as an
irqchip, need the msi irqdomain as their interrupt parent domain,
so the irqdomain must be set after the platform device is allocated
but before it is probed.  Introduce a callback (pre_add_cb) in pdevinfo
to prepare firmware related information which is needed for device
probe, and set the msi domain in that callback.
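
The ordering that the callback is meant to guarantee can be sketched in
plain C as below.  Everything here (sketch_device, sketch_devinfo,
sketch_register, the fake domain token) is invented for illustration and
only mirrors the shape of the change, not the driver core itself.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a device and its registration info. */
struct sketch_device {
	const void *msi_domain;		/* NULL until configured */
};

struct sketch_devinfo {
	/* Optional hook run after allocation but before the add step. */
	void (*pre_add_cb)(struct sketch_device *dev);
};

static const int sketch_fake_domain = 1;	/* placeholder domain token */

/* Example hook: attach the (fake) msi domain to the device. */
static void sketch_set_domain_cb(struct sketch_device *dev)
{
	dev->msi_domain = &sketch_fake_domain;
}

/* Run the optional hook first, so that whatever happens in the add
 * step (probing, in the real driver core) can rely on msi_domain. */
static int sketch_register(struct sketch_device *dev,
			   const struct sketch_devinfo *info)
{
	if (info->pre_add_cb)
		info->pre_add_cb(dev);

	/* the add/probe step would follow here */
	return 0;
}
```

Callers that leave pre_add_cb as NULL see no change in behaviour, which
is why the hook can be added without touching existing users.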

Signed-off-by: Hanjun Guo 
Cc: Marc Zyngier 
Cc: Rafael J. Wysocki 
Cc: Greg KH 
Cc: Lorenzo Pieralisi 
Cc: Thomas Gleixner 
---
 drivers/acpi/acpi_platform.c| 11 +++
 drivers/acpi/arm64/iort.c   | 43 +
 drivers/base/platform.c |  3 +++
 include/linux/acpi_iort.h   |  3 +++
 include/linux/platform_device.h |  3 +++
 5 files changed, 63 insertions(+)

diff --git a/drivers/acpi/acpi_platform.c b/drivers/acpi/acpi_platform.c
index b4c1a6a..5d8d61b4 100644
--- a/drivers/acpi/acpi_platform.c
+++ b/drivers/acpi/acpi_platform.c
@@ -12,6 +12,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -48,6 +49,15 @@ static void acpi_platform_fill_resource(struct acpi_device *adev,
 }
 
 /**
+ * acpi_platform_pre_add_cb - callback before platform device is added, to
+ * prepare firmware related information which is needed for device probe
+ */
+static void acpi_platform_pre_add_cb(struct device *dev)
+{
+   acpi_configure_pmsi_domain(dev);
+}
+
+/**
  * acpi_create_platform_device - Create platform device for ACPI device node
  * @adev: ACPI device node to create a platform device for.
  * @properties: Optional collection of build-in properties.
@@ -109,6 +119,7 @@ struct platform_device *acpi_create_platform_device(struct acpi_device *adev,
pdevinfo.num_res = count;
pdevinfo.fwnode = acpi_fwnode_handle(adev);
pdevinfo.properties = properties;
+   pdevinfo.pre_add_cb = acpi_platform_pre_add_cb;
 
if (acpi_dma_supported(adev))
pdevinfo.dma_mask = DMA_BIT_MASK(32);
diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index bc68d93..6b72fcb 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -527,6 +527,49 @@ struct irq_domain *iort_get_device_domain(struct device *dev, u32 req_id)
return irq_find_matching_fwnode(handle, DOMAIN_BUS_PCI_MSI);
 }
 
+/**
+ * iort_get_platform_device_domain() - Find MSI domain related to a
+ * platform device
+ * @dev: the dev pointer associated with the platform device
+ *
+ * Returns: the MSI domain for this device, NULL otherwise
+ */
+static struct irq_domain *iort_get_platform_device_domain(struct device *dev)
+{
+   struct acpi_iort_node *node, *msi_parent;
+   struct fwnode_handle *iort_fwnode;
+   struct acpi_iort_its_group *its;
+
+   /* find its associated iort node */
+   node = iort_scan_node(ACPI_IORT_NODE_NAMED_COMPONENT,
+ iort_match_node_callback, dev);
+   if (!node)
+   return NULL;
+
+   /* then find its msi parent node */
+   msi_parent = iort_node_get_id(node, NULL, IORT_MSI_TYPE, 0);
+   if (!msi_parent)
+   return NULL;
+
+   /* Move to ITS specific data */
+   its = (struct acpi_iort_its_group *)msi_parent->node_data;
+
+   iort_fwnode = iort_find_domain_token(its->identifiers[0]);
+   if (!iort_fwnode)
+   return NULL;
+
+   return irq_find_matching_fwnode(iort_fwnode, DOMAIN_BUS_PLATFORM_MSI);
+}
+
+void acpi_configure_pmsi_domain(struct device *dev)
+{
+   struct irq_domain *msi_domain;
+
+   msi_domain = iort_get_platform_device_domain(dev);
+   if (msi_domain)
+   dev_set_msi_domain(dev, msi_domain);
+}
+
 static int __get_pci_rid(struct pci_dev *pdev, u16 alias, void *data)
 {
u32 *rid = data;
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index c4af003..3e68f31 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -537,6 +537,9 @@ struct platform_device *platform_device_register_full(
goto err;
}
 
+   if (pdevinfo->pre_add_cb)
+   pdevinfo->pre_add_cb(&pdev->dev);
+
ret = platform_device_add(pdev);
if (ret) {
 err:
diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h
index ef99fd52..33f5ac3 100644
--- a/include/linux/acpi_iort.h
+++ b/include/linux/acpi_iort.h
@@ -38,6 +38,7 @@
 /* IOMMU interface */
 void iort_set_dma_mask(struct device *dev);
 const struct iommu_ops *iort_iommu_configure(struct device *dev);
+void acpi_configure_pmsi_domain(struct device *dev);
 #else
 s

[PATCH v5 06/14] irqchip: gicv3-its: platform-msi: refactor its_pmsi_init() to prepare for ACPI

2016-12-21 Thread Hanjun Guo
From: Hanjun Guo 

Introduce its_pmsi_init_one() to isolate the code common to ACPI and
DT, in preparation for ACPI support later.

Signed-off-by: Hanjun Guo 
Tested-by: Sinan Kaya 
Cc: Marc Zyngier 
Cc: Tomasz Nowicki 
Cc: Thomas Gleixner 
---
 drivers/irqchip/irq-gic-v3-its-platform-msi.c | 45 ---
 1 file changed, 27 insertions(+), 18 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its-platform-msi.c b/drivers/irqchip/irq-gic-v3-its-platform-msi.c
index 16587a9..ff72704 100644
--- a/drivers/irqchip/irq-gic-v3-its-platform-msi.c
+++ b/drivers/irqchip/irq-gic-v3-its-platform-msi.c
@@ -84,34 +84,43 @@ static int its_pmsi_prepare(struct irq_domain *domain, struct device *dev,
{},
 };
 
-static int __init its_pmsi_init(void)
+static int __init its_pmsi_init_one(struct fwnode_handle *fwnode,
+   const char *name)
 {
-   struct device_node *np;
struct irq_domain *parent;
 
+   parent = irq_find_matching_fwnode(fwnode, DOMAIN_BUS_NEXUS);
+   if (!parent || !msi_get_domain_info(parent)) {
+   pr_err("%s: unable to locate ITS domain\n", name);
+   return -ENXIO;
+   }
+
+   if (!platform_msi_create_irq_domain(fwnode, &its_pmsi_domain_info,
+   parent)) {
+   pr_err("%s: unable to create platform domain\n", name);
+   return -ENXIO;
+   }
+
+   pr_info("Platform MSI: %s domain created\n", name);
+   return 0;
+}
+
+static void __init its_pmsi_of_init(void)
+{
+   struct device_node *np;
+
for (np = of_find_matching_node(NULL, its_device_id); np;
 np = of_find_matching_node(np, its_device_id)) {
if (!of_property_read_bool(np, "msi-controller"))
continue;
 
-   parent = irq_find_matching_host(np, DOMAIN_BUS_NEXUS);
-   if (!parent || !msi_get_domain_info(parent)) {
-   pr_err("%s: unable to locate ITS domain\n",
-  np->full_name);
-   continue;
-   }
-
-   if (!platform_msi_create_irq_domain(of_node_to_fwnode(np),
-   &its_pmsi_domain_info,
-   parent)) {
-   pr_err("%s: unable to create platform domain\n",
-  np->full_name);
-   continue;
-   }
-
-   pr_info("Platform MSI: %s domain created\n", np->full_name);
+   its_pmsi_init_one(of_node_to_fwnode(np), np->full_name);
}
+}
 
+static int __init its_pmsi_init(void)
+{
+   its_pmsi_of_init();
return 0;
 }
 early_initcall(its_pmsi_init);
-- 
1.7.12.4



[PATCH v5 11/14] msi: platform: make platform_msi_create_device_domain() ACPI aware

2016-12-21 Thread Hanjun Guo
From: Hanjun Guo 

With the platform msi domain created for ITS, an irqchip such as
mbi-gen that connects to ITS needs to create its own irqdomain.

Fortunately, with the platform msi support upstreamed by Marc,
we just need to add minor code to make it run properly.

platform_msi_create_device_domain() is almost ready for ACPI use,
except that of_node_to_fwnode() is for DT only.  Make it ACPI aware so
that things work with both DT and ACPI.

Signed-off-by: Hanjun Guo 
Cc: Marc Zyngier 
Cc: Greg KH 
Cc: Thomas Gleixner 
---
 drivers/base/platform-msi.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/base/platform-msi.c b/drivers/base/platform-msi.c
index be6a599..035ca3b 100644
--- a/drivers/base/platform-msi.c
+++ b/drivers/base/platform-msi.c
@@ -345,8 +345,7 @@ struct irq_domain *
 
data->host_data = host_data;
domain = irq_domain_create_hierarchy(dev->msi_domain, 0, nvec,
-of_node_to_fwnode(dev->of_node),
-ops, data);
+dev->fwnode, ops, data);
if (!domain)
goto free_priv;
 
-- 
1.7.12.4



[PATCH v5 12/14] irqchip: mbigen: drop module owner

2016-12-21 Thread Hanjun Guo
From: Kefeng Wang 

Module owner will be set by driver core, so drop it.

Signed-off-by: Kefeng Wang 
Signed-off-by: Hanjun Guo 
Cc: Marc Zyngier 
Cc: Thomas Gleixner 
Cc: Ma Jun 
---
 drivers/irqchip/irq-mbigen.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/irqchip/irq-mbigen.c b/drivers/irqchip/irq-mbigen.c
index 03b79b0..c01ab41 100644
--- a/drivers/irqchip/irq-mbigen.c
+++ b/drivers/irqchip/irq-mbigen.c
@@ -293,7 +293,6 @@ static int mbigen_device_probe(struct platform_device *pdev)
 static struct platform_driver mbigen_platform_driver = {
.driver = {
.name   = "Hisilicon MBIGEN-V2",
-   .owner  = THIS_MODULE,
.of_match_table = mbigen_of_match,
},
.probe  = mbigen_device_probe,
-- 
1.7.12.4



[PATCH v5 10/14] ACPI: ARM64: IORT: rework iort_node_get_id() for NC->SMMU->ITS case

2016-12-21 Thread Hanjun Guo
From: Hanjun Guo 

iort_node_get_id() currently only supports the NC (named component)->SMMU
and NC->ITS cases.  There are also other device topologies such as
NC->SMMU->ITS, so rework iort_node_get_id() for those cases.
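
The upstream walk this rework introduces can be pictured with the
following standalone sketch.  The node layout, the type values and the
function name find_upstream() are assumptions made for the example; the
real code works on IORT table offsets and ID mappings instead of plain
parent pointers.

```c
#include <stddef.h>

/* Hypothetical node types and mask helper for this sketch. */
enum sketch_node_type {
	SKETCH_NODE_ITS = 0,
	SKETCH_NODE_NAMED_COMPONENT = 1,
	SKETCH_NODE_SMMU = 3,
};
#define SKETCH_TYPE_MASK(t)	(1u << (t))

struct sketch_node {
	enum sketch_node_type type;
	const struct sketch_node *parent;	/* NULL: no output reference */
};

/* Follow parent references until a node whose type is in type_mask is
 * found.  Returns NULL if the chain ends before a match. */
static const struct sketch_node *
find_upstream(const struct sketch_node *node, unsigned int type_mask)
{
	while (node) {
		const struct sketch_node *parent = node->parent;

		if (!parent)
			return NULL;
		if (SKETCH_TYPE_MASK(parent->type) & type_mask)
			return parent;

		/* not the type we want: go one level further upstream */
		node = parent;
	}
	return NULL;
}
```

For an NC->SMMU->ITS chain, asking for the ITS type skips over the SMMU
node, while asking for the SMMU type stops at the first parent.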

Signed-off-by: Hanjun Guo 
Cc: Lorenzo Pieralisi 
---
 drivers/acpi/arm64/iort.c | 59 ++-
 1 file changed, 33 insertions(+), 26 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 6b72fcb..9b3f268 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -292,22 +292,28 @@ static acpi_status iort_match_node_callback(struct acpi_iort_node *node,
return status;
 }
 
-static int iort_id_map(struct acpi_iort_id_mapping *map, u8 type, u32 rid_in,
-  u32 *rid_out)
+static int iort_id_single_map(struct acpi_iort_id_mapping *map, u8 type,
+ u32 *rid_out)
 {
/* Single mapping does not care for input id */
if (map->flags & ACPI_IORT_ID_SINGLE_MAPPING) {
if (type == ACPI_IORT_NODE_NAMED_COMPONENT ||
type == ACPI_IORT_NODE_PCI_ROOT_COMPLEX) {
-   *rid_out = map->output_base;
+   if (rid_out)
+   *rid_out = map->output_base;
return 0;
}
 
pr_warn(FW_BUG "[map %p] SINGLE MAPPING flag not allowed for node type %d, skipping ID map\n",
map, type);
-   return -ENXIO;
}
 
+   return -ENXIO;
+}
+
+static int iort_id_map(struct acpi_iort_id_mapping *map, u8 type, u32 rid_in,
+  u32 *rid_out)
+{
if (rid_in < map->input_base ||
(rid_in >= map->input_base + map->id_count))
return -ENXIO;
@@ -324,33 +330,34 @@ struct acpi_iort_node *iort_node_get_id(struct acpi_iort_node *node,
struct acpi_iort_node *parent;
struct acpi_iort_id_mapping *map;
 
-   if (!node->mapping_offset || !node->mapping_count ||
-index >= node->mapping_count)
-   return NULL;
-
-   map = ACPI_ADD_PTR(struct acpi_iort_id_mapping, node,
-  node->mapping_offset);
+   while (node) {
+   if (!node->mapping_offset || !node->mapping_count ||
+index >= node->mapping_count)
+   return NULL;
 
-   /* Firmware bug! */
-   if (!map->output_reference) {
-   pr_err(FW_BUG "[node %p type %d] ID map has NULL parent reference\n",
-  node, node->type);
-   return NULL;
-   }
+   map = ACPI_ADD_PTR(struct acpi_iort_id_mapping, node,
+  node->mapping_offset);
 
-   parent = ACPI_ADD_PTR(struct acpi_iort_node, iort_table,
-  map->output_reference);
+   /* Firmware bug! */
+   if (!map->output_reference) {
+   pr_err(FW_BUG "[node %p type %d] ID map has NULL parent reference\n",
+  node, node->type);
+   return NULL;
+   }
 
-   if (!(IORT_TYPE_MASK(parent->type) & type_mask))
-   return NULL;
+   parent = ACPI_ADD_PTR(struct acpi_iort_node, iort_table,
+ map->output_reference);
 
-   if (map[index].flags & ACPI_IORT_ID_SINGLE_MAPPING) {
-   if (node->type == ACPI_IORT_NODE_NAMED_COMPONENT ||
-   node->type == ACPI_IORT_NODE_PCI_ROOT_COMPLEX) {
-   if (id_out)
-   *id_out = map[index].output_base;
-   return parent;
+   /* go upstream to find its parent */
+   if (!(IORT_TYPE_MASK(parent->type) & type_mask)) {
+   node = parent;
+   continue;
}
+
+   if (iort_id_single_map(&map[index], node->type, id_out))
+   break;
+
+   return parent;
}
 
return NULL;
-- 
1.7.12.4
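
The upstream walk that the hunk above adds to iort_node_get_id() can be
illustrated outside the kernel. This is only a sketch under assumptions:
struct toy_node, TOY_TYPE_MASK() and toy_find_parent() are hypothetical
stand-ins rather than the IORT structures, and the parent is reached
through a plain pointer instead of an output_reference offset.

```c
#include <stddef.h>

/* Hypothetical stand-in for an IORT node: a type and a link to its parent. */
struct toy_node {
	unsigned int type;
	struct toy_node *parent;
};

#define TOY_TYPE_MASK(type) (1u << (type))

/*
 * Walk upstream from @node until a parent whose type is in @type_mask is
 * found. Returns that parent, or NULL if the chain ends first.
 */
static struct toy_node *toy_find_parent(struct toy_node *node,
					unsigned int type_mask)
{
	while (node) {
		struct toy_node *parent = node->parent;

		if (!parent)
			return NULL;

		/* Not the type we want: go upstream and try again. */
		if (!(TOY_TYPE_MASK(parent->type) & type_mask)) {
			node = parent;
			continue;
		}

		return parent;
	}

	return NULL;
}
```

With a chain device -> SMMU -> ITS, asking for the ITS type skips the
SMMU and returns the ITS node; asking for a type that is not in the chain
returns NULL.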



Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-21 Thread Andy Lutomirski
On Wed, Dec 21, 2016 at 9:01 PM, George Spelvin
 wrote:
> Andy Lutomirski wrote:
>> I don't even think it needs that.  This is just adding a
>> non-destructive final operation, right?
>
> It is, but the problem is that SipHash is intended for *small* inputs,
> so the standard implementations aren't broken into init/update/final
> functions.
>
> There's just one big function that keeps the state variables in
> registers and never stores them anywhere.
>
> If we *had* init/update/final functions, then it would be trivial.
>
>> Just to clarify, if we replace SipHash with a black box, I think this
>> effectively means, where "entropy" is random_get_entropy() || jiffies
>> || current->pid:
>
>> The first call returns H(random seed || entropy_0 || secret).  The
>> second call returns H(random seed || entropy_0 || secret || entropy_1
>> || secret).  Etc.
>
> Basically, yes.  I was skipping the padding byte and keying the
> finalization rounds on the grounds of "can't hurt and might help",
> but we could do it a more standard way.
>
>> If not, then I have a fairly strong preference to keep whatever
>> construction we come up with consistent with something that could
>> actually happen with invocations of unmodified SipHash -- then all the
>> security analysis on SipHash goes through.
>
> Okay.  I don't think it makes a difference, but it's not a *big* waste
> of time.  If we have finalization rounds, we can reduce the secret
> to 128 bits.
>
> If we include the padding byte, we can do one of two things:
> 1) Make the secret 184 bits, to fill up the final partial word as
>much as possible, or
> 2) Make the entropy 1 byte smaller and conceptually misalign the
>secret.  What we'd actually do is remove the last byte of
>the secret and include it in the entropy words, but that's
>just a rotation of the secret between storage and hashing.
>
> Also, I assume you'd like SipHash-2-4, since you want to rely
> on a security analysis.

I haven't looked, but I assume that the analysis at least thought
about reduced rounds, so maybe other variants are okay.

>> The one thing I don't like is
>> that I don't see how to prove that you can't run it backwards if you
> >> manage to acquire a memory dump.  In fact, I think that there exist, at
>> least in theory, hash functions that are secure in the random oracle
>> model but that *can* be run backwards given the full state.  From
>> memory, SHA-3 has exactly that property, and it would be a bit sad for
>> a CSPRNG to be reversible.
>
> Er...  get_random_int() is specifically *not* designed to be resistant
> to state capture, and I didn't try.  Remember, what it's used for
> is ASLR, what we're worried about is someone learning the layouts
> of still-running processes, and if you get a memory dump, you have
> the memory layout!

True, but it's called get_random_int(), and it seems like making it
stronger, especially if the performance cost is low to zero, is a good
thing.

>
> If you want anti-backtracking, though, it's easy to add.  What we
> hash is:
>
> entropy_0 || secret || output_0 || entropy_1 || secret || output_1 || ...
>
> You mix the output word right back in to the (unfinalized) state after
> generating it.  This is still equivalent to unmodified black-box SipHash,
> you're just using a (conceptually independent) SipHash invocation to
> produce some of its input.

Ah, cute.  This could probably be sped up by doing something like:

entropy_0 || secret || output_0 ^ entropy_1 || secret || ...

It's a little weak because the output is only 64 bits, so you could
plausibly backtrack it on a GPU or FPGA cluster or on an ASIC if the
old entropy is guessable.  I suspect there are sneaky ways around it
like using output_n-1 ^ output_n-2 or similar.  I'll sleep on it.
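
For reference, the data flow of the feedback construction quoted above
(entropy_n || secret || output_n || ...) can be sketched as follows.
Everything here is an assumption made for illustration: toy_mix() is a
splitmix64-style placeholder and NOT SipHash, and struct toy_rng,
toy_absorb(), toy_finalize() and toy_next() are hypothetical names. It
shows only the ordering of absorb, non-destructive finalize and feedback,
and makes no security claim.

```c
#include <stdint.h>

/* Placeholder mixer (splitmix64 finalizer); NOT SipHash, illustration only. */
static uint64_t toy_mix(uint64_t x)
{
	x ^= x >> 30;
	x *= 0xbf58476d1ce4e5b9ULL;
	x ^= x >> 27;
	x *= 0x94d049bb133111ebULL;
	x ^= x >> 31;
	return x;
}

struct toy_rng {
	uint64_t state;		/* running, unfinalized state */
	uint64_t secret;	/* stand-in for the per-boot secret */
};

/* Absorb one 64-bit word into the running state. */
static void toy_absorb(struct toy_rng *r, uint64_t word)
{
	r->state = toy_mix(r->state ^ word);
}

/* Non-destructive finalization: derive an output, leave the state alone. */
static uint64_t toy_finalize(const struct toy_rng *r)
{
	return toy_mix(r->state ^ 0xffULL);
}

/*
 * One call: absorb entropy_n and the secret, produce output_n, then feed
 * output_n back into the state before the next call.
 */
static uint64_t toy_next(struct toy_rng *r, uint64_t entropy)
{
	uint64_t out;

	toy_absorb(r, entropy);
	toy_absorb(r, r->secret);
	out = toy_finalize(r);
	toy_absorb(r, out);
	return out;
}
```

The XOR variant suggested above would fold output_n into the next
entropy word instead of absorbing it as a separate word, which saves one
absorb step per call.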

>
> The only remaining issues are:
> 1) How many rounds, and
> 2) May we use HalfSipHash?

I haven't looked closely enough to have a real opinion here.  I don't
know what the security margin is believed to be.

>
> I'd *like* to persuade you that skipping the padding byte wouldn't
> invalidate any security proofs, because it's true and would simplify
> the code.  But if you want 100% stock, I'm willing to cater to that.

I lean toward stock in the absence of a particularly good reason.  At
the very least I'd want to read that paper carefully.

>
> Ted, what do you think?



-- 
Andy Lutomirski
AMA Capital Management, LLC


[PATCH v5 08/14] ACPI: ARM64: IORT: rework iort_node_get_id()

2016-12-21 Thread Hanjun Guo
From: Hanjun Guo 

iort_node_get_id() has two outputs: the mapped ID, returned through
id_out, and the referenced parent node, which is the function's
return value.

We now need an API that returns only the parent node for a single
mapping, so update this function slightly to allow a NULL id_out
and reuse it later.

Signed-off-by: Hanjun Guo 
Cc: Lorenzo Pieralisi 
Cc: Marc Zyngier 
---
 drivers/acpi/arm64/iort.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index ab7bae7..bc68d93 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -347,7 +347,8 @@ struct acpi_iort_node *iort_node_get_id(struct 
acpi_iort_node *node,
if (map[index].flags & ACPI_IORT_ID_SINGLE_MAPPING) {
if (node->type == ACPI_IORT_NODE_NAMED_COMPONENT ||
node->type == ACPI_IORT_NODE_PCI_ROOT_COMPLEX) {
-   *id_out = map[index].output_base;
+   if (id_out)
+   *id_out = map[index].output_base;
return parent;
}
}
-- 
1.7.12.4
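
The optional out-parameter pattern used in the patch above can be shown
in isolation. This is a user-space sketch under assumptions: struct
toy_parent, struct toy_map and toy_get_id() are hypothetical and only
mirror the shape of the change, not the IORT code itself.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_parent {
	int id;
};

struct toy_map {
	uint32_t output_base;
	struct toy_parent *parent;
};

/*
 * Returns the parent for @map. The mapped ID is written through @id_out
 * only when the caller passed a non-NULL pointer, so callers that only
 * want the parent can pass NULL.
 */
static struct toy_parent *toy_get_id(const struct toy_map *map,
				     uint32_t *id_out)
{
	if (id_out)
		*id_out = map->output_base;
	return map->parent;
}
```

A caller interested in both values passes a pointer; a caller that only
needs the parent passes NULL and the store is skipped.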



[PATCH v5 02/14] irqchip: gic-v3-its: keep the head file include in alphabetic order

2016-12-21 Thread Hanjun Guo
From: Hanjun Guo 

The header includes are strictly in alphabetical order now, so let's
not be the rule breaker. Since acpi_iort.h already includes acpi.h,
remove the duplicate acpi.h inclusion as well.

Signed-off-by: Hanjun Guo 
Cc: Marc Zyngier 
Cc: Tomasz Nowicki 
---
 drivers/irqchip/irq-gic-v3-its.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 69b040f..f471939 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -15,14 +15,13 @@
  * along with this program.  If not, see .
  */
 
-#include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
-- 
1.7.12.4



[PATCH v5 03/14] ACPI: ARM64: IORT: add missing comment for iort_dev_find_its_id()

2016-12-21 Thread Hanjun Guo
From: Hanjun Guo 

We are missing req_id's comment for iort_dev_find_its_id(),
add it back.

Signed-off-by: Hanjun Guo 
Cc: Lorenzo Pieralisi 
Cc: Tomasz Nowicki 
---
 drivers/acpi/arm64/iort.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 46e2d82..174e983 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -446,6 +446,7 @@ u32 iort_msi_map_rid(struct device *dev, u32 req_id)
 /**
  * iort_dev_find_its_id() - Find the ITS identifier for a device
  * @dev: The device.
+ * @req_id: Device's Requster ID
  * @idx: Index of the ITS identifier list.
  * @its_id: ITS identifier.
  *
-- 
1.7.12.4



[PATCH v5 05/14] ACPI: platform-msi: retrieve dev id from IORT

2016-12-21 Thread Hanjun Guo
From: Hanjun Guo 

A device connected to an ITS needs a dev id to identify itself.
For platform devices this dev id is described in the named
component node of the IORT table [1], so this patch scans the
IORT to retrieve the device's dev id.

Introduce iort_pmsi_get_dev_id(), which takes the dev pointer,
for that purpose.

[1]: https://static.docs.arm.com/den0049/b/DEN0049B_IO_Remapping_Table.pdf

Signed-off-by: Hanjun Guo 
Tested-by: Sinan Kaya 
Cc: Marc Zyngier 
Cc: Lorenzo Pieralisi 
Cc: Tomasz Nowicki 
Cc: Thomas Gleixner 
---
 drivers/acpi/arm64/iort.c | 26 ++
 drivers/irqchip/irq-gic-v3-its-platform-msi.c |  4 +++-
 include/linux/acpi_iort.h |  8 
 3 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 174e983..ab7bae7 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -444,6 +444,32 @@ u32 iort_msi_map_rid(struct device *dev, u32 req_id)
 }
 
 /**
+ * iort_pmsi_get_dev_id() - Get the device id for a device
+ * @dev: The device for which the mapping is to be done.
+ * @dev_id: The device ID found.
+ *
+ * Returns: 0 for successful find a dev id, errors otherwise
+ */
+int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id)
+{
+   struct acpi_iort_node *node;
+
+   if (!iort_table)
+   return -ENODEV;
+
+   node = iort_find_dev_node(dev);
+   if (!node) {
+   dev_err(dev, "can't find related IORT node\n");
+   return -ENODEV;
+   }
+
+   if(!iort_node_get_id(node, dev_id, IORT_MSI_TYPE, 0))
+   return -ENODEV;
+
+   return 0;
+}
+
+/**
  * iort_dev_find_its_id() - Find the ITS identifier for a device
  * @dev: The device.
  * @req_id: Device's Requster ID
diff --git a/drivers/irqchip/irq-gic-v3-its-platform-msi.c 
b/drivers/irqchip/irq-gic-v3-its-platform-msi.c
index 3c94278..16587a9 100644
--- a/drivers/irqchip/irq-gic-v3-its-platform-msi.c
+++ b/drivers/irqchip/irq-gic-v3-its-platform-msi.c
@@ -15,6 +15,7 @@
  * along with this program.  If not, see .
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -56,7 +57,8 @@ static int its_pmsi_prepare(struct irq_domain *domain, struct 
device *dev,
 
msi_info = msi_get_domain_info(domain->parent);
 
-   ret = of_pmsi_get_dev_id(domain, dev, &dev_id);
+   ret = dev->of_node ? of_pmsi_get_dev_id(domain, dev, &dev_id) :
+   iort_pmsi_get_dev_id(dev, &dev_id);
if (ret)
return ret;
 
diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h
index 77e0809..ef99fd52 100644
--- a/include/linux/acpi_iort.h
+++ b/include/linux/acpi_iort.h
@@ -33,6 +33,7 @@
 void acpi_iort_init(void);
 bool iort_node_match(u8 type);
 u32 iort_msi_map_rid(struct device *dev, u32 req_id);
+int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id);
 struct irq_domain *iort_get_device_domain(struct device *dev, u32 req_id);
 /* IOMMU interface */
 void iort_set_dma_mask(struct device *dev);
@@ -42,9 +43,16 @@ static inline void acpi_iort_init(void) { }
 static inline bool iort_node_match(u8 type) { return false; }
 static inline u32 iort_msi_map_rid(struct device *dev, u32 req_id)
 { return req_id; }
+
 static inline struct irq_domain *iort_get_device_domain(struct device *dev,
u32 req_id)
 { return NULL; }
+
+static inline int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id)
+{
+   return -ENODEV;
+}
+
 /* IOMMU interface */
 static inline void iort_set_dma_mask(struct device *dev) { }
 static inline
-- 
1.7.12.4
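
The firmware-dependent lookup added to its_pmsi_prepare() above boils
down to a two-way dispatch with a common error convention. A minimal
user-space sketch, assuming a hypothetical struct toy_device and
toy_*() helpers (none of these exist in the kernel):

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct toy_device {
	const void *of_node;	/* non-NULL when described by device tree */
	uint32_t of_dev_id;	/* ID the device-tree path would find */
	uint32_t acpi_dev_id;	/* ID the IORT path would find */
	int has_iort_node;	/* whether an IORT node matches the device */
};

/* Device-tree path: always succeeds in this toy model. */
static int toy_of_get_dev_id(const struct toy_device *dev, uint32_t *dev_id)
{
	*dev_id = dev->of_dev_id;
	return 0;
}

/* ACPI path: fails with -ENODEV when no IORT node matches. */
static int toy_iort_get_dev_id(const struct toy_device *dev, uint32_t *dev_id)
{
	if (!dev->has_iort_node)
		return -ENODEV;
	*dev_id = dev->acpi_dev_id;
	return 0;
}

/* Pick the lookup by checking whether the device has an OF node. */
static int toy_get_dev_id(const struct toy_device *dev, uint32_t *dev_id)
{
	return dev->of_node ? toy_of_get_dev_id(dev, dev_id) :
			      toy_iort_get_dev_id(dev, dev_id);
}
```

Both paths return 0 on success and a negative errno otherwise, so the
caller needs only one error check regardless of which firmware
description is in use.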



Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-21 Thread Theodore Ts'o
On Thu, Dec 22, 2016 at 03:49:39AM +0100, Jason A. Donenfeld wrote:
> 
> Funny -- while you guys were sending this back & forth, I was writing
> my reply to Andy which essentially arrives at the same conclusion.
> Given that we're all arriving to the same thing, and that Ted shot in
> this direction long before we all did, I'm leaning toward abandoning
> SipHash for the de-MD5-ification of get_random_int/long, and working
> on polishing Ted's idea into something shiny for this patchset.

here are my numbers comparing siphash (using the first three patches
of the v7 siphash patches) with my batched chacha20 implementation.
The results are taken by running get_random_* 1 times, and then
dividing the numbers by 1 to get the average number of cycles for
the call.  I compiled 32-bit and 64-bit kernels, and ran the results
using kvm:

                      siphash                      batched chacha20
          get_random_int  get_random_long   get_random_int  get_random_long

32-bit         270             278               114             146
64-bit          75              75               106             186

> I did have two objections to this. The first was that my SipHash
> construction is faster.

Well, it's faster on everything except 32-bit x86.  :-P

> The second, and the more
> important one, was that batching entropy up like this means that 32
> calls will be really fast, and then the 33rd will be slow, since it
> has to do a whole ChaCha round, because get_random_bytes must be
> called to refill the batch.

... and this will take 2121 cycles on 64-bit x86, and 2315 cycles on a
32-bit x86.  Which on a 2.3 GHz processor, is just under a
microsecond.  As far as being inconsistent on process startup, I very
much doubt a microsecond is really going to be visible to the user.  :-)
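
To make the batching behaviour discussed above concrete, here is a
minimal user-space sketch. It assumes a batch of 32 words (taken from the
"32 calls" figure quoted above); struct toy_batch, toy_refill() and
toy_get_random_int() are hypothetical names, and the refill uses a plain
LCG as a stand-in for the ChaCha20-backed get_random_bytes(), so it has
no security value.

```c
#include <stddef.h>
#include <stdint.h>

#define BATCH_WORDS 32	/* assumed batch size */

struct toy_batch {
	uint32_t words[BATCH_WORDS];
	size_t pos;		/* next unused word; BATCH_WORDS means empty */
	unsigned long refills;	/* how often the slow path ran */
	uint64_t seed;		/* state of the stand-in generator */
};

static void toy_batch_init(struct toy_batch *b, uint64_t seed)
{
	b->pos = BATCH_WORDS;
	b->refills = 0;
	b->seed = seed;
}

/* Slow path: stand-in for get_random_bytes(); a plain LCG, NOT ChaCha20. */
static void toy_refill(struct toy_batch *b)
{
	size_t i;

	for (i = 0; i < BATCH_WORDS; i++) {
		b->seed = b->seed * 6364136223846793005ULL +
			  1442695040888963407ULL;
		b->words[i] = (uint32_t)(b->seed >> 32);
	}
	b->pos = 0;
	b->refills++;
}

/* Fast path hands out a buffered word; refill only when the batch is empty. */
static uint32_t toy_get_random_int(struct toy_batch *b)
{
	if (b->pos >= BATCH_WORDS)
		toy_refill(b);
	return b->words[b->pos++];
}
```

The first call pays for a refill, the next 31 are served from the buffer,
and the 33rd call triggers the second refill.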

The bottom line is that I think we're really "pixel peeping" at this
point --- which is what obsessed digital photographers will do when
debating the quality of a Canon vs Nikon DSLR by blowing up a photo by
a thousand times, and then trying to claim that this is visible to the
human eye.  Or people obsessing over the frequency response curves
of TH-X00 headphones with Mahogany vs Purpleheart wood, when it's
likely that in a blind head-to-head comparison, most people wouldn't
be able to tell the difference.

I think the main argument for using the batched getrandom approach is
that it is, I would argue, simpler than introducing siphash into the
picture.  On 64-bit platforms siphash is faster and more consistent, so
it's basically that versus the complexity of having to add siphash to
the things that people have to analyze when considering random number
security on Linux.   But it's a close call either way, I think.

  - Ted

P.S.  My benchmarking code

diff --git a/drivers/char/random.c b/drivers/char/random.c
index a51f0ff43f00..41860864b775 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1682,6 +1682,55 @@ static int rand_initialize(void)
 }
 early_initcall(rand_initialize);
 
+static unsigned int get_random_int_new(void);
+static unsigned long get_random_long_new(void);
+
+#define NUM_CYCLES 1
+#define AVG(finish, start) ((unsigned int)(finish - start + NUM_CYCLES/2) / NUM_CYCLES)
+
+static int rand_benchmark(void)
+{
+   cycles_t start,finish;
+   int i, out;
+
+   pr_crit("random benchmark!!\n");
+   start = get_cycles();
+   for (i = 0; i < NUM_CYCLES; i++) {
+   get_random_int();
+   }
+   finish = get_cycles();
+   pr_err("get_random_int # cycles: %u\n", AVG(finish, start));
+
+   start = get_cycles();
+   for (i = 0; i < NUM_CYCLES; i++) {
+   get_random_int_new();
+   }
+   finish = get_cycles();
+   pr_err("get_random_int_new (batched chacha20) # cycles: %u\n", AVG(finish, start));
+
+   start = get_cycles();
+   for (i = 0; i < NUM_CYCLES; i++) {
+   get_random_long();
+   }
+   finish = get_cycles();
+   pr_err("get_random_long # cycles: %u\n", AVG(finish, start));
+
+   start = get_cycles();
+   for (i = 0; i < NUM_CYCLES; i++) {
+   get_random_long_new();
+   }
+   finish = get_cycles();
+   pr_err("get_random_long_new (batched chacha20) # cycles: %u\n", AVG(finish, start));
+
+   start = get_cycles();
+   for (i = 0; i < NUM_CYCLES; i++) {
+   get_random_bytes(&out, sizeof(out));
+   }
+   finish = get_cycles();
+   pr_err("get_random_bytes # cycles: %u\n", AVG(finish, start));
+   return 0;
+}
+device_initcall(rand_benchmark);
+
 #ifdef CONFIG_BLOCK
 void rand_initialize_disk(struct gendisk *disk)
 {
@@ -2064,8 +2113,10 @@ unsigned int get_random_int(void)
unsigned int ret;
u64 *chaining;
 
+#if 0  // force slow path
if (arch_get_random_int(&ret))
return ret;
+#endif
 
chaining = &get_cpu_var(get_random_int_chaining);
ret = 

[PATCH v5 13/14] irqchip: mbigen: introduce mbigen_of_create_domain()

2016-12-21 Thread Hanjun Guo
From: Kefeng Wang 

Introduce mbigen_of_create_domain() to consolidate the OF-related
code and prepare for ACPI later; no functional change.

Signed-off-by: Kefeng Wang 
Signed-off-by: Hanjun Guo 
Cc: Marc Zyngier 
Cc: Thomas Gleixner 
Cc: Ma Jun 
---
 drivers/irqchip/irq-mbigen.c | 42 +++---
 1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/drivers/irqchip/irq-mbigen.c b/drivers/irqchip/irq-mbigen.c
index c01ab41..4e11da5 100644
--- a/drivers/irqchip/irq-mbigen.c
+++ b/drivers/irqchip/irq-mbigen.c
@@ -236,27 +236,15 @@ static int mbigen_irq_domain_alloc(struct irq_domain 
*domain,
.free   = irq_domain_free_irqs_common,
 };
 
-static int mbigen_device_probe(struct platform_device *pdev)
+static int mbigen_of_create_domain(struct platform_device *pdev,
+  struct mbigen_device *mgn_chip)
 {
-   struct mbigen_device *mgn_chip;
+   struct device *parent;
struct platform_device *child;
struct irq_domain *domain;
struct device_node *np;
-   struct device *parent;
-   struct resource *res;
u32 num_pins;
 
-   mgn_chip = devm_kzalloc(&pdev->dev, sizeof(*mgn_chip), GFP_KERNEL);
-   if (!mgn_chip)
-   return -ENOMEM;
-
-   mgn_chip->pdev = pdev;
-
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   mgn_chip->base = devm_ioremap_resource(&pdev->dev, res);
-   if (IS_ERR(mgn_chip->base))
-   return PTR_ERR(mgn_chip->base);
-
for_each_child_of_node(pdev->dev.of_node, np) {
if (!of_property_read_bool(np, "interrupt-controller"))
continue;
@@ -280,6 +268,30 @@ static int mbigen_device_probe(struct platform_device 
*pdev)
return -ENOMEM;
}
 
+   return 0;
+}
+
+static int mbigen_device_probe(struct platform_device *pdev)
+{
+   struct mbigen_device *mgn_chip;
+   struct resource *res;
+   int err;
+
+   mgn_chip = devm_kzalloc(&pdev->dev, sizeof(*mgn_chip), GFP_KERNEL);
+   if (!mgn_chip)
+   return -ENOMEM;
+
+   mgn_chip->pdev = pdev;
+
+   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+   mgn_chip->base = devm_ioremap(&pdev->dev, res->start, 
resource_size(res));
+   if (IS_ERR(mgn_chip->base))
+   return PTR_ERR(mgn_chip->base);
+
+   err = mbigen_of_create_domain(pdev, mgn_chip);
+   if (err)
+   return err;
+
platform_set_drvdata(pdev, mgn_chip);
return 0;
 }
-- 
1.7.12.4



[PATCH v5 07/14] irqchip: gicv3-its: platform-msi: scan MADT to create platform msi domain

2016-12-21 Thread Hanjun Guo
From: Hanjun Guo 

With the introduction of its_pmsi_init_one(), we can add some code
on top for ACPI support of platform MSI.

We scan the MADT table to get the ITS entry(ies), then use that
information to create the platform MSI domain for the devices
connected to it, just as is done for PCI MSI on ITS.

Signed-off-by: Hanjun Guo 
Tested-by: Sinan Kaya 
Cc: Marc Zyngier 
Cc: Tomasz Nowicki 
Cc: Thomas Gleixner 
---
 drivers/irqchip/irq-gic-v3-its-platform-msi.c | 36 +++
 1 file changed, 36 insertions(+)

diff --git a/drivers/irqchip/irq-gic-v3-its-platform-msi.c 
b/drivers/irqchip/irq-gic-v3-its-platform-msi.c
index ff72704..0be0437 100644
--- a/drivers/irqchip/irq-gic-v3-its-platform-msi.c
+++ b/drivers/irqchip/irq-gic-v3-its-platform-msi.c
@@ -105,6 +105,41 @@ static int __init its_pmsi_init_one(struct fwnode_handle 
*fwnode,
return 0;
 }
 
+#ifdef CONFIG_ACPI
+static int __init
+its_pmsi_parse_madt(struct acpi_subtable_header *header,
+   const unsigned long end)
+{
+   struct acpi_madt_generic_translator *its_entry;
+   struct fwnode_handle *domain_handle;
+   const char *node_name;
+   int err = -ENXIO;
+
+   its_entry = (struct acpi_madt_generic_translator *)header;
+   node_name = kasprintf(GFP_KERNEL, "ITS@0x%lx",
+ (long)its_entry->base_address);
+   domain_handle = iort_find_domain_token(its_entry->translation_id);
+   if (!domain_handle) {
+   pr_err("%s: Unable to locate ITS domain handle\n", node_name);
+   goto out;
+   }
+
+   err = its_pmsi_init_one(domain_handle, node_name);
+
+out:
+   kfree(node_name);
+   return err;
+}
+
+static void __init its_acpi_pmsi_init(void)
+{
+   acpi_table_parse_madt(ACPI_MADT_TYPE_GENERIC_TRANSLATOR,
+ its_pmsi_parse_madt, 0);
+}
+#else
+static inline void its_acpi_pmsi_init(void) { }
+#endif
+
 static void __init its_pmsi_of_init(void)
 {
struct device_node *np;
@@ -121,6 +156,7 @@ static void __init its_pmsi_of_init(void)
 static int __init its_pmsi_init(void)
 {
its_pmsi_of_init();
+   its_acpi_pmsi_init();
return 0;
 }
 early_initcall(its_pmsi_init);
-- 
1.7.12.4



[PATCH v5 04/14] irqchip: gicv3-its: platform-msi: refactor its_pmsi_prepare()

2016-12-21 Thread Hanjun Guo
From: Hanjun Guo 

To add ACPI support for platform MSI, we need to retrieve the
dev id the ACPI way instead of from the device tree. We already
have a well-formed function, its_pmsi_prepare(), that gets the dev
id, but it is OF dependent. Collect the OF-related code into a
single function to make its_pmsi_prepare() friendlier to ACPI
later.

Signed-off-by: Hanjun Guo 
Tested-by: Sinan Kaya 
Cc: Marc Zyngier 
Cc: Lorenzo Pieralisi 
Cc: Tomasz Nowicki 
Cc: Thomas Gleixner 
---
 drivers/irqchip/irq-gic-v3-its-platform-msi.c | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its-platform-msi.c 
b/drivers/irqchip/irq-gic-v3-its-platform-msi.c
index 470b4aa..3c94278 100644
--- a/drivers/irqchip/irq-gic-v3-its-platform-msi.c
+++ b/drivers/irqchip/irq-gic-v3-its-platform-msi.c
@@ -24,15 +24,11 @@
.name   = "ITS-pMSI",
 };
 
-static int its_pmsi_prepare(struct irq_domain *domain, struct device *dev,
-   int nvec, msi_alloc_info_t *info)
+static int of_pmsi_get_dev_id(struct irq_domain *domain, struct device *dev,
+ u32 *dev_id)
 {
-   struct msi_domain_info *msi_info;
-   u32 dev_id;
int ret, index = 0;
 
-   msi_info = msi_get_domain_info(domain->parent);
-
/* Suck the DeviceID out of the msi-parent property */
do {
struct of_phandle_args args;
@@ -43,11 +39,24 @@ static int its_pmsi_prepare(struct irq_domain *domain, 
struct device *dev,
if (args.np == irq_domain_get_of_node(domain)) {
if (WARN_ON(args.args_count != 1))
return -EINVAL;
-   dev_id = args.args[0];
+   *dev_id = args.args[0];
break;
}
} while (!ret);
 
+   return ret;
+}
+
+static int its_pmsi_prepare(struct irq_domain *domain, struct device *dev,
+   int nvec, msi_alloc_info_t *info)
+{
+   struct msi_domain_info *msi_info;
+   u32 dev_id;
+   int ret;
+
+   msi_info = msi_get_domain_info(domain->parent);
+
+   ret = of_pmsi_get_dev_id(domain, dev, &dev_id);
if (ret)
return ret;
 
-- 
1.7.12.4



[PATCH v5 01/14] ACPI: ARM64: IORT: minor cleanup for iort_match_node_callback()

2016-12-21 Thread Hanjun Guo
From: Hanjun Guo 

Clean up iort_match_node_callback() a little to reduce the number
of lines of code, and also fix the indentation in iort_scan_node().

Signed-off-by: Hanjun Guo 
Cc: Lorenzo Pieralisi 
Cc: Marc Zyngier 
Cc: Tomasz Nowicki 
---
 drivers/acpi/arm64/iort.c | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index e0d2e6e..46e2d82 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -225,7 +225,7 @@ static struct acpi_iort_node *iort_scan_node(enum 
acpi_iort_node_type type,
 
if (iort_node->type == type &&
ACPI_SUCCESS(callback(iort_node, context)))
-   return iort_node;
+   return iort_node;
 
iort_node = ACPI_ADD_PTR(struct acpi_iort_node, iort_node,
 iort_node->length);
@@ -253,17 +253,15 @@ static acpi_status iort_match_node_callback(struct 
acpi_iort_node *node,
void *context)
 {
struct device *dev = context;
-   acpi_status status;
+   acpi_status status = AE_NOT_FOUND;
 
if (node->type == ACPI_IORT_NODE_NAMED_COMPONENT) {
struct acpi_buffer buf = { ACPI_ALLOCATE_BUFFER, NULL };
struct acpi_device *adev = to_acpi_device_node(dev->fwnode);
struct acpi_iort_named_component *ncomp;
 
-   if (!adev) {
-   status = AE_NOT_FOUND;
+   if (!adev)
goto out;
-   }
 
status = acpi_get_name(adev->handle, ACPI_FULL_PATHNAME, &buf);
if (ACPI_FAILURE(status)) {
@@ -289,8 +287,6 @@ static acpi_status iort_match_node_callback(struct 
acpi_iort_node *node,
 */
status = pci_rc->pci_segment_number == pci_domain_nr(bus) ?
AE_OK : AE_NOT_FOUND;
-   } else {
-   status = AE_NOT_FOUND;
}
 out:
return status;
-- 
1.7.12.4



[PATCH 1/1] of/fdt: failed to mark hotplug range message

2016-12-21 Thread Heinrich Schuchardt
If marking a hotplug range fails, the message
"failed to mark hotplug range" is written.

The end address of the range is base + size - 1, not base + size.

Signed-off-by: Heinrich Schuchardt 
---
 drivers/of/fdt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index c9b5cac03b36..fd129b6e5396 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -1057,7 +1057,7 @@ int __init early_init_dt_scan_memory(unsigned long node, 
const char *uname,
 
if (early_init_dt_mark_hotplug_memory_arch(base, size))
pr_warn("failed to mark hotplug range 0x%llx - 
0x%llx\n",
-   base, base + size);
+   base, base + size - 1);
}
 
return 0;
-- 
2.11.0
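
The off-by-one fixed above is the usual difference between an exclusive
and an inclusive end address. A small sketch, where range_last() is a
hypothetical helper and not something defined in drivers/of/fdt.c:

```c
#include <stdint.h>

/*
 * Last byte covered by the half-open range [base, base + size).
 * Callers must pass size > 0.
 */
static uint64_t range_last(uint64_t base, uint64_t size)
{
	return base + size - 1;
}
```

For base 0x80000000 and size 0x40000000 the range covers
0x80000000 - 0xbfffffff; printing base + size would instead show
0xc0000000, which is the first byte after the range.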



Re: [PATCHv6 6/7] printk: use printk_safe buffers in printk

2016-12-21 Thread Sergey Senozhatsky
Hello,

On (12/21/16 23:36), Sergey Senozhatsky wrote:
> Use printk_safe per-CPU buffers in printk recursion-prone blocks:
> -- around logbuf_lock protected sections in vprintk_emit() and
>console_unlock()
> -- around down_trylock_console_sem() and up_console_sem()
> 
> Note that this solution addresses deadlocks caused by printk()
> recursive calls only. That is vprintk_emit() and console_unlock().

several questions.

so my plan was to introduce printk-safe and to switch vprintk_emit()
and console_sem related functions (like console_unlock(), etc.) to
printk-safe first. and switch the remaining logbuf_lock users, like
devkmsg_open()/syslog_print()/etc, in a followup, pretty much
mechanical "find logbuf_lock - add printk_safe", patch. but that
followup patch is bigger than I expected (still mechanical tho);
so I want to re-group.

there are
9 raw_spin_lock_irq(&logbuf_lock)
7 raw_spin_lock_irqsave(&logbuf_lock, flags)
and
12 raw_spin_unlock_irq(&logbuf_lock)
8 raw_spin_unlock_irqrestore(&logbuf_lock, flags)

wrapping each one of them in printk_safe_enter()/printk_safe_enter_irq()
and printk_safe_exit()/printk_safe_exit_irq() is a bit boring. so I have
several options: one of them is to add printk_safe_{enter,exit}_irq() and,
along with it, a bunch of helper macros (to printk.c):

(questions below)

/*
 * Helper macros to lock/unlock logbuf_lock in deadlock safe
 * manner (logbuf_lock may spin_dump() in lock/unlock).
 */
#define lock_logbuf(flags)				\
	do {						\
		printk_safe_enter(flags);		\
		raw_spin_lock(&logbuf_lock);		\
	} while (0)

#define unlock_logbuf(flags)				\
	do {						\
		raw_spin_unlock(&logbuf_lock);		\
		printk_safe_exit(flags);		\
	} while (0)

#define lock_logbuf_irq()				\
	do {						\
		printk_safe_enter_irq();		\
		raw_spin_lock(&logbuf_lock);		\
	} while (0)

#define unlock_logbuf_irq()				\
	do {						\
		raw_spin_unlock(&logbuf_lock);		\
		printk_safe_exit_irq();			\
	} while (0)


so this

printk_safe_enter(flags);
raw_spin_lock(&logbuf_lock);
...
raw_spin_unlock(&logbuf_lock);
printk_safe_exit(flags);

or this

printk_safe_enter_irq();
raw_spin_lock(&logbuf_lock);
...
raw_spin_unlock(&logbuf_lock);
printk_safe_exit_irq();


becomes this

lock_logbuf(flags);
...
unlock_logbuf(flags);

and this

lock_logbuf_irq();
...
unlock_logbuf_irq();
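
A user-space sketch of why such helpers are wrapped in do { } while (0)
and why they pair enter/lock with unlock/exit in reverse order. The
toy_*() functions and the two macros are hypothetical stand-ins for the
printk_safe context and logbuf_lock; they only track state so the
ordering can be checked.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the printk_safe context and logbuf_lock. */
static int toy_safe_depth;
static bool toy_locked;

static void toy_safe_enter(void) { toy_safe_depth++; }
static void toy_safe_exit(void)  { toy_safe_depth--; }
static void toy_lock(void)       { toy_locked = true; }
static void toy_unlock(void)     { toy_locked = false; }

/* Enter the safe context first, then take the lock. */
#define toy_lock_logbuf()		\
	do {				\
		toy_safe_enter();	\
		toy_lock();		\
	} while (0)

/* Release in the reverse order: drop the lock, then leave the context. */
#define toy_unlock_logbuf()		\
	do {				\
		toy_unlock();		\
		toy_safe_exit();	\
	} while (0)
```

Because each macro expands to a single statement, it can be used as the
body of an unbraced if/else without splitting the two calls apart.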


questions:

-- the approach
 another solution? switch those raw_spin_{lock,unlock}_irq to irqsave/irqrestore
 (?) and use the existing printk_safe_enter()/printk_safe_exit(),
 so *_irq() versions of lock_logbuf/printk_safe macros will not be needed?

-- the naming
 are lock_logbuf()/unlock_logbuf() and lock_logbuf_irq()/unlock_logbuf_irq()
 good enough? (if good at all)

-ss


Re: [PATCH 2/2] net: wireless: fix to uses struct

2016-12-21 Thread kbuild test robot
Hi Ozgur,

[auto build test ERROR on mac80211-next/master]
[also build test ERROR on v4.9 next-20161221]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Ozgur-Karatas/net-wireless-fixed-to-checkpatch-errors/20161222-125128
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next.git 
master
config: i386-randconfig-x006-201651 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   net/wireless/reg.c: In function 'regulatory_hint_core':
>> net/wireless/reg.c:2294:28: error: 'regulatory_request' undeclared (first 
>> use in this function)
 request = kzalloc(sizeof(*regulatory_request), GFP_KERNEL);
   ^~
   net/wireless/reg.c:2294:28: note: each undeclared identifier is reported 
only once for each function it appears in
   net/wireless/reg.c: In function 'regulatory_hint_user':
   net/wireless/reg.c:2316:28: error: 'regulatory_request' undeclared (first 
use in this function)
 request = kzalloc(sizeof(*regulatory_request), GFP_KERNEL);
   ^~
   net/wireless/reg.c: In function 'regulatory_hint':
   net/wireless/reg.c:2388:28: error: 'regulatory_request' undeclared (first 
use in this function)
 request = kzalloc(sizeof(*regulatory_request), GFP_KERNEL);
   ^~

vim +/regulatory_request +2294 net/wireless/reg.c

  2288   * and when we restore regulatory settings.
  2289   */
  2290  static int regulatory_hint_core(const char *alpha2)
  2291  {
  2292  struct regulatory_request *request;
  2293  
> 2294  request = kzalloc(sizeof(*regulatory_request), GFP_KERNEL);
  2295  if (!request)
  2296  return -ENOMEM;
  2297  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




Re: x86: warning in unwind_get_return_address

2016-12-21 Thread Josh Poimboeuf
On Wed, Dec 21, 2016 at 01:46:36PM +0100, Andrey Konovalov wrote:
> On Wed, Dec 21, 2016 at 12:36 AM, Josh Poimboeuf  wrote:
> >
> > Thanks.  Looking at the stack trace, my guess is that an interrupt hit
> > while running in generated BPF code, and the unwinder got confused
> > because regs->ip points to the generated code.  I may need to disable
> > that warning until we figure out a better solution.
> >
> > Can you share your .config file?
> 
> Sure, attached.

Ok, I was able to recreate with your config.  The culprit was generated
code, as I suspected, though it wasn't BPF, it was a kprobe (created by
dccpprobe_init()).

I'll make a patch to disable the warning.

-- 
Josh


Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-21 Thread Dave Chinner
On Wed, Dec 21, 2016 at 04:13:03PM -0800, Chris Leech wrote:
> On Wed, Dec 21, 2016 at 03:19:15PM -0800, Linus Torvalds wrote:
> > Hi,
> > 
> > On Wed, Dec 21, 2016 at 2:16 PM, Dave Chinner  wrote:
> > > On Fri, Dec 16, 2016 at 10:59:06AM -0800, Chris Leech wrote:
> > >> Thanks Dave,
> > >>
> > >> I'm hitting a bug at scatterlist.h:140 before I even get any iSCSI
> > >> modules loaded (virtio block) so there's something else going on in the
> > >> current merge window.  I'll keep an eye on it and make sure there's
> > >> nothing iSCSI needs fixing for.
> > >
> > > OK, so before this slips through the cracks.
> > >
> > > Linus - your tree as of a few minutes ago still panics immediately
> > > when starting xfstests on iscsi devices. It appears to be a
> > > scatterlist corruption and not an iscsi problem, so the iscsi guys
> > > seem to have bounced it and no-one is looking at it.
> > 
> > Hmm. There's not much to go by.
> > 
> > Can somebody in iscsi-land please try to just bisect it - I'm not
> > seeing a lot of clues to where this comes from otherwise.
> 
> Yeah, my hopes of this being quickly resolved by someone else didn't
> work out and whatever is going on in that test VM is looking like a
> different kind of odd.  I'm saving that off for later, and seeing if I
> can't be a bisect on the iSCSI issue.

There may be deeper issues. I just started running scalability tests
(e.g. 16-way fsmark create tests) and about a minute in I got a
directory corruption reported - something I hadn't seen in the dev
cycle at all. I unmounted the fs, mkfs'd it again, ran the
workload again and about a minute in this fired:

[628867.607417] ------------[ cut here ]------------
[628867.608603] WARNING: CPU: 2 PID: 16925 at mm/workingset.c:461 
shadow_lru_isolate+0x171/0x220
[628867.610702] Modules linked in:
[628867.611375] CPU: 2 PID: 16925 Comm: kworker/2:97 Tainted: GW   
4.9.0-dgc #18
[628867.613382] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Debian-1.8.2-1 04/01/2014
[628867.616179] Workqueue: events rht_deferred_worker
[628867.632422] Call Trace:
[628867.634691]  dump_stack+0x63/0x83
[628867.637937]  __warn+0xcb/0xf0
[628867.641359]  warn_slowpath_null+0x1d/0x20
[628867.643362]  shadow_lru_isolate+0x171/0x220
[628867.644627]  __list_lru_walk_one.isra.11+0x79/0x110
[628867.645780]  ? __list_lru_init+0x70/0x70
[628867.646628]  list_lru_walk_one+0x17/0x20
[628867.647488]  scan_shadow_nodes+0x34/0x50
[628867.648358]  shrink_slab.part.65.constprop.86+0x1dc/0x410
[628867.649506]  shrink_node+0x57/0x90
[628867.650233]  do_try_to_free_pages+0xdd/0x230
[628867.651157]  try_to_free_pages+0xce/0x1a0
[628867.652342]  __alloc_pages_slowpath+0x2df/0x960
[628867.653332]  ? __might_sleep+0x4a/0x80
[628867.654148]  __alloc_pages_nodemask+0x24b/0x290
[628867.655237]  kmalloc_order+0x21/0x50
[628867.656016]  kmalloc_order_trace+0x24/0xc0
[628867.656878]  __kmalloc+0x17d/0x1d0
[628867.657644]  bucket_table_alloc+0x195/0x1d0
[628867.658564]  ? __might_sleep+0x4a/0x80
[628867.659449]  rht_deferred_worker+0x287/0x3c0
[628867.660366]  ? _raw_spin_unlock_irq+0xe/0x30
[628867.661294]  process_one_work+0x1de/0x4d0
[628867.662208]  worker_thread+0x4b/0x4f0
[628867.662990]  kthread+0x10c/0x140
[628867.663687]  ? process_one_work+0x4d0/0x4d0
[628867.664564]  ? kthread_create_on_node+0x40/0x40
[628867.665523]  ret_from_fork+0x25/0x30
[628867.666317] ---[ end trace 7c38634006a9955e ]---

Now, this workload does not touch the page cache at all - it's
entirely an XFS metadata workload, so it should not really be
affecting the working set code.

And worse, on that last error, the /host/ is now going into meltdown
(running 4.7.5) with 32 CPUs all burning down in ACPI code:

  PID USER  PR  NI  VIRT  RES  SHR S  %CPU %MEM TIME+ COMMAND
35074 root  -2   0   0  0  0 R  99.0  0.0  12:38.92 acpi_pad/12
35079 root  -2   0   0  0  0 R  99.0  0.0  12:39.40 acpi_pad/16
35080 root  -2   0   0  0  0 R  99.0  0.0  12:39.29 acpi_pad/17
35085 root  -2   0   0  0  0 R  99.0  0.0  12:39.35 acpi_pad/22
35087 root  -2   0   0  0  0 R  99.0  0.0  12:39.13 acpi_pad/24
35090 root  -2   0   0  0  0 R  99.0  0.0  12:38.89 acpi_pad/27
35093 root  -2   0   0  0  0 R  99.0  0.0  12:38.88 acpi_pad/30
35063 root  -2   0   0  0  0 R  98.1  0.0  12:40.64 acpi_pad/1
35065 root  -2   0   0  0  0 R  98.1  0.0  12:40.38 acpi_pad/3
35066 root  -2   0   0  0  0 R  98.1  0.0  12:40.30 acpi_pad/4
35067 root  -2   0   0  0  0 R  98.1  0.0  12:40.82 acpi_pad/5
35077 root  -2   0   0  0  0 R  98.1  0.0  12:39.65 acpi_pad/14
35078 root  -2   0   0  0  0 R  98.1  0.0  12:39.58 acpi_pad/15
35081 root  -2   0   0  0  0 R  98.1  0.0  12:39.32 acpi_pad/18
35072 root  -2   0   0  0  0 R  96.2  0.0  12:40.14 acpi_pad

Re: kmod: provide wrappers for kmod_concurrent inc/dec

2016-12-21 Thread Jessica Yu

+++ Luis R. Rodriguez [08/12/16 11:48 -0800]:

kmod_concurrent is used as an atomic counter for enabling
the allowed limit of modprobe calls, provide wrappers for it
to enable this to be expanded on more easily. This will be done
later.

Signed-off-by: Luis R. Rodriguez 
---
kernel/kmod.c | 27 +--
1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/kernel/kmod.c b/kernel/kmod.c
index cb6f7ca7b8a5..049d7eabda38 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -44,6 +44,9 @@
#include 

extern int max_threads;
+
+static atomic_t kmod_concurrent = ATOMIC_INIT(0);
+
unsigned int max_modprobes;
module_param(max_modprobes, uint, 0644);
MODULE_PARM_DESC(max_modprobes, "Max number of allowed concurrent modprobes");
@@ -108,6 +111,20 @@ static int call_modprobe(char *module_name, int wait)
return -ENOMEM;
}

+static int kmod_umh_threads_get(void)
+{
+   atomic_inc(&kmod_concurrent);
+   if (atomic_read(&kmod_concurrent) < max_modprobes)


Should this not be <=? I think this only allows up to max_modprobes-1 
concurrent threads.


+   return 0;
+   atomic_dec(&kmod_concurrent);
+   return -ENOMEM;
+}
+
+static void kmod_umh_threads_put(void)
+{
+   atomic_dec(&kmod_concurrent);
+}
+
/**
 * __request_module - try to load a kernel module
 * @wait: wait (or not) for the operation to complete
@@ -129,7 +146,6 @@ int __request_module(bool wait, const char *fmt, ...)
va_list args;
char module_name[MODULE_NAME_LEN];
int ret;
-   static atomic_t kmod_concurrent = ATOMIC_INIT(0);
static int kmod_loop_msg;

/*
@@ -153,8 +169,8 @@ int __request_module(bool wait, const char *fmt, ...)
if (ret)
return ret;

-   atomic_inc(&kmod_concurrent);
-   if (atomic_read(&kmod_concurrent) > max_modprobes) {
+   ret = kmod_umh_threads_get();
+   if (ret) {
/* We may be blaming an innocent here, but unlikely */
if (kmod_loop_msg < 5) {
printk(KERN_ERR
@@ -162,15 +178,14 @@ int __request_module(bool wait, const char *fmt, ...)
   module_name);
kmod_loop_msg++;
}
-   atomic_dec(&kmod_concurrent);
-   return -ENOMEM;
+   return ret;
}

trace_module_request(module_name, wait, _RET_IP_);

ret = call_modprobe(module_name, wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC);

-   atomic_dec(&kmod_concurrent);
+   kmod_umh_threads_put();
return ret;
}
EXPORT_SYMBOL(__request_module);
--
2.10.1
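
To make the off-by-one question above concrete, here is a minimal
single-threaded user-space model of the increment-then-compare helper.
The names (concurrent, max_allowed, get_strict, get_inclusive,
count_successes) are invented for this sketch and are not kernel
symbols; plain ints stand in for atomic_t because no concurrency is
involved here.

```c
#include <assert.h>

static int concurrent;            /* stand-in for kmod_concurrent */
static const int max_allowed = 3; /* stand-in for max_modprobes */

/* Same shape as the posted helper: increment, then compare with '<'. */
static int get_strict(void)
{
	concurrent++;
	if (concurrent < max_allowed)
		return 0;
	concurrent--;
	return -1;
}

/* Same helper with '<=', matching the old "fail if > max" test. */
static int get_inclusive(void)
{
	concurrent++;
	if (concurrent <= max_allowed)
		return 0;
	concurrent--;
	return -1;
}

/* Count how many back-to-back gets succeed before the first failure. */
static int count_successes(int (*get)(void))
{
	int n = 0;

	concurrent = 0;
	while (get() == 0)
		n++;
	return n;
}
```

In this model, count_successes(get_strict) stops one short of
max_allowed, while count_successes(get_inclusive) reaches it.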



Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-21 Thread George Spelvin
Andy Lutomirski wrote:
> I don't even think it needs that.  This is just adding a
> non-destructive final operation, right?

It is, but the problem is that SipHash is intended for *small* inputs,
so the standard implementations aren't broken into init/update/final
functions.

There's just one big function that keeps the state variables in
registers and never stores them anywhere.

If we *had* init/update/final functions, then it would be trivial.

> Just to clarify, if we replace SipHash with a black box, I think this
> effectively means, where "entropy" is random_get_entropy() || jiffies
> || current->pid:

> The first call returns H(random seed || entropy_0 || secret).  The
> second call returns H(random seed || entropy_0 || secret || entropy_1
> || secret).  Etc.

Basically, yes.  I was skipping the padding byte and keying the
finalization rounds on the grounds of "can't hurt and might help",
but we could do it a more standard way.

> If not, then I have a fairly strong preference to keep whatever
> construction we come up with consistent with something that could
> actually happen with invocations of unmodified SipHash -- then all the
> security analysis on SipHash goes through.

Okay.  I don't think it makes a difference, but it's not a *big* waste
of time.  If we have finalization rounds, we can reduce the secret
to 128 bits.

If we include the padding byte, we can do one of two things:
1) Make the secret 184 bits, to fill up the final partial word as
   much as possible, or
2) Make the entropy 1 byte smaller and conceptually misalign the
   secret.  What we'd actually do is remove the last byte of
   the secret and include it in the entropy words, but that's
   just a rotation of the secret between storage and hashing.

Also, I assume you'd like SipHash-2-4, since you want to rely
on a security analysis.

(Regarding the padding byte, getting it right might be annoying
to do exactly.  All of the security analysis depends *only* on
its low 3 bits indicating how much of the final block is used.
As it says in the SipHash paper, they included 8 bits just because
it was easy.  But if you want it exact, it's just one more byte of
state.)

> The one thing I don't like is
> that I don't see how to prove that you can't run it backwards if you
> manage to acquire a memory dump.  In fact, I think that there exist, at
> least in theory, hash functions that are secure in the random oracle
> model but that *can* be run backwards given the full state.  From
> memory, SHA-3 has exactly that property, and it would be a bit sad for
> a CSPRNG to be reversible.

Er...  get_random_int() is specifically *not* designed to be resistant
to state capture, and I didn't try.  Remember, what it's used for
is ASLR, what we're worried about is someone learning the layouts
of still-running processes, and if you get a memory dump, you have
the memory layout!

If you want anti-backtracking, though, it's easy to add.  What we
hash is:

entropy_0 || secret || output_0 || entropy_1 || secret || output_1 || ...

You mix the output word right back in to the (unfinalized) state after
generating it.  This is still equivalent to unmodified black-box SipHash,
you're just using a (conceptually independent) SipHash invocation to
produce some of its input.

Each output is produced by copying the state, padding & finalizing after the
secret.


In fact, to make our lives easier, let's define the secret to end with
a counter byte that happens to be equal to the padding byte.  The input
stream will be:

Previous output: 8 (or 4 for HalfSipHash) bytes
Entropy: 15 bytes (8 bytes timer, 4 bytes jiffies, 3 bytes pid)
Secret: 16 bytes
Counter: 1 byte
...repeat...
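
As a rough illustration of the layout listed above: the four fields
add up to 8 + 15 + 16 + 1 = 40 bytes, i.e. exactly five 64-bit SipHash
words per repetition (with a 4-byte previous output for HalfSipHash it
would be 36 bytes, nine 32-bit words).  The sketch below only packs
one such block; it does not hash anything.  The function names
(put_le, build_block), the little-endian byte order, the order of the
three entropy sources inside the 15 bytes, and truncating the pid to
24 bits are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
	PREV_OUT_LEN = 8,   /* previous output (would be 4 for HalfSipHash) */
	ENTROPY_LEN  = 15,  /* 8 bytes timer + 4 bytes jiffies + 3 bytes pid */
	SECRET_LEN   = 16,
	COUNTER_LEN  = 1,
	BLOCK_LEN    = PREV_OUT_LEN + ENTROPY_LEN + SECRET_LEN + COUNTER_LEN
};

/* Store the low n bytes of v at dst, least significant byte first. */
static size_t put_le(uint8_t *dst, uint64_t v, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] = (uint8_t)(v >> (8 * i));
	return n;
}

/* Pack one repetition of the input stream into block[]. */
static void build_block(uint8_t block[BLOCK_LEN], uint64_t prev_out,
			uint64_t timer, uint32_t jiffies, uint32_t pid,
			const uint8_t secret[SECRET_LEN], uint8_t counter)
{
	size_t off = 0;

	off += put_le(block + off, prev_out, PREV_OUT_LEN);
	off += put_le(block + off, timer, 8);
	off += put_le(block + off, jiffies, 4);
	off += put_le(block + off, pid, 3);     /* assumed: low 24 bits only */
	memcpy(block + off, secret, SECRET_LEN);
	off += SECRET_LEN;
	block[off++] = counter;
	assert(off == BLOCK_LEN);
}
```

Because BLOCK_LEN is a multiple of the word size, every repetition
starts on a word boundary and the counter byte always lands in the
last byte of a word.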

> We could also periodically mix in a big (128-bit?) chunk of fresh
> urandom output to keep the bad guys guessing.

Simpler and faster to just update the global master secret.
The state is per-CPU, so mixing in has to be repeated per CPU.


With these changes, I'm satisfied that it's secure, cheap, has a
sufficiently wide state size, *and* all standard SipHash analysis applies.

The only remaining issues are:
1) How many rounds, and
2) May we use HalfSipHash?

I'd *like* to persuade you that skipping the padding byte wouldn't
invalidate any security proofs, because it's true and would simplify
the code.  But if you want 100% stock, I'm willing to cater to that.

Ted, what do you think?


Re: kmod: provide wrappers for kmod_concurrent inc/dec

2016-12-21 Thread Jessica Yu

+++ Luis R. Rodriguez [16/12/16 09:05 +0100]:

On Thu, Dec 15, 2016 at 01:46:25PM +0100, Petr Mladek wrote:

On Thu 2016-12-08 22:08:59, Luis R. Rodriguez wrote:
> On Thu, Dec 08, 2016 at 12:29:42PM -0800, Kees Cook wrote:
> > On Thu, Dec 8, 2016 at 11:48 AM, Luis R. Rodriguez wrote:
> > > kmod_concurrent is used as an atomic counter for enabling
> > > the allowed limit of modprobe calls, provide wrappers for it
> > > to enable this to be expanded on more easily. This will be done
> > > later.
> > >
> > > Signed-off-by: Luis R. Rodriguez 
> > > ---
> > >  kernel/kmod.c | 27 +--
> > >  1 file changed, 21 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/kernel/kmod.c b/kernel/kmod.c
> > > index cb6f7ca7b8a5..049d7eabda38 100644
> > > --- a/kernel/kmod.c
> > > +++ b/kernel/kmod.c
> > > @@ -108,6 +111,20 @@ static int call_modprobe(char *module_name, int wait)
> > > return -ENOMEM;
> > >  }
> > >
> > > +static int kmod_umh_threads_get(void)
> > > +{
> > > +   atomic_inc(&kmod_concurrent);

This approach might actually cause false failures. If we
are on the limit and more processes do this increment
in parallel, it makes the number bigger than it should be.


This approach is *exactly* what the existing code does :P
I just provided wrappers. I agree with the old approach though,
reason is it acts as a lock in for the bump. 


I think what Petr meant was that we could run into false failures when multiple
atomic increments happen between the first increment and the subsequent
atomic_read.

Say max_modprobes is 64 -

  atomic_inc(&kmod_concurrent); // thread 1: kmod_concurrent is 63
  atomic_inc(&kmod_concurrent); // thread 2: kmod_concurrent is 64
  atomic_inc(&kmod_concurrent); // thread 3: kmod_concurrent is 65
  // If all threads read 65 here, then all will error out when the
  // first two should have succeeded (false failures).
  if (atomic_read(&kmod_concurrent) < max_modprobes)
          return 0;
  atomic_dec(&kmod_concurrent);
  return -ENOMEM;

But yeah, I think this issue was already in the existing kmod code..
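
For illustration only, one way to avoid that window is to make the
check and the increment a single atomic step, so the counter never
overshoots the limit.  The sketch below uses C11 <stdatomic.h> in user
space rather than the kernel's atomic_t API, and the names
(slots_in_use, slot_try_get, slot_put) are made up for this example;
it is not what the patch or the existing kmod code does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int slots_in_use;   /* zero-initialized: no slots taken */

/*
 * Try to take one slot.  The counter is only bumped if it is still
 * below 'limit' at the moment of the update, so concurrent callers
 * cannot push it past the limit and make each other fail.
 */
static bool slot_try_get(int limit)
{
	int cur = atomic_load(&slots_in_use);

	while (cur < limit) {
		/* On failure, cur is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&slots_in_use, &cur, cur + 1))
			return true;
	}
	return false;
}

/* Release a slot previously taken with slot_try_get(). */
static void slot_put(void)
{
	atomic_fetch_sub(&slots_in_use, 1);
}
```

With a limit of 2, two callers succeed, a third fails without touching
the counter, and a slot_put() makes room for one more.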

Jessica


Re: [kernel-hardening] Re: HalfSipHash Acceptable Usage

2016-12-21 Thread Jason A. Donenfeld
Hi George,

On Thu, Dec 22, 2016 at 4:55 AM, George Spelvin
 wrote:
> Do we have to go through this?  No, the benchmark was *not* bogus.
> Then I replaced the kernel #includes with the necessary typedefs
> and #defines to make it compile in user-space.
> * I didn't iterate 100K times, I timed the functions *once*.
> * I saved the times in a buffer and printed them all at the end
>   so printf() wouldn't pollute the caches.
> * Before every even-numbered iteration, I flushed the I-cache
>   of everything from _init to _fini (i.e. all the non-library code).
>   This cold-cache case is what is going to happen in the kernel.

Wow! Great. Thanks for the pointers on the right way to do this. Very
helpful, and enlightening results indeed. Think you could send me the
whole .c of what you finally came up with? I'd like to run this on
some other architectures; I've got a few odd boxes lying around here.

> The P4 results were:
> SipHash actually wins slightly in the cold-cache case, because
> it iterates more.  In the hot-cache case, it loses
> Core 2 duo:
> Pretty much a tie, honestly.
> Ivy Bridge:
> Modern processors *hate* cold caches.  But notice how md5 is *faster*
> than SipHash on hot-cache IPv6.
> Ivy Bridge, -m64:
> Of course, when you compile -m64, SipHash is unbeatable.

Okay, so I think these results are consistent with some of the
assessments from before -- that SipHash is really just fine as a
replacement for MD5. Not great on older 32-bit x86, but not too
horrible, and the performance improvements on every other architecture
and the security improvements everywhere are a net good.

> Here's the modified benchmark() code.  The entire package is
> a bit voluminous for the mailing list, but anyone is welcome to it.

Please do send! I'm sure I'll learn from reading it. Thanks again for
doing the hard work of putting something proper together.

Thanks,
Jason


[PATCH v3 3/3] nfc: trf7970a: Prevent repeated polling from crashing the kernel

2016-12-21 Thread Geoff Lansberry
From: Jaret Cantu 

Repeated polling attempts cause a NULL dereference error to occur.
This is because the state of the trf7970a is currently reading but
another request has been made to send a command before it has finished.

The solution is to properly kill the waiting reading (workqueue)
before failing on the send.

Signed-off-by: Geoff Lansberry 
---
 drivers/nfc/trf7970a.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/nfc/trf7970a.c b/drivers/nfc/trf7970a.c
index e3c72c6..ba5f9b8 100644
--- a/drivers/nfc/trf7970a.c
+++ b/drivers/nfc/trf7970a.c
@@ -1496,6 +1496,10 @@ static int trf7970a_send_cmd(struct nfc_digital_dev 
*ddev,
(trf->state != TRF7970A_ST_IDLE_RX_BLOCKED)) {
dev_err(trf->dev, "%s - Bogus state: %d\n", __func__,
trf->state);
+   if (trf->state == TRF7970A_ST_WAIT_FOR_RX_DATA ||
+   trf->state == TRF7970A_ST_WAIT_FOR_RX_DATA_CONT)
+   trf->ignore_timeout =
+   !cancel_delayed_work(&trf->timeout_work);
ret = -EIO;
goto out_err;
}
-- 
2.7.4



[PATCH v3 2/3] NFC: trf7970a: Add device tree option of 1.8 Volt IO voltage

2016-12-21 Thread Geoff Lansberry
The TRF7970A has configuration options for supporting hardware designs
with 1.8 Volt or 3.3 Volt IO. This commit adds a device tree option,
using a fixed regulator binding, for setting the IO voltage to match
the hardware configuration. If no option is supplied, it defaults to
the 3.3 Volt configuration.

Signed-off-by: Geoff Lansberry 
---
 .../devicetree/bindings/net/nfc/trf7970a.txt   |  2 ++
 drivers/nfc/trf7970a.c | 26 +-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt 
b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
index 8b01fc81..b5777d8 100644
--- a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
+++ b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
@@ -21,6 +21,7 @@ Optional SoC Specific Properties:
 - t5t-rmb-extra-byte-quirk: Specify that the trf7970a has the erratum
   where an extra byte is returned by Read Multiple Block commands issued
   to Type 5 tags.
+- vdd-io-supply: Regulator specifying voltage for vdd-io
 - clock-frequency: Set to specify that the input frequency to the trf7970a is 
1356Hz or 2712Hz
 
 Example (for ARM-based BeagleBone with TRF7970A on SPI1):
@@ -40,6 +41,7 @@ Example (for ARM-based BeagleBone with TRF7970A on SPI1):
  <&gpio2 5 GPIO_ACTIVE_LOW>;
vin-supply = <&ldo3_reg>;
vin-voltage-override = <500>;
+   vdd-io-supply = <&ldo2_reg>;
autosuspend-delay = <3>;
irq-status-read-quirk;
en2-rf-quirk;
diff --git a/drivers/nfc/trf7970a.c b/drivers/nfc/trf7970a.c
index b1cd4ef..e3c72c6 100644
--- a/drivers/nfc/trf7970a.c
+++ b/drivers/nfc/trf7970a.c
@@ -444,6 +444,7 @@ struct trf7970a {
u8  iso_ctrl_tech;
u8  modulator_sys_clk_ctrl;
u8  special_fcn_reg1;
+   u8  io_ctrl;
unsigned intguard_time;
int technology;
int framing;
@@ -1051,6 +1052,11 @@ static int trf7970a_init(struct trf7970a *trf)
if (ret)
goto err_out;
 
+   ret = trf7970a_write(trf, TRF7970A_REG_IO_CTRL,
+   trf->io_ctrl | TRF7970A_REG_IO_CTRL_VRS(0x1));
+   if (ret)
+   goto err_out;
+
ret = trf7970a_write(trf, TRF7970A_NFC_TARGET_LEVEL, 0);
if (ret)
goto err_out;
@@ -1767,7 +1773,7 @@ static int _trf7970a_tg_listen(struct nfc_digital_dev 
*ddev, u16 timeout,
goto out_err;
 
ret = trf7970a_write(trf, TRF7970A_REG_IO_CTRL,
-   TRF7970A_REG_IO_CTRL_VRS(0x1));
+   trf->io_ctrl | TRF7970A_REG_IO_CTRL_VRS(0x1));
if (ret)
goto out_err;
 
@@ -2105,6 +2111,24 @@ static int trf7970a_probe(struct spi_device *spi)
if (uvolts > 400)
trf->chip_status_ctrl = TRF7970A_CHIP_STATUS_VRS5_3;
 
+   trf->regulator = devm_regulator_get(&spi->dev, "vdd-io");
+   if (IS_ERR(trf->regulator)) {
+   ret = PTR_ERR(trf->regulator);
+   dev_err(trf->dev, "Can't get VDD_IO regulator: %d\n", ret);
+   goto err_destroy_lock;
+   }
+
+   ret = regulator_enable(trf->regulator);
+   if (ret) {
+   dev_err(trf->dev, "Can't enable VDD_IO: %d\n", ret);
+   goto err_destroy_lock;
+   }
+
+   if (regulator_get_voltage(trf->regulator) == 180) {
+   trf->io_ctrl = TRF7970A_REG_IO_CTRL_IO_LOW;
+   dev_dbg(trf->dev, "trf7970a config vdd_io to 1.8V\n");
+   }
+
trf->ddev = nfc_digital_allocate_device(&trf7970a_nfc_ops,
TRF7970A_SUPPORTED_PROTOCOLS,
NFC_DIGITAL_DRV_CAPS_IN_CRC |
-- 
2.7.4



[PATCH v3 1/3] NFC: trf7970a: add device tree option for 27MHz clock

2016-12-21 Thread Geoff Lansberry
The TRF7970A has configuration options to support hardware designs
which use a 27.12MHz clock. This commit adds a device tree option
'clock-frequency' to support configuring this chip for the default
13.56MHz clock or the optional 27.12MHz clock.

Signed-off-by: Geoff Lansberry 
---
 .../devicetree/bindings/net/nfc/trf7970a.txt   |  2 +
 drivers/nfc/trf7970a.c | 50 +-
 2 files changed, 41 insertions(+), 11 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt 
b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
index 32b35a0..8b01fc81 100644
--- a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
+++ b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
@@ -21,6 +21,7 @@ Optional SoC Specific Properties:
 - t5t-rmb-extra-byte-quirk: Specify that the trf7970a has the erratum
   where an extra byte is returned by Read Multiple Block commands issued
   to Type 5 tags.
+- clock-frequency: Set to specify that the input frequency to the trf7970a is 
1356Hz or 2712Hz
 
 Example (for ARM-based BeagleBone with TRF7970A on SPI1):
 
@@ -43,6 +44,7 @@ Example (for ARM-based BeagleBone with TRF7970A on SPI1):
irq-status-read-quirk;
en2-rf-quirk;
t5t-rmb-extra-byte-quirk;
+   clock-frequency = <2712>;
status = "okay";
};
 };
diff --git a/drivers/nfc/trf7970a.c b/drivers/nfc/trf7970a.c
index 26c9dbb..b1cd4ef 100644
--- a/drivers/nfc/trf7970a.c
+++ b/drivers/nfc/trf7970a.c
@@ -124,6 +124,9 @@
 NFC_PROTO_ISO15693_MASK | NFC_PROTO_NFC_DEP_MASK)
 
 #define TRF7970A_AUTOSUSPEND_DELAY 3 /* 30 seconds */
+#define TRF7970A_13MHZ_CLOCK_FREQUENCY 1356
+#define TRF7970A_27MHZ_CLOCK_FREQUENCY 2712
+
 
 #define TRF7970A_RX_SKB_ALLOC_SIZE 256
 
@@ -1056,12 +1059,11 @@ static int trf7970a_init(struct trf7970a *trf)
 
trf->chip_status_ctrl &= ~TRF7970A_CHIP_STATUS_RF_ON;
 
-   ret = trf7970a_write(trf, TRF7970A_MODULATOR_SYS_CLK_CTRL, 0);
+   ret = trf7970a_write(trf, TRF7970A_MODULATOR_SYS_CLK_CTRL,
+   trf->modulator_sys_clk_ctrl);
if (ret)
goto err_out;
 
-   trf->modulator_sys_clk_ctrl = 0;
-
ret = trf7970a_write(trf, TRF7970A_ADJUTABLE_FIFO_IRQ_LEVELS,
TRF7970A_ADJUTABLE_FIFO_IRQ_LEVELS_WLH_96 |
TRF7970A_ADJUTABLE_FIFO_IRQ_LEVELS_WLL_32);
@@ -1181,27 +1183,37 @@ static int trf7970a_in_config_rf_tech(struct trf7970a 
*trf, int tech)
switch (tech) {
case NFC_DIGITAL_RF_TECH_106A:
trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_14443A_106;
-   trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_OOK;
+   trf->modulator_sys_clk_ctrl =
+   (trf->modulator_sys_clk_ctrl & 0xf8) |
+   TRF7970A_MODULATOR_DEPTH_OOK;
trf->guard_time = TRF7970A_GUARD_TIME_NFCA;
break;
case NFC_DIGITAL_RF_TECH_106B:
trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_14443B_106;
-   trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_ASK10;
+   trf->modulator_sys_clk_ctrl =
+   (trf->modulator_sys_clk_ctrl & 0xf8) |
+   TRF7970A_MODULATOR_DEPTH_ASK10;
trf->guard_time = TRF7970A_GUARD_TIME_NFCB;
break;
case NFC_DIGITAL_RF_TECH_212F:
trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_FELICA_212;
-   trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_ASK10;
+   trf->modulator_sys_clk_ctrl =
+   (trf->modulator_sys_clk_ctrl & 0xf8) |
+   TRF7970A_MODULATOR_DEPTH_ASK10;
trf->guard_time = TRF7970A_GUARD_TIME_NFCF;
break;
case NFC_DIGITAL_RF_TECH_424F:
trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_FELICA_424;
-   trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_ASK10;
+   trf->modulator_sys_clk_ctrl =
+   (trf->modulator_sys_clk_ctrl & 0xf8) |
+   TRF7970A_MODULATOR_DEPTH_ASK10;
trf->guard_time = TRF7970A_GUARD_TIME_NFCF;
break;
case NFC_DIGITAL_RF_TECH_ISO15693:
trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_15693_SGL_1OF4_2648;
-   trf->modulator_sys_clk_ctrl = TRF7970A_MODULATOR_DEPTH_OOK;
+   trf->modulator_sys_clk_ctrl =
+   (trf->modulator_sys_clk_ctrl & 0xf8) |
+   TRF7970A_MODULATOR_DEPTH_OOK;
trf->guard_time = TRF7970A_GUARD_TIME_15693;
break;
default:
@@ -1571,17 +1583,23 @@ static int trf7970a_tg_config_rf_tech(struct trf7970a 
*trf, int tech)
trf->iso_ctrl_tech = TRF7970A_ISO_CTRL_NFC_NFC_CE_MODE |
 

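The "(trf->modulator_sys_clk_ctrl & 0xf8) | ..." pattern in the hunks
above keeps the upper five bits of the cached register value and
replaces only the low three bits.  A minimal stand-alone sketch of
that read-modify-write follows; the constant names and their values
(DEPTH_MASK, DEPTH_A, DEPTH_B, CLOCK_FLAG) are illustrative
placeholders chosen for this example, not the driver's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define DEPTH_MASK 0x07u /* low three bits; complement of the 0xf8 above */
#define DEPTH_A    0x00u /* placeholder depth encoding */
#define DEPTH_B    0x01u /* placeholder depth encoding */
#define CLOCK_FLAG 0x10u /* placeholder for an upper bit set once at probe */

/* Replace the depth field while leaving all other bits untouched. */
static uint8_t set_depth(uint8_t reg, uint8_t depth)
{
	return (uint8_t)((reg & (uint8_t)~DEPTH_MASK) | (depth & DEPTH_MASK));
}
```

Calling set_depth() repeatedly with different depths never clears
CLOCK_FLAG, which is the property the plain assignments removed by the
patch did not have.
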
[PATCH 2/5] arm: mvebu: support for SMP on 98DX3336 SoC

2016-12-21 Thread Chris Packham
Compared to the armada-xp the 98DX3336 uses different registers to set
the boot address for the secondary CPU so a new enable-method is needed.
This will only work if the machine definition doesn't define an overall
smp_ops because there is not currently a way of overriding this from the
device tree if it is set in the machine definition.

Signed-off-by: Chris Packham 
---
 .../bindings/arm/marvell/98dx3236-resume-ctrl.txt  | 18 ++
 arch/arm/mach-mvebu/Makefile   |  1 +
 arch/arm/mach-mvebu/common.h   |  1 +
 arch/arm/mach-mvebu/platsmp.c  | 43 ++
 arch/arm/mach-mvebu/pmsu-98dx3236.c| 69 ++
 5 files changed, 132 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/arm/marvell/98dx3236-resume-ctrl.txt
 create mode 100644 arch/arm/mach-mvebu/pmsu-98dx3236.c

diff --git 
a/Documentation/devicetree/bindings/arm/marvell/98dx3236-resume-ctrl.txt 
b/Documentation/devicetree/bindings/arm/marvell/98dx3236-resume-ctrl.txt
new file mode 100644
index ..8082ba872edd
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/marvell/98dx3236-resume-ctrl.txt
@@ -0,0 +1,18 @@
+Resume Control
+--
+Available on Marvell SOCs: 98DX3336 and 98DX4251
+
+Required properties:
+
+- compatible: must be "marvell,98dx3336-resume-ctrl"
+
+- reg: Should contain resume control registers location and length
+
+Example:
+
+resume@20980 {
+   compatible = "marvell,98dx3336-resume-ctrl";
+   reg = <0x20980 0x10>;
+};
+
+
diff --git a/arch/arm/mach-mvebu/Makefile b/arch/arm/mach-mvebu/Makefile
index 6c6497e80a7b..2a2dd8324fb8 100644
--- a/arch/arm/mach-mvebu/Makefile
+++ b/arch/arm/mach-mvebu/Makefile
@@ -7,6 +7,7 @@ obj-$(CONFIG_MACH_MVEBU_ANY) += system-controller.o 
mvebu-soc-id.o
 
 ifeq ($(CONFIG_MACH_MVEBU_V7),y)
 obj-y   += cpu-reset.o board-v7.o coherency.o 
coherency_ll.o pmsu.o pmsu_ll.o
+obj-y   += pmsu-98dx3236.o
 
 obj-$(CONFIG_PM)+= pm.o pm-board.o
 obj-$(CONFIG_SMP)   += platsmp.o headsmp.o platsmp-a9.o 
headsmp-a9.o
diff --git a/arch/arm/mach-mvebu/common.h b/arch/arm/mach-mvebu/common.h
index 6b775492cfad..099dabf23461 100644
--- a/arch/arm/mach-mvebu/common.h
+++ b/arch/arm/mach-mvebu/common.h
@@ -27,4 +27,5 @@ void __iomem *mvebu_get_scu_base(void);
 
 int mvebu_pm_suspend_init(void (*board_pm_enter)(void __iomem *sdram_reg,
u32 srcmd));
+void mv98dx3236_resume_set_cpu_boot_addr(int hw_cpu, void *boot_addr);
 #endif
diff --git a/arch/arm/mach-mvebu/platsmp.c b/arch/arm/mach-mvebu/platsmp.c
index 46c742d3bd41..3c9ab9a008ad 100644
--- a/arch/arm/mach-mvebu/platsmp.c
+++ b/arch/arm/mach-mvebu/platsmp.c
@@ -182,5 +182,48 @@ const struct smp_operations armada_xp_smp_ops __initconst 
= {
 #endif
 };
 
+static int mv98dx3236_boot_secondary(unsigned int cpu, struct task_struct 
*idle)
+{
+   int ret, hw_cpu;
+
+   pr_info("Booting CPU %d\n", cpu);
+
+   hw_cpu = cpu_logical_map(cpu);
+   set_secondary_cpu_clock(hw_cpu);
+   mv98dx3236_resume_set_cpu_boot_addr(hw_cpu,
+   armada_xp_secondary_startup);
+
+   /*
+* This is needed to wake up CPUs in the offline state after
+* using CPU hotplug.
+*/
+   arch_send_wakeup_ipi_mask(cpumask_of(cpu));
+
+   /*
+* This is needed to take secondary CPUs out of reset on the
+* initial boot.
+*/
+   ret = mvebu_cpu_reset_deassert(hw_cpu);
+   if (ret) {
+   pr_warn("unable to boot CPU: %d\n", ret);
+   return ret;
+   }
+
+   return 0;
+}
+
+struct smp_operations mv98dx3236_smp_ops __initdata = {
+   .smp_init_cpus  = armada_xp_smp_init_cpus,
+   .smp_prepare_cpus   = armada_xp_smp_prepare_cpus,
+   .smp_boot_secondary = mv98dx3236_boot_secondary,
+   .smp_secondary_init = armada_xp_secondary_init,
+#ifdef CONFIG_HOTPLUG_CPU
+   .cpu_die= armada_xp_cpu_die,
+   .cpu_kill   = armada_xp_cpu_kill,
+#endif
+};
+
 CPU_METHOD_OF_DECLARE(armada_xp_smp, "marvell,armada-xp-smp",
  &armada_xp_smp_ops);
+CPU_METHOD_OF_DECLARE(mv98dx3236_smp, "marvell,98dx3236-smp",
+ &mv98dx3236_smp_ops);
diff --git a/arch/arm/mach-mvebu/pmsu-98dx3236.c 
b/arch/arm/mach-mvebu/pmsu-98dx3236.c
new file mode 100644
index ..fadc81d0c051
--- /dev/null
+++ b/arch/arm/mach-mvebu/pmsu-98dx3236.c
@@ -0,0 +1,69 @@
+/**
+ * CPU resume support for 98DX4251 internal CPU (a.k.a. MSYS).
+ */
+
+#define pr_fmt(fmt) "mv98dx3236-resume: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include "common.h"
+
+static void __iomem *mv98dx3236_resume_base;
+#define MV98DX3236_CPU_RESUME_CTRL_OFFSET  0x08
+#define MV98DX3236_CPU_RESUME_ADDR_OFFSET  0x04
+
+static con

[PATCH 5/5] arm: mvebu: Add device tree for db-dxbc2 and db-xc3-24g4xg boards

2016-12-21 Thread Chris Packham
These boards are Marvell's evaluation boards for the 98DX4251 and
98DX3336 SoCs.

Signed-off-by: Chris Packham 
---
 arch/arm/boot/dts/db-dxbc2.dts  | 159 
 arch/arm/boot/dts/db-xc3-24g4xg.dts | 155 +++
 2 files changed, 314 insertions(+)
 create mode 100644 arch/arm/boot/dts/db-dxbc2.dts
 create mode 100644 arch/arm/boot/dts/db-xc3-24g4xg.dts

diff --git a/arch/arm/boot/dts/db-dxbc2.dts b/arch/arm/boot/dts/db-dxbc2.dts
new file mode 100644
index ..f56786cea5f8
--- /dev/null
+++ b/arch/arm/boot/dts/db-dxbc2.dts
@@ -0,0 +1,159 @@
+/*
+ * Device Tree file for DB-DXBC2 board
+ *
+ * Copyright (C) 2016 Allied Telesis Labs
+ *
+ * Based on armada-xp-db.dts
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This file is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ *
+ * This file is distributed in the hope that it will be useful
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Or, alternatively
+ *
+ *  b) Permission is hereby granted, free of charge, to any person
+ * obtaining a copy of this software and associated documentation
+ * files (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use
+ * copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following
+ * conditions:
+ *
+ * The above copyright notice and this permission notice shall be
+ * included in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED , WITHOUT WARRANTY OF ANY KIND
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY
+ * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Note: this Device Tree assumes that the bootloader has remapped the
+ * internal registers to 0xf100 (instead of the default
+ * 0xd000). The 0xf100 is the default used by the recent,
+ * DT-capable, U-Boot bootloaders provided by Marvell. Some earlier
+ * boards were delivered with an older version of the bootloader that
+ * left internal registers mapped at 0xd000. If you are in this
+ * situation, you should either update your bootloader (preferred
+ * solution) or the below Device Tree should be adjusted.
+ */
+
+/dts-v1/;
+#include "armada-xp-98dx4251.dtsi"
+
+/ {
+   model = "Marvell Bobcat2 Evaluation Board";
+   compatible = "marvell,db-dxbc2", "marvell,armadaxp-98dx4251", 
"marvell,armadaxp", "marvell,armada-370-xp";
+
+   chosen {
+   bootargs = "console=ttyS0,115200 earlyprintk";
+   };
+
+   memory {
+   device_type = "memory";
+   reg = <0 0x 0 0x2000>; /* 512 MB */
+   };
+
+   soc {
+   ranges = ;
+
+   devbus-bootcs {
+   status = "okay";
+
+   /* Device Bus parameters are required */
+
+   /* Read parameters */
+   devbus,bus-width= <16>;
+   devbus,turn-off-ps  = <6>;
+   devbus,badr-skew-ps = <0>;
+   devbus,acc-first-ps = <124000>;
+   devbus,acc-next-ps  = <248000>;
+   devbus,rd-setup-ps  = <0>;
+   devbus,rd-hold-ps   = <0>;
+
+   /* Write parameters */
+   devbus,sync-enable = <0>;
+   devbus,wr-high-ps  = <6>;
+   devbus,wr-low-ps   = <6>;
+   devbus,ale-wr-ps   = <6>;
+   };
+
+   internal-regs {
+   serial@12000 {
+   status = "okay";
+   };
+   serial@12100 {
+   status = "okay";
+   };
+
+   i2c@11000 {
+   clock-frequency = <10>;
+   status = "okay";
+

[PATCH 3/5] pinctrl: mvebu: pinctrl driver for 98DX3236 SoC

2016-12-21 Thread Chris Packham
From: Kalyan Kinthada 

This pinctrl driver supports the 98DX3236, 98DX3336 and 98DX4251 SoCs
from Marvell.

Signed-off-by: Kalyan Kinthada 
Signed-off-by: Chris Packham 
---
 .../pinctrl/marvell,armada-98dx3236-pinctrl.txt|  46 +++
 drivers/pinctrl/mvebu/pinctrl-armada-xp.c  | 145 +
 2 files changed, 191 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/pinctrl/marvell,armada-98dx3236-pinctrl.txt

diff --git 
a/Documentation/devicetree/bindings/pinctrl/marvell,armada-98dx3236-pinctrl.txt 
b/Documentation/devicetree/bindings/pinctrl/marvell,armada-98dx3236-pinctrl.txt
new file mode 100644
index ..34c1e380adaa
--- /dev/null
+++ 
b/Documentation/devicetree/bindings/pinctrl/marvell,armada-98dx3236-pinctrl.txt
@@ -0,0 +1,46 @@
+* Marvell 98dx3236 pinctrl driver for mpp
+
+Please refer to marvell,mvebu-pinctrl.txt in this directory for common binding
+part and usage
+
+Required properties:
+- compatible: "marvell,98dx3236-pinctrl"
+- reg: register specifier of MPP registers
+
+This driver supports all 98dx3236, 98dx3336 and 98dx4251 variants
+
+name  pins functions
+
+mpp0  0gpio, spi0(mosi), dev(ad8)
+mpp1  1gpio, spi0(miso), dev(ad9)
+mpp2  2gpio, spi0(sck), dev(ad10)
+mpp3  3gpio, spi0(cs0), dev(ad11)
+mpp4  4gpio, spi0(cs1), smi(mdc), dev(cs0)
+mpp5  5gpio, pex(rsto), dev(bootcs)
+mpp6  6gpio, dev(a2)
+mpp7  7gpio, dev(ale0)
+mpp8  8gpio, dev(ale1)
+mpp9  9gpio, dev(ready0)
+mpp10 10   gpio, dev(ad12)
+mpp11 11   gpio, uart1(rxd), uart0(cts), dev(ad13)
+mpp12 12   gpio, uart1(txd), uart0(rts), dev(ad14)
+mpp13 13   gpio, intr(out), dev(ad15)
+mpp14 14   gpio, i2c0(sck)
+mpp15 15   gpio, i2c0(sda)
+mpp16 16   gpio, dev(oe)
+mpp17 17   gpio, dev(clk)
+mpp18 18   gpio, uart1(txd)
+mpp19 19   gpio, uart1(rxd), dev(rb)
+mpp20 20   gpio, dev(we)
+mpp21 21   gpio, dev(ad0)
+mpp22 22   gpio, dev(ad1)
+mpp23 23   gpio, dev(ad2)
+mpp24 24   gpio, dev(ad3)
+mpp25 25   gpio, dev(ad4)
+mpp26 26   gpio, dev(ad5)
+mpp27 27   gpio, dev(ad6)
+mpp28 28   gpio, dev(ad7)
+mpp29 29   gpio, dev(a0)
+mpp30 30   gpio, dev(a1)
+mpp31 31   gpio, slv_smi(mdc), smi(mdc), dev(we1)
+mpp32 32   gpio, slv_smi(mdio), smi(mdio), dev(cs1)
diff --git a/drivers/pinctrl/mvebu/pinctrl-armada-xp.c 
b/drivers/pinctrl/mvebu/pinctrl-armada-xp.c
index e4ea71a9d985..2586903c59f0 100644
--- a/drivers/pinctrl/mvebu/pinctrl-armada-xp.c
+++ b/drivers/pinctrl/mvebu/pinctrl-armada-xp.c
@@ -49,6 +49,10 @@ enum armada_xp_variant {
V_MV78460   = BIT(2),
V_MV78230_PLUS  = (V_MV78230 | V_MV78260 | V_MV78460),
V_MV78260_PLUS  = (V_MV78260 | V_MV78460),
+   V_98DX3236  = BIT(3),
+   V_98DX3336  = BIT(4),
+   V_98DX4251  = BIT(5),
+   V_98DX3236_PLUS = (V_98DX3236 | V_98DX3336 | V_98DX4251),
 };
 
 static struct mvebu_mpp_mode armada_xp_mpp_modes[] = {
@@ -360,6 +364,124 @@ static struct mvebu_mpp_mode armada_xp_mpp_modes[] = {
 MPP_VAR_FUNCTION(0x1, "dev", "ad31",   V_MV78260_PLUS)),
 };
 
+static struct mvebu_mpp_mode mv98dx3236_mpp_modes[] = {
+   MPP_MODE(0,
+MPP_VAR_FUNCTION(0x0, "gpo", NULL,  V_98DX3236_PLUS),
+MPP_VAR_FUNCTION(0x2, "spi0", "mosi",   V_98DX3236_PLUS),
+MPP_VAR_FUNCTION(0x4, "dev", "ad8", V_98DX3236_PLUS)),
+   MPP_MODE(1,
+MPP_VAR_FUNCTION(0x0, "gpio", NULL, V_98DX3236_PLUS),
+MPP_VAR_FUNCTION(0x2, "spi0", "miso",   V_98DX3236_PLUS),
+MPP_VAR_FUNCTION(0x4, "dev", "ad9", V_98DX3236_PLUS)),
+   MPP_MODE(2,
+MPP_VAR_FUNCTION(0x0, "gpio", NULL, V_98DX3236_PLUS),
+MPP_VAR_FUNCTION(0x2, "spi0", "csk",V_98DX3236_PLUS),
+MPP_VAR_FUNCTION(0x4, "dev", "ad10",V_98DX3236_PLUS)),
+   MPP_MODE(3,
+MPP_VAR_FUNCTION(0x0, "gpio", NULL, V_98DX3236_PLUS),
+MPP_VAR_FUNCTION(0x2, "spi0", "cs0",V_98DX3236_PLUS),
+MPP_VAR_FUNCTION(0x4, "dev", "ad11",V_98DX3236_PLUS)),
+   MPP_MODE(4,
+MPP_VAR_FUNCTION(0x0, "gpio", NULL, V_98DX3236_PLUS),
+MPP_VAR_FUNCTION(0x2, "spi0", "cs1",V_98DX3236_PLUS),
+MPP_VAR_FUNCTION(0x3, "smi", "mdc", V_98DX3236_PLUS),
+MPP_VAR_FUNCTION(0x4, "dev", "cs0", V_98DX3236_PLUS)),
+   MPP_MODE(5

[PATCH 1/5] clk: mvebu: support for 98DX3236 SoC

2016-12-21 Thread Chris Packham
The 98DX3236, 98DX3336, 98DX4251 and variants have a different TCLK from
the Armada XP (200MHz vs 250MHz). The CPU core clock is fixed at 800MHz.

The clock gating options are a subset of those on the Armada XP.

The core clock divider also differs from the one on the Armada XP.

Signed-off-by: Chris Packham 
---
 drivers/clk/mvebu/Makefile |   2 +-
 drivers/clk/mvebu/armada-xp.c  |  42 +++
 drivers/clk/mvebu/clk-cpu.c|  33 +-
 drivers/clk/mvebu/mv98dx3236-corediv.c | 207 +
 4 files changed, 280 insertions(+), 4 deletions(-)
 create mode 100644 drivers/clk/mvebu/mv98dx3236-corediv.c

diff --git a/drivers/clk/mvebu/Makefile b/drivers/clk/mvebu/Makefile
index d9ae97fb43c4..6a3681e3d6db 100644
--- a/drivers/clk/mvebu/Makefile
+++ b/drivers/clk/mvebu/Makefile
@@ -9,7 +9,7 @@ obj-$(CONFIG_ARMADA_39X_CLK)+= armada-39x.o
 obj-$(CONFIG_ARMADA_37XX_CLK)  += armada-37xx-xtal.o
 obj-$(CONFIG_ARMADA_37XX_CLK)  += armada-37xx-tbg.o
 obj-$(CONFIG_ARMADA_37XX_CLK)  += armada-37xx-periph.o
-obj-$(CONFIG_ARMADA_XP_CLK)+= armada-xp.o
+obj-$(CONFIG_ARMADA_XP_CLK)+= armada-xp.o mv98dx3236-corediv.o
 obj-$(CONFIG_ARMADA_AP806_SYSCON) += ap806-system-controller.o
 obj-$(CONFIG_ARMADA_CP110_SYSCON) += cp110-system-controller.o
 obj-$(CONFIG_DOVE_CLK) += dove.o dove-divider.o
diff --git a/drivers/clk/mvebu/armada-xp.c b/drivers/clk/mvebu/armada-xp.c
index b3094315a3c0..0413bf8284e0 100644
--- a/drivers/clk/mvebu/armada-xp.c
+++ b/drivers/clk/mvebu/armada-xp.c
@@ -52,6 +52,12 @@ static u32 __init axp_get_tclk_freq(void __iomem *sar)
return 25000;
 }
 
+/* MV98DX3236 TCLK frequency is fixed to 200MHz */
+static u32 __init mv98dx3236_get_tclk_freq(void __iomem *sar)
+{
+   return 200000000;
+}
+
 static const u32 axp_cpu_freqs[] __initconst = {
10,
106600,
@@ -89,6 +95,12 @@ static u32 __init axp_get_cpu_freq(void __iomem *sar)
return cpu_freq;
 }
 
+/* MV98DX3236 CLK frequency is fixed to 800MHz */
+static u32 __init mv98dx3236_get_cpu_freq(void __iomem *sar)
+{
+   return 800000000;
+}
+
 static const int axp_nbclk_ratios[32][2] __initconst = {
{0, 1}, {1, 2}, {2, 2}, {2, 2},
{1, 2}, {1, 2}, {1, 1}, {2, 3},
@@ -158,6 +170,14 @@ static const struct coreclk_soc_desc axp_coreclks = {
.num_ratios = ARRAY_SIZE(axp_coreclk_ratios),
 };
 
+static const struct coreclk_soc_desc mv98dx3236_coreclks = {
+   .get_tclk_freq = mv98dx3236_get_tclk_freq,
+   .get_cpu_freq = mv98dx3236_get_cpu_freq,
+   .get_clk_ratio = NULL,
+   .ratios = NULL,
+   .num_ratios = 0,
+};
+
 /*
  * Clock Gating Control
  */
@@ -195,6 +215,15 @@ static const struct clk_gating_soc_desc axp_gating_desc[] 
__initconst = {
{ }
 };
 
+static const struct clk_gating_soc_desc mv98dx3236_gating_desc[] __initconst = 
{
+   { "ge1", NULL, 3, 0 },
+   { "ge0", NULL, 4, 0 },
+   { "pex00", NULL, 5, 0 },
+   { "sdio", NULL, 17, 0 },
+   { "xor0", NULL, 22, 0 },
+   { }
+};
+
 static void __init axp_clk_init(struct device_node *np)
 {
struct device_node *cgnp =
@@ -206,3 +235,16 @@ static void __init axp_clk_init(struct device_node *np)
mvebu_clk_gating_setup(cgnp, axp_gating_desc);
 }
 CLK_OF_DECLARE(axp_clk, "marvell,armada-xp-core-clock", axp_clk_init);
+
+static void __init mv98dx3236_clk_init(struct device_node *np)
+{
+   struct device_node *cgnp =
+   of_find_compatible_node(NULL, NULL, 
"marvell,armada-xp-gating-clock");
+
+   mvebu_coreclk_setup(np, &mv98dx3236_coreclks);
+
+   if (cgnp)
+   mvebu_clk_gating_setup(cgnp, mv98dx3236_gating_desc);
+}
+CLK_OF_DECLARE(mv98dx3236_clk, "marvell,mv98dx3236-core-clock",
+  mv98dx3236_clk_init);
diff --git a/drivers/clk/mvebu/clk-cpu.c b/drivers/clk/mvebu/clk-cpu.c
index 5837eb8a212f..29f295e7a36b 100644
--- a/drivers/clk/mvebu/clk-cpu.c
+++ b/drivers/clk/mvebu/clk-cpu.c
@@ -165,7 +165,9 @@ static const struct clk_ops cpu_ops = {
.set_rate = clk_cpu_set_rate,
 };
 
-static void __init of_cpu_clk_setup(struct device_node *node)
+/* Add parameter to allow this to support different clock operations. */
+static void __init _of_cpu_clk_setup(struct device_node *node,
+   const struct clk_ops *cpu_clk_ops)
 {
struct cpu_clk *cpuclk;
void __iomem *clock_complex_base = of_iomap(node, 0);
@@ -218,7 +220,7 @@ static void __init of_cpu_clk_setup(struct device_node 
*node)
cpuclk[cpu].hw.init = &init;
 
init.name = cpuclk[cpu].clk_name;
-   init.ops = &cpu_ops;
+   init.ops = cpu_clk_ops;
init.flags = 0;
init.parent_names = &cpuclk[cpu].parent_name;
init.num_parents = 1;
@@ -243,5 +245,30 @@ static void __init of_cpu_clk_setup(struct device_node 
*node)
iounmap(clock_complex_base);
 }
 
+/* Use this function

[PATCH 4/5] arm: mvebu: Add device tree for 98DX3236 SoCs

2016-12-21 Thread Chris Packham
The Marvell 98DX3236, 98DX3336, 98DX4251 and variants are switch ASICs
with integrated CPUs. They are similar to the Armada XP SoCs but have
different I/O interfaces.

Signed-off-by: Chris Packham 
---
 .../devicetree/bindings/arm/marvell/98dx3236.txt   |  10 +
 arch/arm/boot/dts/armada-xp-98dx3236.dtsi  | 231 +
 arch/arm/boot/dts/armada-xp-98dx3336.dtsi  |  78 +++
 arch/arm/boot/dts/armada-xp-98dx4251.dtsi  |  78 +++
 4 files changed, 397 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/marvell/98dx3236.txt
 create mode 100644 arch/arm/boot/dts/armada-xp-98dx3236.dtsi
 create mode 100644 arch/arm/boot/dts/armada-xp-98dx3336.dtsi
 create mode 100644 arch/arm/boot/dts/armada-xp-98dx4251.dtsi

diff --git a/Documentation/devicetree/bindings/arm/marvell/98dx3236.txt 
b/Documentation/devicetree/bindings/arm/marvell/98dx3236.txt
new file mode 100644
index ..e7dc9b2dd90b
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/marvell/98dx3236.txt
@@ -0,0 +1,10 @@
+Marvell 98DX3236, 98DX3336 and 98DX4251 Platforms Device Tree Bindings
+--
+
+Boards with a SoC of the Marvell 98DX3236, 98DX3336 and 98DX4251 families
+shall have the following property:
+
+Required root node property:
+
+compatible: one of "marvell,armadaxp-98dx3236", "marvell,armadaxp-98dx3336"
+or "marvell,armadaxp-98dx4251"
diff --git a/arch/arm/boot/dts/armada-xp-98dx3236.dtsi 
b/arch/arm/boot/dts/armada-xp-98dx3236.dtsi
new file mode 100644
index ..bac53f8b44af
--- /dev/null
+++ b/arch/arm/boot/dts/armada-xp-98dx3236.dtsi
@@ -0,0 +1,231 @@
+/*
+ * Device Tree Include file for Marvell 98dx3236 family SoC
+ *
+ * Copyright (C) 2016 Allied Telesis Labs
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This file is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ *
+ * This file is distributed in the hope that it will be useful
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Or, alternatively
+ *
+ *  b) Permission is hereby granted, free of charge, to any person
+ * obtaining a copy of this software and associated documentation
+ * files (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use
+ * copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following
+ * conditions:
+ *
+ * The above copyright notice and this permission notice shall be
+ * included in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY
+ * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Contains definitions specific to the 98dx3236 SoC that are not
+ * common to all Armada XP SoCs.
+ */
+
+#include "armada-xp.dtsi"
+
+/ {
+   model = "Marvell 98DX3236 SoC";
+   compatible = "marvell,armadaxp-98dx3236", "marvell,armadaxp", 
"marvell,armada-370-xp";
+
+   aliases {
+   gpio0 = &gpio0;
+   gpio1 = &gpio1;
+   gpio2 = &gpio2;
+   };
+
+   cpus {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   enable-method = "marvell,98dx3236-smp";
+
+   cpu@0 {
+   device_type = "cpu";
+   compatible = "marvell,sheeva-v7";
+   reg = <0>;
+   clocks = <&cpuclk 0>;
+   clock-latency = <100>;
+   };
+   };
+
+   soc {
+   ranges = ;
+
+   /*
+* 98DX3236 has 1 x1 PCIe unit Gen2.0: One unit can be
+*/
+   pcie-controller {
+   compatible = "marvell,armada-xp-pcie";
+   status = "disabled";
+   device_type = "pci";
+
+   #address-cel

Re: [kernel-hardening] Re: HalfSipHash Acceptable Usage

2016-12-21 Thread George Spelvin
> Plus the benchmark was bogus anyway, and when I built a more specific
> harness -- actually comparing the TCP sequence number functions --
> SipHash was faster than MD5, even on register starved x86. So I think
> we're fine and this chapter of the discussion can come to a close, in
> order to move on to more interesting things.

Do we have to go through this?  No, the benchmark was *not* bogus.

Here are my results from *your* benchmark.  I can't reboot some of my test
machines, so I took net/core/secure_seq.c, lib/siphash.c, lib/md5.c and
include/linux/siphash.h straight out of your test tree.

Then I replaced the kernel #includes with the necessary typedefs
and #defines to make it compile in user-space.  (Voluminous but
straightforward.)  E.g.

#define __aligned(x) __attribute__((__aligned__(x)))
#define cacheline_aligned __aligned(64)
#define CONFIG_INET 1
#define IS_ENABLED(x) 1
#define ktime_get_real_ns() 0
#define sysctl_tcp_timestamps 0

... etc.

Then I modified your benchmark code into the appended code.  The
differences are:
* I didn't iterate 100K times, I timed the functions *once*.
* I saved the times in a buffer and printed them all at the end
  so printf() wouldn't pollute the caches.
* Before every even-numbered iteration, I flushed the I-cache
  of everything from _init to _fini (i.e. all the non-library code).
  This cold-cache case is what is going to happen in the kernel.
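
The first two points above (time each call once, keep the numbers in a
buffer, print only at the end) can be sketched roughly as follows. This is
an illustration and not the harness used for the numbers in this mail: it
uses the C11 timespec_get() wall clock as a portable stand-in for rdtsc, and
now_ns(), time_once(), report() and dummy_work() are names made up for the
example.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define MAX_SAMPLES 16

/* Timings are stored here and only printed after all runs have finished. */
static uint64_t samples[MAX_SAMPLES];
static unsigned int nsamples;

/* Portable stand-in for an rdtsc-based cycle counter (assumption). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	timespec_get(&ts, TIME_UTC);
	return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Run fn exactly once and record how long it took; nothing is printed here. */
static void time_once(void (*fn)(void))
{
	uint64_t start = now_ns();

	fn();
	if (nsamples < MAX_SAMPLES)
		samples[nsamples++] = now_ns() - start;
}

/* Print everything at the end, so stdio never runs between two samples. */
static void report(const char *label)
{
	unsigned int i;

	printf("%s:", label);
	for (i = 0; i < nsamples; i++)
		printf(" %llu", (unsigned long long)samples[i]);
	printf("\n");
}

/* Hypothetical workload standing in for the function under test. */
static void dummy_work(void)
{
	volatile unsigned int x = 0;
	unsigned int i;

	for (i = 0; i < 1000; i++)
		x += i;
}
```

Calling time_once(dummy_work) a few times and then report("dummy") prints one
row of timings, in the same shape as the result rows further down.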

In the results below, note that I did *not* re-flush between phases
of the test.  The effects of caching are clearly apparent in the tcpv4
results, where the tcpv6 code loaded the cache.

You can also see that the SipHash code benefits more from caching when
entered with a cold cache, as it iterates over the input words, while
the MD5 code is one big unrolled blob.

Order of computation is down the columns first, across second.

The P4 results were:
tcpv6 md5 cold:     4084 3488 3584 3584 3568
tcpv4 md5 cold:     1052  996  996 1060  996
tcpv6 siphash cold: 4080 3296 3312 3296 3312
tcpv4 siphash cold: 2968 2748 2972 2716 2716
tcpv6 md5 hot:       900  712  712  712  712
tcpv4 md5 hot:       632  672  672  672  672
tcpv6 siphash hot:  2484 2292 2340 2340 2340
tcpv4 siphash hot:  1660 1560 1564 2340 1564

SipHash actually wins slightly in the cold-cache case, because
it iterates more.  In the hot-cache case, it loses horribly.

Core 2 duo:
tcpv6 md5 cold:     3396 2868 2964 3012 2832
tcpv4 md5 cold:     1368 1044 1320 1332 1308
tcpv6 siphash cold: 2940 2952 2916 2448 2604
tcpv4 siphash cold: 3192 2988 3576 3504 3624
tcpv6 md5 hot:      1116 1032  996 1008 1008
tcpv4 md5 hot:       936  936  936  936  936
tcpv6 siphash hot:  1200 1236 1236 1188 1188
tcpv4 siphash hot:   936  804  804  804  804

Pretty much a tie, honestly.

Ivy Bridge:
tcpv6 md5 cold:     6086 6136 6962 6358 6060
tcpv4 md5 cold:      816  732 1046 1054 1012
tcpv6 siphash cold: 3756 1886 2152 2390 2566
tcpv4 siphash cold: 3264 2108 3026 3120 3526
tcpv6 md5 hot:      1062  808  824  824  832
tcpv4 md5 hot:       730  730  740  748  748
tcpv6 siphash hot:   960  952  936 1112  926
tcpv4 siphash hot:   638  544  562  552  560

Modern processors *hate* cold caches.  But notice how md5 is *faster*
than SipHash on hot-cache IPv6.

Ivy Bridge, -m64:
tcpv6 md5 cold:     4680 3672 3956 3616 3525
tcpv4 md5 cold:     1066 1416 1179 1179 1134
tcpv6 siphash cold:  940 1258 1995 1609 2255
tcpv4 siphash cold: 1440 1269 1292 1870 1621
tcpv6 md5 hot:      1372 1122 1088 1088
tcpv4 md5 hot:       997  997  997  997  998
tcpv6 siphash hot:   340  340  340  352  340
tcpv4 siphash hot:   227  238  238  238  238

Of course, when you compile -m64, SipHash is unbeatable.


Here's the modified benchmark() code.  The entire package is
a bit voluminous for the mailing list, but anyone is welcome to it.

static void clflush(void)
{
extern char const _init, _fini;
char const *p = &_init;

while (p < &_fini) {
asm("clflush %0" : : "m" (*p));
p += 64;
}
}

typedef uint32_t cycles_t;
static cycles_t get_cycles(void)
{
uint32_t eax, edx;
asm volatile("rdtsc" : "=a" (eax), "=d" (edx));
return eax;
}

static int benchmark(void)
{
cycles_t start, finish;
int i;
u32 seq_number = 0;
__be32 saddr6[4] = { 1, 4, 182, 393 }, daddr6[4] = { 9192, 18288, 
222, 0xff10 };
__be32 saddr4 = 2, daddr4 = 182112;
__be16 sport = 22, dport = 41992;

[PATCH] fddi: skfp: Use more common logging styles

2016-12-21 Thread Joe Perches
Several macros use non-standard styles where format and arguments
are not verified.  Convert these to a more typical fmt, ##__VA_ARGS__
use so format and arguments match as appropriate.

Miscellanea:

o Fix format and argument mismatches
o Realign and reindent misindented block
o Strip newlines from formats and add to macro defines
o Coalesce a few consecutive logging uses to more simple single uses
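
As a generic illustration of the "fmt, ##__VA_ARGS__" style mentioned above
(not the actual DB_* definitions from cmtdef.h), a macro of this shape lets
the compiler check the format string against its arguments and appends the
newline itself. DB_EXAMPLE and db_buf are hypothetical names, and
##__VA_ARGS__ is a GNU extension accepted by gcc and clang.

```c
#include <stdio.h>

/* Output lands in a buffer here so the result is easy to inspect. */
static char db_buf[128];

/*
 * Hypothetical debug macro.  Because fmt is pasted into a string literal
 * that reaches snprintf() directly, -Wformat can verify every argument,
 * and callers no longer add "\n" or dummy trailing 0 arguments.
 */
#define DB_EXAMPLE(fmt, ...) \
	snprintf(db_buf, sizeof(db_buf), fmt "\n", ##__VA_ARGS__)
```

A call such as DB_EXAMPLE("CFM : state %s%s event %s", a, b, c) then behaves
like one printf-style call, and DB_EXAMPLE("no args") works without any
trailing argument.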

Signed-off-by: Joe Perches 
---
 drivers/net/fddi/skfp/cfm.c  |  22 
 drivers/net/fddi/skfp/drvfbi.c   |   4 +-
 drivers/net/fddi/skfp/ecm.c  |  34 ++--
 drivers/net/fddi/skfp/ess.c  |  66 
 drivers/net/fddi/skfp/fplustm.c  |  24 -
 drivers/net/fddi/skfp/h/cmtdef.h |  67 +---
 drivers/net/fddi/skfp/hwmtm.c|   2 +-
 drivers/net/fddi/skfp/pcmplc.c   |  83 +++--
 drivers/net/fddi/skfp/pmf.c  |   4 +-
 drivers/net/fddi/skfp/rmt.c  |  40 +++---
 drivers/net/fddi/skfp/smt.c  | 109 +++
 drivers/net/fddi/skfp/srf.c  |  14 +++--
 12 files changed, 232 insertions(+), 237 deletions(-)

diff --git a/drivers/net/fddi/skfp/cfm.c b/drivers/net/fddi/skfp/cfm.c
index e395ace3120b..648ff9fdb909 100644
--- a/drivers/net/fddi/skfp/cfm.c
+++ b/drivers/net/fddi/skfp/cfm.c
@@ -52,7 +52,6 @@ static const char ID_sccs[] = "@(#)cfm.c  2.18 98/10/06 
(C) SK " ;
 #define ACTIONS_DONE() (smc->mib.fddiSMTCF_State &= ~AFLAG)
 #define ACTIONS(x) (x|AFLAG)
 
-#ifdef DEBUG
 /*
  * symbolic state names
  */
@@ -68,7 +67,6 @@ static const char * const cfm_states[] = {
 static const char * const cfm_events[] = {
"NONE","CF_LOOP_A","CF_LOOP_B","CF_JOIN_A","CF_JOIN_B"
 } ;
-#endif
 
 /*
  * map from state to downstream port type
@@ -230,10 +228,10 @@ void cfm(struct s_smc *smc, int event)
 
oldstate = smc->mib.fddiSMTCF_State ;
do {
-   DB_CFM("CFM : state %s%s",
-   (smc->mib.fddiSMTCF_State & AFLAG) ? "ACTIONS " : "",
-   cfm_states[smc->mib.fddiSMTCF_State & ~AFLAG]) ;
-   DB_CFM(" event %s\n",cfm_events[event],0) ;
+   DB_CFM("CFM : state %s%s event %s",
+  smc->mib.fddiSMTCF_State & AFLAG ? "ACTIONS " : "",
+  cfm_states[smc->mib.fddiSMTCF_State & ~AFLAG],
+  cfm_events[event]);
state = smc->mib.fddiSMTCF_State ;
cfm_fsm(smc,event) ;
event = 0 ;
@@ -297,7 +295,7 @@ static void cfm_fsm(struct s_smc *smc, int cmd)
queue_event(smc,EVENT_RMT,RM_JOIN) ;/* signal RMT */
/* Don't do the WC-Flag changing here */
ACTIONS_DONE() ;
-   DB_CFMN(1,"CFM : %s\n",cfm_states[smc->mib.fddiSMTCF_State],0) ;
+   DB_CFMN(1, "CFM : %s", cfm_states[smc->mib.fddiSMTCF_State]);
break;
case SC0_ISOLATED :
/*SC07*/
@@ -338,7 +336,7 @@ static void cfm_fsm(struct s_smc *smc, int cmd)
queue_event(smc,EVENT_RMT,RM_JOIN) ;/* signal RMT */
}
ACTIONS_DONE() ;
-   DB_CFMN(1,"CFM : %s\n",cfm_states[smc->mib.fddiSMTCF_State],0) ;
+   DB_CFMN(1, "CFM : %s", cfm_states[smc->mib.fddiSMTCF_State]);
break ;
case SC9_C_WRAP_A :
/*SC10*/
@@ -403,7 +401,7 @@ static void cfm_fsm(struct s_smc *smc, int cmd)
queue_event(smc,EVENT_RMT,RM_JOIN) ;/* signal RMT */
}
ACTIONS_DONE() ;
-   DB_CFMN(1,"CFM : %s\n",cfm_states[smc->mib.fddiSMTCF_State],0) ;
+   DB_CFMN(1, "CFM : %s", cfm_states[smc->mib.fddiSMTCF_State]);
break ;
case SC10_C_WRAP_B :
/*SC20*/
@@ -448,7 +446,7 @@ static void cfm_fsm(struct s_smc *smc, int cmd)
smc->r.rm_join = TRUE ;
queue_event(smc,EVENT_RMT,RM_JOIN) ;/* signal RMT */
ACTIONS_DONE() ;
-   DB_CFMN(1,"CFM : %s\n",cfm_states[smc->mib.fddiSMTCF_State],0) ;
+   DB_CFMN(1, "CFM : %s", cfm_states[smc->mib.fddiSMTCF_State]);
break ;
case SC4_THRU_A :
/*SC41*/
@@ -481,7 +479,7 @@ static void cfm_fsm(struct s_smc *smc, int cmd)
smc->r.rm_join = TRUE ;
queue_event(smc,EVENT_RMT,RM_JOIN) ;/* signal RMT */
ACTIONS_DONE() ;
-   DB_CFMN(1,"CFM : %s\n",cfm_states[smc->mib.fddiSMTCF_State],0) ;
+   DB_CFMN(1, "CFM : %s", cfm_states[smc->mib.fddiSMTCF_State]);
break ;
case SC5_THRU_B :
/*SC51*/
@@ -519,7 +517,7 @@ static void cfm_fsm(struct s_smc *smc, int cmd)
queue_event(smc,EVENT_RMT,RM_JOIN) ;/* signal RMT */
}
ACTIONS_DONE() ;
-   DB_CFMN(1,"CFM : %s\n",cfm_states[smc->mib

[ata] bea5b158ff WARNING: CPU: 0 PID: 1 at drivers/ata/libata-core.c:6482 ata_port_detach

2016-12-21 Thread Fengguang Wu
Greetings,

Here is a libata WARNING triggered by Rob's test patch.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

commit bea5b158ff0da9c7246ff391f754f5f38e34577a
Author: Rob Herring 
AuthorDate: Thu Aug 11 10:20:58 2016 -0500
Commit: Greg Kroah-Hartman 
CommitDate: Wed Aug 31 15:13:55 2016 +0200

 driver core: add test of driver remove calls during probe
 
 In recent discussions on ksummit-discuss[1], it was suggested to do a
 sequence of probe, remove, probe for testing driver remove paths. This
 adds a kconfig option for said test.
 
 [1] 
https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2016-August/003459.html
 
 Suggested-by: Arnd Bergmann 
 Cc: Greg Kroah-Hartman 
 Signed-off-by: Rob Herring 
 Signed-off-by: Greg Kroah-Hartman 

+-------------------------------------------------------+------------+------------+------------+
|                                                       | cebf8fd169 | bea5b158ff | e575a3b48c |
+-------------------------------------------------------+------------+------------+------------+
| boot_successes                                        | 63         | 0          | 0          |
| boot_failures                                         | 0          | 22         | 13         |
| WARNING:at_drivers/ata/libata-core.c:#ata_port_detach | 0          | 21         | 13         |
| calltrace:piix_init                                   | 0          | 21         |            |
| calltrace:async_run_entry_fn                          | 0          | 22         |            |
| WARNING:at_include/linux/kref.h:#kobject_get          | 0          | 22         | 13         |
| WARNING:at_fs/sysfs/group.c:#sysfs_remove_group       | 0          | 16         | 9          |
| general_protection_fault:#[##]DEBUG_PAGEALLOC         | 0          | 21         | 13         |
| RIP:kstrdup                                           | 0          | 17         |            |
| Kernel_panic-not_syncing:Fatal_exception              | 0          | 21         | 13         |
| RIP:vsnprintf                                         | 0          | 4          |            |
| calltrace:i8042_init                                  | 0          | 1          |            |
| calltrace:serio_handle_event                          | 0          | 1          |            |
| BUG:unable_to_handle_kernel                           | 0          | 0          | 2          |
| Oops                                                  | 0          | 0          | 2          |
+-------------------------------------------------------+------------+------------+------------+

[   10.651919] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc208 irq 15
[   10.816182] ata2.01: NODEV after polling detection
[   10.817008] [ cut here ]
[   10.818022] WARNING: CPU: 0 PID: 1 at drivers/ata/libata-core.c:6482 
ata_port_detach+0x87/0x127
[   10.819174] Modules linked in:
[   10.819531] CPU: 0 PID: 1 Comm: swapper Not tainted 4.8.0-rc4-3-gbea5b15 
#1
[   10.820322] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.9.3-20161025_171302-gandalf 04/01/2014
[   10.821417]  820ccb54 88001ccdfb78 813e8456 
88001ccdfbc8
[   10.822264]  810a2873  00091e3ea5c0 
810cf5c4
[   10.823128]  880015c2 0001 88001cd11000 
88001cd113a0
[   10.823977] Call Trace:
[   10.824251]  [] dump_stack+0x19/0x1b
[   10.824811]  [] __warn+0xd5/0xf3
[   10.825330]  [] ? woken_wake_function+0x13/0x13
[   10.825992]  [] warn_slowpath_null+0x1d/0x1f
[   10.826628]  [] ata_port_detach+0x87/0x127
[   10.827240]  [] ata_host_detach+0x25/0x31
[   10.827848]  [] ata_pci_remove_one+0x15/0x17
[   10.828476]  [] piix_remove_one+0x38/0x3c
[   10.829075]  [] pci_device_remove+0x4e/0xf7
[   10.829700]  [] really_probe+0x170/0x2fa
[   10.830294]  [] ? trace_hardirqs_on+0xd/0xf
[   10.830920]  [] driver_probe_device+0x49/0x77
[   10.831561]  [] __driver_attach+0x76/0x9c
[   10.832161]  [] ? driver_probe_device+0x77/0x77
[   10.832819]  [] ? driver_probe_device+0x77/0x77
[   10.833477]  [] bus_for_each_dev+0x5b/0x99
[   10.834085]  [] driver_attach+0x1e/0x20
[   10.834672]  [] bus_add_driver+0xf0/0x1e8
[   10.835275]  [] driver_register+0xad/0xe5
[   10.835894]  [] __pci_register_driver+0x68/0x6f
[   10.836557]  [] ? nvme_core_init+0x88/0x88
[   10.837167]  [] ? inic_pci_driver_init+0x1b/0x1b
[   10.837833]  [] piix_init+0x19/0x29
[   10.838380]  [] do_one_initcall+0x8b/0x150
[   10.838994]  [] do_basic_setup+0xa2/0xc5
[   10.839587]  [] ? kernel_init_freeable+0xec/0xec
[   10.840253]  [] kernel_init_freeable+0x77/0xec
[   10.840906]  [] kernel_init+0xe/0xf3
[   10.841465]  [] ret_from_fork+0x1f/0x40
[   10.842051]  [] ? rest_init+0x13a/0x13a
[   10.842641] ---[ end trace b67e72a9ee09950c ]---
[   10.843181] ata2.00: ATAPI: QEMU DVD-ROM, 2.5+, max UDMA/100

gi

[drm] bea5b158ff WARNING: CPU: 1 PID: 1 at drivers/gpu/drm/drm_crtc.c:5776 drm_mode_config_cleanup

2016-12-21 Thread Fengguang Wu
Greetings,

Here is another DRM WARNING triggered by Rob's test patch.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

commit bea5b158ff0da9c7246ff391f754f5f38e34577a
Author: Rob Herring 
AuthorDate: Thu Aug 11 10:20:58 2016 -0500
Commit: Greg Kroah-Hartman 
CommitDate: Wed Aug 31 15:13:55 2016 +0200

 driver core: add test of driver remove calls during probe
 
 In recent discussions on ksummit-discuss[1], it was suggested to do a
 sequence of probe, remove, probe for testing driver remove paths. This
 adds a kconfig option for said test.
 
 [1] 
https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2016-August/003459.html
 
 Suggested-by: Arnd Bergmann 
 Cc: Greg Kroah-Hartman 
 Signed-off-by: Rob Herring 
 Signed-off-by: Greg Kroah-Hartman 

+-----------------------------------------------------------------------+------------+------------+------------+
|                                                                       | cebf8fd169 | bea5b158ff | 62f1cfcae8 |
+-----------------------------------------------------------------------+------------+------------+------------+
| boot_successes                                                        | 63         | 0          | 0          |
| boot_failures                                                         | 0          | 22         | 13         |
| WARNING:at_drivers/gpu/drm/drm_crtc.c:#drm_mode_config_cleanup        | 0          | 22         |            |
| calltrace:bochs_init                                                  | 0          | 22         | 13         |
| calltrace:init                                                        | 0          | 22         |            |
| BUG:unable_to_handle_kernel                                           | 0          | 22         | 13         |
| Oops                                                                  | 0          | 22         | 13         |
| EIP_is_at_i2c_do_del_adapter                                          | 0          | 17         |            |
| calltrace:of_unittest                                                 | 0          | 17         |            |
| Kernel_panic-not_syncing:Fatal_exception                              | 0          | 22         | 13         |
| EIP_is_at_kobject_get                                                 | 0          | 3          |            |
| WARNING:at_drivers/usb/core/urb.c:#usb_submit_urb                     | 0          | 3          |            |
| calltrace:hub_init_func3                                              | 0          | 1          |            |
| EIP_is_at_kernfs_link_sibling                                         | 0          | 2          |            |
| calltrace:pm_runtime_work                                             | 0          | 1          |            |
| calltrace:hub_init_func2                                              | 0          | 1          |            |
| WARNING:at_drivers/gpu/drm/drm_mode_config.c:#drm_mode_config_cleanup | 0          | 0          | 13         |
| EIP_is_at_bochs_kms_fini                                              | 0          | 0          | 13         |
+-----------------------------------------------------------------------+------------+------------+------------+

[9.942154] bochsdrmfb: enable CONFIG_FB_LITTLE_ENDIAN to support this 
framebuffer
[9.942868] [drm] Initialized bochs-drm 1.0.0 20130925 for :00:02.0 on 
minor 1
[9.943620] [ cut here ]
[9.944051] WARNING: CPU: 1 PID: 1 at drivers/gpu/drm/drm_crtc.c:5776 
drm_mode_config_cleanup+0x1a8/0x1c0
[9.944982] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 
4.8.0-rc4-3-gbea5b15 #1
[9.945612]  00200286 00200286 92e5fde8 9382d4e4  946e08d4 92e5fe00 
93642829
[9.946316]  1690 8d4e861c 8d4e8008 8d4e84dc 92e5fe14 936428b1 0009 

[9.947024]   92e5fe28 939cada8 8d4f0008 8d4e8008 94803fc0 92e5fe34 
93c703ce
[9.947711] Call Trace:
[9.947911]  [<9382d4e4>] dump_stack+0x58/0x74
[9.948287]  [<93642829>] __warn+0xb9/0xd0
[9.948634]  [<936428b1>] warn_slowpath_null+0x11/0x20
[9.949042]  [<939cada8>] drm_mode_config_cleanup+0x1a8/0x1c0
[9.949603]  [<93c703ce>] bochs_kms_fini+0x1e/0x30
[9.949983]  [<93c6f338>] bochs_unload+0x18/0x40
[9.950370]  [<939c0964>] drm_dev_unregister+0x24/0xa0
[9.950809]  [<939c09fa>] drm_put_dev+0x1a/0x60
[9.951166]  [<93c6f31e>] bochs_pci_remove+0xe/0x10
[9.951568]  [<93880088>] pci_device_remove+0x28/0xb0
[9.952006]  [<93c8355b>] driver_probe_device+0x9b/0x2e0
[9.952470]  [<93c83831>] __driver_attach+0x91/0xa0
[9.952922]  [<93c837a0>] ? driver_probe_device+0x2e0/0x2e0
[9.953400]  [<93c81bdf>] bus_for_each_dev+0x4f/0x80
[9.953879]  [<93c83924>] driver_attach+0x14/0x20
[9.954287]  [<93c837a0>] ? driver_probe_device+0x2e0/0x2e0
[9.954747]  [<93c82482>] bus_add_

[PATCH v3] mmc: core: Export device lifetime information through sysfs

2016-12-21 Thread Jungseung Lee
In the eMMC 5.0 version of the spec, several EXT_CSD fields about
device lifetime are added.

 - Two types of estimated indications reflected by averaged wear out of memory
 - An indication reflected by average reserved blocks

Export the information through sysfs.
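
For anyone wanting to consume the new attribute from user space, a small
parsing sketch follows. It assumes the "0x%02x 0x%02x\n" format proposed for
life_time in this patch; the struct and function names are made up for
illustration. The percentage mapping reflects my reading of the eMMC 5.0
encoding (0x01 meaning 0-10% of life used, up to 0x0a meaning 90-100%) and
should be checked against the JEDEC spec before relying on it.

```c
#include <stdio.h>

/* Parsed form of the proposed "life_time" attribute. */
struct mmc_life_time {
	unsigned int typ_a;
	unsigned int typ_b;
};

/*
 * Parse a line in the "0x%02x 0x%02x\n" format used by the patch.
 * Returns 0 on success and -1 if the text does not match.
 */
static int parse_life_time(const char *text, struct mmc_life_time *out)
{
	if (sscanf(text, "0x%x 0x%x", &out->typ_a, &out->typ_b) != 2)
		return -1;
	return 0;
}

/*
 * Upper bound of life used in percent, assuming each step from 0x01 to
 * 0x0a covers 10%.  Returns -1 for values outside that range.
 */
static int life_used_upper_percent(unsigned int est)
{
	if (est < 0x01 || est > 0x0a)
		return -1;
	return (int)est * 10;
}
```

The caller would read the attribute file into a buffer first; the sysfs path
is left out here because it depends on the host and card naming.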

Signed-off-by: Jungseung Lee 
Reviewed-by: Jaehoon Chung 
Reviewed-by: Shawn Lin 
---
 drivers/mmc/core/mmc.c   | 12 
 include/linux/mmc/card.h |  3 +++
 include/linux/mmc/mmc.h  |  3 +++
 3 files changed, 18 insertions(+)

diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
index b61b52f9..c0e2507 100644
--- a/drivers/mmc/core/mmc.c
+++ b/drivers/mmc/core/mmc.c
@@ -617,6 +617,12 @@ static int mmc_decode_ext_csd(struct mmc_card *card, u8 
*ext_csd)
card->ext_csd.ffu_capable =
(ext_csd[EXT_CSD_SUPPORTED_MODE] & 0x1) &&
!(ext_csd[EXT_CSD_FW_CONFIG] & 0x1);
+
+   card->ext_csd.pre_eol_info = ext_csd[EXT_CSD_PRE_EOL_INFO];
+   card->ext_csd.device_life_time_est_typ_a =
+   ext_csd[EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A];
+   card->ext_csd.device_life_time_est_typ_b =
+   ext_csd[EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B];
}
 
/* eMMC v5.1 or later */
@@ -764,6 +770,10 @@ MMC_DEV_ATTR(manfid, "0x%06x\n", card->cid.manfid);
 MMC_DEV_ATTR(name, "%s\n", card->cid.prod_name);
 MMC_DEV_ATTR(oemid, "0x%04x\n", card->cid.oemid);
 MMC_DEV_ATTR(prv, "0x%x\n", card->cid.prv);
+MMC_DEV_ATTR(pre_eol_info, "%02x\n", card->ext_csd.pre_eol_info);
+MMC_DEV_ATTR(life_time, "0x%02x 0x%02x\n",
+   card->ext_csd.device_life_time_est_typ_a,
+   card->ext_csd.device_life_time_est_typ_b);
 MMC_DEV_ATTR(serial, "0x%08x\n", card->cid.serial);
 MMC_DEV_ATTR(enhanced_area_offset, "%llu\n",
card->ext_csd.enhanced_area_offset);
@@ -817,6 +827,8 @@ static struct attribute *mmc_std_attrs[] = {
&dev_attr_name.attr,
&dev_attr_oemid.attr,
&dev_attr_prv.attr,
+   &dev_attr_pre_eol_info.attr,
+   &dev_attr_life_time.attr,
&dev_attr_serial.attr,
&dev_attr_enhanced_area_offset.attr,
&dev_attr_enhanced_area_size.attr,
diff --git a/include/linux/mmc/card.h b/include/linux/mmc/card.h
index 95d69d4..00449e5 100644
--- a/include/linux/mmc/card.h
+++ b/include/linux/mmc/card.h
@@ -121,6 +121,9 @@ struct mmc_ext_csd {
u8  raw_pwr_cl_ddr_200_360; /* 253 */
u8  raw_bkops_status;   /* 246 */
u8  raw_sectors[4]; /* 212 - 4 bytes */
+   u8  pre_eol_info;   /* 267 */
+   u8  device_life_time_est_typ_a; /* 268 */
+   u8  device_life_time_est_typ_b; /* 269 */
 
unsigned intfeature_support;
 #define MMC_DISCARD_FEATUREBIT(0)  /* CMD38 feature */
diff --git a/include/linux/mmc/mmc.h b/include/linux/mmc/mmc.h
index 672730a..a074082 100644
--- a/include/linux/mmc/mmc.h
+++ b/include/linux/mmc/mmc.h
@@ -339,6 +339,9 @@ struct _mmc_csd {
 #define EXT_CSD_CACHE_SIZE 249 /* RO, 4 bytes */
 #define EXT_CSD_PWR_CL_DDR_200_360 253 /* RO */
 #define EXT_CSD_FIRMWARE_VERSION   254 /* RO, 8 bytes */
+#define EXT_CSD_PRE_EOL_INFO   267 /* RO */
+#define EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A 268 /* RO */
+#define EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B 269 /* RO */
 #define EXT_CSD_CMDQ_DEPTH 307 /* RO */
 #define EXT_CSD_CMDQ_SUPPORT   308 /* RO */
 #define EXT_CSD_SUPPORTED_MODE 493 /* RO */
-- 
2.10.1



Re: [PATCH] RDS: use rb_entry()

2016-12-21 Thread Doug Ledford
On 12/20/2016 9:02 AM, Geliang Tang wrote:
> To make the code clearer, use rb_entry() instead of container_of() to
> deal with rbtree.
> 
> Signed-off-by: Geliang Tang 
> ---
>  net/rds/rdma.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/rds/rdma.c b/net/rds/rdma.c
> index 4c93bad..ea96114 100644
> --- a/net/rds/rdma.c
> +++ b/net/rds/rdma.c
> @@ -135,7 +135,7 @@ void rds_rdma_drop_keys(struct rds_sock *rs)
>   /* Release any MRs associated with this socket */
>   spin_lock_irqsave(&rs->rs_rdma_lock, flags);
>   while ((node = rb_first(&rs->rs_rdma_keys))) {
> - mr = container_of(node, struct rds_mr, r_rb_node);
> + mr = rb_entry(node, struct rds_mr, r_rb_node);
>   if (mr->r_trans == rs->rs_transport)
>   mr->r_invalidate = 0;
>   rb_erase(&mr->r_rb_node, &rs->rs_rdma_keys);
> 

Dave, I know you already took this, but am I the only one that thinks
these patches are a step backwards?  They claim to promote readability,
but I disagree that they actually do so.  The original code used the
container_of() API with three specific arguments that made sense in the
context of a function named container_of().  The new API uses the exact
same three arguments, but they no longer make the same sense just
comparing the arguments to the function name.  The relationship has been
lost.  And on top of that, if you do this for all of the standard things
in the kernel (rb_entry, list_item, etc.), then you've created a myriad
of APIs that all duplicate one functional API that made sense.  Is it
really an improvement to go from one generic function that makes sense
and works everywhere to multiple implementations of basically just name
wrappers that mean you now need to know many aliases for the same
function?  How do we justify API bloat like this as better or easier to
read when it requires useless API memorization?
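
For readers following the argument, here is a minimal user-space sketch of the relationship being discussed. In the kernel's rbtree header, rb_entry() is a plain alias that forwards its three arguments to container_of(). The macros below are re-created for illustration only, and `demo_node` and `demo_mr` are hypothetical stand-ins, not the real rbtree or RDS types.

```c
#include <assert.h>
#include <stddef.h>

/* User-space re-creation of the kernel macro, for illustration only. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* rb_entry() takes the same three arguments and simply forwards them. */
#define rb_entry(ptr, type, member) container_of(ptr, type, member)

/* Hypothetical stand-ins for struct rb_node and struct rds_mr. */
struct demo_node {
	struct demo_node *left, *right;
};

struct demo_mr {
	int r_key;
	struct demo_node r_rb_node;
};

/* Recover the enclosing structure both ways; returns 1 if they agree. */
static int same_result(struct demo_mr *mr)
{
	struct demo_node *node = &mr->r_rb_node;
	struct demo_mr *via_container =
		container_of(node, struct demo_mr, r_rb_node);
	struct demo_mr *via_entry = rb_entry(node, struct demo_mr, r_rb_node);

	assert(via_container == mr);
	return via_container == via_entry;
}
```

Both spellings expand to the same pointer arithmetic, so the choice between them is purely about naming.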

-- 
Doug Ledford 
GPG Key ID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD





Re: [PATCH v3 03/12] locking/ww_mutex: Extract stamp comparison to __ww_mutex_stamp_after

2016-12-21 Thread zhoucm1



On 2016年12月22日 02:46, Nicolai Hähnle wrote:

+static inline bool __sched
+__ww_ctx_stamp_after(struct ww_acquire_ctx *a, struct ww_acquire_ctx *b)
+{
+   return a->stamp - b->stamp <= LONG_MAX &&
+  (a->stamp != b->stamp || a > b);
I want to ask a stupid question: why can a be compared with b? They are
pointers to structures. Isn't the stamp enough for the comparison?
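
For context, a minimal user-space sketch of the quoted expression, assuming the stamp is an unsigned long counter. Going only by the expression itself, the pointer comparison matters only when the two stamps are equal, where it acts as a tie-break. `demo_ctx` and `demo_stamp_after` are hypothetical names, and the pointers are cast to uintptr_t because this sketch runs outside the kernel.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ww_acquire_ctx; only the stamp matters. */
struct demo_ctx {
	unsigned long stamp;
};

/*
 * True if a was stamped after b.  The unsigned subtraction keeps the
 * ordering correct across counter wraparound; the address comparison
 * only breaks ties between two contexts that carry the same stamp.
 */
static bool demo_stamp_after(const struct demo_ctx *a, const struct demo_ctx *b)
{
	assert(a && b);
	return a->stamp - b->stamp <= (unsigned long)LONG_MAX &&
	       (a->stamp != b->stamp || (uintptr_t)a > (uintptr_t)b);
}
```

With equal stamps, exactly one of the two orderings of a pair returns true, so two distinct contexts never both count as "after" each other.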


Thanks,
David Zhou


Re: [PATCH] platform: Print the resource range if device failed to claim

2016-12-21 Thread Chen Yu
Hi,
On Thu, Dec 22, 2016 at 02:19:22AM +0100, Rafael J. Wysocki wrote:
> [CC Mika and linux-acpi]
> 
> On Wednesday, December 21, 2016 05:24:55 PM Chen Yu wrote:
> > Sometimes we have the following error message:
> >  platform MSFT0101:00: failed to claim resource 1
> >  acpi MSFT0101:00: platform device creation failed: -16
> > But there is not enough information to figure out which resource range
> > failed to claim.
> > 
> > Thus print the resource range in the first place, so that /proc/iomem
> > or ioports can tell us who already claimed this resource, and
> > the driver bug or incorrect resource assignment that is running
> > into this conflict can be diagnosed:
> >  platform MSFT0101:00: failed to claim resource 1: [mem 0xfed4-0xfed40fff]
> >  acpi MSFT0101:00: platform device creation failed: -16
> > 
> > Suggested-by: Len Brown 
> > Reported-by: Wendy Wang 
> > Signed-off-by: Chen Yu 
> > ---
> >  drivers/base/platform.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> > index c4af003..22a6430 100644
> > --- a/drivers/base/platform.c
> > +++ b/drivers/base/platform.c
> > @@ -396,7 +396,7 @@ int platform_device_add(struct platform_device *pdev)
> > }
> >  
> > if (p && insert_resource(p, r)) {
> > -   dev_err(&pdev->dev, "failed to claim resource %d\n", i);
> > +   dev_err(&pdev->dev, "failed to claim resource %d: %pR\n", i, r);
> 
> Do we still need the resource number?
> 
It seems we don't need the resource number anymore.
(platform.c was written before 2005, and support for %pR was only
introduced later, in 2008.)
> > ret = -EBUSY;
> > goto failed;
> > }
> > 
> 
> Thanks,
> Rafael
> 

Thanks,
Yu


Re: [PATCH] x86/crash: Update the stale comment in reserve_crashkernel()

2016-12-21 Thread Baoquan He
On 12/15/16 at 11:30am, Xunlei Pang wrote:
> CRASH_KERNEL_ADDR_MAX was missing for a long time, update it
> with more detailed explanation.
> 
> Cc: Robert LeBlanc 
> Cc: Baoquan He 
> Signed-off-by: Xunlei Pang 
> ---
>  arch/x86/kernel/setup.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 9c337b0..79ee507 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -575,7 +575,10 @@ static void __init reserve_crashkernel(void)
>   /* 0 means: find the address automatically */
>   if (crash_base <= 0) {
>   /*
> -  *  kexec want bzImage is below CRASH_KERNEL_ADDR_MAX
> +  * Set CRASH_ADDR_LOW_MAX upper bound for crash range
> +  * as old kexec-tools loads bzImage below that, unless
> +  * "size,high" or "size@offset"(nonzero offset, see the
> +  * else leg below) is specified.

Yes, this is a good catch. It might be better for the comment to cover
only this if branch. If you want to say more about the upper bounds, it
would be better to discuss with Robert LeBlanc whether that can be
detailed in kdump.txt.

Also, please CC the x86 maintainers or akpm. They can help merge this.

Thanks
Baoquan


Re: [PATCH 1/2] soc: ti: Use remoteproc auto_boot feature

2016-12-21 Thread Suman Anna
Hi Sarang,

On 12/15/2016 06:03 PM, Sarangdhar Joshi wrote:
> The function wkup_m3_rproc_boot_thread waits for asynchronous
> firmware loading to complete successfully before calling
> rproc_boot(). The same can be achieved by just setting
> rproc->auto_boot flag. Change this. As a result this change
> removes wkup_m3_rproc_boot_thread and moves m3_ipc->sync_complete
> initialization to the wkup_m3_ipc_probe().
> 
> Other than the current usage, the firmware_loading_complete is
> only used in rproc_del() where it's no longer needed.  This
> change is in preparation for removing firmware_loading_complete
> completely.

Based on the comments so far, I am assuming that you are dropping this
series.

In any case, this series did break our PM stack. We definitely don't
want to auto-boot the wkup_m3_rproc device, that responsibility will
need to stay with the wkup_m3_ipc driver.

regards
Suman

> 
> CC: Dave Gerlach 
> CC: Suman Anna 
> CC: Bjorn Andersson 
> Signed-off-by: Sarangdhar Joshi 
> ---
> 
> Hi Suman,
> 
> Unfortunately, I don't have a TI device and couldn't test this
> change. Is it possible for you to test this change on TI device?
> 
> Thanks in advance.
> 
> Regards,
> Sarang
> 
>  drivers/remoteproc/wkup_m3_rproc.c |  2 +-
>  drivers/soc/ti/wkup_m3_ipc.c   | 35 ++-
>  2 files changed, 3 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/remoteproc/wkup_m3_rproc.c b/drivers/remoteproc/wkup_m3_rproc.c
> index 18175d0..79ea022 100644
> --- a/drivers/remoteproc/wkup_m3_rproc.c
> +++ b/drivers/remoteproc/wkup_m3_rproc.c
> @@ -167,7 +167,7 @@ static int wkup_m3_rproc_probe(struct platform_device *pdev)
>   goto err;
>   }
>  
> - rproc->auto_boot = false;
> + rproc->auto_boot = true;
>  
>   wkupm3 = rproc->priv;
>   wkupm3->rproc = rproc;
> diff --git a/drivers/soc/ti/wkup_m3_ipc.c b/drivers/soc/ti/wkup_m3_ipc.c
> index 8823cc8..31090d70 100644
> --- a/drivers/soc/ti/wkup_m3_ipc.c
> +++ b/drivers/soc/ti/wkup_m3_ipc.c
> @@ -17,7 +17,6 @@
>  
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -365,22 +364,6 @@ void wkup_m3_ipc_put(struct wkup_m3_ipc *m3_ipc)
>  }
>  EXPORT_SYMBOL_GPL(wkup_m3_ipc_put);
>  
> -static void wkup_m3_rproc_boot_thread(struct wkup_m3_ipc *m3_ipc)
> -{
> - struct device *dev = m3_ipc->dev;
> - int ret;
> -
> - wait_for_completion(&m3_ipc->rproc->firmware_loading_complete);
> -
> - init_completion(&m3_ipc->sync_complete);
> -
> - ret = rproc_boot(m3_ipc->rproc);
> - if (ret)
> - dev_err(dev, "rproc_boot failed\n");
> -
> - do_exit(0);
> -}
> -
>  static int wkup_m3_ipc_probe(struct platform_device *pdev)
>  {
>   struct device *dev = &pdev->dev;
> @@ -388,7 +371,6 @@ static int wkup_m3_ipc_probe(struct platform_device *pdev)
>   phandle rproc_phandle;
>   struct rproc *m3_rproc;
>   struct resource *res;
> - struct task_struct *task;
>   struct wkup_m3_ipc *m3_ipc;
>  
>   m3_ipc = devm_kzalloc(dev, sizeof(*m3_ipc), GFP_KERNEL);
> @@ -402,6 +384,8 @@ static int wkup_m3_ipc_probe(struct platform_device *pdev)
>   return PTR_ERR(m3_ipc->ipc_mem_base);
>   }
>  
> + init_completion(&m3_ipc->sync_complete);
> +
>   irq = platform_get_irq(pdev, 0);
>   if (!irq) {
>   dev_err(&pdev->dev, "no irq resource\n");
> @@ -449,25 +433,10 @@ static int wkup_m3_ipc_probe(struct platform_device *pdev)
>  
>   m3_ipc->ops = &ipc_ops;
>  
> - /*
> -  * Wait for firmware loading completion in a thread so we
> -  * can boot the wkup_m3 as soon as it's ready without holding
> -  * up kernel boot
> -  */
> - task = kthread_run((void *)wkup_m3_rproc_boot_thread, m3_ipc,
> -"wkup_m3_rproc_loader");
> -
> - if (IS_ERR(task)) {
> - dev_err(dev, "can't create rproc_boot thread\n");
> - goto err_put_rproc;
> - }
> -
>   m3_ipc_state = m3_ipc;
>  
>   return 0;
>  
> -err_put_rproc:
> - rproc_put(m3_rproc);
>  err_free_mbox:
>   mbox_free_channel(m3_ipc->mbox);
>   return ret;
> 



Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-21 Thread Jason A. Donenfeld
On Thu, Dec 22, 2016 at 3:49 AM, Jason A. Donenfeld  wrote:
> I did have two objections to this. The first was that my SipHash
> construction is faster. But in any case, they're both faster than the
> current MD5, so it's just extra rice. The second, and the more
> important one, was that batching entropy up like this means that 32
> calls will be really fast, and then the 33rd will be slow, since it
> has to do a whole ChaCha round, because get_random_bytes must be
> called to refill the batch. Since get_random_long is called for every
> process startup, I didn't really like there being inconsistent
> performance on process startup. And I'm pretty sure that one ChaCha
> whole block is slower than computing MD5, even though it lasts 32
> times as long, though I need to measure this. But maybe that's dumb in
> the end? Are these concerns that should point us toward the
> determinism (and speed) of SipHash? Are these concerns that don't
> matter and so we should roll with the simplicity of reusing ChaCha?

I ran some measurements in order to quantify what I'm talking about.
Repeatedly running md5_transform is about 2.3 times faster than
repeatedly running extract_crng. What does this mean?

One call to extract_crng gives us 32 times as many longs as one call
to md5_transform. This means that spread over 32 process creations,
chacha will be 13.9 times faster. However, every 32nd process will
take 2.3 times as long to generate its ASLR value as it would with the
old md5_transform code.
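
The arithmetic behind the 13.9 figure can be sketched as follows. This only restates the two measurements quoted above (one refill costs 2.3 md5_transform calls and yields 32 longs); the function name `amortized_speedup` is made up for the illustration.

```c
#include <assert.h>

/*
 * Amortized speedup of the batched generator over md5_transform.
 * slowdown: how many times slower one refill is than one md5_transform
 *           (2.3 in the measurement above).
 * batch:    how many longs one refill yields (32 above).
 */
static double amortized_speedup(double slowdown, unsigned int batch)
{
	assert(slowdown > 0.0);
	return (double)batch / slowdown;
}
```

Calling amortized_speedup(2.3, 32) gives 32 / 2.3, which is roughly 13.9. The worst single call, however, still pays the full 2.3x cost.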

Personally, I don't think that 2.3 is a big deal. And I really like
how much this simplifies the analysis.
But if it's a big deal to you, then we can continue to discuss my
SipHash construction, which gives faster and more consistent
performance, at the cost of a more complicated and probably less
impressive security analysis.

Jason


[PATCH] fixed spelling error in TODO file for dgnc driver

2016-12-21 Thread Scott Matheina
fixed a missing letter in the TODO file 'unneeded'

Signed-off-by: Scott Matheina 

---
 drivers/staging/dgnc/TODO | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/dgnc/TODO b/drivers/staging/dgnc/TODO
index 0e0825b..6c91bbd 100644
--- a/drivers/staging/dgnc/TODO
+++ b/drivers/staging/dgnc/TODO
@@ -4,7 +4,7 @@
   own error message. Adding an extra one is useless.
 * use goto statements for error handling when appropriate
 * there is a lot of unnecessary code in the driver. It was
-  originally a standalone driver. Remove uneeded code.
+  originally a standalone driver. Remove unneeded code.

 Please send patches to Greg Kroah-Hartman  and
 Cc: Lidza Louina 
--
2.7.4



Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-21 Thread Jason A. Donenfeld
Hi Andy & Hannes,

On Thu, Dec 22, 2016 at 3:07 AM, Hannes Frederic Sowa
 wrote:
> I wonder if Ted's proposal was analyzed further in terms of performance
> if get_random_int should provide cprng alike properties?
>
> For reference: https://lkml.org/lkml/2016/12/14/351
>
> The proposal made sense to me and would completely solve the above
> mentioned problem on the cost of repeatedly reseeding from the crng.

On Thu, Dec 22, 2016 at 3:09 AM, Andy Lutomirski  wrote:
> Unless I've misunderstood it, Ted's proposal causes get_random_int()
> to return bytes straight from urandom (effectively), which should make
> it very strong.  And if urandom is competitively fast now, I don't see
> the problem.  ChaCha20 is designed for speed, after all.

Funny -- while you guys were sending this back & forth, I was writing
my reply to Andy which essentially arrives at the same conclusion.
Given that we're all arriving at the same thing, and that Ted shot in
this direction long before we all did, I'm leaning toward abandoning
SipHash for the de-MD5-ification of get_random_int/long, and working
on polishing Ted's idea into something shiny for this patchset.

I did have two objections to this. The first was that my SipHash
construction is faster. But in any case, they're both faster than the
current MD5, so it's just extra rice. The second, and the more
important one, was that batching entropy up like this means that 32
calls will be really fast, and then the 33rd will be slow, since it
has to do a whole ChaCha round, because get_random_bytes must be
called to refill the batch. Since get_random_long is called for every
process startup, I didn't really like there being inconsistent
performance on process startup. And I'm pretty sure that one ChaCha
whole block is slower than computing MD5, even though it lasts 32
times as long, though I need to measure this. But maybe that's dumb in
the end? Are these concerns that should point us toward the
determinism (and speed) of SipHash? Are these concerns that don't
matter and so we should roll with the simplicity of reusing ChaCha?
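
To make the "fast, fast, ..., slow" pattern concrete, here is a small user-space sketch of a batched generator. It is not Ted's patch: the refill uses a non-cryptographic placeholder mixer where the real code would call into the ChaCha-based CRNG, and every `demo_` name is hypothetical. The batch size of 32 follows the figure used above.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_BATCH 32	/* values per refill, matching the figure above */

struct demo_batch {
	uint64_t vals[DEMO_BATCH];
	unsigned int pos;	/* next unused slot; DEMO_BATCH means empty */
	unsigned int refills;	/* how many times the slow path ran */
	uint64_t ctr;		/* state of the placeholder generator */
};

/* Placeholder for the expensive refill; NOT cryptographic. */
static void demo_refill(struct demo_batch *b)
{
	for (unsigned int i = 0; i < DEMO_BATCH; i++) {
		uint64_t z = (b->ctr += 0x9e3779b97f4a7c15ULL);

		z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
		z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
		b->vals[i] = z ^ (z >> 31);
	}
	b->pos = 0;
	b->refills++;
}

/* Start with an empty batch so the first call takes the slow path. */
static void demo_batch_init(struct demo_batch *b)
{
	b->pos = DEMO_BATCH;
	b->refills = 0;
	b->ctr = 0;
}

/* Fast path hands out a stored value; slow path refills first. */
static uint64_t demo_get_random_long(struct demo_batch *b)
{
	assert(b->pos <= DEMO_BATCH);
	if (b->pos == DEMO_BATCH)
		demo_refill(b);
	return b->vals[b->pos++];
}
```

In this sketch the first call and the 33rd call both hit demo_refill(), while calls 2 through 32 only read from the array. That is the inconsistency in per-call cost described above.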

Jason


Re: [PATCH v3 3/4] ARM: Add support for CONFIG_DEBUG_VIRTUAL

2016-12-21 Thread Laura Abbott
On 12/09/2016 03:36 PM, Florian Fainelli wrote:
> x86 has an option: CONFIG_DEBUG_VIRTUAL to do additional checks on
> virt_to_phys calls. The goal is to catch users who are calling
> virt_to_phys on non-linear addresses immediately. This includes caller
> using __virt_to_phys() on image addresses instead of __pa_symbol(). This
> is a generally useful debug feature to spot bad code (particularly in
> drivers).
> 
> Signed-off-by: Florian Fainelli 
> ---
>  arch/arm/Kconfig  |  1 +
>  arch/arm/include/asm/memory.h | 16 --
>  arch/arm/mm/Makefile  |  1 +
>  arch/arm/mm/physaddr.c| 51 +++
>  4 files changed, 67 insertions(+), 2 deletions(-)
>  create mode 100644 arch/arm/mm/physaddr.c
> 
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index b5d529fdffab..5e66173c5787 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -2,6 +2,7 @@ config ARM
>   bool
>   default y
>   select ARCH_CLOCKSOURCE_DATA
> + select ARCH_HAS_DEBUG_VIRTUAL
>   select ARCH_HAS_DEVMEM_IS_ALLOWED
>   select ARCH_HAS_ELF_RANDOMIZE
>   select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
> diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
> index bee7511c5098..d90300193adf 100644
> --- a/arch/arm/include/asm/memory.h
> +++ b/arch/arm/include/asm/memory.h
> @@ -213,7 +213,7 @@ extern const void *__pv_table_begin, *__pv_table_end;
>   : "r" (x), "I" (__PV_BITS_31_24)\
>   : "cc")
>  
> -static inline phys_addr_t __virt_to_phys(unsigned long x)
> +static inline phys_addr_t __virt_to_phys_nodebug(unsigned long x)
>  {
>   phys_addr_t t;
>  
> @@ -245,7 +245,7 @@ static inline unsigned long __phys_to_virt(phys_addr_t x)
>  #define PHYS_OFFSET  PLAT_PHYS_OFFSET
>  #define PHYS_PFN_OFFSET  ((unsigned long)(PHYS_OFFSET >> PAGE_SHIFT))
>  
> -static inline phys_addr_t __virt_to_phys(unsigned long x)
> +static inline phys_addr_t __virt_to_phys_nodebug(unsigned long x)
>  {
>   return (phys_addr_t)x - PAGE_OFFSET + PHYS_OFFSET;
>  }
> @@ -261,6 +261,16 @@ static inline unsigned long __phys_to_virt(phys_addr_t x)
>   unsigned long)(kaddr) - PAGE_OFFSET) >> PAGE_SHIFT) + \
>PHYS_PFN_OFFSET)
>  
> +#define __pa_symbol_nodebug(x)   __virt_to_phys_nodebug((x))
> +
> +#ifdef CONFIG_DEBUG_VIRTUAL
> +extern phys_addr_t __virt_to_phys(unsigned long x);
> +extern phys_addr_t __phys_addr_symbol(unsigned long x);
> +#else
> +#define __virt_to_phys(x) __virt_to_phys_nodebug(x)
> +#define __phys_addr_symbol(x) __pa_symbol_nodebug(x)
> +#endif
> +
>  /*
>   * These are *only* valid on the kernel direct mapped RAM memory.
>   * Note: Drivers should NOT use these.  They are the wrong
> @@ -283,9 +293,11 @@ static inline void *phys_to_virt(phys_addr_t x)
>   * Drivers should NOT use these either.
>   */
>  #define __pa(x)  __virt_to_phys((unsigned long)(x))
> +#define __pa_symbol(x)   __phys_addr_symbol(RELOC_HIDE((unsigned long)(x), 0))
>  #define __va(x)  ((void *)__phys_to_virt((phys_addr_t)(x)))
>  #define pfn_to_kaddr(pfn) __va((phys_addr_t)(pfn) << PAGE_SHIFT)
>  
> +
>  extern long long arch_phys_to_idmap_offset;
>  
>  /*
> diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
> index e8698241ece9..b3dea80715b4 100644
> --- a/arch/arm/mm/Makefile
> +++ b/arch/arm/mm/Makefile
> @@ -14,6 +14,7 @@ endif
>  
>  obj-$(CONFIG_ARM_PTDUMP) += dump.o
>  obj-$(CONFIG_MODULES) += proc-syms.o
> +obj-$(CONFIG_DEBUG_VIRTUAL)  += physaddr.o
>  
>  obj-$(CONFIG_ALIGNMENT_TRAP) += alignment.o
>  obj-$(CONFIG_HIGHMEM) += highmem.o
> diff --git a/arch/arm/mm/physaddr.c b/arch/arm/mm/physaddr.c
> new file mode 100644
> index ..0288760306ce
> --- /dev/null
> +++ b/arch/arm/mm/physaddr.c
> @@ -0,0 +1,51 @@
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "mm.h"
> +
> +static inline bool __virt_addr_valid(unsigned long x)
> +{
> + /* high_memory does not get immediately defined, and there
> +  * are early callers of __pa() against PAGE_OFFSET, just catch
> +  * these here, then do normal checks, with the exception of
> +  * MAX_DMA_ADDRESS.
> +  */
> + if ((x >= PAGE_OFFSET && !high_memory) ||
> +(x >= PAGE_OFFSET &&
> + high_memory && x < (unsigned long)high_memory) ||
> + x == MAX_DMA_ADDRESS)
> + return true;

This is difficult to read; it's easier to follow if it's split out:

	if (!high_memory && x >= PAGE_OFFSET)
		return true;

	if (high_memory && x >= PAGE_OFFSET && x < (unsigned long)high_memory)
		return true;

	if (x == MAX_DMA_ADDRESS)
		return true;
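
As a sanity check that the split-out form accepts the same inputs as the combined condition, here is a user-space sketch. The constants are made-up values standing in for PAGE_OFFSET and MAX_DMA_ADDRESS, high_memory is passed in as a plain unsigned long, and the trailing `return false` is an assumption about what follows the quoted hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Made-up values standing in for the real ARM constants. */
#define DEMO_PAGE_OFFSET     0xc0000000UL
#define DEMO_MAX_DMA_ADDRESS 0xffffffffUL

/* Combined condition, shaped like the one in the patch. */
static bool valid_combined(unsigned long x, unsigned long high_memory)
{
	return (x >= DEMO_PAGE_OFFSET && !high_memory) ||
	       (x >= DEMO_PAGE_OFFSET && high_memory && x < high_memory) ||
	       x == DEMO_MAX_DMA_ADDRESS;
}

/* Split-out form, one early return per case. */
static bool valid_split(unsigned long x, unsigned long high_memory)
{
	if (!high_memory && x >= DEMO_PAGE_OFFSET)
		return true;

	if (high_memory && x >= DEMO_PAGE_OFFSET && x < high_memory)
		return true;

	if (x == DEMO_MAX_DMA_ADDRESS)
		return true;

	return false;
}

/* Assert that both forms agree on a handful of boundary inputs. */
static void check_forms_agree(void)
{
	const unsigned long xs[] = { 0, DEMO_PAGE_OFFSET - 1, DEMO_PAGE_OFFSET,
				     0xd0000000UL, 0xefffffffUL, 0xf0000000UL,
				     DEMO_MAX_DMA_ADDRESS };
	const unsigned long hms[] = { 0, 0xf0000000UL };

	for (unsigned int h = 0; h < sizeof(hms) / sizeof(hms[0]); h++)
		for (unsigned int i = 0; i < sizeof(xs) / sizeof(xs[0]); i++)
			assert(valid_combined(xs[i], hms[h]) ==
			       valid_split(xs[i], hms[h]));
}
```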

I'm really not a fan of the check for MAX_DMA_ADDRESS. arm64 gets away
with this because it uses the asm-generic version of dma.h whi

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-21 Thread Jason A. Donenfeld
> On Wed, Dec 21, 2016 at 5:13 PM, George Spelvin
>> After some thinking, I still like the "state-preserving" construct
>> that's equivalent to the current MD5 code.  Yes, we could just do
>> siphash(current_cpu || per_cpu_counter, global_key), but it's nice to
>> preserve a bit more.
>>
>> It requires library support from the SipHash code to return the full
>> SipHash state, but I hope that's a fair thing to ask for.

This is not a good idea. If I understand correctly, the idea here is
to just keep around SipHash's internal state variables, and chain them
over to the next call, sort of like how md5_transform with the current
code works on the same scratch space. There has been no security
analysis in the literature on this use of the primitive, and I have no
confidence that this is a secure use of the function. Unless somebody
can point me toward a paper I missed or a comment from a real
cryptographer about the specifics of SipHash, I think I'm right to
admonish against this dangerous road.

Let's talk about constructions. And let's only decide on a
construction that we're actually equipped to analyze. Let's definitely
not talk about making our own primitives, or retrofitting a nice
primitive's internals into our own Frankenstein.

Alternatively, if I'm wrong, please send me an eprint/arXiv link to a
paper that discusses this use of SipHash.


Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-21 Thread Jason A. Donenfeld
Hi Andy,

On Thu, Dec 22, 2016 at 12:42 AM, Andy Lutomirski  wrote:
> So this is probably good enough, and making it better is hard.  Changing it to:
>
> u64 entropy = (u64)random_get_entropy() + current->pid;
> result = siphash(..., entropy, ...);
> secret->chaining += result + entropy;
>
> would reduce this problem by forcing an attacker to brute-force the
> entropy on each iteration, which is probably an improvement.

Ahh, so that's the reasoning behind a similar suggestion of yours in a
previous email. Makes sense to me. I'll include this in the next merge
if we don't come up with a different idea before then. Your reasoning
seems good for it.
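
The difference between the two chaining updates can be sketched in user space as below. This is only the shape of the construction: `demo_prf` is a non-cryptographic placeholder standing in for SipHash, and all `demo_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder keyed mixer standing in for SipHash; NOT cryptographic. */
static uint64_t demo_prf(uint64_t chaining, uint64_t entropy, uint64_t key)
{
	uint64_t z = chaining ^ (entropy * 0x9e3779b97f4a7c15ULL) ^ key;

	z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
	z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
	return z ^ (z >> 31);
}

struct demo_secret {
	uint64_t key;
	uint64_t chaining;
};

/* Patch as posted: the next chaining value depends only on the output. */
static uint64_t demo_next_plain(struct demo_secret *s, uint64_t entropy)
{
	uint64_t result;

	assert(s);
	result = demo_prf(s->chaining, entropy, s->key);
	s->chaining += result;
	return result;
}

/* Suggested change: the entropy is folded into the chaining value too. */
static uint64_t demo_next_folded(struct demo_secret *s, uint64_t entropy)
{
	uint64_t result;

	assert(s);
	result = demo_prf(s->chaining, entropy, s->key);
	s->chaining += result + entropy;
	return result;
}
```

In the first variant, someone who knows the old chaining value and observes the output can compute the new chaining value directly. In the second, they also need the entropy that went into that call.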

Part of what makes this process a bit goofy is that it's not
altogether clear what the design goals are. Right now we're going for
"not worse than before", which we've nearly achieved. How good of an
RNG do we want? I'm willing to examine and analyze the security and
performance of all constructions we can come up with. One thing I
don't want to do, however, is start tweaking the primitives themselves
in ways not endorsed by the designers. So, I believe that precludes
things like carrying over SipHash's internal state (like what was done
with MD5), because there hasn't been a formal security analysis of
this like there has with other uses of SipHash. I also don't want to
change any internals of how SipHash actually works. I mention that
because of some of the suggestions on other threads, which make me
rather uneasy.

So with that said, while writing this reply to you, I was
simultaneously reading some other crypto code and was reminded that
there's a variant of SipHash which outputs an additional 64 bits; it's
part of the siphash reference code, which they call the "128-bit
mode". It has the benefit that we can return 64 bits to the caller and
save 64 bits for the chaining key. That way there's no correlation
between the returned secret and the chaining key, which I think would
completely alleviate all of your concerns, and simplify the analysis a
bit.

Here's what it looks like:
https://git.zx2c4.com/linux-dev/commit/?h=siphash&id=46fbe5b408e66b2d16b4447860f8083480e1c08d

The downside is that it takes 4 extra Sip rounds. This puts the
performance still better than MD5, though, and likely still better
than the other batched entropy solution. We could optimize this, I
suppose, by giving it only two parameters -- chaining,
jiffies+entropy+pid -- instead of the current three -- chaining,
jiffies, entropy+pid -- which would then shave off 2 Sip rounds. But I
liked the idea of having a bit more spread in the entropy input field.

Anyway, with this in mind, we now have three possibilities:

1. result = siphash(chaining, entropy, key); chaining += result + entropy
2. result = siphash_extra_output(chaining, entropy, key, &chaining);
3. Ted's batched entropy idea using chacha20

The more I think about this, the more I suspect that we should just
use chacha20. It will still be faster than MD5. I don't like the
non-determinism of it (some processes will start slower than others,
if the batched entropy has run out and ASLR demands more), but I guess
I can live with that. But, most importantly, it greatly simplifies
both the security analysis and what we can promise to callers about
the function. Right now in the comment documentation, we're coy with
callers about the security of the RNG. If we moved to a known
construction like chacha20/get_random_bytes_batched, then we could
just be straight up with a promise that the numbers it returns are
high quality.

Thoughts on 2 and 3, and on 1 vs 2 vs 3?

Jason


Re: [PATCH 0/4] vfio-mdev: Clean namespace and better define ABI

2016-12-21 Thread Alex Williamson
On Thu, 22 Dec 2016 10:11:00 +0800
Jike Song  wrote:

> On 12/22/2016 07:27 AM, Alex Williamson wrote:
> > Cleanup the namespace a bit by prefixing structures with mdev_ and
> > also more concretely define the mdev interface.  Structs with comments
> > defining which fields are private vs public tempts poor behavior,
> > especially for an interface where we expect out of tree vendor drivers.  
> 
> Personally I like this series :)
> 
> Side notes: 1) There is also Documentation to be updated; 2) your mail
> address in Author field is @nuc.home?

Thank you on both points, clearly I wrote these on something other than
my usual system.  Thanks,

Alex

> > 
> > ---
> > 
> > Alex Williamson (4):
> >   vfio-mdev: Remove an unused structure element
> >   vfio-mdev: de-polute the namespace, rename parent_device & parent_ops
> >   vfio-mdev: Make mdev_parent private
> >   vfio-mdev: Make mdev_device private and abstract interfaces
> > 
> > 
> >  drivers/gpu/drm/i915/gvt/kvmgt.c |   22 +++--
> > >  drivers/vfio/mdev/mdev_core.c|   64 ++
> >  drivers/vfio/mdev/mdev_private.h |   28 +++--
> >  drivers/vfio/mdev/mdev_sysfs.c   |8 ++---
> >  drivers/vfio/mdev/vfio_mdev.c|   12 ---
> >  include/linux/mdev.h |   54 +++-
> >  samples/vfio-mdev/mtty.c |   28 +
> >  7 files changed, 123 insertions(+), 93 deletions(-)
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> >   



linux-next: Tree for Dec 22

2016-12-21 Thread Stephen Rothwell
Hi all,

Please do not add any material for v4.11 to your linux-next included
branches until after v4.10-rc1 has been released.

There will be no linux-next releases from me between Dec 24 and Jan 2
inclusive (unless I get really bored with my new toys :-)).

Changes since 20161221:

Non-merge commits (relative to Linus' tree): 685
 1103 files changed, 27589 insertions(+), 9147 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig (with
CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
(with KALLSYMS_EXTRA_PASS=1) and pseries_le_defconfig and i386, sparc
and sparc64 defconfig.

Below is a summary of the state of the merge.

I am currently merging 246 trees (counting Linus' and 35 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (52bce91165e5 splice: reinstate SIGPIPE/EPIPE handling)
Merging fixes/master (30066ce675d3 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6)
Merging kbuild-current/rc-fixes (152b695d7437 builddeb: fix cross-building to arm64 producing host-arch debs)
Merging arc-current/for-curr (08fe007968b2 ARC: mm: arc700: Don't assume 2 colours for aliasing VIPT dcache)
Merging arm-current/fixes (8478132a8784 Revert "arm: move exports to definitions")
Merging m68k-current/for-linus (7e251bb21ae0 m68k: Fix ndelay() macro)
Merging metag-fixes/fixes (35d04077ad96 metag: Only define atomic_dec_if_positive conditionally)
Merging powerpc-fixes/fixes (69973b830859 Linux 4.9)
Merging sparc/master (ba6d973f78eb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging net/master (551cde192343 net: fddi: skfp: use %p format specifier for addresses rather than %x)
Merging ipsec/master (bc3913a5378c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc)
Merging netfilter/master (053d20f57125 netfilter: nft_payload: mangle ckecksum if NFT_PAYLOAD_L4CSUM_PSEUDOHDR is set)
Merging ipvs/master (045169816b31 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6)
Merging wireless-drivers/master (22b68b93ae25 rtlwifi: Fix kernel oops introduced with commit e49656147359)
Merging mac80211/master (a17d93ff3a95 mac80211: fix legacy and invalid rx-rate report)
Merging sound-current/for-linus (995c6a7fd9b9 ALSA: hiface: Fix M2Tech hiFace driver sampling rate change)
Merging pci-current/for-linus (e42010d8207f PCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX))
Merging driver-core.current/driver-core-linus (cdb98c2698b4 Revert "nvme: add support for the Write Zeroes command")
Merging tty.current/tty-linus (cdb98c2698b4 Revert "nvme: add support for the Write Zeroes command")
Merging usb.current/usb-linus (cdb98c2698b4 Revert "nvme: add support for the Write Zeroes command")
Merging usb-gadget-fixes/fixes (05e78c6933d6 usb: gadget: f_fs: fix wrong parenthesis in ffs_func_req_match())
Merging usb-serial-fixes/usb-linus (46490c347df4 USB: serial: option: add dlink dwm-158)
Merging usb-chipidea-fixes/ci-for-usb-stable (c7fbb09b2ea1 usb: chipidea: move the lock initialization to core file)
Merging phy/fixes (4320f9d4c183 phy: sun4i: check PMU presence when poking unknown bit of pmu)
Merging staging.current/staging-linus (cdb98c2698b4 Revert "nvme: add support for the Write Zeroes command")
Merging char-misc.current/char-misc-linus (cdb98c2698b4 Revert "nvme: add support for the Write Zeroes command")
Merging input-current/for-linus (67626c9

Re: [PATCH 0/4] vfio-mdev: Clean namespace and better define ABI

2016-12-21 Thread Jike Song
On 12/22/2016 07:27 AM, Alex Williamson wrote:
> Cleanup the namespace a bit by prefixing structures with mdev_ and
> also more concretely define the mdev interface.  Structs with comments
> defining which fields are private vs public tempts poor behavior,
> especially for an interface where we expect out of tree vendor drivers.

Personally I like this series :)

Side notes: 1) There is also Documentation to be updated; 2) your mail
address in Author field is @nuc.home?

--
Thanks,
Jike

> 
> ---
> 
> Alex Williamson (4):
>   vfio-mdev: Remove an unused structure element
>   vfio-mdev: de-polute the namespace, rename parent_device & parent_ops
>   vfio-mdev: Make mdev_parent private
>   vfio-mdev: Make mdev_device private and abstract interfaces
> 
> 
>  drivers/gpu/drm/i915/gvt/kvmgt.c |   22 +++--
>  drivers/vfio/mdev/mdev_core.c|   64 ++
>  drivers/vfio/mdev/mdev_private.h |   28 +++--
>  drivers/vfio/mdev/mdev_sysfs.c   |8 ++---
>  drivers/vfio/mdev/vfio_mdev.c|   12 ---
>  include/linux/mdev.h |   54 +++-
>  samples/vfio-mdev/mtty.c |   28 +
>  7 files changed, 123 insertions(+), 93 deletions(-)
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 


Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-21 Thread Andy Lutomirski
On Wed, Dec 21, 2016 at 6:07 PM, Hannes Frederic Sowa
 wrote:
> On 22.12.2016 00:42, Andy Lutomirski wrote:
>> On Wed, Dec 21, 2016 at 3:02 PM, Jason A. Donenfeld  wrote:
>>>  unsigned int get_random_int(void)
>>>  {
>>> -   __u32 *hash;
>>> -   unsigned int ret;
>>> -
>>> -   if (arch_get_random_int(&ret))
>>> -   return ret;
>>> -
>>> -   hash = get_cpu_var(get_random_int_hash);
>>> -
>>> -   hash[0] += current->pid + jiffies + random_get_entropy();
>>> -   md5_transform(hash, random_int_secret);
>>> -   ret = hash[0];
>>> -   put_cpu_var(get_random_int_hash);
>>> -
>>> -   return ret;
>>> +   unsigned int arch_result;
>>> +   u64 result;
>>> +   struct random_int_secret *secret;
>>> +
>>> +   if (arch_get_random_int(&arch_result))
>>> +   return arch_result;
>>> +
>>> +   secret = get_random_int_secret();
>>> +   result = siphash_3u64(secret->chaining, jiffies,
>>> + (u64)random_get_entropy() + current->pid,
>>> + secret->secret);
>>> +   secret->chaining += result;
>>> +   put_cpu_var(secret);
>>> +   return result;
>>>  }
>>>  EXPORT_SYMBOL(get_random_int);
>>
>> Hmm.  I haven't tried to prove anything for real.  But here goes (in
>> the random oracle model):
>>
>> Suppose I'm an attacker and I don't know the secret or the chaining
>> value.  Then, regardless of what the entropy is, I can't predict the
>> numbers.
>>
>> Now suppose I do know the secret and the chaining value due to some
>> leak.  If I want to deduce prior outputs, I think I'm stuck: I'd need
>> to find a value "result" such that prev_chaining + result = chaining
>> and result = H(prev_chaining, ..., secret);.  I don't think this can
>> be done efficiently in the random oracle model regardless of what the
>> "..." is.
>>
>> But, if I know the secret and chaining value, I can predict the next
>> output assuming I can guess the entropy.  What's worse is that, even
>> if I can't guess the entropy, if I *observe* the next output then I
>> can calculate the next chaining value.
>>
>> So this is probably good enough, and making it better is hard.  Changing
>> it to:
>>
>> u64 entropy = (u64)random_get_entropy() + current->pid;
>> result = siphash(..., entropy, ...);
>> secret->chaining += result + entropy;
>>
>> would reduce this problem by forcing an attacker to brute-force the
>> entropy on each iteration, which is probably an improvement.
>>
>> To fully fix it, something like "catastrophic reseeding" would be
>> needed, but that's hard to get right.
>
> I wonder whether Ted's proposal was analyzed further in terms of
> performance, in case get_random_int should provide CSPRNG-like properties?
>
> For reference: https://lkml.org/lkml/2016/12/14/351
>
> The proposal made sense to me and would completely solve the above
> mentioned problem at the cost of repeatedly reseeding from the crng.
>

Unless I've misunderstood it, Ted's proposal causes get_random_int()
to return bytes straight from urandom (effectively), which should make
it very strong.  And if urandom is competitively fast now, I don't see
the problem.  ChaCha20 is designed for speed, after all.
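
A minimal user-space sketch of the two chaining updates compared in the
quoted analysis above: the patch's "chaining += result" and the suggested
"chaining += result + entropy".  The toy_prf() mixer, struct rng_state and
both function names are assumptions made up for illustration; toy_prf() is
not SipHash and has no security value, it only lets the sketch compile on
its own.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for siphash_3u64(): a toy mixer so the sketch
 * compiles on its own.  It is NOT SipHash and has no security value.
 */
static uint64_t toy_prf(uint64_t a, uint64_t b, uint64_t c, uint64_t key)
{
	const uint64_t in[3] = { a, b, c };
	uint64_t x = key ^ 0x9e3779b97f4a7c15ULL;

	for (int i = 0; i < 3; i++) {
		x ^= in[i];
		x *= 0xff51afd7ed558ccdULL;
		x ^= x >> 33;
	}
	return x;
}

/* Per-CPU state in the patch; a plain struct in this sketch. */
struct rng_state {
	uint64_t chaining;
	uint64_t secret;
};

/* Update rule from the patch: the chaining value absorbs only the output. */
static uint64_t next_plain(struct rng_state *s, uint64_t jiffies,
			   uint64_t entropy)
{
	uint64_t result = toy_prf(s->chaining, jiffies, entropy, s->secret);

	s->chaining += result;
	return result;
}

/* Suggested variant: the chaining value also absorbs the entropy. */
static uint64_t next_mixed(struct rng_state *s, uint64_t jiffies,
			   uint64_t entropy)
{
	uint64_t result = toy_prf(s->chaining, jiffies, entropy, s->secret);

	s->chaining += result + entropy;
	return result;
}
```

With next_plain(), anyone who knows the old chaining value and observes the
output can compute the new chaining value as old + result.  With
next_mixed() that sum is off by the entropy term, so the entropy has to be
guessed again on every call.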


George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-21 Thread Andy Lutomirski
On Wed, Dec 21, 2016 at 5:13 PM, George Spelvin
 wrote:
> As a separate message, to disentangle the threads, I'd like to
> talk about get_random_long().
>
> After some thinking, I still like the "state-preserving" construct
> that's equivalent to the current MD5 code.  Yes, we could just do
> siphash(current_cpu || per_cpu_counter, global_key), but it's nice to
> preserve a bit more.
>
> It requires library support from the SipHash code to return the full
> SipHash state, but I hope that's a fair thing to ask for.

I don't even think it needs that.  This is just adding a
non-destructive final operation, right?

>
> Here's my current straw man design for comment.  It's very similar to
> the current MD5-based design, but feeds all the seed material in the
> "correct" way, as opposed to XORing directly into the MD5 state.
>
> * Each CPU has a (Half)SipHash state vector,
>   "unsigned long get_random_int_hash[4]".  Unlike the current
>   MD5 code, we take care to initialize it to an asymmetric state.
>
> * There's a global 256-bit random_int_secret (which we could
>   reseed periodically).
>
> To generate a random number:
> * If get_random_int_hash is all-zero, seed it with a fresh half-sized
>   SipHash key and the appropriate XOR constants.
> * Generate three words of random_get_entropy(), jiffies, and current->pid.
>   (This is arbitrary seed material, copied from the current code.)
> * Crank through that with (Half)SipHash-1-0.
> * Crank through the random_int_secret with (Half)SipHash-1-0.
> * Return v1 ^ v3.

Just to clarify, if we replace SipHash with a black box, I think this
effectively means, where "entropy" is random_get_entropy() || jiffies
|| current->pid:

The first call returns H(random seed || entropy_0 || secret).  The
second call returns H(random seed || entropy_0 || secret || entropy_1
|| secret).  Etc.

If not, then I have a fairly strong preference to keep whatever
construction we come up with consistent with something that could
actually happen with invocations of unmodified SipHash -- then all the
security analysis on SipHash goes through.

Anyway, I have mixed thoughts about the construction.  It manages to
have a wide state at essentially no cost, which buys us quite a bit of
work factor to break it.  Even with full knowledge of the state, an
output doesn't reveal the entropy except to the extent that it can be
brute-forced (this is just whatever the appropriate extended version of
first preimage resistance gives us).  The one thing I don't like is
that I don't see how to prove that you can't run it backwards if you
manage to acquire a memory dump.  In fact, I think that there exist, at
least in theory, hash functions that are secure in the random oracle
model but that *can* be run backwards given the full state.  From
memory, SHA-3 has exactly that property, and it would be a bit sad for
a CSPRNG to be reversible.

We could also periodically mix in a big (128-bit?) chunk of fresh
urandom output to keep the bad guys guessing.

(P.S.  This kind of resembles the duplex sponge construction.  If
hardware SHA-3 ever shows up, a duplex sponge RNG might be nice indeed.)
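
The black-box reading above (each call absorbs fresh entropy and the secret
into a running state, then outputs a hash of everything absorbed so far) can
be sketched as follows.  The toy_* functions and next_random() are
hypothetical stand-ins invented for this sketch; they are not SipHash or
HalfSipHash and not a proposal for the kernel.  toy_peek() models the
"non-destructive final operation" by working on a copy of the state.  Real
SipHash finalization also folds in the message length, which this sketch
ignores, so it says nothing about whether the real construction matches an
unmodified SipHash invocation.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy running hash state; stands in for the (Half)SipHash state vector. */
struct toy_hash {
	uint64_t v;
};

/* Absorb a sequence of 64-bit words into the running state. */
static void toy_absorb(struct toy_hash *h, const uint64_t *words, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		h->v ^= words[i];
		h->v *= 0x100000001b3ULL;
		h->v ^= h->v >> 29;
	}
}

/* Start from a fixed constant and absorb the random seed first. */
static void toy_init(struct toy_hash *h, uint64_t seed)
{
	h->v = 0xcbf29ce484222325ULL;
	toy_absorb(h, &seed, 1);
}

/* Non-destructive final step: mixes a copy, the running state survives. */
static uint64_t toy_peek(const struct toy_hash *h)
{
	uint64_t x = h->v;

	x ^= x >> 33;
	x *= 0xff51afd7ed558ccdULL;
	x ^= x >> 33;
	return x;
}

#define SECRET_WORDS 4	/* 256-bit global secret */

/* One output: absorb entropy, absorb the secret, peek at the result. */
static uint64_t next_random(struct toy_hash *h, const uint64_t entropy[3],
			    const uint64_t secret[SECRET_WORDS])
{
	toy_absorb(h, entropy, 3);
	toy_absorb(h, secret, SECRET_WORDS);
	return toy_peek(h);
}
```

In this model the second output equals a one-shot hash over
seed || entropy_0 || secret || entropy_1 || secret, which is the property
the paragraph above asks for.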


Re: [RFC][PATCH] make global bitlock waitqueues per-node

2016-12-21 Thread Nicholas Piggin
On Wed, 21 Dec 2016 11:50:49 -0800
Linus Torvalds  wrote:

> On Wed, Dec 21, 2016 at 11:01 AM, Nicholas Piggin  wrote:
> > Peter's patch is less code and in that regard a bit nicer. I tried
> > going that way once, but I just thought it was a bit too sloppy to
> > do nicely with wait bit APIs.  
> 
> So I have to admit that when I read through your and PeterZ's patches
> back-to-back, yours was easier to understand.
> 
> PeterZ's is smaller but kind of subtle. The whole "return zero from
> lock_page_wait() and go around again" and the locking around that
> isn't exactly clear. In contrast, yours has the obvious waitqueue
> spinlock.
> 
> I'll think about it.  And yes, it would be good to have more testing,
> but at the same time xmas is imminent, and waiting around too much
> isn't going to help either..

Sure. Let's see if Dave and Mel get a chance to do some testing.

It might be a squeeze before Christmas. I realize we're going to fix
it anyway so on one hand might as well get something in. On the other
I didn't want to add a subtle bug then have everyone go on vacation.

How about I send up the page flag patch by Friday and that can bake
while the main patch gets more testing / review?

Thanks,
Nick


Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-21 Thread Hannes Frederic Sowa
On 22.12.2016 00:42, Andy Lutomirski wrote:
> On Wed, Dec 21, 2016 at 3:02 PM, Jason A. Donenfeld  wrote:
>>  unsigned int get_random_int(void)
>>  {
>> -   __u32 *hash;
>> -   unsigned int ret;
>> -
>> -   if (arch_get_random_int(&ret))
>> -   return ret;
>> -
>> -   hash = get_cpu_var(get_random_int_hash);
>> -
>> -   hash[0] += current->pid + jiffies + random_get_entropy();
>> -   md5_transform(hash, random_int_secret);
>> -   ret = hash[0];
>> -   put_cpu_var(get_random_int_hash);
>> -
>> -   return ret;
>> +   unsigned int arch_result;
>> +   u64 result;
>> +   struct random_int_secret *secret;
>> +
>> +   if (arch_get_random_int(&arch_result))
>> +   return arch_result;
>> +
>> +   secret = get_random_int_secret();
>> +   result = siphash_3u64(secret->chaining, jiffies,
>> + (u64)random_get_entropy() + current->pid,
>> + secret->secret);
>> +   secret->chaining += result;
>> +   put_cpu_var(secret);
>> +   return result;
>>  }
>>  EXPORT_SYMBOL(get_random_int);
> 
> Hmm.  I haven't tried to prove anything for real.  But here goes (in
> the random oracle model):
> 
> Suppose I'm an attacker and I don't know the secret or the chaining
> value.  Then, regardless of what the entropy is, I can't predict the
> numbers.
> 
> Now suppose I do know the secret and the chaining value due to some
> leak.  If I want to deduce prior outputs, I think I'm stuck: I'd need
> to find a value "result" such that prev_chaining + result = chaining
> and result = H(prev_chaining, ..., secret);.  I don't think this can
> be done efficiently in the random oracle model regardless of what the
> "..." is.
> 
> But, if I know the secret and chaining value, I can predict the next
> output assuming I can guess the entropy.  What's worse is that, even
> if I can't guess the entropy, if I *observe* the next output then I
> can calculate the next chaining value.
> 
> So this is probably good enough, and making it better is hard.  Changing
> it to:
> 
> u64 entropy = (u64)random_get_entropy() + current->pid;
> result = siphash(..., entropy, ...);
> secret->chaining += result + entropy;
> 
> would reduce this problem by forcing an attacker to brute-force the
> entropy on each iteration, which is probably an improvement.
> 
> To fully fix it, something like "catastrophic reseeding" would be
> needed, but that's hard to get right.

I wonder whether Ted's proposal was analyzed further in terms of
performance, in case get_random_int should provide CSPRNG-like properties?

For reference: https://lkml.org/lkml/2016/12/14/351

The proposal made sense to me and would completely solve the above
mentioned problem at the cost of repeatedly reseeding from the crng.

Bye,
Hannes




Re: Build warning on 32-bit PPC - bisected to commit 989cea5c14be

2016-12-21 Thread Nicholas Piggin
On Wed, 21 Dec 2016 13:49:07 -0600
Larry Finger  wrote:

> I am getting the following warning when I build kernel 4.9-git on my
> PowerBook G4 with a 32-bit PPC processor:
> 
>   AS  arch/powerpc/kernel/misc_32.o
> arch/powerpc/kernel/misc_32.S:299:7: warning: "CONFIG_FSL_BOOKE" is not defined [-Wundef]
> 
> This problem has been bisected to commit 989cea5c14be ("kbuild: prevent
> lib-ksyms.o rebuilds").
> 
> Thanks,
> 
> Larry

Hi Larry,

It's strange that you've bisected it there; I can't see how that patch would
trigger it. That said, powerpc has had a few small build system glitches.

It looks like this warning could be fixed by changing #elif CONFIG_FSL_BOOKE
to #elif defined (CONFIG_FSL_BOOKE). Want to send a patch (if it works)?

Thanks,
Nick
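
The suggested change can be illustrated with a small stand-alone file.
CONFIG_EXAMPLE_OPTION, enum cpu_variant and selected_variant() are invented
for the example; only CONFIG_FSL_BOOKE comes from the warning above, and the
real code at misc_32.S:299 is not reproduced here.

```c
/*
 * CONFIG_FSL_BOOKE is deliberately left undefined, as it would be on a
 * build that does not enable the option.
 *
 * "#elif CONFIG_FSL_BOOKE" would evaluate the undefined name as 0 and
 * draw the -Wundef warning; "#elif defined(...)" only asks whether the
 * name is defined, so it stays quiet.
 */
enum cpu_variant { VARIANT_GENERIC, VARIANT_EXAMPLE, VARIANT_FSL_BOOKE };

#if defined(CONFIG_EXAMPLE_OPTION)	/* hypothetical first branch */
#define SELECTED_VARIANT VARIANT_EXAMPLE
#elif defined(CONFIG_FSL_BOOKE)		/* no -Wundef warning */
#define SELECTED_VARIANT VARIANT_FSL_BOOKE
#else
#define SELECTED_VARIANT VARIANT_GENERIC
#endif

static enum cpu_variant selected_variant(void)
{
	return SELECTED_VARIANT;
}
```

Compiling this with -Wundef should produce no warning, and with neither
option defined the generic branch is selected.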


Re: [PATCH net-next] ixgbevf: fix 'Etherleak' in ixgbevf

2016-12-21 Thread Kefeng Wang


On 2016/12/21 10:20, Alexander Duyck wrote:
> I find it curious that only the last 4 bytes have data in them.  I'm
> wondering if the NIC/driver in the Windows/Nessus system is
> interpreting the 4 byte CRC on the end of the frame as padding instead
> of stripping it.
> 
> Is there any chance you could capture the entire frame instead of just
> the padding?  Maybe you could run something like wireshark without
> enabling promiscuous mode on the VF and capture the frames it is
> trying to send and receive.  What I want to verify is what the actual
> amount of padding is that is needed to get to 60 bytes and where the
> CRC should start.
> 
> - Alex

Here is the verbose output; is this useful?
If not, we will try what you suggested. Thanks,

D:\Program Files\Tenable\Nessus>nasl.exe -aX -t 192.169.0.151 etherleak.nasl
--
 ---[ ICMP ]---
0x00:  45 00 00 1D 20 81 00 00 40 01 D7 F3 C0 A9 00 97  E... ...@...
0x10:  C0 A9 00 82 00 00 87 FD 00 01 00 01 78 00 00 00  x...
0x20:  00 00 00 00 00 00 00 00 00 00 98 E4 75 DF  u.
--
 ---[ ICMP ]---
0x00:  45 00 00 1D 20 85 00 00 40 01 D7 EF C0 A9 00 97  E... ...@...
0x10:  C0 A9 00 82 00 00 87 FD 00 01 00 01 78 00 00 00  x...
0x20:  00 00 00 00 00 00 00 00 00 00 FB DA F8 13  ..
---[ ether1 ]---
0x00:  00 00 00 00 00 00 00 00 00 00 00 00 00 98 E4 75  ...u
0x10:  DF .
---[ ether2 ]---
0x00:  00 00 00 00 00 00 00 00 00 00 00 00 00 FB DA F8
0x10:  13 .

Padding observed in one frame :

  0x00:  00 00 00 00 00 00 00 00 00 00 00 00 00 98 E4 75  ...u
  0x10:  DF .

Padding observed in another frame :

  0x00:  00 00 00 00 00 00 00 00 00 00 00 00 00 FB DA F8
  0x10:  13
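
As a worked check of the numbers in the dump: the IP total length field in
both ICMP packets is 0x001D = 29 bytes.  Assuming an untagged 14-byte
Ethernet header, the frame is 43 bytes, so 60 - 43 = 17 bytes of padding are
needed, which matches the 17 bytes (0x00 through 0x10) reported as padding,
of which the last four are non-zero.  The sketch below only illustrates that
arithmetic and what zero-filled padding looks like; the function names are
made up, and it is not the driver change under discussion.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HLEN 14	/* Ethernet header without a VLAN tag */
#define ETH_ZLEN 60	/* minimum frame length, not counting the 4-byte FCS */

/* Pad bytes needed for an IPv4 packet whose total length is ip_total_len. */
static size_t eth_pad_len(size_t ip_total_len)
{
	size_t frame_len = ETH_HLEN + ip_total_len;

	return frame_len < ETH_ZLEN ? ETH_ZLEN - frame_len : 0;
}

/*
 * Zero-fill the padding so stale buffer contents never reach the wire.
 * buf must have room for at least ETH_ZLEN bytes.  Returns the padded
 * frame length.
 */
static size_t eth_pad_frame(uint8_t *buf, size_t frame_len)
{
	if (frame_len >= ETH_ZLEN)
		return frame_len;

	memset(buf + frame_len, 0, ETH_ZLEN - frame_len);
	return ETH_ZLEN;
}
```

For the packets above, eth_pad_len(0x1D) gives 17, and eth_pad_frame() on a
43-byte frame clears bytes 43 through 59.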




Re: HalfSipHash Acceptable Usage

2016-12-21 Thread Andy Lutomirski
On Wed, Dec 21, 2016 at 9:25 AM, Linus Torvalds
 wrote:
> On Wed, Dec 21, 2016 at 7:55 AM, George Spelvin
>  wrote:
>>
>> How much does kernel_fpu_begin()/kernel_fpu_end() cost?
>
> It's now better than it used to be, but it's absolutely disastrous
> still. We're talking easily many hundreds of cycles. Under some loads,
> thousands.
>
> And I warn you already: it will _benchmark_ a hell of a lot better
> than it will work in reality. In benchmarks, you'll hit all the
> optimizations ("oh, I've already saved away all the FP registers, no
> need to do it again").
>
> In contrast, in reality, especially with things like "do it once or
> twice per incoming packet", you'll easily hit the absolute worst
> cases, where not only does it take a few hundred cycles to save the FP
> state, you'll then return to user space in between packets, which
> triggers the slow-path return code and reloads the FP state, which is
> another few hundred cycles plus.

Hah, you're thinking that the x86 code works the way that Rik and I
want it to work, and you just made my day. :)  What actually happens
is that the state is saved in kernel_fpu_begin() and restored in
kernel_fpu_end(), and it'll take a few hundred cycles best case.  If
you do it a bunch of times in a loop, you *might* trigger a CPU
optimization that notices that the state being saved is the same state
that was just restored, but you're still going to pay the full restore
cost each round trip no matter what.

The code is much clearer in 4.10 kernels now that I deleted the unused
"lazy" branches.

>
> Similarly, in benchmarks you'll hit the "modern CPU's power on the AVX
> unit and keep it powered up for a while afterwards", while in real
> life you would quite easily hit the "oh, AVX is powered down because
> we were idle, now it powers up at half speed which is another latency
> hit _and_ the AVX unit won't run full out anyway".

I *think* that was mostly fixed in Broadwell or thereabouts (in terms
of latency -- throughput and power consumption still suffer).


Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command

2016-12-21 Thread Lu Baolu
Hi,

On 12/21/2016 11:18 PM, OGAWA Hirofumi wrote:
> Mathias Nyman  writes:
>
>>> We set CMD_RING_STATE_ABORTED state under locking. I'm not checking what
>>> is for taking lock for register though, I guess it should be enough just
>>> lock around of read=>write of ->cmd_ring if need lock.
>> After your patch it should be enough to have the lock only while
>> reading and writing the cmd_ring register.
>>
>> If we want a locking fix that applies more easily to older stable
>> releases before your change then the lock needs to cover set
>> CMD_RING_STATE_ABORT, read cmd_reg, write cmd_reg and busy-loop
>> checking CRR bit.  Otherwise the stop cmd ring interrupt handler may
>> restart the ring just before we start checking CRR. The stop cmd ring
>> interrupt will set the CMD_RING_STATE_ABORTED to
>> CMD_RING_STATE_RUNNING so ring will really restart in the interrupt
>> handler.
> Just for record (no chance to make patch I myself for now, sorry), while
> checking locking slightly, I noticed unrelated missing locking.
>
>   xhci_cleanup_command_queue()
>
> We are calling it without locking, but we need to lock for accessing list.

Yeah. I can make the patch.

Best regards,
Lu Baolu


Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command

2016-12-21 Thread Lu Baolu
Hi,

On 12/21/2016 08:48 PM, Mathias Nyman wrote:
> On 21.12.2016 08:17, Lu Baolu wrote:
>> Hi Mathias,
>>
>> I have some comments for the implementation of xhci_abort_cmd_ring() below.
>>
>> On 12/20/2016 11:13 PM, Mathias Nyman wrote:
>>> On 20.12.2016 09:30, Baolin Wang wrote:
>>> ...
>>>
>>> Alright, I gathered all current work related to xhci races and timeouts
>>> and put them into a branch:
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git timeout_race_fixes
>>>
>>> It's based on 4.9.
>>> It includes a few other patches just to avoid conflicts and make my life
>>> easier.
>>>
>>> Interesting patches are:
>>>
>>> ee4eb91 xhci: remove unnecessary check for pending timer
>>> 0cba67d xhci: detect stop endpoint race using pending timer instead of counter.
>>> 4f2535f xhci: Handle command completion and timeout race
>>> b9d00d7 usb: host: xhci: Fix possible wild pointer when handling abort command
>>> 529a5a0 usb: xhci: fix possible wild pointer
>>> 4766555 xhci: Fix race related to abort operation
>>> de834a3 xhci: Use delayed_work instead of timer for command timeout
>>> 69973b8 Linux 4.9
>>> 69973b8 Linux 4.9
>>>
>>> The fixes for command queue races will go to usb-linus and stable, the
>>> reworks for stop ep watchdog timer will go to usb-next.
>>>
>>> Still completely untested, (well it compiles)
>>>
>>> Felipe gave instructions how to modify dwc3 driver to timeout on address
>>> device commands to test these, I'll try to set that up.
>>>
>>> All additional testing is welcome, especially if you can trigger timeouts
>>> and races
>>>
>>> -Mathias
>>>
>>>
>>
>> Below is the latest code. I put my comments in line.
>>
>> static int xhci_abort_cmd_ring(struct xhci_hcd *xhci)
>> {
>>         u64 temp_64;
>>         int ret;
>>
>>         xhci_dbg(xhci, "Abort command ring\n");
>>
>>         reinit_completion(&xhci->cmd_ring_stop_completion);
>>
>>         temp_64 = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
>>         xhci_write_64(xhci, temp_64 | CMD_RING_ABORT,
>>                         &xhci->op_regs->cmd_ring);
>>
>> We should hold xhci->lock when we are modifying xhci registers
>> at runtime.
>>
>
> Makes sense, but we need to unlock it before sleeping or waiting for
> completion.  I need to look into that in more detail.
>
> But this was an issue already before these changes.
>
>> The retry of setting CMD_RING_ABORT is not necessary according to
>> previous discussion. We have cleaned code for second try in
>> xhci_handle_command_timeout(). Need to clean up here as well.
>>
>
> Yes, it can be cleaned up as well, but the two cases are a bit different.
> The cleaned up one was about the command ring not starting again after it
> was stopped.
>
> This second try is a workaround for what we thought was the command ring
> failing to stop in the first place, but is most likely due to the race that
> OGAWA Hirofumi fixed.  It races if the stop command ring interrupt happens
> between writing the abort bit and polling for the ring stopped bit. The
> interrupt handler may start the command ring again, and we would believe we
> failed to stop it in the first place.
>
> This race could probably be fixed by just extending the lock (and preventing
> interrupts) to cover both writing the abort bit and polling for the command
> ring running bit, as you pointed out here previously.
>
> But then again I really like OGAWA Hirofumi's solution that separates the
> command ring stopping from aborting commands and restarting the ring.
>
> The current way of always restarting the command ring as a response to
> a stop command ring command really limits its usage.
>
> So, with this in mind, the most reasonable approach would be to:
> 1. fix the lock to cover abort+CRR check, and send it to usb-linus + stable
> 2. rebase OGAWA Hirofumi's changes on top of that, and send to usb-linus only
> 3. remove unnecessary second abort try as a separate patch, send to usb-next
> 4. remove polling for the Command Ring Running (CRR) bit; waiting for
>    completion is enough, and if completion times out then we can check CRR.
>    For usb-next.
>    I'll fix the typos these patches would introduce. Fixing old typos can be
>    done as separate patches later.

This is exactly the same as what I am thinking of. I will submit the patches
later.

Best regards,
Lu Baolu


Re: [PATCH v7 1/6] siphash: add cryptographically secure PRF

2016-12-21 Thread Jason A. Donenfeld
On Thu, Dec 22, 2016 at 2:40 AM, Stephen Hemminger
 wrote:
> The networking tree (net-next) which is where you are submitting to is
> technically closed right now.

That's okay. At some point in the future it will be open. By then v83
of this patch set will be shiny and done, just waiting for the merge
window to open. There's a lot to discuss with this, so getting the
feedback early is beneficial.

Jason

