date:20170113

Re: [PATCH v2] partially revert "xen: Remove event channel notification through Xen PCI platform device"

2017-01-13 Thread Boris Ostrovsky

On 01/13/2017 01:26 PM, Stefano Stabellini wrote:
> On Fri, 13 Jan 2017, Boris Ostrovsky wrote:
>> On 01/12/2017 04:39 PM, Stefano Stabellini wrote:
>>> The following commit:
>>>
>>> commit 72a9b186292d98494f26cfd24a1621796209
>>> Author: KarimAllah Ahmed 
>>> Date:   Fri Aug 26 23:55:36 2016 +0200
>>>
>>> xen: Remove event channel notification through Xen PCI platform device
>>>
>>> broke Linux when booting as Dom0 on Xen in a nested Xen environment (Xen
>>> installed inside a Xen VM). In this scenario, Linux is a PV guest, but
>>> at the same time it uses the platform-pci driver to receive
>>> notifications from L0 Xen. vector callbacks are not available because L1
>>> Xen doesn't allow them.
>>>
>>> Partially revert the offending commit, by restoring IRQ based
>>> notifications for PV guests only. I restored only the code which is
>>> strictly needed and replaced the xen_have_vector_callback checks within
>>> it with xen_pv_domain() checks.
>>>
>>> Signed-off-by: Stefano Stabellini 
>> Reviewed-by: Boris Ostrovsky 
> Applied to xentip/for-linus-4.10, improving the commit message as
> suggested before.
>
> Do you plan on sending out another pull request for 4.10?

Juergen?

BTW, this probably wants to go to 4.9-stable too?

-boris

Re: [PATCH v2] partially revert "xen: Remove event channel notification through Xen PCI platform device"

2017-01-13 Thread Boris Ostrovsky

On 01/13/2017 01:26 PM, Stefano Stabellini wrote:
> On Fri, 13 Jan 2017, Boris Ostrovsky wrote:
>> On 01/12/2017 04:39 PM, Stefano Stabellini wrote:
>>> The following commit:
>>>
>>> commit 72a9b186292d98494f26cfd24a1621796209
>>> Author: KarimAllah Ahmed 
>>> Date:   Fri Aug 26 23:55:36 2016 +0200
>>>
>>> xen: Remove event channel notification through Xen PCI platform device
>>>
>>> broke Linux when booting as Dom0 on Xen in a nested Xen environment (Xen
>>> installed inside a Xen VM). In this scenario, Linux is a PV guest, but
>>> at the same time it uses the platform-pci driver to receive
>>> notifications from L0 Xen. vector callbacks are not available because L1
>>> Xen doesn't allow them.
>>>
>>> Partially revert the offending commit, by restoring IRQ based
>>> notifications for PV guests only. I restored only the code which is
>>> strictly needed and replaced the xen_have_vector_callback checks within
>>> it with xen_pv_domain() checks.
>>>
>>> Signed-off-by: Stefano Stabellini 
>> Reviewed-by: Boris Ostrovsky 
> Applied to xentip/for-linus-4.10, improving the commit message as
> suggested before.
>
> Do you plan on sending out another pull request for 4.10?

Juergen?

BTW, this probably wants to go to 4.9-stable too?

-boris

Re: [RFC PATCH v2 10/10] dt-bindings: Document devicetree binding for ARM SPE

2017-01-13 Thread Mark Rutland

On Fri, Jan 13, 2017 at 04:03:49PM +, Will Deacon wrote:
> This patch documents the devicetree binding in use for ARM SPE.
> 
> Cc: Mark Rutland 
> Cc: Rob Herring 
> Signed-off-by: Will Deacon 
> ---
>  Documentation/devicetree/bindings/arm/spe-pmu.txt | 20 
>  1 file changed, 20 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/arm/spe-pmu.txt
> 
> diff --git a/Documentation/devicetree/bindings/arm/spe-pmu.txt 
> b/Documentation/devicetree/bindings/arm/spe-pmu.txt
> new file mode 100644
> index ..d6540b491af4
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/arm/spe-pmu.txt
> @@ -0,0 +1,20 @@
> +* ARMv8.2 Statistical Profiling Extension (SPE) Performance Monitor Units 
> (PMU)
> +
> +ARMv8.2 introduces the optional Statistical Profiling Extension for 
> collecting
> +performance sample data using an in-memory trace buffer.
> +
> +** SPE Required properties:
> +
> +- compatible : should be one of:
> +"arm,arm-spe-pmu-v1"

The second "arm" here doesn't seem to add much. Should that be "armv8.2"
instead?

That would roughly match what we do with the architected timers to
describe the specific architectural version.

Otherwise, this looks fine to me.

Thanks,
Mark.

> +- interrupts : Exactly 1 PPI must be listed. For heterogeneous systems where
> +   SPE is only supported on a subset of the CPUs, please consult
> +the arm,gic-v3 binding for details on describing a PPI partition.
> +
> +** Example:
> +
> +spe-pmu {
> +compatible = "arm,arm-spe-pmu-v1";
> +interrupts = ;
> +};
> -- 
> 2.1.4
>

Re: [RFC PATCH v2 10/10] dt-bindings: Document devicetree binding for ARM SPE

2017-01-13 Thread Mark Rutland

On Fri, Jan 13, 2017 at 04:03:49PM +, Will Deacon wrote:
> This patch documents the devicetree binding in use for ARM SPE.
> 
> Cc: Mark Rutland 
> Cc: Rob Herring 
> Signed-off-by: Will Deacon 
> ---
>  Documentation/devicetree/bindings/arm/spe-pmu.txt | 20 
>  1 file changed, 20 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/arm/spe-pmu.txt
> 
> diff --git a/Documentation/devicetree/bindings/arm/spe-pmu.txt 
> b/Documentation/devicetree/bindings/arm/spe-pmu.txt
> new file mode 100644
> index ..d6540b491af4
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/arm/spe-pmu.txt
> @@ -0,0 +1,20 @@
> +* ARMv8.2 Statistical Profiling Extension (SPE) Performance Monitor Units 
> (PMU)
> +
> +ARMv8.2 introduces the optional Statistical Profiling Extension for 
> collecting
> +performance sample data using an in-memory trace buffer.
> +
> +** SPE Required properties:
> +
> +- compatible : should be one of:
> +"arm,arm-spe-pmu-v1"

The second "arm" here doesn't seem to add much. Should that be "armv8.2"
instead?

That would roughly match what we do with the architected timers to
describe the specific architectural version.

Otherwise, this looks fine to me.

Thanks,
Mark.

> +- interrupts : Exactly 1 PPI must be listed. For heterogeneous systems where
> +   SPE is only supported on a subset of the CPUs, please consult
> +the arm,gic-v3 binding for details on describing a PPI partition.
> +
> +** Example:
> +
> +spe-pmu {
> +compatible = "arm,arm-spe-pmu-v1";
> +interrupts = ;
> +};
> -- 
> 2.1.4
>

Re: irq domain hierarchy vs. chaining w/ PCI MSI-X...

2017-01-13 Thread Marc Zyngier

On 13/01/17 17:37, David Daney wrote:
> On 01/13/2017 08:15 AM, Marc Zyngier wrote:
>> Thanks Linus for looping me in.
>>
>> On 12/01/17 22:35, David Daney wrote:
>>> Hi Thomas,
>>>
>>> I am trying to figure out how to handle this situation:
>>>
>>>handle_level_irq()
>>>+---+ handle_fasteoi_irq()
>>>| PCIe hosted   | +---+
>>> +-+
>>>   --level_gpio>| GPIO to MSI-X |--MSI_message--+>| gicv3-ITS |---> |
>>> CPU |
>>>| widget|   | +---+
>>> +-+
>>>+---+   |
>>>|
>>>+---+   |
>>>| other PCIe device |---MSI_message-+
>>>+---+
>>>
>>>
>>> The question is how to structure the interrupt handling.  My initial
>>> attempt was a chaining arrangement where the GPIO driver does
>>> request_irq() for the appropriate MSI-X vector, and the handler calls
>>> back into the irq system like this:
>>>
>>>
>>> static irqreturn_t thunderx_gpio_chain_handler(int irq, void *dev)
>>> {
>>> struct thunderx_irqdev *irqdev = dev;
>>> int chained_irq;
>>> int ret;
>>>
>>> chained_irq = irq_find_mapping(irqdev->gpio->chip.irqdomain,
>>>irqdev->line);
>>> if (!chained_irq)
>>> return IRQ_NONE;
>>>
>>> ret = generic_handle_irq(chained_irq);
>>>
>>> return ret ? IRQ_NONE : IRQ_HANDLED;
>>> }
>>>
>>> Thus getting the proper GPIO irq_chip functions called to manage the
>>> level triggering semantics.
>>>
>>> The drawbacks of this approach are that there are then two irqs
>>> associated with the GPIO line (the base MSI-X and the chained GPIO),
>>> also there can be up to 80-100 of these widgets, so potentially we can
>>> consume twice that many irq numbers.
>>>
>>> It was suggested by Linus Walleij that using an irq domain hierarchy
>>> might be a better idea.  However, I cannot figure out how this might
>>> work.  The gicv3-ITS needs to use handle_fasteoi_irq(), and we need
>>> handle_level_irq() for the GPIO-level lines.  Getting the proper
>>> irq_chip functions called in a hierarchical configuration doesn't seem
>>> doable given the heterogeneous flow handlers.
>>
>> My main worry here is that you're trying to handle something that is
>> apparently a level interrupt using a transport that only handles edge
>> interrupts. It means that each time you EOI an interrupt, you need to be
>> able to resample the level and retrigger it if still high. Do you have a
>> HW facility to do so? Or do you emulate this in SW?
> 
> Yes.
> 
> The first thing the handle_level_irq() flow handler does is to mask the 
> source.  After calling the handler, it then unmasks the source.
> 
> The act of unmasking in the HW causes another MSI to be emitted if the 
> level signal is still active.  This is what we want as it ensures that 
> interrupts keep firing as long as the signal is active, even though the 
> underlying mechanism uses edge semantics MSI.
> 
> 
>>
>>> Can you think of a better way of structuring this than chaining from the
>>> MSI-X handler as I outlined above?
>>
>> We already have similar horrors - see irq-mbigen.c which does exactly
>> that. It creates IRQ domains spanning a bunch of MSIs allocated to that
>> platform device. Once you have that, the rest is pretty easy.
>>
>> In your case, it probably requires adding the same kind of surgery so
>> that we can create an IRQ domain from the MSIs associated with a PCIe
>> device. Not too hard, just not there yet, and we can probably reuse some
>> of the code that lives in platform-msi.c
> 
> That seems too ugly.
> 
> I would propose one of the following:
> 
> 1) Just keep the crappy chaining more or less as I originally 
> implemented it, as it works well enough to get useful work done.

I think that's the really ugly bit. It may work, but it is not something
I wish to see in code that I'd end-up being responsible for.

> 2) add an optional hierarchical flow handler function to struct 
> irq_data.  If populated, this flow handler would be called from 
> handle_fasteoi_irq() instead of calling handle_irq_event().  This could 
> allow each irq_chip to have its own flow handler, instead of a single 
> flow handler shared by each of the hierarchically nested irq_chip.

So you want to generalize CONFIG_IRQ_PREFLOW_FASTEOI so that it is on
each level of a domain stack? Humm. I personally think that this is a
massive bloat that is going to impact all the hot paths for no gain
whatsoever, but I'll let tglx speak his mind on that.

I still think that being able to create an irqdomain based on the
interrupts allocated to a device and make that an irqchip is
conceptually simpler, already exists in the kernel, and fits the
existing infrastructure that has been put in place over the

Re: irq domain hierarchy vs. chaining w/ PCI MSI-X...

2017-01-13 Thread Marc Zyngier

On 13/01/17 17:37, David Daney wrote:
> On 01/13/2017 08:15 AM, Marc Zyngier wrote:
>> Thanks Linus for looping me in.
>>
>> On 12/01/17 22:35, David Daney wrote:
>>> Hi Thomas,
>>>
>>> I am trying to figure out how to handle this situation:
>>>
>>>handle_level_irq()
>>>+---+ handle_fasteoi_irq()
>>>| PCIe hosted   | +---+
>>> +-+
>>>   --level_gpio>| GPIO to MSI-X |--MSI_message--+>| gicv3-ITS |---> |
>>> CPU |
>>>| widget|   | +---+
>>> +-+
>>>+---+   |
>>>|
>>>+---+   |
>>>| other PCIe device |---MSI_message-+
>>>+---+
>>>
>>>
>>> The question is how to structure the interrupt handling.  My initial
>>> attempt was a chaining arrangement where the GPIO driver does
>>> request_irq() for the appropriate MSI-X vector, and the handler calls
>>> back into the irq system like this:
>>>
>>>
>>> static irqreturn_t thunderx_gpio_chain_handler(int irq, void *dev)
>>> {
>>> struct thunderx_irqdev *irqdev = dev;
>>> int chained_irq;
>>> int ret;
>>>
>>> chained_irq = irq_find_mapping(irqdev->gpio->chip.irqdomain,
>>>irqdev->line);
>>> if (!chained_irq)
>>> return IRQ_NONE;
>>>
>>> ret = generic_handle_irq(chained_irq);
>>>
>>> return ret ? IRQ_NONE : IRQ_HANDLED;
>>> }
>>>
>>> Thus getting the proper GPIO irq_chip functions called to manage the
>>> level triggering semantics.
>>>
>>> The drawbacks of this approach are that there are then two irqs
>>> associated with the GPIO line (the base MSI-X and the chained GPIO),
>>> also there can be up to 80-100 of these widgets, so potentially we can
>>> consume twice that many irq numbers.
>>>
>>> It was suggested by Linus Walleij that using an irq domain hierarchy
>>> might be a better idea.  However, I cannot figure out how this might
>>> work.  The gicv3-ITS needs to use handle_fasteoi_irq(), and we need
>>> handle_level_irq() for the GPIO-level lines.  Getting the proper
>>> irq_chip functions called in a hierarchical configuration doesn't seem
>>> doable given the heterogeneous flow handlers.
>>
>> My main worry here is that you're trying to handle something that is
>> apparently a level interrupt using a transport that only handles edge
>> interrupts. It means that each time you EOI an interrupt, you need to be
>> able to resample the level and retrigger it if still high. Do you have a
>> HW facility to do so? Or do you emulate this in SW?
> 
> Yes.
> 
> The first thing the handle_level_irq() flow handler does is to mask the 
> source.  After calling the handler, it then unmasks the source.
> 
> The act of unmasking in the HW causes another MSI to be emitted if the 
> level signal is still active.  This is what we want as it ensures that 
> interrupts keep firing as long as the signal is active, even though the 
> underlying mechanism uses edge semantics MSI.
> 
> 
>>
>>> Can you think of a better way of structuring this than chaining from the
>>> MSI-X handler as I outlined above?
>>
>> We already have similar horrors - see irq-mbigen.c which does exactly
>> that. It creates IRQ domains spanning a bunch of MSIs allocated to that
>> platform device. Once you have that, the rest is pretty easy.
>>
>> In your case, it probably requires adding the same kind of surgery so
>> that we can create an IRQ domain from the MSIs associated with a PCIe
>> device. Not too hard, just not there yet, and we can probably reuse some
>> of the code that lives in platform-msi.c
> 
> That seems too ugly.
> 
> I would propose one of the following:
> 
> 1) Just keep the crappy chaining more or less as I originally 
> implemented it, as it works well enough to get useful work done.

I think that's the really ugly bit. It may work, but it is not something
I wish to see in code that I'd end-up being responsible for.

> 2) add an optional hierarchical flow handler function to struct 
> irq_data.  If populated, this flow handler would be called from 
> handle_fasteoi_irq() instead of calling handle_irq_event().  This could 
> allow each irq_chip to have its own flow handler, instead of a single 
> flow handler shared by each of the hierarchically nested irq_chip.

So you want to generalize CONFIG_IRQ_PREFLOW_FASTEOI so that it is on
each level of a domain stack? Humm. I personally think that this is a
massive bloat that is going to impact all the hot paths for no gain
whatsoever, but I'll let tglx speak his mind on that.

I still think that being able to create an irqdomain based on the
interrupts allocated to a device and make that an irqchip is
conceptually simpler, already exists in the kernel, and fits the
existing infrastructure that has been put in place over the

Re: [PATCH] platform/x86: Add IMS/ZII SCU driver

2017-01-13 Thread Darren Hart

On Fri, Jan 13, 2017 at 06:38:28PM +0200, Andy Shevchenko wrote:
> On Wed, Jan 11, 2017 at 11:26 PM, Florian Fainelli  
> wrote:
> > From: Guenter Roeck 
> >
> > This patch adds support for the IMS (now Zodiac Inflight Innovations)
> > SCU Generation 1/2/3 platform driver. This driver registers all the
> > on-module peripherals: Ethernet switches (Broadcom or Marvell), I2C to
> > GPIO expanders, Kontrom CPLD/I2C master, and more.
> >
> > Signed-off-by: Guenter Roeck 
> > Signed-off-by: Florian Fainelli 
> 
> > ---
> > Darren,
> >
> > This is against your "for-next" branch thanks!
> 
> I'm going to review this later, though few of comments.
> 
> > +config SCU
> 
> No, no. We have enough mess with Intel's SCU/PMIC here, not add more.
> 
> At least add manufacturer as prefix to option and file name.
> 
> Btw, Darren, would it be good idea to start creating folders to make a
> bit of order in the subsystem? For first I would move Intel's PMIC/SCU
> stuff to somewhere (not sure if it should be per manufacturer or per
> function).

If we create folder, I'd recommend by manufacturer as that is typically along
the lines that our contributors work. Someone with a Dell laptop tends to focus
on Dell. The Intel engineers will focus primarily on Intel SoCs, etc.

I'd like to avoid getting too granualar as it adds overhead with Makefiles,
file navigation, etc. Any thoughts on a minimum file count to justify a new
directory? Maybe 5?

-- 
Darren Hart
Intel Open Source Technology Center

Re: [PATCH] platform/x86: Add IMS/ZII SCU driver

2017-01-13 Thread Darren Hart

On Fri, Jan 13, 2017 at 06:38:28PM +0200, Andy Shevchenko wrote:
> On Wed, Jan 11, 2017 at 11:26 PM, Florian Fainelli  
> wrote:
> > From: Guenter Roeck 
> >
> > This patch adds support for the IMS (now Zodiac Inflight Innovations)
> > SCU Generation 1/2/3 platform driver. This driver registers all the
> > on-module peripherals: Ethernet switches (Broadcom or Marvell), I2C to
> > GPIO expanders, Kontrom CPLD/I2C master, and more.
> >
> > Signed-off-by: Guenter Roeck 
> > Signed-off-by: Florian Fainelli 
> 
> > ---
> > Darren,
> >
> > This is against your "for-next" branch thanks!
> 
> I'm going to review this later, though few of comments.
> 
> > +config SCU
> 
> No, no. We have enough mess with Intel's SCU/PMIC here, not add more.
> 
> At least add manufacturer as prefix to option and file name.
> 
> Btw, Darren, would it be good idea to start creating folders to make a
> bit of order in the subsystem? For first I would move Intel's PMIC/SCU
> stuff to somewhere (not sure if it should be per manufacturer or per
> function).

If we create folder, I'd recommend by manufacturer as that is typically along
the lines that our contributors work. Someone with a Dell laptop tends to focus
on Dell. The Intel engineers will focus primarily on Intel SoCs, etc.

I'd like to avoid getting too granualar as it adds overhead with Makefiles,
file navigation, etc. Any thoughts on a minimum file count to justify a new
directory? Maybe 5?

-- 
Darren Hart
Intel Open Source Technology Center

Re: [Xen-devel] [PATCH] xen: do not re-use pirq number cached in pci device msi msg data

2017-01-13 Thread Stefano Stabellini

On Fri, 13 Jan 2017, Dan Streetman wrote:
> On Wed, Jan 11, 2017 at 6:25 PM, Dan Streetman  wrote:
> > On Wed, Jan 11, 2017 at 1:46 PM, Stefano Stabellini
> >  wrote:
> >> On Wed, 11 Jan 2017, Dan Streetman wrote:
> >>> On Tue, Jan 10, 2017 at 8:25 PM, Stefano Stabellini
> >>>  wrote:
> >>> > On Tue, 10 Jan 2017, Dan Streetman wrote:
> >>> >> On Tue, Jan 10, 2017 at 2:03 PM, Stefano Stabellini
> >>> >>  wrote:
> >>> >> > On Tue, 10 Jan 2017, Dan Streetman wrote:
> >>> >> >> On Tue, Jan 10, 2017 at 10:57 AM, Dan Streetman  
> >>> >> >> wrote:
> >>> >> >> > On Mon, Jan 9, 2017 at 2:30 PM, Stefano Stabellini
> >>> >> >> >  wrote:
> >>> >> >> >> On Mon, 9 Jan 2017, Konrad Rzeszutek Wilk wrote:
> >>> >> >> >>> On Mon, Jan 09, 2017 at 10:42:41AM -0500, Dan Streetman wrote:
> >>> >> >> >>> > On Mon, Jan 9, 2017 at 9:59 AM, Boris Ostrovsky
> >>> >> >> >>> >  wrote:
> >>> >> >> >>> > > On 01/06/2017 08:06 PM, Konrad Rzeszutek Wilk wrote:
> >>> >> >> >>> > >> On Thu, Jan 05, 2017 at 02:28:56PM -0500, Dan Streetman 
> >>> >> >> >>> > >> wrote:
> >>> >> >> >>> > >>> Do not read a pci device's msi message data to see if a 
> >>> >> >> >>> > >>> pirq was
> >>> >> >> >>> > >>> previously configured for the device's msi/msix, as the 
> >>> >> >> >>> > >>> old pirq was
> >>> >> >> >>> > >>> unmapped and may now be in use by another pci device.  
> >>> >> >> >>> > >>> The previous
> >>> >> >> >>> > >>> pirq should never be re-used; instead a new pirq should 
> >>> >> >> >>> > >>> always be
> >>> >> >> >>> > >>> allocated from the hypervisor.
> >>> >> >> >>> > >> Won't this cause a starvation problem? That is we will run 
> >>> >> >> >>> > >> out of PIRQs
> >>> >> >> >>> > >> as we are not reusing them?
> >>> >> >> >>> > >
> >>> >> >> >>> > > Don't we free the pirq when we unmap it?
> >>> >> >> >>> >
> >>> >> >> >>> > I think this is actually a bit worse than I initially 
> >>> >> >> >>> > thought.  After
> >>> >> >> >>> > looking a bit closer, and I think there's an asymmetry with 
> >>> >> >> >>> > pirq
> >>> >> >> >>> > allocation:
> >>> >> >> >>>
> >>> >> >> >>> Lets include Stefano,
> >>> >> >> >>>
> >>> >> >> >>> Thank you for digging in this! This has quite the deja-vu
> >>> >> >> >>> feeling as I believe I hit this at some point in the past and
> >>> >> >> >>> posted some possible ways of fixing this. But sadly I can't
> >>> >> >> >>> find the thread.
> >>> >> >> >>
> >>> >> >> >> This issue seems to be caused by:
> >>> >> >> >>
> >>> >> >> >> commit af42b8d12f8adec6711cb824549a0edac6a4ae8f
> >>> >> >> >> Author: Stefano Stabellini 
> >>> >> >> >> Date:   Wed Dec 1 14:51:44 2010 +
> >>> >> >> >>
> >>> >> >> >> xen: fix MSI setup and teardown for PV on HVM guests
> >>> >> >> >>
> >>> >> >> >> which was a fix to a bug:
> >>> >> >> >>
> >>> >> >> >> This fixes a bug in xen_hvm_setup_msi_irqs that manifests 
> >>> >> >> >> itself when
> >>> >> >> >> trying to enable the same MSI for the second time: the old 
> >>> >> >> >> MSI to pirq
> >>> >> >> >> mapping is still valid at this point but 
> >>> >> >> >> xen_hvm_setup_msi_irqs would
> >>> >> >> >> try to assign a new pirq anyway.
> >>> >> >> >> A simple way to reproduce this bug is to assign an MSI 
> >>> >> >> >> capable network
> >>> >> >> >> card to a PV on HVM guest, if the user brings down the 
> >>> >> >> >> corresponding
> >>> >> >> >> ethernet interface and up again, Linux would fail to enable 
> >>> >> >> >> MSIs on the
> >>> >> >> >> device.
> >>> >> >> >>
> >>> >> >> >> I don't remember any of the details. From the description of 
> >>> >> >> >> this bug,
> >>> >> >> >> it seems that Xen changed behavior in the past few years: before 
> >>> >> >> >> it used
> >>> >> >> >> to keep the pirq-MSI mapping, while today it doesn't. If I wrote 
> >>> >> >> >> "the
> >>> >> >> >> old MSI to pirq mapping is still valid at this point", the pirq 
> >>> >> >> >> couldn't
> >>> >> >> >> have been completely freed, then reassigned to somebody else the 
> >>> >> >> >> way it
> >>> >> >> >> is described in this email.
> >>> >> >> >>
> >>> >> >> >> I think we should indentify the changeset or Xen version that 
> >>> >> >> >> introduced
> >>> >> >> >> the new behavior. If it is old enough, we might be able to just 
> >>> >> >> >> revert
> >>> >> >> >> af42b8d12f8adec6711cb824549a0edac6a4ae8f. Otherwise we could 
> >>> >> >> >> make the
> >>> >> >> >> behavior conditional to the Xen version.
> >>> >> >> >
> >>> >> >> > Are PT devices the only MSI-capable devices available in a Xen 
> >>> >> >> > guest?
> >>> >> >> > That's where I'm seeing this problem, with PT NVMe devices.
> >>> >> >
> >>> >> > They are the main ones. It is possible to have emulated virtio 
> >>> >> > devices
> >>> >> > with emulated MSIs, for example virtio-net. Althought they

Re: [Xen-devel] [PATCH] xen: do not re-use pirq number cached in pci device msi msg data

2017-01-13 Thread Stefano Stabellini

On Fri, 13 Jan 2017, Dan Streetman wrote:
> On Wed, Jan 11, 2017 at 6:25 PM, Dan Streetman  wrote:
> > On Wed, Jan 11, 2017 at 1:46 PM, Stefano Stabellini
> >  wrote:
> >> On Wed, 11 Jan 2017, Dan Streetman wrote:
> >>> On Tue, Jan 10, 2017 at 8:25 PM, Stefano Stabellini
> >>>  wrote:
> >>> > On Tue, 10 Jan 2017, Dan Streetman wrote:
> >>> >> On Tue, Jan 10, 2017 at 2:03 PM, Stefano Stabellini
> >>> >>  wrote:
> >>> >> > On Tue, 10 Jan 2017, Dan Streetman wrote:
> >>> >> >> On Tue, Jan 10, 2017 at 10:57 AM, Dan Streetman  
> >>> >> >> wrote:
> >>> >> >> > On Mon, Jan 9, 2017 at 2:30 PM, Stefano Stabellini
> >>> >> >> >  wrote:
> >>> >> >> >> On Mon, 9 Jan 2017, Konrad Rzeszutek Wilk wrote:
> >>> >> >> >>> On Mon, Jan 09, 2017 at 10:42:41AM -0500, Dan Streetman wrote:
> >>> >> >> >>> > On Mon, Jan 9, 2017 at 9:59 AM, Boris Ostrovsky
> >>> >> >> >>> >  wrote:
> >>> >> >> >>> > > On 01/06/2017 08:06 PM, Konrad Rzeszutek Wilk wrote:
> >>> >> >> >>> > >> On Thu, Jan 05, 2017 at 02:28:56PM -0500, Dan Streetman 
> >>> >> >> >>> > >> wrote:
> >>> >> >> >>> > >>> Do not read a pci device's msi message data to see if a 
> >>> >> >> >>> > >>> pirq was
> >>> >> >> >>> > >>> previously configured for the device's msi/msix, as the 
> >>> >> >> >>> > >>> old pirq was
> >>> >> >> >>> > >>> unmapped and may now be in use by another pci device.  
> >>> >> >> >>> > >>> The previous
> >>> >> >> >>> > >>> pirq should never be re-used; instead a new pirq should 
> >>> >> >> >>> > >>> always be
> >>> >> >> >>> > >>> allocated from the hypervisor.
> >>> >> >> >>> > >> Won't this cause a starvation problem? That is we will run 
> >>> >> >> >>> > >> out of PIRQs
> >>> >> >> >>> > >> as we are not reusing them?
> >>> >> >> >>> > >
> >>> >> >> >>> > > Don't we free the pirq when we unmap it?
> >>> >> >> >>> >
> >>> >> >> >>> > I think this is actually a bit worse than I initially 
> >>> >> >> >>> > thought.  After
> >>> >> >> >>> > looking a bit closer, and I think there's an asymmetry with 
> >>> >> >> >>> > pirq
> >>> >> >> >>> > allocation:
> >>> >> >> >>>
> >>> >> >> >>> Lets include Stefano,
> >>> >> >> >>>
> >>> >> >> >>> Thank you for digging in this! This has quite the deja-vu
> >>> >> >> >>> feeling as I believe I hit this at some point in the past and
> >>> >> >> >>> posted some possible ways of fixing this. But sadly I can't
> >>> >> >> >>> find the thread.
> >>> >> >> >>
> >>> >> >> >> This issue seems to be caused by:
> >>> >> >> >>
> >>> >> >> >> commit af42b8d12f8adec6711cb824549a0edac6a4ae8f
> >>> >> >> >> Author: Stefano Stabellini 
> >>> >> >> >> Date:   Wed Dec 1 14:51:44 2010 +
> >>> >> >> >>
> >>> >> >> >> xen: fix MSI setup and teardown for PV on HVM guests
> >>> >> >> >>
> >>> >> >> >> which was a fix to a bug:
> >>> >> >> >>
> >>> >> >> >> This fixes a bug in xen_hvm_setup_msi_irqs that manifests 
> >>> >> >> >> itself when
> >>> >> >> >> trying to enable the same MSI for the second time: the old 
> >>> >> >> >> MSI to pirq
> >>> >> >> >> mapping is still valid at this point but 
> >>> >> >> >> xen_hvm_setup_msi_irqs would
> >>> >> >> >> try to assign a new pirq anyway.
> >>> >> >> >> A simple way to reproduce this bug is to assign an MSI 
> >>> >> >> >> capable network
> >>> >> >> >> card to a PV on HVM guest, if the user brings down the 
> >>> >> >> >> corresponding
> >>> >> >> >> ethernet interface and up again, Linux would fail to enable 
> >>> >> >> >> MSIs on the
> >>> >> >> >> device.
> >>> >> >> >>
> >>> >> >> >> I don't remember any of the details. From the description of 
> >>> >> >> >> this bug,
> >>> >> >> >> it seems that Xen changed behavior in the past few years: before 
> >>> >> >> >> it used
> >>> >> >> >> to keep the pirq-MSI mapping, while today it doesn't. If I wrote 
> >>> >> >> >> "the
> >>> >> >> >> old MSI to pirq mapping is still valid at this point", the pirq 
> >>> >> >> >> couldn't
> >>> >> >> >> have been completely freed, then reassigned to somebody else the 
> >>> >> >> >> way it
> >>> >> >> >> is described in this email.
> >>> >> >> >>
> >>> >> >> >> I think we should indentify the changeset or Xen version that 
> >>> >> >> >> introduced
> >>> >> >> >> the new behavior. If it is old enough, we might be able to just 
> >>> >> >> >> revert
> >>> >> >> >> af42b8d12f8adec6711cb824549a0edac6a4ae8f. Otherwise we could 
> >>> >> >> >> make the
> >>> >> >> >> behavior conditional to the Xen version.
> >>> >> >> >
> >>> >> >> > Are PT devices the only MSI-capable devices available in a Xen 
> >>> >> >> > guest?
> >>> >> >> > That's where I'm seeing this problem, with PT NVMe devices.
> >>> >> >
> >>> >> > They are the main ones. It is possible to have emulated virtio 
> >>> >> > devices
> >>> >> > with emulated MSIs, for example virtio-net. Althought they are not in
> >>> >> > the Xen Project CI-loop, so I wouldn't be surprised if they are 
> >>> >> > broken
> >>> >> > too.
> >>> >> >
> >>> >> >
> >>> >> >> > I can say that on the Xen guest with

Re: [PATCH] flow disector: check if arp_eth is null rather than arp

2017-01-13 Thread Eric Dumazet

On Fri, 2017-01-13 at 18:25 +, Colin Ian King wrote:
> On 13/01/17 18:24, Eric Dumazet wrote:

> > It looks that we try very hard to add critical bugs in flow dissector.
> > 
> > This is embarrassing really.
> > 
> > I am questioning if the __skb_header_pointer() is correct
> > 
> > Why using hlen - sizeof(_arp) ?
> > 
> >arp_eth = __skb_header_pointer(skb, nhoff + sizeof(_arp),
> >   sizeof(_arp_eth), data,
> >   hlen - sizeof(_arp),
> >   &_arp_eth);
> > 
> 
> Yep, the sizeof maybe dubious too, I overlooked that one; if somebody
> can clarify that then I'll send a V2 if it needs fixing up too.

I am pretty sure we should use hlen instead of (hlen - sizeof(_arp))

A V2 would be nice ;)

Re: [PATCH] flow disector: check if arp_eth is null rather than arp

2017-01-13 Thread Eric Dumazet

On Fri, 2017-01-13 at 18:25 +, Colin Ian King wrote:
> On 13/01/17 18:24, Eric Dumazet wrote:

> > It looks that we try very hard to add critical bugs in flow dissector.
> > 
> > This is embarrassing really.
> > 
> > I am questioning if the __skb_header_pointer() is correct
> > 
> > Why using hlen - sizeof(_arp) ?
> > 
> >arp_eth = __skb_header_pointer(skb, nhoff + sizeof(_arp),
> >   sizeof(_arp_eth), data,
> >   hlen - sizeof(_arp),
> >   &_arp_eth);
> > 
> 
> Yep, the sizeof maybe dubious too, I overlooked that one; if somebody
> can clarify that then I'll send a V2 if it needs fixing up too.

I am pretty sure we should use hlen instead of (hlen - sizeof(_arp))

A V2 would be nice ;)

Re: [PATCH] flow disector: check if arp_eth is null rather than arp

2017-01-13 Thread Eric Dumazet

On Fri, 2017-01-13 at 13:34 +, Colin King wrote:
> From: Colin Ian King 
> 
> arp is being checked instead of arp_eth to see if the call to
> __skb_header_pointer failed. Fix this by checking arp_eth is
> null instead of arp.
> 
> CoverityScan CID#1396428 ("Logically dead code") on 2nd
> arp comparison (which should be arp_eth instead).
> 
> Fixes: commit 55733350e5e8b70c5 ("flow disector: ARP support")
> Signed-off-by: Colin Ian King 
> ---
>  net/core/flow_dissector.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index e3dffc7..fec48e9 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -409,7 +409,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
>  sizeof(_arp_eth), data,
>  hlen - sizeof(_arp),
>  &_arp_eth);
> - if (!arp)
> + if (!arp_eth)
>   goto out_bad;
>  
>   if (dissector_uses_key(flow_dissector,

It looks that we try very hard to add critical bugs in flow dissector.

This is embarrassing really.

I am questioning if the __skb_header_pointer() is correct

Why using hlen - sizeof(_arp) ?

   arp_eth = __skb_header_pointer(skb, nhoff + sizeof(_arp),
  sizeof(_arp_eth), data,
  hlen - sizeof(_arp),
  &_arp_eth);

Re: [PATCH] flow disector: check if arp_eth is null rather than arp

2017-01-13 Thread Eric Dumazet

On Fri, 2017-01-13 at 13:34 +, Colin King wrote:
> From: Colin Ian King 
> 
> arp is being checked instead of arp_eth to see if the call to
> __skb_header_pointer failed. Fix this by checking arp_eth is
> null instead of arp.
> 
> CoverityScan CID#1396428 ("Logically dead code") on 2nd
> arp comparison (which should be arp_eth instead).
> 
> Fixes: commit 55733350e5e8b70c5 ("flow disector: ARP support")
> Signed-off-by: Colin Ian King 
> ---
>  net/core/flow_dissector.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index e3dffc7..fec48e9 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -409,7 +409,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
>  sizeof(_arp_eth), data,
>  hlen - sizeof(_arp),
>  &_arp_eth);
> - if (!arp)
> + if (!arp_eth)
>   goto out_bad;
>  
>   if (dissector_uses_key(flow_dissector,

It looks that we try very hard to add critical bugs in flow dissector.

This is embarrassing really.

I am questioning if the __skb_header_pointer() is correct

Why using hlen - sizeof(_arp) ?

   arp_eth = __skb_header_pointer(skb, nhoff + sizeof(_arp),
  sizeof(_arp_eth), data,
  hlen - sizeof(_arp),
  &_arp_eth);

[PATCH] misc: mic: constify virtio_config_ops structures

2017-01-13 Thread Bhumika Goyal

Declare virtio_config_ops structure as const as it is only stored in the
config field of a virtio_device structure. This field is of type const, so
virtio_config_ops structures having this property can be declared const.
Done using Coccinelle:

@r1 disable optional_qualifier@
identifier i;
position p;
@@
static struct virtio_config_ops i@p={...};

@ok1@
identifier r1.i;
position p;
struct _vop_vdev x;
@@
x.vdev.config=@p

@bad@
position p!={r1.p,ok1.p};
identifier r1.i;
@@
i@p

@depends on !bad disable optional_qualifier@
identifier r1.i;
@@
+const
struct virtio_config_ops i;

File size before: drivers/misc/mic/vop/vop_main.o
   textdata bss dec hex filename
   4588 256   0484412ec drivers/misc/mic/vop/vop_main.o

File size after: drivers/misc/mic/vop/vop_main.o
   textdata bss dec hex filename
   4704 160   048641300 drivers/misc/mic/vop/vop_main.o

Signed-off-by: Bhumika Goyal 
---
 drivers/misc/mic/vop/vop_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/mic/vop/vop_main.c b/drivers/misc/mic/vop/vop_main.c
index 1a2b67f3..3b7c8ea 100644
--- a/drivers/misc/mic/vop/vop_main.c
+++ b/drivers/misc/mic/vop/vop_main.c
@@ -422,7 +422,7 @@ static int vop_find_vqs(struct virtio_device *dev, unsigned 
nvqs,
 /*
  * The config ops structure as defined by virtio config
  */
-static struct virtio_config_ops vop_vq_config_ops = {
+static const struct virtio_config_ops vop_vq_config_ops = {
.get_features = vop_get_features,
.finalize_features = vop_finalize_features,
.get = vop_get,
-- 
1.9.1

[PATCH] misc: mic: constify virtio_config_ops structures

2017-01-13 Thread Bhumika Goyal

Declare virtio_config_ops structure as const as it is only stored in the
config field of a virtio_device structure. This field is of type const, so
virtio_config_ops structures having this property can be declared const.
Done using Coccinelle:

@r1 disable optional_qualifier@
identifier i;
position p;
@@
static struct virtio_config_ops i@p={...};

@ok1@
identifier r1.i;
position p;
struct _vop_vdev x;
@@
x.vdev.config=@p

@bad@
position p!={r1.p,ok1.p};
identifier r1.i;
@@
i@p

@depends on !bad disable optional_qualifier@
identifier r1.i;
@@
+const
struct virtio_config_ops i;

File size before: drivers/misc/mic/vop/vop_main.o
   textdata bss dec hex filename
   4588 256   0484412ec drivers/misc/mic/vop/vop_main.o

File size after: drivers/misc/mic/vop/vop_main.o
   textdata bss dec hex filename
   4704 160   048641300 drivers/misc/mic/vop/vop_main.o

Signed-off-by: Bhumika Goyal 
---
 drivers/misc/mic/vop/vop_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/mic/vop/vop_main.c b/drivers/misc/mic/vop/vop_main.c
index 1a2b67f3..3b7c8ea 100644
--- a/drivers/misc/mic/vop/vop_main.c
+++ b/drivers/misc/mic/vop/vop_main.c
@@ -422,7 +422,7 @@ static int vop_find_vqs(struct virtio_device *dev, unsigned 
nvqs,
 /*
  * The config ops structure as defined by virtio config
  */
-static struct virtio_config_ops vop_vq_config_ops = {
+static const struct virtio_config_ops vop_vq_config_ops = {
.get_features = vop_get_features,
.finalize_features = vop_finalize_features,
.get = vop_get,
-- 
1.9.1

RE: [PATCH v2 00/26] IB: Optimize DMA mapping

2017-01-13 Thread Estrin, Alex

Hi Bart,

This series applied to 4.10-rc3 breaks hfi1 sdma engines.
[   34.712343] hfi1 :82:00.0: hfi1_0: SDMA (0) engine error: 0x21 state 
s50_HwHaltWait
[   34.722752] hfi1 :82:00.0: hfi1_0: SDMA (0) descq_head: 0 descq_tail: 3 
freecnt: 2044 FLE 0
[   34.733933] hfi1 :82:00.0: hfi1_0: SDMA sdmadesc[0]: flags:--F- 
addr:0x8808248c8000 gen:0 len:36 bytes
[   34.746595] hfi1 :82:00.0: hfi1_0:   desc0:0x80248808248c8000 desc1 
0x
[   34.757218] hfi1 :82:00.0: hfi1_0:   aidx: 0 amode: 0 alen: 0
[   34.765448] hfi1 :82:00.0: hfi1_0: SDMA sdmadesc[1]: flags: 
addr:0x880829fd5000 gen:0 len:24 bytes
[   34.778184] hfi1 :82:00.0: hfi1_0:   desc0:0x0018880829fd5000 desc1 
0x
[   34.788941] hfi1 :82:00.0: hfi1_0: SDMA sdmadesc[2]: flags:-H-L 
addr:0x880829fd5018 gen:0 len:440 bytes
[   34.801849] hfi1 :82:00.0: hfi1_0:   desc0:0x41b8880829fd5018 desc1 
0x0001
...etc

BTW, what was the base for this series, as 19/26 did not apply cleanly to 
4.10-rc3?

Thanks,
Alex.

> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Bart Van Assche
> Sent: Thursday, January 12, 2017 2:07 PM
> To: Doug Ledford 
> Cc: linux-r...@vger.kernel.org; linux-kernel@vger.kernel.org; Greg 
> Kroah-Hartman
> ; Bart Van Assche 
> Subject: [PATCH v2 00/26] IB: Optimize DMA mapping
> 
> Hello Doug,
> 
> As you know there are two sets of DMA mapping operations in the Linux
> kernel:
> - One set of DMA mapping operations that is used by most drivers.
> - Another set of DMA mapping operations that is only used by the RDMA
>   drivers.
> Having two sets of DMA mapping operations is not only a source of
> confusion but also a source of unnecessary overhead. The DMA mapping
> operations are in the hot path so it is important that the overhead
> of these operations is as low as possible. Hence this patch series
> that converts the RDMA code to the standard DMA mapping API and
> thereby eliminates the if (dev->dma_ops) test from the hot path. An
> additional benefit is that the size of HW and SW drivers that do not
> use DMA is reduced by switching to dma_virt_ops.
> 
> The changes compared to version 1 of this patch series are:
> - Patch "Move dma_ops from archdata into struct device" has been
>   split into three patches.
> - Patch "treewide: Inline ib_dma_map_*() functions" has been split
>   into 15 patches (one per driver).
> - A patch has been added that builds dma_noop_ops only for the
>   architectures that need it.
> - The new dma_virt_ops is only built if it is used by a driver.
> - In these last 15 patches indentation has been adjusted to keep
>   the arguments aligned with the opening parenthesis.
> 
> Bart Van Assche (26):
>   treewide: Constify most dma_map_ops structures
>   treewide: Move dma_ops from struct dev_archdata into struct device
>   treewide: Consolidate set_dma_ops() implementations
>   treewide: Consolidate get_dma_ops() implementations
>   lib/dma-noop: Clarify a comment
>   lib/dma-noop: Only build dma_noop_ops for m32r and s390
>   lib/dma-virt: Add dma_virt_ops
>   IB/hf1: Remove DMA mapping code
>   IB/qib: Remove DMA mapping code
>   IB: Use dma_virt_ops instead of duplicating it
>   RDS: IB: Remove an unused structure member
>   IB: Convert ib_dma_*_coherent() argument type from u64 into dma_addr_t
>   IB/core: Inline ib_dma_map_*() functions
>   IB/mlx4: Inline ib_dma_map_*() functions
>   IB/mlx5: Inline ib_dma_map_*() functions
>   IB/IPoIB: Inline ib_dma_map_*() functions
>   IB/iser: Inline ib_dma_map_*() functions
>   IB/isert: Inline ib_dma_map_*() functions
>   IB/srp: Inline ib_dma_map_*() functions
>   IB/srpt: Inline ib_dma_map_*() functions
>   staging/lustre: Inline ib_dma_map_*() functions
>   nvme-rdma: Inline ib_dma_map_*() functions
>   net/9p: Inline ib_dma_map_*() functions
>   net/rds: Inline ib_dma_map_*() functions
>   xprtrdma: Inline ib_dma_map_*() functions
>   IB/core: Remove ib_dma_map_*() functions
> 
>  arch/alpha/include/asm/dma-mapping.h   |   4 +-
>  arch/alpha/kernel/pci-noop.c   |   4 +-
>  arch/alpha/kernel/pci_iommu.c  |   4 +-
>  arch/arc/include/asm/dma-mapping.h |   4 +-
>  arch/arc/mm/dma.c  |   2 +-
>  arch/arm/common/dmabounce.c|   2 +-
>  arch/arm/include/asm/device.h  |   1 -
>  arch/arm/include/asm/dma-mapping.h |  20 +-
>  arch/arm/mm/dma-mapping.c  |  22 +-
>  arch/arm/xen/mm.c  |   4 +-
>  arch/arm64/include/asm/device.h|   1 -
>  arch/arm64/include/asm/dma-mapping.h   |  12 +-
>  arch/arm64/mm/dma-mapping.c|  14 +-
>

RE: [PATCH v2 00/26] IB: Optimize DMA mapping

2017-01-13 Thread Estrin, Alex

Hi Bart,

This series applied to 4.10-rc3 breaks hfi1 sdma engines.
[   34.712343] hfi1 :82:00.0: hfi1_0: SDMA (0) engine error: 0x21 state 
s50_HwHaltWait
[   34.722752] hfi1 :82:00.0: hfi1_0: SDMA (0) descq_head: 0 descq_tail: 3 
freecnt: 2044 FLE 0
[   34.733933] hfi1 :82:00.0: hfi1_0: SDMA sdmadesc[0]: flags:--F- 
addr:0x8808248c8000 gen:0 len:36 bytes
[   34.746595] hfi1 :82:00.0: hfi1_0:   desc0:0x80248808248c8000 desc1 
0x
[   34.757218] hfi1 :82:00.0: hfi1_0:   aidx: 0 amode: 0 alen: 0
[   34.765448] hfi1 :82:00.0: hfi1_0: SDMA sdmadesc[1]: flags: 
addr:0x880829fd5000 gen:0 len:24 bytes
[   34.778184] hfi1 :82:00.0: hfi1_0:   desc0:0x0018880829fd5000 desc1 
0x
[   34.788941] hfi1 :82:00.0: hfi1_0: SDMA sdmadesc[2]: flags:-H-L 
addr:0x880829fd5018 gen:0 len:440 bytes
[   34.801849] hfi1 :82:00.0: hfi1_0:   desc0:0x41b8880829fd5018 desc1 
0x0001
...etc

BTW, what was the base for this series, as 19/26 did not apply cleanly to 
4.10-rc3?

Thanks,
Alex.

> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Bart Van Assche
> Sent: Thursday, January 12, 2017 2:07 PM
> To: Doug Ledford 
> Cc: linux-r...@vger.kernel.org; linux-kernel@vger.kernel.org; Greg 
> Kroah-Hartman
> ; Bart Van Assche 
> Subject: [PATCH v2 00/26] IB: Optimize DMA mapping
> 
> Hello Doug,
> 
> As you know there are two sets of DMA mapping operations in the Linux
> kernel:
> - One set of DMA mapping operations that is used by most drivers.
> - Another set of DMA mapping operations that is only used by the RDMA
>   drivers.
> Having two sets of DMA mapping operations is not only a source of
> confusion but also a source of unnecessary overhead. The DMA mapping
> operations are in the hot path so it is important that the overhead
> of these operations is as low as possible. Hence this patch series
> that converts the RDMA code to the standard DMA mapping API and
> thereby eliminates the if (dev->dma_ops) test from the hot path. An
> additional benefit is that the size of HW and SW drivers that do not
> use DMA is reduced by switching to dma_virt_ops.
> 
> The changes compared to version 1 of this patch series are:
> - Patch "Move dma_ops from archdata into struct device" has been
>   split into three patches.
> - Patch "treewide: Inline ib_dma_map_*() functions" has been split
>   into 15 patches (one per driver).
> - A patch has been added that builds dma_noop_ops only for the
>   architectures that need it.
> - The new dma_virt_ops is only built if it is used by a driver.
> - In these last 15 patches indentation has been adjusted to keep
>   the arguments aligned with the opening parenthesis.
> 
> Bart Van Assche (26):
>   treewide: Constify most dma_map_ops structures
>   treewide: Move dma_ops from struct dev_archdata into struct device
>   treewide: Consolidate set_dma_ops() implementations
>   treewide: Consolidate get_dma_ops() implementations
>   lib/dma-noop: Clarify a comment
>   lib/dma-noop: Only build dma_noop_ops for m32r and s390
>   lib/dma-virt: Add dma_virt_ops
>   IB/hf1: Remove DMA mapping code
>   IB/qib: Remove DMA mapping code
>   IB: Use dma_virt_ops instead of duplicating it
>   RDS: IB: Remove an unused structure member
>   IB: Convert ib_dma_*_coherent() argument type from u64 into dma_addr_t
>   IB/core: Inline ib_dma_map_*() functions
>   IB/mlx4: Inline ib_dma_map_*() functions
>   IB/mlx5: Inline ib_dma_map_*() functions
>   IB/IPoIB: Inline ib_dma_map_*() functions
>   IB/iser: Inline ib_dma_map_*() functions
>   IB/isert: Inline ib_dma_map_*() functions
>   IB/srp: Inline ib_dma_map_*() functions
>   IB/srpt: Inline ib_dma_map_*() functions
>   staging/lustre: Inline ib_dma_map_*() functions
>   nvme-rdma: Inline ib_dma_map_*() functions
>   net/9p: Inline ib_dma_map_*() functions
>   net/rds: Inline ib_dma_map_*() functions
>   xprtrdma: Inline ib_dma_map_*() functions
>   IB/core: Remove ib_dma_map_*() functions
> 
>  arch/alpha/include/asm/dma-mapping.h   |   4 +-
>  arch/alpha/kernel/pci-noop.c   |   4 +-
>  arch/alpha/kernel/pci_iommu.c  |   4 +-
>  arch/arc/include/asm/dma-mapping.h |   4 +-
>  arch/arc/mm/dma.c  |   2 +-
>  arch/arm/common/dmabounce.c|   2 +-
>  arch/arm/include/asm/device.h  |   1 -
>  arch/arm/include/asm/dma-mapping.h |  20 +-
>  arch/arm/mm/dma-mapping.c  |  22 +-
>  arch/arm/xen/mm.c  |   4 +-
>  arch/arm64/include/asm/device.h|   1 -
>  arch/arm64/include/asm/dma-mapping.h   |  12 +-
>  arch/arm64/mm/dma-mapping.c|  14 +-
>  arch/avr32/include/asm/dma-mapping.h   |   4 +-
>  arch/avr32/mm/dma-coherent.c

Re: [PATCH v2] partially revert "xen: Remove event channel notification through Xen PCI platform device"

2017-01-13 Thread Stefano Stabellini

On Fri, 13 Jan 2017, Boris Ostrovsky wrote:
> On 01/12/2017 04:39 PM, Stefano Stabellini wrote:
> > The following commit:
> >
> > commit 72a9b186292d98494f26cfd24a1621796209
> > Author: KarimAllah Ahmed 
> > Date:   Fri Aug 26 23:55:36 2016 +0200
> >
> > xen: Remove event channel notification through Xen PCI platform device
> >
> > broke Linux when booting as Dom0 on Xen in a nested Xen environment (Xen
> > installed inside a Xen VM). In this scenario, Linux is a PV guest, but
> > at the same time it uses the platform-pci driver to receive
> > notifications from L0 Xen. vector callbacks are not available because L1
> > Xen doesn't allow them.
> >
> > Partially revert the offending commit, by restoring IRQ based
> > notifications for PV guests only. I restored only the code which is
> > strictly needed and replaced the xen_have_vector_callback checks within
> > it with xen_pv_domain() checks.
> >
> > Signed-off-by: Stefano Stabellini 
> 
> Reviewed-by: Boris Ostrovsky 

Applied to xentip/for-linus-4.10, improving the commit message as
suggested before.

Do you plan on sending out another pull request for 4.10?

Re: [PATCH v2] partially revert "xen: Remove event channel notification through Xen PCI platform device"

2017-01-13 Thread Stefano Stabellini

On Fri, 13 Jan 2017, Boris Ostrovsky wrote:
> On 01/12/2017 04:39 PM, Stefano Stabellini wrote:
> > The following commit:
> >
> > commit 72a9b186292d98494f26cfd24a1621796209
> > Author: KarimAllah Ahmed 
> > Date:   Fri Aug 26 23:55:36 2016 +0200
> >
> > xen: Remove event channel notification through Xen PCI platform device
> >
> > broke Linux when booting as Dom0 on Xen in a nested Xen environment (Xen
> > installed inside a Xen VM). In this scenario, Linux is a PV guest, but
> > at the same time it uses the platform-pci driver to receive
> > notifications from L0 Xen. vector callbacks are not available because L1
> > Xen doesn't allow them.
> >
> > Partially revert the offending commit, by restoring IRQ based
> > notifications for PV guests only. I restored only the code which is
> > strictly needed and replaced the xen_have_vector_callback checks within
> > it with xen_pv_domain() checks.
> >
> > Signed-off-by: Stefano Stabellini 
> 
> Reviewed-by: Boris Ostrovsky 

Applied to xentip/for-linus-4.10, improving the commit message as
suggested before.

Do you plan on sending out another pull request for 4.10?

Re: [PATCH] flow disector: check if arp_eth is null rather than arp

2017-01-13 Thread Colin Ian King

On 13/01/17 18:24, Eric Dumazet wrote:
> On Fri, 2017-01-13 at 13:34 +, Colin King wrote:
>> From: Colin Ian King 
>>
>> arp is being checked instead of arp_eth to see if the call to
>> __skb_header_pointer failed. Fix this by checking arp_eth is
>> null instead of arp.
>>
>> CoverityScan CID#1396428 ("Logically dead code") on 2nd
>> arp comparison (which should be arp_eth instead).
>>
>> Fixes: commit 55733350e5e8b70c5 ("flow disector: ARP support")
>> Signed-off-by: Colin Ian King 
>> ---
>>  net/core/flow_dissector.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
>> index e3dffc7..fec48e9 100644
>> --- a/net/core/flow_dissector.c
>> +++ b/net/core/flow_dissector.c
>> @@ -409,7 +409,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
>> sizeof(_arp_eth), data,
>> hlen - sizeof(_arp),
>> &_arp_eth);
>> -if (!arp)
>> +if (!arp_eth)
>>  goto out_bad;
>>  
>>  if (dissector_uses_key(flow_dissector,
> 
> It looks that we try very hard to add critical bugs in flow dissector.
> 
> This is embarrassing really.
> 
> I am questioning if the __skb_header_pointer() is correct
> 
> Why using hlen - sizeof(_arp) ?
> 
>arp_eth = __skb_header_pointer(skb, nhoff + sizeof(_arp),
>   sizeof(_arp_eth), data,
>   hlen - sizeof(_arp),
>   &_arp_eth);
> 

Yep, the sizeof maybe dubious too, I overlooked that one; if somebody
can clarify that then I'll send a V2 if it needs fixing up too.

Colin

Re: [PATCH] flow disector: check if arp_eth is null rather than arp

2017-01-13 Thread Colin Ian King

On 13/01/17 18:24, Eric Dumazet wrote:
> On Fri, 2017-01-13 at 13:34 +, Colin King wrote:
>> From: Colin Ian King 
>>
>> arp is being checked instead of arp_eth to see if the call to
>> __skb_header_pointer failed. Fix this by checking arp_eth is
>> null instead of arp.
>>
>> CoverityScan CID#1396428 ("Logically dead code") on 2nd
>> arp comparison (which should be arp_eth instead).
>>
>> Fixes: commit 55733350e5e8b70c5 ("flow disector: ARP support")
>> Signed-off-by: Colin Ian King 
>> ---
>>  net/core/flow_dissector.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
>> index e3dffc7..fec48e9 100644
>> --- a/net/core/flow_dissector.c
>> +++ b/net/core/flow_dissector.c
>> @@ -409,7 +409,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
>> sizeof(_arp_eth), data,
>> hlen - sizeof(_arp),
>> &_arp_eth);
>> -if (!arp)
>> +if (!arp_eth)
>>  goto out_bad;
>>  
>>  if (dissector_uses_key(flow_dissector,
> 
> It looks that we try very hard to add critical bugs in flow dissector.
> 
> This is embarrassing really.
> 
> I am questioning if the __skb_header_pointer() is correct
> 
> Why using hlen - sizeof(_arp) ?
> 
>arp_eth = __skb_header_pointer(skb, nhoff + sizeof(_arp),
>   sizeof(_arp_eth), data,
>   hlen - sizeof(_arp),
>   &_arp_eth);
> 

Yep, the sizeof maybe dubious too, I overlooked that one; if somebody
can clarify that then I'll send a V2 if it needs fixing up too.

Colin

[PATCH] perf script: Also allow forcing reading of non-root owned files by root

2017-01-13 Thread Yannick Brosseau

In 2059fc7a5a9e6677, perf report was added the option of forcing reading
of non-root owned symbol file.

This add the same behavior for perf script.

Reported-by: Mark Drayton 
Signed-off-by: Yannick Brosseau 
---
 tools/perf/builtin-script.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 2f3ff69fc4e7..c0783b4f7b6c 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2180,7 +2180,7 @@ int cmd_script(int argc, const char **argv, const char 
*prefix __maybe_unused)
"Show the mmap events"),
OPT_BOOLEAN('\0', "show-switch-events", _switch_events,
"Show context switch events (if recorded)"),
-   OPT_BOOLEAN('f', "force", , "don't complain, do it"),
+   OPT_BOOLEAN('f', "force", _conf.force, "don't complain, do it"),
OPT_BOOLEAN(0, "ns", ,
"Use 9 decimal places when displaying time"),
OPT_CALLBACK_OPTARG(0, "itrace", _synth_opts, NULL, "opts",
@@ -2212,6 +2212,7 @@ int cmd_script(int argc, const char **argv, const char 
*prefix __maybe_unused)
 PARSE_OPT_STOP_AT_NON_OPTION);
 
file.path = input_name;
+   file.force = symbol_conf.force;
 
if (argc > 1 && !strncmp(argv[0], "rec", strlen("rec"))) {
rec_script_path = get_script_path(argv[1], RECORD_SUFFIX);
-- 
2.11.0

[PATCH] perf script: Also allow forcing reading of non-root owned files by root

2017-01-13 Thread Yannick Brosseau

In 2059fc7a5a9e6677, perf report was added the option of forcing reading
of non-root owned symbol file.

This add the same behavior for perf script.

Reported-by: Mark Drayton 
Signed-off-by: Yannick Brosseau 
---
 tools/perf/builtin-script.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 2f3ff69fc4e7..c0783b4f7b6c 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2180,7 +2180,7 @@ int cmd_script(int argc, const char **argv, const char 
*prefix __maybe_unused)
"Show the mmap events"),
OPT_BOOLEAN('\0', "show-switch-events", _switch_events,
"Show context switch events (if recorded)"),
-   OPT_BOOLEAN('f', "force", , "don't complain, do it"),
+   OPT_BOOLEAN('f', "force", _conf.force, "don't complain, do it"),
OPT_BOOLEAN(0, "ns", ,
"Use 9 decimal places when displaying time"),
OPT_CALLBACK_OPTARG(0, "itrace", _synth_opts, NULL, "opts",
@@ -2212,6 +2212,7 @@ int cmd_script(int argc, const char **argv, const char 
*prefix __maybe_unused)
 PARSE_OPT_STOP_AT_NON_OPTION);
 
file.path = input_name;
+   file.force = symbol_conf.force;
 
if (argc > 1 && !strncmp(argv[0], "rec", strlen("rec"))) {
rec_script_path = get_script_path(argv[1], RECORD_SUFFIX);
-- 
2.11.0

[GIT PULL] arm64 fixes for 4.10-rc4

2017-01-13 Thread Catalin Marinas

Hi Linus,

Please pull the arm64 fixes below. Thanks.

The following changes since commit a121103c922847ba5010819a3f250f1f7fc84ab8:

  Linux 4.10-rc3 (2017-01-08 14:18:17 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux tags/arm64-fixes

for you to fetch changes up to 41c066f2c4d436c535616fe182331766c57838f0:

  arm64: assembler: make adr_l work in modules under KASLR (2017-01-12 18:10:52 
+)


- Fix huge_ptep_set_access_flags() to return "changed" when any of the
  ptes in the contiguous range is changed, not just the last one

- Fix the adr_l assembly macro to work in modules under KASLR


Ard Biesheuvel (1):
  arm64: assembler: make adr_l work in modules under KASLR

Huang Shijie (1):
  arm64: hugetlb: fix the wrong return value for huge_ptep_set_access_flags

 arch/arm64/include/asm/assembler.h | 36 +++-
 arch/arm64/mm/hugetlbpage.c|  2 +-
 2 files changed, 28 insertions(+), 10 deletions(-)

-- 
Catalin

[GIT PULL] arm64 fixes for 4.10-rc4

2017-01-13 Thread Catalin Marinas

Hi Linus,

Please pull the arm64 fixes below. Thanks.

The following changes since commit a121103c922847ba5010819a3f250f1f7fc84ab8:

  Linux 4.10-rc3 (2017-01-08 14:18:17 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux tags/arm64-fixes

for you to fetch changes up to 41c066f2c4d436c535616fe182331766c57838f0:

  arm64: assembler: make adr_l work in modules under KASLR (2017-01-12 18:10:52 
+)


- Fix huge_ptep_set_access_flags() to return "changed" when any of the
  ptes in the contiguous range is changed, not just the last one

- Fix the adr_l assembly macro to work in modules under KASLR


Ard Biesheuvel (1):
  arm64: assembler: make adr_l work in modules under KASLR

Huang Shijie (1):
  arm64: hugetlb: fix the wrong return value for huge_ptep_set_access_flags

 arch/arm64/include/asm/assembler.h | 36 +++-
 arch/arm64/mm/hugetlbpage.c|  2 +-
 2 files changed, 28 insertions(+), 10 deletions(-)

-- 
Catalin

Re: [PATCH v3] ARM: dts: Add LEGO MINDSTORMS EV3 dts

2017-01-13 Thread David Lechner


On 01/13/2017 12:16 PM, David Lechner wrote:

On 01/13/2017 06:04 AM, Sekhar Nori wrote:

On Friday 13 January 2017 02:02 AM, David Lechner wrote:

This adds a device tree definition file for LEGO MINDSTORMS EV3.


...


There are couple of checkpatch errors that show up. The compatible
"lego,ev3" needs to be documented in
Documentation/devicetree/bindings/arm/davinci.txt

"at24,24c128" is undocumented. Is that an atmel chip on the EV3? If the
manufacturer name is not clear,
Documentation/devicetree/bindings/eeprom/eeprom.txt advises using just
"24c128"


The mfg is microchip, so I guess I will just go with "24c128"


I think I have let the device tree bindings doc confuse me. On second 
though, I think the correct thing is to included the manufacturer.






Finally, lego needs to be added to
Documentation/devicetree/bindings/vendor-prefixes.txt

Can you please submit the documentation portions as separate patches in
a series along with this patch.


Yes.



Thanks,
Sekhar

Re: [PATCH v3] ARM: dts: Add LEGO MINDSTORMS EV3 dts

2017-01-13 Thread David Lechner


On 01/13/2017 12:16 PM, David Lechner wrote:

On 01/13/2017 06:04 AM, Sekhar Nori wrote:

On Friday 13 January 2017 02:02 AM, David Lechner wrote:

This adds a device tree definition file for LEGO MINDSTORMS EV3.


...


There are couple of checkpatch errors that show up. The compatible
"lego,ev3" needs to be documented in
Documentation/devicetree/bindings/arm/davinci.txt

"at24,24c128" is undocumented. Is that an atmel chip on the EV3? If the
manufacturer name is not clear,
Documentation/devicetree/bindings/eeprom/eeprom.txt advises using just
"24c128"


The mfg is microchip, so I guess I will just go with "24c128"


I think I have let the device tree bindings doc confuse me. On second 
though, I think the correct thing is to included the manufacturer.






Finally, lego needs to be added to
Documentation/devicetree/bindings/vendor-prefixes.txt

Can you please submit the documentation portions as separate patches in
a series along with this patch.


Yes.



Thanks,
Sekhar

Re: [PATCH 0/3] tty: serial: 8250_omap: Enable DMA support

2017-01-13 Thread Tony Lindgren

* Vignesh R  [170113 00:03]:
> This patch series re enables DMA support for UART 8250_omap driver.
> 
> Tested on AM335x, AM437x that use EDMA and OMAP5 and DRA74 EVM with
> SDMA.

Is 8250_omap serial console working for you on omap5 in general?

I've noticed that it's really unresponsive for me as if the FIFO
interrupt was not working. For example logging in might take several
attempts and a long time with each character showing up much later
after some timeout.

TX on the uart on omap5 seems to work OK, I do see all the console
messages fine. And these patches do not make the issue better or
worse.

Regards,

Tony

Re: [PATCH 0/3] tty: serial: 8250_omap: Enable DMA support

2017-01-13 Thread Tony Lindgren

* Vignesh R  [170113 00:03]:
> This patch series re enables DMA support for UART 8250_omap driver.
> 
> Tested on AM335x, AM437x that use EDMA and OMAP5 and DRA74 EVM with
> SDMA.

Is 8250_omap serial console working for you on omap5 in general?

I've noticed that it's really unresponsive for me as if the FIFO
interrupt was not working. For example logging in might take several
attempts and a long time with each character showing up much later
after some timeout.

TX on the uart on omap5 seems to work OK, I do see all the console
messages fine. And these patches do not make the issue better or
worse.

Regards,

Tony

Re: [RFC PATCH v2 09/10] drivers/perf: Add support for ARMv8.2 Statistical Profiling Extension

2017-01-13 Thread Kim Phillips

On Fri, 13 Jan 2017 17:03:07 +
Will Deacon  wrote:

> On Fri, Jan 13, 2017 at 10:40:42AM -0600, Kim Phillips wrote:
> > On Fri, 13 Jan 2017 16:03:48 +
> > Will Deacon  wrote:
> > 
> > > +#define DRVNAME  "arm_spe_pmu"
> > 
> > PMU is implied.  "arm_spe"?
> 
> As stated before, I'm going for consistency here.

me too, but apparently under the user-visible interface domain rather
than the driver source path domain.

> Is it causing any
> real issues on the tooling side?

Intel has a consistent "intel_pt", "intel_bts", and 'pmu' occurs
nowhere in their nomenclature.

Whether good or bad, we currently have "cs_etm".  This patch now gives
us "arm_spe_pmu".  I'm just trying to save the suffix consistency for
now, esp. since IDK how amenable "cs_etm" is to change, and 'perf list'
calls things "PMU event"s anyway.

I think the root cause might be the device tree node's
"arm,arm-spe-pmu-v1" compatiblity string, which I also find
a bit self-redundant ("arm,arm-"), but I'm not familiar with what's
being denoted there either (e.g., if the latter 'arm-' is an arch
reference, then SPE's might be 'armv8'?).  The device tree node isn't
exposed to the user, however.

> > > + if (is_kernel_in_hyp_mode()) {
> > > + if (attr->exclude_kernel != attr->exclude_hv)
> > > + return -EOPNOTSUPP;
> > > + } else if (!attr->exclude_hv) {
> > > + return -EOPNOTSUPP;
> > > + }
> > > +
> > > + reg = arm_spe_event_to_pmsfcr(event);
> > > + if ((reg & BIT(PMSFCR_EL1_FE_SHIFT)) &&
> > > + !(spe_pmu->features & SPE_PMU_FEAT_FILT_EVT))
> > > + return -EOPNOTSUPP;
> > > +
> > > + if ((reg & BIT(PMSFCR_EL1_FT_SHIFT)) &&
> > > + !(spe_pmu->features & SPE_PMU_FEAT_FILT_TYP))
> > > + return -EOPNOTSUPP;
> > > +
> > > + if ((reg & BIT(PMSFCR_EL1_FL_SHIFT)) &&
> > > + !(spe_pmu->features & SPE_PMU_FEAT_FILT_LAT))
> > > + return -EOPNOTSUPP;
> > > +
> > > + return 0;
> > > +}
> > 
> > Please insert pr_* statements before blindly returning errors before a
> > better facility becomes available.
> 
> That was discussed in the thread I linked to last time:
> 
> https://lkml.org/lkml/2015/8/26/661

ok, thanks for pinpointing the exact message this time.

> and there are good reasons not to add those prints.

Processing that message (indentations are now quoting Peter Zijlstra):

> Not really. That is something that's limited to root. Whereas the
> problem is very much wider than that.

For the purposes of the SPE driver discussion, I'm ok limiting the
context of using the SPE as root.

> If you set one bit wrong in the pretty large perf_event_attr you've got
> a fair chance of getting -EINVAL on trying to create the event. Good
> luck finding what you did wrong.

yes, this is the problem, and the SPE introduces a whole new set of
validity requirements that should be being communicated clearly, e.g.,
its restrictive event frequency specification.

> Any user can create events (for their own tasks), this does not require
> root.

I don't think this is relevant to our discussion.

> Allowing users to flip your @debugging flag would be an insta DoS.

I think this is a reference to the non-root case, and might be mitigated
by either using dynamic or ratelimited pr_ versions if it were.

> Furthermore, its very unfriendly in that you have to (manually) go
> correlate random dmesg output with some program action.

Andrew Morton addresses this, and I did read all other follow-ups and
still conclude that adding pr_ messages is 1000x better than not, for
the user, and at least for the time being.

Kim

Re: [RFC PATCH v2 09/10] drivers/perf: Add support for ARMv8.2 Statistical Profiling Extension

2017-01-13 Thread Kim Phillips

On Fri, 13 Jan 2017 17:03:07 +
Will Deacon  wrote:

> On Fri, Jan 13, 2017 at 10:40:42AM -0600, Kim Phillips wrote:
> > On Fri, 13 Jan 2017 16:03:48 +
> > Will Deacon  wrote:
> > 
> > > +#define DRVNAME  "arm_spe_pmu"
> > 
> > PMU is implied.  "arm_spe"?
> 
> As stated before, I'm going for consistency here.

me too, but apparently under the user-visible interface domain rather
than the driver source path domain.

> Is it causing any
> real issues on the tooling side?

Intel has a consistent "intel_pt", "intel_bts", and 'pmu' occurs
nowhere in their nomenclature.

Whether good or bad, we currently have "cs_etm".  This patch now gives
us "arm_spe_pmu".  I'm just trying to save the suffix consistency for
now, esp. since IDK how amenable "cs_etm" is to change, and 'perf list'
calls things "PMU event"s anyway.

I think the root cause might be the device tree node's
"arm,arm-spe-pmu-v1" compatiblity string, which I also find
a bit self-redundant ("arm,arm-"), but I'm not familiar with what's
being denoted there either (e.g., if the latter 'arm-' is an arch
reference, then SPE's might be 'armv8'?).  The device tree node isn't
exposed to the user, however.

> > > + if (is_kernel_in_hyp_mode()) {
> > > + if (attr->exclude_kernel != attr->exclude_hv)
> > > + return -EOPNOTSUPP;
> > > + } else if (!attr->exclude_hv) {
> > > + return -EOPNOTSUPP;
> > > + }
> > > +
> > > + reg = arm_spe_event_to_pmsfcr(event);
> > > + if ((reg & BIT(PMSFCR_EL1_FE_SHIFT)) &&
> > > + !(spe_pmu->features & SPE_PMU_FEAT_FILT_EVT))
> > > + return -EOPNOTSUPP;
> > > +
> > > + if ((reg & BIT(PMSFCR_EL1_FT_SHIFT)) &&
> > > + !(spe_pmu->features & SPE_PMU_FEAT_FILT_TYP))
> > > + return -EOPNOTSUPP;
> > > +
> > > + if ((reg & BIT(PMSFCR_EL1_FL_SHIFT)) &&
> > > + !(spe_pmu->features & SPE_PMU_FEAT_FILT_LAT))
> > > + return -EOPNOTSUPP;
> > > +
> > > + return 0;
> > > +}
> > 
> > Please insert pr_* statements before blindly returning errors before a
> > better facility becomes available.
> 
> That was discussed in the thread I linked to last time:
> 
> https://lkml.org/lkml/2015/8/26/661

ok, thanks for pinpointing the exact message this time.

> and there are good reasons not to add those prints.

Processing that message (indentations are now quoting Peter Zijlstra):

> Not really. That is something that's limited to root. Whereas the
> problem is very much wider than that.

For the purposes of the SPE driver discussion, I'm ok limiting the
context of using the SPE as root.

> If you set one bit wrong in the pretty large perf_event_attr you've got
> a fair chance of getting -EINVAL on trying to create the event. Good
> luck finding what you did wrong.

yes, this is the problem, and the SPE introduces a whole new set of
validity requirements that should be being communicated clearly, e.g.,
its restrictive event frequency specification.

> Any user can create events (for their own tasks), this does not require
> root.

I don't think this is relevant to our discussion.

> Allowing users to flip your @debugging flag would be an insta DoS.

I think this is a reference to the non-root case, and might be mitigated
by either using dynamic or ratelimited pr_ versions if it were.

> Furthermore, its very unfriendly in that you have to (manually) go
> correlate random dmesg output with some program action.

Andrew Morton addresses this, and I did read all other follow-ups and
still conclude that adding pr_ messages is 1000x better than not, for
the user, and at least for the time being.

Kim

Re: [PATCH v3] ARM: dts: Add LEGO MINDSTORMS EV3 dts

2017-01-13 Thread David Lechner


On 01/13/2017 06:04 AM, Sekhar Nori wrote:

On Friday 13 January 2017 02:02 AM, David Lechner wrote:

This adds a device tree definition file for LEGO MINDSTORMS EV3.


...


There are couple of checkpatch errors that show up. The compatible
"lego,ev3" needs to be documented in
Documentation/devicetree/bindings/arm/davinci.txt

"at24,24c128" is undocumented. Is that an atmel chip on the EV3? If the
manufacturer name is not clear,
Documentation/devicetree/bindings/eeprom/eeprom.txt advises using just
"24c128"


The mfg is microchip, so I guess I will just go with "24c128"



Finally, lego needs to be added to
Documentation/devicetree/bindings/vendor-prefixes.txt

Can you please submit the documentation portions as separate patches in
a series along with this patch.


Yes.



Thanks,
Sekhar

Re: [PATCH v3] ARM: dts: Add LEGO MINDSTORMS EV3 dts

2017-01-13 Thread David Lechner


On 01/13/2017 06:04 AM, Sekhar Nori wrote:

On Friday 13 January 2017 02:02 AM, David Lechner wrote:

This adds a device tree definition file for LEGO MINDSTORMS EV3.


...


There are couple of checkpatch errors that show up. The compatible
"lego,ev3" needs to be documented in
Documentation/devicetree/bindings/arm/davinci.txt

"at24,24c128" is undocumented. Is that an atmel chip on the EV3? If the
manufacturer name is not clear,
Documentation/devicetree/bindings/eeprom/eeprom.txt advises using just
"24c128"


The mfg is microchip, so I guess I will just go with "24c128"



Finally, lego needs to be added to
Documentation/devicetree/bindings/vendor-prefixes.txt

Can you please submit the documentation portions as separate patches in
a series along with this patch.


Yes.



Thanks,
Sekhar

Re: [PATCH 17/17] spi/topcliff-pch: One check less in pch_spi_set_tx()

2017-01-13 Thread Geert Uytterhoeven

Hi Markus,

On Fri, Jan 13, 2017 at 6:28 PM, SF Markus Elfring
 wrote:
> From: Markus Elfring 
> Date: Fri, 13 Jan 2017 17:30:46 +0100
>
> Delete a duplicate check after a bit of exception handling was moved into
> a previous if branch of this function.

This is not equivalent: if data->pkt_tx_buff == NULL, the queue is no longer
flushed.

> Signed-off-by: Markus Elfring 
> ---
>  drivers/spi/spi-topcliff-pch.c | 25 ++---
>  1 file changed, 14 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/spi/spi-topcliff-pch.c b/drivers/spi/spi-topcliff-pch.c
> index 97fd1ea9826b..33043a830032 100644
> --- a/drivers/spi/spi-topcliff-pch.c
> +++ b/drivers/spi/spi-topcliff-pch.c
> @@ -584,22 +584,25 @@ static void pch_spi_set_tx(struct pch_spi_data *data, 
> int *bpw)
> data->pkt_tx_buff = kzalloc(size, GFP_KERNEL);
> if (data->pkt_tx_buff) {
> data->pkt_rx_buff = kzalloc(size, GFP_KERNEL);
> -   if (!data->pkt_rx_buff)
> +   if (!data->pkt_rx_buff) {
> kfree(data->pkt_tx_buff);
> -   }
>
> -   if (!data->pkt_rx_buff) {
> -   /* flush queue and set status of all transfers to -ENOMEM */
> -   list_for_each_entry_safe(pmsg, tmp, data->queue.next, queue) {
> -   pmsg->status = -ENOMEM;
> +   /*
> +* Flush queue and set status of all transfers
> +* to -ENOMEM.
> +*/
> +   list_for_each_entry_safe(pmsg, tmp, data->queue.next,
> +queue) {
> +   pmsg->status = -ENOMEM;
>
> -   if (pmsg->complete)
> -   pmsg->complete(pmsg->context);
> +   if (pmsg->complete)
> +   pmsg->complete(pmsg->context);
>
> -   /* delete from queue */
> -   list_del_init(>queue);
> +   /* delete from queue */
> +   list_del_init(>queue);
> +   }
> +   return;
> }
> -   return;
> }
>
> /* copy Tx Data */




-- 
Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [PATCH 17/17] spi/topcliff-pch: One check less in pch_spi_set_tx()

2017-01-13 Thread Geert Uytterhoeven

Hi Markus,

On Fri, Jan 13, 2017 at 6:28 PM, SF Markus Elfring
 wrote:
> From: Markus Elfring 
> Date: Fri, 13 Jan 2017 17:30:46 +0100
>
> Delete a duplicate check after a bit of exception handling was moved into
> a previous if branch of this function.

This is not equivalent: if data->pkt_tx_buff == NULL, the queue is no longer
flushed.

> Signed-off-by: Markus Elfring 
> ---
>  drivers/spi/spi-topcliff-pch.c | 25 ++---
>  1 file changed, 14 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/spi/spi-topcliff-pch.c b/drivers/spi/spi-topcliff-pch.c
> index 97fd1ea9826b..33043a830032 100644
> --- a/drivers/spi/spi-topcliff-pch.c
> +++ b/drivers/spi/spi-topcliff-pch.c
> @@ -584,22 +584,25 @@ static void pch_spi_set_tx(struct pch_spi_data *data, 
> int *bpw)
> data->pkt_tx_buff = kzalloc(size, GFP_KERNEL);
> if (data->pkt_tx_buff) {
> data->pkt_rx_buff = kzalloc(size, GFP_KERNEL);
> -   if (!data->pkt_rx_buff)
> +   if (!data->pkt_rx_buff) {
> kfree(data->pkt_tx_buff);
> -   }
>
> -   if (!data->pkt_rx_buff) {
> -   /* flush queue and set status of all transfers to -ENOMEM */
> -   list_for_each_entry_safe(pmsg, tmp, data->queue.next, queue) {
> -   pmsg->status = -ENOMEM;
> +   /*
> +* Flush queue and set status of all transfers
> +* to -ENOMEM.
> +*/
> +   list_for_each_entry_safe(pmsg, tmp, data->queue.next,
> +queue) {
> +   pmsg->status = -ENOMEM;
>
> -   if (pmsg->complete)
> -   pmsg->complete(pmsg->context);
> +   if (pmsg->complete)
> +   pmsg->complete(pmsg->context);
>
> -   /* delete from queue */
> -   list_del_init(>queue);
> +   /* delete from queue */
> +   list_del_init(>queue);
> +   }
> +   return;
> }
> -   return;
> }
>
> /* copy Tx Data */




-- 
Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [tpmdd-devel] [PATCH RFC v2 5/5] tpm2: expose resource manager via a device link /dev/tpms

2017-01-13 Thread James Bottomley

On Fri, 2017-01-13 at 11:01 -0700, Jason Gunthorpe wrote:
> On Fri, Jan 13, 2017 at 09:40:08AM -0800, James Bottomley wrote:
> > On Fri, 2017-01-13 at 10:25 -0700, Jason Gunthorpe wrote:
> > > On Thu, Jan 12, 2017 at 10:56:28PM +0200, Jarkko Sakkinen wrote:
> > > 
> > > > >  dev_t tpm_devt;
> > > > 
> > > > But they should have different major device numbers.
> > > 
> > > major/minors don't really matter these days since they are
> > > dynamic
> > 
> > Right, although we have this weird piece of code:
> > 
> > 
> > if (chip->dev_num == 0)
> > chip->dev.devt = MKDEV(MISC_MAJOR, TPM_MINOR);
> > else
> > chip->dev.devt = MKDEV(MAJOR(tpm_devt), chip->dev_num);
> > 
> > So the first TPM device gets the MISC_MAJOR with TPM_MINOR and the
> > rest
> > get the dynamic major/minor.  It means when you do an ls on a
> > complex
> > system you get something like:
> 
> The TPM has always used a static major/minor for tpm0 and dynamic for
> the following. Originally this used a misc_device and did:
> 
> } else if (chip->dev_num == 0)
> chip->vendor.miscdev.minor = TPM_MINOR;
> else
> chip->vendor.miscdev.minor = MISC_DYNAMIC_MINOR;
> 
> When we moved to struct device the !0 tpms got a dynamic major as
> well.
> 
> I don't really know what the broad kernel feeling is on historical
> major/minor stability - this was mainly continuing what had always
> been done in this code for pre-udev userspace compatibility.
> 
> Does it harm anything?

Devices are either supposed to be static, so you can rely on the
major/minor without needing udev or fully dynamic, so you need udev to
get the nodes and you may not rely on the major/minor number.  The
potential harm is that we're neither one nor the other, which sounds
like an accident waiting to happen if someone decides to rely on our
numbers but gets the wrong node.  Chances are that, since most linux
devices are fully dynamic, no-one ever thinks of relying on fixed
major/minor numbers anymore, so everyone avoids this banana skin and
perhaps it doesn't matter.

James

Re: [tpmdd-devel] [PATCH RFC v2 5/5] tpm2: expose resource manager via a device link /dev/tpms

2017-01-13 Thread James Bottomley

On Fri, 2017-01-13 at 11:01 -0700, Jason Gunthorpe wrote:
> On Fri, Jan 13, 2017 at 09:40:08AM -0800, James Bottomley wrote:
> > On Fri, 2017-01-13 at 10:25 -0700, Jason Gunthorpe wrote:
> > > On Thu, Jan 12, 2017 at 10:56:28PM +0200, Jarkko Sakkinen wrote:
> > > 
> > > > >  dev_t tpm_devt;
> > > > 
> > > > But they should have different major device numbers.
> > > 
> > > major/minors don't really matter these days since they are
> > > dynamic
> > 
> > Right, although we have this weird piece of code:
> > 
> > 
> > if (chip->dev_num == 0)
> > chip->dev.devt = MKDEV(MISC_MAJOR, TPM_MINOR);
> > else
> > chip->dev.devt = MKDEV(MAJOR(tpm_devt), chip->dev_num);
> > 
> > So the first TPM device gets the MISC_MAJOR with TPM_MINOR and the
> > rest
> > get the dynamic major/minor.  It means when you do an ls on a
> > complex
> > system you get something like:
> 
> The TPM has always used a static major/minor for tpm0 and dynamic for
> the following. Originally this used a misc_device and did:
> 
> } else if (chip->dev_num == 0)
> chip->vendor.miscdev.minor = TPM_MINOR;
> else
> chip->vendor.miscdev.minor = MISC_DYNAMIC_MINOR;
> 
> When we moved to struct device the !0 tpms got a dynamic major as
> well.
> 
> I don't really know what the broad kernel feeling is on historical
> major/minor stability - this was mainly continuing what had always
> been done in this code for pre-udev userspace compatibility.
> 
> Does it harm anything?

Devices are either supposed to be static, so you can rely on the
major/minor without needing udev or fully dynamic, so you need udev to
get the nodes and you may not rely on the major/minor number.  The
potential harm is that we're neither one nor the other, which sounds
like an accident waiting to happen if someone decides to rely on our
numbers but gets the wrong node.  Chances are that, since most linux
devices are fully dynamic, no-one ever thinks of relying on fixed
major/minor numbers anymore, so everyone avoids this banana skin and
perhaps it doesn't matter.

James

Re: [tpmdd-devel] [PATCH v8 2/2] tpm: add securityfs support for TPM 2.0 firmware event log

2017-01-13 Thread Stefan Berger


On 01/11/2017 02:54 AM, Nayna Jain wrote:

Unlike the device driver support for TPM 1.2, the TPM 2.0 does
not support the securityfs pseudo files for displaying the
firmware event log.

This patch enables support for providing the TPM 2.0 event log in
binary form. TPM 2.0 event log supports a crypto agile format that
records multiple digests, which is different from TPM 1.2. This
patch enables the tpm_bios_log_setup for TPM 2.0  and adds the
event log parser which understand the TPM 2.0 crypto agile format.

Signed-off-by: Nayna Jain 
---
  drivers/char/tpm/Makefile  |   2 +-
  .../char/tpm/{tpm_eventlog.c => tpm1_eventlog.c}   |  35 ++--
  drivers/char/tpm/tpm2_eventlog.c   | 203 +
  drivers/char/tpm/tpm_acpi.c|   3 +
  drivers/char/tpm/tpm_eventlog.h|  63 +++
  5 files changed, 291 insertions(+), 15 deletions(-)
  rename drivers/char/tpm/{tpm_eventlog.c => tpm1_eventlog.c} (95%)
  create mode 100644 drivers/char/tpm/tpm2_eventlog.c


diff --git a/drivers/char/tpm/tpm_acpi.c b/drivers/char/tpm/tpm_acpi.c
index b7718c9..169edf3 100644
--- a/drivers/char/tpm/tpm_acpi.c
+++ b/drivers/char/tpm/tpm_acpi.c
@@ -54,6 +54,9 @@ int tpm_read_log_acpi(struct tpm_chip *chip)
u64 len, start;
struct tpm_bios_log *log;

+   if (chip->flags & TPM_CHIP_FLAG_TPM2)
+   return -ENODEV;
+



Works with SeaBIOS when this check is disabled.


   Stefan

Re: [tpmdd-devel] [PATCH v8 2/2] tpm: add securityfs support for TPM 2.0 firmware event log

2017-01-13 Thread Stefan Berger


On 01/11/2017 02:54 AM, Nayna Jain wrote:

Unlike the device driver support for TPM 1.2, the TPM 2.0 does
not support the securityfs pseudo files for displaying the
firmware event log.

This patch enables support for providing the TPM 2.0 event log in
binary form. TPM 2.0 event log supports a crypto agile format that
records multiple digests, which is different from TPM 1.2. This
patch enables the tpm_bios_log_setup for TPM 2.0  and adds the
event log parser which understand the TPM 2.0 crypto agile format.

Signed-off-by: Nayna Jain 
---
  drivers/char/tpm/Makefile  |   2 +-
  .../char/tpm/{tpm_eventlog.c => tpm1_eventlog.c}   |  35 ++--
  drivers/char/tpm/tpm2_eventlog.c   | 203 +
  drivers/char/tpm/tpm_acpi.c|   3 +
  drivers/char/tpm/tpm_eventlog.h|  63 +++
  5 files changed, 291 insertions(+), 15 deletions(-)
  rename drivers/char/tpm/{tpm_eventlog.c => tpm1_eventlog.c} (95%)
  create mode 100644 drivers/char/tpm/tpm2_eventlog.c


diff --git a/drivers/char/tpm/tpm_acpi.c b/drivers/char/tpm/tpm_acpi.c
index b7718c9..169edf3 100644
--- a/drivers/char/tpm/tpm_acpi.c
+++ b/drivers/char/tpm/tpm_acpi.c
@@ -54,6 +54,9 @@ int tpm_read_log_acpi(struct tpm_chip *chip)
u64 len, start;
struct tpm_bios_log *log;

+   if (chip->flags & TPM_CHIP_FLAG_TPM2)
+   return -ENODEV;
+



Works with SeaBIOS when this check is disabled.


   Stefan

Re: [PATCH 16/37] PCI: endpoint: Introduce configfs entry for configuring EP functions

2017-01-13 Thread Christoph Hellwig

Hi Kishon,

a couple comments on the configfs layout based on my experiments with
your previous drop to implement a NVMe device using it.

I don't think most of these configfs files should be present here, as
they are properties of the implemented PCIe devices.  E.g. for my
NVMe device they will be sort of hardcoded most of the time, as they
would be for other devices that would always have a fixed vendor/device/
class ID, cacheline size, etc.

In the end what we'll to be able to do here is to be able to create
a directory for each function driver, which then can create it's own
attributes inside it.

Re: [PATCH 16/37] PCI: endpoint: Introduce configfs entry for configuring EP functions

2017-01-13 Thread Christoph Hellwig

Hi Kishon,

a couple comments on the configfs layout based on my experiments with
your previous drop to implement a NVMe device using it.

I don't think most of these configfs files should be present here, as
they are properties of the implemented PCIe devices.  E.g. for my
NVMe device they will be sort of hardcoded most of the time, as they
would be for other devices that would always have a fixed vendor/device/
class ID, cacheline size, etc.

In the end what we'll to be able to do here is to be able to create
a directory for each function driver, which then can create it's own
attributes inside it.

Re: [patch 2/3] KVM: x86: add KVM_HC_CLOCK_OFFSET hypercall

2017-01-13 Thread Marcelo Tosatti

On Fri, Jan 13, 2017 at 06:07:40PM +0100, Radim Krcmar wrote:
> 2017-01-13 13:43-0200, Marcelo Tosatti:
> > On Fri, Jan 13, 2017 at 04:31:58PM +0100, Radim Krcmar wrote:
> >> 2017-01-13 10:01-0200, Marcelo Tosatti:
> >> > Add a hypercall to retrieve the host realtime clock
> >> > and the TSC value used to calculate that clock read.
> >> > 
> >> > Used to implement clock synchronization between
> >> > host and guest.
> >> > 
> >> > Signed-off-by: Marcelo Tosatti 
> >> > 
> >> > ---
> >> > Index: kvm-ptpdriver/Documentation/virtual/kvm/hypercalls.txt
> >> > @@ -81,3 +81,33 @@
> >> > +6. KVM_HC_CLOCK_OFFSET
> >> > +
> >> > +Architecture: x86
> >> > +Status: active
> >> > +Purpose: Hypercall used to synchronize host and guest clocks.
> >> > +Usage:
> >> > +
> >> > +a0: guest physical address where host copies
> >> > +"struct kvm_clock_offset" structure.
> >> > +
> >> > +a1: clock_type, ATM only KVM_CLOCK_OFFSET_WALLCLOCK (0)
> >> > +is supported (hosts CLOCK_REALTIME clock).
> >> > +
> >> > +struct kvm_clock_offset {
> >> > +__s64 sec;
> >> > +__s64 nsec;
> >> 
> >> Why is nsec:
> >>  1) signed -- it is a remainder after division by NSEC_PER_SEC
> > 
> > Because "struct timespec" is signed because it can be used for
> > time deltas (you won't actually get signed values for
> > kvm_get_walltime_and_clockread).
> > 
> > Just wanted to match "struct timespec".
> > 
> >>  2) bigger than 32 bit -- NSEC_PER_SEC < 2^32
> >> ?
> > 
> > Again matching struct timespec.
> 
> It is "long" in struct timespec, which could also be "s32" ...

Which fits in a s64.

> I'd rather waste those 8 bytes inside padding -- its purpose is clear
> there. :)

The purpose of the s64 is to match timespec.

> >> > +__u64 tsc;
> >> > +__u32 flags;
> >> > +__u32 pad;
> >> > +};
> >> > +
> >> > +   Where:
> >> > +   * sec: seconds from clock_type clock.
> >> > +   * nsec: nanoseconds from clock_type clock.
> >> 
> >> The important part of an offset is the starting point -- I assume it is
> >> the the usual one, but documentation better be explicit.
> > 
> > Don't get what you mean? (the values have same meaning as hosts
> > clock_gettime(CLOCK_REALTIME), supposedly that is clear).
> 
> Ah, I didn't understand that clock_type was refering to CLOCK_REALTIME.
> 
> I'd drop offset from the hypercall name.  IIUC, all various clock types
> would be compared to TSC, so we could name the hypercall somewhat like
> KVM_HC_CLOCK_AT_TSC -- we want to know what was the time on the selected
> clock when TSC was at value __u64 tsc.

Well its the offset from TSC, so thats why "offset" in the name.

And as you say, the interface could be extended to support 
other clocks (using the available data in pad, i'll make it larger BTW).

Which would render the "TSC" from the hypercall name invalid.

> The only offset is between sec+nsec and the beginning of time (and tsc
> and 0), but we don't care about that offset by itself -- we care about
> relation of sec+nsec to tsc, which isn't a simple offset as they flow
> differently.
> 
> If we wanted to be more generic, then KVM_HC_CLOCK_PAIRING/SNAPSHOT/...
> and argument 1 would contain clock_types, here REALTIME and TSC.

Pairing is fine, changed.

Re: [patch 2/3] KVM: x86: add KVM_HC_CLOCK_OFFSET hypercall

2017-01-13 Thread Marcelo Tosatti

On Fri, Jan 13, 2017 at 06:07:40PM +0100, Radim Krcmar wrote:
> 2017-01-13 13:43-0200, Marcelo Tosatti:
> > On Fri, Jan 13, 2017 at 04:31:58PM +0100, Radim Krcmar wrote:
> >> 2017-01-13 10:01-0200, Marcelo Tosatti:
> >> > Add a hypercall to retrieve the host realtime clock
> >> > and the TSC value used to calculate that clock read.
> >> > 
> >> > Used to implement clock synchronization between
> >> > host and guest.
> >> > 
> >> > Signed-off-by: Marcelo Tosatti 
> >> > 
> >> > ---
> >> > Index: kvm-ptpdriver/Documentation/virtual/kvm/hypercalls.txt
> >> > @@ -81,3 +81,33 @@
> >> > +6. KVM_HC_CLOCK_OFFSET
> >> > +
> >> > +Architecture: x86
> >> > +Status: active
> >> > +Purpose: Hypercall used to synchronize host and guest clocks.
> >> > +Usage:
> >> > +
> >> > +a0: guest physical address where host copies
> >> > +"struct kvm_clock_offset" structure.
> >> > +
> >> > +a1: clock_type, ATM only KVM_CLOCK_OFFSET_WALLCLOCK (0)
> >> > +is supported (hosts CLOCK_REALTIME clock).
> >> > +
> >> > +struct kvm_clock_offset {
> >> > +__s64 sec;
> >> > +__s64 nsec;
> >> 
> >> Why is nsec:
> >>  1) signed -- it is a remainder after division by NSEC_PER_SEC
> > 
> > Because "struct timespec" is signed because it can be used for
> > time deltas (you won't actually get signed values for
> > kvm_get_walltime_and_clockread).
> > 
> > Just wanted to match "struct timespec".
> > 
> >>  2) bigger than 32 bit -- NSEC_PER_SEC < 2^32
> >> ?
> > 
> > Again matching struct timespec.
> 
> It is "long" in struct timespec, which could also be "s32" ...

Which fits in a s64.

> I'd rather waste those 8 bytes inside padding -- its purpose is clear
> there. :)

The purpose of the s64 is to match timespec.

> >> > +__u64 tsc;
> >> > +__u32 flags;
> >> > +__u32 pad;
> >> > +};
> >> > +
> >> > +   Where:
> >> > +   * sec: seconds from clock_type clock.
> >> > +   * nsec: nanoseconds from clock_type clock.
> >> 
> >> The important part of an offset is the starting point -- I assume it is
> >> the the usual one, but documentation better be explicit.
> > 
> > Don't get what you mean? (the values have same meaning as hosts
> > clock_gettime(CLOCK_REALTIME), supposedly that is clear).
> 
> Ah, I didn't understand that clock_type was refering to CLOCK_REALTIME.
> 
> I'd drop offset from the hypercall name.  IIUC, all various clock types
> would be compared to TSC, so we could name the hypercall somewhat like
> KVM_HC_CLOCK_AT_TSC -- we want to know what was the time on the selected
> clock when TSC was at value __u64 tsc.

Well its the offset from TSC, so thats why "offset" in the name.

And as you say, the interface could be extended to support 
other clocks (using the available data in pad, i'll make it larger BTW).

Which would render the "TSC" from the hypercall name invalid.

> The only offset is between sec+nsec and the beginning of time (and tsc
> and 0), but we don't care about that offset by itself -- we care about
> relation of sec+nsec to tsc, which isn't a simple offset as they flow
> differently.
> 
> If we wanted to be more generic, then KVM_HC_CLOCK_PAIRING/SNAPSHOT/...
> and argument 1 would contain clock_types, here REALTIME and TSC.

Pairing is fine, changed.

Re: [patch 1/3] KVM: x86: provide realtime host clock via vsyscall notifiers

2017-01-13 Thread Marcelo Tosatti

On Fri, Jan 13, 2017 at 05:28:09PM +0100, Radim Krcmar wrote:
> 2017-01-13 13:34-0200, Marcelo Tosatti:
> > On Fri, Jan 13, 2017 at 04:18:04PM +0100, Radim Krcmar wrote:
> >> 2017-01-13 10:01-0200, Marcelo Tosatti:
> >> > Expose the realtime host clock and save the TSC value
> >> > used for the clock calculation.
> >> > 
> >> > Signed-off-by: Marcelo Tosatti 
> >> > 
> >> > ---
> >> >  arch/x86/kvm/x86.c |   38 ++
> >> >  1 file changed, 38 insertions(+)
> >> > 
> >> > Index: kvm-ptpdriver/arch/x86/kvm/x86.c
> >> > ===
> >> > --- kvm-ptpdriver.orig/arch/x86/kvm/x86.c2017-01-13 
> >> > 08:59:03.015895353 -0200
> >> > +++ kvm-ptpdriver/arch/x86/kvm/x86.c 2017-01-13 09:04:46.581415259 
> >> > -0200
> >> > @@ -1139,6 +1139,8 @@
> >> >  
> >> >  u64 boot_ns;
> >> >  u64 nsec_base;
> >> > +u64 wall_time_sec;
> >> > +u64 wall_time_snsec;
> >> 
> >> The leading "s" in "snsec" looks like a copy-paste residue.
> > 
> > Just copying the userspace vsyscall interface.
> 
> Oh, so the "s" means "sub-" for sub-nanosecond precision.

It only counts nanoseconds, how can it be sub nanosecond precise?

> >> >  };
> >> >  
> >> >  static struct pvclock_gtod_data pvclock_gtod_data;
> >> > @@ -1162,6 +1164,9 @@
> >> >  vdata->boot_ns  = boot_ns;
> >> >  vdata->nsec_base= tk->tkr_mono.xtime_nsec;
> >> >  
> >> > +vdata->wall_time_sec= tk->xtime_sec;
> >> > +vdata->wall_time_snsec  = tk->tkr_mono.xtime_nsec;
> >> 
> >> Using tk->tkr_mono offsets for real time seems wrong -- what happens if
> >> the real time is half a second shifted from monotonic time?
> > 
> > Both the userspace vsyscall interface and getnstimeofday
> > use it for realtime clock.
> > 
> > Monotonic clock adds the offset:
> > 
> > vdata->monotonic_time_snsec = tk->tkr_mono.xtime_nsec
> > +
> > ((u64)tk->wall_to_monotonic.tv_nsec
> > << tk->tkr_mono.shift);
> 
> I see, thanks.  Makes me wonder why our monotonic time is correct then,
> but that is problably thanks to boot_ns.

The actual starting point of the system_timestamp part of kvmclock
does not matter, all it matters is that it counts in nanoseconds.

> >> If it's ok, then vdata->nsec_base == vdata->wall_time_snsec, so we don't
> >> need it.
> > 
> > Just copying the userspace vsyscall interface.
> > 
> > Do you actually want to change the "s" and unify wall_time_snsec with
> > nsec_base?
> 
> The "s" isn't important, even though I don't think we do anything that
> would justify it, but make use just 8 bytes for both.

Unified.

> Renaming nsec_base is ok, but I'm not sure what tk->tkr_mono.xtime_nsec
> is anymore.

Is the nsec part of tk->xtime_sec. See accumulate_nsecs_to_secs
(which is called from the timer interrupt handler).

/**
 * struct timekeeper - Structure holding internal timekeeping values.
 * @tkr_mono:   The readout base structure for CLOCK_MONOTONIC
 * @tkr_raw:The readout base structure for
 * CLOCK_MONOTONIC_RAW
 * @xtime_sec:  Current CLOCK_REALTIME time in seconds
 * @ktime_sec:  Current CLOCK_MONOTONIC time in seconds
 * @wall_to_monotonic:  CLOCK_REALTIME to CLOCK_MONOTONIC offset
 * @offs_real:  Offset clock monotonic -> clock realtime
 * @offs_boot:  Offset clock monotonic -> clock boottime
 * @offs_tai:   Offset clock monotonic -> clock tai
 * @tai_offset: The current UTC to TAI offset in seconds

Re: [patch 3/3] PTP: add kvm PTP driver

2017-01-13 Thread Marcelo Tosatti

On Fri, Jan 13, 2017 at 04:56:58PM +0100, Radim Krcmar wrote:
> 2017-01-13 10:01-0200, Marcelo Tosatti:
> > Add a driver with gettime method returning hosts realtime clock.
> > This allows Chrony to synchronize host and guest clocks with 
> > high precision (see results below).
> > 
> > chronyc> sources
> > MS Name/IP address Stratum Poll Reach LastRx Last sample
> > ===
> > #* PHC0  0   3   377 6 +4ns[   +4ns] +/-
> > 3ns
> > 
> > To configure Chronyd to use PHC refclock, add the 
> > following line to its configuration file:
> > 
> > refclock PHC /dev/ptpX poll 3 dpoll -2 offset 0
> > 
> > Where /dev/ptpX is the kvmclock PTP clock.
> > 
> > 
> > ---
> >  drivers/ptp/Kconfig   |   12 +++
> >  drivers/ptp/Makefile  |1 
> >  drivers/ptp/ptp_kvm.c |  180 
> > ++
> >  3 files changed, 193 insertions(+)
> > 
> > Index: kvm-ptpdriver/drivers/ptp/Kconfig
> > ===
> > --- kvm-ptpdriver.orig/drivers/ptp/Kconfig  2017-01-13 09:17:31.724568567 
> > -0200
> > +++ kvm-ptpdriver/drivers/ptp/Kconfig   2017-01-13 09:55:33.344208894 
> > -0200
> > @@ -90,4 +90,16 @@
> >   To compile this driver as a module, choose M here: the module
> >   will be called ptp_pch.
> >  
> > +config PTP_1588_CLOCK_KVM
> > +   tristate "KVM virtual PTP clock"
> > +   depends on PTP_1588_CLOCK
> > +   depends on KVM_GUEST
> > +   default y
> > +   help
> > + This driver adds support for using kvm infrastructure as a PTP
> > + clock. This clock is only useful if you are using KVM guests.
> > +
> > + To compile this driver as a module, choose M here: the module
> > + will be called ptp_kvm.
> > +
> >  endmenu
> > Index: kvm-ptpdriver/drivers/ptp/ptp_kvm.c
> > ===
> > --- /dev/null   1970-01-01 00:00:00.0 +
> > +++ kvm-ptpdriver/drivers/ptp/ptp_kvm.c 2017-01-13 09:57:55.013440645 
> > -0200
> > @@ -0,0 +1,180 @@
> > +/*
> > + * Virtual PTP 1588 clock for use with KVM guests
> > + *
> > + * Copyright (C) 2017 Red Hat Inc.
> > + *
> > + *  This program is free software; you can redistribute it and/or modify
> > + *  it under the terms of the GNU General Public License as published by
> > + *  the Free Software Foundation; either version 2 of the License, or
> > + *  (at your option) any later version.
> > + *
> > + *  This program is distributed in the hope that it will be useful,
> > + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + *  GNU General Public License for more details.
> > + *
> > + */
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include 
> > +
> > +struct kvm_ptp_clock {
> > +   struct ptp_clock *ptp_clock;
> > +   struct ptp_clock_info caps;
> > +};
> > +
> > +DEFINE_SPINLOCK(kvm_ptp_lock);
> > +
> > +static struct pvclock_vsyscall_time_info *hv_clock;
> > +
> > +/*
> > + * PTP clock operations
> > + */
> > +
> > +static int ptp_kvm_adjfreq(struct ptp_clock_info *ptp, s32 ppb)
> > +{
> > +   return -EOPNOTSUPP;
> > +}
> > +
> > +static int ptp_kvm_adjtime(struct ptp_clock_info *ptp, s64 delta)
> > +{
> > +   return -EOPNOTSUPP;
> > +}
> > +
> > +static struct kvm_clock_offset clock_off;
> > +static phys_addr_t clock_off_gpa;
> > +
> > +static int ptp_kvm_gettime(struct ptp_clock_info *ptp, struct timespec64 
> > *ts)
> > +{
> > +   unsigned long ret;
> > +   struct timespec64 tspec;
> > +   u64 delta;
> > +   cycle_t offset;
> > +   unsigned version;
> > +   int cpu;
> > +   struct pvclock_vcpu_time_info *src;
> > +
> > +   preempt_disable_notrace();
> > +   cpu = smp_processor_id();
> > +   src = _clock[cpu].pvti;
> > +
> > +   spin_lock(_ptp_lock);
> > +
> > +   do {
> > +   /*
> > +* We are measuring the delay between
> > +* kvm_hypercall and rdtsc using TSC,
> > +* and converting that delta to
> > +* tsc_to_system_mul and tsc_shift
> > +* So any changes to tsc_to_system_mul
> > +* and tsc_shift in this region
> > +* invalidate the measurement.
> > +*/
> 
> This assumes that the host uses kvmclock, but the guest can be using
> just TSC and even have kvmclock disabled.  This code should at least
> check that kvmclock enabled.

Fixed.

> (If we made kvmclock mandatory, then we could fix pvclock_wall_clock
>  interface, because it is already defined to provide real time based on
>  kvmclock ...)
> 
> > +   version = pvclock_read_begin(src);
> > +
> > +   ret = kvm_hypercall2(KVM_HC_CLOCK_OFFSET,
> > +clock_off_gpa,
> > +

Re: [patch 3/3] PTP: add kvm PTP driver

2017-01-13 Thread Marcelo Tosatti

On Fri, Jan 13, 2017 at 04:56:58PM +0100, Radim Krcmar wrote:
> 2017-01-13 10:01-0200, Marcelo Tosatti:
> > Add a driver with gettime method returning hosts realtime clock.
> > This allows Chrony to synchronize host and guest clocks with 
> > high precision (see results below).
> > 
> > chronyc> sources
> > MS Name/IP address Stratum Poll Reach LastRx Last sample
> > ===
> > #* PHC0  0   3   377 6 +4ns[   +4ns] +/-
> > 3ns
> > 
> > To configure Chronyd to use PHC refclock, add the 
> > following line to its configuration file:
> > 
> > refclock PHC /dev/ptpX poll 3 dpoll -2 offset 0
> > 
> > Where /dev/ptpX is the kvmclock PTP clock.
> > 
> > 
> > ---
> >  drivers/ptp/Kconfig   |   12 +++
> >  drivers/ptp/Makefile  |1 
> >  drivers/ptp/ptp_kvm.c |  180 
> > ++
> >  3 files changed, 193 insertions(+)
> > 
> > Index: kvm-ptpdriver/drivers/ptp/Kconfig
> > ===
> > --- kvm-ptpdriver.orig/drivers/ptp/Kconfig  2017-01-13 09:17:31.724568567 
> > -0200
> > +++ kvm-ptpdriver/drivers/ptp/Kconfig   2017-01-13 09:55:33.344208894 
> > -0200
> > @@ -90,4 +90,16 @@
> >   To compile this driver as a module, choose M here: the module
> >   will be called ptp_pch.
> >  
> > +config PTP_1588_CLOCK_KVM
> > +   tristate "KVM virtual PTP clock"
> > +   depends on PTP_1588_CLOCK
> > +   depends on KVM_GUEST
> > +   default y
> > +   help
> > + This driver adds support for using kvm infrastructure as a PTP
> > + clock. This clock is only useful if you are using KVM guests.
> > +
> > + To compile this driver as a module, choose M here: the module
> > + will be called ptp_kvm.
> > +
> >  endmenu
> > Index: kvm-ptpdriver/drivers/ptp/ptp_kvm.c
> > ===
> > --- /dev/null   1970-01-01 00:00:00.0 +
> > +++ kvm-ptpdriver/drivers/ptp/ptp_kvm.c 2017-01-13 09:57:55.013440645 
> > -0200
> > @@ -0,0 +1,180 @@
> > +/*
> > + * Virtual PTP 1588 clock for use with KVM guests
> > + *
> > + * Copyright (C) 2017 Red Hat Inc.
> > + *
> > + *  This program is free software; you can redistribute it and/or modify
> > + *  it under the terms of the GNU General Public License as published by
> > + *  the Free Software Foundation; either version 2 of the License, or
> > + *  (at your option) any later version.
> > + *
> > + *  This program is distributed in the hope that it will be useful,
> > + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + *  GNU General Public License for more details.
> > + *
> > + */
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include 
> > +
> > +struct kvm_ptp_clock {
> > +   struct ptp_clock *ptp_clock;
> > +   struct ptp_clock_info caps;
> > +};
> > +
> > +DEFINE_SPINLOCK(kvm_ptp_lock);
> > +
> > +static struct pvclock_vsyscall_time_info *hv_clock;
> > +
> > +/*
> > + * PTP clock operations
> > + */
> > +
> > +static int ptp_kvm_adjfreq(struct ptp_clock_info *ptp, s32 ppb)
> > +{
> > +   return -EOPNOTSUPP;
> > +}
> > +
> > +static int ptp_kvm_adjtime(struct ptp_clock_info *ptp, s64 delta)
> > +{
> > +   return -EOPNOTSUPP;
> > +}
> > +
> > +static struct kvm_clock_offset clock_off;
> > +static phys_addr_t clock_off_gpa;
> > +
> > +static int ptp_kvm_gettime(struct ptp_clock_info *ptp, struct timespec64 
> > *ts)
> > +{
> > +   unsigned long ret;
> > +   struct timespec64 tspec;
> > +   u64 delta;
> > +   cycle_t offset;
> > +   unsigned version;
> > +   int cpu;
> > +   struct pvclock_vcpu_time_info *src;
> > +
> > +   preempt_disable_notrace();
> > +   cpu = smp_processor_id();
> > +   src = _clock[cpu].pvti;
> > +
> > +   spin_lock(_ptp_lock);
> > +
> > +   do {
> > +   /*
> > +* We are measuring the delay between
> > +* kvm_hypercall and rdtsc using TSC,
> > +* and converting that delta to
> > +* tsc_to_system_mul and tsc_shift
> > +* So any changes to tsc_to_system_mul
> > +* and tsc_shift in this region
> > +* invalidate the measurement.
> > +*/
> 
> This assumes that the host uses kvmclock, but the guest can be using
> just TSC and even have kvmclock disabled.  This code should at least
> check that kvmclock enabled.

Fixed.

> (If we made kvmclock mandatory, then we could fix pvclock_wall_clock
>  interface, because it is already defined to provide real time based on
>  kvmclock ...)
> 
> > +   version = pvclock_read_begin(src);
> > +
> > +   ret = kvm_hypercall2(KVM_HC_CLOCK_OFFSET,
> > +clock_off_gpa,
> > +

Re: [patch 1/3] KVM: x86: provide realtime host clock via vsyscall notifiers

2017-01-13 Thread Marcelo Tosatti

On Fri, Jan 13, 2017 at 05:28:09PM +0100, Radim Krcmar wrote:
> 2017-01-13 13:34-0200, Marcelo Tosatti:
> > On Fri, Jan 13, 2017 at 04:18:04PM +0100, Radim Krcmar wrote:
> >> 2017-01-13 10:01-0200, Marcelo Tosatti:
> >> > Expose the realtime host clock and save the TSC value
> >> > used for the clock calculation.
> >> > 
> >> > Signed-off-by: Marcelo Tosatti 
> >> > 
> >> > ---
> >> >  arch/x86/kvm/x86.c |   38 ++
> >> >  1 file changed, 38 insertions(+)
> >> > 
> >> > Index: kvm-ptpdriver/arch/x86/kvm/x86.c
> >> > ===
> >> > --- kvm-ptpdriver.orig/arch/x86/kvm/x86.c2017-01-13 
> >> > 08:59:03.015895353 -0200
> >> > +++ kvm-ptpdriver/arch/x86/kvm/x86.c 2017-01-13 09:04:46.581415259 
> >> > -0200
> >> > @@ -1139,6 +1139,8 @@
> >> >  
> >> >  u64 boot_ns;
> >> >  u64 nsec_base;
> >> > +u64 wall_time_sec;
> >> > +u64 wall_time_snsec;
> >> 
> >> The leading "s" in "snsec" looks like a copy-paste residue.
> > 
> > Just copying the userspace vsyscall interface.
> 
> Oh, so the "s" means "sub-" for sub-nanosecond precision.

It only counts nanoseconds, how can it be sub nanosecond precise?

> >> >  };
> >> >  
> >> >  static struct pvclock_gtod_data pvclock_gtod_data;
> >> > @@ -1162,6 +1164,9 @@
> >> >  vdata->boot_ns  = boot_ns;
> >> >  vdata->nsec_base= tk->tkr_mono.xtime_nsec;
> >> >  
> >> > +vdata->wall_time_sec= tk->xtime_sec;
> >> > +vdata->wall_time_snsec  = tk->tkr_mono.xtime_nsec;
> >> 
> >> Using tk->tkr_mono offsets for real time seems wrong -- what happens if
> >> the real time is half a second shifted from monotonic time?
> > 
> > Both the userspace vsyscall interface and getnstimeofday
> > use it for realtime clock.
> > 
> > Monotonic clock adds the offset:
> > 
> > vdata->monotonic_time_snsec = tk->tkr_mono.xtime_nsec
> > +
> > ((u64)tk->wall_to_monotonic.tv_nsec
> > << tk->tkr_mono.shift);
> 
> I see, thanks.  Makes me wonder why our monotonic time is correct then,
> but that is problably thanks to boot_ns.

The actual starting point of the system_timestamp part of kvmclock
does not matter, all it matters is that it counts in nanoseconds.

> >> If it's ok, then vdata->nsec_base == vdata->wall_time_snsec, so we don't
> >> need it.
> > 
> > Just copying the userspace vsyscall interface.
> > 
> > Do you actually want to change the "s" and unify wall_time_snsec with
> > nsec_base?
> 
> The "s" isn't important, even though I don't think we do anything that
> would justify it, but make use just 8 bytes for both.

Unified.

> Renaming nsec_base is ok, but I'm not sure what tk->tkr_mono.xtime_nsec
> is anymore.

Is the nsec part of tk->xtime_sec. See accumulate_nsecs_to_secs
(which is called from the timer interrupt handler).

/**
 * struct timekeeper - Structure holding internal timekeeping values.
 * @tkr_mono:   The readout base structure for CLOCK_MONOTONIC
 * @tkr_raw:The readout base structure for
 * CLOCK_MONOTONIC_RAW
 * @xtime_sec:  Current CLOCK_REALTIME time in seconds
 * @ktime_sec:  Current CLOCK_MONOTONIC time in seconds
 * @wall_to_monotonic:  CLOCK_REALTIME to CLOCK_MONOTONIC offset
 * @offs_real:  Offset clock monotonic -> clock realtime
 * @offs_boot:  Offset clock monotonic -> clock boottime
 * @offs_tai:   Offset clock monotonic -> clock tai
 * @tai_offset: The current UTC to TAI offset in seconds

Re: [PATCH] Documentation: cpuset: Fix 'cpuset.tasks' -> 'tasks'

2017-01-13 Thread Jonathan Corbet

On Fri, 13 Jan 2017 09:44:34 -0800
"W. Trevor King"  wrote:

> > So I'll confess that I don't understand this change.  All of the
> > control files are referred to as cpuset.whatever in this document;
> > why should this one, in particular, be different?  
> 
> 'tasks' is part of the generic cgroup tooling, so it doesn't get the
> cpuset prefix:

Duh, of course.  Got more coffee now and applied it, thanks.

jon

Re: [PATCH] Documentation: cpuset: Fix 'cpuset.tasks' -> 'tasks'

2017-01-13 Thread Jonathan Corbet

On Fri, 13 Jan 2017 09:44:34 -0800
"W. Trevor King"  wrote:

> > So I'll confess that I don't understand this change.  All of the
> > control files are referred to as cpuset.whatever in this document;
> > why should this one, in particular, be different?  
> 
> 'tasks' is part of the generic cgroup tooling, so it doesn't get the
> cpuset prefix:

Duh, of course.  Got more coffee now and applied it, thanks.

jon

[PATCH] net: constify mdiobb_ops structures

2017-01-13 Thread Bhumika Goyal

Declare mdiobb_ops structures as const as they are only stored in the
ops field of mdiobb_ctrl structures. This field is of type const, so
mdiobb_ops structures having this property can be declared const too.
Done using Coccinelle:

@r disable optional_qualifier@
identifier x;
position p;
@@
static struct mdiobb_ops x@p={...};

@ok@
struct bb_info bitbang;
struct ravb_private priv;
struct ax_device ax;
struct mdio_gpio_info bb;
identifier r.x;
position p;
@@
(
bitbang.ctrl.ops=@p
|
priv.mdiobb.ops=@p
|
ax.bb_ctrl.ops=@p
|
bb.ctrl.ops=@p
)

@bad@
position p != {r.p,ok.p};
identifier r.x;
@@
x@p

@depends on !bad disable optional_qualifier@
identifier r.x;
@@
+const
struct mdiobb_ops x;

Before and after size details:

File size before:
   textdata bss dec hex filename
  132931203  16   1451238b0 net/ethernet/8390/ax88796.o
File size after:
   textdata bss dec hex filename
  133571139  16   1451238b0 net/ethernet/8390/ax88796.o

File size before: 
   textdata bss dec hex filename
   1440 100   01540 604 freescale/fs_enet/mii-bitbang.o
File size after: Remains the same

File size before:
   textdata bss dec hex filename
  19000 192  16   192084b08 net/ethernet/renesas/ravb_main.o
File size after: Remains the same

File size before:
   textdata bss dec hex filename
  285011568   8   30077757d net/ethernet/renesas/sh_eth.o
File size after:
   textdata bss dec hex filename
  285651504   8   30077757d net/ethernet/renesas/sh_eth.o

File size before:
   textdata bss dec hex filename
   1859 248   02107 83b drivers/net/phy/mdio-gpio.o
File size after:
   textdata bss dec hex filename
   1915 192   02107 83b drivers/net/phy/mdio-gpio.o

Signed-off-by: Bhumika Goyal 
---
 drivers/net/ethernet/8390/ax88796.c  | 2 +-
 drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c | 2 +-
 drivers/net/ethernet/renesas/ravb_main.c | 2 +-
 drivers/net/ethernet/renesas/sh_eth.c| 2 +-
 drivers/net/phy/mdio-gpio.c  | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/8390/ax88796.c 
b/drivers/net/ethernet/8390/ax88796.c
index b0a3b85..e977671 100644
--- a/drivers/net/ethernet/8390/ax88796.c
+++ b/drivers/net/ethernet/8390/ax88796.c
@@ -585,7 +585,7 @@ static int ax_bb_get_data(struct mdiobb_ctrl *ctrl)
return reg_memr & AX_MEMR_MDI ? 1 : 0;
 }
 
-static struct mdiobb_ops bb_ops = {
+static const struct mdiobb_ops bb_ops = {
.owner = THIS_MODULE,
.set_mdc = ax_bb_mdc,
.set_mdio_dir = ax_bb_dir,
diff --git a/drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c 
b/drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c
index 1f015ed..c8e5d88 100644
--- a/drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c
+++ b/drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c
@@ -100,7 +100,7 @@ static inline void mdc(struct mdiobb_ctrl *ctrl, int what)
in_be32(bitbang->dat);
 }
 
-static struct mdiobb_ops bb_ops = {
+static const struct mdiobb_ops bb_ops = {
.owner = THIS_MODULE,
.set_mdc = mdc,
.set_mdio_dir = mdio_dir,
diff --git a/drivers/net/ethernet/renesas/ravb_main.c 
b/drivers/net/ethernet/renesas/ravb_main.c
index 92d7692..1b0acd1 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -171,7 +171,7 @@ static int ravb_get_mdio_data(struct mdiobb_ctrl *ctrl)
 }
 
 /* MDIO bus control struct */
-static struct mdiobb_ops bb_ops = {
+static const struct mdiobb_ops bb_ops = {
.owner = THIS_MODULE,
.set_mdc = ravb_set_mdc,
.set_mdio_dir = ravb_set_mdio_dir,
diff --git a/drivers/net/ethernet/renesas/sh_eth.c 
b/drivers/net/ethernet/renesas/sh_eth.c
index 00fafab..6ef5dd8 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -1052,7 +1052,7 @@ static void sh_mdc_ctrl(struct mdiobb_ctrl *ctrl, int bit)
 }
 
 /* mdio bus control struct */
-static struct mdiobb_ops bb_ops = {
+static const struct mdiobb_ops bb_ops = {
.owner = THIS_MODULE,
.set_mdc = sh_mdc_ctrl,
.set_mdio_dir = sh_mmd_ctrl,
diff --git a/drivers/net/phy/mdio-gpio.c b/drivers/net/phy/mdio-gpio.c
index 27ab630..97333d5 100644
--- a/drivers/net/phy/mdio-gpio.c
+++ b/drivers/net/phy/mdio-gpio.c
@@ -122,7 +122,7 @@ static void mdc_set(struct mdiobb_ctrl *ctrl, int what)
gpio_set_value_cansleep(bitbang->mdc, what ^ bitbang->mdc_active_low);
 }
 
-static struct mdiobb_ops mdio_gpio_ops = {
+static const struct mdiobb_ops mdio_gpio_ops = {
.owner = THIS_MODULE,
.set_mdc = mdc_set,
.set_mdio_dir = mdio_dir,
-- 
1.9.1

[PATCH] net: constify mdiobb_ops structures

2017-01-13 Thread Bhumika Goyal

Declare mdiobb_ops structures as const as they are only stored in the
ops field of mdiobb_ctrl structures. This field is of type const, so
mdiobb_ops structures having this property can be declared const too.
Done using Coccinelle:

@r disable optional_qualifier@
identifier x;
position p;
@@
static struct mdiobb_ops x@p={...};

@ok@
struct bb_info bitbang;
struct ravb_private priv;
struct ax_device ax;
struct mdio_gpio_info bb;
identifier r.x;
position p;
@@
(
bitbang.ctrl.ops=@p
|
priv.mdiobb.ops=@p
|
ax.bb_ctrl.ops=@p
|
bb.ctrl.ops=@p
)

@bad@
position p != {r.p,ok.p};
identifier r.x;
@@
x@p

@depends on !bad disable optional_qualifier@
identifier r.x;
@@
+const
struct mdiobb_ops x;

Before and after size details:

File size before:
   textdata bss dec hex filename
  132931203  16   1451238b0 net/ethernet/8390/ax88796.o
File size after:
   textdata bss dec hex filename
  133571139  16   1451238b0 net/ethernet/8390/ax88796.o

File size before: 
   textdata bss dec hex filename
   1440 100   01540 604 freescale/fs_enet/mii-bitbang.o
File size after: Remains the same

File size before:
   textdata bss dec hex filename
  19000 192  16   192084b08 net/ethernet/renesas/ravb_main.o
File size after: Remains the same

File size before:
   textdata bss dec hex filename
  285011568   8   30077757d net/ethernet/renesas/sh_eth.o
File size after:
   textdata bss dec hex filename
  285651504   8   30077757d net/ethernet/renesas/sh_eth.o

File size before:
   textdata bss dec hex filename
   1859 248   02107 83b drivers/net/phy/mdio-gpio.o
File size after:
   textdata bss dec hex filename
   1915 192   02107 83b drivers/net/phy/mdio-gpio.o

Signed-off-by: Bhumika Goyal 
---
 drivers/net/ethernet/8390/ax88796.c  | 2 +-
 drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c | 2 +-
 drivers/net/ethernet/renesas/ravb_main.c | 2 +-
 drivers/net/ethernet/renesas/sh_eth.c| 2 +-
 drivers/net/phy/mdio-gpio.c  | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/8390/ax88796.c 
b/drivers/net/ethernet/8390/ax88796.c
index b0a3b85..e977671 100644
--- a/drivers/net/ethernet/8390/ax88796.c
+++ b/drivers/net/ethernet/8390/ax88796.c
@@ -585,7 +585,7 @@ static int ax_bb_get_data(struct mdiobb_ctrl *ctrl)
return reg_memr & AX_MEMR_MDI ? 1 : 0;
 }
 
-static struct mdiobb_ops bb_ops = {
+static const struct mdiobb_ops bb_ops = {
.owner = THIS_MODULE,
.set_mdc = ax_bb_mdc,
.set_mdio_dir = ax_bb_dir,
diff --git a/drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c 
b/drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c
index 1f015ed..c8e5d88 100644
--- a/drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c
+++ b/drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c
@@ -100,7 +100,7 @@ static inline void mdc(struct mdiobb_ctrl *ctrl, int what)
in_be32(bitbang->dat);
 }
 
-static struct mdiobb_ops bb_ops = {
+static const struct mdiobb_ops bb_ops = {
.owner = THIS_MODULE,
.set_mdc = mdc,
.set_mdio_dir = mdio_dir,
diff --git a/drivers/net/ethernet/renesas/ravb_main.c 
b/drivers/net/ethernet/renesas/ravb_main.c
index 92d7692..1b0acd1 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -171,7 +171,7 @@ static int ravb_get_mdio_data(struct mdiobb_ctrl *ctrl)
 }
 
 /* MDIO bus control struct */
-static struct mdiobb_ops bb_ops = {
+static const struct mdiobb_ops bb_ops = {
.owner = THIS_MODULE,
.set_mdc = ravb_set_mdc,
.set_mdio_dir = ravb_set_mdio_dir,
diff --git a/drivers/net/ethernet/renesas/sh_eth.c 
b/drivers/net/ethernet/renesas/sh_eth.c
index 00fafab..6ef5dd8 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -1052,7 +1052,7 @@ static void sh_mdc_ctrl(struct mdiobb_ctrl *ctrl, int bit)
 }
 
 /* mdio bus control struct */
-static struct mdiobb_ops bb_ops = {
+static const struct mdiobb_ops bb_ops = {
.owner = THIS_MODULE,
.set_mdc = sh_mdc_ctrl,
.set_mdio_dir = sh_mmd_ctrl,
diff --git a/drivers/net/phy/mdio-gpio.c b/drivers/net/phy/mdio-gpio.c
index 27ab630..97333d5 100644
--- a/drivers/net/phy/mdio-gpio.c
+++ b/drivers/net/phy/mdio-gpio.c
@@ -122,7 +122,7 @@ static void mdc_set(struct mdiobb_ctrl *ctrl, int what)
gpio_set_value_cansleep(bitbang->mdc, what ^ bitbang->mdc_active_low);
 }
 
-static struct mdiobb_ops mdio_gpio_ops = {
+static const struct mdiobb_ops mdio_gpio_ops = {
.owner = THIS_MODULE,
.set_mdc = mdc_set,
.set_mdio_dir = mdio_dir,
-- 
1.9.1

Re: [PATCH V2 0/6] gpio: davinci: Redesign driver to accommodate ngpios in one gpio chip

2017-01-13 Thread Grygorii Strashko




On 01/12/2017 10:20 PM, Keerthy wrote:

The Davinci GPIO driver is implemented to work with one monolithic
Davinci GPIO platform device which may have up to Y(144) gpios.
The Davinci GPIO driver instantiates number of GPIO chips with
max 32 gpio pins per each during initialization and one IRQ domain.
So, the current GPIO's  opjects structure is:

 Davinci GPIO controller
 |-  --|
 ...|--- irq_domain (hwirq [0..143])
 |-  --|

Current driver creates one chip for every 32 GPIOs in a controller.
This was a limitation earlier now there is no need for that. Hence
redesigning the driver to create one gpio chip for all the ngpio
in the controller.

|-  --|--- irq_domain (hwirq [0..143]).

The previous discussion on this can be found here:
https://www.spinics.net/lists/linux-omap/msg132869.html

The series is posted on top of:

https://lkml.org/lkml/2017/1/4/94

Changes in v2:

  * Optimized the re-design patch.
  * Added couple of code clean ups after the re-design.
  * Included v2 of https://patchwork.ozlabs.org/patch/710855/

Keerthy (6):
  gpio: davinci: Remove gpio2regs function to accommodate multi
instances
  gpio: davinci: Remove unwanted blank line
  gpio: davinci: Redesign driver to accommodate ngpios in one gpio chip
  gpio: davinci: Add support for multiple GPIO controllers
  gpio: davinci: Remove custom .xlate
  gpio: davinci: Remove redundant macros

 drivers/gpio/gpio-davinci.c| 174 +
 include/linux/platform_data/gpio-davinci.h |  20 ++--
 2 files changed, 90 insertions(+), 104 deletions(-)



Reviewed-by: Grygorii Strashko 

--
regards,
-grygorii

Re: [PATCH V2 0/6] gpio: davinci: Redesign driver to accommodate ngpios in one gpio chip

2017-01-13 Thread Grygorii Strashko




On 01/12/2017 10:20 PM, Keerthy wrote:

The Davinci GPIO driver is implemented to work with one monolithic
Davinci GPIO platform device which may have up to Y(144) gpios.
The Davinci GPIO driver instantiates number of GPIO chips with
max 32 gpio pins per each during initialization and one IRQ domain.
So, the current GPIO's  opjects structure is:

 Davinci GPIO controller
 |-  --|
 ...|--- irq_domain (hwirq [0..143])
 |-  --|

Current driver creates one chip for every 32 GPIOs in a controller.
This was a limitation earlier now there is no need for that. Hence
redesigning the driver to create one gpio chip for all the ngpio
in the controller.

|-  --|--- irq_domain (hwirq [0..143]).

The previous discussion on this can be found here:
https://www.spinics.net/lists/linux-omap/msg132869.html

The series is posted on top of:

https://lkml.org/lkml/2017/1/4/94

Changes in v2:

  * Optimized the re-design patch.
  * Added couple of code clean ups after the re-design.
  * Included v2 of https://patchwork.ozlabs.org/patch/710855/

Keerthy (6):
  gpio: davinci: Remove gpio2regs function to accommodate multi
instances
  gpio: davinci: Remove unwanted blank line
  gpio: davinci: Redesign driver to accommodate ngpios in one gpio chip
  gpio: davinci: Add support for multiple GPIO controllers
  gpio: davinci: Remove custom .xlate
  gpio: davinci: Remove redundant macros

 drivers/gpio/gpio-davinci.c| 174 +
 include/linux/platform_data/gpio-davinci.h |  20 ++--
 2 files changed, 90 insertions(+), 104 deletions(-)



Reviewed-by: Grygorii Strashko 

--
regards,
-grygorii

Re: [tpmdd-devel] [PATCH RFC v2 5/5] tpm2: expose resource manager via a device link /dev/tpms

2017-01-13 Thread Jason Gunthorpe

On Fri, Jan 13, 2017 at 09:40:08AM -0800, James Bottomley wrote:
> On Fri, 2017-01-13 at 10:25 -0700, Jason Gunthorpe wrote:
> > On Thu, Jan 12, 2017 at 10:56:28PM +0200, Jarkko Sakkinen wrote:
> > 
> > > >  dev_t tpm_devt;
> > > 
> > > But they should have different major device numbers.
> > 
> > major/minors don't really matter these days since they are dynamic
> 
> Right, although we have this weird piece of code:
> 
> 
>   if (chip->dev_num == 0)
>   chip->dev.devt = MKDEV(MISC_MAJOR, TPM_MINOR);
>   else
>   chip->dev.devt = MKDEV(MAJOR(tpm_devt), chip->dev_num);
> 
> So the first TPM device gets the MISC_MAJOR with TPM_MINOR and the rest
> get the dynamic major/minor.  It means when you do an ls on a complex
> system you get something like:

The TPM has always used a static major/minor for tpm0 and dynamic for
the following. Originally this used a misc_device and did:

} else if (chip->dev_num == 0)
chip->vendor.miscdev.minor = TPM_MINOR;
else
chip->vendor.miscdev.minor = MISC_DYNAMIC_MINOR;

When we moved to struct device the !0 tpms got a dynamic major as
well.

I don't really know what the broad kernel feeling is on historical
major/minor stability - this was mainly continuing what had always
been done in this code for pre-udev userspace compatibility.

Does it harm anything?

Jason

Re: [tpmdd-devel] [PATCH RFC v2 5/5] tpm2: expose resource manager via a device link /dev/tpms

2017-01-13 Thread Jason Gunthorpe

On Fri, Jan 13, 2017 at 09:40:08AM -0800, James Bottomley wrote:
> On Fri, 2017-01-13 at 10:25 -0700, Jason Gunthorpe wrote:
> > On Thu, Jan 12, 2017 at 10:56:28PM +0200, Jarkko Sakkinen wrote:
> > 
> > > >  dev_t tpm_devt;
> > > 
> > > But they should have different major device numbers.
> > 
> > major/minors don't really matter these days since they are dynamic
> 
> Right, although we have this weird piece of code:
> 
> 
>   if (chip->dev_num == 0)
>   chip->dev.devt = MKDEV(MISC_MAJOR, TPM_MINOR);
>   else
>   chip->dev.devt = MKDEV(MAJOR(tpm_devt), chip->dev_num);
> 
> So the first TPM device gets the MISC_MAJOR with TPM_MINOR and the rest
> get the dynamic major/minor.  It means when you do an ls on a complex
> system you get something like:

The TPM has always used a static major/minor for tpm0 and dynamic for
the following. Originally this used a misc_device and did:

} else if (chip->dev_num == 0)
chip->vendor.miscdev.minor = TPM_MINOR;
else
chip->vendor.miscdev.minor = MISC_DYNAMIC_MINOR;

When we moved to struct device the !0 tpms got a dynamic major as
well.

I don't really know what the broad kernel feeling is on historical
major/minor stability - this was mainly continuing what had always
been done in this code for pre-udev userspace compatibility.

Does it harm anything?

Jason

[PATCH] KVM: VMX: downgrade warning on unexpected exit code

2017-01-13 Thread Radim Krčmář

2017-01-02 11:23+0100, Dmitry Vyukov:
> Hello,
> 
> I've got the following warning while running syzkaller fuzzer:
> 
> WARNING: CPU: 2 PID: 13257 at arch/x86/kvm/vmx.c:8633
> vmx_handle_exit+0x262b/0x38b0 arch/x86/kvm/vmx.c:8633
> vmx: unexpected exit reason 0xb
> CPU: 2 PID: 13257 Comm: syz-executor7 Not tainted 4.10.0-rc1+ #118
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:15 [inline]
>  dump_stack+0x292/0x3a2 lib/dump_stack.c:51
>  panic+0x1cb/0x3a9 kernel/panic.c:179
>  __warn+0x1c4/0x1e0 kernel/panic.c:539
>  warn_slowpath_fmt+0xc5/0x100 kernel/panic.c:562
>  vmx_handle_exit+0x262b/0x38b0 arch/x86/kvm/vmx.c:8633
>  vcpu_enter_guest arch/x86/kvm/x86.c:6884 [inline]
>  vcpu_run arch/x86/kvm/x86.c:6943 [inline]
>  kvm_arch_vcpu_ioctl_run+0xf3d/0x45f0 arch/x86/kvm/x86.c:7101
>  kvm_vcpu_ioctl+0x673/0x1120 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2569
>  vfs_ioctl fs/ioctl.c:43 [inline]
>  do_vfs_ioctl+0x1bf/0x1780 fs/ioctl.c:683
>  SYSC_ioctl fs/ioctl.c:698 [inline]
>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:689
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> RIP: 0033:0x4438a9
> RSP: 002b:7f935aa2fb58 EFLAGS: 0286 ORIG_RAX: 0010
> RAX: ffda RBX: 0015 RCX: 004438a9
> RDX:  RSI: ae80 RDI: 0015
> RBP: 006ddb30 R08:  R09: 
> R10:  R11: 0286 R12: 0070
> R13: 0006 R14: 0015 R15: 20014000
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Kernel Offset: disabled
> 
> Exit reason 0xb is GETSEC instruction. This does not look harmful as
> it is handled as #UD. But I think we should print a single line
> message regarding non-emulated instruction as in other cases, just to
> not scare cloud admins and to make syzkaller ignore it.
> 
> FTR, a raw reproducer is here:
> https://gist.githubusercontent.com/dvyukov/c762f6ea04ebbba49cdee0a6caca31b4/raw/f21deb04cdc70ae74100c12447d71bb0cd2025c7/gistfile1.txt

I can't reproduce -- maybe a nested bug, which will take a while to
figure out.  Still, host dump at that point is useless, so the change
makes sense.

A guest dump would be useful (e.g. I think that GETSEC should not exit
if guest CR4.SMX is disabled), but just giving the error is about as
good and we want something short and rate-limited if the message can be
trigerred by a guest in production ...

---8<---
We never needed the call trace and we better rate-limit if it can be
triggered by a guest.

Signed-off-by: Radim Krčmář 
---
 arch/x86/kvm/vmx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a236decb81e4..7cd606daa01e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8595,7 +8595,8 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
&& kvm_vmx_exit_handlers[exit_reason])
return kvm_vmx_exit_handlers[exit_reason](vcpu);
else {
-   WARN_ONCE(1, "vmx: unexpected exit reason 0x%x\n", exit_reason);
+   vcpu_unimpl(vcpu, "vmx: unexpected exit reason 0x%x\n",
+   exit_reason);
kvm_queue_exception(vcpu, UD_VECTOR);
return 1;
}
-- 
2.11.0

[PATCH] KVM: VMX: downgrade warning on unexpected exit code

2017-01-13 Thread Radim Krčmář

2017-01-02 11:23+0100, Dmitry Vyukov:
> Hello,
> 
> I've got the following warning while running syzkaller fuzzer:
> 
> WARNING: CPU: 2 PID: 13257 at arch/x86/kvm/vmx.c:8633
> vmx_handle_exit+0x262b/0x38b0 arch/x86/kvm/vmx.c:8633
> vmx: unexpected exit reason 0xb
> CPU: 2 PID: 13257 Comm: syz-executor7 Not tainted 4.10.0-rc1+ #118
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:15 [inline]
>  dump_stack+0x292/0x3a2 lib/dump_stack.c:51
>  panic+0x1cb/0x3a9 kernel/panic.c:179
>  __warn+0x1c4/0x1e0 kernel/panic.c:539
>  warn_slowpath_fmt+0xc5/0x100 kernel/panic.c:562
>  vmx_handle_exit+0x262b/0x38b0 arch/x86/kvm/vmx.c:8633
>  vcpu_enter_guest arch/x86/kvm/x86.c:6884 [inline]
>  vcpu_run arch/x86/kvm/x86.c:6943 [inline]
>  kvm_arch_vcpu_ioctl_run+0xf3d/0x45f0 arch/x86/kvm/x86.c:7101
>  kvm_vcpu_ioctl+0x673/0x1120 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2569
>  vfs_ioctl fs/ioctl.c:43 [inline]
>  do_vfs_ioctl+0x1bf/0x1780 fs/ioctl.c:683
>  SYSC_ioctl fs/ioctl.c:698 [inline]
>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:689
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> RIP: 0033:0x4438a9
> RSP: 002b:7f935aa2fb58 EFLAGS: 0286 ORIG_RAX: 0010
> RAX: ffda RBX: 0015 RCX: 004438a9
> RDX:  RSI: ae80 RDI: 0015
> RBP: 006ddb30 R08:  R09: 
> R10:  R11: 0286 R12: 0070
> R13: 0006 R14: 0015 R15: 20014000
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Kernel Offset: disabled
> 
> Exit reason 0xb is GETSEC instruction. This does not look harmful as
> it is handled as #UD. But I think we should print a single line
> message regarding non-emulated instruction as in other cases, just to
> not scare cloud admins and to make syzkaller ignore it.
> 
> FTR, a raw reproducer is here:
> https://gist.githubusercontent.com/dvyukov/c762f6ea04ebbba49cdee0a6caca31b4/raw/f21deb04cdc70ae74100c12447d71bb0cd2025c7/gistfile1.txt

I can't reproduce -- maybe a nested bug, which will take a while to
figure out.  Still, host dump at that point is useless, so the change
makes sense.

A guest dump would be useful (e.g. I think that GETSEC should not exit
if guest CR4.SMX is disabled), but just giving the error is about as
good and we want something short and rate-limited if the message can be
trigerred by a guest in production ...

---8<---
We never needed the call trace and we better rate-limit if it can be
triggered by a guest.

Signed-off-by: Radim Krčmář 
---
 arch/x86/kvm/vmx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a236decb81e4..7cd606daa01e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8595,7 +8595,8 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
&& kvm_vmx_exit_handlers[exit_reason])
return kvm_vmx_exit_handlers[exit_reason](vcpu);
else {
-   WARN_ONCE(1, "vmx: unexpected exit reason 0x%x\n", exit_reason);
+   vcpu_unimpl(vcpu, "vmx: unexpected exit reason 0x%x\n",
+   exit_reason);
kvm_queue_exception(vcpu, UD_VECTOR);
return 1;
}
-- 
2.11.0

[PATCH v4 0/2] reset: Make optional functions really optional.

2017-01-13 Thread Ramiro Oliveira

v4:
 - Change the way error handling was being processed when in optional mode.

v3:
 - Improve description of changes ocurring in the patchset
 - Add 2 new return errors.

v2:
 - Make some comments more explicit
 - Add optional flag to reduce code duplication
 - Change shared flag from int to bool to match optional flag 

Up until now optional functions in the reset API were similar to the non
optional.

This patch corrects that, while maintaining compatibility with existing
drivers.

As suggested here: https://lkml.org/lkml/2016/12/14/502

Ramiro Oliveira (2):
  reset: Change shared flag from int to bool
  reset: make optional functions really optional

 drivers/reset/core.c  | 53 +++
 include/linux/reset.h | 45 +--
 2 files changed, 67 insertions(+), 31 deletions(-)

-- 
2.11.0

[PATCH v4 1/2] reset: Change shared flag from int to bool

2017-01-13 Thread Ramiro Oliveira

Since the new parameter being added is going to be a bool this patch
changes the shared flag from int to bool to match the new parameter.

Signed-off-by: Ramiro Oliveira 
---
 drivers/reset/core.c  |  8 
 include/linux/reset.h | 32 
 2 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/drivers/reset/core.c b/drivers/reset/core.c
index 10368ed8fd13..272c1e4ecb5c 100644
--- a/drivers/reset/core.c
+++ b/drivers/reset/core.c
@@ -41,7 +41,7 @@ struct reset_control {
struct list_head list;
unsigned int id;
unsigned int refcnt;
-   int shared;
+   bool shared;
atomic_t deassert_count;
atomic_t triggered_count;
 };
@@ -254,7 +254,7 @@ EXPORT_SYMBOL_GPL(reset_control_status);
 
 static struct reset_control *__reset_control_get(
struct reset_controller_dev *rcdev,
-   unsigned int index, int shared)
+   unsigned int index, bool shared)
 {
struct reset_control *rstc;
 
@@ -299,7 +299,7 @@ static void __reset_control_put(struct reset_control *rstc)
 }
 
 struct reset_control *__of_reset_control_get(struct device_node *node,
-const char *id, int index, int shared)
+const char *id, int index, bool shared)
 {
struct reset_control *rstc;
struct reset_controller_dev *r, *rcdev;
@@ -379,7 +379,7 @@ static void devm_reset_control_release(struct device *dev, 
void *res)
 }
 
 struct reset_control *__devm_reset_control_get(struct device *dev,
-const char *id, int index, int shared)
+const char *id, int index, bool shared)
 {
struct reset_control **ptr, *rstc;
 
diff --git a/include/linux/reset.h b/include/linux/reset.h
index 5daff15722d3..ec1d1fd28f5f 100644
--- a/include/linux/reset.h
+++ b/include/linux/reset.h
@@ -13,10 +13,10 @@ int reset_control_deassert(struct reset_control *rstc);
 int reset_control_status(struct reset_control *rstc);
 
 struct reset_control *__of_reset_control_get(struct device_node *node,
-const char *id, int index, int shared);
+const char *id, int index, bool shared);
 void reset_control_put(struct reset_control *rstc);
 struct reset_control *__devm_reset_control_get(struct device *dev,
-const char *id, int index, int shared);
+const char *id, int index, bool shared);
 
 int __must_check device_reset(struct device *dev);
 
@@ -69,14 +69,14 @@ static inline int device_reset_optional(struct device *dev)
 
 static inline struct reset_control *__of_reset_control_get(
struct device_node *node,
-   const char *id, int index, int shared)
+   const char *id, int index, bool shared)
 {
return ERR_PTR(-ENOTSUPP);
 }
 
 static inline struct reset_control *__devm_reset_control_get(
struct device *dev,
-   const char *id, int index, int shared)
+   const char *id, int index, bool shared)
 {
return ERR_PTR(-ENOTSUPP);
 }
@@ -132,19 +132,19 @@ __must_check reset_control_get_exclusive(struct device 
*dev, const char *id)
 static inline struct reset_control *reset_control_get_shared(
struct device *dev, const char *id)
 {
-   return __of_reset_control_get(dev ? dev->of_node : NULL, id, 0, 1);
+   return __of_reset_control_get(dev ? dev->of_node : NULL, id, 0, true);
 }
 
 static inline struct reset_control *reset_control_get_optional_exclusive(
struct device *dev, const char *id)
 {
-   return __of_reset_control_get(dev ? dev->of_node : NULL, id, 0, 0);
+   return __of_reset_control_get(dev ? dev->of_node : NULL, id, 0, false);
 }
 
 static inline struct reset_control *reset_control_get_optional_shared(
struct device *dev, const char *id)
 {
-   return __of_reset_control_get(dev ? dev->of_node : NULL, id, 0, 1);
+   return __of_reset_control_get(dev ? dev->of_node : NULL, id, 0, true);
 }
 
 /**
@@ -185,7 +185,7 @@ static inline struct reset_control 
*of_reset_control_get_exclusive(
 static inline struct reset_control *of_reset_control_get_shared(
struct device_node *node, const char *id)
 {
-   return __of_reset_control_get(node, id, 0, 1);
+   return __of_reset_control_get(node, id, 0, true);
 }
 
 /**
@@ -202,7 +202,7 @@ static inline struct reset_control 
*of_reset_control_get_shared(
 static inline struct reset_control *of_reset_control_get_exclusive_by_index(

[PATCH v4 0/2] reset: Make optional functions really optional.

2017-01-13 Thread Ramiro Oliveira

v4:
 - Change the way error handling was being processed when in optional mode.

v3:
 - Improve description of changes ocurring in the patchset
 - Add 2 new return errors.

v2:
 - Make some comments more explicit
 - Add optional flag to reduce code duplication
 - Change shared flag from int to bool to match optional flag 

Up until now optional functions in the reset API were similar to the non
optional.

This patch corrects that, while maintaining compatibility with existing
drivers.

As suggested here: https://lkml.org/lkml/2016/12/14/502

Ramiro Oliveira (2):
  reset: Change shared flag from int to bool
  reset: make optional functions really optional

 drivers/reset/core.c  | 53 +++
 include/linux/reset.h | 45 +--
 2 files changed, 67 insertions(+), 31 deletions(-)

-- 
2.11.0

[PATCH v4 1/2] reset: Change shared flag from int to bool

2017-01-13 Thread Ramiro Oliveira

Since the new parameter being added is going to be a bool this patch
changes the shared flag from int to bool to match the new parameter.

Signed-off-by: Ramiro Oliveira 
---
 drivers/reset/core.c  |  8 
 include/linux/reset.h | 32 
 2 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/drivers/reset/core.c b/drivers/reset/core.c
index 10368ed8fd13..272c1e4ecb5c 100644
--- a/drivers/reset/core.c
+++ b/drivers/reset/core.c
@@ -41,7 +41,7 @@ struct reset_control {
struct list_head list;
unsigned int id;
unsigned int refcnt;
-   int shared;
+   bool shared;
atomic_t deassert_count;
atomic_t triggered_count;
 };
@@ -254,7 +254,7 @@ EXPORT_SYMBOL_GPL(reset_control_status);
 
 static struct reset_control *__reset_control_get(
struct reset_controller_dev *rcdev,
-   unsigned int index, int shared)
+   unsigned int index, bool shared)
 {
struct reset_control *rstc;
 
@@ -299,7 +299,7 @@ static void __reset_control_put(struct reset_control *rstc)
 }
 
 struct reset_control *__of_reset_control_get(struct device_node *node,
-const char *id, int index, int shared)
+const char *id, int index, bool shared)
 {
struct reset_control *rstc;
struct reset_controller_dev *r, *rcdev;
@@ -379,7 +379,7 @@ static void devm_reset_control_release(struct device *dev, 
void *res)
 }
 
 struct reset_control *__devm_reset_control_get(struct device *dev,
-const char *id, int index, int shared)
+const char *id, int index, bool shared)
 {
struct reset_control **ptr, *rstc;
 
diff --git a/include/linux/reset.h b/include/linux/reset.h
index 5daff15722d3..ec1d1fd28f5f 100644
--- a/include/linux/reset.h
+++ b/include/linux/reset.h
@@ -13,10 +13,10 @@ int reset_control_deassert(struct reset_control *rstc);
 int reset_control_status(struct reset_control *rstc);
 
 struct reset_control *__of_reset_control_get(struct device_node *node,
-const char *id, int index, int shared);
+const char *id, int index, bool shared);
 void reset_control_put(struct reset_control *rstc);
 struct reset_control *__devm_reset_control_get(struct device *dev,
-const char *id, int index, int shared);
+const char *id, int index, bool shared);
 
 int __must_check device_reset(struct device *dev);
 
@@ -69,14 +69,14 @@ static inline int device_reset_optional(struct device *dev)
 
 static inline struct reset_control *__of_reset_control_get(
struct device_node *node,
-   const char *id, int index, int shared)
+   const char *id, int index, bool shared)
 {
return ERR_PTR(-ENOTSUPP);
 }
 
 static inline struct reset_control *__devm_reset_control_get(
struct device *dev,
-   const char *id, int index, int shared)
+   const char *id, int index, bool shared)
 {
return ERR_PTR(-ENOTSUPP);
 }
@@ -132,19 +132,19 @@ __must_check reset_control_get_exclusive(struct device 
*dev, const char *id)
 static inline struct reset_control *reset_control_get_shared(
struct device *dev, const char *id)
 {
-   return __of_reset_control_get(dev ? dev->of_node : NULL, id, 0, 1);
+   return __of_reset_control_get(dev ? dev->of_node : NULL, id, 0, true);
 }
 
 static inline struct reset_control *reset_control_get_optional_exclusive(
struct device *dev, const char *id)
 {
-   return __of_reset_control_get(dev ? dev->of_node : NULL, id, 0, 0);
+   return __of_reset_control_get(dev ? dev->of_node : NULL, id, 0, false);
 }
 
 static inline struct reset_control *reset_control_get_optional_shared(
struct device *dev, const char *id)
 {
-   return __of_reset_control_get(dev ? dev->of_node : NULL, id, 0, 1);
+   return __of_reset_control_get(dev ? dev->of_node : NULL, id, 0, true);
 }
 
 /**
@@ -185,7 +185,7 @@ static inline struct reset_control 
*of_reset_control_get_exclusive(
 static inline struct reset_control *of_reset_control_get_shared(
struct device_node *node, const char *id)
 {
-   return __of_reset_control_get(node, id, 0, 1);
+   return __of_reset_control_get(node, id, 0, true);
 }
 
 /**
@@ -202,7 +202,7 @@ static inline struct reset_control 
*of_reset_control_get_shared(
 static inline struct reset_control *of_reset_control_get_exclusive_by_index(

[PATCH v4 2/2] reset: make optional functions really optional

2017-01-13 Thread Ramiro Oliveira

The *_get_optional_* functions weren't really optional so this patch
makes them really optional.

These *_get_optional_* functions will now return NULL instead of an error
if no matching reset phandle is found in the DT, and all the
reset_control_* functions now accept NULL rstc pointers.

Signed-off-by: Ramiro Oliveira 
---
 drivers/reset/core.c  | 49 +++--
 include/linux/reset.h | 45 ++---
 2 files changed, 65 insertions(+), 29 deletions(-)

diff --git a/drivers/reset/core.c b/drivers/reset/core.c
index 272c1e4ecb5c..c79cce3a7b6d 100644
--- a/drivers/reset/core.c
+++ b/drivers/reset/core.c
@@ -143,12 +143,18 @@ EXPORT_SYMBOL_GPL(devm_reset_controller_register);
  * a no-op.
  * Consumers must not use reset_control_(de)assert on shared reset lines when
  * reset_control_reset has been used.
+ *
+ * If rstc is NULL it is an optional reset and the function will just
+ * return 0.
  */
 int reset_control_reset(struct reset_control *rstc)
 {
int ret;
 
-   if (WARN_ON(IS_ERR_OR_NULL(rstc)))
+   if (!rstc)
+   return 0;
+
+   if (WARN_ON(IS_ERR(rstc)))
return -EINVAL;
 
if (!rstc->rcdev->ops->reset)
@@ -182,10 +188,17 @@ EXPORT_SYMBOL_GPL(reset_control_reset);
  * internal state to be reset, but must be prepared for this to happen.
  * Consumers must not use reset_control_reset on shared reset lines when
  * reset_control_(de)assert has been used.
+ * return 0.
+ *
+ * If rstc is NULL it is an optional reset and the function will just
+ * return 0.
  */
 int reset_control_assert(struct reset_control *rstc)
 {
-   if (WARN_ON(IS_ERR_OR_NULL(rstc)))
+   if (!rstc)
+   return 0;
+
+   if (WARN_ON(IS_ERR(rstc)))
return -EINVAL;
 
if (!rstc->rcdev->ops->assert)
@@ -213,10 +226,17 @@ EXPORT_SYMBOL_GPL(reset_control_assert);
  * After calling this function, the reset is guaranteed to be deasserted.
  * Consumers must not use reset_control_reset on shared reset lines when
  * reset_control_(de)assert has been used.
+ * return 0.
+ *
+ * If rstc is NULL it is an optional reset and the function will just
+ * return 0.
  */
 int reset_control_deassert(struct reset_control *rstc)
 {
-   if (WARN_ON(IS_ERR_OR_NULL(rstc)))
+   if (!rstc)
+   return 0;
+
+   if (WARN_ON(IS_ERR(rstc)))
return -EINVAL;
 
if (!rstc->rcdev->ops->deassert)
@@ -237,12 +257,15 @@ EXPORT_SYMBOL_GPL(reset_control_deassert);
 /**
  * reset_control_status - returns a negative errno if not supported, a
  * positive value if the reset line is asserted, or zero if the reset
- * line is not asserted.
+ * line is not asserted or if the desc is NULL (optional reset).
  * @rstc: reset controller
  */
 int reset_control_status(struct reset_control *rstc)
 {
-   if (WARN_ON(IS_ERR_OR_NULL(rstc)))
+   if (!rstc)
+   return 0;
+
+   if (WARN_ON(IS_ERR(rstc)))
return -EINVAL;
 
if (rstc->rcdev->ops->status)
@@ -299,7 +322,8 @@ static void __reset_control_put(struct reset_control *rstc)
 }
 
 struct reset_control *__of_reset_control_get(struct device_node *node,
-const char *id, int index, bool shared)
+const char *id, int index, bool shared,
+bool optional)
 {
struct reset_control *rstc;
struct reset_controller_dev *r, *rcdev;
@@ -313,14 +337,18 @@ struct reset_control *__of_reset_control_get(struct 
device_node *node,
if (id) {
index = of_property_match_string(node,
 "reset-names", id);
+   if (index == -EILSEQ)
+   return ERR_PTR(index);
if (index < 0)
-   return ERR_PTR(-ENOENT);
+   return optional ? NULL : ERR_PTR(-ENOENT);
}
 
ret = of_parse_phandle_with_args(node, "resets", "#reset-cells",
 index, );
-   if (ret)
+   if (ret == -EINVAL)
return ERR_PTR(ret);
+   if (ret)
+   return optional ? NULL : ERR_PTR(ret);
 
mutex_lock(_list_mutex);
rcdev = NULL;
@@ -379,7 +407,8 @@ static void devm_reset_control_release(struct device *dev, 
void *res)
 }
 
 struct reset_control *__devm_reset_control_get(struct device *dev,
-const char *id, int index, bool shared)
+const char *id, int index, bool shared,
+bool optional)
 {
struct reset_control **ptr, *rstc;
 
@@ -389,7 +418,7 @@ struct reset_control *__devm_reset_control_get(struct 
device *dev,
return ERR_PTR(-ENOMEM);
 
rstc = __of_reset_control_get(dev ? dev->of_node : NULL,
-

[PATCH v4 2/2] reset: make optional functions really optional

2017-01-13 Thread Ramiro Oliveira

The *_get_optional_* functions weren't really optional so this patch
makes them really optional.

These *_get_optional_* functions will now return NULL instead of an error
if no matching reset phandle is found in the DT, and all the
reset_control_* functions now accept NULL rstc pointers.

Signed-off-by: Ramiro Oliveira 
---
 drivers/reset/core.c  | 49 +++--
 include/linux/reset.h | 45 ++---
 2 files changed, 65 insertions(+), 29 deletions(-)

diff --git a/drivers/reset/core.c b/drivers/reset/core.c
index 272c1e4ecb5c..c79cce3a7b6d 100644
--- a/drivers/reset/core.c
+++ b/drivers/reset/core.c
@@ -143,12 +143,18 @@ EXPORT_SYMBOL_GPL(devm_reset_controller_register);
  * a no-op.
  * Consumers must not use reset_control_(de)assert on shared reset lines when
  * reset_control_reset has been used.
+ *
+ * If rstc is NULL it is an optional reset and the function will just
+ * return 0.
  */
 int reset_control_reset(struct reset_control *rstc)
 {
int ret;
 
-   if (WARN_ON(IS_ERR_OR_NULL(rstc)))
+   if (!rstc)
+   return 0;
+
+   if (WARN_ON(IS_ERR(rstc)))
return -EINVAL;
 
if (!rstc->rcdev->ops->reset)
@@ -182,10 +188,17 @@ EXPORT_SYMBOL_GPL(reset_control_reset);
  * internal state to be reset, but must be prepared for this to happen.
  * Consumers must not use reset_control_reset on shared reset lines when
  * reset_control_(de)assert has been used.
+ * return 0.
+ *
+ * If rstc is NULL it is an optional reset and the function will just
+ * return 0.
  */
 int reset_control_assert(struct reset_control *rstc)
 {
-   if (WARN_ON(IS_ERR_OR_NULL(rstc)))
+   if (!rstc)
+   return 0;
+
+   if (WARN_ON(IS_ERR(rstc)))
return -EINVAL;
 
if (!rstc->rcdev->ops->assert)
@@ -213,10 +226,17 @@ EXPORT_SYMBOL_GPL(reset_control_assert);
  * After calling this function, the reset is guaranteed to be deasserted.
  * Consumers must not use reset_control_reset on shared reset lines when
  * reset_control_(de)assert has been used.
+ * return 0.
+ *
+ * If rstc is NULL it is an optional reset and the function will just
+ * return 0.
  */
 int reset_control_deassert(struct reset_control *rstc)
 {
-   if (WARN_ON(IS_ERR_OR_NULL(rstc)))
+   if (!rstc)
+   return 0;
+
+   if (WARN_ON(IS_ERR(rstc)))
return -EINVAL;
 
if (!rstc->rcdev->ops->deassert)
@@ -237,12 +257,15 @@ EXPORT_SYMBOL_GPL(reset_control_deassert);
 /**
  * reset_control_status - returns a negative errno if not supported, a
  * positive value if the reset line is asserted, or zero if the reset
- * line is not asserted.
+ * line is not asserted or if the desc is NULL (optional reset).
  * @rstc: reset controller
  */
 int reset_control_status(struct reset_control *rstc)
 {
-   if (WARN_ON(IS_ERR_OR_NULL(rstc)))
+   if (!rstc)
+   return 0;
+
+   if (WARN_ON(IS_ERR(rstc)))
return -EINVAL;
 
if (rstc->rcdev->ops->status)
@@ -299,7 +322,8 @@ static void __reset_control_put(struct reset_control *rstc)
 }
 
 struct reset_control *__of_reset_control_get(struct device_node *node,
-const char *id, int index, bool shared)
+const char *id, int index, bool shared,
+bool optional)
 {
struct reset_control *rstc;
struct reset_controller_dev *r, *rcdev;
@@ -313,14 +337,18 @@ struct reset_control *__of_reset_control_get(struct 
device_node *node,
if (id) {
index = of_property_match_string(node,
 "reset-names", id);
+   if (index == -EILSEQ)
+   return ERR_PTR(index);
if (index < 0)
-   return ERR_PTR(-ENOENT);
+   return optional ? NULL : ERR_PTR(-ENOENT);
}
 
ret = of_parse_phandle_with_args(node, "resets", "#reset-cells",
 index, );
-   if (ret)
+   if (ret == -EINVAL)
return ERR_PTR(ret);
+   if (ret)
+   return optional ? NULL : ERR_PTR(ret);
 
mutex_lock(_list_mutex);
rcdev = NULL;
@@ -379,7 +407,8 @@ static void devm_reset_control_release(struct device *dev, 
void *res)
 }
 
 struct reset_control *__devm_reset_control_get(struct device *dev,
-const char *id, int index, bool shared)
+const char *id, int index, bool shared,
+bool optional)
 {
struct reset_control **ptr, *rstc;
 
@@ -389,7 +418,7 @@ struct reset_control *__devm_reset_control_get(struct 
device *dev,
return ERR_PTR(-ENOMEM);
 
rstc = __of_reset_control_get(dev ? dev->of_node : NULL,
-

Re: kvm: WARNING in x86_emulate_insn

2017-01-13 Thread Radim Krčmář

2017-01-12 14:55+0100, Dmitry Vyukov:
> Hello,
> 
> I've got the following WARNING in x86_emulate_insn while running
> syzkaller fuzzer:
> 
> WARNING: CPU: 2 PID: 18646 at arch/x86/kvm/emulate.c:5558
> x86_emulate_insn+0x16a5/0x4090 arch/x86/kvm/emulate.c:5572
> Modules linked in:
> CPU: 2 PID: 18646 Comm: syz-executor Not tainted 4.10.0-rc3+ #155
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:15 [inline]
>  dump_stack+0x292/0x3a2 lib/dump_stack.c:51
>  __warn+0x19f/0x1e0 kernel/panic.c:547
>  warn_slowpath_null+0x2c/0x40 kernel/panic.c:582
>  x86_emulate_insn+0x16a5/0x4090 arch/x86/kvm/emulate.c:5572
>  x86_emulate_instruction+0x403/0x1cc0 arch/x86/kvm/x86.c:5618
>  emulate_instruction arch/x86/include/asm/kvm_host.h:1127 [inline]
>  handle_exception+0x594/0xfd0 arch/x86/kvm/vmx.c:5762
>  vmx_handle_exit+0x2b7/0x38b0 arch/x86/kvm/vmx.c:8625
>  vcpu_enter_guest arch/x86/kvm/x86.c:6888 [inline]
>  vcpu_run arch/x86/kvm/x86.c:6947 [inline]
>  kvm_arch_vcpu_ioctl_run+0xf3d/0x4660 arch/x86/kvm/x86.c:7105
>  kvm_vcpu_ioctl+0x673/0x1120 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2569
>  vfs_ioctl fs/ioctl.c:43 [inline]
>  do_vfs_ioctl+0x1bf/0x1780 fs/ioctl.c:683
>  SYSC_ioctl fs/ioctl.c:698 [inline]
>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:689
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> RIP: 0033:0x445329
> RSP: 002b:7f9e6e22fb58 EFLAGS: 0286 ORIG_RAX: 0010
> RAX: ffda RBX: 0018 RCX: 00445329
> RDX:  RSI: ae80 RDI: 0018
> RBP: 006deb40 R08:  R09: 
> R10:  R11: 0286 R12: 00700150
> R13:  R14: 7f9e6e2309c0 R15: 7f9e6e230700
> ---[ end trace 6b54f749506b620c ]---
> [ cut here ]
> WARNING: CPU: 2 PID: 18646 at arch/x86/kvm/x86.c:366
> exception_type+0x73/0x80 arch/x86/kvm/x86.c:366
> Modules linked in:
> CPU: 2 PID: 18646 Comm: syz-executor Tainted: GW   4.10.0-rc3+ 
> #155
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:15 [inline]
>  dump_stack+0x292/0x3a2 lib/dump_stack.c:51
>  __warn+0x19f/0x1e0 kernel/panic.c:547
>  warn_slowpath_null+0x2c/0x40 kernel/panic.c:582
>  exception_type+0x73/0x80 arch/x86/kvm/x86.c:366
>  x86_emulate_instruction+0x1356/0x1cc0 arch/x86/kvm/x86.c:5664
>  emulate_instruction arch/x86/include/asm/kvm_host.h:1127 [inline]
>  handle_exception+0x594/0xfd0 arch/x86/kvm/vmx.c:5762
>  vmx_handle_exit+0x2b7/0x38b0 arch/x86/kvm/vmx.c:8625
>  vcpu_enter_guest arch/x86/kvm/x86.c:6888 [inline]
>  vcpu_run arch/x86/kvm/x86.c:6947 [inline]
>  kvm_arch_vcpu_ioctl_run+0xf3d/0x4660 arch/x86/kvm/x86.c:7105
>  kvm_vcpu_ioctl+0x673/0x1120 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2569
>  vfs_ioctl fs/ioctl.c:43 [inline]
>  do_vfs_ioctl+0x1bf/0x1780 fs/ioctl.c:683
>  SYSC_ioctl fs/ioctl.c:698 [inline]
>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:689
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> RIP: 0033:0x445329
> RSP: 002b:7f9e6e22fb58 EFLAGS: 0286 ORIG_RAX: 0010
> RAX: ffda RBX: 0018 RCX: 00445329
> RDX:  RSI: ae80 RDI: 0018
> RBP: 006deb40 R08:  R09: 
> R10:  R11: 0286 R12: 00700150
> R13:  R14: 7f9e6e2309c0 R15: 7f9e6e230700
> ---[ end trace 6b54f749506b620d ]---
> 
> On commit ba836a6f5ab1243ff5e08a941a2d1de8b31244e1.
> 
> Unfortunately I can't reproduce it with a C program.
> It reproduces with the following syzkaller program within a minute, though:
> https://gist.githubusercontent.com/dvyukov/d09118fb9d986a9385487d80a1b50680/raw/884c68d22c3a80778ae596a6c5daf7467ea41b68/gistfile1.txt
> It can be executed following these instructions:
> https://github.com/google/syzkaller/wiki/How-to-execute-syzkaller-programs
> I run syz-execprog as:
> ./syz-execprog -repeat=0 -procs=8 -sandbox=none gistfile1.txt
> 
> Note that syz_kvm_setup_cpu is a pseudo syscall that setups vcpu into
> a complex state:
> https://github.com/google/syzkaller/blob/master/executor/common_kvm_amd64.h#L271
> 
> My bet would be on some race where VM memory is overwritten
> concurrently, and it affects either guest execution or
> emulate_instruction in a bad way...

Yeah, all functions that return X86EMUL_PROPAGATE_FAULT seem to set
exception.vector to something sane.  The only easy way to get a bad value there
is when x86_emulate_instruction() clears it to -1U, but I don't see how a race
would play out.

Anyway, I can't reproduce on bare metal [got another warning, see below].
Will try after rebuilding a guest kernel.

Thanks.


The best result was this warning after 300k executions:

[ cut here ]
WARNING: CPU: 7 PID: 20187 at lib/debugobjects.c:263

Re: kvm: WARNING in x86_emulate_insn

2017-01-13 Thread Radim Krčmář

2017-01-12 14:55+0100, Dmitry Vyukov:
> Hello,
> 
> I've got the following WARNING in x86_emulate_insn while running
> syzkaller fuzzer:
> 
> WARNING: CPU: 2 PID: 18646 at arch/x86/kvm/emulate.c:5558
> x86_emulate_insn+0x16a5/0x4090 arch/x86/kvm/emulate.c:5572
> Modules linked in:
> CPU: 2 PID: 18646 Comm: syz-executor Not tainted 4.10.0-rc3+ #155
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:15 [inline]
>  dump_stack+0x292/0x3a2 lib/dump_stack.c:51
>  __warn+0x19f/0x1e0 kernel/panic.c:547
>  warn_slowpath_null+0x2c/0x40 kernel/panic.c:582
>  x86_emulate_insn+0x16a5/0x4090 arch/x86/kvm/emulate.c:5572
>  x86_emulate_instruction+0x403/0x1cc0 arch/x86/kvm/x86.c:5618
>  emulate_instruction arch/x86/include/asm/kvm_host.h:1127 [inline]
>  handle_exception+0x594/0xfd0 arch/x86/kvm/vmx.c:5762
>  vmx_handle_exit+0x2b7/0x38b0 arch/x86/kvm/vmx.c:8625
>  vcpu_enter_guest arch/x86/kvm/x86.c:6888 [inline]
>  vcpu_run arch/x86/kvm/x86.c:6947 [inline]
>  kvm_arch_vcpu_ioctl_run+0xf3d/0x4660 arch/x86/kvm/x86.c:7105
>  kvm_vcpu_ioctl+0x673/0x1120 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2569
>  vfs_ioctl fs/ioctl.c:43 [inline]
>  do_vfs_ioctl+0x1bf/0x1780 fs/ioctl.c:683
>  SYSC_ioctl fs/ioctl.c:698 [inline]
>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:689
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> RIP: 0033:0x445329
> RSP: 002b:7f9e6e22fb58 EFLAGS: 0286 ORIG_RAX: 0010
> RAX: ffda RBX: 0018 RCX: 00445329
> RDX:  RSI: ae80 RDI: 0018
> RBP: 006deb40 R08:  R09: 
> R10:  R11: 0286 R12: 00700150
> R13:  R14: 7f9e6e2309c0 R15: 7f9e6e230700
> ---[ end trace 6b54f749506b620c ]---
> [ cut here ]
> WARNING: CPU: 2 PID: 18646 at arch/x86/kvm/x86.c:366
> exception_type+0x73/0x80 arch/x86/kvm/x86.c:366
> Modules linked in:
> CPU: 2 PID: 18646 Comm: syz-executor Tainted: GW   4.10.0-rc3+ 
> #155
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:15 [inline]
>  dump_stack+0x292/0x3a2 lib/dump_stack.c:51
>  __warn+0x19f/0x1e0 kernel/panic.c:547
>  warn_slowpath_null+0x2c/0x40 kernel/panic.c:582
>  exception_type+0x73/0x80 arch/x86/kvm/x86.c:366
>  x86_emulate_instruction+0x1356/0x1cc0 arch/x86/kvm/x86.c:5664
>  emulate_instruction arch/x86/include/asm/kvm_host.h:1127 [inline]
>  handle_exception+0x594/0xfd0 arch/x86/kvm/vmx.c:5762
>  vmx_handle_exit+0x2b7/0x38b0 arch/x86/kvm/vmx.c:8625
>  vcpu_enter_guest arch/x86/kvm/x86.c:6888 [inline]
>  vcpu_run arch/x86/kvm/x86.c:6947 [inline]
>  kvm_arch_vcpu_ioctl_run+0xf3d/0x4660 arch/x86/kvm/x86.c:7105
>  kvm_vcpu_ioctl+0x673/0x1120 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2569
>  vfs_ioctl fs/ioctl.c:43 [inline]
>  do_vfs_ioctl+0x1bf/0x1780 fs/ioctl.c:683
>  SYSC_ioctl fs/ioctl.c:698 [inline]
>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:689
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> RIP: 0033:0x445329
> RSP: 002b:7f9e6e22fb58 EFLAGS: 0286 ORIG_RAX: 0010
> RAX: ffda RBX: 0018 RCX: 00445329
> RDX:  RSI: ae80 RDI: 0018
> RBP: 006deb40 R08:  R09: 
> R10:  R11: 0286 R12: 00700150
> R13:  R14: 7f9e6e2309c0 R15: 7f9e6e230700
> ---[ end trace 6b54f749506b620d ]---
> 
> On commit ba836a6f5ab1243ff5e08a941a2d1de8b31244e1.
> 
> Unfortunately I can't reproduce it with a C program.
> It reproduces with the following syzkaller program within a minute, though:
> https://gist.githubusercontent.com/dvyukov/d09118fb9d986a9385487d80a1b50680/raw/884c68d22c3a80778ae596a6c5daf7467ea41b68/gistfile1.txt
> It can be executed following these instructions:
> https://github.com/google/syzkaller/wiki/How-to-execute-syzkaller-programs
> I run syz-execprog as:
> ./syz-execprog -repeat=0 -procs=8 -sandbox=none gistfile1.txt
> 
> Note that syz_kvm_setup_cpu is a pseudo syscall that setups vcpu into
> a complex state:
> https://github.com/google/syzkaller/blob/master/executor/common_kvm_amd64.h#L271
> 
> My bet would be on some race where VM memory is overwritten
> concurrently, and it affects either guest execution or
> emulate_instruction in a bad way...

Yeah, all functions that return X86EMUL_PROPAGATE_FAULT seem to set
exception.vector to something sane.  The only easy way to get a bad value there
is when x86_emulate_instruction() clears it to -1U, but I don't see how a race
would play out.

Anyway, I can't reproduce on bare metal [got another warning, see below].
Will try after rebuilding a guest kernel.

Thanks.


The best result was this warning after 300k executions:

[ cut here ]
WARNING: CPU: 7 PID: 20187 at lib/debugobjects.c:263

Re: [PATCH 12/37] PCI: dwc: Create a new config symbol to enable pci dwc host

2017-01-13 Thread Joao Pinto

Hi Kishon,

Às 10:26 AM de 1/12/2017, Kishon Vijay Abraham I escreveu:
> Now that pci designware host has a separate file, create a new
> config symbol to select the host only driver. This is in preparation
> to enable endpoint support to designware driver.
> 
> Signed-off-by: Kishon Vijay Abraham I 
> ---
>  drivers/pci/dwc/Kconfig   |   26 +++---
>  drivers/pci/dwc/Makefile  |3 ++-
>  drivers/pci/dwc/pcie-designware.h |   29 +
>  3 files changed, 42 insertions(+), 16 deletions(-)
> 

You are already working in a base where dwc/ already exists. I know you made a
rename / re-structure patch for pci, but I think it was not yet accepted, right?
I don't see it in any of Bjorn' dev branches.

Thanks.

> diff --git a/drivers/pci/dwc/Kconfig b/drivers/pci/dwc/Kconfig
> index 8b08519..d0bdfb5 100644
> --- a/drivers/pci/dwc/Kconfig
> +++ b/drivers/pci/dwc/Kconfig
> @@ -3,13 +3,17 @@ menu "DesignWare PCI Core Support"
>  
>  config PCIE_DW
>   bool
> +
> +config PCIE_DW_HOST
> +bool
>   depends on PCI_MSI_IRQ_DOMAIN
> +select PCIE_DW
>  
>  config PCI_DRA7XX
>   bool "TI DRA7xx PCIe controller"
>   depends on OF && HAS_IOMEM && TI_PIPE3
>   depends on PCI_MSI_IRQ_DOMAIN
> - select PCIE_DW
> + select PCIE_DW_HOST
>   help
>Enables support for the PCIe controller in the DRA7xx SoC.  There
>are two instances of PCIe controller in DRA7xx.  This controller can
> @@ -18,7 +22,7 @@ config PCI_DRA7XX
>  config PCIE_DW_PLAT
>   bool "Platform bus based DesignWare PCIe Controller"
>   depends on PCI_MSI_IRQ_DOMAIN
> - select PCIE_DW
> + select PCIE_DW_HOST
>   ---help---
>This selects the DesignWare PCIe controller support. Select this if
>you have a PCIe controller on Platform bus.
> @@ -32,21 +36,21 @@ config PCI_EXYNOS
>   depends on SOC_EXYNOS5440 || COMPILE_TEST
>   depends on PCI_MSI_IRQ_DOMAIN
>   select PCIEPORTBUS
> - select PCIE_DW
> + select PCIE_DW_HOST
>  
>  config PCI_IMX6
>   bool "Freescale i.MX6 PCIe controller"
>   depends on SOC_IMX6Q || COMPILE_TEST
>   depends on PCI_MSI_IRQ_DOMAIN
>   select PCIEPORTBUS
> - select PCIE_DW
> + select PCIE_DW_HOST
>  
>  config PCIE_SPEAR13XX
>   bool "STMicroelectronics SPEAr PCIe controller"
>   depends on ARCH_SPEAR13XX || COMPILE_TEST
>   depends on PCI_MSI_IRQ_DOMAIN
>   select PCIEPORTBUS
> - select PCIE_DW
> + select PCIE_DW_HOST
>   help
> Say Y here if you want PCIe support on SPEAr13XX SoCs.
>  
> @@ -55,7 +59,7 @@ config PCI_KEYSTONE
>   depends on ARCH_KEYSTONE || COMPILE_TEST
>   depends on PCI_MSI_IRQ_DOMAIN
>   select PCIEPORTBUS
> - select PCIE_DW
> + select PCIE_DW_HOST
>   help
> Say Y here if you want to enable PCI controller support on Keystone
> SoCs. The PCI controller on Keystone is based on Designware hardware
> @@ -67,7 +71,7 @@ config PCI_LAYERSCAPE
>   depends on OF && (ARM || ARCH_LAYERSCAPE || COMPILE_TEST)
>   depends on PCI_MSI_IRQ_DOMAIN
>   select MFD_SYSCON
> - select PCIE_DW
> + select PCIE_DW_HOST
>   help
> Say Y here if you want PCIe controller support on Layerscape SoCs.
>  
> @@ -76,7 +80,7 @@ config PCI_HISI
>   bool "HiSilicon Hip05 and Hip06 SoCs PCIe controllers"
>   depends on PCI_MSI_IRQ_DOMAIN
>   select PCIEPORTBUS
> - select PCIE_DW
> + select PCIE_DW_HOST
>   help
> Say Y here if you want PCIe controller support on HiSilicon
> Hip05 and Hip06 SoCs
> @@ -86,7 +90,7 @@ config PCIE_QCOM
>   depends on (ARCH_QCOM || COMPILE_TEST) && OF
>   depends on PCI_MSI_IRQ_DOMAIN
>   select PCIEPORTBUS
> - select PCIE_DW
> + select PCIE_DW_HOST
>   help
> Say Y here to enable PCIe controller support on Qualcomm SoCs. The
> PCIe controller uses the Designware core plus Qualcomm-specific
> @@ -97,7 +101,7 @@ config PCIE_ARMADA_8K
>   depends on ARCH_MVEBU || COMPILE_TEST
>   depends on PCI_MSI_IRQ_DOMAIN
>   select PCIEPORTBUS
> - select PCIE_DW
> + select PCIE_DW_HOST
>   help
> Say Y here if you want to enable PCIe controller support on
> Armada-8K SoCs. The PCIe controller on Armada-8K is based on
> @@ -109,7 +113,7 @@ config PCIE_ARTPEC6
>   depends on MACH_ARTPEC6 || COMPILE_TEST
>   depends on PCI_MSI_IRQ_DOMAIN
>   select PCIEPORTBUS
> - select PCIE_DW
> + select PCIE_DW_HOST
>   help
> Say Y here to enable PCIe controller support on Axis ARTPEC-6
> SoCs.  This PCIe controller uses the DesignWare core.
> diff --git a/drivers/pci/dwc/Makefile b/drivers/pci/dwc/Makefile
> index 3b57e55..a2df13c 100644
> --- a/drivers/pci/dwc/Makefile
> +++ b/drivers/pci/dwc/Makefile
> @@ -1,4 +1,5 @@
> -obj-$(CONFIG_PCIE_DW) += pcie-designware.o

Re: [PATCH 12/37] PCI: dwc: Create a new config symbol to enable pci dwc host

2017-01-13 Thread Joao Pinto

Hi Kishon,

Às 10:26 AM de 1/12/2017, Kishon Vijay Abraham I escreveu:
> Now that pci designware host has a separate file, create a new
> config symbol to select the host only driver. This is in preparation
> to enable endpoint support to designware driver.
> 
> Signed-off-by: Kishon Vijay Abraham I 
> ---
>  drivers/pci/dwc/Kconfig   |   26 +++---
>  drivers/pci/dwc/Makefile  |3 ++-
>  drivers/pci/dwc/pcie-designware.h |   29 +
>  3 files changed, 42 insertions(+), 16 deletions(-)
> 

You are already working in a base where dwc/ already exists. I know you made a
rename / re-structure patch for pci, but I think it was not yet accepted, right?
I don't see it in any of Bjorn' dev branches.

Thanks.

> diff --git a/drivers/pci/dwc/Kconfig b/drivers/pci/dwc/Kconfig
> index 8b08519..d0bdfb5 100644
> --- a/drivers/pci/dwc/Kconfig
> +++ b/drivers/pci/dwc/Kconfig
> @@ -3,13 +3,17 @@ menu "DesignWare PCI Core Support"
>  
>  config PCIE_DW
>   bool
> +
> +config PCIE_DW_HOST
> +bool
>   depends on PCI_MSI_IRQ_DOMAIN
> +select PCIE_DW
>  
>  config PCI_DRA7XX
>   bool "TI DRA7xx PCIe controller"
>   depends on OF && HAS_IOMEM && TI_PIPE3
>   depends on PCI_MSI_IRQ_DOMAIN
> - select PCIE_DW
> + select PCIE_DW_HOST
>   help
>Enables support for the PCIe controller in the DRA7xx SoC.  There
>are two instances of PCIe controller in DRA7xx.  This controller can
> @@ -18,7 +22,7 @@ config PCI_DRA7XX
>  config PCIE_DW_PLAT
>   bool "Platform bus based DesignWare PCIe Controller"
>   depends on PCI_MSI_IRQ_DOMAIN
> - select PCIE_DW
> + select PCIE_DW_HOST
>   ---help---
>This selects the DesignWare PCIe controller support. Select this if
>you have a PCIe controller on Platform bus.
> @@ -32,21 +36,21 @@ config PCI_EXYNOS
>   depends on SOC_EXYNOS5440 || COMPILE_TEST
>   depends on PCI_MSI_IRQ_DOMAIN
>   select PCIEPORTBUS
> - select PCIE_DW
> + select PCIE_DW_HOST
>  
>  config PCI_IMX6
>   bool "Freescale i.MX6 PCIe controller"
>   depends on SOC_IMX6Q || COMPILE_TEST
>   depends on PCI_MSI_IRQ_DOMAIN
>   select PCIEPORTBUS
> - select PCIE_DW
> + select PCIE_DW_HOST
>  
>  config PCIE_SPEAR13XX
>   bool "STMicroelectronics SPEAr PCIe controller"
>   depends on ARCH_SPEAR13XX || COMPILE_TEST
>   depends on PCI_MSI_IRQ_DOMAIN
>   select PCIEPORTBUS
> - select PCIE_DW
> + select PCIE_DW_HOST
>   help
> Say Y here if you want PCIe support on SPEAr13XX SoCs.
>  
> @@ -55,7 +59,7 @@ config PCI_KEYSTONE
>   depends on ARCH_KEYSTONE || COMPILE_TEST
>   depends on PCI_MSI_IRQ_DOMAIN
>   select PCIEPORTBUS
> - select PCIE_DW
> + select PCIE_DW_HOST
>   help
> Say Y here if you want to enable PCI controller support on Keystone
> SoCs. The PCI controller on Keystone is based on Designware hardware
> @@ -67,7 +71,7 @@ config PCI_LAYERSCAPE
>   depends on OF && (ARM || ARCH_LAYERSCAPE || COMPILE_TEST)
>   depends on PCI_MSI_IRQ_DOMAIN
>   select MFD_SYSCON
> - select PCIE_DW
> + select PCIE_DW_HOST
>   help
> Say Y here if you want PCIe controller support on Layerscape SoCs.
>  
> @@ -76,7 +80,7 @@ config PCI_HISI
>   bool "HiSilicon Hip05 and Hip06 SoCs PCIe controllers"
>   depends on PCI_MSI_IRQ_DOMAIN
>   select PCIEPORTBUS
> - select PCIE_DW
> + select PCIE_DW_HOST
>   help
> Say Y here if you want PCIe controller support on HiSilicon
> Hip05 and Hip06 SoCs
> @@ -86,7 +90,7 @@ config PCIE_QCOM
>   depends on (ARCH_QCOM || COMPILE_TEST) && OF
>   depends on PCI_MSI_IRQ_DOMAIN
>   select PCIEPORTBUS
> - select PCIE_DW
> + select PCIE_DW_HOST
>   help
> Say Y here to enable PCIe controller support on Qualcomm SoCs. The
> PCIe controller uses the Designware core plus Qualcomm-specific
> @@ -97,7 +101,7 @@ config PCIE_ARMADA_8K
>   depends on ARCH_MVEBU || COMPILE_TEST
>   depends on PCI_MSI_IRQ_DOMAIN
>   select PCIEPORTBUS
> - select PCIE_DW
> + select PCIE_DW_HOST
>   help
> Say Y here if you want to enable PCIe controller support on
> Armada-8K SoCs. The PCIe controller on Armada-8K is based on
> @@ -109,7 +113,7 @@ config PCIE_ARTPEC6
>   depends on MACH_ARTPEC6 || COMPILE_TEST
>   depends on PCI_MSI_IRQ_DOMAIN
>   select PCIEPORTBUS
> - select PCIE_DW
> + select PCIE_DW_HOST
>   help
> Say Y here to enable PCIe controller support on Axis ARTPEC-6
> SoCs.  This PCIe controller uses the DesignWare core.
> diff --git a/drivers/pci/dwc/Makefile b/drivers/pci/dwc/Makefile
> index 3b57e55..a2df13c 100644
> --- a/drivers/pci/dwc/Makefile
> +++ b/drivers/pci/dwc/Makefile
> @@ -1,4 +1,5 @@
> -obj-$(CONFIG_PCIE_DW) += pcie-designware.o pcie-designware-host.o
>

Re: [PATCH 3/6 linux-next] fs/affs: make affs exportable

2017-01-13 Thread J. Bruce Fields

On Wed, Jan 04, 2017 at 06:53:31AM +0100, Fabian Frederick wrote:
> 
> 
> > On 03 January 2017 at 23:29 Al Viro  wrote:
> >
> >
> > On Tue, Jan 03, 2017 at 10:30:39PM +0100, Fabian Frederick wrote:
> > > Add standard functions making AFFS work with NFS.
> > >
> > > Functions based on ext4 implementation.
> > > Tested on loop device.
> >
> > How the hell is that supposed to work with cold dcache?  You don't have
> > ->get_parent() there at all...
> >
> > There *IS* a reference to parent directory in those suckers - not the same
> > kind as in normal unix filesystems (".." is not a directory entry there -
> > it's all fake), but it's doable.  be32_to_cpu(AFFS_TAIL(sb, bh)->parent)
> > would be the inumber you need, where bh is the inode block of directory.
> >
> > So it can be done, but not in this form.  NAK for the time being...
> 
> I worked on that function declared optional in
> Documentation/filesystems/nfs/Exporting
> but was unable to test it.

If we're going to reject patches that don't implement get_parent (and I
think we should), then we should replace "optional but strongly
recommended" there by just "mandatory".  Any objections?

--b.

Re: [PATCH 3/6 linux-next] fs/affs: make affs exportable

2017-01-13 Thread J. Bruce Fields

On Wed, Jan 04, 2017 at 06:53:31AM +0100, Fabian Frederick wrote:
> 
> 
> > On 03 January 2017 at 23:29 Al Viro  wrote:
> >
> >
> > On Tue, Jan 03, 2017 at 10:30:39PM +0100, Fabian Frederick wrote:
> > > Add standard functions making AFFS work with NFS.
> > >
> > > Functions based on ext4 implementation.
> > > Tested on loop device.
> >
> > How the hell is that supposed to work with cold dcache?  You don't have
> > ->get_parent() there at all...
> >
> > There *IS* a reference to parent directory in those suckers - not the same
> > kind as in normal unix filesystems (".." is not a directory entry there -
> > it's all fake), but it's doable.  be32_to_cpu(AFFS_TAIL(sb, bh)->parent)
> > would be the inumber you need, where bh is the inode block of directory.
> >
> > So it can be done, but not in this form.  NAK for the time being...
> 
> I worked on that function declared optional in
> Documentation/filesystems/nfs/Exporting
> but was unable to test it.

If we're going to reject patches that don't implement get_parent (and I
think we should), then we should replace "optional but strongly
recommended" there by just "mandatory".  Any objections?

--b.

Re: [PATCH] Documentation: cpuset: Fix 'cpuset.tasks' -> 'tasks'

2017-01-13 Thread W. Trevor King

On Fri, Jan 13, 2017 at 10:27:42AM -0700, Jonathan Corbet wrote:
> > -if a task's pid is written to another cpusets 'cpuset.tasks' file, then its
> > +if a task's pid is written to another cpuset's 'tasks' file, then its
> 
> So I'll confess that I don't understand this change.  All of the
> control files are referred to as cpuset.whatever in this document;
> why should this one, in particular, be different?

'tasks' is part of the generic cgroup tooling, so it doesn't get the
cpuset prefix:

  $ ls /sys/fs/cgroup/cpuset/*tasks*
  /sys/fs/cgroup/cpuset/tasks

Cheers,
Trevor

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy


signature.asc
Description: OpenPGP digital signature

Re: [PATCH] Documentation: cpuset: Fix 'cpuset.tasks' -> 'tasks'

2017-01-13 Thread W. Trevor King

On Fri, Jan 13, 2017 at 10:27:42AM -0700, Jonathan Corbet wrote:
> > -if a task's pid is written to another cpusets 'cpuset.tasks' file, then its
> > +if a task's pid is written to another cpuset's 'tasks' file, then its
> 
> So I'll confess that I don't understand this change.  All of the
> control files are referred to as cpuset.whatever in this document;
> why should this one, in particular, be different?

'tasks' is part of the generic cgroup tooling, so it doesn't get the
cpuset prefix:

  $ ls /sys/fs/cgroup/cpuset/*tasks*
  /sys/fs/cgroup/cpuset/tasks

Cheers,
Trevor

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy


signature.asc
Description: OpenPGP digital signature

Re: [PATCH 08/10] blk-mq-sched: add framework for MQ capable IO schedulers

2017-01-13 Thread Hannes Reinecke

On 01/13/2017 05:41 PM, Omar Sandoval wrote:
> On Fri, Jan 13, 2017 at 12:15:17PM +0100, Hannes Reinecke wrote:
>> On 01/11/2017 10:40 PM, Jens Axboe wrote:
>>> This adds a set of hooks that intercepts the blk-mq path of
>>> allocating/inserting/issuing/completing requests, allowing
>>> us to develop a scheduler within that framework.
>>>
>>> We reuse the existing elevator scheduler API on the registration
>>> side, but augment that with the scheduler flagging support for
>>> the blk-mq interfce, and with a separate set of ops hooks for MQ
>>> devices.
>>>
>>> We split driver and scheduler tags, so we can run the scheduling
>>> independent of device queue depth.
>>>
>>> Signed-off-by: Jens Axboe 
>> [ .. ]
>>> @@ -823,6 +847,35 @@ static inline unsigned int queued_to_index(unsigned 
>>> int queued)
>>> return min(BLK_MQ_MAX_DISPATCH_ORDER - 1, ilog2(queued) + 1);
>>>  }
>>>  
>>> +static bool blk_mq_get_driver_tag(struct request *rq,
>>> + struct blk_mq_hw_ctx **hctx, bool wait)
>>> +{
>>> +   struct blk_mq_alloc_data data = {
>>> +   .q = rq->q,
>>> +   .ctx = rq->mq_ctx,
>>> +   .hctx = blk_mq_map_queue(rq->q, rq->mq_ctx->cpu),
>>> +   .flags = wait ? 0 : BLK_MQ_REQ_NOWAIT,
>>> +   };
>>> +
>>> +   if (blk_mq_hctx_stopped(data.hctx))
>>> +   return false;
>>> +
>>> +   if (rq->tag != -1) {
>>> +done:
>>> +   if (hctx)
>>> +   *hctx = data.hctx;
>>> +   return true;
>>> +   }
>>> +
>>> +   rq->tag = blk_mq_get_tag();
>>> +   if (rq->tag >= 0) {
>>> +   data.hctx->tags->rqs[rq->tag] = rq;
>>> +   goto done;
>>> +   }
>>> +
>>> +   return false;
>>> +}
>>> +
>> What happens with the existing request at 'rqs[rq->tag]' ?
>> Surely there is one already, right?
>> Things like '->init_request' assume a fully populated array, so moving
>> one entry to another location is ... interesting.
>>
>> I would have thought we need to do a request cloning here,
>> otherwise this would introduce a memory leak, right?
>> (Not to mention a potential double completion, as the request is now at
>> two positions in the array)
>>
>> Cheers,
>>
>> Hannes
> 
> The entries in tags->rqs aren't slab objects, they're pointers into
> pages allocated separately and tracked on tags->page_list. See
> blk_mq_alloc_rqs(). In blk_mq_free_rqs(), we free all of the pages on
> tags->page_list, so there shouldn't be a memory leak.
> 
> As for hctx->tags->rqs, entries are only overwritten when a scheduler is
> enabled. In that case, the rqs array is storing pointers to requests
> actually from hctx->sched_tags, so overwriting/leaking isn't an issue.

Ah. Thanks.
That explains it.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

Re: [PATCH 08/10] blk-mq-sched: add framework for MQ capable IO schedulers

2017-01-13 Thread Hannes Reinecke

On 01/13/2017 05:41 PM, Omar Sandoval wrote:
> On Fri, Jan 13, 2017 at 12:15:17PM +0100, Hannes Reinecke wrote:
>> On 01/11/2017 10:40 PM, Jens Axboe wrote:
>>> This adds a set of hooks that intercepts the blk-mq path of
>>> allocating/inserting/issuing/completing requests, allowing
>>> us to develop a scheduler within that framework.
>>>
>>> We reuse the existing elevator scheduler API on the registration
>>> side, but augment that with the scheduler flagging support for
>>> the blk-mq interfce, and with a separate set of ops hooks for MQ
>>> devices.
>>>
>>> We split driver and scheduler tags, so we can run the scheduling
>>> independent of device queue depth.
>>>
>>> Signed-off-by: Jens Axboe 
>> [ .. ]
>>> @@ -823,6 +847,35 @@ static inline unsigned int queued_to_index(unsigned 
>>> int queued)
>>> return min(BLK_MQ_MAX_DISPATCH_ORDER - 1, ilog2(queued) + 1);
>>>  }
>>>  
>>> +static bool blk_mq_get_driver_tag(struct request *rq,
>>> + struct blk_mq_hw_ctx **hctx, bool wait)
>>> +{
>>> +   struct blk_mq_alloc_data data = {
>>> +   .q = rq->q,
>>> +   .ctx = rq->mq_ctx,
>>> +   .hctx = blk_mq_map_queue(rq->q, rq->mq_ctx->cpu),
>>> +   .flags = wait ? 0 : BLK_MQ_REQ_NOWAIT,
>>> +   };
>>> +
>>> +   if (blk_mq_hctx_stopped(data.hctx))
>>> +   return false;
>>> +
>>> +   if (rq->tag != -1) {
>>> +done:
>>> +   if (hctx)
>>> +   *hctx = data.hctx;
>>> +   return true;
>>> +   }
>>> +
>>> +   rq->tag = blk_mq_get_tag();
>>> +   if (rq->tag >= 0) {
>>> +   data.hctx->tags->rqs[rq->tag] = rq;
>>> +   goto done;
>>> +   }
>>> +
>>> +   return false;
>>> +}
>>> +
>> What happens with the existing request at 'rqs[rq->tag]' ?
>> Surely there is one already, right?
>> Things like '->init_request' assume a fully populated array, so moving
>> one entry to another location is ... interesting.
>>
>> I would have thought we need to do a request cloning here,
>> otherwise this would introduce a memory leak, right?
>> (Not to mention a potential double completion, as the request is now at
>> two positions in the array)
>>
>> Cheers,
>>
>> Hannes
> 
> The entries in tags->rqs aren't slab objects, they're pointers into
> pages allocated separately and tracked on tags->page_list. See
> blk_mq_alloc_rqs(). In blk_mq_free_rqs(), we free all of the pages on
> tags->page_list, so there shouldn't be a memory leak.
> 
> As for hctx->tags->rqs, entries are only overwritten when a scheduler is
> enabled. In that case, the rqs array is storing pointers to requests
> actually from hctx->sched_tags, so overwriting/leaking isn't an issue.

Ah. Thanks.
That explains it.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

Re: [tpmdd-devel] [PATCH RFC v2 5/5] tpm2: expose resource manager via a device link /dev/tpms

2017-01-13 Thread James Bottomley

On Fri, 2017-01-13 at 10:25 -0700, Jason Gunthorpe wrote:
> On Thu, Jan 12, 2017 at 10:56:28PM +0200, Jarkko Sakkinen wrote:
> 
> > >  dev_t tpm_devt;
> > 
> > But they should have different major device numbers.
> 
> major/minors don't really matter these days since they are dynamic

Right, although we have this weird piece of code:


if (chip->dev_num == 0)
chip->dev.devt = MKDEV(MISC_MAJOR, TPM_MINOR);
else
chip->dev.devt = MKDEV(MAJOR(tpm_devt), chip->dev_num);

So the first TPM device gets the MISC_MAJOR with TPM_MINOR and the rest
get the dynamic major/minor.  It means when you do an ls on a complex
system you get something like:

crw--- 1 root root  10,   224 Jan 13 06:21 /dev/tpm0
crw--- 1 root root 246, 1 Jan 13 09:38 /dev/tpm1
crw-rw-rw- 1 root root 246, 65536 Jan 13 06:21 /dev/tpms0
crw-rw-rw- 1 root root 246, 65537 Jan 13 09:38 /dev/tpms1

Perhaps it's time just to junk the reserved misc minor?

James

Re: [tpmdd-devel] [PATCH RFC v2 5/5] tpm2: expose resource manager via a device link /dev/tpms

2017-01-13 Thread James Bottomley

On Fri, 2017-01-13 at 10:25 -0700, Jason Gunthorpe wrote:
> On Thu, Jan 12, 2017 at 10:56:28PM +0200, Jarkko Sakkinen wrote:
> 
> > >  dev_t tpm_devt;
> > 
> > But they should have different major device numbers.
> 
> major/minors don't really matter these days since they are dynamic

Right, although we have this weird piece of code:


if (chip->dev_num == 0)
chip->dev.devt = MKDEV(MISC_MAJOR, TPM_MINOR);
else
chip->dev.devt = MKDEV(MAJOR(tpm_devt), chip->dev_num);

So the first TPM device gets the MISC_MAJOR with TPM_MINOR and the rest
get the dynamic major/minor.  It means when you do an ls on a complex
system you get something like:

crw--- 1 root root  10,   224 Jan 13 06:21 /dev/tpm0
crw--- 1 root root 246, 1 Jan 13 09:38 /dev/tpm1
crw-rw-rw- 1 root root 246, 65536 Jan 13 06:21 /dev/tpms0
crw-rw-rw- 1 root root 246, 65537 Jan 13 09:38 /dev/tpms1

Perhaps it's time just to junk the reserved misc minor?

James

[GIT PULL] Ceph fixes for 4.10-rc4

2017-01-13 Thread Ilya Dryomov

Hi Linus,

The following changes since commit a121103c922847ba5010819a3f250f1f7fc84ab8:

  Linux 4.10-rc3 (2017-01-08 14:18:17 -0800)

are available in the git repository at:

  https://github.com/ceph/ceph-client.git tags/ceph-for-4.10-rc4

for you to fetch changes up to 84fcc2d2bd6cbf621e49e1d0f7eaef2e3c666b40:

  ceph: fix get_oldest_context() (2017-01-12 19:31:01 +0100)


Two small fixups for the filesystem changes that went into this merge
window.


Geng, Jichao (1):
  ceph: fix get_oldest_context()

Yan, Zheng (1):
  ceph: fix mds cluster availability check

 fs/ceph/addr.c   | 4 ++--
 fs/ceph/mds_client.c | 5 +
 2 files changed, 7 insertions(+), 2 deletions(-)

Re: [PATCH v4 0/4] Application Data Integrity feature introduced by SPARC M7

2017-01-13 Thread Rob Gardner


On 01/13/2017 09:08 AM, Dave Hansen wrote:

On 01/13/2017 07:29 AM, Rob Gardner wrote:

so perhaps ADI should simply be disallowed for memory mapped to
files, and this particular complication can be avoided. Thoughts?

What's a "file" from your perspective?

In Linux, shared memory is a file.  hugetlbfs is done with files.  Many
databases mmap() their data into their address space.


Of course I meant a traditional file is the DOS sense, ie, data stored 
on something magnetic. ;) But it doesn't really matter because I am just 
trying to envision a use case for any of the mmap scenarios.


For instance a very persuasive use case for ADI is to 'color' malloc 
memory, freed malloc memory, and malloc's metadata with different ADI 
version tags so as to catch buffer overflows, underflows, use-after-free 
and use-after-realloc type scenarios. What is an equally compelling or 
even mildly interesting use case for ADI in shared memory and file mmap 
situations? Maybe you could mmap a file and immediately tag the entire 
thing with some version, thus disallowing all access to it, and then 
hand out access a chunk at a time. And then?


Rob

[GIT PULL] Ceph fixes for 4.10-rc4

2017-01-13 Thread Ilya Dryomov

Hi Linus,

The following changes since commit a121103c922847ba5010819a3f250f1f7fc84ab8:

  Linux 4.10-rc3 (2017-01-08 14:18:17 -0800)

are available in the git repository at:

  https://github.com/ceph/ceph-client.git tags/ceph-for-4.10-rc4

for you to fetch changes up to 84fcc2d2bd6cbf621e49e1d0f7eaef2e3c666b40:

  ceph: fix get_oldest_context() (2017-01-12 19:31:01 +0100)


Two small fixups for the filesystem changes that went into this merge
window.


Geng, Jichao (1):
  ceph: fix get_oldest_context()

Yan, Zheng (1):
  ceph: fix mds cluster availability check

 fs/ceph/addr.c   | 4 ++--
 fs/ceph/mds_client.c | 5 +
 2 files changed, 7 insertions(+), 2 deletions(-)

Re: [PATCH v4 0/4] Application Data Integrity feature introduced by SPARC M7

2017-01-13 Thread Rob Gardner


On 01/13/2017 09:08 AM, Dave Hansen wrote:

On 01/13/2017 07:29 AM, Rob Gardner wrote:

so perhaps ADI should simply be disallowed for memory mapped to
files, and this particular complication can be avoided. Thoughts?

What's a "file" from your perspective?

In Linux, shared memory is a file.  hugetlbfs is done with files.  Many
databases mmap() their data into their address space.


Of course I meant a traditional file is the DOS sense, ie, data stored 
on something magnetic. ;) But it doesn't really matter because I am just 
trying to envision a use case for any of the mmap scenarios.


For instance a very persuasive use case for ADI is to 'color' malloc 
memory, freed malloc memory, and malloc's metadata with different ADI 
version tags so as to catch buffer overflows, underflows, use-after-free 
and use-after-realloc type scenarios. What is an equally compelling or 
even mildly interesting use case for ADI in shared memory and file mmap 
situations? Maybe you could mmap a file and immediately tag the entire 
thing with some version, thus disallowing all access to it, and then 
hand out access a chunk at a time. And then?


Rob

Re: irq domain hierarchy vs. chaining w/ PCI MSI-X...

2017-01-13 Thread David Daney


On 01/13/2017 08:15 AM, Marc Zyngier wrote:

Thanks Linus for looping me in.

On 12/01/17 22:35, David Daney wrote:

Hi Thomas,

I am trying to figure out how to handle this situation:

   handle_level_irq()
   +---+ handle_fasteoi_irq()
   | PCIe hosted   | +---+
+-+
  --level_gpio>| GPIO to MSI-X |--MSI_message--+>| gicv3-ITS |---> |
CPU |
   | widget|   | +---+
+-+
   +---+   |
   |
   +---+   |
   | other PCIe device |---MSI_message-+
   +---+


The question is how to structure the interrupt handling.  My initial
attempt was a chaining arrangement where the GPIO driver does
request_irq() for the appropriate MSI-X vector, and the handler calls
back into the irq system like this:


static irqreturn_t thunderx_gpio_chain_handler(int irq, void *dev)
{
struct thunderx_irqdev *irqdev = dev;
int chained_irq;
int ret;

chained_irq = irq_find_mapping(irqdev->gpio->chip.irqdomain,
   irqdev->line);
if (!chained_irq)
return IRQ_NONE;

ret = generic_handle_irq(chained_irq);

return ret ? IRQ_NONE : IRQ_HANDLED;
}

Thus getting the proper GPIO irq_chip functions called to manage the
level triggering semantics.

The drawbacks of this approach are that there are then two irqs
associated with the GPIO line (the base MSI-X and the chained GPIO),
also there can be up to 80-100 of these widgets, so potentially we can
consume twice that many irq numbers.

It was suggested by Linus Walleij that using an irq domain hierarchy
might be a better idea.  However, I cannot figure out how this might
work.  The gicv3-ITS needs to use handle_fasteoi_irq(), and we need
handle_level_irq() for the GPIO-level lines.  Getting the proper
irq_chip functions called in a hierarchical configuration doesn't seem
doable given the heterogeneous flow handlers.


My main worry here is that you're trying to handle something that is
apparently a level interrupt using a transport that only handles edge
interrupts. It means that each time you EOI an interrupt, you need to be
able to resample the level and retrigger it if still high. Do you have a
HW facility to do so? Or do you emulate this in SW?


Yes.

The first thing the handle_level_irq() flow handler does is to mask the 
source.  After calling the handler, it then unmasks the source.


The act of unmasking in the HW causes another MSI to be emitted if the 
level signal is still active.  This is what we want as it ensures that 
interrupts keep firing as long as the signal is active, even though the 
underlying mechanism uses edge semantics MSI.






Can you think of a better way of structuring this than chaining from the
MSI-X handler as I outlined above?


We already have similar horrors - see irq-mbigen.c which does exactly
that. It creates IRQ domains spanning a bunch of MSIs allocated to that
platform device. Once you have that, the rest is pretty easy.

In your case, it probably requires adding the same kind of surgery so
that we can create an IRQ domain from the MSIs associated with a PCIe
device. Not too hard, just not there yet, and we can probably reuse some
of the code that lives in platform-msi.c


That seems too ugly.

I would propose one of the following:

1) Just keep the crappy chaining more or less as I originally 
implemented it, as it works well enough to get useful work done.


2) add an optional hierarchical flow handler function to struct 
irq_data.  If populated, this flow handler would be called from 
handle_fasteoi_irq() instead of calling handle_irq_event().  This could 
allow each irq_chip to have its own flow handler, instead of a single 
flow handler shared by each of the hierarchically nested irq_chip.


David Daney





Thanks,

M.

Re: irq domain hierarchy vs. chaining w/ PCI MSI-X...

2017-01-13 Thread David Daney


On 01/13/2017 08:15 AM, Marc Zyngier wrote:

Thanks Linus for looping me in.

On 12/01/17 22:35, David Daney wrote:

Hi Thomas,

I am trying to figure out how to handle this situation:

   handle_level_irq()
   +---+ handle_fasteoi_irq()
   | PCIe hosted   | +---+
+-+
  --level_gpio>| GPIO to MSI-X |--MSI_message--+>| gicv3-ITS |---> |
CPU |
   | widget|   | +---+
+-+
   +---+   |
   |
   +---+   |
   | other PCIe device |---MSI_message-+
   +---+


The question is how to structure the interrupt handling.  My initial
attempt was a chaining arrangement where the GPIO driver does
request_irq() for the appropriate MSI-X vector, and the handler calls
back into the irq system like this:


static irqreturn_t thunderx_gpio_chain_handler(int irq, void *dev)
{
struct thunderx_irqdev *irqdev = dev;
int chained_irq;
int ret;

chained_irq = irq_find_mapping(irqdev->gpio->chip.irqdomain,
   irqdev->line);
if (!chained_irq)
return IRQ_NONE;

ret = generic_handle_irq(chained_irq);

return ret ? IRQ_NONE : IRQ_HANDLED;
}

Thus getting the proper GPIO irq_chip functions called to manage the
level triggering semantics.

The drawbacks of this approach are that there are then two irqs
associated with the GPIO line (the base MSI-X and the chained GPIO),
also there can be up to 80-100 of these widgets, so potentially we can
consume twice that many irq numbers.

It was suggested by Linus Walleij that using an irq domain hierarchy
might be a better idea.  However, I cannot figure out how this might
work.  The gicv3-ITS needs to use handle_fasteoi_irq(), and we need
handle_level_irq() for the GPIO-level lines.  Getting the proper
irq_chip functions called in a hierarchical configuration doesn't seem
doable given the heterogeneous flow handlers.


My main worry here is that you're trying to handle something that is
apparently a level interrupt using a transport that only handles edge
interrupts. It means that each time you EOI an interrupt, you need to be
able to resample the level and retrigger it if still high. Do you have a
HW facility to do so? Or do you emulate this in SW?


Yes.

The first thing the handle_level_irq() flow handler does is to mask the 
source.  After calling the handler, it then unmasks the source.


The act of unmasking in the HW causes another MSI to be emitted if the 
level signal is still active.  This is what we want as it ensures that 
interrupts keep firing as long as the signal is active, even though the 
underlying mechanism uses edge semantics MSI.






Can you think of a better way of structuring this than chaining from the
MSI-X handler as I outlined above?


We already have similar horrors - see irq-mbigen.c which does exactly
that. It creates IRQ domains spanning a bunch of MSIs allocated to that
platform device. Once you have that, the rest is pretty easy.

In your case, it probably requires adding the same kind of surgery so
that we can create an IRQ domain from the MSIs associated with a PCIe
device. Not too hard, just not there yet, and we can probably reuse some
of the code that lives in platform-msi.c


That seems too ugly.

I would propose one of the following:

1) Just keep the crappy chaining more or less as I originally 
implemented it, as it works well enough to get useful work done.


2) add an optional hierarchical flow handler function to struct 
irq_data.  If populated, this flow handler would be called from 
handle_fasteoi_irq() instead of calling handle_irq_event().  This could 
allow each irq_chip to have its own flow handler, instead of a single 
flow handler shared by each of the hierarchically nested irq_chip.


David Daney





Thanks,

M.

Re: [PATCH v11 01/12] dt-bindings: display: mediatek: update supported chips

2017-01-13 Thread Rob Herring

On Wed, Jan 11, 2017 at 02:51:02PM +0800, YT Shen wrote:
> Add decriptions about supported chips, including MT2701 & MT8173
> 
> Signed-off-by: YT Shen 
> ---
>  Documentation/devicetree/bindings/display/mediatek/mediatek,disp.txt | 2 ++
>  Documentation/devicetree/bindings/display/mediatek/mediatek,dsi.txt  | 2 ++
>  2 files changed, 4 insertions(+)

Acked-by: Rob Herring

Re: [PATCH v11 01/12] dt-bindings: display: mediatek: update supported chips

2017-01-13 Thread Rob Herring

On Wed, Jan 11, 2017 at 02:51:02PM +0800, YT Shen wrote:
> Add decriptions about supported chips, including MT2701 & MT8173
> 
> Signed-off-by: YT Shen 
> ---
>  Documentation/devicetree/bindings/display/mediatek/mediatek,disp.txt | 2 ++
>  Documentation/devicetree/bindings/display/mediatek/mediatek,dsi.txt  | 2 ++
>  2 files changed, 4 insertions(+)

Acked-by: Rob Herring

Re: [PATCH v1 2/2] arm: dts: mt2701: add nor flash node

2017-01-13 Thread Marek Vasut

On 01/13/2017 05:56 PM, Boris Brezillon wrote:
> On Fri, 13 Jan 2017 17:44:12 +0100
> Marek Vasut  wrote:
> 
>> On 01/13/2017 05:28 PM, Boris Brezillon wrote:
>>> On Fri, 13 Jan 2017 17:13:55 +0100
>>> Marek Vasut  wrote:
>>>   
 On 01/13/2017 04:12 PM, Matthias Brugger wrote:  
>
>
> On 13/01/17 15:17, Boris Brezillon wrote:
>> On Fri, 13 Jan 2017 15:13:29 +0800
>> Guochun Mao  wrote:
>>
>>> Add Mediatek nor flash node.
>>>
>>> Signed-off-by: Guochun Mao 
>>> ---
>>>  arch/arm/boot/dts/mt2701-evb.dts |   25 +
>>>  arch/arm/boot/dts/mt2701.dtsi|   12 
>>>  2 files changed, 37 insertions(+)
>>>
>>> diff --git a/arch/arm/boot/dts/mt2701-evb.dts
>>> b/arch/arm/boot/dts/mt2701-evb.dts
>>> index 082ca88..85e5ae8 100644
>>> --- a/arch/arm/boot/dts/mt2701-evb.dts
>>> +++ b/arch/arm/boot/dts/mt2701-evb.dts
>>> @@ -24,6 +24,31 @@
>>>  };
>>>  };
>>>
>>> +_flash {
>>> +pinctrl-names = "default";
>>> +pinctrl-0 = <_pins_default>;
>>> +status = "okay";
>>> +flash@0 {
>>> +compatible = "jedec,spi-nor";
>>> +reg = <0>;
>>> +};
>>> +};
>>> +
>>> + {
>>> +nor_pins_default: nor {
>>> +pins1 {
>>> +pinmux = ,
>>> + ,
>>> + ,
>>> + ,
>>> + ,
>>> + ;
>>> +drive-strength = ;
>>> +bias-pull-up;
>>> +};
>>> +};
>>> +};
>>> +
>>>   {
>>>  status = "okay";
>>>  };
>>> diff --git a/arch/arm/boot/dts/mt2701.dtsi
>>> b/arch/arm/boot/dts/mt2701.dtsi
>>> index bdf8954..1eefce4 100644
>>> --- a/arch/arm/boot/dts/mt2701.dtsi
>>> +++ b/arch/arm/boot/dts/mt2701.dtsi
>>> @@ -227,6 +227,18 @@
>>>  status = "disabled";
>>>  };
>>>
>>> +nor_flash: spi@11014000 {
>>> +compatible = "mediatek,mt2701-nor",
>>> + "mediatek,mt8173-nor";
>>
>> Why define both here? Is "mediatek,mt8173-nor" really providing a
>> subset of the features supported by "mediatek,mt2701-nor"?
>>
>
> I think even if the ip block is the same, we should provide both
> bindings, just in case in the future we find out that mt2701 has some
> hidden bug, feature or bug-feature. This way even if we update the
> driver, we stay compatible with older device tree blobs in the wild.
>
> We can drop the mt2701-nor in the bindings definition if you want.   
>>>
>>> Oh, sorry, I misunderstood. What I meant is that if you want to
>>> list/support all possible compatibles, maybe you should just put one
>>> compatible in your DT and patch your driver (+ binding doc) to define
>>> all of them.  
>>
>> Uh, what ? I lost you here :-)
>>
 This exactly. We should have a DT compat in the form:
 compatible = "vendor,-block", "vendor,-block";
 Then if we find a problem in the future, we can match on the
 "vendor,-block" and still support the old DTs.  
>>>
>>> Not sure it's only in term of whose IP appeared first. My understanding
>>> is that it's a way to provide inheritance. For example:
>>>
>>> ",", ",";
>>>
>>> or
>>>
>>> 
>>> ",",",";
>>>
>>> BTW, which one is the oldest between mt8173 and mt2701? :-)  
>>
>> And that's another thing and I agree with you, but I don't think that's
>> what we're discussing in this thread. But (!), OT, I think we should
>> codify the rules in Documentation/ . This discussion came up multiple
>> times recently.
>>
>> And my question still stands, what do we put into the DT here, IMO
>> compatible = "mediatek,mt2701-nor", "mediatek,mt8173-nor";
> 
> I'd say
> 
>   compatible = "mediatek,mt8173-nor";
> 
> because both compatible are referring to very specific IP version. It's
> not the same as

But then you don't have the ability to handle a block in this particular
SoC in case there's a bug found in it in the future,
so IMO it should be:

compatible = "mediatek,mt2701-nor", "mediatek,mt8173-nor";

>   compatible = "mediatek,mt8173-nor", "mediatek,mt81xx-nor";

This doesn't look right, since here we add two new compatibles ...

> where you clearly have a generic compatible which is overloaded by a
> specific one.
> 
> But anyway, I'm not the one taking the decision here, let's wait for DT
> maintainers reviews.
> 
>> and what goes into the binding document ? I guess both too ?
> 
> If both exist, they should be both documented.
> 


-- 
Best regards,
Marek Vasut

Re: [PATCH] Documentation: Update CPU hotplug and move it to core-api

2017-01-13 Thread Jonathan Corbet

On Thu, 22 Dec 2016 17:19:34 +0100
Sebastian Andrzej Siewior  wrote:

> The current CPU hotplug is outdated. During the update to what we
> currently have I rewrote it partly and moved to sphinx format.

OK, I've applied this (finally) to the docs tree, even though I do kind
of agree with Jani about keeping the author information in Git.

Thanks,

jon

Re: [PATCH v1 2/2] arm: dts: mt2701: add nor flash node

2017-01-13 Thread Marek Vasut

On 01/13/2017 05:56 PM, Boris Brezillon wrote:
> On Fri, 13 Jan 2017 17:44:12 +0100
> Marek Vasut  wrote:
> 
>> On 01/13/2017 05:28 PM, Boris Brezillon wrote:
>>> On Fri, 13 Jan 2017 17:13:55 +0100
>>> Marek Vasut  wrote:
>>>   
 On 01/13/2017 04:12 PM, Matthias Brugger wrote:  
>
>
> On 13/01/17 15:17, Boris Brezillon wrote:
>> On Fri, 13 Jan 2017 15:13:29 +0800
>> Guochun Mao  wrote:
>>
>>> Add Mediatek nor flash node.
>>>
>>> Signed-off-by: Guochun Mao 
>>> ---
>>>  arch/arm/boot/dts/mt2701-evb.dts |   25 +
>>>  arch/arm/boot/dts/mt2701.dtsi|   12 
>>>  2 files changed, 37 insertions(+)
>>>
>>> diff --git a/arch/arm/boot/dts/mt2701-evb.dts
>>> b/arch/arm/boot/dts/mt2701-evb.dts
>>> index 082ca88..85e5ae8 100644
>>> --- a/arch/arm/boot/dts/mt2701-evb.dts
>>> +++ b/arch/arm/boot/dts/mt2701-evb.dts
>>> @@ -24,6 +24,31 @@
>>>  };
>>>  };
>>>
>>> +_flash {
>>> +pinctrl-names = "default";
>>> +pinctrl-0 = <_pins_default>;
>>> +status = "okay";
>>> +flash@0 {
>>> +compatible = "jedec,spi-nor";
>>> +reg = <0>;
>>> +};
>>> +};
>>> +
>>> + {
>>> +nor_pins_default: nor {
>>> +pins1 {
>>> +pinmux = ,
>>> + ,
>>> + ,
>>> + ,
>>> + ,
>>> + ;
>>> +drive-strength = ;
>>> +bias-pull-up;
>>> +};
>>> +};
>>> +};
>>> +
>>>   {
>>>  status = "okay";
>>>  };
>>> diff --git a/arch/arm/boot/dts/mt2701.dtsi
>>> b/arch/arm/boot/dts/mt2701.dtsi
>>> index bdf8954..1eefce4 100644
>>> --- a/arch/arm/boot/dts/mt2701.dtsi
>>> +++ b/arch/arm/boot/dts/mt2701.dtsi
>>> @@ -227,6 +227,18 @@
>>>  status = "disabled";
>>>  };
>>>
>>> +nor_flash: spi@11014000 {
>>> +compatible = "mediatek,mt2701-nor",
>>> + "mediatek,mt8173-nor";
>>
>> Why define both here? Is "mediatek,mt8173-nor" really providing a
>> subset of the features supported by "mediatek,mt2701-nor"?
>>
>
> I think even if the ip block is the same, we should provide both
> bindings, just in case in the future we find out that mt2701 has some
> hidden bug, feature or bug-feature. This way even if we update the
> driver, we stay compatible with older device tree blobs in the wild.
>
> We can drop the mt2701-nor in the bindings definition if you want.   
>>>
>>> Oh, sorry, I misunderstood. What I meant is that if you want to
>>> list/support all possible compatibles, maybe you should just put one
>>> compatible in your DT and patch your driver (+ binding doc) to define
>>> all of them.  
>>
>> Uh, what ? I lost you here :-)
>>
 This exactly. We should have a DT compat in the form:
 compatible = "vendor,-block", "vendor,-block";
 Then if we find a problem in the future, we can match on the
 "vendor,-block" and still support the old DTs.  
>>>
>>> Not sure it's only in term of whose IP appeared first. My understanding
>>> is that it's a way to provide inheritance. For example:
>>>
>>> ",", ",";
>>>
>>> or
>>>
>>> 
>>> ",",",";
>>>
>>> BTW, which one is the oldest between mt8173 and mt2701? :-)  
>>
>> And that's another thing and I agree with you, but I don't think that's
>> what we're discussing in this thread. But (!), OT, I think we should
>> codify the rules in Documentation/ . This discussion came up multiple
>> times recently.
>>
>> And my question still stands, what do we put into the DT here, IMO
>> compatible = "mediatek,mt2701-nor", "mediatek,mt8173-nor";
> 
> I'd say
> 
>   compatible = "mediatek,mt8173-nor";
> 
> because both compatible are referring to very specific IP version. It's
> not the same as

But then you don't have the ability to handle a block in this particular
SoC in case there's a bug found in it in the future,
so IMO it should be:

compatible = "mediatek,mt2701-nor", "mediatek,mt8173-nor";

>   compatible = "mediatek,mt8173-nor", "mediatek,mt81xx-nor";

This doesn't look right, since here we add two new compatibles ...

> where you clearly have a generic compatible which is overloaded by a
> specific one.
> 
> But anyway, I'm not the one taking the decision here, let's wait for DT
> maintainers reviews.
> 
>> and what goes into the binding document ? I guess both too ?
> 
> If both exist, they should be both documented.
> 


-- 
Best regards,
Marek Vasut

Re: [PATCH] Documentation: Update CPU hotplug and move it to core-api

2017-01-13 Thread Jonathan Corbet

On Thu, 22 Dec 2016 17:19:34 +0100
Sebastian Andrzej Siewior  wrote:

> The current CPU hotplug is outdated. During the update to what we
> currently have I rewrote it partly and moved to sphinx format.

OK, I've applied this (finally) to the docs tree, even though I do kind
of agree with Jani about keeping the author information in Git.

Thanks,

jon

Re: [PATCH v2] tcp: fix tcp_fastopen unaligned access complaints on sparc

2017-01-13 Thread David Miller

From: Shannon Nelson 
Date: Thu, 12 Jan 2017 14:24:58 -0800

> Fix up a data alignment issue on sparc by swapping the order
> of the cookie byte array field with the length field in
> struct tcp_fastopen_cookie, and making it a proper union
> to clean up the typecasting.
> 
> This addresses log complaints like these:
> log_unaligned: 113 callbacks suppressed
> Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360
> Kernel unaligned access at TPC[9764ac] tcp_try_fastopen+0x2ec/0x360
> Kernel unaligned access at TPC[9764c8] tcp_try_fastopen+0x308/0x360
> Kernel unaligned access at TPC[9764e4] tcp_try_fastopen+0x324/0x360
> Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360
> 
> Cc: Eric Dumazet 
> Signed-off-by: Shannon Nelson 
> ---
> v2: Use Eric's suggestion for a union in the struct

Applied and queued up for -stable, thanks.

Re: [PATCH v2] tcp: fix tcp_fastopen unaligned access complaints on sparc

2017-01-13 Thread David Miller

From: Shannon Nelson 
Date: Thu, 12 Jan 2017 14:24:58 -0800

> Fix up a data alignment issue on sparc by swapping the order
> of the cookie byte array field with the length field in
> struct tcp_fastopen_cookie, and making it a proper union
> to clean up the typecasting.
> 
> This addresses log complaints like these:
> log_unaligned: 113 callbacks suppressed
> Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360
> Kernel unaligned access at TPC[9764ac] tcp_try_fastopen+0x2ec/0x360
> Kernel unaligned access at TPC[9764c8] tcp_try_fastopen+0x308/0x360
> Kernel unaligned access at TPC[9764e4] tcp_try_fastopen+0x324/0x360
> Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360
> 
> Cc: Eric Dumazet 
> Signed-off-by: Shannon Nelson 
> ---
> v2: Use Eric's suggestion for a union in the struct

Applied and queued up for -stable, thanks.

Re: [PATCH v8 1/3] dt-bindings: Add support for samsung s6e3ha2 panel binding

2017-01-13 Thread Rob Herring

On Wed, Jan 11, 2017 at 03:33:57PM +0900, Hoegeun Kwon wrote:
> The Samsung s6e3ha2 is a 5.7" 1440x2560 AMOLED panel connected
> using MIPI-DSI interfaces.
> 
> Signed-off-by: Donghwa Lee 
> Signed-off-by: Hyungwon Hwang 
> Signed-off-by: Hoegeun Kwon 
> Tested-by: Chanwoo Choi 
> Reviewed-by: Andrzej Hajda 
> ---
>  .../bindings/display/panel/samsung,s6e3ha2.txt | 26 
> ++
>  1 file changed, 26 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/display/panel/samsung,s6e3ha2.txt

Acked-by: Rob Herring

Re: [PATCH v8 1/3] dt-bindings: Add support for samsung s6e3ha2 panel binding

2017-01-13 Thread Rob Herring

On Wed, Jan 11, 2017 at 03:33:57PM +0900, Hoegeun Kwon wrote:
> The Samsung s6e3ha2 is a 5.7" 1440x2560 AMOLED panel connected
> using MIPI-DSI interfaces.
> 
> Signed-off-by: Donghwa Lee 
> Signed-off-by: Hyungwon Hwang 
> Signed-off-by: Hoegeun Kwon 
> Tested-by: Chanwoo Choi 
> Reviewed-by: Andrzej Hajda 
> ---
>  .../bindings/display/panel/samsung,s6e3ha2.txt | 26 
> ++
>  1 file changed, 26 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/display/panel/samsung,s6e3ha2.txt

Acked-by: Rob Herring

[RFC PATCH v3 1/2] tracing: add TRACE_EVENT_MAP_COND

2017-01-13 Thread Julien Desfossez

This new macro allows to hook conditional tracepoint probes to
pre-existing trace events. This allows to create specialized versions of
the same tracepoint without having to explicitly call every possible
tracepoints in the instrumented code.

In order to use it, a TRACE_EVENT must already exist, after that, we can
connect as many TRACE_EVENT_MAP_COND to this TRACE_EVENT as needed.

Example usage:

TRACE_EVENT(tp_test,
TP_PROTO(proto),
TP_ARGS(args),
TP_STRUCT__entry(), /* can be empty */
TP_fast_assign(), /* can be empty */
TP_printk() /* can be empty */
);

TRACE_EVENT_MAP_COND(tp_test, cond_test,
TP_PROTO(proto),
TP_ARGS(args),
TP_CONDITION(cond),
TP_STRUCT__entry(entry),
TP_fast_assign(assign),
TP_printk(print)
);

Cc: Peter Zijlstra 
Cc: Steven Rostedt (Red Hat) 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Daniel Bristot de Oliveira 
Reviewed-by: Mathieu Desnoyers 
Signed-off-by: Julien Desfossez 
---
 include/linux/trace_events.h |  14 -
 include/linux/tracepoint.h   |   6 ++
 include/trace/define_trace.h |   6 ++
 include/trace/perf.h |  24 ++--
 include/trace/trace_events.h | 130 +--
 kernel/trace/trace_events.c  |  15 +++--
 6 files changed, 168 insertions(+), 27 deletions(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index be00761..1f7e0ec 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -217,6 +217,7 @@ enum {
TRACE_EVENT_FL_TRACEPOINT_BIT,
TRACE_EVENT_FL_KPROBE_BIT,
TRACE_EVENT_FL_UPROBE_BIT,
+   TRACE_EVENT_FL_MAP_BIT,
 };
 
 /*
@@ -231,6 +232,7 @@ enum {
  *  TRACEPOINT- Event is a tracepoint
  *  KPROBE- Event is a kprobe
  *  UPROBE- Event is a uprobe
+ *  MAP   - Event maps to a tracepoint as an alias
  */
 enum {
TRACE_EVENT_FL_FILTERED = (1 << TRACE_EVENT_FL_FILTERED_BIT),
@@ -241,10 +243,16 @@ enum {
TRACE_EVENT_FL_TRACEPOINT   = (1 << TRACE_EVENT_FL_TRACEPOINT_BIT),
TRACE_EVENT_FL_KPROBE   = (1 << TRACE_EVENT_FL_KPROBE_BIT),
TRACE_EVENT_FL_UPROBE   = (1 << TRACE_EVENT_FL_UPROBE_BIT),
+   TRACE_EVENT_FL_MAP  = (1 << TRACE_EVENT_FL_MAP_BIT),
 };
 
 #define TRACE_EVENT_FL_UKPROBE (TRACE_EVENT_FL_KPROBE | TRACE_EVENT_FL_UPROBE)
 
+struct trace_event_map {
+   struct tracepoint   *tp;
+   char*name;
+};
+
 struct trace_event_call {
struct list_headlist;
struct trace_event_class *class;
@@ -252,6 +260,8 @@ struct trace_event_call {
char*name;
/* Set TRACE_EVENT_FL_TRACEPOINT flag when using "tp" */
struct tracepoint   *tp;
+   /* Set TRACE_EVENT_FL_MAP flag when using "map" instead */
+   struct trace_event_map  *map;
};
struct trace_event  event;
char*print_fmt;
@@ -282,7 +292,9 @@ struct trace_event_call {
 static inline const char *
 trace_event_name(struct trace_event_call *call)
 {
-   if (call->flags & TRACE_EVENT_FL_TRACEPOINT)
+   if (call->flags & TRACE_EVENT_FL_MAP)
+   return call->map->name;
+   else if (call->flags & TRACE_EVENT_FL_TRACEPOINT)
return call->tp ? call->tp->name : NULL;
else
return call->name;
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index f72fcfe..3e5b5a4 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -276,6 +276,7 @@ static inline void tracepoint_synchronize_unregister(void)
 
 #define DEFINE_TRACE_FN(name, reg, unreg)
 #define DEFINE_TRACE(name)
+#define DEFINE_TRACE_MAP_COND(name, map, cond)
 #define EXPORT_TRACEPOINT_SYMBOL_GPL(name)
 #define EXPORT_TRACEPOINT_SYMBOL(name)
 
@@ -469,6 +470,8 @@ static inline void tracepoint_synchronize_unregister(void)
  */
 
 #define DECLARE_EVENT_CLASS(name, proto, args, tstruct, assign, print)
+#define DECLARE_EVENT_COND_CLASS(name, proto, args, cond,  \
+   tstruct, assign, print)
 #define DEFINE_EVENT(template, name, proto, args)  \
DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
 #define DEFINE_EVENT_FN(template, name, proto, args, reg, unreg)\
@@ -498,4 +501,7 @@ static inline void tracepoint_synchronize_unregister(void)
 
 #define TRACE_EVENT_PERF_PERM(event, expr...)
 
+#define TRACE_EVENT_MAP_COND(name, map, proto, args, cond, \
+   struct, assign, print)
+
 #endif /* ifdef TRACE_EVENT (see note above) */
diff --git a/include/trace/define_trace.h b/include/trace/define_trace.h
index 6e3945f..4e112f2 100644
--- a/include/trace/define_trace.h
+++

[RFC PATCH v3 0/2] Extend scheduling tracepoints

2017-01-13 Thread Julien Desfossez

This patchset proposes a way to extend the existing scheduling tracepoints. The
intent is to allow to output relevant priority informations based on the
scheduling class of the tasks without breaking the existing tracepoints or
having to handle the various cases in the scheduler code.

The proposed approach is to add the TRACE_EVENT_MAP_COND macro which depends on
an existing TRACE_EVENT, but only writes data into the trace when a condition
is satisfied. This allows to create multiple versions of the same tracepoint
without having to call each of them explicitly from the instrumented code.

This approach is then illustrated with the sched_switch tracepoint, where we
provide a way to output different fields based on the scheduling class of the
next task.

If accepted, this method could be used everywhere the "prio" field is currently
exposed to user-space through traces.

Julien Desfossez (2):
  tracing: add TRACE_EVENT_MAP_COND
  tracing: add policy-based sched_switch events

 include/linux/trace_events.h |  14 +++-
 include/linux/tracepoint.h   |   6 ++
 include/trace/define_trace.h |   6 ++
 include/trace/events/sched.h | 192 +++
 include/trace/perf.h |  24 --
 include/trace/trace_events.h | 130 +
 kernel/trace/trace_events.c  |  15 +++-
 7 files changed, 360 insertions(+), 27 deletions(-)

-- 
1.9.1

[RFC PATCH v3 2/2] tracing: add policy-based sched_switch events

2017-01-13 Thread Julien Desfossez

Add 3 new tracepoints: sched_switch_fair, sched_switch_rt and
sched_switch_dl.

These conditional tracepoints are emitted based on the scheduling class
of the next task. Each of these tracepoint gets rid of the prio field
from the original sched_switch and replaces it with fields that are
relevant to the policy of the next task:
  - for a fair task: the nice value,
  - for a rt task: the nice and rt_priority values,
  - for a dl task: the runtime, deadline and period values.

The original sched_switch event is left unmodified, so these new events
can be enabled at the same time (but they are emitted consecutively so
we can see a timestamp offset).

Example output from the 3 new events:
sched_switch_fair: prev_comm=cat prev_pid=2179 prev_state=R+ ==> next_comm=b
   next_pid=874 next_policy=SCHED_NORMAL next_nice=0

sched_switch_rt: prev_comm=swapper/10 prev_pid=0 prev_state=R ==> next_comm=b
 next_pid=2215 next_policy=SCHED_FIFO next_nice=0
 next_rt_priority=100

sched_switch_dl: prev_comm=swapper/10 prev_pid=0 prev_state=R ==> next_comm=b
 next_pid=2215 next_policy=SCHED_DEADLINE
 next_dl_runtime=1000 next_dl_deadline=3000
 next_dl_period=3000

Cc: Peter Zijlstra 
Cc: Steven Rostedt (Red Hat) 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Daniel Bristot de Oliveira 
Reviewed-by: Mathieu Desnoyers 
Signed-off-by: Julien Desfossez 
---
 include/trace/events/sched.h | 192 +++
 1 file changed, 192 insertions(+)

diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 9b90c57..c506ed1 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -5,9 +5,39 @@
 #define _TRACE_SCHED_H
 
 #include 
+#include 
+#include 
 #include 
 #include 
 
+#define SCHEDULING_POLICY  \
+   EM( SCHED_NORMAL,   "SCHED_NORMAL") \
+   EM( SCHED_FIFO, "SCHED_FIFO")   \
+   EM( SCHED_RR,   "SCHED_RR") \
+   EM( SCHED_BATCH,"SCHED_BATCH")  \
+   EM( SCHED_IDLE, "SCHED_IDLE")   \
+   EMe(SCHED_DEADLINE, "SCHED_DEADLINE")
+
+/*
+ * First define the enums in the above macros to be exported to userspace
+ * via TRACE_DEFINE_ENUM().
+ */
+#undef EM
+#undef EMe
+#define EM(a, b)   TRACE_DEFINE_ENUM(a);
+#define EMe(a, b)  TRACE_DEFINE_ENUM(a);
+
+SCHEDULING_POLICY
+
+/*
+ * Now redefine the EM() and EMe() macros to map the enums to the strings
+ * that will be printed in the output.
+ */
+#undef EM
+#undef EMe
+#define EM(a, b)   {a, b},
+#define EMe(a, b)  {a, b}
+
 /*
  * Tracepoint for calling kthread_stop, performed to end a kthread:
  */
@@ -162,6 +192,168 @@ static inline long __trace_sched_switch_state(bool 
preempt, struct task_struct *
 );
 
 /*
+ * Tracepoint for task switches, performed by the scheduler where the next
+ * task has a fair scheduling policy.
+ */
+TRACE_EVENT_MAP_COND(sched_switch, sched_switch_fair,
+
+   TP_PROTO(bool preempt,
+struct task_struct *prev,
+struct task_struct *next),
+
+   TP_ARGS(preempt, prev, next),
+
+   TP_CONDITION(!dl_prio(next->prio) && !rt_prio(next->prio)),
+
+   TP_STRUCT__entry(
+   __array(char,   prev_comm,  TASK_COMM_LEN   )
+   __field(pid_t,  prev_pid)
+   __field(long,   prev_state  )
+   __array(char,   next_comm,  TASK_COMM_LEN   )
+   __field(pid_t,  next_pid)
+   __field(unsigned int, next_policy   )
+   __field(int,next_nice   )
+   ),
+
+   TP_fast_assign(
+   memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
+   __entry->prev_pid   = prev->pid;
+   __entry->prev_state = __trace_sched_switch_state(preempt, 
prev);
+   memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
+   __entry->next_pid   = next->pid;
+   __entry->next_policy= next->policy;
+   __entry->next_nice  = task_nice(next);
+   ),
+
+   TP_printk("prev_comm=%s prev_pid=%d prev_state=%s%s ==> next_comm=%s "
+   "next_pid=%d next_policy=%s next_nice=%d",
+   __entry->prev_comm, __entry->prev_pid,
+   __entry->prev_state & (TASK_STATE_MAX-1) ?
+ __print_flags(__entry->prev_state & (TASK_STATE_MAX-1), "|",
+   { 1, "S"} , { 2, "D" }, { 4, "T" }, { 8, "t" },
+   { 16, "Z" }, { 32, "X" }, { 64, "x"

[RFC PATCH v3 1/2] tracing: add TRACE_EVENT_MAP_COND

2017-01-13 Thread Julien Desfossez

This new macro allows to hook conditional tracepoint probes to
pre-existing trace events. This allows to create specialized versions of
the same tracepoint without having to explicitly call every possible
tracepoints in the instrumented code.

In order to use it, a TRACE_EVENT must already exist, after that, we can
connect as many TRACE_EVENT_MAP_COND to this TRACE_EVENT as needed.

Example usage:

TRACE_EVENT(tp_test,
TP_PROTO(proto),
TP_ARGS(args),
TP_STRUCT__entry(), /* can be empty */
TP_fast_assign(), /* can be empty */
TP_printk() /* can be empty */
);

TRACE_EVENT_MAP_COND(tp_test, cond_test,
TP_PROTO(proto),
TP_ARGS(args),
TP_CONDITION(cond),
TP_STRUCT__entry(entry),
TP_fast_assign(assign),
TP_printk(print)
);

Cc: Peter Zijlstra 
Cc: Steven Rostedt (Red Hat) 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Daniel Bristot de Oliveira 
Reviewed-by: Mathieu Desnoyers 
Signed-off-by: Julien Desfossez 
---
 include/linux/trace_events.h |  14 -
 include/linux/tracepoint.h   |   6 ++
 include/trace/define_trace.h |   6 ++
 include/trace/perf.h |  24 ++--
 include/trace/trace_events.h | 130 +--
 kernel/trace/trace_events.c  |  15 +++--
 6 files changed, 168 insertions(+), 27 deletions(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index be00761..1f7e0ec 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -217,6 +217,7 @@ enum {
TRACE_EVENT_FL_TRACEPOINT_BIT,
TRACE_EVENT_FL_KPROBE_BIT,
TRACE_EVENT_FL_UPROBE_BIT,
+   TRACE_EVENT_FL_MAP_BIT,
 };
 
 /*
@@ -231,6 +232,7 @@ enum {
  *  TRACEPOINT- Event is a tracepoint
  *  KPROBE- Event is a kprobe
  *  UPROBE- Event is a uprobe
+ *  MAP   - Event maps to a tracepoint as an alias
  */
 enum {
TRACE_EVENT_FL_FILTERED = (1 << TRACE_EVENT_FL_FILTERED_BIT),
@@ -241,10 +243,16 @@ enum {
TRACE_EVENT_FL_TRACEPOINT   = (1 << TRACE_EVENT_FL_TRACEPOINT_BIT),
TRACE_EVENT_FL_KPROBE   = (1 << TRACE_EVENT_FL_KPROBE_BIT),
TRACE_EVENT_FL_UPROBE   = (1 << TRACE_EVENT_FL_UPROBE_BIT),
+   TRACE_EVENT_FL_MAP  = (1 << TRACE_EVENT_FL_MAP_BIT),
 };
 
 #define TRACE_EVENT_FL_UKPROBE (TRACE_EVENT_FL_KPROBE | TRACE_EVENT_FL_UPROBE)
 
+struct trace_event_map {
+   struct tracepoint   *tp;
+   char*name;
+};
+
 struct trace_event_call {
struct list_headlist;
struct trace_event_class *class;
@@ -252,6 +260,8 @@ struct trace_event_call {
char*name;
/* Set TRACE_EVENT_FL_TRACEPOINT flag when using "tp" */
struct tracepoint   *tp;
+   /* Set TRACE_EVENT_FL_MAP flag when using "map" instead */
+   struct trace_event_map  *map;
};
struct trace_event  event;
char*print_fmt;
@@ -282,7 +292,9 @@ struct trace_event_call {
 static inline const char *
 trace_event_name(struct trace_event_call *call)
 {
-   if (call->flags & TRACE_EVENT_FL_TRACEPOINT)
+   if (call->flags & TRACE_EVENT_FL_MAP)
+   return call->map->name;
+   else if (call->flags & TRACE_EVENT_FL_TRACEPOINT)
return call->tp ? call->tp->name : NULL;
else
return call->name;
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index f72fcfe..3e5b5a4 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -276,6 +276,7 @@ static inline void tracepoint_synchronize_unregister(void)
 
 #define DEFINE_TRACE_FN(name, reg, unreg)
 #define DEFINE_TRACE(name)
+#define DEFINE_TRACE_MAP_COND(name, map, cond)
 #define EXPORT_TRACEPOINT_SYMBOL_GPL(name)
 #define EXPORT_TRACEPOINT_SYMBOL(name)
 
@@ -469,6 +470,8 @@ static inline void tracepoint_synchronize_unregister(void)
  */
 
 #define DECLARE_EVENT_CLASS(name, proto, args, tstruct, assign, print)
+#define DECLARE_EVENT_COND_CLASS(name, proto, args, cond,  \
+   tstruct, assign, print)
 #define DEFINE_EVENT(template, name, proto, args)  \
DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
 #define DEFINE_EVENT_FN(template, name, proto, args, reg, unreg)\
@@ -498,4 +501,7 @@ static inline void tracepoint_synchronize_unregister(void)
 
 #define TRACE_EVENT_PERF_PERM(event, expr...)
 
+#define TRACE_EVENT_MAP_COND(name, map, proto, args, cond, \
+   struct, assign, print)
+
 #endif /* ifdef TRACE_EVENT (see note above) */
diff --git a/include/trace/define_trace.h b/include/trace/define_trace.h
index 6e3945f..4e112f2 100644
--- a/include/trace/define_trace.h
+++ b/include/trace/define_trace.h
@@ -45,6 +45,10 @@
assign, print, reg, unreg)  \
DEFINE_TRACE_FN(name, reg, unreg)
 
+#undef

[RFC PATCH v3 0/2] Extend scheduling tracepoints

2017-01-13 Thread Julien Desfossez

This patchset proposes a way to extend the existing scheduling tracepoints. The
intent is to allow to output relevant priority informations based on the
scheduling class of the tasks without breaking the existing tracepoints or
having to handle the various cases in the scheduler code.

The proposed approach is to add the TRACE_EVENT_MAP_COND macro which depends on
an existing TRACE_EVENT, but only writes data into the trace when a condition
is satisfied. This allows to create multiple versions of the same tracepoint
without having to call each of them explicitly from the instrumented code.

This approach is then illustrated with the sched_switch tracepoint, where we
provide a way to output different fields based on the scheduling class of the
next task.

If accepted, this method could be used everywhere the "prio" field is currently
exposed to user-space through traces.

Julien Desfossez (2):
  tracing: add TRACE_EVENT_MAP_COND
  tracing: add policy-based sched_switch events

 include/linux/trace_events.h |  14 +++-
 include/linux/tracepoint.h   |   6 ++
 include/trace/define_trace.h |   6 ++
 include/trace/events/sched.h | 192 +++
 include/trace/perf.h |  24 --
 include/trace/trace_events.h | 130 +
 kernel/trace/trace_events.c  |  15 +++-
 7 files changed, 360 insertions(+), 27 deletions(-)

-- 
1.9.1

[RFC PATCH v3 2/2] tracing: add policy-based sched_switch events

2017-01-13 Thread Julien Desfossez

Add 3 new tracepoints: sched_switch_fair, sched_switch_rt and
sched_switch_dl.

These conditional tracepoints are emitted based on the scheduling class
of the next task. Each of these tracepoint gets rid of the prio field
from the original sched_switch and replaces it with fields that are
relevant to the policy of the next task:
  - for a fair task: the nice value,
  - for a rt task: the nice and rt_priority values,
  - for a dl task: the runtime, deadline and period values.

The original sched_switch event is left unmodified, so these new events
can be enabled at the same time (but they are emitted consecutively so
we can see a timestamp offset).

Example output from the 3 new events:
sched_switch_fair: prev_comm=cat prev_pid=2179 prev_state=R+ ==> next_comm=b
   next_pid=874 next_policy=SCHED_NORMAL next_nice=0

sched_switch_rt: prev_comm=swapper/10 prev_pid=0 prev_state=R ==> next_comm=b
 next_pid=2215 next_policy=SCHED_FIFO next_nice=0
 next_rt_priority=100

sched_switch_dl: prev_comm=swapper/10 prev_pid=0 prev_state=R ==> next_comm=b
 next_pid=2215 next_policy=SCHED_DEADLINE
 next_dl_runtime=1000 next_dl_deadline=3000
 next_dl_period=3000

Cc: Peter Zijlstra 
Cc: Steven Rostedt (Red Hat) 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Daniel Bristot de Oliveira 
Reviewed-by: Mathieu Desnoyers 
Signed-off-by: Julien Desfossez 
---
 include/trace/events/sched.h | 192 +++
 1 file changed, 192 insertions(+)

diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 9b90c57..c506ed1 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -5,9 +5,39 @@
 #define _TRACE_SCHED_H
 
 #include 
+#include 
+#include 
 #include 
 #include 
 
+#define SCHEDULING_POLICY  \
+   EM( SCHED_NORMAL,   "SCHED_NORMAL") \
+   EM( SCHED_FIFO, "SCHED_FIFO")   \
+   EM( SCHED_RR,   "SCHED_RR") \
+   EM( SCHED_BATCH,"SCHED_BATCH")  \
+   EM( SCHED_IDLE, "SCHED_IDLE")   \
+   EMe(SCHED_DEADLINE, "SCHED_DEADLINE")
+
+/*
+ * First define the enums in the above macros to be exported to userspace
+ * via TRACE_DEFINE_ENUM().
+ */
+#undef EM
+#undef EMe
+#define EM(a, b)   TRACE_DEFINE_ENUM(a);
+#define EMe(a, b)  TRACE_DEFINE_ENUM(a);
+
+SCHEDULING_POLICY
+
+/*
+ * Now redefine the EM() and EMe() macros to map the enums to the strings
+ * that will be printed in the output.
+ */
+#undef EM
+#undef EMe
+#define EM(a, b)   {a, b},
+#define EMe(a, b)  {a, b}
+
 /*
  * Tracepoint for calling kthread_stop, performed to end a kthread:
  */
@@ -162,6 +192,168 @@ static inline long __trace_sched_switch_state(bool 
preempt, struct task_struct *
 );
 
 /*
+ * Tracepoint for task switches, performed by the scheduler where the next
+ * task has a fair scheduling policy.
+ */
+TRACE_EVENT_MAP_COND(sched_switch, sched_switch_fair,
+
+   TP_PROTO(bool preempt,
+struct task_struct *prev,
+struct task_struct *next),
+
+   TP_ARGS(preempt, prev, next),
+
+   TP_CONDITION(!dl_prio(next->prio) && !rt_prio(next->prio)),
+
+   TP_STRUCT__entry(
+   __array(char,   prev_comm,  TASK_COMM_LEN   )
+   __field(pid_t,  prev_pid)
+   __field(long,   prev_state  )
+   __array(char,   next_comm,  TASK_COMM_LEN   )
+   __field(pid_t,  next_pid)
+   __field(unsigned int, next_policy   )
+   __field(int,next_nice   )
+   ),
+
+   TP_fast_assign(
+   memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
+   __entry->prev_pid   = prev->pid;
+   __entry->prev_state = __trace_sched_switch_state(preempt, 
prev);
+   memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
+   __entry->next_pid   = next->pid;
+   __entry->next_policy= next->policy;
+   __entry->next_nice  = task_nice(next);
+   ),
+
+   TP_printk("prev_comm=%s prev_pid=%d prev_state=%s%s ==> next_comm=%s "
+   "next_pid=%d next_policy=%s next_nice=%d",
+   __entry->prev_comm, __entry->prev_pid,
+   __entry->prev_state & (TASK_STATE_MAX-1) ?
+ __print_flags(__entry->prev_state & (TASK_STATE_MAX-1), "|",
+   { 1, "S"} , { 2, "D" }, { 4, "T" }, { 8, "t" },
+   { 16, "Z" }, { 32, "X" }, { 64, "x" },
+   { 128, "K" }, { 256, "W" }, { 512, "P" },
+   { 1024, "N" }) : "R",
+

[PATCH 17/17] spi/topcliff-pch: One check less in pch_spi_set_tx()

2017-01-13 Thread SF Markus Elfring

From: Markus Elfring 
Date: Fri, 13 Jan 2017 17:30:46 +0100

Delete a duplicate check after a bit of exception handling was moved into
a previous if branch of this function.

Signed-off-by: Markus Elfring 
---
 drivers/spi/spi-topcliff-pch.c | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/spi/spi-topcliff-pch.c b/drivers/spi/spi-topcliff-pch.c
index 97fd1ea9826b..33043a830032 100644
--- a/drivers/spi/spi-topcliff-pch.c
+++ b/drivers/spi/spi-topcliff-pch.c
@@ -584,22 +584,25 @@ static void pch_spi_set_tx(struct pch_spi_data *data, int 
*bpw)
data->pkt_tx_buff = kzalloc(size, GFP_KERNEL);
if (data->pkt_tx_buff) {
data->pkt_rx_buff = kzalloc(size, GFP_KERNEL);
-   if (!data->pkt_rx_buff)
+   if (!data->pkt_rx_buff) {
kfree(data->pkt_tx_buff);
-   }
 
-   if (!data->pkt_rx_buff) {
-   /* flush queue and set status of all transfers to -ENOMEM */
-   list_for_each_entry_safe(pmsg, tmp, data->queue.next, queue) {
-   pmsg->status = -ENOMEM;
+   /*
+* Flush queue and set status of all transfers
+* to -ENOMEM.
+*/
+   list_for_each_entry_safe(pmsg, tmp, data->queue.next,
+queue) {
+   pmsg->status = -ENOMEM;
 
-   if (pmsg->complete)
-   pmsg->complete(pmsg->context);
+   if (pmsg->complete)
+   pmsg->complete(pmsg->context);
 
-   /* delete from queue */
-   list_del_init(>queue);
+   /* delete from queue */
+   list_del_init(>queue);
+   }
+   return;
}
-   return;
}
 
/* copy Tx Data */
-- 
2.11.0

[PATCH 17/17] spi/topcliff-pch: One check less in pch_spi_set_tx()

2017-01-13 Thread SF Markus Elfring

From: Markus Elfring 
Date: Fri, 13 Jan 2017 17:30:46 +0100

Delete a duplicate check after a bit of exception handling was moved into
a previous if branch of this function.

Signed-off-by: Markus Elfring 
---
 drivers/spi/spi-topcliff-pch.c | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/spi/spi-topcliff-pch.c b/drivers/spi/spi-topcliff-pch.c
index 97fd1ea9826b..33043a830032 100644
--- a/drivers/spi/spi-topcliff-pch.c
+++ b/drivers/spi/spi-topcliff-pch.c
@@ -584,22 +584,25 @@ static void pch_spi_set_tx(struct pch_spi_data *data, int 
*bpw)
data->pkt_tx_buff = kzalloc(size, GFP_KERNEL);
if (data->pkt_tx_buff) {
data->pkt_rx_buff = kzalloc(size, GFP_KERNEL);
-   if (!data->pkt_rx_buff)
+   if (!data->pkt_rx_buff) {
kfree(data->pkt_tx_buff);
-   }
 
-   if (!data->pkt_rx_buff) {
-   /* flush queue and set status of all transfers to -ENOMEM */
-   list_for_each_entry_safe(pmsg, tmp, data->queue.next, queue) {
-   pmsg->status = -ENOMEM;
+   /*
+* Flush queue and set status of all transfers
+* to -ENOMEM.
+*/
+   list_for_each_entry_safe(pmsg, tmp, data->queue.next,
+queue) {
+   pmsg->status = -ENOMEM;
 
-   if (pmsg->complete)
-   pmsg->complete(pmsg->context);
+   if (pmsg->complete)
+   pmsg->complete(pmsg->context);
 
-   /* delete from queue */
-   list_del_init(>queue);
+   /* delete from queue */
+   list_del_init(>queue);
+   }
+   return;
}
-   return;
}
 
/* copy Tx Data */
-- 
2.11.0

< 1 2 3 4 5 6 7 8 9 10 >

401 - 500 of 1610 matches

Mail list logo