Re: [PATCH] Revert "arm64: Increase the max granular size"

2017-04-18 Thread Chalamarla, Tirumalesh


On 4/17/17, 12:35 AM, "Imran Khan" <kim...@codeaurora.org> wrote:

On 4/12/2017 7:30 PM, Chalamarla, Tirumalesh wrote:
> 
> 
> On 4/11/17, 10:13 PM, "linux-arm-kernel on behalf of Imran Khan" 
<linux-arm-kernel-boun...@lists.infradead.org on behalf of 
kim...@codeaurora.org> wrote:
> 
> On 4/7/2017 7:36 AM, Ganesh Mahendran wrote:
> > 2017-04-06 23:58 GMT+08:00 Catalin Marinas 
<catalin.mari...@arm.com>:
> >> On Thu, Apr 06, 2017 at 12:52:13PM +0530, Imran Khan wrote:
> >>> On 4/5/2017 10:13 AM, Imran Khan wrote:
> >>>>> We may have to revisit this logic and consider L1_CACHE_BYTES 
the
> >>>>> _minimum_ of cache line sizes in arm64 systems supported by the 
kernel.
> >>>>> Do you have any benchmarks on Cavium boards that would show 
significant
> >>>>> degradation with 64-byte L1_CACHE_BYTES vs 128?
> >>>>>
> >>>>> For non-coherent DMA, the simplest is to make ARCH_DMA_MINALIGN 
the
> >>>>> _maximum_ of the supported systems:
> >>>>>
> >>>>> diff --git a/arch/arm64/include/asm/cache.h 
b/arch/arm64/include/asm/cache.h
> >>>>> index 5082b30bc2c0..4b5d7b27edaf 100644
> >>>>> --- a/arch/arm64/include/asm/cache.h
> >>>>> +++ b/arch/arm64/include/asm/cache.h
> >>>>> @@ -18,17 +18,17 @@
> >>>>>
> >>>>>  #include 
> >>>>>
> >>>>> -#define L1_CACHE_SHIFT 7
> >>>>> +#define L1_CACHE_SHIFT 6
> >>>>>  #define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT)
> >>>>>
> >>>>>  /*
> >>>>>   * Memory returned by kmalloc() may be used for DMA, so we 
must make
> >>>>> - * sure that all such allocations are cache aligned. Otherwise,
> >>>>> - * unrelated code may cause parts of the buffer to be read 
into the
> >>>>> - * cache before the transfer is done, causing old data to be 
seen by
> >>>>> - * the CPU.
> >>>>> + * sure that all such allocations are aligned to the maximum 
*known*
> >>>>> + * cache line size on ARMv8 systems. Otherwise, unrelated code 
may cause
> >>>>> + * parts of the buffer to be read into the cache before the 
transfer is
> >>>>> + * done, causing old data to be seen by the CPU.
> >>>>>   */
> >>>>> -#define ARCH_DMA_MINALIGN  L1_CACHE_BYTES
> >>>>> +#define ARCH_DMA_MINALIGN  (128)
> >>>>>
> >>>>>  #ifndef __ASSEMBLY__
> >>>>>
> >>>>> diff --git a/arch/arm64/kernel/cpufeature.c 
b/arch/arm64/kernel/cpufeature.c
> >>>>> index 392c67eb9fa6..30bafca1aebf 100644
> >>>>> --- a/arch/arm64/kernel/cpufeature.c
> >>>>> +++ b/arch/arm64/kernel/cpufeature.c
> >>>>> @@ -976,9 +976,9 @@ void __init setup_cpu_features(void)
> >>>>> if (!cwg)
> >>>>> pr_warn("No Cache Writeback Granule 
information, assuming
> >>>>> cache line size %d\n",
> >>>>> cls);
> >>>>> -   if (L1_CACHE_BYTES < cls)
> >>>>> -   pr_warn("L1_CACHE_BYTES smaller than the Cache 
Writeback Granule (%d < %d)\n",
> >>>>> -   L1_CACHE_BYTES, cls);
> >>>>> +   if (ARCH_DMA_MINALIGN < cls)
> >>>>> +   pr_warn("ARCH_DMA_MINALIGN smaller than the 
Cache Writeback Granule (%d < %d)\n",
> >>>>> +   ARCH_DMA_MINALIGN, cls);
> >>>>>  }
> >>>>>
> >>>>>  static bool __maybe_unused
> >>>>
> >>>> This change was discussed at: [1] but was not concluded as 
apparently no one
> >>>> came back with test report and numbers. After including this 
change in our
> >>>>

Re: [PATCH] Revert "arm64: Increase the max granular size"

2017-04-18 Thread Chalamarla, Tirumalesh


On 4/17/17, 12:35 AM, "Imran Khan"  wrote:

On 4/12/2017 7:30 PM, Chalamarla, Tirumalesh wrote:
> 
> 
> On 4/11/17, 10:13 PM, "linux-arm-kernel on behalf of Imran Khan" 
 wrote:
> 
> On 4/7/2017 7:36 AM, Ganesh Mahendran wrote:
> > 2017-04-06 23:58 GMT+08:00 Catalin Marinas 
:
> >> On Thu, Apr 06, 2017 at 12:52:13PM +0530, Imran Khan wrote:
> >>> On 4/5/2017 10:13 AM, Imran Khan wrote:
> >>>>> We may have to revisit this logic and consider L1_CACHE_BYTES 
the
> >>>>> _minimum_ of cache line sizes in arm64 systems supported by the 
kernel.
> >>>>> Do you have any benchmarks on Cavium boards that would show 
significant
> >>>>> degradation with 64-byte L1_CACHE_BYTES vs 128?
> >>>>>
> >>>>> For non-coherent DMA, the simplest is to make ARCH_DMA_MINALIGN 
the
> >>>>> _maximum_ of the supported systems:
> >>>>>
> >>>>> diff --git a/arch/arm64/include/asm/cache.h 
b/arch/arm64/include/asm/cache.h
> >>>>> index 5082b30bc2c0..4b5d7b27edaf 100644
> >>>>> --- a/arch/arm64/include/asm/cache.h
> >>>>> +++ b/arch/arm64/include/asm/cache.h
> >>>>> @@ -18,17 +18,17 @@
> >>>>>
> >>>>>  #include 
> >>>>>
> >>>>> -#define L1_CACHE_SHIFT 7
> >>>>> +#define L1_CACHE_SHIFT 6
> >>>>>  #define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT)
> >>>>>
> >>>>>  /*
> >>>>>   * Memory returned by kmalloc() may be used for DMA, so we 
must make
> >>>>> - * sure that all such allocations are cache aligned. Otherwise,
> >>>>> - * unrelated code may cause parts of the buffer to be read 
into the
> >>>>> - * cache before the transfer is done, causing old data to be 
seen by
> >>>>> - * the CPU.
> >>>>> + * sure that all such allocations are aligned to the maximum 
*known*
> >>>>> + * cache line size on ARMv8 systems. Otherwise, unrelated code 
may cause
> >>>>> + * parts of the buffer to be read into the cache before the 
transfer is
> >>>>> + * done, causing old data to be seen by the CPU.
> >>>>>   */
> >>>>> -#define ARCH_DMA_MINALIGN  L1_CACHE_BYTES
> >>>>> +#define ARCH_DMA_MINALIGN  (128)
> >>>>>
> >>>>>  #ifndef __ASSEMBLY__
> >>>>>
> >>>>> diff --git a/arch/arm64/kernel/cpufeature.c 
b/arch/arm64/kernel/cpufeature.c
> >>>>> index 392c67eb9fa6..30bafca1aebf 100644
> >>>>> --- a/arch/arm64/kernel/cpufeature.c
> >>>>> +++ b/arch/arm64/kernel/cpufeature.c
> >>>>> @@ -976,9 +976,9 @@ void __init setup_cpu_features(void)
> >>>>> if (!cwg)
> >>>>> pr_warn("No Cache Writeback Granule 
information, assuming
> >>>>> cache line size %d\n",
> >>>>> cls);
> >>>>> -   if (L1_CACHE_BYTES < cls)
> >>>>> -   pr_warn("L1_CACHE_BYTES smaller than the Cache 
Writeback Granule (%d < %d)\n",
> >>>>> -   L1_CACHE_BYTES, cls);
> >>>>> +   if (ARCH_DMA_MINALIGN < cls)
> >>>>> +   pr_warn("ARCH_DMA_MINALIGN smaller than the 
Cache Writeback Granule (%d < %d)\n",
> >>>>> +   ARCH_DMA_MINALIGN, cls);
> >>>>>  }
> >>>>>
> >>>>>  static bool __maybe_unused
> >>>>
> >>>> This change was discussed at: [1] but was not concluded as 
apparently no one
> >>>> came back with test report and numbers. After including this 
change in our
> >>>> local kernel we are seeing significant throughput improvement. 
For example with:
> >>>>
> >>>> iperf -c 192.1

Re: [PATCH] Revert "arm64: Increase the max granular size"

2017-04-12 Thread Chalamarla, Tirumalesh


On 4/11/17, 10:13 PM, "linux-arm-kernel on behalf of Imran Khan" 
 wrote:

On 4/7/2017 7:36 AM, Ganesh Mahendran wrote:
> 2017-04-06 23:58 GMT+08:00 Catalin Marinas :
>> On Thu, Apr 06, 2017 at 12:52:13PM +0530, Imran Khan wrote:
>>> On 4/5/2017 10:13 AM, Imran Khan wrote:
> We may have to revisit this logic and consider L1_CACHE_BYTES the
> _minimum_ of cache line sizes in arm64 systems supported by the 
kernel.
> Do you have any benchmarks on Cavium boards that would show 
significant
> degradation with 64-byte L1_CACHE_BYTES vs 128?
>
> For non-coherent DMA, the simplest is to make ARCH_DMA_MINALIGN the
> _maximum_ of the supported systems:
>
> diff --git a/arch/arm64/include/asm/cache.h 
b/arch/arm64/include/asm/cache.h
> index 5082b30bc2c0..4b5d7b27edaf 100644
> --- a/arch/arm64/include/asm/cache.h
> +++ b/arch/arm64/include/asm/cache.h
> @@ -18,17 +18,17 @@
>
>  #include 
>
> -#define L1_CACHE_SHIFT 7
> +#define L1_CACHE_SHIFT 6
>  #define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT)
>
>  /*
>   * Memory returned by kmalloc() may be used for DMA, so we must make
> - * sure that all such allocations are cache aligned. Otherwise,
> - * unrelated code may cause parts of the buffer to be read into the
> - * cache before the transfer is done, causing old data to be seen by
> - * the CPU.
> + * sure that all such allocations are aligned to the maximum *known*
> + * cache line size on ARMv8 systems. Otherwise, unrelated code may 
cause
> + * parts of the buffer to be read into the cache before the transfer 
is
> + * done, causing old data to be seen by the CPU.
>   */
> -#define ARCH_DMA_MINALIGN  L1_CACHE_BYTES
> +#define ARCH_DMA_MINALIGN  (128)
>
>  #ifndef __ASSEMBLY__
>
> diff --git a/arch/arm64/kernel/cpufeature.c 
b/arch/arm64/kernel/cpufeature.c
> index 392c67eb9fa6..30bafca1aebf 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -976,9 +976,9 @@ void __init setup_cpu_features(void)
> if (!cwg)
> pr_warn("No Cache Writeback Granule information, 
assuming
> cache line size %d\n",
> cls);
> -   if (L1_CACHE_BYTES < cls)
> -   pr_warn("L1_CACHE_BYTES smaller than the Cache 
Writeback Granule (%d < %d)\n",
> -   L1_CACHE_BYTES, cls);
> +   if (ARCH_DMA_MINALIGN < cls)
> +   pr_warn("ARCH_DMA_MINALIGN smaller than the Cache 
Writeback Granule (%d < %d)\n",
> +   ARCH_DMA_MINALIGN, cls);
>  }
>
>  static bool __maybe_unused

 This change was discussed at: [1] but was not concluded as apparently 
no one
 came back with test report and numbers. After including this change in 
our
 local kernel we are seeing significant throughput improvement. For 
example with:

 iperf -c 192.168.1.181 -i 1 -w 128K -t 60

 The average throughput is improving by about 30% (230Mbps from 
180Mbps).
 Could you please let us know if this change can be included in 
upstream kernel.

 [1]: https://groups.google.com/forum/#!topic/linux.kernel/P40yDB90ePs
>>>
>>> Could you please provide some feedback about the above mentioned query ?
>>
>> Do you have an explanation on the performance variation when
>> L1_CACHE_BYTES is changed? We'd need to understand how the network stack
>> is affected by L1_CACHE_BYTES, in which context it uses it (is it for
>> non-coherent DMA?).
> 
> network stack use SKB_DATA_ALIGN to align.
> ---
> #define SKB_DATA_ALIGN(X) (((X) + (SMP_CACHE_BYTES - 1)) & \
> ~(SMP_CACHE_BYTES - 1))
> 
> #define SMP_CACHE_BYTES L1_CACHE_BYTES
> ---
> I think this is the reason of performance regression.
> 

Yes this is the reason for performance regression. Due to increases L1 
cache alignment the 
object is coming from next kmalloc slab and skb->truesize is changing from 
2304 bytes to 
4352 bytes. This in turn increases sk_wmem_alloc which causes queuing of 
less send buffers.

We tried different benchmarks and found none which really affects with Cache 
line change. If there is no correctness issue,
I think we are fine with reverting the patch.

Though I still think it is beneficiary to do some more investigation for the 
perf loss, who knows 32 bit align or no align might 
Give even more perf benefit. 


Thanks,

Re: [PATCH] Revert "arm64: Increase the max granular size"

2017-04-12 Thread Chalamarla, Tirumalesh


On 4/11/17, 10:13 PM, "linux-arm-kernel on behalf of Imran Khan" 
 wrote:

On 4/7/2017 7:36 AM, Ganesh Mahendran wrote:
> 2017-04-06 23:58 GMT+08:00 Catalin Marinas :
>> On Thu, Apr 06, 2017 at 12:52:13PM +0530, Imran Khan wrote:
>>> On 4/5/2017 10:13 AM, Imran Khan wrote:
> We may have to revisit this logic and consider L1_CACHE_BYTES the
> _minimum_ of cache line sizes in arm64 systems supported by the 
kernel.
> Do you have any benchmarks on Cavium boards that would show 
significant
> degradation with 64-byte L1_CACHE_BYTES vs 128?
>
> For non-coherent DMA, the simplest is to make ARCH_DMA_MINALIGN the
> _maximum_ of the supported systems:
>
> diff --git a/arch/arm64/include/asm/cache.h 
b/arch/arm64/include/asm/cache.h
> index 5082b30bc2c0..4b5d7b27edaf 100644
> --- a/arch/arm64/include/asm/cache.h
> +++ b/arch/arm64/include/asm/cache.h
> @@ -18,17 +18,17 @@
>
>  #include 
>
> -#define L1_CACHE_SHIFT 7
> +#define L1_CACHE_SHIFT 6
>  #define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT)
>
>  /*
>   * Memory returned by kmalloc() may be used for DMA, so we must make
> - * sure that all such allocations are cache aligned. Otherwise,
> - * unrelated code may cause parts of the buffer to be read into the
> - * cache before the transfer is done, causing old data to be seen by
> - * the CPU.
> + * sure that all such allocations are aligned to the maximum *known*
> + * cache line size on ARMv8 systems. Otherwise, unrelated code may 
cause
> + * parts of the buffer to be read into the cache before the transfer 
is
> + * done, causing old data to be seen by the CPU.
>   */
> -#define ARCH_DMA_MINALIGN  L1_CACHE_BYTES
> +#define ARCH_DMA_MINALIGN  (128)
>
>  #ifndef __ASSEMBLY__
>
> diff --git a/arch/arm64/kernel/cpufeature.c 
b/arch/arm64/kernel/cpufeature.c
> index 392c67eb9fa6..30bafca1aebf 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -976,9 +976,9 @@ void __init setup_cpu_features(void)
> if (!cwg)
> pr_warn("No Cache Writeback Granule information, 
assuming
> cache line size %d\n",
> cls);
> -   if (L1_CACHE_BYTES < cls)
> -   pr_warn("L1_CACHE_BYTES smaller than the Cache 
Writeback Granule (%d < %d)\n",
> -   L1_CACHE_BYTES, cls);
> +   if (ARCH_DMA_MINALIGN < cls)
> +   pr_warn("ARCH_DMA_MINALIGN smaller than the Cache 
Writeback Granule (%d < %d)\n",
> +   ARCH_DMA_MINALIGN, cls);
>  }
>
>  static bool __maybe_unused

 This change was discussed at: [1] but was not concluded as apparently 
no one
 came back with test report and numbers. After including this change in 
our
 local kernel we are seeing significant throughput improvement. For 
example with:

 iperf -c 192.168.1.181 -i 1 -w 128K -t 60

 The average throughput is improving by about 30% (230Mbps from 
180Mbps).
 Could you please let us know if this change can be included in 
upstream kernel.

 [1]: https://groups.google.com/forum/#!topic/linux.kernel/P40yDB90ePs
>>>
>>> Could you please provide some feedback about the above mentioned query ?
>>
>> Do you have an explanation on the performance variation when
>> L1_CACHE_BYTES is changed? We'd need to understand how the network stack
>> is affected by L1_CACHE_BYTES, in which context it uses it (is it for
>> non-coherent DMA?).
> 
> network stack use SKB_DATA_ALIGN to align.
> ---
> #define SKB_DATA_ALIGN(X) (((X) + (SMP_CACHE_BYTES - 1)) & \
> ~(SMP_CACHE_BYTES - 1))
> 
> #define SMP_CACHE_BYTES L1_CACHE_BYTES
> ---
> I think this is the reason of performance regression.
> 

Yes this is the reason for performance regression. Due to increases L1 
cache alignment the 
object is coming from next kmalloc slab and skb->truesize is changing from 
2304 bytes to 
4352 bytes. This in turn increases sk_wmem_alloc which causes queuing of 
less send buffers.

We tried different benchmarks and found none which really affects with Cache 
line change. If there is no correctness issue,
I think we are fine with reverting the patch.

Though I still think it is beneficiary to do some more investigation for the 
perf loss, who knows 32 bit align or no align might 
Give even more perf benefit. 


Thanks,
Tirumalesh.  
>>
>> The Cavium guys haven't shown any numbers (IIUC) to back the
>> L1_CACHE_BYTES 

Re: [PATCH] PCI: Add cavium acs pci quirk

2017-02-27 Thread Chalamarla, Tirumalesh


On 2/27/17, 11:02 AM, "David Daney"  wrote:

On 02/14/2017 07:07 AM, Bjorn Helgaas wrote:
> On Mon, Feb 13, 2017 at 09:44:57PM -0700, Alex Williamson wrote:
>> On Sat, 30 Jan 2016 01:33:58 +0530
>> Manish Jaggi  wrote:
>>
>>> Cavium devices matching this quirk do not perform
>>> peer-to-peer with other functions, allowing masking out
>>> these bits as if they were unimplemented in the ACS capability.
>>>
>>> Acked-by: Tirumalesh Chalamarla 
>>> Signed-off-by: Manish Jaggi 
>>> ---
>>>  drivers/pci/quirks.c | 15 +++
>>>  1 file changed, 15 insertions(+)
>>>
>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>> index 7e32730..a300fa6 100644
>>> --- a/drivers/pci/quirks.c
>>> +++ b/drivers/pci/quirks.c
>>> @@ -3814,6 +3814,19 @@ static int pci_quirk_amd_sb_acs(struct pci_dev 
*dev, u16 acs_flags)
>>>  #endif
>>>  }
>>>
>>> +static int pci_quirk_cavium_acs(struct pci_dev *dev, u16 acs_flags)
>>> +{
>>> +   /*
>>> +* Cavium devices matching this quirk do not perform
>>> +* peer-to-peer with other functions, allowing masking out
>>> +* these bits as if they were unimplemented in the ACS 
capability.
>>> +*/
>>> +   acs_flags &= ~(PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR |
>>> +  PCI_ACS_CR | PCI_ACS_UF | PCI_ACS_DT);
>>> +
>>> +   return acs_flags ? 0 : 1;
>>> +}
>>> +
>>>  /*
>>>   * Many Intel PCH root ports do provide ACS-like features to disable 
peer
>>>   * transactions and validate bus numbers in requests, but do not 
provide an
>>> @@ -3966,6 +3979,8 @@ static const struct pci_dev_acs_enabled {
>>> { PCI_VENDOR_ID_INTEL, PCI_ANY_ID, pci_quirk_intel_pch_acs },
>>> { 0x19a2, 0x710, pci_quirk_mf_endpoint_acs }, /* Emulex BE3-R */
>>> { 0x10df, 0x720, pci_quirk_mf_endpoint_acs }, /* Emulex 
Skyhawk-R */
>>> +   /* Cavium ThunderX */
>>> +   { PCI_VENDOR_ID_CAVIUM, PCI_ANY_ID, pci_quirk_cavium_acs },
>>> { 0 }
>>>  };
>>>
>>
>> Apologies for not catching this, but what sort of crystal ball do you
>> have that can predict not only current devices, but future devices will
>> not support peer-to-peer features?  Is there an internal design
>> guidelines reference specification for Cavium that we can realistically
>> expect this to remain consistent, or is this just an attempt to never
>> think about ACS again at the customer's peril?  What about the existing
>> non-ThunderX products with Cavium vendor ID, does this really apply to
>> those?  I would strongly suggest taking the device ID into account.
>> See examples like the pci_quirk_intel_pch_acs quirk where the initial
>> filter is PCI_ANY_ID, but specific device types and ranges of device
>> IDs are identified by the function for evaluation.  This seems reckless
>> to me and I'd advise that it be reverted.  Thanks,
>
> I'd be happy to revert this, but it would be easier if somebody sent a
> patch and a changelog.
>

I agree that it should be reverted.

I was hoping that Manish or Tirumalesh would fix this properly by only 
activating the quirk on the faulty hardware, but such a fix has never 
appeared.

This was supposed to be true for all Cavium chips. We got such answer from 
Hardware architects, but it might change in the future. 
I will be happy to replace this with DEVICE_ID.
Manish,
Could you please take this u, if not I will send a patch to do this.

Tirumalesh. 

David Daney







Re: [PATCH] PCI: Add cavium acs pci quirk

2017-02-27 Thread Chalamarla, Tirumalesh


On 2/27/17, 11:02 AM, "David Daney"  wrote:

On 02/14/2017 07:07 AM, Bjorn Helgaas wrote:
> On Mon, Feb 13, 2017 at 09:44:57PM -0700, Alex Williamson wrote:
>> On Sat, 30 Jan 2016 01:33:58 +0530
>> Manish Jaggi  wrote:
>>
>>> Cavium devices matching this quirk do not perform
>>> peer-to-peer with other functions, allowing masking out
>>> these bits as if they were unimplemented in the ACS capability.
>>>
>>> Acked-by: Tirumalesh Chalamarla 
>>> Signed-off-by: Manish Jaggi 
>>> ---
>>>  drivers/pci/quirks.c | 15 +++
>>>  1 file changed, 15 insertions(+)
>>>
>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>> index 7e32730..a300fa6 100644
>>> --- a/drivers/pci/quirks.c
>>> +++ b/drivers/pci/quirks.c
>>> @@ -3814,6 +3814,19 @@ static int pci_quirk_amd_sb_acs(struct pci_dev 
*dev, u16 acs_flags)
>>>  #endif
>>>  }
>>>
>>> +static int pci_quirk_cavium_acs(struct pci_dev *dev, u16 acs_flags)
>>> +{
>>> +   /*
>>> +* Cavium devices matching this quirk do not perform
>>> +* peer-to-peer with other functions, allowing masking out
>>> +* these bits as if they were unimplemented in the ACS 
capability.
>>> +*/
>>> +   acs_flags &= ~(PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR |
>>> +  PCI_ACS_CR | PCI_ACS_UF | PCI_ACS_DT);
>>> +
>>> +   return acs_flags ? 0 : 1;
>>> +}
>>> +
>>>  /*
>>>   * Many Intel PCH root ports do provide ACS-like features to disable 
peer
>>>   * transactions and validate bus numbers in requests, but do not 
provide an
>>> @@ -3966,6 +3979,8 @@ static const struct pci_dev_acs_enabled {
>>> { PCI_VENDOR_ID_INTEL, PCI_ANY_ID, pci_quirk_intel_pch_acs },
>>> { 0x19a2, 0x710, pci_quirk_mf_endpoint_acs }, /* Emulex BE3-R */
>>> { 0x10df, 0x720, pci_quirk_mf_endpoint_acs }, /* Emulex 
Skyhawk-R */
>>> +   /* Cavium ThunderX */
>>> +   { PCI_VENDOR_ID_CAVIUM, PCI_ANY_ID, pci_quirk_cavium_acs },
>>> { 0 }
>>>  };
>>>
>>
>> Apologies for not catching this, but what sort of crystal ball do you
>> have that can predict not only current devices, but future devices will
>> not support peer-to-peer features?  Is there an internal design
>> guidelines reference specification for Cavium that we can realistically
>> expect this to remain consistent, or is this just an attempt to never
>> think about ACS again at the customer's peril?  What about the existing
>> non-ThunderX products with Cavium vendor ID, does this really apply to
>> those?  I would strongly suggest taking the device ID into account.
>> See examples like the pci_quirk_intel_pch_acs quirk where the initial
>> filter is PCI_ANY_ID, but specific device types and ranges of device
>> IDs are identified by the function for evaluation.  This seems reckless
>> to me and I'd advise that it be reverted.  Thanks,
>
> I'd be happy to revert this, but it would be easier if somebody sent a
> patch and a changelog.
>

I agree that it should be reverted.

I was hoping that Manish or Tirumalesh would fix this properly by only 
activating the quirk on the faulty hardware, but such a fix has never 
appeared.

This was supposed to be true for all Cavium chips. We got such answer from 
Hardware architects, but it might change in the future. 
I will be happy to replace this with DEVICE_ID.
Manish,
Could you please take this u, if not I will send a patch to do this.

Tirumalesh. 

David Daney







Re: [PATCH v9 0/8] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 2/3: msi changes

2016-05-10 Thread Chalamarla, Tirumalesh





On 5/10/16, 12:34 AM, "Eric Auger" <eric.au...@linaro.org> wrote:

>Hi Chalamarla,
>> On 5/9/16, 12:48 AM, "Eric Auger" <eric.au...@linaro.org> wrote:
>> 
>>> Hi Chalarmala,
>>> On 05/05/2016 07:44 PM, Chalamarla, Tirumalesh wrote:
>>>> Hi Eric,
>>>>
>>>> Does this series supports gicv3-its emulation?
>>>> Do we have a tree with all the dependent patches
>>> GICv3 ITS emulation support comes with:
>>> [PATCH v4 00/12] KVM: arm64: GICv3 ITS emulation
>>> http://permalink.gmane.org/gmane.comp.emulators.kvm.arm.devel/5738
>>>
>>> My series just allows PCI device MSI transactions to reach the host MSI
>>> frame through the SMMU. Only host GICv2m has been tested at the moment
>>> with a guest exposed with a GICv2m too.
>> 
>> 
>> I wanted to test this series on Cavium Thunder, but the only mode supported 
>> is Gicv3 with ITS, is there a chance 
>> That I get a tree with all the dependencies?
>I can prepare a kernel tree with all the dependencies including ITS
>patches supporting get_doorbell_info.
>
>Then on userspace side:
>- do you plan to use QEMU?
>- then can you expose your guest with a GICv3 + v2m? I think this should
>work.
>
>exposing ITS to the guest requires a respin of Pavel's series:
>[Qemu-devel] [RFC PATCH v3 0/5] vITS support
>(https://lists.gnu.org/archive/html/qemu-devel/2015-11/msg05197.html)
>and has more complex kernel dependencies.
>
>I think exposing the guest with GICv3 + v2m should work

Will try both. We have a qemu with that patches will try once the kernel tree 
is ready.

Thanks,
Tirumalesh.
>
>Best Regards
>
>Eric
>>>> Thanks,
>>>> Tirumalesh. 
>>>> On 5/4/16, 8:18 AM, "linux-arm-kernel on behalf of Eric Auger" 
>>>> <linux-arm-kernel-boun...@lists.infradead.org on behalf of 
>>>> eric.au...@linaro.org> wrote:
>>>>
>>>>> This series implements the MSI address mapping/unmapping in the MSI layer.
>>>>> IOMMU binding happens on pci_enable_msi since this function can sleep and
>>>>> return errors. On msi_domain_set_affinity, msi_domain_(de)activate, which
>>>>> are not allowed to sleep, we simply look for the already existing binding.
>>>>>
>>>>> A new MSI domain info flag value is introduced to report whether the msi
>>>>> domain implements IRQ remapping. GIC v3 ITS is the first MSI controller
>>>>> advertising it. This flag value will be used by VFIO subsystem to
>>>>> determine whether MSI forwarding is safe.
>>>>>
>>>>> More details & context can be found at:
>>>>> http://www.linaro.org/blog/core-dump/kvm-pciemsi-passthrough-armarm64/
>>>>>
>>>>> Best Regards
>>>>>
>>>>> Eric
>>>>>
>>>>> Applies on top of PART 1/3. Also depends on
>>>>> [PATCH 1/3] iommu: Add MMIO mapping type,
>>>>> http://comments.gmane.org/gmane.linux.kernel.iommu/12869
>>>>>
>>>>> Git: complete series available at
>>>>> https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.6-rc6-pcie-passthrough-v9-msi-v9
>>>>>
>>>>> This branch contains all parts in v9.
>>>>>
>>>>> History:
>>>>>
>>>>> v8 -> v9:
>>>>> - use a union in irq_chip_msi_doorbell_info + boolean telling whether the
>>>>>  doorbell is percpu
>>>>> - decouple irq_data parsing from the actual mapping/unmapping in
>>>>>  msi_handle_doorbell_mappings
>>>>> - fix misc style issues
>>>>>
>>>>> v7 -> v8:
>>>>> take into account Marc's comments:
>>>>> - use iommu_msi_msg_pa_to_va with new proto
>>>>> - change in irq_chip_msi_doorbell_info struct definition:
>>>>>  prot and size became shared between all doorbells and phys_addr_t 
>>>>> __percpu
>>>>> - cleanups in v2m irqchip
>>>>> - eventually did not touch MSI_FLAG_IRQ_REMAPPING naming
>>>>> - On msi_handle_doorbell_mappings, stop on the first irqchip where 
>>>>> doorbells
>>>>>  can be found
>>>>> - fix resource deallocation on mapping failure in msi_domain_alloc_irqs
>>>>>
>>>>> v6 -> v7:
>>>>> - do alloc/map handling on pci_enable_msi and search on 
>>>>&

Re: [PATCH v9 0/8] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 2/3: msi changes

2016-05-10 Thread Chalamarla, Tirumalesh





On 5/10/16, 12:34 AM, "Eric Auger"  wrote:

>Hi Chalamarla,
>> On 5/9/16, 12:48 AM, "Eric Auger"  wrote:
>> 
>>> Hi Chalarmala,
>>> On 05/05/2016 07:44 PM, Chalamarla, Tirumalesh wrote:
>>>> Hi Eric,
>>>>
>>>> Does this series supports gicv3-its emulation?
>>>> Do we have a tree with all the dependent patches
>>> GICv3 ITS emulation support comes with:
>>> [PATCH v4 00/12] KVM: arm64: GICv3 ITS emulation
>>> http://permalink.gmane.org/gmane.comp.emulators.kvm.arm.devel/5738
>>>
>>> My series just allows PCI device MSI transactions to reach the host MSI
>>> frame through the SMMU. Only host GICv2m has been tested at the moment
>>> with a guest exposed with a GICv2m too.
>> 
>> 
>> I wanted to test this series on Cavium Thunder, but the only mode supported 
>> is Gicv3 with ITS, is there a chance 
>> That I get a tree with all the dependencies?
>I can prepare a kernel tree with all the dependencies including ITS
>patches supporting get_doorbell_info.
>
>Then on userspace side:
>- do you plan to use QEMU?
>- then can you expose your guest with a GICv3 + v2m? I think this should
>work.
>
>exposing ITS to the guest requires a respin of Pavel's series:
>[Qemu-devel] [RFC PATCH v3 0/5] vITS support
>(https://lists.gnu.org/archive/html/qemu-devel/2015-11/msg05197.html)
>and has more complex kernel dependencies.
>
>I think exposing the guest with GICv3 + v2m should work

Will try both. We have a qemu with that patches will try once the kernel tree 
is ready.

Thanks,
Tirumalesh.
>
>Best Regards
>
>Eric
>>>> Thanks,
>>>> Tirumalesh. 
>>>> On 5/4/16, 8:18 AM, "linux-arm-kernel on behalf of Eric Auger" 
>>>> >>> eric.au...@linaro.org> wrote:
>>>>
>>>>> This series implements the MSI address mapping/unmapping in the MSI layer.
>>>>> IOMMU binding happens on pci_enable_msi since this function can sleep and
>>>>> return errors. On msi_domain_set_affinity, msi_domain_(de)activate, which
>>>>> are not allowed to sleep, we simply look for the already existing binding.
>>>>>
>>>>> A new MSI domain info flag value is introduced to report whether the msi
>>>>> domain implements IRQ remapping. GIC v3 ITS is the first MSI controller
>>>>> advertising it. This flag value will be used by VFIO subsystem to
>>>>> determine whether MSI forwarding is safe.
>>>>>
>>>>> More details & context can be found at:
>>>>> http://www.linaro.org/blog/core-dump/kvm-pciemsi-passthrough-armarm64/
>>>>>
>>>>> Best Regards
>>>>>
>>>>> Eric
>>>>>
>>>>> Applies on top of PART 1/3. Also depends on
>>>>> [PATCH 1/3] iommu: Add MMIO mapping type,
>>>>> http://comments.gmane.org/gmane.linux.kernel.iommu/12869
>>>>>
>>>>> Git: complete series available at
>>>>> https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.6-rc6-pcie-passthrough-v9-msi-v9
>>>>>
>>>>> This branch contains all parts in v9.
>>>>>
>>>>> History:
>>>>>
>>>>> v8 -> v9:
>>>>> - use a union in irq_chip_msi_doorbell_info + boolean telling whether the
>>>>>  doorbell is percpu
>>>>> - decouple irq_data parsing from the actual mapping/unmapping in
>>>>>  msi_handle_doorbell_mappings
>>>>> - fix misc style issues
>>>>>
>>>>> v7 -> v8:
>>>>> take into account Marc's comments:
>>>>> - use iommu_msi_msg_pa_to_va with new proto
>>>>> - change in irq_chip_msi_doorbell_info struct definition:
>>>>>  prot and size became shared between all doorbells and phys_addr_t 
>>>>> __percpu
>>>>> - cleanups in v2m irqchip
>>>>> - eventually did not touch MSI_FLAG_IRQ_REMAPPING naming
>>>>> - On msi_handle_doorbell_mappings, stop on the first irqchip where 
>>>>> doorbells
>>>>>  can be found
>>>>> - fix resource deallocation on mapping failure in msi_domain_alloc_irqs
>>>>>
>>>>> v6 -> v7:
>>>>> - do alloc/map handling on pci_enable_msi and search on 
>>>>> msi_(de)domain_activate
>>>>> - add msi_doorbell_info callback in irq-chip to retrieve the 
>>>>> characteristi

Re: [PATCH v9 0/8] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 2/3: msi changes

2016-05-09 Thread Chalamarla, Tirumalesh





On 5/9/16, 12:48 AM, "Eric Auger" <eric.au...@linaro.org> wrote:

>Hi Chalarmala,
>On 05/05/2016 07:44 PM, Chalamarla, Tirumalesh wrote:
>> Hi Eric,
>> 
>> Does this series supports gicv3-its emulation?
>> Do we have a tree with all the dependent patches
>GICv3 ITS emulation support comes with:
>[PATCH v4 00/12] KVM: arm64: GICv3 ITS emulation
>http://permalink.gmane.org/gmane.comp.emulators.kvm.arm.devel/5738
>
>My series just allows PCI device MSI transactions to reach the host MSI
>frame through the SMMU. Only host GICv2m has been tested at the moment
>with a guest exposed with a GICv2m too.


I wanted to test this series on Cavium Thunder, but the only mode supported is 
Gicv3 with ITS, is there a chance 
That I get a tree with all the dependencies?
>
>Best Regards
>
>Eric
>
>
>> 
>> 
>> 
>> 
>> 
>> Thanks,
>> Tirumalesh. 
>> On 5/4/16, 8:18 AM, "linux-arm-kernel on behalf of Eric Auger" 
>> <linux-arm-kernel-boun...@lists.infradead.org on behalf of 
>> eric.au...@linaro.org> wrote:
>> 
>>> This series implements the MSI address mapping/unmapping in the MSI layer.
>>> IOMMU binding happens on pci_enable_msi since this function can sleep and
>>> return errors. On msi_domain_set_affinity, msi_domain_(de)activate, which
>>> are not allowed to sleep, we simply look for the already existing binding.
>>>
>>> A new MSI domain info flag value is introduced to report whether the msi
>>> domain implements IRQ remapping. GIC v3 ITS is the first MSI controller
>>> advertising it. This flag value will be used by VFIO subsystem to
>>> determine whether MSI forwarding is safe.
>>>
>>> More details & context can be found at:
>>> http://www.linaro.org/blog/core-dump/kvm-pciemsi-passthrough-armarm64/
>>>
>>> Best Regards
>>>
>>> Eric
>>>
>>> Applies on top of PART 1/3. Also depends on
>>> [PATCH 1/3] iommu: Add MMIO mapping type,
>>> http://comments.gmane.org/gmane.linux.kernel.iommu/12869
>>>
>>> Git: complete series available at
>>> https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.6-rc6-pcie-passthrough-v9-msi-v9
>>>
>>> This branch contains all parts in v9.
>>>
>>> History:
>>>
>>> v8 -> v9:
>>> - use a union in irq_chip_msi_doorbell_info + boolean telling whether the
>>>  doorbell is percpu
>>> - decouple irq_data parsing from the actual mapping/unmapping in
>>>  msi_handle_doorbell_mappings
>>> - fix misc style issues
>>>
>>> v7 -> v8:
>>> take into account Marc's comments:
>>> - use iommu_msi_msg_pa_to_va with new proto
>>> - change in irq_chip_msi_doorbell_info struct definition:
>>>  prot and size became shared between all doorbells and phys_addr_t __percpu
>>> - cleanups in v2m irqchip
>>> - eventually did not touch MSI_FLAG_IRQ_REMAPPING naming
>>> - On msi_handle_doorbell_mappings, stop on the first irqchip where doorbells
>>>  can be found
>>> - fix resource deallocation on mapping failure in msi_domain_alloc_irqs
>>>
>>> v6 -> v7:
>>> - do alloc/map handling on pci_enable_msi and search on 
>>> msi_(de)domain_activate
>>> - add msi_doorbell_info callback in irq-chip to retrieve the characteristics
>>>  of doorbells
>>>
>>> RFC v5 -> patch v6:
>>> - split to ease the review process
>>> - rebase on default iommu domain code (irq_data_to_msi_mapping_domain
>>>  checks IOMMU_DOMAIN_DMA type)
>>> - fix unmap sequence on msi_domain_set_affinity (reported by Marc):
>>>  unmap the previous doorbell when the new one has been mapped & written to
>>>  the device, ie. irq_chip_write_msi_msg.
>>> - "msi: msi_compose wrapper removed" following change above
>>> - add size parameter to iommu_get_reserved_iova API following Marc's request
>>>
>>> RFC v4 -> RFC v5:
>>> - take into account Thomas' comments on MSI related patches
>>>  - split "msi: IOMMU map the doorbell address when needed"
>>>  - increase readability and add comments
>>>  - fix style issues
>>> - split "iommu: Add DOMAIN_ATTR_MSI_MAPPING attribute"
>>> - platform ITS now advertises IOMMU_CAP_INTR_REMAP
>>> - fix compilation issue with CONFIG_IOMMU API unset
>>> - arm-smmu-v3 now advertises DOMAIN_ATTR_MSI_MAPPING
>>>
>>> RFC v3 -> v4:
>>> - Move doorbell

Re: [PATCH v9 0/8] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 2/3: msi changes

2016-05-09 Thread Chalamarla, Tirumalesh





On 5/9/16, 12:48 AM, "Eric Auger"  wrote:

>Hi Chalarmala,
>On 05/05/2016 07:44 PM, Chalamarla, Tirumalesh wrote:
>> Hi Eric,
>> 
>> Does this series supports gicv3-its emulation?
>> Do we have a tree with all the dependent patches
>GICv3 ITS emulation support comes with:
>[PATCH v4 00/12] KVM: arm64: GICv3 ITS emulation
>http://permalink.gmane.org/gmane.comp.emulators.kvm.arm.devel/5738
>
>My series just allows PCI device MSI transactions to reach the host MSI
>frame through the SMMU. Only host GICv2m has been tested at the moment
>with a guest exposed with a GICv2m too.


I wanted to test this series on Cavium Thunder, but the only mode supported is 
Gicv3 with ITS, is there a chance 
That I get a tree with all the dependencies?
>
>Best Regards
>
>Eric
>
>
>> 
>> 
>> 
>> 
>> 
>> Thanks,
>> Tirumalesh. 
>> On 5/4/16, 8:18 AM, "linux-arm-kernel on behalf of Eric Auger" 
>> > eric.au...@linaro.org> wrote:
>> 
>>> This series implements the MSI address mapping/unmapping in the MSI layer.
>>> IOMMU binding happens on pci_enable_msi since this function can sleep and
>>> return errors. On msi_domain_set_affinity, msi_domain_(de)activate, which
>>> are not allowed to sleep, we simply look for the already existing binding.
>>>
>>> A new MSI domain info flag value is introduced to report whether the msi
>>> domain implements IRQ remapping. GIC v3 ITS is the first MSI controller
>>> advertising it. This flag value will be used by VFIO subsystem to
>>> determine whether MSI forwarding is safe.
>>>
>>> More details & context can be found at:
>>> http://www.linaro.org/blog/core-dump/kvm-pciemsi-passthrough-armarm64/
>>>
>>> Best Regards
>>>
>>> Eric
>>>
>>> Applies on top of PART 1/3. Also depends on
>>> [PATCH 1/3] iommu: Add MMIO mapping type,
>>> http://comments.gmane.org/gmane.linux.kernel.iommu/12869
>>>
>>> Git: complete series available at
>>> https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.6-rc6-pcie-passthrough-v9-msi-v9
>>>
>>> This branch contains all parts in v9.
>>>
>>> History:
>>>
>>> v8 -> v9:
>>> - use a union in irq_chip_msi_doorbell_info + boolean telling whether the
>>>  doorbell is percpu
>>> - decouple irq_data parsing from the actual mapping/unmapping in
>>>  msi_handle_doorbell_mappings
>>> - fix misc style issues
>>>
>>> v7 -> v8:
>>> take into account Marc's comments:
>>> - use iommu_msi_msg_pa_to_va with new proto
>>> - change in irq_chip_msi_doorbell_info struct definition:
>>>  prot and size became shared between all doorbells and phys_addr_t __percpu
>>> - cleanups in v2m irqchip
>>> - eventually did not touch MSI_FLAG_IRQ_REMAPPING naming
>>> - On msi_handle_doorbell_mappings, stop on the first irqchip where doorbells
>>>  can be found
>>> - fix resource deallocation on mapping failure in msi_domain_alloc_irqs
>>>
>>> v6 -> v7:
>>> - do alloc/map handling on pci_enable_msi and search on 
>>> msi_(de)domain_activate
>>> - add msi_doorbell_info callback in irq-chip to retrieve the characteristics
>>>  of doorbells
>>>
>>> RFC v5 -> patch v6:
>>> - split to ease the review process
>>> - rebase on default iommu domain code (irq_data_to_msi_mapping_domain
>>>  checks IOMMU_DOMAIN_DMA type)
>>> - fix unmap sequence on msi_domain_set_affinity (reported by Marc):
>>>  unmap the previous doorbell when the new one has been mapped & written to
>>>  the device, ie. irq_chip_write_msi_msg.
>>> - "msi: msi_compose wrapper removed" following change above
>>> - add size parameter to iommu_get_reserved_iova API following Marc's request
>>>
>>> RFC v4 -> RFC v5:
>>> - take into account Thomas' comments on MSI related patches
>>>  - split "msi: IOMMU map the doorbell address when needed"
>>>  - increase readability and add comments
>>>  - fix style issues
>>> - split "iommu: Add DOMAIN_ATTR_MSI_MAPPING attribute"
>>> - platform ITS now advertises IOMMU_CAP_INTR_REMAP
>>> - fix compilation issue with CONFIG_IOMMU API unset
>>> - arm-smmu-v3 now advertises DOMAIN_ATTR_MSI_MAPPING
>>>
>>> RFC v3 -> v4:
>>> - Move doorbell mapping/unmapping in msi.c
>>> - fix ref count issue on set_affinity: in 

Re: [PATCH v9 4/7] vfio: allow reserved msi iova registration

2016-05-05 Thread Chalamarla, Tirumalesh





On 5/4/16, 4:54 AM, "linux-arm-kernel on behalf of Eric Auger" 
 wrote:

>The user is allowed to register a reserved MSI IOVA range by using the
>DMA MAP API and setting the new flag: VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA.
>This region is stored in the vfio_dma rb tree. At that point the iova
>range is not mapped to any target address yet. The host kernel will use
>those iova when needed, typically when MSIs are allocated.
>
>Signed-off-by: Eric Auger 
>Signed-off-by: Bharat Bhushan 
>
>---
>v7 -> v8:
>- use iommu_msi_set_aperture function. There is no notion of
>  unregistration anymore since the reserved msi slot remains
>  until the container gets closed.
>
>v6 -> v7:
>- use iommu_free_reserved_iova_domain
>- convey prot attributes downto dma-reserved-iommu iova domain creation
>- reserved bindings teardown now performed on iommu domain destruction
>- rename VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA into
> VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA
>- change title
>- pass the protection attribute to dma-reserved-iommu API
>
>v3 -> v4:
>- use iommu_alloc/free_reserved_iova_domain exported by dma-reserved-iommu
>- protect vfio_register_reserved_iova_range implementation with
>  CONFIG_IOMMU_DMA_RESERVED
>- handle unregistration by user-space and on vfio_iommu_type1 release
>
>v1 -> v2:
>- set returned value according to alloc_reserved_iova_domain result
>- free the iova domains in case any error occurs
>
>RFC v1 -> v1:
>- takes into account Alex comments, based on
>  [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region:
>- use the existing dma map/unmap ioctl interface with a flag to register
>  a reserved IOVA range. A single reserved iova region is allowed.
>---
> drivers/vfio/vfio_iommu_type1.c | 78 -
> include/uapi/linux/vfio.h   | 10 +-
> 2 files changed, 86 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>index 94a9916..4d3a6f1 100644
>--- a/drivers/vfio/vfio_iommu_type1.c
>+++ b/drivers/vfio/vfio_iommu_type1.c
>@@ -36,6 +36,7 @@
> #include 
> #include 
> #include 
>+#include 
> 
> #define DRIVER_VERSION  "0.2"
> #define DRIVER_AUTHOR   "Alex Williamson "
>@@ -445,6 +446,20 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, 
>struct vfio_dma *dma)
>   vfio_lock_acct(-unlocked);
> }
> 
>+static int vfio_set_msi_aperture(struct vfio_iommu *iommu,
>+  dma_addr_t iova, size_t size)
>+{
>+  struct vfio_domain *d;
>+  int ret = 0;
>+
>+  list_for_each_entry(d, >domain_list, next) {
>+  ret = iommu_msi_set_aperture(d->domain, iova, iova + size - 1);
>+  if (ret)
>+  break;
>+  }
>+  return ret;
>+}
>+
> static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
> {
>   vfio_unmap_unpin(iommu, dma);
>@@ -693,6 +708,63 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>   return ret;
> }
> 
>+static int vfio_register_msi_range(struct vfio_iommu *iommu,
>+ struct vfio_iommu_type1_dma_map *map)
>+{
>+  dma_addr_t iova = map->iova;
>+  size_t size = map->size;
>+  int ret = 0;
>+  struct vfio_dma *dma;
>+  unsigned long order;
>+  uint64_t mask;
>+
>+  /* Verify that none of our __u64 fields overflow */
>+  if (map->size != size || map->iova != iova)
>+  return -EINVAL;
>+
>+  order =  __ffs(vfio_pgsize_bitmap(iommu));
>+  mask = ((uint64_t)1 << order) - 1;
>+
>+  WARN_ON(mask & PAGE_MASK);
>+
>+  if (!size || (size | iova) & mask)
>+  return -EINVAL;
>+
>+  /* Don't allow IOVA address wrap */
>+  if (iova + size - 1 < iova)
>+  return -EINVAL;
>+
>+  mutex_lock(>lock);
>+
>+  if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
>+  ret =  -EEXIST;
>+  goto unlock;
>+  }
>+
>+  dma = kzalloc(sizeof(*dma), GFP_KERNEL);
>+  if (!dma) {
>+  ret = -ENOMEM;
>+  goto unlock;
>+  }
>+
>+  dma->iova = iova;
>+  dma->size = size;
>+  dma->type = VFIO_IOVA_RESERVED;
>+
>+  ret = vfio_set_msi_aperture(iommu, iova, size);
>+  if (ret)
>+  goto free_unlock;
>+
>+  vfio_link_dma(iommu, dma);
>+  goto unlock;
>+
>+free_unlock:
>+  kfree(dma);
>+unlock:
>+  mutex_unlock(>lock);
>+  return ret;
>+}
>+
> static int vfio_bus_type(struct device *dev, void *data)
> {
>   struct bus_type **bus = data;
>@@ -1062,7 +1134,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>   } else if (cmd == VFIO_IOMMU_MAP_DMA) {
>   struct vfio_iommu_type1_dma_map map;
>   uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
>-  

Re: [PATCH v9 4/7] vfio: allow reserved msi iova registration

2016-05-05 Thread Chalamarla, Tirumalesh





On 5/4/16, 4:54 AM, "linux-arm-kernel on behalf of Eric Auger" 
 wrote:

>The user is allowed to register a reserved MSI IOVA range by using the
>DMA MAP API and setting the new flag: VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA.
>This region is stored in the vfio_dma rb tree. At that point the iova
>range is not mapped to any target address yet. The host kernel will use
>those iova when needed, typically when MSIs are allocated.
>
>Signed-off-by: Eric Auger 
>Signed-off-by: Bharat Bhushan 
>
>---
>v7 -> v8:
>- use iommu_msi_set_aperture function. There is no notion of
>  unregistration anymore since the reserved msi slot remains
>  until the container gets closed.
>
>v6 -> v7:
>- use iommu_free_reserved_iova_domain
>- convey prot attributes downto dma-reserved-iommu iova domain creation
>- reserved bindings teardown now performed on iommu domain destruction
>- rename VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA into
> VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA
>- change title
>- pass the protection attribute to dma-reserved-iommu API
>
>v3 -> v4:
>- use iommu_alloc/free_reserved_iova_domain exported by dma-reserved-iommu
>- protect vfio_register_reserved_iova_range implementation with
>  CONFIG_IOMMU_DMA_RESERVED
>- handle unregistration by user-space and on vfio_iommu_type1 release
>
>v1 -> v2:
>- set returned value according to alloc_reserved_iova_domain result
>- free the iova domains in case any error occurs
>
>RFC v1 -> v1:
>- takes into account Alex comments, based on
>  [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region:
>- use the existing dma map/unmap ioctl interface with a flag to register
>  a reserved IOVA range. A single reserved iova region is allowed.
>---
> drivers/vfio/vfio_iommu_type1.c | 78 -
> include/uapi/linux/vfio.h   | 10 +-
> 2 files changed, 86 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>index 94a9916..4d3a6f1 100644
>--- a/drivers/vfio/vfio_iommu_type1.c
>+++ b/drivers/vfio/vfio_iommu_type1.c
>@@ -36,6 +36,7 @@
> #include 
> #include 
> #include 
>+#include 
> 
> #define DRIVER_VERSION  "0.2"
> #define DRIVER_AUTHOR   "Alex Williamson "
>@@ -445,6 +446,20 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, 
>struct vfio_dma *dma)
>   vfio_lock_acct(-unlocked);
> }
> 
>+static int vfio_set_msi_aperture(struct vfio_iommu *iommu,
>+  dma_addr_t iova, size_t size)
>+{
>+  struct vfio_domain *d;
>+  int ret = 0;
>+
>+  list_for_each_entry(d, >domain_list, next) {
>+  ret = iommu_msi_set_aperture(d->domain, iova, iova + size - 1);
>+  if (ret)
>+  break;
>+  }
>+  return ret;
>+}
>+
> static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
> {
>   vfio_unmap_unpin(iommu, dma);
>@@ -693,6 +708,63 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>   return ret;
> }
> 
>+static int vfio_register_msi_range(struct vfio_iommu *iommu,
>+ struct vfio_iommu_type1_dma_map *map)
>+{
>+  dma_addr_t iova = map->iova;
>+  size_t size = map->size;
>+  int ret = 0;
>+  struct vfio_dma *dma;
>+  unsigned long order;
>+  uint64_t mask;
>+
>+  /* Verify that none of our __u64 fields overflow */
>+  if (map->size != size || map->iova != iova)
>+  return -EINVAL;
>+
>+  order =  __ffs(vfio_pgsize_bitmap(iommu));
>+  mask = ((uint64_t)1 << order) - 1;
>+
>+  WARN_ON(mask & PAGE_MASK);
>+
>+  if (!size || (size | iova) & mask)
>+  return -EINVAL;
>+
>+  /* Don't allow IOVA address wrap */
>+  if (iova + size - 1 < iova)
>+  return -EINVAL;
>+
>+  mutex_lock(>lock);
>+
>+  if (vfio_find_dma(iommu, iova, size, VFIO_IOVA_ANY)) {
>+  ret =  -EEXIST;
>+  goto unlock;
>+  }
>+
>+  dma = kzalloc(sizeof(*dma), GFP_KERNEL);
>+  if (!dma) {
>+  ret = -ENOMEM;
>+  goto unlock;
>+  }
>+
>+  dma->iova = iova;
>+  dma->size = size;
>+  dma->type = VFIO_IOVA_RESERVED;
>+
>+  ret = vfio_set_msi_aperture(iommu, iova, size);
>+  if (ret)
>+  goto free_unlock;
>+
>+  vfio_link_dma(iommu, dma);
>+  goto unlock;
>+
>+free_unlock:
>+  kfree(dma);
>+unlock:
>+  mutex_unlock(>lock);
>+  return ret;
>+}
>+
> static int vfio_bus_type(struct device *dev, void *data)
> {
>   struct bus_type **bus = data;
>@@ -1062,7 +1134,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>   } else if (cmd == VFIO_IOMMU_MAP_DMA) {
>   struct vfio_iommu_type1_dma_map map;
>   uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
>-  VFIO_DMA_MAP_FLAG_WRITE;
>+  VFIO_DMA_MAP_FLAG_WRITE |
>+  VFIO_DMA_MAP_FLAG_RESERVED_MSI_IOVA;
> 
> 

Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain

2016-05-05 Thread Chalamarla, Tirumalesh





On 5/4/16, 4:54 AM, "linux-arm-kernel on behalf of Eric Auger" 
 wrote:

>On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is abstracted
>by the msi controller. vfio_safe_irq_domain allows to check whether
>interrupts are "safe" for a given device. They are if the device does
>not use MSI or if the device uses MSI and the msi-parent controller
>supports IRQ remapping.
>
>Then we check at group level if all devices have safe interrupts: if not,
>we only allow the group to be attached if allow_unsafe_interrupts is set.
>
>At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>changed in next patch.

Will this work in systems with multiple ITS?
>
>Signed-off-by: Eric Auger 
>
>---
>v3 -> v4:
>- rename vfio_msi_parent_irq_remapping_capable into vfio_safe_irq_domain
>  and irq_remapping into safe_irq_domains
>
>v2 -> v3:
>- protect vfio_msi_parent_irq_remapping_capable with
>  CONFIG_GENERIC_MSI_IRQ_DOMAIN
>---
> drivers/vfio/vfio_iommu_type1.c | 44 +++--
> 1 file changed, 42 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>index 4d3a6f1..2fc8197 100644
>--- a/drivers/vfio/vfio_iommu_type1.c
>+++ b/drivers/vfio/vfio_iommu_type1.c
>@@ -37,6 +37,8 @@
> #include 
> #include 
> #include 
>+#include 
>+#include 
> 
> #define DRIVER_VERSION  "0.2"
> #define DRIVER_AUTHOR   "Alex Williamson "
>@@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev, void *data)
>   return 0;
> }
> 
>+/**
>+ * vfio_safe_irq_domain: returns whether the irq domain
>+ * the device is attached to is safe with respect to MSI isolation.
>+ * If the irq domain is not an MSI domain, we return it is safe.
>+ *
>+ * @dev: device handle
>+ * @data: unused
>+ * returns 0 if the irq domain is safe, -1 if not.
>+ */
>+static int vfio_safe_irq_domain(struct device *dev, void *data)
>+{
>+#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>+  struct irq_domain *domain;
>+  struct msi_domain_info *info;
>+
>+  domain = dev_get_msi_domain(dev);
>+  if (!domain)
>+  return 0;
>+
>+  info = msi_get_domain_info(domain);
>+
>+  if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>+  return -1;
>+#endif
>+  return 0;
>+}
>+
> static int vfio_iommu_replay(struct vfio_iommu *iommu,
>struct vfio_domain *domain)
> {
>@@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>   struct vfio_group *group, *g;
>   struct vfio_domain *domain, *d;
>   struct bus_type *bus = NULL;
>-  int ret;
>+  int ret, safe_irq_domains;
> 
>   mutex_lock(>lock);
> 
>@@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> 
>   group->iommu_group = iommu_group;
> 
>+  /*
>+   * Determine if all the devices of the group have a safe irq domain
>+   * with respect to MSI isolation
>+   */
>+  safe_irq_domains = !iommu_group_for_each_dev(iommu_group, ,
>+ vfio_safe_irq_domain);
>+
>   /* Determine bus_type in order to allocate a domain */
>   ret = iommu_group_for_each_dev(iommu_group, , vfio_bus_type);
>   if (ret)
>@@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>   INIT_LIST_HEAD(>group_list);
>   list_add(>next, >group_list);
> 
>+  /*
>+   * to advertise safe interrupts either the IOMMU or the MSI controllers
>+   * must support IRQ remapping/interrupt translation
>+   */
>   if (!allow_unsafe_interrupts &&
>-  !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>+  (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) && !safe_irq_domains)) {
>   pr_warn("%s: No interrupt remapping support.  Use the module 
> param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this 
> platform\n",
>  __func__);
>   ret = -EPERM;
>-- 
>1.9.1
>
>
>___
>linux-arm-kernel mailing list
>linux-arm-ker...@lists.infradead.org
>http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


Re: [PATCH v9 5/7] vfio/type1: also check IRQ remapping capability at msi domain

2016-05-05 Thread Chalamarla, Tirumalesh





On 5/4/16, 4:54 AM, "linux-arm-kernel on behalf of Eric Auger" 
 wrote:

>On x86 IRQ remapping is abstracted by the IOMMU. On ARM this is abstracted
>by the msi controller. vfio_safe_irq_domain allows to check whether
>interrupts are "safe" for a given device. They are if the device does
>not use MSI or if the device uses MSI and the msi-parent controller
>supports IRQ remapping.
>
>Then we check at group level if all devices have safe interrupts: if not,
>we only allow the group to be attached if allow_unsafe_interrupts is set.
>
>At this point ARM sMMU still advertises IOMMU_CAP_INTR_REMAP. This is
>changed in next patch.

Will this work in systems with multiple ITS?
>
>Signed-off-by: Eric Auger 
>
>---
>v3 -> v4:
>- rename vfio_msi_parent_irq_remapping_capable into vfio_safe_irq_domain
>  and irq_remapping into safe_irq_domains
>
>v2 -> v3:
>- protect vfio_msi_parent_irq_remapping_capable with
>  CONFIG_GENERIC_MSI_IRQ_DOMAIN
>---
> drivers/vfio/vfio_iommu_type1.c | 44 +++--
> 1 file changed, 42 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>index 4d3a6f1..2fc8197 100644
>--- a/drivers/vfio/vfio_iommu_type1.c
>+++ b/drivers/vfio/vfio_iommu_type1.c
>@@ -37,6 +37,8 @@
> #include 
> #include 
> #include 
>+#include 
>+#include 
> 
> #define DRIVER_VERSION  "0.2"
> #define DRIVER_AUTHOR   "Alex Williamson "
>@@ -777,6 +779,33 @@ static int vfio_bus_type(struct device *dev, void *data)
>   return 0;
> }
> 
>+/**
>+ * vfio_safe_irq_domain: returns whether the irq domain
>+ * the device is attached to is safe with respect to MSI isolation.
>+ * If the irq domain is not an MSI domain, we return it is safe.
>+ *
>+ * @dev: device handle
>+ * @data: unused
>+ * returns 0 if the irq domain is safe, -1 if not.
>+ */
>+static int vfio_safe_irq_domain(struct device *dev, void *data)
>+{
>+#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
>+  struct irq_domain *domain;
>+  struct msi_domain_info *info;
>+
>+  domain = dev_get_msi_domain(dev);
>+  if (!domain)
>+  return 0;
>+
>+  info = msi_get_domain_info(domain);
>+
>+  if (!(info->flags & MSI_FLAG_IRQ_REMAPPING))
>+  return -1;
>+#endif
>+  return 0;
>+}
>+
> static int vfio_iommu_replay(struct vfio_iommu *iommu,
>struct vfio_domain *domain)
> {
>@@ -870,7 +899,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>   struct vfio_group *group, *g;
>   struct vfio_domain *domain, *d;
>   struct bus_type *bus = NULL;
>-  int ret;
>+  int ret, safe_irq_domains;
> 
>   mutex_lock(>lock);
> 
>@@ -893,6 +922,13 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> 
>   group->iommu_group = iommu_group;
> 
>+  /*
>+   * Determine if all the devices of the group have a safe irq domain
>+   * with respect to MSI isolation
>+   */
>+  safe_irq_domains = !iommu_group_for_each_dev(iommu_group, ,
>+ vfio_safe_irq_domain);
>+
>   /* Determine bus_type in order to allocate a domain */
>   ret = iommu_group_for_each_dev(iommu_group, , vfio_bus_type);
>   if (ret)
>@@ -920,8 +956,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>   INIT_LIST_HEAD(>group_list);
>   list_add(>next, >group_list);
> 
>+  /*
>+   * to advertise safe interrupts either the IOMMU or the MSI controllers
>+   * must support IRQ remapping/interrupt translation
>+   */
>   if (!allow_unsafe_interrupts &&
>-  !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
>+  (!iommu_capable(bus, IOMMU_CAP_INTR_REMAP) && !safe_irq_domains)) {
>   pr_warn("%s: No interrupt remapping support.  Use the module 
> param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this 
> platform\n",
>  __func__);
>   ret = -EPERM;
>-- 
>1.9.1
>
>
>___
>linux-arm-kernel mailing list
>linux-arm-ker...@lists.infradead.org
>http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


Re: [PATCH v9 0/8] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 2/3: msi changes

2016-05-05 Thread Chalamarla, Tirumalesh
Hi Eric,

Does this series supports gicv3-its emulation?
Do we have a tree with all the dependent patches





Thanks,
Tirumalesh. 
On 5/4/16, 8:18 AM, "linux-arm-kernel on behalf of Eric Auger" 
 wrote:

>This series implements the MSI address mapping/unmapping in the MSI layer.
>IOMMU binding happens on pci_enable_msi since this function can sleep and
>return errors. On msi_domain_set_affinity, msi_domain_(de)activate, which
>are not allowed to sleep, we simply look for the already existing binding.
>
>A new MSI domain info flag value is introduced to report whether the msi
>domain implements IRQ remapping. GIC v3 ITS is the first MSI controller
>advertising it. This flag value will be used by VFIO subsystem to
>determine whether MSI forwarding is safe.
>
>More details & context can be found at:
>http://www.linaro.org/blog/core-dump/kvm-pciemsi-passthrough-armarm64/
>
>Best Regards
>
>Eric
>
>Applies on top of PART 1/3. Also depends on
>[PATCH 1/3] iommu: Add MMIO mapping type,
>http://comments.gmane.org/gmane.linux.kernel.iommu/12869
>
>Git: complete series available at
>https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.6-rc6-pcie-passthrough-v9-msi-v9
>
>This branch contains all parts in v9.
>
>History:
>
>v8 -> v9:
>- use a union in irq_chip_msi_doorbell_info + boolean telling whether the
>  doorbell is percpu
>- decouple irq_data parsing from the actual mapping/unmapping in
>  msi_handle_doorbell_mappings
>- fix misc style issues
>
>v7 -> v8:
>take into account Marc's comments:
>- use iommu_msi_msg_pa_to_va with new proto
>- change in irq_chip_msi_doorbell_info struct definition:
>  prot and size became shared between all doorbells and phys_addr_t __percpu
>- cleanups in v2m irqchip
>- eventually did not touch MSI_FLAG_IRQ_REMAPPING naming
>- On msi_handle_doorbell_mappings, stop on the first irqchip where doorbells
>  can be found
>- fix resource deallocation on mapping failure in msi_domain_alloc_irqs
>
>v6 -> v7:
>- do alloc/map handling on pci_enable_msi and search on msi_(de)domain_activate
>- add msi_doorbell_info callback in irq-chip to retrieve the characteristics
>  of doorbells
>
>RFC v5 -> patch v6:
>- split to ease the review process
>- rebase on default iommu domain code (irq_data_to_msi_mapping_domain
>  checks IOMMU_DOMAIN_DMA type)
>- fix unmap sequence on msi_domain_set_affinity (reported by Marc):
>  unmap the previous doorbell when the new one has been mapped & written to
>  the device, ie. irq_chip_write_msi_msg.
>- "msi: msi_compose wrapper removed" following change above
>- add size parameter to iommu_get_reserved_iova API following Marc's request
>
>RFC v4 -> RFC v5:
>- take into account Thomas' comments on MSI related patches
>  - split "msi: IOMMU map the doorbell address when needed"
>  - increase readability and add comments
>  - fix style issues
> - split "iommu: Add DOMAIN_ATTR_MSI_MAPPING attribute"
> - platform ITS now advertises IOMMU_CAP_INTR_REMAP
> - fix compilation issue with CONFIG_IOMMU API unset
> - arm-smmu-v3 now advertises DOMAIN_ATTR_MSI_MAPPING
>
>RFC v3 -> v4:
>- Move doorbell mapping/unmapping in msi.c
>- fix ref count issue on set_affinity: in case of a change in the address
>  the previous address is decremented
>- doorbell map/unmap now is done on msi composition. Should allow the use
>  case for platform MSI controllers
>- create dma-reserved-iommu.h/c exposing/implementing a new API dedicated
>  to reserved IOVA management (looking like dma-iommu glue)
>- series reordering to ease the review:
>  - first part is related to IOMMU
>  - second related to MSI sub-system
>  - third related to VFIO (except arm-smmu IOMMU_CAP_INTR_REMAP removal)
>- expose the number of requested IOVA pages through VFIO_IOMMU_GET_INFO
>  [this partially addresses Marc's comments on iommu_get/put_single_reserved
>   size/alignment problematic - which I did not ignore - but I don't know
>   how much I can do at the moment]
>
>RFC v2 -> RFC v3:
>- should fix wrong handling of some CONFIG combinations:
>  CONFIG_IOVA, CONFIG_IOMMU_API, CONFIG_PCI_MSI_IRQ_DOMAIN
>- fix MSI_FLAG_IRQ_REMAPPING setting in GICv3 ITS (although not tested)
>
>PATCH v1 -> RFC v2:
>- reverted to RFC since it looks more reasonable ;-) the code is split
>  between VFIO, IOMMU, MSI controller and I am not sure I did the right
>  choices. Also API need to be further discussed.
>- iova API usage in arm-smmu.c.
>- MSI controller natively programs the MSI addr with either the PA or IOVA.
>  This is not done anymore in vfio-pci driver as suggested by Alex.
>- check irq remapping capability of the group
>
>RFC v1 [2] -> PATCH v1:
>- use the existing dma map/unmap ioctl interface with a flag to register a
>  reserved IOVA range. Use the legacy Rb to store this special vfio_dma.
>- a single reserved IOVA contiguous region now is allowed
>- use of an RB tree indexed by PA to store allocated reserved slots
>- use 

Re: [PATCH v9 0/8] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 2/3: msi changes

2016-05-05 Thread Chalamarla, Tirumalesh
Hi Eric,

Does this series supports gicv3-its emulation?
Do we have a tree with all the dependent patches





Thanks,
Tirumalesh. 
On 5/4/16, 8:18 AM, "linux-arm-kernel on behalf of Eric Auger" 
 wrote:

>This series implements the MSI address mapping/unmapping in the MSI layer.
>IOMMU binding happens on pci_enable_msi since this function can sleep and
>return errors. On msi_domain_set_affinity, msi_domain_(de)activate, which
>are not allowed to sleep, we simply look for the already existing binding.
>
>A new MSI domain info flag value is introduced to report whether the msi
>domain implements IRQ remapping. GIC v3 ITS is the first MSI controller
>advertising it. This flag value will be used by VFIO subsystem to
>determine whether MSI forwarding is safe.
>
>More details & context can be found at:
>http://www.linaro.org/blog/core-dump/kvm-pciemsi-passthrough-armarm64/
>
>Best Regards
>
>Eric
>
>Applies on top of PART 1/3. Also depends on
>[PATCH 1/3] iommu: Add MMIO mapping type,
>http://comments.gmane.org/gmane.linux.kernel.iommu/12869
>
>Git: complete series available at
>https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.6-rc6-pcie-passthrough-v9-msi-v9
>
>This branch contains all parts in v9.
>
>History:
>
>v8 -> v9:
>- use a union in irq_chip_msi_doorbell_info + boolean telling whether the
>  doorbell is percpu
>- decouple irq_data parsing from the actual mapping/unmapping in
>  msi_handle_doorbell_mappings
>- fix misc style issues
>
>v7 -> v8:
>take into account Marc's comments:
>- use iommu_msi_msg_pa_to_va with new proto
>- change in irq_chip_msi_doorbell_info struct definition:
>  prot and size became shared between all doorbells and phys_addr_t __percpu
>- cleanups in v2m irqchip
>- eventually did not touch MSI_FLAG_IRQ_REMAPPING naming
>- On msi_handle_doorbell_mappings, stop on the first irqchip where doorbells
>  can be found
>- fix resource deallocation on mapping failure in msi_domain_alloc_irqs
>
>v6 -> v7:
>- do alloc/map handling on pci_enable_msi and search on msi_(de)domain_activate
>- add msi_doorbell_info callback in irq-chip to retrieve the characteristics
>  of doorbells
>
>RFC v5 -> patch v6:
>- split to ease the review process
>- rebase on default iommu domain code (irq_data_to_msi_mapping_domain
>  checks IOMMU_DOMAIN_DMA type)
>- fix unmap sequence on msi_domain_set_affinity (reported by Marc):
>  unmap the previous doorbell when the new one has been mapped & written to
>  the device, ie. irq_chip_write_msi_msg.
>- "msi: msi_compose wrapper removed" following change above
>- add size parameter to iommu_get_reserved_iova API following Marc's request
>
>RFC v4 -> RFC v5:
>- take into account Thomas' comments on MSI related patches
>  - split "msi: IOMMU map the doorbell address when needed"
>  - increase readability and add comments
>  - fix style issues
> - split "iommu: Add DOMAIN_ATTR_MSI_MAPPING attribute"
> - platform ITS now advertises IOMMU_CAP_INTR_REMAP
> - fix compilation issue with CONFIG_IOMMU API unset
> - arm-smmu-v3 now advertises DOMAIN_ATTR_MSI_MAPPING
>
>RFC v3 -> v4:
>- Move doorbell mapping/unmapping in msi.c
>- fix ref count issue on set_affinity: in case of a change in the address
>  the previous address is decremented
>- doorbell map/unmap now is done on msi composition. Should allow the use
>  case for platform MSI controllers
>- create dma-reserved-iommu.h/c exposing/implementing a new API dedicated
>  to reserved IOVA management (looking like dma-iommu glue)
>- series reordering to ease the review:
>  - first part is related to IOMMU
>  - second related to MSI sub-system
>  - third related to VFIO (except arm-smmu IOMMU_CAP_INTR_REMAP removal)
>- expose the number of requested IOVA pages through VFIO_IOMMU_GET_INFO
>  [this partially addresses Marc's comments on iommu_get/put_single_reserved
>   size/alignment problematic - which I did not ignore - but I don't know
>   how much I can do at the moment]
>
>RFC v2 -> RFC v3:
>- should fix wrong handling of some CONFIG combinations:
>  CONFIG_IOVA, CONFIG_IOMMU_API, CONFIG_PCI_MSI_IRQ_DOMAIN
>- fix MSI_FLAG_IRQ_REMAPPING setting in GICv3 ITS (although not tested)
>
>PATCH v1 -> RFC v2:
>- reverted to RFC since it looks more reasonable ;-) the code is split
>  between VFIO, IOMMU, MSI controller and I am not sure I did the right
>  choices. Also API need to be further discussed.
>- iova API usage in arm-smmu.c.
>- MSI controller natively programs the MSI addr with either the PA or IOVA.
>  This is not done anymore in vfio-pci driver as suggested by Alex.
>- check irq remapping capability of the group
>
>RFC v1 [2] -> PATCH v1:
>- use the existing dma map/unmap ioctl interface with a flag to register a
>  reserved IOVA range. Use the legacy Rb to store this special vfio_dma.
>- a single reserved IOVA contiguous region now is allowed
>- use of an RB tree indexed by PA to store allocated reserved slots
>- use of a vfio_domain iova_domain to manage iova allocation within the
>  window 

Re: [PATCH] Revert "arm64: Increase the max granular size"

2016-03-21 Thread Chalamarla, Tirumalesh





On 3/21/16, 10:33 AM, "Catalin Marinas"  wrote:

>On Mon, Mar 21, 2016 at 05:23:01PM +, Will Deacon wrote:
>> On Mon, Mar 21, 2016 at 05:14:03PM +, Catalin Marinas wrote:
>> > diff --git a/arch/arm64/include/asm/cache.h 
>> > b/arch/arm64/include/asm/cache.h
>> > index 5082b30bc2c0..4b5d7b27edaf 100644
>> > --- a/arch/arm64/include/asm/cache.h
>> > +++ b/arch/arm64/include/asm/cache.h
>> > @@ -18,17 +18,17 @@
>> >  
>> >  #include 
>> >  
>> > -#define L1_CACHE_SHIFT7
>> > +#define L1_CACHE_SHIFT6
>> >  #define L1_CACHE_BYTES(1 << L1_CACHE_SHIFT)
>> >  
>> >  /*
>> >   * Memory returned by kmalloc() may be used for DMA, so we must make
>> > - * sure that all such allocations are cache aligned. Otherwise,
>> > - * unrelated code may cause parts of the buffer to be read into the
>> > - * cache before the transfer is done, causing old data to be seen by
>> > - * the CPU.
>> > + * sure that all such allocations are aligned to the maximum *known*
>> > + * cache line size on ARMv8 systems. Otherwise, unrelated code may cause
>> > + * parts of the buffer to be read into the cache before the transfer is
>> > + * done, causing old data to be seen by the CPU.
>> >   */
>> > -#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
>> > +#define ARCH_DMA_MINALIGN (128)
>> 
>> Does this actually fix the reported iperf regression? My assumption was
>> that ARCH_DMA_MINALIGN is the problem, but I could be wrong.
>
>I can't tell. But since I haven't seen any better explanation in this
>thread yet, I hope that at least someone would try this patch and come
>back with numbers.
>
>For networking, SKB_DATA_ALIGN() uses SMP_CACHE_BYTES (== L1_CACHE_BYTES).
>I think (hope) this alignment is not meant for non-coherent DMA,
>otherwise using SMP_CACHE_BYTES wouldn't make sense.

As I see the problem, may be it is because of fewer number of buffers available 
because of this alignment requirement.
But that should be fixed by making slab alignment a run time thing. I may be 
totally wrong. 

We are still coming up with a decent benchmark that shows degradation.  
>
>-- 
>Catalin


Re: [PATCH] Revert "arm64: Increase the max granular size"

2016-03-21 Thread Chalamarla, Tirumalesh





On 3/21/16, 10:33 AM, "Catalin Marinas"  wrote:

>On Mon, Mar 21, 2016 at 05:23:01PM +, Will Deacon wrote:
>> On Mon, Mar 21, 2016 at 05:14:03PM +, Catalin Marinas wrote:
>> > diff --git a/arch/arm64/include/asm/cache.h 
>> > b/arch/arm64/include/asm/cache.h
>> > index 5082b30bc2c0..4b5d7b27edaf 100644
>> > --- a/arch/arm64/include/asm/cache.h
>> > +++ b/arch/arm64/include/asm/cache.h
>> > @@ -18,17 +18,17 @@
>> >  
>> >  #include 
>> >  
>> > -#define L1_CACHE_SHIFT7
>> > +#define L1_CACHE_SHIFT6
>> >  #define L1_CACHE_BYTES(1 << L1_CACHE_SHIFT)
>> >  
>> >  /*
>> >   * Memory returned by kmalloc() may be used for DMA, so we must make
>> > - * sure that all such allocations are cache aligned. Otherwise,
>> > - * unrelated code may cause parts of the buffer to be read into the
>> > - * cache before the transfer is done, causing old data to be seen by
>> > - * the CPU.
>> > + * sure that all such allocations are aligned to the maximum *known*
>> > + * cache line size on ARMv8 systems. Otherwise, unrelated code may cause
>> > + * parts of the buffer to be read into the cache before the transfer is
>> > + * done, causing old data to be seen by the CPU.
>> >   */
>> > -#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
>> > +#define ARCH_DMA_MINALIGN (128)
>> 
>> Does this actually fix the reported iperf regression? My assumption was
>> that ARCH_DMA_MINALIGN is the problem, but I could be wrong.
>
>I can't tell. But since I haven't seen any better explanation in this
>thread yet, I hope that at least someone would try this patch and come
>back with numbers.
>
>For networking, SKB_DATA_ALIGN() uses SMP_CACHE_BYTES (== L1_CACHE_BYTES).
>I think (hope) this alignment is not meant for non-coherent DMA,
>otherwise using SMP_CACHE_BYTES wouldn't make sense.

As I see the problem, may be it is because of fewer number of buffers available 
because of this alignment requirement.
But that should be fixed by making slab alignment a run time thing. I may be 
totally wrong. 

We are still coming up with a decent benchmark that shows degradation.  
>
>-- 
>Catalin


Re: [PATCH] Revert "arm64: Increase the max granular size"

2016-03-19 Thread Chalamarla, Tirumalesh





On 3/16/16, 2:32 AM, "linux-arm-kernel on behalf of Ganesh Mahendran" 
 wrote:

>Reverts commit 97303480753e ("arm64: Increase the max granular size").
>
>The commit 97303480753e ("arm64: Increase the max granular size") will
>degrade system performente in some cpus.
>
>We test wifi network throughput with iperf on Qualcomm msm8996 CPU:
>
>run on host:
>  # iperf -s
>run on device:
>  # iperf -c  -t 100 -i 1
>
>
>Test result:
>
>with commit 97303480753e ("arm64: Increase the max granular size"):
>172MBits/sec
>
>without commit 97303480753e ("arm64: Increase the max granular size"):
>230MBits/sec
>
>
>Some module like slab/net will use the L1_CACHE_SHIFT, so if we do not
>set the parameter correctly, it may affect the system performance.
>
>So revert the commit.

Is there any explanation why is this so? May be there is an alternative to 
this, apart from reverting the commit.

Until now it seems L1_CACHE_SHIFT is the max of supported chips. But now we are 
making it 64byte, is there any reason why not 32. 

Thanks,
Tirumalesh. 
>
>Cc: sta...@vger.kernel.org
>Signed-off-by: Ganesh Mahendran 
>---
> arch/arm64/include/asm/cache.h |2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
>index 5082b30..bde4499 100644
>--- a/arch/arm64/include/asm/cache.h
>+++ b/arch/arm64/include/asm/cache.h
>@@ -18,7 +18,7 @@
> 
> #include 
> 
>-#define L1_CACHE_SHIFT7
>+#define L1_CACHE_SHIFT6
> #define L1_CACHE_BYTES(1 << L1_CACHE_SHIFT)
> 
> /*
>-- 
>1.7.9.5
>
>
>___
>linux-arm-kernel mailing list
>linux-arm-ker...@lists.infradead.org
>http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


Re: [PATCH] Revert "arm64: Increase the max granular size"

2016-03-19 Thread Chalamarla, Tirumalesh





On 3/16/16, 2:32 AM, "linux-arm-kernel on behalf of Ganesh Mahendran" 
 wrote:

>Reverts commit 97303480753e ("arm64: Increase the max granular size").
>
>The commit 97303480753e ("arm64: Increase the max granular size") will
>degrade system performente in some cpus.
>
>We test wifi network throughput with iperf on Qualcomm msm8996 CPU:
>
>run on host:
>  # iperf -s
>run on device:
>  # iperf -c  -t 100 -i 1
>
>
>Test result:
>
>with commit 97303480753e ("arm64: Increase the max granular size"):
>172MBits/sec
>
>without commit 97303480753e ("arm64: Increase the max granular size"):
>230MBits/sec
>
>
>Some module like slab/net will use the L1_CACHE_SHIFT, so if we do not
>set the parameter correctly, it may affect the system performance.
>
>So revert the commit.

Is there any explanation why is this so? May be there is an alternative to 
this, apart from reverting the commit.

Until now it seems L1_CACHE_SHIFT is the max of supported chips. But now we are 
making it 64byte, is there any reason why not 32. 

Thanks,
Tirumalesh. 
>
>Cc: sta...@vger.kernel.org
>Signed-off-by: Ganesh Mahendran 
>---
> arch/arm64/include/asm/cache.h |2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
>index 5082b30..bde4499 100644
>--- a/arch/arm64/include/asm/cache.h
>+++ b/arch/arm64/include/asm/cache.h
>@@ -18,7 +18,7 @@
> 
> #include 
> 
>-#define L1_CACHE_SHIFT7
>+#define L1_CACHE_SHIFT6
> #define L1_CACHE_BYTES(1 << L1_CACHE_SHIFT)
> 
> /*
>-- 
>1.7.9.5
>
>
>___
>linux-arm-kernel mailing list
>linux-arm-ker...@lists.infradead.org
>http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


Re: [PATCH 2/3] Docs: dt: Add PCI MSI map bindings

2015-07-24 Thread Chalamarla, Tirumalesh
looks good. possible to describe the chip we have.
> On Jul 23, 2015, at 9:52 AM, Mark Rutland  wrote:
> 
> Currently msi-parent is used by a few bindings to describe the
> relationship between a PCI root complex and a single MSI controller, but
> this property does not have a generic binding document.
> 
> Additionally, msi-parent is insufficient to describe more complex
> relationships between MSI controllers and devices under a root complex,
> where devices may be able to target multiple MSI controllers, or where
> MSI controllers use (non-probeable) sideband information to distinguish
> devices.
> 
> This patch adds a generic binding for mapping PCI devices to MSI
> controllers. This document covers msi-parent, and a new msi-map property
> (specific to PCI*) which may be used to map devices (identified by their
> Requester ID) to sideband data for each MSI controller that they may
> target.
> 
> Signed-off-by: Mark Rutland 
> ---
> Documentation/devicetree/bindings/pci/pci-msi.txt | 220 ++
> 1 file changed, 220 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/pci/pci-msi.txt
> 
> diff --git a/Documentation/devicetree/bindings/pci/pci-msi.txt 
> b/Documentation/devicetree/bindings/pci/pci-msi.txt
> new file mode 100644
> index 000..9b3cc81
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/pci/pci-msi.txt
> @@ -0,0 +1,220 @@
> +This document describes the generic device tree binding for describing the
> +relationship between PCI devices and MSI controllers.
> +
> +Each PCI device under a root complex is uniquely identified by its Requester 
> ID
> +(AKA RID). A Requester ID is a triplet of a Bus number, Device number, and
> +Function number.
> +
> +For the purpose of this document, when treated as a numeric value, a RID is
> +formatted such that:
> +
> +* Bits [15:8] are the Bus number.
> +* Bits [7:3] are the Device number.
> +* Bits [2:0] are the Function number.
> +* Any other bits required for padding must be zero.
> +
> +MSIs may be distinguished in part through the use of sideband data 
> accompanying
> +writes. In the case of PCI devices, this sideband data may be derived from 
> the
> +Requester ID. A mechanism is required to associate a device with both the MSI
> +controllers it can address, and the sideband data that will be associated 
> with
> +its writes to those controllers.
> +
> +For generic MSI bindings, see
> +Documentation/devicetree/bindings/interrupt-controller/msi.txt.
> +
> +
> +PCI root complex
> +
> +
> +Optional properties
> +---
> +
> +- msi-map: Maps a Requester ID to an MSI controller and associated
> +  msi-specifier data. The property is an arbitrary number of tuples of
> +  (rid-base,msi-controller,msi-base,length), where:
> +
> +  * rid-base is a single cell describing the first RID matched by the entry.
> +
> +  * msi-controller is a single phandle to an MSI controller
> +
> +  * msi-base is an msi-specifier describing the msi-specifier produced for 
> the
> +first RID matched by the entry.
> +
> +  * length is a single cell describing how many consecutive RIDs are matched
> +following the rid-base.
> +
> +  Any RID r in the interval [rid-base, rid-base + length) is associated with
> +  the listed msi-controller, with the msi-specifier (r - rid-base + 
> msi-base).
> +
> +- msi-map-mask: A mask to be applied to each Requester ID prior to being 
> mapped
> +  to an msi-specifier per the msi-map property.
> +
> +- msi-parent: Describes the MSI parent of the root complex itself. Where
> +  the root complex and MSI controller do not pass sideband data with MSI
> +  writes, this property may be used to describe the MSI controller(s)
> +  used by PCI devices under the root complex, if defined as such in the
> +  binding for the root complex.
> +
> +
> +Example (1)
> +===
> +
> +/ {
> + #address-cells = <1>;
> + #size-cells = <1>;
> +
> + msi: msi-controller@a {
> + reg = <0xa 0x1>;
> + compatible = "vendor,some-controller";
> + msi-controller;
> + #msi-cells = <1>;
> + };
> +
> + pci: pci@f {
> + reg = <0xf 0x1>;
> + compatible = "vendor,pcie-root-complex";
> + device_type = "pci";
> +
> + /*
> +  * The sideband data provided to the MSI controller is
> +  * the RID, identity-mapped.
> +  */
> + msi-map = <0x0 _a 0x0 0x1>,
> + };
> +};
> +
> +
> +Example (2)
> +===
> +
> +/ {
> + #address-cells = <1>;
> + #size-cells = <1>;
> +
> + msi: msi-controller@a {
> + reg = <0xa 0x1>;
> + compatible = "vendor,some-controller";
> + msi-controller;
> + #msi-cells = <1>;
> + };
> +
> + pci: pci@f {
> + reg = <0xf 0x1>;
> + compatible = "vendor,pcie-root-complex";
> + device_type = "pci";
> +
> + /*
> 

Re: [PATCH 2/3] Docs: dt: Add PCI MSI map bindings

2015-07-24 Thread Chalamarla, Tirumalesh
looks good. possible to describe the chip we have.
 On Jul 23, 2015, at 9:52 AM, Mark Rutland mark.rutl...@arm.com wrote:
 
 Currently msi-parent is used by a few bindings to describe the
 relationship between a PCI root complex and a single MSI controller, but
 this property does not have a generic binding document.
 
 Additionally, msi-parent is insufficient to describe more complex
 relationships between MSI controllers and devices under a root complex,
 where devices may be able to target multiple MSI controllers, or where
 MSI controllers use (non-probeable) sideband information to distinguish
 devices.
 
 This patch adds a generic binding for mapping PCI devices to MSI
 controllers. This document covers msi-parent, and a new msi-map property
 (specific to PCI*) which may be used to map devices (identified by their
 Requester ID) to sideband data for each MSI controller that they may
 target.
 
 Signed-off-by: Mark Rutland mark.rutl...@arm.com
 ---
 Documentation/devicetree/bindings/pci/pci-msi.txt | 220 ++
 1 file changed, 220 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/pci/pci-msi.txt
 
 diff --git a/Documentation/devicetree/bindings/pci/pci-msi.txt 
 b/Documentation/devicetree/bindings/pci/pci-msi.txt
 new file mode 100644
 index 000..9b3cc81
 --- /dev/null
 +++ b/Documentation/devicetree/bindings/pci/pci-msi.txt
 @@ -0,0 +1,220 @@
 +This document describes the generic device tree binding for describing the
 +relationship between PCI devices and MSI controllers.
 +
 +Each PCI device under a root complex is uniquely identified by its Requester 
 ID
 +(AKA RID). A Requester ID is a triplet of a Bus number, Device number, and
 +Function number.
 +
 +For the purpose of this document, when treated as a numeric value, a RID is
 +formatted such that:
 +
 +* Bits [15:8] are the Bus number.
 +* Bits [7:3] are the Device number.
 +* Bits [2:0] are the Function number.
 +* Any other bits required for padding must be zero.
 +
 +MSIs may be distinguished in part through the use of sideband data 
 accompanying
 +writes. In the case of PCI devices, this sideband data may be derived from 
 the
 +Requester ID. A mechanism is required to associate a device with both the MSI
 +controllers it can address, and the sideband data that will be associated 
 with
 +its writes to those controllers.
 +
 +For generic MSI bindings, see
 +Documentation/devicetree/bindings/interrupt-controller/msi.txt.
 +
 +
 +PCI root complex
 +
 +
 +Optional properties
 +---
 +
 +- msi-map: Maps a Requester ID to an MSI controller and associated
 +  msi-specifier data. The property is an arbitrary number of tuples of
 +  (rid-base,msi-controller,msi-base,length), where:
 +
 +  * rid-base is a single cell describing the first RID matched by the entry.
 +
 +  * msi-controller is a single phandle to an MSI controller
 +
 +  * msi-base is an msi-specifier describing the msi-specifier produced for 
 the
 +first RID matched by the entry.
 +
 +  * length is a single cell describing how many consecutive RIDs are matched
 +following the rid-base.
 +
 +  Any RID r in the interval [rid-base, rid-base + length) is associated with
 +  the listed msi-controller, with the msi-specifier (r - rid-base + 
 msi-base).
 +
 +- msi-map-mask: A mask to be applied to each Requester ID prior to being 
 mapped
 +  to an msi-specifier per the msi-map property.
 +
 +- msi-parent: Describes the MSI parent of the root complex itself. Where
 +  the root complex and MSI controller do not pass sideband data with MSI
 +  writes, this property may be used to describe the MSI controller(s)
 +  used by PCI devices under the root complex, if defined as such in the
 +  binding for the root complex.
 +
 +
 +Example (1)
 +===
 +
 +/ {
 + #address-cells = 1;
 + #size-cells = 1;
 +
 + msi: msi-controller@a {
 + reg = 0xa 0x1;
 + compatible = vendor,some-controller;
 + msi-controller;
 + #msi-cells = 1;
 + };
 +
 + pci: pci@f {
 + reg = 0xf 0x1;
 + compatible = vendor,pcie-root-complex;
 + device_type = pci;
 +
 + /*
 +  * The sideband data provided to the MSI controller is
 +  * the RID, identity-mapped.
 +  */
 + msi-map = 0x0 msi_a 0x0 0x1,
 + };
 +};
 +
 +
 +Example (2)
 +===
 +
 +/ {
 + #address-cells = 1;
 + #size-cells = 1;
 +
 + msi: msi-controller@a {
 + reg = 0xa 0x1;
 + compatible = vendor,some-controller;
 + msi-controller;
 + #msi-cells = 1;
 + };
 +
 + pci: pci@f {
 + reg = 0xf 0x1;
 + compatible = vendor,pcie-root-complex;
 + device_type = pci;
 +
 + /*
 +  * The sideband data provided to the MSI controller is
 +  * the RID, masked to only the device and function 

Re: [PATCH] arm64: KVM: Enable minimalistic support for Thunder

2015-07-16 Thread Chalamarla, Tirumalesh
the discussion on the generic cpu seems to be stuck, is there a possibility to 
take this patch. so that Thunder has a chance to run some KVM.

Thanks,
Tirumalesh.  
> On Jun 29, 2015, at 10:11 AM, Marc Zyngier  wrote:
> 
> On 29/06/15 18:06, Chalamarla, Tirumalesh wrote:
>> 
>>> On Jun 29, 2015, at 1:53 AM, Marc Zyngier  wrote:
>>> 
>>> On 26/06/15 20:51, Tirumalesh Chalamarla wrote:
>>>> In order to allow KVM to run on Thunder implementations, add the
>>>> minimal support required.
>>>> 
>>>> Signed-off-by: Tirumalesh Chalamarla 
>>> 
>>> CCing the KVM/ARM maintainers should be the first course of action.
>>> 
>> thanks. 
>> 
>>> Also, you may want to try Suzuki's patch instead:
>>> 
>>> http://www.spinics.net/lists/kvm/msg117703.html
>>> 
>> will try with this.
>> 
>>> Constantly adding new CPUs without providing any insight as to how they
>>> should be emulated only brings churn, and not much benefit.
>>> 
>> we are not planning(at this point) to emulate Thunder with QEMU/others.   
> 
> Fair enough. If cross-cpu VM migration is none of your concern, then
> Suzuki's patch should be enough.
> 
> Thanks,
> 
>   M.
> -- 
> Jazz is not dead. It just smells funny...

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] arm64: KVM: Enable minimalistic support for Thunder

2015-07-16 Thread Chalamarla, Tirumalesh
the discussion on the generic cpu seems to be stuck, is there a possibility to 
take this patch. so that Thunder has a chance to run some KVM.

Thanks,
Tirumalesh.  
 On Jun 29, 2015, at 10:11 AM, Marc Zyngier marc.zyng...@arm.com wrote:
 
 On 29/06/15 18:06, Chalamarla, Tirumalesh wrote:
 
 On Jun 29, 2015, at 1:53 AM, Marc Zyngier marc.zyng...@arm.com wrote:
 
 On 26/06/15 20:51, Tirumalesh Chalamarla wrote:
 In order to allow KVM to run on Thunder implementations, add the
 minimal support required.
 
 Signed-off-by: Tirumalesh Chalamarla tchalama...@caviumnetworks.com
 
 CCing the KVM/ARM maintainers should be the first course of action.
 
 thanks. 
 
 Also, you may want to try Suzuki's patch instead:
 
 http://www.spinics.net/lists/kvm/msg117703.html
 
 will try with this.
 
 Constantly adding new CPUs without providing any insight as to how they
 should be emulated only brings churn, and not much benefit.
 
 we are not planning(at this point) to emulate Thunder with QEMU/others.   
 
 Fair enough. If cross-cpu VM migration is none of your concern, then
 Suzuki's patch should be enough.
 
 Thanks,
 
   M.
 -- 
 Jazz is not dead. It just smells funny...

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] GICv3: Add ITS entry to THUNDER dts

2015-07-02 Thread Chalamarla, Tirumalesh
Hi Catalin,

is it possible to pull this for 4.2?

Thanks,
Tirumalesh.  
> On Jun 26, 2015, at 12:12 PM, Tirumalesh Chalamarla 
>  wrote:
> 
> From: Tirumalesh Chalamarla 
> 
> The PCIe host controller uses MSIs provided by GICv3 ITS. Enable it on
> Thunder SoCs by adding an entry to DT.
> 
> Signed-off-by: Tirumalesh Chalamarla 
> Acked-by: Marc Zyngier 
> ---
> arch/arm64/boot/dts/cavium/thunder-88xx.dtsi | 9 +
> 1 file changed, 9 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/cavium/thunder-88xx.dtsi 
> b/arch/arm64/boot/dts/cavium/thunder-88xx.dtsi
> index d8c0bdc..9cb7cf9 100644
> --- a/arch/arm64/boot/dts/cavium/thunder-88xx.dtsi
> +++ b/arch/arm64/boot/dts/cavium/thunder-88xx.dtsi
> @@ -376,10 +376,19 @@
>   gic0: interrupt-controller@8010, {
>   compatible = "arm,gic-v3";
>   #interrupt-cells = <3>;
> + #address-cells = <2>;
> + #size-cells = <2>;
> + ranges;
>   interrupt-controller;
>   reg = <0x8010 0x 0x0 0x01>, /* GICD */
> <0x8010 0x8000 0x0 0x60>; /* GICR */
>   interrupts = <1 9 0xf04>;
> +
> + its: gic-its@8010,0002 {
> + compatible = "arm,gic-v3-its";
> + msi-controller;
> + reg = <0x8010 0x2 0x0 0x20>;
> + };
>   };
> 
>   uaa0: serial@87e0,2400 {
> -- 
> 2.1.0
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] GICv3: Add ITS entry to THUNDER dts

2015-07-02 Thread Chalamarla, Tirumalesh
Hi Catalin,

is it possible to pull this for 4.2?

Thanks,
Tirumalesh.  
 On Jun 26, 2015, at 12:12 PM, Tirumalesh Chalamarla 
 tchalama...@caviumnetworks.com wrote:
 
 From: Tirumalesh Chalamarla tchalama...@cavium.com
 
 The PCIe host controller uses MSIs provided by GICv3 ITS. Enable it on
 Thunder SoCs by adding an entry to DT.
 
 Signed-off-by: Tirumalesh Chalamarla tchalama...@cavium.com
 Acked-by: Marc Zyngier marc.zyng...@arm.com
 ---
 arch/arm64/boot/dts/cavium/thunder-88xx.dtsi | 9 +
 1 file changed, 9 insertions(+)
 
 diff --git a/arch/arm64/boot/dts/cavium/thunder-88xx.dtsi 
 b/arch/arm64/boot/dts/cavium/thunder-88xx.dtsi
 index d8c0bdc..9cb7cf9 100644
 --- a/arch/arm64/boot/dts/cavium/thunder-88xx.dtsi
 +++ b/arch/arm64/boot/dts/cavium/thunder-88xx.dtsi
 @@ -376,10 +376,19 @@
   gic0: interrupt-controller@8010, {
   compatible = arm,gic-v3;
   #interrupt-cells = 3;
 + #address-cells = 2;
 + #size-cells = 2;
 + ranges;
   interrupt-controller;
   reg = 0x8010 0x 0x0 0x01, /* GICD */
 0x8010 0x8000 0x0 0x60; /* GICR */
   interrupts = 1 9 0xf04;
 +
 + its: gic-its@8010,0002 {
 + compatible = arm,gic-v3-its;
 + msi-controller;
 + reg = 0x8010 0x2 0x0 0x20;
 + };
   };
 
   uaa0: serial@87e0,2400 {
 -- 
 2.1.0
 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] arm64: KVM: Enable minimalistic support for Thunder

2015-06-29 Thread Chalamarla, Tirumalesh

> On Jun 29, 2015, at 1:53 AM, Marc Zyngier  wrote:
> 
> On 26/06/15 20:51, Tirumalesh Chalamarla wrote:
>> In order to allow KVM to run on Thunder implementations, add the
>> minimal support required.
>> 
>> Signed-off-by: Tirumalesh Chalamarla 
> 
> CCing the KVM/ARM maintainers should be the first course of action.
> 
thanks. 

> Also, you may want to try Suzuki's patch instead:
> 
> http://www.spinics.net/lists/kvm/msg117703.html
> 
will try with this.

> Constantly adding new CPUs without providing any insight as to how they
> should be emulated only brings churn, and not much benefit.
> 
we are not planning(at this point) to emulate Thunder with QEMU/others.   

> Thanks,
> 
>   M.
> -- 
> Jazz is not dead. It just smells funny...

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] arm64: KVM: Enable minimalistic support for Thunder

2015-06-29 Thread Chalamarla, Tirumalesh

 On Jun 29, 2015, at 1:53 AM, Marc Zyngier marc.zyng...@arm.com wrote:
 
 On 26/06/15 20:51, Tirumalesh Chalamarla wrote:
 In order to allow KVM to run on Thunder implementations, add the
 minimal support required.
 
 Signed-off-by: Tirumalesh Chalamarla tchalama...@caviumnetworks.com
 
 CCing the KVM/ARM maintainers should be the first course of action.
 
thanks. 

 Also, you may want to try Suzuki's patch instead:
 
 http://www.spinics.net/lists/kvm/msg117703.html
 
will try with this.

 Constantly adding new CPUs without providing any insight as to how they
 should be emulated only brings churn, and not much benefit.
 
we are not planning(at this point) to emulate Thunder with QEMU/others.   

 Thanks,
 
   M.
 -- 
 Jazz is not dead. It just smells funny...

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/6] GICv3: Add ITS entry to THUNDER dts

2015-06-25 Thread Chalamarla, Tirumalesh
Hi Marc,

Any feed back on this. 
Do you want me to submit this as a separate patch?so that it is easy for 
getting acceptance. 

Thanks,
Tirumalesh. 

> On Sep 24, 2014, at 8:37 AM, Robert Richter  wrote:
> 
> From: Tirumalesh Chalamarla 
> 
> The PCIe host controller uses MSIs provided by GICv3 ITS. Enable it on
> Thunder SoCs by adding an entry to DT.
> 
> Signed-off-by: Tirumalesh Chalamarla 
> Signed-off-by: Robert Richter 
> ---
> arch/arm64/boot/dts/thunder-88xx.dtsi | 9 +
> 1 file changed, 9 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/thunder-88xx.dtsi 
> b/arch/arm64/boot/dts/thunder-88xx.dtsi
> index d8c0bdc51882..9cb7cf94284a 100644
> --- a/arch/arm64/boot/dts/thunder-88xx.dtsi
> +++ b/arch/arm64/boot/dts/thunder-88xx.dtsi
> @@ -376,10 +376,19 @@
>   gic0: interrupt-controller@8010, {
>   compatible = "arm,gic-v3";
>   #interrupt-cells = <3>;
> + #address-cells = <2>;
> + #size-cells = <2>;
> + ranges;
>   interrupt-controller;
>   reg = <0x8010 0x 0x0 0x01>, /* GICD */
> <0x8010 0x8000 0x0 0x60>; /* GICR */
>   interrupts = <1 9 0xf04>;
> +
> + its: gic-its@8010,0002 {
> + compatible = "arm,gic-v3-its";
> + msi-controller;
> + reg = <0x8010 0x2 0x0 0x20>;
> + };
>   };
> 
>   uaa0: serial@87e0,2400 {
> -- 
> 2.1.0
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/6] GICv3: Add ITS entry to THUNDER dts

2015-06-25 Thread Chalamarla, Tirumalesh
Hi Marc,

Any feed back on this. 
Do you want me to submit this as a separate patch?so that it is easy for 
getting acceptance. 

Thanks,
Tirumalesh. 

 On Sep 24, 2014, at 8:37 AM, Robert Richter r...@kernel.org wrote:
 
 From: Tirumalesh Chalamarla tchalama...@cavium.com
 
 The PCIe host controller uses MSIs provided by GICv3 ITS. Enable it on
 Thunder SoCs by adding an entry to DT.
 
 Signed-off-by: Tirumalesh Chalamarla tchalama...@cavium.com
 Signed-off-by: Robert Richter rrich...@cavium.com
 ---
 arch/arm64/boot/dts/thunder-88xx.dtsi | 9 +
 1 file changed, 9 insertions(+)
 
 diff --git a/arch/arm64/boot/dts/thunder-88xx.dtsi 
 b/arch/arm64/boot/dts/thunder-88xx.dtsi
 index d8c0bdc51882..9cb7cf94284a 100644
 --- a/arch/arm64/boot/dts/thunder-88xx.dtsi
 +++ b/arch/arm64/boot/dts/thunder-88xx.dtsi
 @@ -376,10 +376,19 @@
   gic0: interrupt-controller@8010, {
   compatible = arm,gic-v3;
   #interrupt-cells = 3;
 + #address-cells = 2;
 + #size-cells = 2;
 + ranges;
   interrupt-controller;
   reg = 0x8010 0x 0x0 0x01, /* GICD */
 0x8010 0x8000 0x0 0x60; /* GICR */
   interrupts = 1 9 0xf04;
 +
 + its: gic-its@8010,0002 {
 + compatible = arm,gic-v3-its;
 + msi-controller;
 + reg = 0x8010 0x2 0x0 0x20;
 + };
   };
 
   uaa0: serial@87e0,2400 {
 -- 
 2.1.0
 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] arm64: gicv3: its: Encode domain number in PCI stream id

2015-05-22 Thread Chalamarla, Tirumalesh

> On May 22, 2015, at 1:26 AM, Marc Zyngier  wrote:
> 
> On 20/05/15 13:48, Robert Richter wrote:
>> Mark,
>> 
>> thanks for review, also of the other patches of this series.
>> 
>> See below
>> 
>> On 20.05.15 13:11:38, Marc Zyngier wrote:
 -  dev_alias->dev_id = alias;
 +  dev_alias->dev_id = (pci_domain_nr(pdev->bus) << 16) | alias;
>> 
>>> This feels very scary. We're now assuming that the domain number will
>>> always be presented to the doorbell. What guarantee do we have that
>>> this is always the case, irrespective of the platform?
>>> 
>>> Also, domains have no PCI reality, they are a Linux thing. And they can
>>> be "randomly" assigned, unless you force the domain in DT with a
>>> linux,pci-domain property. This looks even more wrong, specially
>>> considering ACPI.
>> 
>> The main problem here is that device ids (32 bits) are system
>> specific. Since we have more than one PCI root complex we need the
>> upper 16 bits in the devid for mapping. Using pci_domain_nr for this
>> fits our needs for now and shouldn't affect systems with a single RC
>> only as the domain nr is zero then.
>> 
>> The domain number is incremented during initialization beginnig with
>> zero and the order of it is fixed since it is taken from DT or ACPI
>> tables. So we have full controll of it. I don't see issues here.
> 
> This may match what you have on ThunderX (as long as the kernel doesn't
> adopt another behaviour when allocating the domain number). But other
> platforms may have a completely different numbering, which will mess
> them up entirely.
> 
>>> It really feels like we need a way to describe how the BDF numbering is
>>> augmented. We also need to guarantee that we get the actual bridge
>>> number, as opposed to the domain number.
>> 
>> But true, the obove is just intermediate. In the end we need some sort
>> of handler that is setup during cpu initialization that registers a
>> callback for the gic to determine the device id of that paricular
>> system.
> 
> I don't really like the idea of a callback from the GIC - I'd prefer it
> to be standalone, and rely on the topology information to build the
> DeviceID. Mark Rutland had some ideas for DT (he posted an RFC a while
> ago), maybe it would be good to get back to that and find out what we
> can do. ACPI should also have similar information (IORT?).
> 

How can some one pass this from DT, especially in GIC entry. i still think it 
is bus owner responsibility and call back is better idea. 
but if some one has a better idea for DT and ACPI, we are fine as long as it 
works on ThunderX.   

Thanks,
Tirumalesh. 


> Thanks,
> 
>   M.
> -- 
> Jazz is not dead. It just smells funny...

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] arm64: gicv3: its: Encode domain number in PCI stream id

2015-05-22 Thread Chalamarla, Tirumalesh

 On May 22, 2015, at 1:26 AM, Marc Zyngier marc.zyng...@arm.com wrote:
 
 On 20/05/15 13:48, Robert Richter wrote:
 Mark,
 
 thanks for review, also of the other patches of this series.
 
 See below
 
 On 20.05.15 13:11:38, Marc Zyngier wrote:
 -  dev_alias-dev_id = alias;
 +  dev_alias-dev_id = (pci_domain_nr(pdev-bus)  16) | alias;
 
 This feels very scary. We're now assuming that the domain number will
 always be presented to the doorbell. What guarantee do we have that
 this is always the case, irrespective of the platform?
 
 Also, domains have no PCI reality, they are a Linux thing. And they can
 be randomly assigned, unless you force the domain in DT with a
 linux,pci-domain property. This looks even more wrong, specially
 considering ACPI.
 
 The main problem here is that device ids (32 bits) are system
 specific. Since we have more than one PCI root complex we need the
 upper 16 bits in the devid for mapping. Using pci_domain_nr for this
 fits our needs for now and shouldn't affect systems with a single RC
 only as the domain nr is zero then.
 
 The domain number is incremented during initialization beginnig with
 zero and the order of it is fixed since it is taken from DT or ACPI
 tables. So we have full controll of it. I don't see issues here.
 
 This may match what you have on ThunderX (as long as the kernel doesn't
 adopt another behaviour when allocating the domain number). But other
 platforms may have a completely different numbering, which will mess
 them up entirely.
 
 It really feels like we need a way to describe how the BDF numbering is
 augmented. We also need to guarantee that we get the actual bridge
 number, as opposed to the domain number.
 
 But true, the obove is just intermediate. In the end we need some sort
 of handler that is setup during cpu initialization that registers a
 callback for the gic to determine the device id of that paricular
 system.
 
 I don't really like the idea of a callback from the GIC - I'd prefer it
 to be standalone, and rely on the topology information to build the
 DeviceID. Mark Rutland had some ideas for DT (he posted an RFC a while
 ago), maybe it would be good to get back to that and find out what we
 can do. ACPI should also have similar information (IORT?).
 

How can some one pass this from DT, especially in GIC entry. i still think it 
is bus owner responsibility and call back is better idea. 
but if some one has a better idea for DT and ACPI, we are fine as long as it 
works on ThunderX.   

Thanks,
Tirumalesh. 


 Thanks,
 
   M.
 -- 
 Jazz is not dead. It just smells funny...

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability IOMMU_CAP_INTR_REMAP

2014-06-27 Thread Chalamarla, Tirumalesh
Marc,

 What is your opinion on ITS emulation . is it should be part of KVM or 
VFIO.
Also this code needs to depend on ITS host driver a lot, Host ITS 
driver needs to have an interface for this code to use.

Thanks,
 Tirumalesh
 
-Original Message-
From: Will Deacon [mailto:will.dea...@arm.com] 
Sent: Friday, June 27, 2014 1:47 AM
To: Alex Williamson
Cc: Chalamarla, Tirumalesh; Joerg Roedel; k...@vger.kernel.org; open list; 
stuart.yo...@freescale.com; io...@lists.linux-foundation.org; 
t...@virtualopensystems.com; kvm...@lists.cs.columbia.edu; moderated list:ARM 
SMMU DRIVER; marc.zyng...@arm.com
Subject: Re: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability 
IOMMU_CAP_INTR_REMAP

On Thu, Jun 26, 2014 at 08:36:24PM +0100, Alex Williamson wrote:
> On Thu, 2014-06-26 at 19:10 +0000, Chalamarla, Tirumalesh wrote:
> > Thanks for the clarification Alex, That’s exactly my point, why are 
> > we relying on  QEMU or something else to emulate the MSI space when 
> > we can directly give access to devices using ITS (of course with a 
> > small emulation code).  This way we are also benefited from all ITS 
> > services like VCPU migration etc.
> 
> I have no idea what ITS is.

ITS is the MSI doorbell for GICv3 (ARM's latest interrupt controller).

I agree that we will need an ITS emulation if we want to use MSIs in the guest, 
and I believe that Marc (CC'd) had already started thinking about that.


> > What about non QEMU VFIO users, for example, if I wanted to use VFIO to
> > assign a device to a user process I don't need to depend on QEMU.   I
> > thought this is one of the main goals of vfio to make it independent of
> > hypervisors. 
> 
> Where did QEMU become a requirement?  Maybe I'm missing something in 
> the ARM part of the conversation that got chopped off, but this is 
> exactly why we have the VFIO/QEMU split that we do.  VFIO provides 
> basic virtualization for config space and restricts access to other 
> areas that users shouldn't be allowed to change.  QEMU is just one 
> example of a userspace VFIO driver.  QEMU takes the decomposed device 
> exposed through the VFIO ABI and re-creates a PCI device out of it.  
> VFIO itself has no dependency on QEMU.  Thanks,

I also don't understand the QEMU part here. The MSI emulation would be in the 
kernel, just like the GICv2 emulation that we already have. For userspace 
drivers, wouldn't you just use eventfd rather than bother with emulating MSIs?

Finally, the interrupt remapping part is about the SMMU preventing MSI writes 
to arbitrary portions of the host address space. The ITS is about routing 
interrupts to CPUs.

Will


RE: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability IOMMU_CAP_INTR_REMAP

2014-06-27 Thread Chalamarla, Tirumalesh
Marc,

 What is your opinion on ITS emulation . is it should be part of KVM or 
VFIO.
Also this code needs to depend on ITS host driver a lot, Host ITS 
driver needs to have an interface for this code to use.

Thanks,
 Tirumalesh
 
-Original Message-
From: Will Deacon [mailto:will.dea...@arm.com] 
Sent: Friday, June 27, 2014 1:47 AM
To: Alex Williamson
Cc: Chalamarla, Tirumalesh; Joerg Roedel; k...@vger.kernel.org; open list; 
stuart.yo...@freescale.com; io...@lists.linux-foundation.org; 
t...@virtualopensystems.com; kvm...@lists.cs.columbia.edu; moderated list:ARM 
SMMU DRIVER; marc.zyng...@arm.com
Subject: Re: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability 
IOMMU_CAP_INTR_REMAP

On Thu, Jun 26, 2014 at 08:36:24PM +0100, Alex Williamson wrote:
 On Thu, 2014-06-26 at 19:10 +, Chalamarla, Tirumalesh wrote:
  Thanks for the clarification Alex, That’s exactly my point, why are 
  we relying on  QEMU or something else to emulate the MSI space when 
  we can directly give access to devices using ITS (of course with a 
  small emulation code).  This way we are also benefited from all ITS 
  services like VCPU migration etc.
 
 I have no idea what ITS is.

ITS is the MSI doorbell for GICv3 (ARM's latest interrupt controller).

I agree that we will need an ITS emulation if we want to use MSIs in the guest, 
and I believe that Marc (CC'd) had already started thinking about that.


  What about non QEMU VFIO users, for example, if I wanted to use VFIO to
  assign a device to a user process I don't need to depend on QEMU.   I
  thought this is one of the main goals of vfio to make it independent of
  hypervisors. 
 
 Where did QEMU become a requirement?  Maybe I'm missing something in 
 the ARM part of the conversation that got chopped off, but this is 
 exactly why we have the VFIO/QEMU split that we do.  VFIO provides 
 basic virtualization for config space and restricts access to other 
 areas that users shouldn't be allowed to change.  QEMU is just one 
 example of a userspace VFIO driver.  QEMU takes the decomposed device 
 exposed through the VFIO ABI and re-creates a PCI device out of it.  
 VFIO itself has no dependency on QEMU.  Thanks,

I also don't understand the QEMU part here. The MSI emulation would be in the 
kernel, just like the GICv2 emulation that we already have. For userspace 
drivers, wouldn't you just use eventfd rather than bother with emulating MSIs?

Finally, the interrupt remapping part is about the SMMU preventing MSI writes 
to arbitrary portions of the host address space. The ITS is about routing 
interrupts to CPUs.

Will


RE: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability IOMMU_CAP_INTR_REMAP

2014-06-26 Thread Chalamarla, Tirumalesh
Thanks for the clarification Alex, That’s exactly my point, why are we relying 
on  QEMU or something else to emulate the MSI space when we can directly give 
access to devices using ITS (of course with a small emulation code).
This way we are also benefited from all ITS services like VCPU migration etc.  

What about non QEMU VFIO users, for example, if I wanted to use VFIO to assign 
a device to a user process I don't need to depend on QEMU.   I thought this is 
one of the main goals of vfio to make it independent of hypervisors. 

Thanks,
Tirumalesh. 
  

-Original Message-
From: Alex Williamson [mailto:alex.william...@redhat.com] 
Sent: Thursday, June 26, 2014 12:00 PM
To: Chalamarla, Tirumalesh
Cc: Joerg Roedel; Will Deacon; k...@vger.kernel.org; open list; 
stuart.yo...@freescale.com; io...@lists.linux-foundation.org; 
t...@virtualopensystems.com; kvm...@lists.cs.columbia.edu; moderated list:ARM 
SMMU DRIVER
Subject: Re: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability 
IOMMU_CAP_INTR_REMAP

On Thu, 2014-06-26 at 18:41 +, Chalamarla, Tirumalesh wrote:
> Sorry there was a type,
> 
> The question is:
>  
> How is VFIO restricting software from writing to MSI/MSI-X 
> vectors of the device. 

All interrupts are configured via ioctl, not MSI config space or the MSI-X 
vector table in MMIO space.  VFIO protects the MSI config area by virtualizing 
it (you can't actually write the physical enable bit or address/data through 
VFIO).  The MSI-X vector table is protected by preventing read, write, or mmap 
access to it.  QEMU provides further virtualization above the basics provided 
by VFIO.  We really can't guarantee that devices don't have backdoors to 
configure these though.
See the realtek quirk in QEMU for an example of a device that has such a 
backdoor.  That's why we require interrupt remapping, so that a device that 
does this can only hurt the guest, and require the user to opt-out if they feel 
they have a sufficiently trusted guest.  Thanks,

Alex

> 
> -Original Message-----
> From: Chalamarla, Tirumalesh
> Sent: Thursday, June 26, 2014 11:16 AM
> To: Chalamarla, Tirumalesh; Joerg Roedel; Will Deacon
> Cc: k...@vger.kernel.org; open list; alex.william...@redhat.com; 
> stuart.yo...@freescale.com; io...@lists.linux-foundation.org; 
> t...@virtualopensystems.com; kvm...@lists.cs.columbia.edu; moderated 
> list:ARM SMMU DRIVER
> Subject: RE: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability 
> IOMMU_CAP_INTR_REMAP
> 
> When I say emulating ITS, I mean translating guest ITS commands to physical 
> ITS commands  and placing them in physical queue. 
> 
> Regards,
> Tirumalesh.
> 
> -Original Message-
> From: kvmarm-boun...@lists.cs.columbia.edu 
> [mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Chalamarla, 
> Tirumalesh
> Sent: Thursday, June 26, 2014 11:08 AM
> To: Joerg Roedel; Will Deacon
> Cc: k...@vger.kernel.org; open list; alex.william...@redhat.com; 
> stuart.yo...@freescale.com; io...@lists.linux-foundation.org; 
> t...@virtualopensystems.com; kvm...@lists.cs.columbia.edu; moderated 
> list:ARM SMMU DRIVER
> Subject: RE: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability 
> IOMMU_CAP_INTR_REMAP
> 
> Forgive me if this discussion is not relative here, but I thought it is.  
> 
> How is VFIO restricting devices from writing  to MSI/MSI-X, Is all the vector 
> area is mapped by VFIO to trap the accesses.  I am asking this because we 
> might need to emulate ITS somewhere either in KVM or VFIO to provide direct 
> access to devices.
> And I don't see any mentions on that.   I think this flag needs to be set by 
> ITS emulation.
> 
> Regards,
> Tirumalesh.
> 
> -Original Message-
> From: kvmarm-boun...@lists.cs.columbia.edu 
> [mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Joerg 
> Roedel
> Sent: Monday, June 16, 2014 8:39 AM
> To: Will Deacon
> Cc: stuart.yo...@freescale.com; k...@vger.kernel.org; open list; 
> io...@lists.linux-foundation.org; alex.william...@redhat.com; 
> moderated list:ARM SMMU DRIVER; t...@virtualopensystems.com; 
> kvm...@lists.cs.columbia.edu; Christoffer Dall
> Subject: Re: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability 
> IOMMU_CAP_INTR_REMAP
> 
> On Mon, Jun 16, 2014 at 04:25:26PM +0100, Will Deacon wrote:
> > Ok, thanks. In which case, I think this is really a combined 
> > property of the SMMU and the interrupt controller, so we might need 
> > some extra code so that the SMMU can check that the interrupt 
> > controller for the device is also capable of interrupt remapping.
> 
> Right, that this is part of IOMMU code has more or less historic reasons on 
> x86. Interrupt remapping is purely implemented in the IOMMU there, so on ARM 
> some clue-code 

RE: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability IOMMU_CAP_INTR_REMAP

2014-06-26 Thread Chalamarla, Tirumalesh
Sorry there was a type,

The question is:
 
How is VFIO restricting software from writing to MSI/MSI-X vectors 
of the device. 

-Original Message-
From: Chalamarla, Tirumalesh 
Sent: Thursday, June 26, 2014 11:16 AM
To: Chalamarla, Tirumalesh; Joerg Roedel; Will Deacon
Cc: k...@vger.kernel.org; open list; alex.william...@redhat.com; 
stuart.yo...@freescale.com; io...@lists.linux-foundation.org; 
t...@virtualopensystems.com; kvm...@lists.cs.columbia.edu; moderated list:ARM 
SMMU DRIVER
Subject: RE: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability 
IOMMU_CAP_INTR_REMAP

When I say emulating ITS, I mean translating guest ITS commands to physical ITS 
commands  and placing them in physical queue. 

Regards,
Tirumalesh.

-Original Message-
From: kvmarm-boun...@lists.cs.columbia.edu 
[mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Chalamarla, 
Tirumalesh
Sent: Thursday, June 26, 2014 11:08 AM
To: Joerg Roedel; Will Deacon
Cc: k...@vger.kernel.org; open list; alex.william...@redhat.com; 
stuart.yo...@freescale.com; io...@lists.linux-foundation.org; 
t...@virtualopensystems.com; kvm...@lists.cs.columbia.edu; moderated list:ARM 
SMMU DRIVER
Subject: RE: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability 
IOMMU_CAP_INTR_REMAP

Forgive me if this discussion is not relative here, but I thought it is.  

How is VFIO restricting devices from writing  to MSI/MSI-X, Is all the vector 
area is mapped by VFIO to trap the accesses.  I am asking this because we might 
need to emulate ITS somewhere either in KVM or VFIO to provide direct access to 
devices.
And I don't see any mentions on that.   I think this flag needs to be set by 
ITS emulation.

Regards,
Tirumalesh.

-Original Message-
From: kvmarm-boun...@lists.cs.columbia.edu 
[mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Joerg Roedel
Sent: Monday, June 16, 2014 8:39 AM
To: Will Deacon
Cc: stuart.yo...@freescale.com; k...@vger.kernel.org; open list; 
io...@lists.linux-foundation.org; alex.william...@redhat.com; moderated 
list:ARM SMMU DRIVER; t...@virtualopensystems.com; 
kvm...@lists.cs.columbia.edu; Christoffer Dall
Subject: Re: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability 
IOMMU_CAP_INTR_REMAP

On Mon, Jun 16, 2014 at 04:25:26PM +0100, Will Deacon wrote:
> Ok, thanks. In which case, I think this is really a combined property 
> of the SMMU and the interrupt controller, so we might need some extra 
> code so that the SMMU can check that the interrupt controller for the 
> device is also capable of interrupt remapping.

Right, that this is part of IOMMU code has more or less historic reasons on 
x86. Interrupt remapping is purely implemented in the IOMMU there, so on ARM 
some clue-code between interrupt controler and smmu is needed.


Joerg


___
kvmarm mailing list
kvm...@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
___
kvmarm mailing list
kvm...@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability IOMMU_CAP_INTR_REMAP

2014-06-26 Thread Chalamarla, Tirumalesh
When I say emulating ITS, I mean translating guest ITS commands to physical ITS 
commands  and placing them in physical queue. 

Regards,
Tirumalesh.

-Original Message-
From: kvmarm-boun...@lists.cs.columbia.edu 
[mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Chalamarla, 
Tirumalesh
Sent: Thursday, June 26, 2014 11:08 AM
To: Joerg Roedel; Will Deacon
Cc: k...@vger.kernel.org; open list; alex.william...@redhat.com; 
stuart.yo...@freescale.com; io...@lists.linux-foundation.org; 
t...@virtualopensystems.com; kvm...@lists.cs.columbia.edu; moderated list:ARM 
SMMU DRIVER
Subject: RE: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability 
IOMMU_CAP_INTR_REMAP

Forgive me if this discussion is not relative here, but I thought it is.  

How is VFIO restricting devices from writing  to MSI/MSI-X, Is all the vector 
area is mapped by VFIO to trap the accesses.  I am asking this because we might 
need to emulate ITS somewhere either in KVM or VFIO to provide direct access to 
devices.
And I don't see any mentions on that.   I think this flag needs to be set by 
ITS emulation.

Regards,
Tirumalesh.

-Original Message-
From: kvmarm-boun...@lists.cs.columbia.edu 
[mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Joerg Roedel
Sent: Monday, June 16, 2014 8:39 AM
To: Will Deacon
Cc: stuart.yo...@freescale.com; k...@vger.kernel.org; open list; 
io...@lists.linux-foundation.org; alex.william...@redhat.com; moderated 
list:ARM SMMU DRIVER; t...@virtualopensystems.com; 
kvm...@lists.cs.columbia.edu; Christoffer Dall
Subject: Re: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability 
IOMMU_CAP_INTR_REMAP

On Mon, Jun 16, 2014 at 04:25:26PM +0100, Will Deacon wrote:
> Ok, thanks. In which case, I think this is really a combined property 
> of the SMMU and the interrupt controller, so we might need some extra 
> code so that the SMMU can check that the interrupt controller for the 
> device is also capable of interrupt remapping.

Right, that this is part of IOMMU code has more or less historic reasons on 
x86. Interrupt remapping is purely implemented in the IOMMU there, so on ARM 
some clue-code between interrupt controler and smmu is needed.


Joerg


___
kvmarm mailing list
kvm...@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
___
kvmarm mailing list
kvm...@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability IOMMU_CAP_INTR_REMAP

2014-06-26 Thread Chalamarla, Tirumalesh
Forgive me if this discussion is not relative here, but I thought it is.  

How is VFIO restricting devices from writing  to MSI/MSI-X,
Is all the vector area is mapped by VFIO to trap the accesses.  I am asking 
this because we might need to emulate ITS somewhere either in KVM or VFIO to 
provide direct access to devices.
And I don't see any mentions on that.   I think this flag needs to be set by 
ITS emulation.

Regards,
Tirumalesh.

-Original Message-
From: kvmarm-boun...@lists.cs.columbia.edu 
[mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Joerg Roedel
Sent: Monday, June 16, 2014 8:39 AM
To: Will Deacon
Cc: stuart.yo...@freescale.com; k...@vger.kernel.org; open list; 
io...@lists.linux-foundation.org; alex.william...@redhat.com; moderated 
list:ARM SMMU DRIVER; t...@virtualopensystems.com; 
kvm...@lists.cs.columbia.edu; Christoffer Dall
Subject: Re: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability 
IOMMU_CAP_INTR_REMAP

On Mon, Jun 16, 2014 at 04:25:26PM +0100, Will Deacon wrote:
> Ok, thanks. In which case, I think this is really a combined property 
> of the SMMU and the interrupt controller, so we might need some extra 
> code so that the SMMU can check that the interrupt controller for the 
> device is also capable of interrupt remapping.

Right, that this is part of IOMMU code has more or less historic reasons on 
x86. Interrupt remapping is purely implemented in the IOMMU there, so on ARM 
some clue-code between interrupt controler and smmu is needed.


Joerg


___
kvmarm mailing list
kvm...@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability IOMMU_CAP_INTR_REMAP

2014-06-26 Thread Chalamarla, Tirumalesh
Forgive me if this discussion is not relative here, but I thought it is.  

How is VFIO restricting devices from writing  to MSI/MSI-X,
Is all the vector area is mapped by VFIO to trap the accesses.  I am asking 
this because we might need to emulate ITS somewhere either in KVM or VFIO to 
provide direct access to devices.
And I don't see any mentions on that.   I think this flag needs to be set by 
ITS emulation.

Regards,
Tirumalesh.

-Original Message-
From: kvmarm-boun...@lists.cs.columbia.edu 
[mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Joerg Roedel
Sent: Monday, June 16, 2014 8:39 AM
To: Will Deacon
Cc: stuart.yo...@freescale.com; k...@vger.kernel.org; open list; 
io...@lists.linux-foundation.org; alex.william...@redhat.com; moderated 
list:ARM SMMU DRIVER; t...@virtualopensystems.com; 
kvm...@lists.cs.columbia.edu; Christoffer Dall
Subject: Re: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability 
IOMMU_CAP_INTR_REMAP

On Mon, Jun 16, 2014 at 04:25:26PM +0100, Will Deacon wrote:
 Ok, thanks. In which case, I think this is really a combined property 
 of the SMMU and the interrupt controller, so we might need some extra 
 code so that the SMMU can check that the interrupt controller for the 
 device is also capable of interrupt remapping.

Right, that this is part of IOMMU code has more or less historic reasons on 
x86. Interrupt remapping is purely implemented in the IOMMU there, so on ARM 
some clue-code between interrupt controler and smmu is needed.


Joerg


___
kvmarm mailing list
kvm...@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability IOMMU_CAP_INTR_REMAP

2014-06-26 Thread Chalamarla, Tirumalesh
When I say emulating ITS, I mean translating guest ITS commands to physical ITS 
commands  and placing them in physical queue. 

Regards,
Tirumalesh.

-Original Message-
From: kvmarm-boun...@lists.cs.columbia.edu 
[mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Chalamarla, 
Tirumalesh
Sent: Thursday, June 26, 2014 11:08 AM
To: Joerg Roedel; Will Deacon
Cc: k...@vger.kernel.org; open list; alex.william...@redhat.com; 
stuart.yo...@freescale.com; io...@lists.linux-foundation.org; 
t...@virtualopensystems.com; kvm...@lists.cs.columbia.edu; moderated list:ARM 
SMMU DRIVER
Subject: RE: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability 
IOMMU_CAP_INTR_REMAP

Forgive me if this discussion is not relative here, but I thought it is.  

How is VFIO restricting devices from writing  to MSI/MSI-X, Is all the vector 
area is mapped by VFIO to trap the accesses.  I am asking this because we might 
need to emulate ITS somewhere either in KVM or VFIO to provide direct access to 
devices.
And I don't see any mentions on that.   I think this flag needs to be set by 
ITS emulation.

Regards,
Tirumalesh.

-Original Message-
From: kvmarm-boun...@lists.cs.columbia.edu 
[mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Joerg Roedel
Sent: Monday, June 16, 2014 8:39 AM
To: Will Deacon
Cc: stuart.yo...@freescale.com; k...@vger.kernel.org; open list; 
io...@lists.linux-foundation.org; alex.william...@redhat.com; moderated 
list:ARM SMMU DRIVER; t...@virtualopensystems.com; 
kvm...@lists.cs.columbia.edu; Christoffer Dall
Subject: Re: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability 
IOMMU_CAP_INTR_REMAP

On Mon, Jun 16, 2014 at 04:25:26PM +0100, Will Deacon wrote:
 Ok, thanks. In which case, I think this is really a combined property 
 of the SMMU and the interrupt controller, so we might need some extra 
 code so that the SMMU can check that the interrupt controller for the 
 device is also capable of interrupt remapping.

Right, that this is part of IOMMU code has more or less historic reasons on 
x86. Interrupt remapping is purely implemented in the IOMMU there, so on ARM 
some clue-code between interrupt controler and smmu is needed.


Joerg


___
kvmarm mailing list
kvm...@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
___
kvmarm mailing list
kvm...@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability IOMMU_CAP_INTR_REMAP

2014-06-26 Thread Chalamarla, Tirumalesh
Sorry there was a type,

The question is:
 
How is VFIO restricting software from writing to MSI/MSI-X vectors 
of the device. 

-Original Message-
From: Chalamarla, Tirumalesh 
Sent: Thursday, June 26, 2014 11:16 AM
To: Chalamarla, Tirumalesh; Joerg Roedel; Will Deacon
Cc: k...@vger.kernel.org; open list; alex.william...@redhat.com; 
stuart.yo...@freescale.com; io...@lists.linux-foundation.org; 
t...@virtualopensystems.com; kvm...@lists.cs.columbia.edu; moderated list:ARM 
SMMU DRIVER
Subject: RE: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability 
IOMMU_CAP_INTR_REMAP

When I say emulating ITS, I mean translating guest ITS commands to physical ITS 
commands  and placing them in physical queue. 

Regards,
Tirumalesh.

-Original Message-
From: kvmarm-boun...@lists.cs.columbia.edu 
[mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Chalamarla, 
Tirumalesh
Sent: Thursday, June 26, 2014 11:08 AM
To: Joerg Roedel; Will Deacon
Cc: k...@vger.kernel.org; open list; alex.william...@redhat.com; 
stuart.yo...@freescale.com; io...@lists.linux-foundation.org; 
t...@virtualopensystems.com; kvm...@lists.cs.columbia.edu; moderated list:ARM 
SMMU DRIVER
Subject: RE: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability 
IOMMU_CAP_INTR_REMAP

Forgive me if this discussion is not relative here, but I thought it is.  

How is VFIO restricting devices from writing  to MSI/MSI-X, Is all the vector 
area is mapped by VFIO to trap the accesses.  I am asking this because we might 
need to emulate ITS somewhere either in KVM or VFIO to provide direct access to 
devices.
And I don't see any mentions on that.   I think this flag needs to be set by 
ITS emulation.

Regards,
Tirumalesh.

-Original Message-
From: kvmarm-boun...@lists.cs.columbia.edu 
[mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Joerg Roedel
Sent: Monday, June 16, 2014 8:39 AM
To: Will Deacon
Cc: stuart.yo...@freescale.com; k...@vger.kernel.org; open list; 
io...@lists.linux-foundation.org; alex.william...@redhat.com; moderated 
list:ARM SMMU DRIVER; t...@virtualopensystems.com; 
kvm...@lists.cs.columbia.edu; Christoffer Dall
Subject: Re: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability 
IOMMU_CAP_INTR_REMAP

On Mon, Jun 16, 2014 at 04:25:26PM +0100, Will Deacon wrote:
 Ok, thanks. In which case, I think this is really a combined property 
 of the SMMU and the interrupt controller, so we might need some extra 
 code so that the SMMU can check that the interrupt controller for the 
 device is also capable of interrupt remapping.

Right, that this is part of IOMMU code has more or less historic reasons on 
x86. Interrupt remapping is purely implemented in the IOMMU there, so on ARM 
some clue-code between interrupt controler and smmu is needed.


Joerg


___
kvmarm mailing list
kvm...@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
___
kvmarm mailing list
kvm...@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability IOMMU_CAP_INTR_REMAP

2014-06-26 Thread Chalamarla, Tirumalesh
Thanks for the clarification Alex, That’s exactly my point, why are we relying 
on  QEMU or something else to emulate the MSI space when we can directly give 
access to devices using ITS (of course with a small emulation code).
This way we are also benefited from all ITS services like VCPU migration etc.  

What about non QEMU VFIO users, for example, if I wanted to use VFIO to assign 
a device to a user process I don't need to depend on QEMU.   I thought this is 
one of the main goals of vfio to make it independent of hypervisors. 

Thanks,
Tirumalesh. 
  

-Original Message-
From: Alex Williamson [mailto:alex.william...@redhat.com] 
Sent: Thursday, June 26, 2014 12:00 PM
To: Chalamarla, Tirumalesh
Cc: Joerg Roedel; Will Deacon; k...@vger.kernel.org; open list; 
stuart.yo...@freescale.com; io...@lists.linux-foundation.org; 
t...@virtualopensystems.com; kvm...@lists.cs.columbia.edu; moderated list:ARM 
SMMU DRIVER
Subject: Re: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability 
IOMMU_CAP_INTR_REMAP

On Thu, 2014-06-26 at 18:41 +, Chalamarla, Tirumalesh wrote:
 Sorry there was a type,
 
 The question is:
  
 How is VFIO restricting software from writing to MSI/MSI-X 
 vectors of the device. 

All interrupts are configured via ioctl, not MSI config space or the MSI-X 
vector table in MMIO space.  VFIO protects the MSI config area by virtualizing 
it (you can't actually write the physical enable bit or address/data through 
VFIO).  The MSI-X vector table is protected by preventing read, write, or mmap 
access to it.  QEMU provides further virtualization above the basics provided 
by VFIO.  We really can't guarantee that devices don't have backdoors to 
configure these though.
See the realtek quirk in QEMU for an example of a device that has such a 
backdoor.  That's why we require interrupt remapping, so that a device that 
does this can only hurt the guest, and require the user to opt-out if they feel 
they have a sufficiently trusted guest.  Thanks,

Alex

 
 -Original Message-
 From: Chalamarla, Tirumalesh
 Sent: Thursday, June 26, 2014 11:16 AM
 To: Chalamarla, Tirumalesh; Joerg Roedel; Will Deacon
 Cc: k...@vger.kernel.org; open list; alex.william...@redhat.com; 
 stuart.yo...@freescale.com; io...@lists.linux-foundation.org; 
 t...@virtualopensystems.com; kvm...@lists.cs.columbia.edu; moderated 
 list:ARM SMMU DRIVER
 Subject: RE: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability 
 IOMMU_CAP_INTR_REMAP
 
 When I say emulating ITS, I mean translating guest ITS commands to physical 
 ITS commands  and placing them in physical queue. 
 
 Regards,
 Tirumalesh.
 
 -Original Message-
 From: kvmarm-boun...@lists.cs.columbia.edu 
 [mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Chalamarla, 
 Tirumalesh
 Sent: Thursday, June 26, 2014 11:08 AM
 To: Joerg Roedel; Will Deacon
 Cc: k...@vger.kernel.org; open list; alex.william...@redhat.com; 
 stuart.yo...@freescale.com; io...@lists.linux-foundation.org; 
 t...@virtualopensystems.com; kvm...@lists.cs.columbia.edu; moderated 
 list:ARM SMMU DRIVER
 Subject: RE: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability 
 IOMMU_CAP_INTR_REMAP
 
 Forgive me if this discussion is not relative here, but I thought it is.  
 
 How is VFIO restricting devices from writing  to MSI/MSI-X, Is all the vector 
 area is mapped by VFIO to trap the accesses.  I am asking this because we 
 might need to emulate ITS somewhere either in KVM or VFIO to provide direct 
 access to devices.
 And I don't see any mentions on that.   I think this flag needs to be set by 
 ITS emulation.
 
 Regards,
 Tirumalesh.
 
 -Original Message-
 From: kvmarm-boun...@lists.cs.columbia.edu 
 [mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Joerg 
 Roedel
 Sent: Monday, June 16, 2014 8:39 AM
 To: Will Deacon
 Cc: stuart.yo...@freescale.com; k...@vger.kernel.org; open list; 
 io...@lists.linux-foundation.org; alex.william...@redhat.com; 
 moderated list:ARM SMMU DRIVER; t...@virtualopensystems.com; 
 kvm...@lists.cs.columbia.edu; Christoffer Dall
 Subject: Re: [RFC PATCH v6 04/20] iommu/arm-smmu: add capability 
 IOMMU_CAP_INTR_REMAP
 
 On Mon, Jun 16, 2014 at 04:25:26PM +0100, Will Deacon wrote:
  Ok, thanks. In which case, I think this is really a combined 
  property of the SMMU and the interrupt controller, so we might need 
  some extra code so that the SMMU can check that the interrupt 
  controller for the device is also capable of interrupt remapping.
 
 Right, that this is part of IOMMU code has more or less historic reasons on 
 x86. Interrupt remapping is purely implemented in the IOMMU there, so on ARM 
 some clue-code between interrupt controler and smmu is needed.
 
 
   Joerg
 
 
 ___
 kvmarm mailing list
 kvm...@lists.cs.columbia.edu
 https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
 ___
 kvmarm mailing list
 kvm