Re: [PATCH v8 4/7] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK

2021-10-01 Thread Thomas Gleixner
Marcelo,

On Fri, Oct 01 2021 at 09:05, Marcelo Tosatti wrote:
> On Fri, Oct 01, 2021 at 01:02:23AM +0200, Thomas Gleixner wrote:
>> But even if that would be the case, then what prevents the stale time
>> stamps to be visible? Nothing:
>> 
>> T0:t = now();
>>  -> pause
>>  -> resume
>>  -> magic "fixup"
>> T1:dostuff(t);
>
> Yes.
>
> BTW, you could have a userspace notification (then applications 
> could handle this if desired).

Well, we have that via timerfd with TFD_TIMER_CANCEL_ON_SET for
CLOCK_REALTIME. That's what applications which are sensitive to clock
REALTIME jumps use today.
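A minimal sketch of that pattern (userspace, untested, error handling
omitted; needs <sys/timerfd.h>, <time.h>, <unistd.h>, <errno.h>, <stdint.h>):

   int fd = timerfd_create(CLOCK_REALTIME, 0);
   struct itimerspec its = { 0 };
   uint64_t ticks;

   clock_gettime(CLOCK_REALTIME, &its.it_value);
   its.it_value.tv_sec += 10 * 365 * 24 * 3600;   /* effectively "never" */

   /* Absolute CLOCK_REALTIME timer, cancelled whenever the clock is set */
   timerfd_settime(fd, TFD_TIMER_ABSTIME | TFD_TIMER_CANCEL_ON_SET,
                   &its, NULL);

   if (read(fd, &ticks, sizeof(ticks)) < 0 && errno == ECANCELED) {
           /* CLOCK_REALTIME jumped: invalidate cached timestamps */
   }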

>>   Now the proposed change is creating exactly the same problem:
>> 
>>   >> > + if (data.flags & KVM_CLOCK_REALTIME) {
>>   >> > + u64 now_real_ns = ktime_get_real_ns();
>>   >> > +
>>   >> > + /*
>>   >> > +  * Avoid stepping the kvmclock backwards.
>>   >> > +  */
>>   >> > + if (now_real_ns > data.realtime)
>>   >> > + data.clock += now_real_ns - data.realtime;
>>   >> > + }
>> 
>>   IOW, it takes the time between pause and resume into account and
>>   forwards the underlying base clock which makes CLOCK_MONOTONIC
>>   jump forward by exactly that amount of time.
>
> Well, it is assuming that the
>
>  T0:t = now();
>  T1:pause vm()
>  T2:  finish vm migration()
>  T3:dostuff(t);
>
> Interval between T1 and T2 is small (and that the guest
> clocks are synchronized up to a given boundary).

Yes, I understand that, but it's an assumption and there is no boundary
for the time jump in the proposed patches, which rings my alarm bells :)

> But I suppose adding a limit to the forward clock advance 
> (in the migration case) is useful:
>
>   1) If migration (well actually, only the final steps
>  to finish migration, the time between when guest is paused
>  on source and is resumed on destination) takes too long,
>  then too bad: fix it to be shorter if you want the clocks
>  to have close to zero change to realtime on migration.
>
>   2) Avoid the other bugs in case of large forward advance.
>
> Maybe having it configurable, with, say, a 1 minute maximum by default
> is a good choice?

Don't know what 1 minute does in terms of applications etc. You have to
do some experiments on that.
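For concreteness, such a clamp would sit on top of the quoted hunk roughly
like this (max_advance_ns is a made up, possibly configurable, knob):

   if (data.flags & KVM_CLOCK_REALTIME) {
           u64 now_real_ns = ktime_get_real_ns();

           /* Advance kvmclock, but never by more than the clamp */
           if (now_real_ns > data.realtime)
                   data.clock += min(now_real_ns - data.realtime,
                                     max_advance_ns);
   }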

> An alternative would be to advance only the guests REALTIME clock, from 
> data about how long steps T1-T2 took.

Yes, that's what would happen in the cooperative S2IDLE or S3 case when
the guest resumes.

>> So now virt came along and created a hard to solve circular dependency
>> problem:
>> 
>>- If CLOCK_MONOTONIC stops for too long then NTP/PTP gets out of
>>  sync, but everything else is happy.
>>  
>>- If CLOCK_MONOTONIC jumps too far forward, then all hell breaks
>>  loose, but NTP/PTP is happy.
>
> One must handle the
>
>  T0:t = now();
>   -> pause
>   -> resume
>   -> magic "fixup"
>  T1:dostuff(t);
>
> fact if one is going to use savevm/restorevm anyway, so...
> (it is kind of unfixable, unless you modify your application
> to accept notifications to redo any computation based on t, isn't it?).

Well yes, but what applications can deal with is CLOCK_REALTIME jumping
because that's a property of it. Not so much the CLOCK_MONOTONIC part.
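The canonical way a CLOCK_MONOTONIC jump blows up is the relative timeout
pattern, e.g. (names are stand-ins):

   /* Watchdog style logic based on CLOCK_MONOTONIC */
   clock_gettime(CLOCK_MONOTONIC, &now);
   deadline = now.tv_sec + 30;             /* expect ~30s of real progress */

   /* ... later, possibly right after resume with a forwarded clock ... */
   clock_gettime(CLOCK_MONOTONIC, &now);
   if (now.tv_sec >= deadline)
           handle_timeout();               /* fires spuriously after the jump */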

>> If you decide that correctness is overrated, then please document it
>> clearly instead of trying to pretend being correct.
>
> Based on the above, advancing only CLOCK_REALTIME (and not CLOCK_MONOTONIC)
> would be correct, right? And it's probably not very hard to do.

Time _is_ hard to get right. 

So you might experiment with something like this as a stop gap:

  Provide the guest something like this:

  u64  migration_seq;
  u64  realtime_delta_ns;

  in the shared clock page

  Do not forward jump clock MONOTONIC.

  On resume kick an IPI where the guest handler does:

 if (clock_data->migration_seq == migration_seq)
return;

 migration_seq = clock_data->migration_seq;

 ts64 = { 0, 0 };
 timespec64_add_ns(&ts64, clock_data->realtime_delta_ns);
 timekeeping_inject_sleeptime64(&ts64);

  Make sure that the IPI completes before you migrate the guest another
  time or implement it slightly smarter, but you get the idea :)

That's what we use for suspend time injection, but it should just work
without frozen tasks as well.

Re: [PATCH v8 4/7] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK

2021-09-30 Thread Thomas Gleixner
Marcelo,

On Thu, Sep 30 2021 at 16:21, Marcelo Tosatti wrote:
> On Wed, Sep 29, 2021 at 03:56:29PM -0300, Marcelo Tosatti wrote:
>> On Thu, Sep 16, 2021 at 06:15:35PM +, Oliver Upton wrote:
>> 
>> Thomas, CC'ed, has deeper understanding of problems with 
>> forward time jumps than I do. Thomas, any comments?
>
> Based on the earlier discussion about the problems of synchronizing
> the guest's clock via a notification to the NTP/Chrony daemon 
> (where there is a window where applications can read the stale
> value of the clock), a possible solution would be triggering
> an NMI on the destination (so that it runs ASAP, with higher
> priority than application/kernel).
>
> What would this NMI do, exactly?

Nothing. You cannot do anything time related in an NMI.

You might queue irq work which handles that, but that would still not
prevent user space or kernel space from observing the stale time stamp
depending on the execution state from where it resumes.
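Queueing the work itself is the trivial part, i.e. something like (names
made up):

   static void kvmclock_irq_work_fn(struct irq_work *work)
   {
           /* Hard interrupt context, timekeeping is usable here */
   }
   static DEFINE_IRQ_WORK(kvmclock_irq_work, kvmclock_irq_work_fn);

   /* From the NMI handler: */
   irq_work_queue(&kvmclock_irq_work);

but that does not close the observation window described above.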

>> As a note: this makes it not OK to use KVM_CLOCK_REALTIME flag 
>> for either vm pause / vm resume (well, if paused for long periods of time) 
>> or savevm / restorevm.
>
> Maybe with the NMI above, it would be possible to use
> the realtime clock as a way to know time elapsed between
> events and advance guest clock without the current 
> problematic window.

However much duct tape you throw at the problem, it cannot ever be
solved because it's fundamentally broken. All you can do is to make the
observation windows smaller, but that's just curing the symptom.

The problem is that the guest is paused/resumed without getting any
information about that and the execution of the guest is stopped at an
arbitrary instruction boundary from which it resumes after migration or
restore. So there is no way to guarantee that after resume all vCPUs are
executing in a state which can handle that.

But even if that would be the case, then what prevents the stale time
stamps to be visible? Nothing:

T0:t = now();
 -> pause
 -> resume
 -> magic "fixup"
T1:dostuff(t);

But that's not a fundamental problem because every preemptible or
interruptible code has the same issue:

T0:t = now();
 -> preemption or interrupt
T1:dostuff(t);

Which is usually not a problem, but it becomes a problem when T1 - T0 is
greater than the usual expectations which can obviously be trivially
achieved by guest migration or a savevm, restorevm cycle.

But let's go a step back and look at the clocks and their expectations:

CLOCK_MONOTONIC:

  Monotonically increasing clock which counts unless the system
  is in suspend. On resume it continues counting without jumping
  forward.

  That's the reference clock for everything else and therefore it
  is important that it does _not_ jump around.

  The reasons why CLOCK_MONOTONIC stops during suspend is
  historical and any attempt to change that breaks the world and
  some more because making it jump forward will trigger all sorts
  of timeouts, watchdogs and whatever. The last attempt to make
  CLOCK_MONOTONIC behave like CLOCK_BOOTTIME was reverted within 3
  weeks. It's not going to be attempted again. See a3ed0e4393d6
  ("Revert: Unify CLOCK_MONOTONIC and CLOCK_BOOTTIME") for
  details.

  Now the proposed change is creating exactly the same problem:

  >> > +if (data.flags & KVM_CLOCK_REALTIME) {
  >> > +u64 now_real_ns = ktime_get_real_ns();
  >> > +
  >> > +/*
  >> > + * Avoid stepping the kvmclock backwards.
  >> > + */
  >> > +if (now_real_ns > data.realtime)
  >> > +data.clock += now_real_ns - data.realtime;
  >> > +}

  IOW, it takes the time between pause and resume into account and
  forwards the underlying base clock which makes CLOCK_MONOTONIC
  jump forward by exactly that amount of time.

  So depending on the size of the delta you are running into exactly the
  same problem as the final failed attempt to unify CLOCK_MONOTONIC and
  CLOCK_BOOTTIME which btw. would have been a magic cure for virt.

  Too bad, not going to happen ever unless you fix all affected user
  space and kernel side issues.


CLOCK_BOOTTIME:

  CLOCK_MONOTONIC + time spent in suspend


CLOCK_REALTIME/TAI:

  CLOCK_MONOTONIC + offset

  The offset is established by reading RTC at boot time and can be
  changed by clock_settime(2) and adjtimex(2). The latter is used
  by NTP/PTP.

  Any user of CLOCK_REALTIME has to be prepared for it to jump in
  both directions, but of course NTP/PTP daemons have expectations
  vs. such time jumps.

  They rightfully assume on a properly configured and administrated
  system that there are only two things which can make CLOCK_REALTIME
  jump:

  1) NTP/PTP daemon controlled
  2) Suspend/resume related updates by the kernel


Just for the record, these assumptions predate virtualization.

So now virt came along and created a hard to solve circular dependency
problem:

   - If CLOCK_MONOTONIC stops for too long then NTP/PTP gets out of
     sync, but everything else is happy.

   - If CLOCK_MONOTONIC jumps too far forward, then all hell breaks
     loose, but NTP/PTP is happy.

Re: [RFC PATCH v12 05/11] time: Add mechanism to recognize clocksource in time_get_snapshot

2020-05-28 Thread Thomas Gleixner
Jianyong Wu  writes:
> From: Thomas Gleixner 
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index 7cb09c4cf21c..a8f65b3e4ec8 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -928,6 +928,9 @@ int __clocksource_register_scale(struct clocksource *cs, 
> u32 scale, u32 freq)
>  
>   clocksource_arch_init(cs);
>  
> +if (WARN_ON_ONCE((unsigned int)cs->id >= CSID_MAX))
> + cs->id = CSID_GENERIC;
> +

This is white space damaged and certainly not from me.



Re: [PATCH 0/2] Expose KVM API to Linux Kernel

2020-05-18 Thread Thomas Gleixner
Anastassios Nanos  writes:
> On Mon, May 18, 2020 at 11:43 AM Thomas Gleixner  wrote:
>>
>> And this shows clearly how simple the user space is which is required to
>> do that. So why on earth would we want to have all of that in the
>> kernel?
>>
> well, the main idea is that all this functionality is already in the
> kernel. My view is that kvmmtest is as simple as kvmtest.

That still does not explain the purpose, the advantage or any reason
why this should be merged.

> Moreover, it doesn't involve *any* mode switch at all while printing
> out the result of the addition of these two registers -- which I guess
> for a simple use-case like this it isn't much.  But if we were to
> scale this to a large number of exits (and their respective handling
> in user-space) that would incur significant overhead. Don't you agree?

No. I still do not see the real world use case you are trying to
solve. We are not going to accept changes like this which have no proper
justification, real world use cases and proper numbers backing it up.

Thanks,

tglx




Re: [PATCH 0/2] Expose KVM API to Linux Kernel

2020-05-18 Thread Thomas Gleixner
Anastassios Nanos  writes:
> To spawn KVM-enabled Virtual Machines on Linux systems, one has to use
> QEMU, or some other kind of VM monitor in user-space to host the vCPU
> threads, I/O threads and various other book-keeping/management mechanisms.
> This is perfectly fine for a large number of reasons and use cases: for
> instance, running generic VMs, running general purpose Operating systems
> that need some kind of emulation for legacy boot/hardware etc.
>
> What if we wanted to execute a small piece of code as a guest instance,
> without the involvement of user-space? The KVM functions are already doing
> what they should: VM and vCPU setup is already part of the kernel, the only
> missing piece is memory handling.
>
> With these series, (a) we expose to the Linux Kernel the bare minimum KVM
> API functions in order to spawn a guest instance without the intervention
> of user-space; and (b) we tweak the memory handling code of KVM-related
> functions to account for another kind of guest, spawned in kernel-space.
>
> PATCH #1 exposes the needed stub functions, whereas PATCH #2 introduces the
> changes in the KVM memory handling code for x86_64 and aarch64.
>
> An example of use is provided based on kvmtest.c
> [https://lwn.net/Articles/658512/] at

And this shows clearly how simple the user space is which is required to
do that. So why on earth would we want to have all of that in the
kernel?

Thanks,

tglx


Re: [PATCH 05/14] x86/mm: Introduce lookup_address_in_mm()

2020-01-09 Thread Thomas Gleixner
Sean Christopherson  writes:

> diff --git a/arch/x86/include/asm/pgtable_types.h 
> b/arch/x86/include/asm/pgtable_types.h
> index b5e49e6bac63..400ac8da75e8 100644
> --- a/arch/x86/include/asm/pgtable_types.h
> +++ b/arch/x86/include/asm/pgtable_types.h
> @@ -561,6 +561,10 @@ static inline void update_page_count(int level, unsigned 
> long pages) { }
>  extern pte_t *lookup_address(unsigned long address, unsigned int *level);
>  extern pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address,
>   unsigned int *level);
> +
> +struct mm_struct;
> +pte_t *lookup_address_in_mm(struct mm_struct *mm, unsigned long address,
> + unsigned int *level);

Please keep the file consistent and use extern even if not required.

Other than that:

Reviewed-by: Thomas Gleixner 


Re: [RFC PATCH v7 4/7] time: Add mechanism to recognize clocksource in time_get_snapshot

2019-11-14 Thread Thomas Gleixner
On Thu, 14 Nov 2019, Jianyong Wu wrote:
> From: Thomas Gleixner 
> In some scenario like return device time to ptp_kvm guest,
> we need identify the current clocksource outside core time code.
> A mechanism added to recognize the current clocksource
> by export clocksource id in time_get_snapshot.

Can you please replace that with the following:

 System time snapshots are not conveying information about the current
 clocksource which was used, but callers like the PTP KVM guest
 implementation have the requirement to evaluate the clocksource type to
 select the appropriate mechanism.

 Introduce a clocksource id field in struct clocksource which is by default
 set to CSID_GENERIC (0). Clocksource implementations can set that field to
 a value which allows to identify the clocksource.

 Store the clocksource id of the current clocksource in the
 system_time_snapshot so callers can evaluate which clocksource was used to
 take the snapshot and act accordingly.
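 Callers can then do something like this (sketch only; the field name
 follows this series, so treat it as an assumption):

   struct system_time_snapshot snap;

   ktime_get_snapshot(&snap);
   if (snap.cs_id != CSID_ARM_ARCH_COUNTER)
           return -EOPNOTSUPP;     /* clocksource not usable for PTP KVM */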

> diff --git a/include/linux/clocksource_ids.h b/include/linux/clocksource_ids.h
> new file mode 100644
> index ..93bec8426c44
> --- /dev/null
> +++ b/include/linux/clocksource_ids.h
> @@ -0,0 +1,13 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_CLOCKSOURCE_IDS_H
> +#define _LINUX_CLOCKSOURCE_IDS_H
> +
> +/* Enum to give clocksources a unique identifier */
> +enum clocksource_ids {
> + CSID_GENERIC= 0,
> + CSID_ARM_ARCH_COUNTER,

This should only add the infrastructure with just CSID_GENERIC in place.

The ARM_ARCH variant needs to come in a separate patch which adds the enum
and uses it in the corresponding driver. Separate means a patch doing only
that and nothing else, i.e. not hidden in some other patch which actually
makes use of it.

Thanks,

tglx


Re: [RFC PATCH v6 5/7] psci: Add hvc call service for ptp_kvm.

2019-11-07 Thread Thomas Gleixner
On Thu, 24 Oct 2019, Jianyong Wu wrote:

> This patch is the base of ptp_kvm for arm64.

This patch ...

> ptp_kvm modules will call hvc to get this service.
> The service offers real time and counter cycle of host for guest.
> 
> Signed-off-by: Jianyong Wu 
> ---
>  drivers/clocksource/arm_arch_timer.c |  2 ++
>  include/clocksource/arm_arch_timer.h |  4 
>  include/linux/arm-smccc.h| 12 
>  virt/kvm/arm/psci.c  | 22 ++
>  4 files changed, 40 insertions(+)
> 
> diff --git a/drivers/clocksource/arm_arch_timer.c 
> b/drivers/clocksource/arm_arch_timer.c
> index 07e57a49d1e8..e4ad38042ef6 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -29,6 +29,7 @@
>  #include 
>  
>  #include 
> +#include 

Same ordering issue and lack of file.
 
> diff --git a/include/clocksource/arm_arch_timer.h 
> b/include/clocksource/arm_arch_timer.h
> index 1d68d5613dae..426d749e8cf8 100644
> --- a/include/clocksource/arm_arch_timer.h
> +++ b/include/clocksource/arm_arch_timer.h
> @@ -104,6 +104,10 @@ static inline bool arch_timer_evtstrm_available(void)
>   return false;
>  }
>  
> +bool is_arm_arch_counter(void *unuse)

A global function in a header file? You might want to make this static
inline. And while at it please s/unuse/unused/

> +{
> + return false;
> +}
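IOW, something like this (untested):

     static inline bool is_arm_arch_counter(void *unused)
     {
             return false;
     }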
>  #endif
>  #include 
> diff --git a/virt/kvm/arm/psci.c b/virt/kvm/arm/psci.c
> index 0debf49bf259..339bcbafac7b 100644
> --- a/virt/kvm/arm/psci.c
> +++ b/virt/kvm/arm/psci.c
> @@ -15,6 +15,7 @@
>  #include 
>  
>  #include 
> +#include 

Sigh.
  
>  /*
>   * This is an implementation of the Power State Coordination Interface
> @@ -392,6 +393,8 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
>   u32 func_id = smccc_get_function(vcpu);
>   u32 val[4] = {};
>   u32 option;
> + u64 cycles;
> + struct system_time_snapshot systime_snapshot;

Also here you might notice that the variables are not randomly ordered.

Thanks,

tglx


Re: [RFC PATCH v6 4/7] time: Add mechanism to recognize clocksource in time_get_snapshot

2019-11-06 Thread Thomas Gleixner
On Thu, 24 Oct 2019, Jianyong Wu wrote:
> From: Thomas Gleixner 
>
> In some scenario like return device time to ptp_kvm guest,
> we need identify the current clocksource outside core time code.
>
> This patch add a mechanism to recognize the current clocksource
> by export clocksource id in time_get_snapshot.

Please check Documentation/process/submitting-patches.rst and search for
'This patch'.

> diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
> index b21db536fd52..ac8016b22734 100644
> --- a/include/linux/clocksource.h
> +++ b/include/linux/clocksource.h
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 

Please place that include to the other linux includes. You might notice
that there is ordering here.

But where is that include? It's not part of that series, so how is this
supposed to compile?

Thanks,

tglx


RE: [PATCH v5 3/6] timekeeping: Add clocksource to system_time_snapshot

2019-10-16 Thread Thomas Gleixner
On Wed, 16 Oct 2019, Jianyong Wu (Arm Technology China) wrote:
> On Wed, 16 Oct 2019, Thomas Gleixner wrote:
> > On Tue, 15 Oct 2019, Jianyong Wu wrote:
> > +/* Enum to give clocksources a unique identifier */ enum
> > +clocksource_ids {
> > +   CSID_GENERIC= 0,
> > +   CSID_ARM_ARCH_COUNTER,
> > +   CSID_MAX,
> > +};
> > +
> 
> Does this mean I must add clocksource id for all kinds of ARCHs and
> update all the code which have checked clocksource in the old way?

What is the old way? No code exists which can access the current
clocksource pointer because the core code does not expose it at all.

Thanks,

tglx


RE: [PATCH v5 3/6] timekeeping: Add clocksource to system_time_snapshot

2019-10-16 Thread Thomas Gleixner
Jianyong,

On Wed, 16 Oct 2019, Jianyong Wu (Arm Technology China) wrote:

Please fix your mail client not to copy the full headers into the reply.

> > On Wed, 16 Oct 2019, Paolo Bonzini wrote:
> > > On 15/10/19 22:13, Thomas Gleixner wrote:
> > That's a completely different beast, really.
> > 
> > The clocksource pointer is handed in by the caller and the core code 
> > validates
> > if the clocksource is the same as the current system clocksource and not the
> > other way round.
> > 
> > So there is no need for getting that pointer from the core code because the
> > caller knows already which clocksource needs to be active to make the whole
> > cross device timestamp correlation work. And in that case it's the callers
> > responsibility to ensure that the pointer is valid which is the case for the
> > current use cases.
> > 
> I think there is some misunderstanding of my patch. See patch 4/6,
> the reason why I add the clocksource is that I want to check whether the
> current clocksource is arm_arch_counter in virt/kvm/arm/psci.c, and it has
> nothing to do with get_device_system_crosststamp.

There is no misunderstanding at all. Your patch is broken in several ways
as I explained in detail.

> So I really need a mechanism to do that check.
> 
> Thanks
> Jianyong

So just by chance I scrolled further down and found more replies from
you. Please trim the reply properly and add your 'Thanks Jianyong' to the
end of the mail.
 
> > From your other reply:
> > 
> > > Why add a global id?  ARM can add it to archdata similar to how x86
> > > has vclock_mode.  But I still think the right thing to do is to
> > > include the full system_counterval_t in the result of
> > > ktime_get_snapshot.  (More in a second, feel free to reply to the other
> > email only).
> > 
> > No, the clocksource pointer is not going to be exposed as there is no
> > guarantee that it will be still around after the call returns.
> > 
> > It's not even guaranteed to be correct when the store happens in Wu's patch
> > simply because the store is done outside of the seqcount protected region.
> 
> Yeah, all of the elements in system_time_snapshot should be captured
> consistently. So I think the consistency will be guaranteed if the store
> ops are added in the seqcount region.

Again. While it is consistent in terms of storage, it's still wrong to
expose a pointer to something which has no life time guarantee. Even if
your use case is just to compare the pointer it's a bad idea to do that
especially without any comment about the pointer validity at all.

Thanks,

tglx


Re: [PATCH v5 4/6] psci: Add hvc call service for ptp_kvm.

2019-10-16 Thread Thomas Gleixner
On Wed, 16 Oct 2019, Paolo Bonzini wrote:
> On 15/10/19 12:48, Jianyong Wu wrote:
> > diff --git a/drivers/clocksource/arm_arch_timer.c 
> > b/drivers/clocksource/arm_arch_timer.c
> > index 07e57a49d1e8..3597f1f27b10 100644
> > --- a/drivers/clocksource/arm_arch_timer.c
> > +++ b/drivers/clocksource/arm_arch_timer.c
> > @@ -1634,3 +1634,8 @@ static int __init arch_timer_acpi_init(struct 
> > acpi_table_header *table)
> >  }
> >  TIMER_ACPI_DECLARE(arch_timer, ACPI_SIG_GTDT, arch_timer_acpi_init);
> >  #endif
> > +
> > +bool is_arm_arch_counter(void *cs)
> > +{
> > +   return (struct clocksource *)cs == &clocksource_counter;
> > +}
> 
> As Thomas pointed out, any reason to have a void * here?
> 
> However, since he didn't like modifying the struct, here is an
> alternative idea:
> 
> 1) add a "struct clocksource*" argument to ktime_get_snapshot
> 
> 2) return -ENODEV if the argument is not NULL and is not the current
> clocksource
> 
> 3) move the implementation of the hypercall to
> drivers/clocksource/arm_arch_timer.c, so that it can call
> ktime_get_snapshot(&systime_snapshot, &clocksource_counter);

And then you implement a gazillion of those functions for every
arch/subarch which has a similar requirement. Pointless exercise.

Having the ID is trivial enough and the storage space is not really a
concern.

Thanks,

tglx


Re: [PATCH v5 3/6] timekeeping: Add clocksource to system_time_snapshot

2019-10-16 Thread Thomas Gleixner
On Wed, 16 Oct 2019, Paolo Bonzini wrote:
> On 15/10/19 22:13, Thomas Gleixner wrote:
> > On Tue, 15 Oct 2019, Paolo Bonzini wrote:
> >> On 15/10/19 12:48, Jianyong Wu wrote:
> >>>  
> >>>
> >>
> >> Reviewed-by: Paolo Bonzini 
> > 
> > You're sure about having reviewed that in detail?
> 
> I did review the patch; the void* ugliness is not in this one, and I do
> have some other qualms on that one.
> 
> > This changelog is telling absolutely nothing WHY anything outside of the
> > timekeeping core code needs access to the current clocksource. Neither does
> > it tell why it is safe to provide the pointer to random callers.
> 
> Agreed on the changelog, but the pointer to a clocksource is already
> part of the timekeeping external API via struct system_counterval_t.
> get_device_system_crosststamp for example expects a clocksource pointer
> but provides no way to get such a pointer.

That's a completely different beast, really.

The clocksource pointer is handed in by the caller and the core code
validates if the clocksource is the same as the current system clocksource
and not the other way round.

So there is no need for getting that pointer from the core code because the
caller knows already which clocksource needs to be active to make the whole
cross device timestamp correlation work. And in that case it's the callers
responsibility to ensure that the pointer is valid which is the case for
the current use cases.

>From your other reply:

> Why add a global id?  ARM can add it to archdata similar to how x86 has
> vclock_mode.  But I still think the right thing to do is to include the
> full system_counterval_t in the result of ktime_get_snapshot.  (More in
> a second, feel free to reply to the other email only).

No, the clocksource pointer is not going to be exposed as there is no
guarantee that it will be still around after the call returns.

It's not even guaranteed to be correct when the store happens in Wu's patch
simply because the store is done outside of the seqcount protected region.

Vs. arch data: arch data is an opaque struct, so you'd need to store a
pointer which has the same issue as the clocksource pointer itself.

If we want to convey information then it has to be in the generic part
of struct clocksource.

In fact we could even simplify the existing get_device_system_crosststamp()
use case by using the ID field.
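E.g. something like this (sketch, assuming struct system_counterval_t also
grows a matching id field):

    /* Validate by id instead of comparing clocksource pointers */
    if (system_counterval.cs_id == CSID_GENERIC ||
        system_counterval.cs_id != tk->tkr_mono.clock->id)
            return -ENODEV;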

Thanks,

tglx


Re: [PATCH v5 3/6] timekeeping: Add clocksource to system_time_snapshot

2019-10-15 Thread Thomas Gleixner
On Tue, 15 Oct 2019, Paolo Bonzini wrote:
> On 15/10/19 12:48, Jianyong Wu wrote:
> >  
> > 
> 
> Reviewed-by: Paolo Bonzini 

You're sure about having reviewed that in detail?

Thanks,

tglx


Re: [PATCH v5 3/6] timekeeping: Add clocksource to system_time_snapshot

2019-10-15 Thread Thomas Gleixner
On Tue, 15 Oct 2019, Jianyong Wu wrote:

> Sometimes, we need check current clocksource outside of
> timekeeping area. Add clocksource to system_time_snapshot then
> we can get clocksource as well as system time.

This changelog is telling absolutely nothing WHY anything outside of the
timekeeping core code needs access to the current clocksource. Neither does
it tell why it is safe to provide the pointer to random callers.

> +/*
> + * struct system_time_snapshot - simultaneous raw/real time capture with
> + *   counter value
> + * @sc:  Contains clocksource and clocksource counter value to produce
> + *   the system times
> + * @real:Realtime system time
> + * @raw: Monotonic raw system time
> + * @clock_was_set_seq:   The sequence number of clock was set events
> + * @cs_was_changed_seq:  The sequence number of clocksource change events
> + */
> +struct system_time_snapshot {
> + struct system_counterval_t sc;
> + ktime_t real;
> + ktime_t raw;
> + unsigned int    clock_was_set_seq;
> + u8  cs_was_changed_seq;
> +};
> +
>  /*
>   * Get cross timestamp between system clock and device clock
>   */
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 44b726bab4bd..66ff089605b3 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -983,7 +983,8 @@ void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
>   nsec_raw  = timekeeping_cycles_to_ns(&tk->tkr_raw, now);
>   } while (read_seqcount_retry(&tk_core.seq, seq));
>  
> - systime_snapshot->cycles = now;
> + systime_snapshot->sc.cycles = now;
> + systime_snapshot->sc.cs = tk->tkr_mono.clock;

The clock pointer can change right after the store, the underlying data can
be freed .

Looking at the rest of the patch set the actual usage site is:

> +   case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
> +   ktime_get_snapshot(&systime_snapshot);
> +   if (!is_arm_arch_counter(systime_snapshot.sc.cs))
> +   return kvm_psci_call(vcpu);

and that function does:

> +bool is_arm_arch_counter(void *cs)

void *? Type safety is overrated, right? The type is well known

+{
+   return (struct clocksource *)cs == &clocksource_counter;

That nonsensical typecast does not make up for that.

+}

So while the access to the pointer is actually safe, this is not going to
happen simply because you modify a generic interface in a way which will
lead the next developer to insane assumptions about the validity of that
pointer.

While the kernel is pretty lax in terms of isolation due to the nature of
the programming language, this does not justify to expose critical
internals of core code to random callers. Guess why most of the timekeeping
internals are carefully shielded from external access.

Something like the completely untested (not even compiled) patch below
gives you access to the information you need and allows to reuse the
mechanism for other purposes without adding is_$FOO_timer() all over the
place.

Thanks,

tglx

8<--
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -9,6 +9,7 @@
 #ifndef _LINUX_CLOCKSOURCE_H
 #define _LINUX_CLOCKSOURCE_H
 
+#include 
 #include 
 #include 
 #include 
@@ -49,6 +50,10 @@ struct module;
  * 400-499: Perfect
  * The ideal clocksource. A must-use where
  * available.
+ * @id:    Defaults to CSID_GENERIC. The id value is captured
+ * in certain snapshot functions to allow callers to
+ * validate the clocksource from which the snapshot was
+ * taken.
  * @read:  returns a cycle value, passes clocksource as argument
  * @enable:optional function to enable the clocksource
  * @disable:   optional function to disable the clocksource
@@ -91,6 +96,7 @@ struct clocksource {
const char *name;
struct list_head list;
int rating;
+   enum clocksource_ids id;
int (*enable)(struct clocksource *cs);
void (*disable)(struct clocksource *cs);
unsigned long flags;
--- /dev/null
+++ b/include/linux/clocksource_ids.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_CLOCKSOURCE_IDS_H
+#define _LINUX_CLOCKSOURCE_IDS_H
+
+/* Enum to give clocksources a unique identifier */
+enum clocksource_ids {
+   CSID_GENERIC= 0,
+   CSID_ARM_ARCH_COUNTER,
+   CSID_MAX,
+};
+
+#endif
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -2,6 +2,7 @@
 #ifndef _LINUX_TIMEKEEPING_H
 #define _LINUX_TIMEKEEPING_H
 
+#include 
 #include 
 
 /* Included from linux/ktime.h */
@@ -228,15 +229,17 @@ extern void timekeeping_inject_sleeptime
  * @cycles:Clocksource counter value to produce the system times
  * @real:  Realtime system time
  * 

Re: KVM Arm64 and Linux-RT issues

2019-07-26 Thread Thomas Gleixner
On Wed, 24 Jul 2019, Marc Zyngier wrote:
> On 23/07/2019 18:58, Julien Grall wrote:
> It really feels like a change in hrtimer_cancel semantics. From what I
> understand, this is used to avoid racing against the softirq, but boy it
> breaks things.
> 
> If this cannot be avoided, this means we can't cancel the background
> timer (which is used to emulate the vcpu timer while it is blocked
> waiting for an interrupt), then we must move this canceling to the point
> where the vcpu is unblocked (instead of scheduled), which may have some
> side effects -- I'll have a look.
> 
> But that's not the only problem: We also have hrtimers used to emulate
> timers while the vcpu is running, and these timers are canceled in
> kvm_timer_vcpu_put(), which is also called from a preempt notifier.
> Unfortunately, I don't have a reasonable solution for that (other than
> putting this hrtimer_cancel in a workqueue and start chasing the
> resulting races).

The fix is simple. See below. We'll add that to the next RT release. That
will take a while as I'm busy with posting RT stuff for upstream :)

Thanks,

tglx

8<
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -80,7 +80,7 @@ static inline bool userspace_irqchip(str
 static void soft_timer_start(struct hrtimer *hrt, u64 ns)
 {
hrtimer_start(hrt, ktime_add_ns(ktime_get(), ns),
- HRTIMER_MODE_ABS);
+ HRTIMER_MODE_ABS_HARD);
 }
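Note: depending on the hrtimer flavour in that tree, the init side
presumably wants the matching mode as well, i.e. something like (untested,
field name approximate):

   hrtimer_init(&timer->bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);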
 


Re: [PATCH V4 2/15] KVM/MMU: Add tlb flush with range helper function

2018-10-14 Thread Thomas Gleixner
On Sun, 14 Oct 2018, Liran Alon wrote:
> > On 13 Oct 2018, at 17:53, lantianyu1...@gmail.com wrote:
> > 
> > From: Lan Tianyu 
> > 
> > This patch is to add wrapper functions for tlb_remote_flush_with_range
> > callback.
> > 
> > Signed-off-by: Lan Tianyu 
> > ---
> > Change sicne V3:
> >   Remove code of updating "tlbs_dirty"
> > Change since V2:
> >   Fix comment in the kvm_flush_remote_tlbs_with_range()
> > ---
> > arch/x86/kvm/mmu.c | 40 
> > 1 file changed, 40 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> > index c73d9f650de7..ff656d85903a 100644
> > --- a/arch/x86/kvm/mmu.c
> > +++ b/arch/x86/kvm/mmu.c
> > @@ -264,6 +264,46 @@ static void mmu_spte_set(u64 *sptep, u64 spte);
> > static union kvm_mmu_page_role
> > kvm_mmu_calc_root_page_role(struct kvm_vcpu *vcpu);
> > 
> > +
> > +static inline bool kvm_available_flush_tlb_with_range(void)
> > +{
> > +   return kvm_x86_ops->tlb_remote_flush_with_range;
> > +}
> 
> Seems that kvm_available_flush_tlb_with_range() is not used in this patch…

What's wrong with that? 

It provides the implementation and later patches make use of it. It's a
sensible way to split patches into small, self contained entities.

Thanks,

tglx


Re: [PATCH v4 12/20] genirq: Document vcpu_info usage for percpu_devid interrupts

2017-10-21 Thread Thomas Gleixner
On Sat, 21 Oct 2017, Christoffer Dall wrote:

> On Fri, Oct 20, 2017 at 01:56:16PM +0200, Thomas Gleixner wrote:
> > On Fri, 20 Oct 2017, Christoffer Dall wrote:
> > 
> > Please Cc lkml when changing genirq core code.
> > 
> 
> ok, will do so in the future.
> 
> > > It is currently unclear how to set the VCPU affinity for a percpu_devid
> > > interrupt , since the Linux irq_data structure describes the state for
> > > multiple interrupts, one for each physical CPU on the system.  Since
> > > each such interrupt can be associated with different VCPUs or none at
> > > all, associating a single VCPU state with such an interrupt does not
> > > capture the necessary semantics.
> > > 
> > > The implementers of irq_set_affinity are the Intel and AMD IOMMUs, and
> > > the ARM GIC irqchip.  The Intel and AMD callers do not appear to use
> > > percpu_devid interrupts, and the ARM GIC implementation only checks the
> > > pointer against NULL vs. non-NULL.
> > > 
> > > Therefore, simply update the function documentation to explain the
> > > expected use in the context of percpu_devid interrupts, allowing future
> > > changes or additions to irqchip implementers to do the right thing.
> > > 
> > > This allows us to set the VCPU affinity for the virtual timer interrupt
> > > in KVM/ARM, which is a percpu_devid (PPI) interrupt.
> > 
> > I have a hard time understanding how adding the comment allows you to set
> > VCPU affinity 
> > 
> 
> Obviously everything already works without adding the comment, however
> it is not entirely clear how to use the vcpu_info for percpu_devid
> interrupts, and this patch aims to make that clear, as it is the first
> user of irq_set_vcpu_affinity for a percpu_devid interrupt which
> actually uses the vcpu_info pointer.
> 
> Would you rather that we did not update the documentation or what do you
> suggest?

Documentation is welcome of course. I just want a changelog which is not
misleading. i.e. it says:

   Update the function documentation .

   This allow us to set 

Which reads like: Updating the documentation allows us to set ...

See?

Thanks,

tglx




Re: [PATCH v4 12/20] genirq: Document vcpu_info usage for percpu_devid interrupts

2017-10-20 Thread Thomas Gleixner
On Fri, 20 Oct 2017, Christoffer Dall wrote:

Please Cc lkml when changing genirq core code.

> It is currently unclear how to set the VCPU affinity for a percpu_devid
> interrupt , since the Linux irq_data structure describes the state for
> multiple interrupts, one for each physical CPU on the system.  Since
> each such interrupt can be associated with different VCPUs or none at
> all, associating a single VCPU state with such an interrupt does not
> capture the necessary semantics.
> 
> The implementers of irq_set_affinity are the Intel and AMD IOMMUs, and
> the ARM GIC irqchip.  The Intel and AMD callers do not appear to use
> percpu_devid interrupts, and the ARM GIC implementation only checks the
> pointer against NULL vs. non-NULL.
> 
> Therefore, simply update the function documentation to explain the
> expected use in the context of percpu_devid interrupts, allowing future
> changes or additions to irqchip implementers to do the right thing.
> 
> This allows us to set the VCPU affinity for the virtual timer interrupt
> in KVM/ARM, which is a percpu_devid (PPI) interrupt.

I have a hard time understanding how adding the comment allows you to set
VCPU affinity 

Thanks,

tglx


Re: [PATCH v2 01/52] genirq: Let irq_set_vcpu_affinity() iterate over hierarchy

2017-07-04 Thread Thomas Gleixner


On Wed, 28 Jun 2017, Marc Zyngier wrote:

> When assigning an interrupt to a vcpu, it is not unlikely that
> the level of the hierarchy implementing irq_set_vcpu_affinity
> is not the top level (think a generic MSI domain on top of a
> virtualization aware interrupt controller).
> 
> In such a case, let's iterate over the hierarchy until we find
> an irqchip implementing it.
> 
> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>

Reviewed-by: Thomas Gleixner <t...@linutronix.de>


Re: [RFC PATCH 00/33] irqchip: Core support for GICv4

2017-02-13 Thread Thomas Gleixner
Marc,

On Tue, 17 Jan 2017, Marc Zyngier wrote:

> This series implements the core support for GICv4. And despite its
> size, it does exactly *nothing*. What it adds is an infrastructure
> that a hypervisor (KVM) can use to route VLPIs to a guest.

That's a very well done patch set and it was a pleasure to review!

Thanks,

tglx


Re: [RFC PATCH 33/33] irqchip/gic-v4: Enable low-level GICv4 operations

2017-02-13 Thread Thomas Gleixner
On Tue, 17 Jan 2017, Marc Zyngier wrote:

> Get the show on the road...
> 
> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>

Reviewed-by: Thomas Gleixner <t...@linutronix.de>


Re: [RFC PATCH 32/33] irqchip/gic-v4: Add some basic documentation

2017-02-13 Thread Thomas Gleixner
On Tue, 17 Jan 2017, Marc Zyngier wrote:

> Do a braindump of the way things are supposed to work.
> 
> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>

Nice !

Reviewed-by: Thomas Gleixner <t...@linutronix.de>


Re: [RFC PATCH 31/33] irqchip/gic-v4: Add VLPI configuration interface

2017-02-13 Thread Thomas Gleixner
On Tue, 17 Jan 2017, Marc Zyngier wrote:
> Add the required interfaces to map, unmap and update a VLPI.
> 
> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>

Reviewed-by: Thomas Gleixner <t...@linutronix.de>


Re: [RFC PATCH 30/33] irqchip/gic-v4: Add VPE command interface

2017-02-13 Thread Thomas Gleixner
On Tue, 17 Jan 2017, Marc Zyngier wrote:

> Add the required interfaces to schedule a VPE and perform a
> VINVALL command.
> 
> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>

Reviewed-by: Thomas Gleixner <t...@linutronix.de>


Re: [RFC PATCH 29/33] irqchip/gic-v4: Add per-VM VPE domain creation

2017-02-13 Thread Thomas Gleixner
On Tue, 17 Jan 2017, Marc Zyngier wrote:

> When creating a VM, it is very convenient to have an irq domain
> containing all the doorbell interrupts associated with that VM
> (each interrupt representing a VPE).
> 
> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>

Reviewed-by: Thomas Gleixner <t...@linutronix.de>


Re: [RFC PATCH 28/33] irqchip/gic-v3-its: Support VPE doorbell invalidation even when !DirectLPI

2017-02-13 Thread Thomas Gleixner
On Tue, 17 Jan 2017, Marc Zyngier wrote:
> When we don't have the DirectLPI feature, we must work around the
> architecture shortcomings to be able to perform the required
> invalidation.
> 
> For this, we create a fake device whose sole purpose is to
> provide a way to issue a map/inv/unmap sequence (and the corresponding
> sync operations). That's 6 commands and a full serialization point
> to be able to do this.
> 
> You just have to hope the hypervisor won't do that too often...
> 
> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>

Reviewed-by: Thomas Gleixner <t...@linutronix.de>


Re: [RFC PATCH 27/33] irqchip/gic-v3-its: Add VPE interrupt masking

2017-02-13 Thread Thomas Gleixner
On Tue, 17 Jan 2017, Marc Zyngier wrote:

> When masking/unmasking a doorbell interrupt, it is necessary
> to issue an invalidation to the corresponding redistributor.
> We use the DirectLPI feature by writing directly to the corresponding
> redistributor.
> 
> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>

Reviewed-by: Thomas Gleixner <t...@linutronix.de>


Re: [RFC PATCH 26/33] irqchip/gic-v3-its: Add VPE affinity changes

2017-02-13 Thread Thomas Gleixner
On Tue, 17 Jan 2017, Marc Zyngier wrote:

> When we're about to run a vcpu, it is crucial that the redistributor
> associated with the physical CPU is being told about the new residency.
> 
> This is abstracted by hijacking the irq_set_affinity method for the
> doorbell interrupt associated with the VPE. It is expected that the
> hypervisor will call this method before scheduling the VPE.
> 
> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>

Reviewed-by: Thomas Gleixner <t...@linutronix.de>


Re: [RFC PATCH 24/33] irqchip/gic-v3-its: Add VPE scheduling

2017-02-13 Thread Thomas Gleixner
On Tue, 17 Jan 2017, Marc Zyngier wrote:
> +static int its_vpe_set_vcpu_affinity(struct irq_data *d, void *vcpu_info)
> +{
> + struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
> + struct its_cmd_info *info = vcpu_info;
> + u64 val;
> +
> + switch (info->cmd_type) {
> + case SCHEDULE_VPE:
> + {
> + void * __iomem vlpi_base = gic_data_rdist_vlpi_base();

Moving the vlpi_base line to the top of the function spares you lines and
brackets. No point in having the same thing twice.

Thanks,

tglx


Re: [RFC PATCH 25/33] irqchip/gic-v3-its: Add VPE invalidation hook

2017-02-13 Thread Thomas Gleixner
On Tue, 17 Jan 2017, Marc Zyngier wrote:

> When a guest issues an INVALL command targeting a collection, it must
> be translated into a VINVALL for the VPE that has this collection.
> 
> This patch implements a hook that offers this functionality to the
> hypervisor.
> 
> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>

Reviewed-by: Thomas Gleixner <t...@linutronix.de>


Re: [RFC PATCH 23/33] irqchip/gic-v3-its: Add VPENDBASER/VPROPBASER accessors

2017-02-13 Thread Thomas Gleixner
On Tue, 17 Jan 2017, Marc Zyngier wrote:

> V{PEND,PROP}BASER being 64bit registers, they need some ad-hoc
> accessors on 32bit, specially given that VPENDBASER contains
> a Valid bit, making the access a bit convoluted.
> 
> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>

Reviewed-by: Thomas Gleixner <t...@linutronix.de>


Re: [RFC PATCH 22/33] irqchip/gic-v3-its: Add VPE irq domain [de]activation

2017-02-13 Thread Thomas Gleixner
On Tue, 17 Jan 2017, Marc Zyngier wrote:

> On activation, a VPE is mapped using the VMAPP command, followed
> by a VINVALL for a good measure. On deactivation, the VPE is
> simply unmapped.
> 
> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>

Reviewed-by: Thomas Gleixner <t...@linutronix.de>


Re: [RFC PATCH 21/33] irqchip/gic-v3-its: Add VPE irq domain allocation/teardown

2017-02-13 Thread Thomas Gleixner
On Tue, 17 Jan 2017, Marc Zyngier wrote:
> +static int its_vpe_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
> + unsigned int nr_irqs, void *args)
> +{
> + msi_alloc_info_t *info = args;
> + struct its_vpe **vpes = info->scratchpad[0].ptr;
> + struct its_vm *vm = vpes[0]->its_vm;
> + unsigned long *bitmap;
> + struct page *vprop_page;
> + int base, nr_ids, i, err = 0;
> +
> + bitmap = its_lpi_alloc_chunks(nr_irqs, &base, &nr_ids);
> + if (!bitmap)
> + return -ENOMEM;
> +
> + if (nr_ids < nr_irqs) {
> + its_lpi_free_chunks(bitmap, base, nr_ids);
> + return -ENOMEM;
> + }
> +
> + vprop_page = its_allocate_prop_table(GFP_KERNEL);
> + if (!vprop_page) {
> + its_lpi_free_chunks(bitmap, base, nr_ids);
> + return ENOMEM;
> + }
> +
> + for (i = 0; i < nr_irqs; i++) {
> + vpes[i]->vpe_db_lpi = base + i;
> + err = its_vpe_init(vpes[i]);
> + if (err)
> + break;
> + irq_domain_set_hwirq_and_chip(domain,
> +   virq + i, vpes[i]->vpe_db_lpi,
> +   &its_vpe_irq_chip, vpes[i]);
> + set_bit(i, bitmap);
> + }
> +
> + if (err) {
> + while (--i >= 0) {
> + its_vpe_teardown(vpes[i]);
> + clear_bit(i, bitmap);
> + }
> +
> + its_lpi_free_chunks(bitmap, base, nr_ids);
> + its_free_prop_table(vprop_page);

Couldn't you just do:

if (err) {
if (i > 0)
its_vpe_irq_domain_free(domain, virq, i - 1)

Hmm?

Thanks,

tglx


Re: [RFC PATCH 20/33] irqchip/gic-v3-its: Add VPE domain infrastructure

2017-02-13 Thread Thomas Gleixner
On Tue, 17 Jan 2017, Marc Zyngier wrote:
> @@ -2266,6 +2294,8 @@ int __init its_init(struct fwnode_handle *handle, 
> struct rdists *rdists,
>   struct irq_domain *parent_domain)
>  {
>   struct device_node *of_node;
> + struct its_node *its;
> + bool has_v4 = false;
>  
>   its_parent = parent_domain;
>   of_node = to_of_node(handle);
> @@ -2283,5 +2313,11 @@ int __init its_init(struct fwnode_handle *handle, 
> struct rdists *rdists,
>   its_alloc_lpi_tables();
>   its_lpi_init(rdists->id_bits);
>  
> + list_for_each_entry(its, &its_nodes, entry)
> + has_v4 |= its->is_v4;

Hmm, can you have mixed version its nodes? If not, you probably should have
some sanity check (not necessarily here). If yes, then how does that work?

Thanks,

tglx


Re: [RFC PATCH 19/33] irqchip/gic-v3-its: Add VLPI configuration handling

2017-02-13 Thread Thomas Gleixner
On Tue, 17 Jan 2017, Marc Zyngier wrote:

> When a VLPI is reconfigured (enabled, disabled, change in priority),
> the full configuration byte must be written, and the caches invalidated.
> 
> Also, when using the irq_mask/irq_unmask methods, it is necessary
> to disable the doorbell for that particular interrupt (by mapping it
> to 1023) on top of clearing the Enable bit.
> 
> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>

Aside of the switch/case part:

Reviewed-by: Thomas Gleixner <t...@linutronix.de>


Re: [RFC PATCH 17/33] irqchip/gic-v3-its: Add VLPI configuration hook

2017-02-13 Thread Thomas Gleixner
On Tue, 17 Jan 2017, Marc Zyngier wrote:
> +static int its_irq_set_vcpu_affinity(struct irq_data *d, void *vcpu_info)
> +{
> + struct its_device *its_dev = irq_data_get_irq_chip_data(d);
> + struct its_cmd_info *info = vcpu_info;
> + u32 event = its_get_event_id(d);
> +
> + /* Need a v4 ITS */
> + if (!its_dev->its->is_v4 || !info)
> + return -EINVAL;
> +
> + switch (info->cmd_type) {
> + case MAP_VLPI:
> + {

Looking at the later patches which add functionality here, I'd rather avoid
these massive case { } blocks and stick the functionality into separate
helper functions.
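i.e. roughly (helper names invented here):

     switch (info->cmd_type) {
     case MAP_VLPI:
             return its_vlpi_map(d, info);
     /* ... one small helper per command type ... */
     default:
             return -EINVAL;
     }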

Thanks,

tglx


Re: [RFC PATCH 16/33] irqchip/gic-v3-its: Add GICv4 ITS command definitions

2017-02-13 Thread Thomas Gleixner
On Tue, 17 Jan 2017, Marc Zyngier wrote:

> Add the new GICv4 ITS command definitions, most of them being
> defined in terms of their physical counterparts.
> 
> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>

Reviewed-by: Thomas Gleixner <t...@linutronix.de>


Re: [RFC PATCH 15/33] irqchip/gic-v4: Add management structure definitions

2017-02-13 Thread Thomas Gleixner
On Tue, 17 Jan 2017, Marc Zyngier wrote:

> Add a bunch of GICv4-specific data structures that will get used in
> subsequent patches.
> 
> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>

Reviewed-by: Thomas Gleixner <t...@linutronix.de>


Re: [RFC PATCH 14/33] irqchip/gic-v3-its: Generalize LPI configuration

2017-02-13 Thread Thomas Gleixner
On Tue, 17 Jan 2017, Marc Zyngier wrote:

> We are going to need to change a bit more than just the enable
> bit in the LPI property table in the future. So let's change the
> LPI configuration function to take a set of bits to be cleared,
> and a set of bits to be set.
> 
> This way, we'll be able to use it when a guest updates an LPI
> property (priority, for example).
> 
> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>

Reviewed-by: Thomas Gleixner <t...@linutronix.de>


Re: [RFC PATCH 13/33] irqchip/gic-v3-its: Generalize device table allocation

2017-02-13 Thread Thomas Gleixner
On Tue, 17 Jan 2017, Marc Zyngier wrote:

> As we want to use 2-level tables for VCPUs, let's hack the device
> table allocator in order to make it slightly more generic. It
> will get reused in subsequent patches.
> 
> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>

Reviewed-by: Thomas Gleixner <t...@linutronix.de>
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [RFC PATCH 12/33] irqchip/gic-v3-its: Rework LPI freeing

2017-02-13 Thread Thomas Gleixner
On Tue, 17 Jan 2017, Marc Zyngier wrote:

> Rework LPI deallocation so that it can be reused by the v4 support
> code.
> 
> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>

Reviewed-by: Thomas Gleixner <t...@linutronix.de>
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [RFC PATCH 11/33] irqchip/gic-v3-its: Split out pending table allocation

2017-02-13 Thread Thomas Gleixner
On Tue, 17 Jan 2017, Marc Zyngier wrote:

> Just as for the property table, let's move the pending table
> allocation to a separate function.
> 
> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>

Reviewed-by: Thomas Gleixner <t...@linutronix.de>
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [RFC PATCH 10/33] irqchip/gic-v4-its: Allow use of indirect VCPU tables

2017-02-13 Thread Thomas Gleixner
On Tue, 17 Jan 2017, Marc Zyngier wrote:

> The VCPU tables can be quite sparse as well, and it makes sense
> to use indirect tables as well if possible.
> 
> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>

Reviewed-by: Thomas Gleixner <t...@linutronix.de>
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [RFC PATCH 08/33] irqchip/gic-v3-its: Implement irq_set_irqchip_state for pending state

2017-02-13 Thread Thomas Gleixner
On Tue, 17 Jan 2017, Marc Zyngier wrote:

> Allow the pending state of an LPI to be set or cleared via
> irq_set_irqchip_state.
> 
> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>

Reviewed-by: Thomas Gleixner <t...@linutronix.de>


Re: [RFC PATCH 07/33] irqchip/gic-v3-its: Macro-ize its_send_single_command

2017-02-13 Thread Thomas Gleixner
On Tue, 17 Jan 2017, Marc Zyngier wrote:
> +
> +static __its_send_single_cmd(its_send_single_command, its_cmd_builder_t,
> +  struct its_collection, its_build_sync_cmd)

I'm fine with that macro magic, but the above is very unintuitive as it
looks like a normal function.

The way this is done in other places is to have:

#define BUILD_CMD_FUNC(name,) \


and then have

BUILD_CMD_FUNC(its_send_single_command, );

That makes it entirely obvious that this is macro magic.
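In this case that would be something like (same arguments as the quoted
invocation):

     BUILD_SINGLE_CMD_FUNC(its_send_single_command, its_cmd_builder_t,
                           struct its_collection, its_build_sync_cmd)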

Thanks,

tglx


Re: [RFC PATCH 06/33] irqchip/gic-v3-its: Add probing for VLPI properties

2017-02-13 Thread Thomas Gleixner
On Tue, 17 Jan 2017, Marc Zyngier wrote:
> + typer = gic_read_typer(its_base + GITS_TYPER);
>   its->base = its_base;
>   its->phys_base = res->start;
> - its->ite_size = ((gic_read_typer(its_base + GITS_TYPER) >> 4) & 0xf) + 1;
> + its->ite_size = ((typer >> 4) & 0xf) + 1;
> + its->is_v4 = !!(typer & GITS_TYPER_VLPIS);
> + if (its->is_v4 && !(typer & GITS_TYPER_VMOVP)) {
> + int its_number;
> +
> + its_number = find_first_zero_bit(_list_map, 16);

s/16/ITS_MAX_ENTITIES or whatever.

> + if (its_number >= 16) {
> + pr_err("ITS@%pa: No ITSList entry available!\n",
> +   &res->start);
> + err = -EINVAL;
> + goto out_free_its;
> + }
> +
> + ctlr = readl_relaxed(its_base + GITS_CTLR);
> + ctlr &= ~GITS_CTLR_ITS_NUMBER;
> + ctlr |= its_number << GITS_CTLR_ITS_NUMBER_SHIFT;
> + writel_relaxed(ctlr, its_base + GITS_CTLR);
> + ctlr = readl_relaxed(its_base + GITS_CTLR);
> + if ((ctlr & GITS_CTLR_ITS_NUMBER) != (its_number << GITS_CTLR_ITS_NUMBER_SHIFT)) {
> + its_number = ctlr & GITS_CTLR_ITS_NUMBER;
> + its_number >>= GITS_CTLR_ITS_NUMBER_SHIFT;
> + }
> +
> + if (test_and_set_bit(its_number, _list_map)) {

You just established above that the bit is not set. I assume that this is
code which has no concurrency concerns

Thanks,

tglx


Re: [RFC PATCH 04/33] irqchip/gic-v3-its: Move LPI definitions around

2017-02-13 Thread Thomas Gleixner
On Tue, 17 Jan 2017, Marc Zyngier wrote:

> The various LPI definitions are in the middle of the code, and
> would be better placed at the beginning, given that we're going
> to use some of them much earlier.
> 
> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>

Reviewed-by: Thomas Gleixner <t...@linutronix.de>


Re: [RFC PATCH 03/33] irqchip/gic-v3-its: Refactor command encoding

2017-02-13 Thread Thomas Gleixner
On Tue, 17 Jan 2017, Marc Zyngier wrote:
> +static void its_mask_encode(u64 *raw_cmd, u64 val, int h, int l)

I'd rather name h/l in a way which makes it clear that they are describing
a bit range. msb/lsb perhaps.

> +{
> + u64 mask = GENMASK_ULL(h, l);

New line missing here.

> + *raw_cmd &= ~mask;
> + *raw_cmd |= (val << l) & mask;
> +}
> +
>  static void its_encode_cmd(struct its_cmd_block *cmd, u8 cmd_nr)
>  {
> - cmd->raw_cmd[0] &= ~0xffULL;
> - cmd->raw_cmd[0] |= cmd_nr;
> + its_mask_encode(&cmd->raw_cmd[0], cmd_nr, 7, 0);

I assume that the manual provides a name for these bit ranges. Wouldn't it
be helpful to use them in these functions?

Looks good otherwise.
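A self-contained sketch of the helper with the msb/lsb naming; GENMASK_ULL is
re-implemented here only so the example compiles on its own (in the kernel it
comes from the bitops headers):

#include <stdint.h>
#include <stdio.h>

#define GENMASK_ULL(msb, lsb) \
	(((~0ULL) >> (63 - (msb))) & ~((1ULL << (lsb)) - 1))

static void its_mask_encode(uint64_t *raw_cmd, uint64_t val, int msb, int lsb)
{
	uint64_t mask = GENMASK_ULL(msb, lsb);

	*raw_cmd &= ~mask;
	*raw_cmd |= (val << lsb) & mask;
}

int main(void)
{
	uint64_t cmd = ~0ULL;

	/* encode command number 0x08 into bits [7:0] */
	its_mask_encode(&cmd, 0x08, 7, 0);
	printf("raw_cmd = 0x%016llx\n", (unsigned long long)cmd);
	return 0;
}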

Thanks,

tglx


Re: [RFC PATCH 02/33] irqchip/gic-v3: Add VLPI/DirectLPI discovery

2017-02-13 Thread Thomas Gleixner
On Tue, 17 Jan 2017, Marc Zyngier wrote:

> Add helper functions that probe for VLPI and DirectLPI properties.
> 
> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>

Reviewed-by: Thomas Gleixner <t...@linutronix.de>


Re: [RFC PATCH 01/33] irqchip/gic-v3: Add redistributor iterator

2017-02-13 Thread Thomas Gleixner
On Tue, 17 Jan 2017, Marc Zyngier wrote:

> In order to discover the VLPI properties, we need to iterate over
> the redistributor regions. As we already have code that does this,
> let's factor it out and make it slightly more generic.
> 
> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>

Reviewed-by: Thomas Gleixner <t...@linutronix.de>


Re: [PATCH v11 10/10] genirq/msi: use the MSI doorbell's IOVA when requested

2016-07-26 Thread Thomas Gleixner
Eric,

On Mon, 25 Jul 2016, Auger Eric wrote:
> On 20/07/2016 11:09, Thomas Gleixner wrote:
> > On Tue, 19 Jul 2016, Eric Auger wrote:
> >> @@ -63,10 +63,18 @@ static int msi_compose(struct irq_data *irq_data,
> >>  {
> >>int ret = 0;
> >>  
> >> -  if (erase)
> >> +  if (erase) {
> >>memset(msg, 0, sizeof(*msg));
> >> -  else
> >> +  } else {
> >> +  struct device *dev;
> >> +
> >>ret = irq_chip_compose_msi_msg(irq_data, msg);
> >> +  if (ret)
> >> +  return ret;
> >> +
> >> +  dev = msi_desc_to_dev(irq_data_get_msi_desc(irq_data));
> >> +  WARN_ON(iommu_msi_msg_pa_to_va(dev, msg));
> > 
> > What the heck is this call doing? And why is there only a WARN_ON and not a
> > proper error return code handling?
> 
> iommu_msi_msg_pa_to_va is part of the new iommu-msi API introduced in PART I
> of this series. This helper function detects whether the physical address
> found in the MSI message has a corresponding allocated IOVA. This happens if
> the MSI doorbell is accessed through an IOMMU and this IOMMU does not bypass
> the MSI
> addresses (ARM case). Allocation of this IOVA was performed in the previous
> patch.
>
> So, if this is the case, the physical address is swapped with the IOVA
> address. That way the PCIe device will send the MSI with this IOVA and
> the address will be translated by the IOMMU into the target MSI doorbell PA.
> 
> Hope this clarifies

No, it does not. You are explaining at great length what that function is
doing, but you are not explaining WHY you don't do proper return code
handling and just do a WARN_ON() and happily proceed. If that function fails
then the interrupt will not be functional, so WHY on earth are you continuing?

Thanks,

tglx


Re: [PATCH v11 09/10] genirq/msi: map/unmap the MSI doorbells on msi_domain_alloc/free_irqs

2016-07-26 Thread Thomas Gleixner
Eric,

On Mon, 25 Jul 2016, Auger Eric wrote:
> On 20/07/2016 11:04, Thomas Gleixner wrote:
> > On Tue, 19 Jul 2016, Eric Auger wrote:
> >> +  if (ret) {
> >> +  for (; i >= 0; i--) {
> >> +  struct irq_data *d = irq_get_irq_data(virq + i);
> >> +
> >> +  msi_handle_doorbell_mappings(d, false);
> >> +  }
> >> +  irq_domain_free_irqs(virq, desc->nvec_used);
> >> +  desc->irq = 0;
> >> +  goto error;
> > 
> > How is that supposed to work? You clear desc->irq and then you call
> > ops->handle_error.
>
> if I don't clear the desc->irq I enter an infinite loop in
> pci_enable_msix_range.
>
> This happens because msix_capability_init and pcie_enable_msix return 1.
> In msix_capability_init, at out_avail: we enumerate the msi_desc entries
> that have a non-zero irq, hence the return value of 1.
>
> Currently the only handle_error ops I found, pci_msi_domain_handle_error,
> does not use the irq field, so it works, although that is questionable.

The logic here is: If the allocation does not succeed for the requested number
of interrupts, we tell the caller how many interrupts we were able to set up.
So the caller can decide what to do.

In your case you don't want to have a partial allocation, so instead of
playing silly games with desc->irq you should add a flag which tells the PCI
code that you are not interested in a partial allocation and that it should
return an error code instead.

Something like PCI_DEV_FLAGS_MSI_NO_PARTIAL_ALLOC should do the trick.
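A self-contained sketch of the suggested behaviour; PCI_DEV_FLAGS_MSI_NO_PARTIAL_ALLOC
is the hypothetical flag proposed above (it does not exist in the tree), and the
structure and helper below are illustrative only:

#include <errno.h>
#include <stdio.h>

#define PCI_DEV_FLAGS_MSI_NO_PARTIAL_ALLOC	(1u << 0)

struct fake_pci_dev {
	unsigned int dev_flags;
};

/*
 * Mimics the MSI-X setup return convention: 0 on full success, > 0 the
 * number of vectors that could be set up, < 0 an error code.
 */
static int msix_alloc_result(struct fake_pci_dev *dev, int requested, int allocated)
{
	if (allocated == requested)
		return 0;
	if (dev->dev_flags & PCI_DEV_FLAGS_MSI_NO_PARTIAL_ALLOC)
		return -ENOSPC;		/* all or nothing */
	return allocated;		/* let the caller decide */
}

int main(void)
{
	struct fake_pci_dev dev = { .dev_flags = PCI_DEV_FLAGS_MSI_NO_PARTIAL_ALLOC };

	printf("%d\n", msix_alloc_result(&dev, 4, 1));	/* -ENOSPC */
	dev.dev_flags = 0;
	printf("%d\n", msix_alloc_result(&dev, 4, 1));	/* 1 */
	return 0;
}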

> As for the irq_domain_free_irqs I think I can remove it since it is handled later.

Not only the free_irqs(). You should let the teardown function handle
everything including your doorbell mapping teardown. It's nothing special and
free_msi_irqs() at the end of msix_capability_init() will take care of it.

Thanks,

tglx


Re: [PATCH v11 06/10] genirq/msi-doorbell: msi_doorbell_safe

2016-07-22 Thread Thomas Gleixner
On Thu, 21 Jul 2016, Auger Eric wrote:
> On 20/07/2016 10:12, Thomas Gleixner wrote:
> > On Tue, 19 Jul 2016, Eric Auger wrote:
> >> +bool msi_doorbell_safe(void)
> >> +{
> >> +  struct irqchip_doorbell *db;
> >> +  bool irq_remapping = true;
> >> +
> >> +  mutex_lock(&irqchip_doorbell_mutex);
> >> +  list_for_each_entry(db, &irqchip_doorbell_list, next) {
> >> +  irq_remapping &= db->info.irq_remapping;
> > 
> > db->info.irq_remapping is set in msi_doorbell_register(). So you can keep 
> > book
> > about that there. No need to iterate here.
> Yes, it makes sense to store the info at registration time. Currently this
> function is not in any fast path, but that's cleaner from a general
> perspective. I will need to do such an iteration at un-registration though.

Two simple counters should be sufficient.

  nr_registered_bells;
  nr_remapped_bells;
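A self-contained sketch of that bookkeeping; the names mirror the patch but
everything below is illustrative only:

#include <stdbool.h>
#include <stdio.h>

static unsigned int nr_registered_bells;
static unsigned int nr_remapped_bells;

/* called under the doorbell mutex in the real code */
static void doorbell_register(bool irq_remapping)
{
	nr_registered_bells++;
	if (irq_remapping)
		nr_remapped_bells++;
}

static void doorbell_unregister(bool irq_remapping)
{
	nr_registered_bells--;
	if (irq_remapping)
		nr_remapped_bells--;
}

/* msi_doorbell_safe() then degenerates to a comparison */
static bool msi_doorbell_safe(void)
{
	return nr_registered_bells == nr_remapped_bells;
}

int main(void)
{
	doorbell_register(true);
	doorbell_register(false);
	printf("safe: %d\n", msi_doorbell_safe());	/* 0 */
	doorbell_unregister(false);
	printf("safe: %d\n", msi_doorbell_safe());	/* 1 */
	return 0;
}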



Thanks,

tglx


Re: [PATCH v11 10/10] genirq/msi: use the MSI doorbell's IOVA when requested

2016-07-20 Thread Thomas Gleixner
On Tue, 19 Jul 2016, Eric Auger wrote:

First of all - valid for all patches:

Subject: sys/subsys: Sentence starts with an uppercase letter

Now for this particular one:

genirq/msi: use the MSI doorbell's IOVA when requested

> On MSI message composition we now use the MSI doorbell's IOVA in
> place of the doorbell's PA in case the device is upstream to an
> IOMMU that requires MSI addresses to be mapped. The doorbell's
> allocation and mapping happened on an early stage (pci_enable_msi).

This changelog is completely useless. At least I cannot figure out what that
patch actually does. And the implementation is not self explaining either.
 
> @@ -63,10 +63,18 @@ static int msi_compose(struct irq_data *irq_data,
>  {
>   int ret = 0;
>  
> - if (erase)
> + if (erase) {
>   memset(msg, 0, sizeof(*msg));
> - else
> + } else {
> + struct device *dev;
> +
>   ret = irq_chip_compose_msi_msg(irq_data, msg);
> + if (ret)
> + return ret;
> +
> + dev = msi_desc_to_dev(irq_data_get_msi_desc(irq_data));
> + WARN_ON(iommu_msi_msg_pa_to_va(dev, msg));

What the heck is this call doing? And why is there only a WARN_ON and not a
proper error return code handling?

Thanks,

tglx


Re: [PATCH v11 09/10] genirq/msi: map/unmap the MSI doorbells on msi_domain_alloc/free_irqs

2016-07-20 Thread Thomas Gleixner
On Tue, 19 Jul 2016, Eric Auger wrote:
>  /**
> + * msi_handle_doorbell_mappings: in case the irq data corresponds to an
> + * MSI that requires iommu mapping, traverse the irq domain hierarchy
> + * to retrieve the doorbells to handle and iommu_map/unmap them according
> + * to @map boolean.
> + *
> + * @data: irq data handle
> + * @map: mapping if true, unmapping if false
> + */


Please run that through the kernel doc generator. It does not work that way.

The format is:

/**
 * function_name - Short function description
 * @arg1:   Description of arg1
 * @argument2:  Description of argument2
 *
 * Long explanation including documentation of the return values.
 */

> +static int msi_handle_doorbell_mappings(struct irq_data *data, bool map)
> +{
> + const struct irq_chip_msi_doorbell_info *dbinfo;
> + struct iommu_domain *domain;
> + struct irq_chip *chip;
> + struct device *dev;
> + dma_addr_t iova;
> + int ret = 0, cpu;
> +
> + while (data) {
> + dev = msi_desc_to_dev(irq_data_get_msi_desc(data));
> + domain = iommu_msi_domain(dev);
> + if (domain) {
> + chip = irq_data_get_irq_chip(data);
> + if (chip->msi_doorbell_info)
> + break;
> + }
> + data = data->parent_data;
> + }

Please split that out into a seperate function

struct irq_data *msi_get_doorbell_info(data)
{
	...
		if (chip->msi_doorbell_info)
			return chip->msi_get_doorbell_info(data);
	}
	return NULL;
}

   info = msi_get_doorbell_info(data);
   ...

> + if (!data)
> + return 0;
> +
> + dbinfo = chip->msi_doorbell_info(data);
> + if (!dbinfo)
> + return -EINVAL;
> +
> + if (!dbinfo->doorbell_is_percpu) {
> + if (!map) {
> + iommu_msi_put_doorbell_iova(domain,
> + dbinfo->global_doorbell);
> + return 0;
> + }
> + return iommu_msi_get_doorbell_iova(domain,
> +dbinfo->global_doorbell,
> +dbinfo->size, dbinfo->prot,
> +&iova);
> + }

You can spare an indentation level with a helper function

	if (!dbinfo->doorbell_is_percpu)
		return msi_map_global_doorbell(domain, dbinfo);

> +
> + /* percpu doorbells */
> + for_each_possible_cpu(cpu) {
> + phys_addr_t __percpu *db_addr =
> + per_cpu_ptr(dbinfo->percpu_doorbells, cpu);
> +
> + if (!map) {
> + iommu_msi_put_doorbell_iova(domain, *db_addr);
> + } else {
> +
> + ret = iommu_msi_get_doorbell_iova(domain, *db_addr,
> +   dbinfo->size,
> +   dbinfo->prot, &iova);
> + if (ret)
> + return ret;
> + }
> + }

Same here:

	for_each_possible_cpu(cpu) {
		ret = msi_map_percpu_doorbell(domain, cpu);
		if (ret)
			return ret;
	}
	return 0;
 
Hmm?
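A sketch (not compile-tested) of what the two helpers could look like, reusing
iommu_msi_get_doorbell_iova() from the earlier patches in this series; the
helper names are the ones suggested above and the signatures are illustrative:

static int msi_map_global_doorbell(struct iommu_domain *domain,
				   struct irq_chip_msi_doorbell_info *dbinfo)
{
	dma_addr_t iova;

	return iommu_msi_get_doorbell_iova(domain, dbinfo->global_doorbell,
					   dbinfo->size, dbinfo->prot, &iova);
}

static int msi_map_percpu_doorbell(struct iommu_domain *domain,
				   struct irq_chip_msi_doorbell_info *dbinfo,
				   int cpu)
{
	phys_addr_t __percpu *db_addr;
	dma_addr_t iova;

	db_addr = per_cpu_ptr(dbinfo->percpu_doorbells, cpu);
	return iommu_msi_get_doorbell_iova(domain, *db_addr, dbinfo->size,
					   dbinfo->prot, &iova);
}

The unmap side would get matching helpers built around
iommu_msi_put_doorbell_iova().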

> +
> + return 0;
> +}
> +
> +/**
>   * msi_domain_alloc_irqs - Allocate interrupts from a MSI interrupt domain
>   * @domain:  The domain to allocate from
>   * @dev: Pointer to device struct of the device for which the interrupts
> @@ -352,17 +423,29 @@ int msi_domain_alloc_irqs(struct irq_domain *domain, 
> struct device *dev,
>  
>   virq = __irq_domain_alloc_irqs(domain, virq, desc->nvec_used,
>  dev_to_node(dev), &arg, false);
> - if (virq < 0) {
> - ret = -ENOSPC;
> - if (ops->handle_error)
> - ret = ops->handle_error(domain, desc, ret);
> - if (ops->msi_finish)
> - ops->msi_finish(&arg, ret);
> - return ret;
> - }
> + if (virq < 0)
> + goto error;
>  
>   for (i = 0; i < desc->nvec_used; i++)
>   irq_set_msi_desc_off(virq, i, desc);
> +
> + for (i = 0; i < desc->nvec_used; i++) {
> + struct irq_data *d = irq_get_irq_data(virq + i);
> +
> + ret = msi_handle_doorbell_mappings(d, true);
> + if (ret)
> + break;
> + }
> + if (ret) {
> + for (; i >= 0; i--) {
> + struct irq_data *d = irq_get_irq_data(virq + i);
> +
> + msi_handle_doorbell_mappings(d, false);
> + }
> +  

Re: [PATCH v11 06/10] genirq/msi-doorbell: msi_doorbell_safe

2016-07-20 Thread Thomas Gleixner
On Tue, 19 Jul 2016, Eric Auger wrote:
> +bool msi_doorbell_safe(void)
> +{
> + struct irqchip_doorbell *db;
> + bool irq_remapping = true;
> +
> + mutex_lock(&irqchip_doorbell_mutex);
> + list_for_each_entry(db, &irqchip_doorbell_list, next) {
> + irq_remapping &= db->info.irq_remapping;

db->info.irq_remapping is set in msi_doorbell_register(). So you can keep book
about that there. No need to iterate here.

Thanks,

tglx




Re: [PATCH v11 05/10] genirq/msi-doorbell: msi_doorbell_pages

2016-07-19 Thread Thomas Gleixner
On Tue, 19 Jul 2016, Eric Auger wrote:
> msi_doorbell_pages sums up the number of iommu pages of a given order

adding () to the function name would make it immediately clear that
msi_doorbell_pages is a function.

> +/**
> + * msi_doorbell_pages: compute the number of iommu pages of size 1 << order
> + * requested to map all the registered doorbells
> + *
> + * @order: iommu page order
> + */

Why are you adding the kernel doc to the header and not to the implementation?

> +int msi_doorbell_pages(unsigned int order);
> +
>  #else
>  
>  static inline struct irq_chip_msi_doorbell_info *
> @@ -47,6 +55,12 @@ msi_doorbell_register_global(phys_addr_t base, size_t size,
>  static inline void
>  msi_doorbell_unregister_global(struct irq_chip_msi_doorbell_info *db) {}
>  
> +static inline int
> +msi_doorbell_pages(unsigned int order)

What's the point of this line break? 

> +{
> + return 0;
> +}
> +
>  #endif /* CONFIG_MSI_DOORBELL */
>  
>  #endif
> diff --git a/kernel/irq/msi-doorbell.c b/kernel/irq/msi-doorbell.c
> index 0ff541e..a5bde37 100644
> --- a/kernel/irq/msi-doorbell.c
> +++ b/kernel/irq/msi-doorbell.c
> @@ -60,3 +60,55 @@ void msi_doorbell_unregister_global(struct 
> irq_chip_msi_doorbell_info *dbinfo)
>   mutex_unlock(_doorbell_mutex);
>  }
>  EXPORT_SYMBOL_GPL(msi_doorbell_unregister_global);
> +
> +static int compute_db_mapping_requirements(phys_addr_t addr, size_t size,
> +unsigned int order)
> +{
> + phys_addr_t offset, granule;
> + unsigned int nb_pages;
> +
> + granule = (uint64_t)(1 << order);
> + offset = addr & (granule - 1);
> + size = ALIGN(size + offset, granule);
> + nb_pages = size >> order;
> +
> + return nb_pages;
> +}
> +
> +static int
> +compute_dbinfo_mapping_requirements(struct irq_chip_msi_doorbell_info 
> *dbinfo,
> + unsigned int order)

I'm sure you can find even longer function names which require more line
breaks.

> +{
> + int ret = 0;
> +
> + if (!dbinfo->doorbell_is_percpu) {
> + ret = compute_db_mapping_requirements(dbinfo->global_doorbell,
> +   dbinfo->size, order);
> + } else {
> + phys_addr_t __percpu *pbase;
> + int cpu;
> +
> + for_each_possible_cpu(cpu) {
> + pbase = per_cpu_ptr(dbinfo->percpu_doorbells, cpu);
> + ret += compute_db_mapping_requirements(*pbase,
> +dbinfo->size,
> +order);
> + }
> + }
> + return ret;
> +}
> +
> +int msi_doorbell_pages(unsigned int order)
> +{
> + struct irqchip_doorbell *db;
> + int ret = 0;
> +
> + mutex_lock(&irqchip_doorbell_mutex);
> + list_for_each_entry(db, &irqchip_doorbell_list, next) {

Pointless braces

> + ret += compute_dbinfo_mapping_requirements(&db->info, order);
> + }
> + mutex_unlock(_doorbell_mutex);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(msi_doorbell_pages);

So here is a general rant about your naming choices.

   struct irqchip_doorbell
   struct irq_chip_msi_doorbell_info

   struct irq_chip {
  *(*msi_doorbell_info);
   }

   irqchip_doorbell_mutex

   msi_doorbell_register_global
   msi_doorbell_unregister_global

   msi_doorbell_pages

This really sucks. Your public functions start sensibly with msi_doorbell.

Though what is the _global postfix on the register/unregister functions for?
Are there _private functions in the pipeline?

msi_doorbell_pages() is not telling me what it does. msi_calc_doorbell_pages()
would describe it right away.

Your doorbell info structure can really do with:

struct msi_doorbell_info;

And the wrapper struct around it is fine with:

struct msi_doorbell;
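A sketch of the renamed structures; the fields are taken from the quoted patch,
the layout is illustrative only:

struct msi_doorbell_info {
	union {
		phys_addr_t __percpu	*percpu_doorbells;
		phys_addr_t		global_doorbell;
	};
	bool			doorbell_is_percpu;
	bool			irq_remapping;
	size_t			size;
	int			prot;
};

struct msi_doorbell {
	struct msi_doorbell_info	info;
	struct list_head		next;
};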

Thanks,

tglx



Re: [PATCH v11 04/10] genirq/msi-doorbell: allow MSI doorbell (un)registration

2016-07-19 Thread Thomas Gleixner
On Tue, 19 Jul 2016, Eric Auger wrote:
> +
> +#include 
> +#include 
> +#include 
> +
> +struct irqchip_doorbell {
> + struct irq_chip_msi_doorbell_info info;
> + struct list_head next;

Again, please align the struct members.

> +};
> +
> +static LIST_HEAD(irqchip_doorbell_list);
> +static DEFINE_MUTEX(irqchip_doorbell_mutex);
> +
> +struct irq_chip_msi_doorbell_info *
> +msi_doorbell_register_global(phys_addr_t base, size_t size,
> +  int prot, bool irq_remapping)
> +{
> + struct irqchip_doorbell *db;
> +
> + db = kmalloc(sizeof(*db), GFP_KERNEL);
> + if (!db)
> + return ERR_PTR(-ENOMEM);
> +
> + db->info.doorbell_is_percpu = false;

Please use kzalloc and get rid of zero initialization. If you add stuff to the
struct then initialization will be automatically 0.

> +void msi_doorbell_unregister_global(struct irq_chip_msi_doorbell_info 
> *dbinfo)
> +{
> + struct irqchip_doorbell *db, *tmp;
> +
> + mutex_lock(&irqchip_doorbell_mutex);
> + list_for_each_entry_safe(db, tmp, &irqchip_doorbell_list, next) {

Why do you need that iterator? 

db = container_of(dbinfo, struct ..., info);

Hmm?

> + if (dbinfo == &db->info) {
> + list_del(&db->next);
> + kfree(db);

Please move the kfree() outside of the lock region. It does not matter much
here, but we really should stop doing random crap in locked regions.
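A sketch (not compile-tested) combining both points: container_of() instead of
the iterator, and kfree() moved out of the locked region. The struct and
variable names are kept from the patch for illustration:

void msi_doorbell_unregister_global(struct irq_chip_msi_doorbell_info *dbinfo)
{
	struct irqchip_doorbell *db;

	db = container_of(dbinfo, struct irqchip_doorbell, info);

	mutex_lock(&irqchip_doorbell_mutex);
	list_del(&db->next);
	mutex_unlock(&irqchip_doorbell_mutex);

	kfree(db);
}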

Thanks,

tglx


Re: [PATCH v11 03/10] genirq/irq: introduce msi_doorbell_info

2016-07-19 Thread Thomas Gleixner
On Tue, 19 Jul 2016, Eric Auger wrote:
>  
> +/* Describe all the MSI doorbell regions for an irqchip */
> +struct irq_chip_msi_doorbell_info {
> + union {
> + phys_addr_t __percpu *percpu_doorbells;
> + phys_addr_t global_doorbell;
> + };
> + bool doorbell_is_percpu;
> + bool irq_remapping; /* is irq_remapping implemented? */

Please do not use tail comments. Use proper kernel doc for documentation.

> + size_t size;/* size of each doorbell */
> + int prot;   /* iommu protection flag */

Please align the members proper

	union {
		phys_addr_t __percpu	*percpu_doorbells;
		phys_addr_t		global_doorbell;
	};
	bool			doorbell_is_percpu;
	bool			irq_remapping;

> +};
> +
>  /**
>   * struct irq_chip - hardware interrupt chip descriptor
>   *
> @@ -349,6 +361,7 @@ static inline irq_hw_number_t irqd_to_hwirq(struct 
> irq_data *d)
>   * @irq_get_irqchip_state:   return the internal state of an interrupt
>   * @irq_set_irqchip_state:   set the internal state of a interrupt
>   * @irq_set_vcpu_affinity:   optional to target a vCPU in a virtual machine
> + * @msi_doorbell_info:   return the MSI doorbell info
>   * @ipi_send_single: send a single IPI to destination cpus
>   * @ipi_send_mask:   send an IPI to destination cpus in cpumask
>   * @flags:   chip specific flags
> @@ -394,7 +407,8 @@ struct irq_chip {
>   int (*irq_set_irqchip_state)(struct irq_data *data, enum 
> irqchip_irq_state which, bool state);
>  
>   int (*irq_set_vcpu_affinity)(struct irq_data *data, void 
> *vcpu_info);
> -
> + struct irq_chip_msi_doorbell_info *(*msi_doorbell_info)(

irq_get_msi_doorbell_info or msi_get_doorbell_info please

> + struct irq_data *data);

No need for a line break here. Please keep it as a single line.

Thanks

tglx


Re: [RFC v4 09/14] msi: IOMMU map the doorbell address when needed

2016-02-26 Thread Thomas Gleixner
On Fri, 26 Feb 2016, Eric Auger wrote:
> +static int msi_map_doorbell(struct iommu_domain *d, struct msi_msg *msg)
> +{
> +#ifdef CONFIG_IOMMU_DMA_RESERVED
> + dma_addr_t iova;
> + phys_addr_t addr;
> + int ret;
> +
> +#ifdef CONFIG_PHYS_ADDR_T_64BIT
> + addr = ((phys_addr_t)(msg->address_hi) << 32) | msg->address_lo;
> +#else
> + addr = msg->address_lo;
> +#endif
> +
> + ret = iommu_get_single_reserved(d, addr, IOMMU_WRITE, &iova);
> + if (!ret) {
> + msg->address_lo = lower_32_bits(iova);
> + msg->address_hi = upper_32_bits(iova);
> + }
> + return ret;
> +#else
> + return -ENODEV;
> +#endif

This is completely unreadable. Please make this in a way which is parseable.
A few small inline functions do the trick.
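A sketch (not compile-tested) of the kind of small inline helpers meant here,
so the map/unmap paths read linearly; the msi_msg fields are from the quoted
patch, the helper names are made up:

static inline phys_addr_t msi_msg_to_phys_addr(const struct msi_msg *msg)
{
#ifdef CONFIG_PHYS_ADDR_T_64BIT
	return ((phys_addr_t)msg->address_hi << 32) | msg->address_lo;
#else
	return msg->address_lo;
#endif
}

static inline void msi_msg_set_addr(struct msi_msg *msg, dma_addr_t iova)
{
	msg->address_lo = lower_32_bits(iova);
	msg->address_hi = upper_32_bits(iova);
}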

> +static void msi_unmap_doorbell(struct iommu_domain *d, struct msi_msg *msg)
> +{
> +#ifdef CONFIG_IOMMU_DMA_RESERVED
> + dma_addr_t iova;
> +
> +#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
> + iova = ((dma_addr_t)(msg->address_hi) << 32) | msg->address_lo;
> +#else
> + iova = msg->address_lo;
> +#endif
> +
> + iommu_put_single_reserved(d, iova);
> +#endif

Ditto.

> +}
> +
> +#ifdef CONFIG_IOMMU_API
> +static struct iommu_domain *
> + irq_data_to_msi_mapping_domain(struct irq_data *irq_data)

If you split lines, then the function name starts at the beginning of the line
and not at some random place.

> +{
> +
> + struct msi_desc *desc;
> + struct device *dev;
> + struct iommu_domain *d;
> + int ret;

Please order variables by descending length

> + desc = irq_data_get_msi_desc(irq_data);
> + if (!desc)
> + return NULL;
> +
> + dev = msi_desc_to_dev(desc);
> +
> + d = iommu_get_domain_for_dev(dev);
> + if (!d)
> + return NULL;
> +
> + ret = iommu_domain_get_attr(d, DOMAIN_ATTR_MSI_MAPPING, NULL);
> + if (!ret)
> + return d;
> + else
> + return NULL;

Does anyone except you understand the purpose of the function? Comments have
been invented for a reason.

> +}
> +#else
> +static inline iommu_domain *
> + irq_data_to_msi_mapping_domain(struct irq_data *irq_data)
> +{
> + return NULL;
> +}
> +#endif /* CONFIG_IOMMU_API */
> +
> +static int msi_compose(struct irq_data *irq_data,
> +struct msi_msg *msg, bool erase)
> +{
> + struct msi_msg old_msg;
> + struct iommu_domain *d;
> + int ret = 0;
> +
> + d = irq_data_to_msi_mapping_domain(irq_data);
> + if (unlikely(d))
> + get_cached_msi_msg(irq_data->irq, &old_msg);
> +
> + if (erase)
> + memset(msg, 0, sizeof(*msg));
> + else
> + ret = irq_chip_compose_msi_msg(irq_data, msg);
> +
> + if (!d)
> + goto out;
> +
> + /* handle (re-)mapping of MSI doorbell */
> + if ((old_msg.address_lo != msg->address_lo) ||
> + (old_msg.address_hi != msg->address_hi))
> + msi_unmap_doorbell(d, &old_msg);
> +
> + if (!erase)
> + WARN_ON(msi_map_doorbell(d, msg));
> +
> +out:
> + return ret;
> +}

No, this is not the way we do this. You replace existing functionality with
some newfangled thing which behaves differently.

This wants to be separate patches, which first create a wrapper for
irq_chip_compose_msi_msg() and then add the new functionality to it, including
a proper explanation.

I have no idea how the above is supposed to be the same as the existing code
for the non-IOMMU case.
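A sketch (not compile-tested) of the two-step approach: first a wrapper with
unchanged behaviour, then the doorbell remapping added inside it in a separate
patch. irq_data_to_msi_mapping_domain() and msi_map_doorbell() are the helpers
from the quoted patch:

/* Step 1: plain wrapper, behaviour identical to the existing code. */
static int msi_compose_msg(struct irq_data *irq_data, struct msi_msg *msg)
{
	return irq_chip_compose_msi_msg(irq_data, msg);
}

/* Step 2 (separate patch): hook the doorbell remapping into the wrapper,
 * leaving the non-IOMMU path untouched. */
static int msi_compose_msg_mapped(struct irq_data *irq_data, struct msi_msg *msg)
{
	struct iommu_domain *d = irq_data_to_msi_mapping_domain(irq_data);
	int ret = irq_chip_compose_msi_msg(irq_data, msg);

	if (ret || !d)
		return ret;

	return msi_map_doorbell(d, msg);
}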

Thanks,

tglx


Re: [RFC v4 07/14] msi: Add a new MSI_FLAG_IRQ_REMAPPING flag

2016-02-26 Thread Thomas Gleixner
On Fri, 26 Feb 2016, Eric Auger wrote:

> Let's introduce a new msi_domain_info flag value, MSI_FLAG_IRQ_REMAPPING
> meant to tell the domain supports IRQ REMAPPING, also known as Interrupt
> Translation Service. On Intel HW this capability is abstracted on IOMMU
> side while on ARM it is abstracted on MSI controller side.
> 
> GICv3 ITS HW is the first HW advertising that feature.

No. We add the flag, explain what it does and then in a second step add the
user _after_ all other preliminaries are implemented.
 
Thanks,

tglx


Re: [PATCH v3 3/4] irqchip: GIC: Convert to EOImode == 1

2015-08-25 Thread Thomas Gleixner
On Tue, 25 Aug 2015, Marc Zyngier wrote:
 +static struct static_key supports_deactivate = STATIC_KEY_INIT_TRUE;
 +
  #ifndef MAX_GIC_NR
  #define MAX_GIC_NR   1
  #endif
 @@ -137,6 +140,14 @@ static inline unsigned int gic_irq(struct irq_data *d)
   return d->hwirq;
  }
  
 +static inline bool primary_gic_irq(struct irq_data *d)
 +{
 + if (MAX_GIC_NR > 1)
 + return irq_data_get_irq_chip_data(d) == &gic_data[0];
 +
 + return true;
 +}
 +
  /*
   * Routines to acknowledge, disable and enable interrupts
   */
 @@ -164,7 +175,14 @@ static void gic_unmask_irq(struct irq_data *d)
  
  static void gic_eoi_irq(struct irq_data *d)
  {
 - writel_relaxed(gic_irq(d), gic_cpu_base(d) + GIC_CPU_EOI);
 + u32 deact_offset = GIC_CPU_EOI;
 +
 + if (static_key_true(&supports_deactivate)) {
 + if (primary_gic_irq(d))
 + deact_offset = GIC_CPU_DEACTIVATE;

I really wonder for the whole series whether you really want all that
static key dance and extra conditionals in the callbacks instead of
just using separate irq chips for the different interrupts.
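A sketch of the separate-chip alternative: two irq_chip instances selected once
at probe time, so the callbacks stay free of conditionals. gic_eoimode1_eoi_irq
is a hypothetical callback here that would write GIC_CPU_DEACTIVATE instead of
GIC_CPU_EOI:

static struct irq_chip gic_chip = {
	.name		= "GIC",
	.irq_mask	= gic_mask_irq,
	.irq_unmask	= gic_unmask_irq,
	.irq_eoi	= gic_eoi_irq,
};

static struct irq_chip gic_eoimode1_chip = {
	.name		= "GICv2",
	.irq_mask	= gic_mask_irq,
	.irq_unmask	= gic_unmask_irq,
	.irq_eoi	= gic_eoimode1_eoi_irq,
};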

Thanks,

tglx


Re: [PATCH 4/6] irqchip: GIC: Use chip_data instead of handler_data for cascaded interrupts

2015-07-09 Thread Thomas Gleixner
On Thu, 9 Jul 2015, Marc Zyngier wrote:

 When used as a primary interrupt controller, the GIC driver uses
 irq_data->chip_data to extract controller-specific information.
 
 When used as a secondary interrupt controller, it uses handler_data
 instead. As this difference is relatively pointless and only creates
 confusion, change the secondary path to match what the rest of the
 driver does.
 
 Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>
 ---
  drivers/irqchip/irq-gic.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
 index e264675..3c7f3a4 100644
 --- a/drivers/irqchip/irq-gic.c
 +++ b/drivers/irqchip/irq-gic.c
 @@ -298,7 +298,7 @@ static void __exception_irq_entry gic_handle_irq(struct 
 pt_regs *regs)
  
  static void gic_handle_cascade_irq(unsigned int irq, struct irq_desc *desc)
  {
 - struct gic_chip_data *chip_data = irq_get_handler_data(irq);
 + struct gic_chip_data *chip_data = irq_get_chip_data(irq);
   struct irq_chip *chip = irq_get_chip(irq);

You should make that

chip_data = irq_desc_get_chip_data(desc);
chip = irq_desc_get_chip(desc);

That avoids two pointless lookups of irqdesc 

Thanks,

tglx