Re: [PATCH 10/35] irqchip/gic-v4.1: VPE table (aka GICR_VPROPBASER) allocation

2019-09-26 Thread Zenghui Yu

On 2019/9/27 0:27, Marc Zyngier wrote:

On 26/09/2019 16:57, Marc Zyngier wrote:

On 26/09/2019 16:19, Zenghui Yu wrote:

Hi Marc,

Two more questions below.

On 2019/9/25 22:41, Marc Zyngier wrote:

On 25/09/2019 14:04, Zenghui Yu wrote:

Hi Marc,

Some questions about this patch, mostly to confirm that I understand
things correctly.

On 2019/9/24 2:25, Marc Zyngier wrote:

GICv4.1 defines a new VPE table that is potentially shared between
both the ITSs and the redistributors, following complicated affinity
rules.

To make things more confusing, the programming of this table at
the redistributor level is reusing the GICv4.0 GICR_VPROPBASER register
for something completely different.

The code flow is somewhat complexified by the need to respect the
affinities required by the HW, meaning that tables can either be
inherited from a previously discovered ITS or redistributor.

Signed-off-by: Marc Zyngier 
---


[...]


@@ -1962,6 +1965,65 @@ static bool its_parse_indirect_baser(struct its_node 
*its,
return indirect;
}

+static u32 compute_common_aff(u64 val)

+{
+   u32 aff, clpiaff;
+
+   aff = FIELD_GET(GICR_TYPER_AFFINITY, val);
+   clpiaff = FIELD_GET(GICR_TYPER_COMMON_LPI_AFF, val);
+
+   return aff & ~(GENMASK(31, 0) >> (clpiaff * 8));
+}
+
+static u32 compute_its_aff(struct its_node *its)
+{
+   u64 val;
+   u32 svpet;
+
+   /*
+* Reencode the ITS SVPET and MPIDR as a GICR_TYPER, and compute
+* the resulting affinity. We then use that to see if this match
+* our own affinity.
+*/
+   svpet = FIELD_GET(GITS_TYPER_SVPET, its->typer);


The spec says the ITS does not share the vPE table with the
Redistributors when SVPET==0.  It seems that we miss this rule and
simply treat SVPET as GICR_TYPER_COMMON_LPI_AFF here.  Am I wrong?


Correct. I missed the case where the ITS doesn't share anything. That's
pretty unlikely though (you lose all the benefits of v4.1, and I don't
really see how you'd make it work reliably).


Actually, this is already handled...






+   val  = FIELD_PREP(GICR_TYPER_COMMON_LPI_AFF, svpet);
+   val |= FIELD_PREP(GICR_TYPER_AFFINITY, its->mpidr);
+   return compute_common_aff(val);
+}
+
+static struct its_node *find_sibbling_its(struct its_node *cur_its)
+{
+   struct its_node *its;
+   u32 aff;
+
+   if (!FIELD_GET(GITS_TYPER_SVPET, cur_its->typer))
+   return NULL;


... here. If SVPET is 0, there is no sibling, and we'll allocate a VPE
table as usual.


Yes, I see.  So we can safely encode the non-zero SVPET as
COMMON_LPI_AFF in a GICR_TYPER when computing the affinity.


Thanks,
zenghui



Re: [PATCH 10/35] irqchip/gic-v4.1: VPE table (aka GICR_VPROPBASER) allocation

2019-09-26 Thread Zenghui Yu

On 2019/9/26 23:57, Marc Zyngier wrote:

On 26/09/2019 16:19, Zenghui Yu wrote:

On 2019/9/25 22:41, Marc Zyngier wrote:


Indeed. The whole idea is that ITSs and RDs can share a vPE table if
they are in the same affinity group (such as a socket, for example).
This is what is missing from v4.0 where the ITS knows about vPEs, but
doesn't know about residency. With that in place, the HW is able to do a
lot more for us.


Thanks for the explanation!

I really want to know *how* the GICv4.1 ITS learns about vPE
residency (at the hardware level)?


Hey, I'm a SW guy, I don't design the hardware! ;-)


I can see that the Redistributor can easily track this via
VPENDBASER's Valid and vPEID fields.  And the ITS needs to know about
residency so that it can determine whether it is appropriate to
trigger the default doorbell for a vPE.  But I have no idea how this
is achieved at the hardware level.


My take is that the RD and the ITS can communicate over the shared
table, hence my remark above about SVPET==0. But as I said, I'm not a HW
guy.


;-) I should ask our GIC engineers about these things.


Thanks,
zenghui



Re: [PATCH 10/35] irqchip/gic-v4.1: VPE table (aka GICR_VPROPBASER) allocation

2019-09-26 Thread Marc Zyngier
On 26/09/2019 16:57, Marc Zyngier wrote:
> On 26/09/2019 16:19, Zenghui Yu wrote:
>> Hi Marc,
>>
>> Two more questions below.
>>
>> On 2019/9/25 22:41, Marc Zyngier wrote:
>>> On 25/09/2019 14:04, Zenghui Yu wrote:
 Hi Marc,

 Some questions about this patch, mostly to confirm that I understand
 things correctly.

 On 2019/9/24 2:25, Marc Zyngier wrote:
> GICv4.1 defines a new VPE table that is potentially shared between
> both the ITSs and the redistributors, following complicated affinity
> rules.
>
> To make things more confusing, the programming of this table at
> the redistributor level is reusing the GICv4.0 GICR_VPROPBASER register
> for something completely different.
>
> The code flow is somewhat complexified by the need to respect the
> affinities required by the HW, meaning that tables can either be
> inherited from a previously discovered ITS or redistributor.
>
> Signed-off-by: Marc Zyngier 
> ---

 [...]

> @@ -1962,6 +1965,65 @@ static bool its_parse_indirect_baser(struct 
> its_node *its,
>   return indirect;
>}
>
> +static u32 compute_common_aff(u64 val)
> +{
> + u32 aff, clpiaff;
> +
> + aff = FIELD_GET(GICR_TYPER_AFFINITY, val);
> + clpiaff = FIELD_GET(GICR_TYPER_COMMON_LPI_AFF, val);
> +
> + return aff & ~(GENMASK(31, 0) >> (clpiaff * 8));
> +}
> +
> +static u32 compute_its_aff(struct its_node *its)
> +{
> + u64 val;
> + u32 svpet;
> +
> + /*
> +  * Reencode the ITS SVPET and MPIDR as a GICR_TYPER, and compute
> +  * the resulting affinity. We then use that to see if this match
> +  * our own affinity.
> +  */
> + svpet = FIELD_GET(GITS_TYPER_SVPET, its->typer);
>>
>> The spec says the ITS does not share the vPE table with the
>> Redistributors when SVPET==0.  It seems that we miss this rule and
>> simply treat SVPET as GICR_TYPER_COMMON_LPI_AFF here.  Am I wrong?
> 
> Correct. I missed the case where the ITS doesn't share anything. That's
> pretty unlikely though (you lose all the benefits of v4.1, and I don't
> really see how you'd make it work reliably).

Actually, this is already handled...

> 
>>
> + val  = FIELD_PREP(GICR_TYPER_COMMON_LPI_AFF, svpet);
> + val |= FIELD_PREP(GICR_TYPER_AFFINITY, its->mpidr);
> + return compute_common_aff(val);
> +}
> +
> +static struct its_node *find_sibbling_its(struct its_node *cur_its)
> +{
> + struct its_node *its;
> + u32 aff;
> +
> + if (!FIELD_GET(GITS_TYPER_SVPET, cur_its->typer))
> + return NULL;

... here. If SVPET is 0, there is no sibling, and we'll allocate a VPE
table as usual.

Thanks,

M.
-- 
Jazz is not dead, it just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 03/35] irqchip/gic-v3-its: Allow LPI invalidation via the DirectLPI interface

2019-09-26 Thread Marc Zyngier
On 26/09/2019 15:57, Zenghui Yu wrote:
> Hi Marc,
> 
> I get one kernel panic with this patch on D05.

Can you please try this (completely untested for now) on top of the 
whole series? I'll otherwise give it a go next week.

Thanks,

M.

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index bc55327406b7..9d24236d1257 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -3461,15 +3461,16 @@ static void its_vpe_send_cmd(struct its_vpe *vpe,
 
 static void its_vpe_send_inv(struct irq_data *d)
 {
+   struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
+
if (gic_rdists->has_direct_lpi) {
-   /*
-* Don't mess about. Generating the invalidation is easily
-* done by using the parent irq_data, just like below.
-*/
-   direct_lpi_inv(d->parent_data);
-   } else {
-   struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
+   void __iomem *rdbase;
 
+   /* Target the redistributor this VPE is currently known on */
+   rdbase = per_cpu_ptr(gic_rdists->rdist, vpe->col_idx)->rd_base;
+   gic_write_lpir(d->parent_data->hwirq, rdbase + GICR_INVLPIR);
+   wait_for_syncr(rdbase);
+   } else {
its_vpe_send_cmd(vpe, its_send_inv);
}
 }



Re: [PATCH 10/35] irqchip/gic-v4.1: VPE table (aka GICR_VPROPBASER) allocation

2019-09-26 Thread Marc Zyngier
On 26/09/2019 16:19, Zenghui Yu wrote:
> Hi Marc,
> 
> Two more questions below.
> 
> On 2019/9/25 22:41, Marc Zyngier wrote:
>> On 25/09/2019 14:04, Zenghui Yu wrote:
>>> Hi Marc,
>>>
>>> Some questions about this patch, mostly to confirm that I understand
>>> things correctly.
>>>
>>> On 2019/9/24 2:25, Marc Zyngier wrote:
 GICv4.1 defines a new VPE table that is potentially shared between
 both the ITSs and the redistributors, following complicated affinity
 rules.

 To make things more confusing, the programming of this table at
 the redistributor level is reusing the GICv4.0 GICR_VPROPBASER register
 for something completely different.

 The code flow is somewhat complexified by the need to respect the
 affinities required by the HW, meaning that tables can either be
 inherited from a previously discovered ITS or redistributor.

 Signed-off-by: Marc Zyngier 
 ---
>>>
>>> [...]
>>>
 @@ -1962,6 +1965,65 @@ static bool its_parse_indirect_baser(struct 
 its_node *its,
return indirect;
}

 +static u32 compute_common_aff(u64 val)
 +{
 +  u32 aff, clpiaff;
 +
 +  aff = FIELD_GET(GICR_TYPER_AFFINITY, val);
 +  clpiaff = FIELD_GET(GICR_TYPER_COMMON_LPI_AFF, val);
 +
 +  return aff & ~(GENMASK(31, 0) >> (clpiaff * 8));
 +}
 +
 +static u32 compute_its_aff(struct its_node *its)
 +{
 +  u64 val;
 +  u32 svpet;
 +
 +  /*
 +   * Reencode the ITS SVPET and MPIDR as a GICR_TYPER, and compute
 +   * the resulting affinity. We then use that to see if this match
 +   * our own affinity.
 +   */
 +  svpet = FIELD_GET(GITS_TYPER_SVPET, its->typer);
> 
> The spec says the ITS does not share the vPE table with the
> Redistributors when SVPET==0.  It seems that we miss this rule and
> simply treat SVPET as GICR_TYPER_COMMON_LPI_AFF here.  Am I wrong?

Correct. I missed the case where the ITS doesn't share anything. That's
pretty unlikely though (you lose all the benefits of v4.1, and I don't
really see how you'd make it work reliably).

> 
 +  val  = FIELD_PREP(GICR_TYPER_COMMON_LPI_AFF, svpet);
 +  val |= FIELD_PREP(GICR_TYPER_AFFINITY, its->mpidr);
 +  return compute_common_aff(val);
 +}
 +
 +static struct its_node *find_sibbling_its(struct its_node *cur_its)
 +{
 +  struct its_node *its;
 +  u32 aff;
 +
 +  if (!FIELD_GET(GITS_TYPER_SVPET, cur_its->typer))
 +  return NULL;
 +
 +  aff = compute_its_aff(cur_its);
 +
 +  list_for_each_entry(its, &its_nodes, entry) {
 +  u64 baser;
 +
 +  if (!is_v4_1(its) || its == cur_its)
 +  continue;
 +
 +  if (!FIELD_GET(GITS_TYPER_SVPET, its->typer))
 +  continue;
 +
 +  if (aff != compute_its_aff(its))
 +  continue;
 +
 +  /* GICv4.1 guarantees that the vPE table is GITS_BASER2 */
 +  baser = its->tables[2].val;
 +  if (!(baser & GITS_BASER_VALID))
 +  continue;
 +
 +  return its;
 +  }
 +
 +  return NULL;
 +}
 +
static void its_free_tables(struct its_node *its)
{
int i;
 @@ -2004,6 +2066,15 @@ static int its_alloc_tables(struct its_node *its)
break;

case GITS_BASER_TYPE_VCPU:
 +  if (is_v4_1(its)) {
 +  struct its_node *sibbling;
 +
 +  if ((sibbling = find_sibbling_its(its))) {
 +  its->tables[2] = sibbling->tables[2];
>>>
>>> This means that the vPE table is shared between ITSs which are under the
>>> same affinity group?
>>
>> That's indeed what this is trying to do...
>>
>>> Don't we need to set GITS_BASER (by its_setup_baser()) here to tell this
>>> ITS where the vPE table is located?
>>
>> Absolutely. This is a bug. I didn't spot it as my model only has a
>> single ITS.
>>
>>>
 +  continue;
 +  }
 +  }
 +
indirect = its_parse_indirect_baser(its, baser,
psz, &order,

 ITS_MAX_VPEID_BITS);
 @@ -2025,6 +2096,212 @@ static int its_alloc_tables(struct its_node *its)
return 0;
}

 +static u64 inherit_vpe_l1_table_from_its(void)
 +{
 +  struct its_node *its;
 +  u64 val;
 +  u32 aff;
 +
 +  val = gic_read_typer(gic_data_rdist_rd_base() + GICR_TYPER);
 +  aff = compute_common_aff(val);
 +
 +  list_for_each_entry(its, &its_nodes, entry) {
 +  u64 baser;
 +
 +  

Re: [PATCH 03/35] irqchip/gic-v3-its: Allow LPI invalidation via the DirectLPI interface

2019-09-26 Thread Marc Zyngier
Hi Zenghui,

On 26/09/2019 15:57, Zenghui Yu wrote:
> Hi Marc,
> 
> I get one kernel panic with this patch on D05.

Ah, surprise! I haven't had time to test this on a D05 yet, being in such
a hurry to push the damn thing out of the building...

> 
> (I don't have the GICv4.1 board at the moment. I have to wait for the
>   appropriate HW to do more tests.)
> 
> On 2019/9/24 2:25, Marc Zyngier wrote:
>> We currently don't make much use of the DirectLPI feature, and it would
>> be beneficial to do this more, if only because it becomes a mandatory
>> feature for GICv4.1.
>>
>> Signed-off-by: Marc Zyngier 
>> ---
>>   drivers/irqchip/irq-gic-v3-its.c | 51 +++-
>>   1 file changed, 37 insertions(+), 14 deletions(-)
>>
>> diff --git a/drivers/irqchip/irq-gic-v3-its.c 
>> b/drivers/irqchip/irq-gic-v3-its.c
>> index 58cb233cf138..c94eb287393b 100644
>> --- a/drivers/irqchip/irq-gic-v3-its.c
>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>> @@ -175,6 +175,12 @@ static DEFINE_IDA(its_vpeid_ida);
>>   #define gic_data_rdist_rd_base()   (gic_data_rdist()->rd_base)
>>   #define gic_data_rdist_vlpi_base() (gic_data_rdist_rd_base() + SZ_128K)
>>   
>> +static inline u32 its_get_event_id(struct irq_data *d)
>> +{
>> +struct its_device *its_dev = irq_data_get_irq_chip_data(d);
>> +return d->hwirq - its_dev->event_map.lpi_base;
>> +}
>> +
>>   static struct its_collection *dev_event_to_col(struct its_device *its_dev,
>> u32 event)
>>   {
>> @@ -183,6 +189,13 @@ static struct its_collection *dev_event_to_col(struct 
>> its_device *its_dev,
>>  return its->collections + its_dev->event_map.col_map[event];
>>   }
>>   
>> +static struct its_collection *irq_to_col(struct irq_data *d)
>> +{
>> +struct its_device *its_dev = irq_data_get_irq_chip_data(d);
>> +
>> +return dev_event_to_col(its_dev, its_get_event_id(d));
>> +}
>> +
> 
> irq_to_col() uses the device's event_map and col_map to get the target
> collection, which works well for a device's LPIs.
> But direct_lpi_inv() also passes doorbells to it...
> 
> We don't allocate doorbells for devices; they are allocated per vPE.

Hmm. Yes, you're right. It looks like I've been carried away on this
one. I'll have a look.

> 
> And see below,
> 
>>   static struct its_collection *valid_col(struct its_collection *col)
>>   {
>>  if (WARN_ON_ONCE(col->target_address & GENMASK_ULL(15, 0)))
>> @@ -1031,12 +1044,6 @@ static void its_send_vinvall(struct its_node *its, 
>> struct its_vpe *vpe)
>>* irqchip functions - assumes MSI, mostly.
>>*/
>>   
>> -static inline u32 its_get_event_id(struct irq_data *d)
>> -{
>> -struct its_device *its_dev = irq_data_get_irq_chip_data(d);
>> -return d->hwirq - its_dev->event_map.lpi_base;
>> -}
>> -
>>   static void lpi_write_config(struct irq_data *d, u8 clr, u8 set)
>>   {
>>  irq_hw_number_t hwirq;
>> @@ -1081,12 +1088,28 @@ static void wait_for_syncr(void __iomem *rdbase)
>>  cpu_relax();
>>   }
>>   
>> +static void direct_lpi_inv(struct irq_data *d)
>> +{
>> +struct its_collection *col;
>> +void __iomem *rdbase;
>> +
>> +/* Target the redistributor this LPI is currently routed to */
>> +col = irq_to_col(d);
>> +rdbase = per_cpu_ptr(gic_rdists->rdist, col->col_id)->rd_base;
>> +gic_write_lpir(d->hwirq, rdbase + GICR_INVLPIR);
>> +
>> +wait_for_syncr(rdbase);
>> +}
>> +
>>   static void lpi_update_config(struct irq_data *d, u8 clr, u8 set)
>>   {
>>  struct its_device *its_dev = irq_data_get_irq_chip_data(d);
>>   
>>  lpi_write_config(d, clr, set);
>> -its_send_inv(its_dev, its_get_event_id(d));
>> +if (gic_rdists->has_direct_lpi && !irqd_is_forwarded_to_vcpu(d))
>> +direct_lpi_inv(d);
>> +else
>> +its_send_inv(its_dev, its_get_event_id(d));
>>   }
>>   
>>   static void its_vlpi_set_doorbell(struct irq_data *d, bool enable)
>> @@ -2912,15 +2935,15 @@ static void its_vpe_send_cmd(struct its_vpe *vpe,
>>   
>>   static void its_vpe_send_inv(struct irq_data *d)
>>   {
>> -struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
>> -
>>  if (gic_rdists->has_direct_lpi) {
>> -void __iomem *rdbase;
>> -
>> -rdbase = per_cpu_ptr(gic_rdists->rdist, vpe->col_idx)->rd_base;
>> -gic_write_lpir(vpe->vpe_db_lpi, rdbase + GICR_INVLPIR);
>> -wait_for_syncr(rdbase);
>> +/*
>> + * Don't mess about. Generating the invalidation is easily
>> + * done by using the parent irq_data, just like below.
>> + */
>> +direct_lpi_inv(d->parent_data);
> 
> "GICv4-vpe"'s parent is "GICv3", not "ITS".  What do we expect with
> irq_data_get_irq_chip_data(parent's irq_data)?

Yup, terrible mix-up. d->parent_data comes from the fact that we want to
invalidate the LPI and not d->hwirq (which is the VPEID). But doing so,
we also confuse direct_lpi_inv(), which expects to find meaningful data
(the its_dev

Re: [PATCH 10/35] irqchip/gic-v4.1: VPE table (aka GICR_VPROPBASER) allocation

2019-09-26 Thread Zenghui Yu

Hi Marc,

Two more questions below.

On 2019/9/25 22:41, Marc Zyngier wrote:

On 25/09/2019 14:04, Zenghui Yu wrote:

Hi Marc,

Some questions about this patch, mostly to confirm that I understand
things correctly.

On 2019/9/24 2:25, Marc Zyngier wrote:

GICv4.1 defines a new VPE table that is potentially shared between
both the ITSs and the redistributors, following complicated affinity
rules.

To make things more confusing, the programming of this table at
the redistributor level is reusing the GICv4.0 GICR_VPROPBASER register
for something completely different.

The code flow is somewhat complexified by the need to respect the
affinities required by the HW, meaning that tables can either be
inherited from a previously discovered ITS or redistributor.

Signed-off-by: Marc Zyngier 
---


[...]


@@ -1962,6 +1965,65 @@ static bool its_parse_indirect_baser(struct its_node 
*its,
return indirect;
   }
   
+static u32 compute_common_aff(u64 val)

+{
+   u32 aff, clpiaff;
+
+   aff = FIELD_GET(GICR_TYPER_AFFINITY, val);
+   clpiaff = FIELD_GET(GICR_TYPER_COMMON_LPI_AFF, val);
+
+   return aff & ~(GENMASK(31, 0) >> (clpiaff * 8));
+}
+
+static u32 compute_its_aff(struct its_node *its)
+{
+   u64 val;
+   u32 svpet;
+
+   /*
+* Reencode the ITS SVPET and MPIDR as a GICR_TYPER, and compute
+* the resulting affinity. We then use that to see if this match
+* our own affinity.
+*/
+   svpet = FIELD_GET(GITS_TYPER_SVPET, its->typer);


The spec says the ITS does not share the vPE table with the
Redistributors when SVPET==0.  It seems that we miss this rule and
simply treat SVPET as GICR_TYPER_COMMON_LPI_AFF here.  Am I wrong?


+   val  = FIELD_PREP(GICR_TYPER_COMMON_LPI_AFF, svpet);
+   val |= FIELD_PREP(GICR_TYPER_AFFINITY, its->mpidr);
+   return compute_common_aff(val);
+}
+
+static struct its_node *find_sibbling_its(struct its_node *cur_its)
+{
+   struct its_node *its;
+   u32 aff;
+
+   if (!FIELD_GET(GITS_TYPER_SVPET, cur_its->typer))
+   return NULL;
+
+   aff = compute_its_aff(cur_its);
+
+   list_for_each_entry(its, &its_nodes, entry) {
+   u64 baser;
+
+   if (!is_v4_1(its) || its == cur_its)
+   continue;
+
+   if (!FIELD_GET(GITS_TYPER_SVPET, its->typer))
+   continue;
+
+   if (aff != compute_its_aff(its))
+   continue;
+
+   /* GICv4.1 guarantees that the vPE table is GITS_BASER2 */
+   baser = its->tables[2].val;
+   if (!(baser & GITS_BASER_VALID))
+   continue;
+
+   return its;
+   }
+
+   return NULL;
+}
+
   static void its_free_tables(struct its_node *its)
   {
int i;
@@ -2004,6 +2066,15 @@ static int its_alloc_tables(struct its_node *its)
break;
   
   		case GITS_BASER_TYPE_VCPU:

+   if (is_v4_1(its)) {
+   struct its_node *sibbling;
+
+   if ((sibbling = find_sibbling_its(its))) {
+   its->tables[2] = sibbling->tables[2];


This means that the vPE table is shared between ITSs which are under the
same affinity group?


That's indeed what this is trying to do...


Don't we need to set GITS_BASER (by its_setup_baser()) here to tell this
ITS where the vPE table is located?


Absolutely. This is a bug. I didn't spot it as my model only has a
single ITS.




+   continue;
+   }
+   }
+
indirect = its_parse_indirect_baser(its, baser,
psz, &order,
ITS_MAX_VPEID_BITS);
@@ -2025,6 +2096,212 @@ static int its_alloc_tables(struct its_node *its)
return 0;
   }
   
+static u64 inherit_vpe_l1_table_from_its(void)

+{
+   struct its_node *its;
+   u64 val;
+   u32 aff;
+
+   val = gic_read_typer(gic_data_rdist_rd_base() + GICR_TYPER);
+   aff = compute_common_aff(val);
+
+   list_for_each_entry(its, &its_nodes, entry) {
+   u64 baser;
+
+   if (!is_v4_1(its))
+   continue;
+
+   if (!FIELD_GET(GITS_TYPER_SVPET, its->typer))
+   continue;
+
+   if (aff != compute_its_aff(its))
+   continue;
+
+   /* GICv4.1 guarantees that the vPE table is GITS_BASER2 */
+   baser = its->tables[2].val;
+   if (!(baser & GITS_BASER_VALID))
+   continue;
+
+   /* We have a winner! */
+   val  = GICR_VPROPBASER_4_1_VALID;
+   if (baser & GITS_BASER_INDIRECT)
+   val |= GICR_VPROPBASER_4_1_INDIRECT;
+   val |=

Re: [PATCH 03/35] irqchip/gic-v3-its: Allow LPI invalidation via the DirectLPI interface

2019-09-26 Thread Zenghui Yu

Hi Marc,

I get one kernel panic with this patch on D05.

(I don't have the GICv4.1 board at the moment. I have to wait for the
 appropriate HW to do more tests.)

On 2019/9/24 2:25, Marc Zyngier wrote:

We currently don't make much use of the DirectLPI feature, and it would
be beneficial to do this more, if only because it becomes a mandatory
feature for GICv4.1.

Signed-off-by: Marc Zyngier 
---
  drivers/irqchip/irq-gic-v3-its.c | 51 +++-
  1 file changed, 37 insertions(+), 14 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 58cb233cf138..c94eb287393b 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -175,6 +175,12 @@ static DEFINE_IDA(its_vpeid_ida);
  #define gic_data_rdist_rd_base()  (gic_data_rdist()->rd_base)
  #define gic_data_rdist_vlpi_base()(gic_data_rdist_rd_base() + SZ_128K)
  
+static inline u32 its_get_event_id(struct irq_data *d)

+{
+   struct its_device *its_dev = irq_data_get_irq_chip_data(d);
+   return d->hwirq - its_dev->event_map.lpi_base;
+}
+
  static struct its_collection *dev_event_to_col(struct its_device *its_dev,
   u32 event)
  {
@@ -183,6 +189,13 @@ static struct its_collection *dev_event_to_col(struct 
its_device *its_dev,
return its->collections + its_dev->event_map.col_map[event];
  }
  
+static struct its_collection *irq_to_col(struct irq_data *d)

+{
+   struct its_device *its_dev = irq_data_get_irq_chip_data(d);
+
+   return dev_event_to_col(its_dev, its_get_event_id(d));
+}
+


irq_to_col() uses the device's event_map and col_map to get the target
collection, which works well for a device's LPIs.
But direct_lpi_inv() also passes doorbells to it...

We don't allocate doorbells for devices; they are allocated per vPE.

And see below,


  static struct its_collection *valid_col(struct its_collection *col)
  {
if (WARN_ON_ONCE(col->target_address & GENMASK_ULL(15, 0)))
@@ -1031,12 +1044,6 @@ static void its_send_vinvall(struct its_node *its, 
struct its_vpe *vpe)
   * irqchip functions - assumes MSI, mostly.
   */
  
-static inline u32 its_get_event_id(struct irq_data *d)

-{
-   struct its_device *its_dev = irq_data_get_irq_chip_data(d);
-   return d->hwirq - its_dev->event_map.lpi_base;
-}
-
  static void lpi_write_config(struct irq_data *d, u8 clr, u8 set)
  {
irq_hw_number_t hwirq;
@@ -1081,12 +1088,28 @@ static void wait_for_syncr(void __iomem *rdbase)
cpu_relax();
  }
  
+static void direct_lpi_inv(struct irq_data *d)

+{
+   struct its_collection *col;
+   void __iomem *rdbase;
+
+   /* Target the redistributor this LPI is currently routed to */
+   col = irq_to_col(d);
+   rdbase = per_cpu_ptr(gic_rdists->rdist, col->col_id)->rd_base;
+   gic_write_lpir(d->hwirq, rdbase + GICR_INVLPIR);
+
+   wait_for_syncr(rdbase);
+}
+
  static void lpi_update_config(struct irq_data *d, u8 clr, u8 set)
  {
struct its_device *its_dev = irq_data_get_irq_chip_data(d);
  
  	lpi_write_config(d, clr, set);

-   its_send_inv(its_dev, its_get_event_id(d));
+   if (gic_rdists->has_direct_lpi && !irqd_is_forwarded_to_vcpu(d))
+   direct_lpi_inv(d);
+   else
+   its_send_inv(its_dev, its_get_event_id(d));
  }
  
  static void its_vlpi_set_doorbell(struct irq_data *d, bool enable)

@@ -2912,15 +2935,15 @@ static void its_vpe_send_cmd(struct its_vpe *vpe,
  
  static void its_vpe_send_inv(struct irq_data *d)

  {
-   struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
-
if (gic_rdists->has_direct_lpi) {
-   void __iomem *rdbase;
-
-   rdbase = per_cpu_ptr(gic_rdists->rdist, vpe->col_idx)->rd_base;
-   gic_write_lpir(vpe->vpe_db_lpi, rdbase + GICR_INVLPIR);
-   wait_for_syncr(rdbase);
+   /*
+* Don't mess about. Generating the invalidation is easily
+* done by using the parent irq_data, just like below.
+*/
+   direct_lpi_inv(d->parent_data);


"GICv4-vpe"'s parent is "GICv3", not "ITS".  What do we expect with
irq_data_get_irq_chip_data(parent's irq_data)?

I noticed it when running this series on D05 (with GICv4.0 and DirectLPI
support); the panic call trace is attached below.
I think we really need a fix here.


Thanks,
zenghui


} else {
+   struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
+
its_vpe_send_cmd(vpe, its_send_inv);
}
  }



---8<

[ 2046.193227] Unable to handle kernel paging request at virtual address 002045fa4d92

[ 2046.201350] Mem abort info:
[ 2046.204132]   ESR = 0x9605
[ 2046.207174]   Exception class = DABT (current EL), IL = 32 bits
[ 2046.213081]   SET = 0, FnV = 0
[ 2046.216123]   EA = 0, S1PTW = 0
[ 2046.219251] Data abort info:
[ 2046.222119]   ISV = 0, ISS = 0x0005
[ 2046.225942]   CM = 0

Re: [PATCH 2/2] KVM: arm/arm64: Allow user injection of external data aborts

2019-09-26 Thread Marc Zyngier
On 09/09/2019 13:13, Christoffer Dall wrote:
> In some scenarios, such as buggy guest or incorrect configuration of the
> VMM and firmware description data, userspace will detect a memory access
> to a portion of the IPA, which is not mapped to any MMIO region.
> 
> For this purpose, the appropriate action is to inject an external abort
> to the guest.  The kernel already has functionality to inject an
> external abort, but we need to wire up a signal from user space that
> lets user space tell the kernel to do this.
> 
> It turns out, we already have the set event functionality which we can
> perfectly reuse for this.
> 
> Signed-off-by: Christoffer Dall 
> ---
>  Documentation/virt/kvm/api.txt| 15 ++-
>  arch/arm/include/uapi/asm/kvm.h   |  3 ++-
>  arch/arm/kvm/guest.c  |  3 +++
>  arch/arm64/include/uapi/asm/kvm.h |  3 ++-
>  arch/arm64/kvm/guest.c|  3 +++
>  arch/arm64/kvm/inject_fault.c |  4 ++--
>  include/uapi/linux/kvm.h  |  1 +
>  virt/kvm/arm/arm.c|  1 +
>  8 files changed, 28 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt
> index 02501333f746..edd6cdc470ca 100644
> --- a/Documentation/virt/kvm/api.txt
> +++ b/Documentation/virt/kvm/api.txt
> @@ -955,6 +955,8 @@ The following bits are defined in the flags field:
>  
>  ARM/ARM64:
>  
> +User space may need to inject several types of events to the guest.
> +
>  If the guest accesses a device that is being emulated by the host kernel in
>  such a way that a real device would generate a physical SError, KVM may make
>  a virtual SError pending for that VCPU. This system error interrupt remains
> @@ -989,12 +991,23 @@ Specifying exception.has_esr on a system that does not 
> support it will return
>  -EINVAL. Setting anything other than the lower 24bits of exception.serror_esr
>  will return -EINVAL.
>  
> +If the guest performed an access to I/O memory which could not be handled by
> +user space, for example because of missing instruction syndrome decode
> +information or because there is no device mapped at the accessed IPA, then
> +user space can ask the kernel to inject an external abort using the address
> +from the exiting fault on the VCPU. It is a programming error to set
> +ext_dabt_pending at the same time as any of the serror fields, or to set
> +ext_dabt_pending on an exit which was not either KVM_EXIT_MMIO or

... on *re-entry from* an exit?

> +KVM_EXIT_ARM_NISV. This feature is only available if the system supports
> +KVM_CAP_ARM_INJECT_EXT_DABT;

s/;/./

Can we add some wording to the fact that this is only a helper for the
most common case? Most of the ARM exceptions can otherwise be
constructed/injected using the SET_ONE_REG API.

> +
>  struct kvm_vcpu_events {
>   struct {
>   __u8 serror_pending;
>   __u8 serror_has_esr;
> + __u8 ext_dabt_pending;
>   /* Align it to 8 bytes */
> - __u8 pad[6];
> + __u8 pad[5];
>   __u64 serror_esr;
>   } exception;
>   __u32 reserved[12];
> diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
> index a4217c1a5d01..d2449a5bf8d5 100644
> --- a/arch/arm/include/uapi/asm/kvm.h
> +++ b/arch/arm/include/uapi/asm/kvm.h
> @@ -131,8 +131,9 @@ struct kvm_vcpu_events {
>   struct {
>   __u8 serror_pending;
>   __u8 serror_has_esr;
> + __u8 ext_dabt_pending;
>   /* Align it to 8 bytes */
> - __u8 pad[6];
> + __u8 pad[5];
>   __u64 serror_esr;
>   } exception;
>   __u32 reserved[12];
> diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
> index 684cf64b4033..4154c5589501 100644
> --- a/arch/arm/kvm/guest.c
> +++ b/arch/arm/kvm/guest.c
> @@ -263,11 +263,14 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
>  {
>   bool serror_pending = events->exception.serror_pending;
>   bool has_esr = events->exception.serror_has_esr;
> + bool has_ext_dabt_pending = events->exception.ext_dabt_pending;
>  
>   if (serror_pending && has_esr)
>   return -EINVAL;
>   else if (serror_pending)
>   kvm_inject_vabt(vcpu);
> + else if (has_ext_dabt_pending)
> + kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
>  
>   return 0;
>  }
> diff --git a/arch/arm64/include/uapi/asm/kvm.h 
> b/arch/arm64/include/uapi/asm/kvm.h
> index 9a507716ae2f..7729efdb1c0c 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -164,8 +164,9 @@ struct kvm_vcpu_events {
>   struct {
>   __u8 serror_pending;
>   __u8 serror_has_esr;
> + __u8 ext_dabt_pending;
>   /* Align it to 8 bytes */
> - __u8 pad[6];
> + __u8 pad[5];
>   __u64 serror_esr;
>   } exception;
>   __u32 reserved[12];
> diff --git a/arch/

Re: [PATCH 1/2] KVM: arm/arm64: Allow reporting non-ISV data aborts to userspace

2019-09-26 Thread Marc Zyngier
On 09/09/2019 13:13, Christoffer Dall wrote:
> For a long time, if a guest accessed memory outside of a memslot using
> any of the load/store instructions in the architecture which doesn't
> supply decoding information in the ESR_EL2 (the ISV bit is not set), the
> kernel would print the following message and terminate the VM as a
> result of returning -ENOSYS to userspace:
> 
>   load/store instruction decoding not implemented
> 
> The reason behind this message is that KVM assumes that any access
> outside a memslot is an MMIO access which should be handled by
> userspace, and we originally expected to eventually implement some sort
> of decoding of load/store instructions where the ISV bit was not set.
> 
> However, it turns out that many of the instructions which don't provide
> decoding information on abort are not safe to use for MMIO accesses, and
> the remaining few that would potentially make sense to use on MMIO
> accesses, such as those with register writeback, are not used in
> practice.  It also turns out that fetching an instruction from guest
> memory can be a pretty horrible affair, involving stopping all CPUs on
> SMP systems, handling multiple corner cases of address translation in
> software, and more.  It doesn't appear likely that we'll ever implement
> this in the kernel.
> 
> What is much more common is that a user has misconfigured his/her guest
> and is actually not accessing an MMIO region, but just hitting some
> random hole in the IPA space.  In this scenario, the error message above
> is almost misleading and has led to a great deal of confusion over the
> years.
> 
> It is, nevertheless, ABI to userspace, and we therefore need to
> introduce a new capability that userspace explicitly enables to change
> behavior.
> 
> This patch introduces KVM_CAP_ARM_NISV_TO_USER (NISV meaning Non-ISV)
> which does exactly that, and introduces a new exit reason to report the
> event to userspace.  User space can then emulate an exception to the
> guest, restart the guest, suspend the guest, or take any other
> appropriate action as per the policy of the running system.
> 
> Reported-by: Heinrich Schuchardt 
> Signed-off-by: Christoffer Dall 
> ---
>  Documentation/virt/kvm/api.txt   | 29 
>  arch/arm/include/asm/kvm_arm.h   |  2 ++
>  arch/arm/include/asm/kvm_emulate.h   |  5 +
>  arch/arm/include/asm/kvm_host.h  |  8 
>  arch/arm64/include/asm/kvm_emulate.h |  5 +
>  arch/arm64/include/asm/kvm_host.h|  8 
>  include/uapi/linux/kvm.h |  7 +++
>  virt/kvm/arm/arm.c   | 21 
>  virt/kvm/arm/mmio.c  | 11 +--
>  9 files changed, 94 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt
> index 2d067767b617..02501333f746 100644
> --- a/Documentation/virt/kvm/api.txt
> +++ b/Documentation/virt/kvm/api.txt
> @@ -4453,6 +4453,35 @@ Hyper-V SynIC state change. Notification is used to remap SynIC
>  event/message pages and to enable/disable SynIC messages/events processing
>  in userspace.
>  
> + /* KVM_EXIT_ARM_NISV */
> + struct {
> + __u64 esr_iss;
> + __u64 fault_ipa;
> + } arm_nisv;
> +
> +Used on arm and arm64 systems. If a guest accesses memory not in a memslot,
> +KVM will typically return to userspace and ask it to do MMIO emulation on its
> +behalf. However, for certain classes of instructions, no instruction decode
> +(direction, length of memory access) is provided, and fetching and decoding
> +the instruction from the VM is overly complicated to live in the kernel.
> +
> +Historically, when this situation occurred, KVM would print a warning and kill
> +the VM. KVM assumed that if the guest accessed non-memslot memory, it was
> +trying to do I/O, which just couldn't be emulated, and the warning message was
> +phrased accordingly. However, what happened more often was that a guest bug
> +caused access outside the guest memory areas which should lead to a more
> +mearningful warning message and an external abort in the guest, if the access

meaningful?

> +did not fall within an I/O window.
> +
> +Userspace implementations can query for KVM_CAP_ARM_NISV_TO_USER, and enable
> +this capability at VM creation. Once this is done, these types of errors will
> +instead return to userspace with KVM_EXIT_ARM_NISV, with the valid bits from
> +the HSR (arm) and ESR_EL2 (arm64) in the esr_iss field, and the faulting IPA
> +in the fault_ipa field. Userspace can either fix up the access if it's
> +actually an I/O access by decoding the instruction from guest memory (if it's
> +very brave) and continue executing the guest, or it can decide to suspend,
> +dump, or restart the guest.
> +
>   /* Fix the size of the union. */
>   char padding[256];
>   };
> diff --git a/arch/arm/include/asm/kvm_arm.h

Re: [RFC PATCH v4 2/5] ptp: Reorganize ptp_kvm modules to make it arch-independent.

2019-09-26 Thread Suzuki K Poulose

Hi Jianyong,

On 26/09/2019 12:42, Jianyong Wu wrote:

Currently, the ptp_kvm module implementation is only for x86 and includes a
large part of arch-specific code.  This patch moves all of that code
into new arch-related files in the same directory.

Signed-off-by: Jianyong Wu 
---
  drivers/ptp/Makefile |  1 +
  drivers/ptp/{ptp_kvm.c => kvm_ptp.c} | 77 ++--
  drivers/ptp/ptp_kvm_x86.c| 87 
  include/asm-generic/ptp_kvm.h| 12 
  4 files changed, 118 insertions(+), 59 deletions(-)
  rename drivers/ptp/{ptp_kvm.c => kvm_ptp.c} (63%)


minor nit: Could we not skip renaming the file ? Given
you are following the ptp_kvm_* for the arch specific
files and the header files, wouldn't it be good to
keep ptp_kvm.c ?

Rest looks fine.

Cheers
Suzuki
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[RFC PATCH v4 5/5] kvm: arm64: Add capability check extension for ptp_kvm

2019-09-26 Thread Jianyong Wu
Let userspace check if there is a kvm ptp service in the host.
Before VMs migrate to another host, the VMM may check whether this
cap is available to determine the migration behaviour.

Signed-off-by: Jianyong Wu 
Suggested-by: Marc Zyngier 
---
 include/uapi/linux/kvm.h | 1 +
 virt/kvm/arm/arm.c   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2fe12b40d503..a0bff6002bd9 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -993,6 +993,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_ARM_SVE 170
 #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171
 #define KVM_CAP_ARM_PTRAUTH_GENERIC 172
+#define KVM_CAP_ARM_KVM_PTP 173
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index bd5c55916d0d..8085160b 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -201,6 +201,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_MP_STATE:
case KVM_CAP_IMMEDIATE_EXIT:
case KVM_CAP_VCPU_EVENTS:
+   case KVM_CAP_ARM_KVM_PTP:
r = 1;
break;
case KVM_CAP_ARM_SET_DEVICE_ADDR:
-- 
2.17.1



[RFC PATCH v4 4/5] ptp: arm64: Enable ptp_kvm for arm64

2019-09-26 Thread Jianyong Wu
Currently, in an arm64 virtualization environment, there is no mechanism to
keep time in sync between guest and host. Time in the guest will drift
relative to the host after boot, as both may use third-party time sources
to correct their time independently. The time deviation will be on the order
of milliseconds, but some scenarios ask for higher precision; in a cloud
environment, for example, we want all the VMs running on a host to acquire
the same level of accuracy from the host clock.

The kvm ptp clock, which uses the host clock source as a reference clock to
synchronize time between guest and host, has already been adopted by x86 and
improves the time sync error from milliseconds to nanoseconds.

This patch enables kvm ptp on arm64, and we observe clock drift similar to
that found on x86 with kvm ptp.

A comparison of test results with and without kvm ptp on arm64 follows. The
data is derived from the output of the command 'chronyc sources'; pay
particular attention to the Last sample column, which shows the offset
between the local clock and the source at the last measurement.

no kvm ptp in guest:
MS Name/IP address       Stratum Poll Reach LastRx Last sample

^* dns1.synet.edu.cn           2    6   377     13  +1040us[+1581us] +/-   21ms
^* dns1.synet.edu.cn           2    6   377     21  +1040us[+1581us] +/-   21ms
^* dns1.synet.edu.cn           2    6   377     29  +1040us[+1581us] +/-   21ms
^* dns1.synet.edu.cn           2    6   377     37  +1040us[+1581us] +/-   21ms
^* dns1.synet.edu.cn           2    6   377     45  +1040us[+1581us] +/-   21ms
^* dns1.synet.edu.cn           2    6   377     53  +1040us[+1581us] +/-   21ms
^* dns1.synet.edu.cn           2    6   377     61  +1040us[+1581us] +/-   21ms
^* dns1.synet.edu.cn           2    6   377      4   -130us[ +796us] +/-   21ms
^* dns1.synet.edu.cn           2    6   377     12   -130us[ +796us] +/-   21ms
^* dns1.synet.edu.cn           2    6   377     20   -130us[ +796us] +/-   21ms

in host:
MS Name/IP address       Stratum Poll Reach LastRx Last sample

^* 120.25.115.20               2    7   377     72   -470us[ -603us] +/-   18ms
^* 120.25.115.20               2    7   377     92   -470us[ -603us] +/-   18ms
^* 120.25.115.20               2    7   377    112   -470us[ -603us] +/-   18ms
^* 120.25.115.20               2    7   377      2   +872ns[-6808ns] +/-   17ms
^* 120.25.115.20               2    7   377     22   +872ns[-6808ns] +/-   17ms
^* 120.25.115.20               2    7   377     43   +872ns[-6808ns] +/-   17ms
^* 120.25.115.20               2    7   377     63   +872ns[-6808ns] +/-   17ms
^* 120.25.115.20               2    7   377     83   +872ns[-6808ns] +/-   17ms
^* 120.25.115.20               2    7   377    103   +872ns[-6808ns] +/-   17ms
^* 120.25.115.20               2    7   377    123   +872ns[-6808ns] +/-   17ms

dns1.synet.edu.cn is the network reference clock for the guest and
120.25.115.20 is the network reference clock for the host. We can't measure
the clock error between guest and host directly, but a rough estimate puts it
on the order of hundreds of microseconds to milliseconds.

with kvm ptp in guest:
chrony has been disabled on the host to remove disturbance from the network clock.

MS Name/IP address       Stratum Poll Reach LastRx Last sample

* PHC0                         0    3   377      8    -7ns[   +1ns] +/-    3ns
* PHC0                         0    3   377      8    +1ns[  +16ns] +/-    3ns
* PHC0                         0    3   377      6    -4ns[   -0ns] +/-    6ns
* PHC0                         0    3   377      6    -8ns[  -12ns] +/-    5ns
* PHC0                         0    3   377      5    +2ns[   +4ns] +/-    4ns
* PHC0                         0    3   377     13    +2ns[   +4ns] +/-    4ns
* PHC0                         0    3   377     12    -4ns[   -6ns] +/-    4ns
* PHC0                         0    3   377     11    -8ns[  -11ns] +/-    6ns
* PHC0                         0    3   377     10   -14ns[  -20ns] +/-    4ns
* PHC0                         0    3   377      8    +4ns[   +5ns] +/-    4ns

PHC0 is the ptp clock, which uses the host clock as its source. So we can
confidently say that the clock error between host and guest is on the order
of nanoseconds.

Signed-off-by: Jianyong Wu 
---
 drivers/clocksource/arm_arch_timer.c | 28 +++
 drivers/ptp/Kconfig  |  2 +-
 drivers/ptp/kvm_ptp.c|  4 +-
 drivers/ptp/ptp_kvm_arm64.c  | 73 
 include/asm-generic/ptp_kvm.h|  2 +-
 5 files changed, 106 insertions(+), 3 deletions(-)
 create mode 100644 drivers/ptp/ptp_kvm_arm64.c

diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index 07e57a49d1e8..7a99673e4149 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -1633,4 +1633,32 @@ static int __init arch_timer_acpi_init(struct acpi_table_header *table)
return arch_timer_common_init();
 }
 TIMER_ACPI_DECLARE(arch_t

[RFC PATCH v4 2/5] ptp: Reorganize ptp_kvm modules to make it arch-independent.

2019-09-26 Thread Jianyong Wu
Currently, the ptp_kvm module implementation is only for x86 and includes a
large part of arch-specific code.  This patch moves all of that code
into new arch-related files in the same directory.

Signed-off-by: Jianyong Wu 
---
 drivers/ptp/Makefile |  1 +
 drivers/ptp/{ptp_kvm.c => kvm_ptp.c} | 77 ++--
 drivers/ptp/ptp_kvm_x86.c| 87 
 include/asm-generic/ptp_kvm.h| 12 
 4 files changed, 118 insertions(+), 59 deletions(-)
 rename drivers/ptp/{ptp_kvm.c => kvm_ptp.c} (63%)
 create mode 100644 drivers/ptp/ptp_kvm_x86.c
 create mode 100644 include/asm-generic/ptp_kvm.h

diff --git a/drivers/ptp/Makefile b/drivers/ptp/Makefile
index 677d1d178a3e..8f27ba302e31 100644
--- a/drivers/ptp/Makefile
+++ b/drivers/ptp/Makefile
@@ -4,6 +4,7 @@
 #
 
 ptp-y  := ptp_clock.o ptp_chardev.o ptp_sysfs.o
+ptp_kvm-y  := ptp_kvm_$(ARCH).o kvm_ptp.o
 obj-$(CONFIG_PTP_1588_CLOCK)   += ptp.o
 obj-$(CONFIG_PTP_1588_CLOCK_DTE)   += ptp_dte.o
 obj-$(CONFIG_PTP_1588_CLOCK_IXP46X)+= ptp_ixp46x.o
diff --git a/drivers/ptp/ptp_kvm.c b/drivers/ptp/kvm_ptp.c
similarity index 63%
rename from drivers/ptp/ptp_kvm.c
rename to drivers/ptp/kvm_ptp.c
index fc7d0b77e118..d8f215186904 100644
--- a/drivers/ptp/ptp_kvm.c
+++ b/drivers/ptp/kvm_ptp.c
@@ -8,12 +8,12 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
-#include 
-#include 
 #include 
+#include 
 
 #include 
 
@@ -24,56 +24,29 @@ struct kvm_ptp_clock {
 
 DEFINE_SPINLOCK(kvm_ptp_lock);
 
-static struct pvclock_vsyscall_time_info *hv_clock;
-
-static struct kvm_clock_pairing clock_pair;
-static phys_addr_t clock_pair_gpa;
-
 static int ptp_kvm_get_time_fn(ktime_t *device_time,
   struct system_counterval_t *system_counter,
   void *ctx)
 {
-   unsigned long ret;
+   unsigned long ret, cycle;
struct timespec64 tspec;
-   unsigned version;
-   int cpu;
-   struct pvclock_vcpu_time_info *src;
+   struct clocksource *cs;
 
spin_lock(&kvm_ptp_lock);
 
preempt_disable_notrace();
-   cpu = smp_processor_id();
-   src = &hv_clock[cpu].pvti;
-
-   do {
-   /*
-* We are using a TSC value read in the hosts
-* kvm_hc_clock_pairing handling.
-* So any changes to tsc_to_system_mul
-* and tsc_shift or any other pvclock
-* data invalidate that measurement.
-*/
-   version = pvclock_read_begin(src);
-
-   ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING,
-clock_pair_gpa,
-KVM_CLOCK_PAIRING_WALLCLOCK);
-   if (ret != 0) {
-   pr_err_ratelimited("clock pairing hypercall ret %lu\n", ret);
-   spin_unlock(&kvm_ptp_lock);
-   preempt_enable_notrace();
-   return -EOPNOTSUPP;
-   }
-
-   tspec.tv_sec = clock_pair.sec;
-   tspec.tv_nsec = clock_pair.nsec;
-   ret = __pvclock_read_cycles(src, clock_pair.tsc);
-   } while (pvclock_read_retry(src, version));
+   ret = kvm_arch_ptp_get_clock_fn(&cycle, &tspec, &cs);
+   if (ret != 0) {
+   pr_err_ratelimited("clock pairing hypercall ret %lu\n", ret);
+   spin_unlock(&kvm_ptp_lock);
+   preempt_enable_notrace();
+   return -EOPNOTSUPP;
+   }
 
preempt_enable_notrace();
 
-   system_counter->cycles = ret;
-   system_counter->cs = &kvm_clock;
+   system_counter->cycles = cycle;
+   system_counter->cs = cs;
 
*device_time = timespec64_to_ktime(tspec);
 
@@ -116,17 +89,13 @@ static int ptp_kvm_gettime(struct ptp_clock_info *ptp, struct timespec64 *ts)
 
spin_lock(&kvm_ptp_lock);
 
-   ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING,
-clock_pair_gpa,
-KVM_CLOCK_PAIRING_WALLCLOCK);
+   ret = kvm_arch_ptp_get_clock(&tspec);
if (ret != 0) {
pr_err_ratelimited("clock offset hypercall ret %lu\n", ret);
spin_unlock(&kvm_ptp_lock);
return -EOPNOTSUPP;
}
 
-   tspec.tv_sec = clock_pair.sec;
-   tspec.tv_nsec = clock_pair.nsec;
spin_unlock(&kvm_ptp_lock);
 
memcpy(ts, &tspec, sizeof(struct timespec64));
@@ -166,21 +135,11 @@ static void __exit ptp_kvm_exit(void)
 
 static int __init ptp_kvm_init(void)
 {
-   long ret;
-
-   if (!kvm_para_available())
-   return -ENODEV;
-
-   clock_pair_gpa = slow_virt_to_phys(&clock_pair);
-   hv_clock = pvclock_get_pvti_cpu0_va();
+   int ret;
 
-   if (!hv_clock)
-   return -ENODEV;
-
-   ret = kvm_hypercall2(KVM_HC_CLOC

[RFC PATCH v4 1/5] psci: Export psci_ops.conduit symbol as modules will use it.

2019-09-26 Thread Jianyong Wu
If arm_smccc_1_1_invoke is used in modules, psci_ops.conduit should
be exported.

Signed-off-by: Jianyong Wu 
---
 drivers/firmware/psci/psci.c | 6 ++
 include/linux/arm-smccc.h| 2 +-
 include/linux/psci.h | 1 +
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index f82ccd39a913..35c4eaab1451 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -212,6 +212,12 @@ static unsigned long psci_migrate_info_up_cpu(void)
  0, 0, 0);
 }
 
+enum psci_conduit psci_get_conduit(void)
+{
+   return psci_ops.conduit;
+}
+EXPORT_SYMBOL(psci_get_conduit);
+
 static void set_conduit(enum psci_conduit conduit)
 {
switch (conduit) {
diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index 552cbd49abe8..a6e4d3e3d10a 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
asmlinkage void __arm_smccc_hvc(unsigned long a0, unsigned long a1,
  * The return value also provides the conduit that was used.
  */
 #define arm_smccc_1_1_invoke(...) ({   \
-   int method = psci_ops.conduit;  \
+   int method = psci_get_conduit();\
switch (method) {   \
case PSCI_CONDUIT_HVC:  \
arm_smccc_1_1_hvc(__VA_ARGS__); \
diff --git a/include/linux/psci.h b/include/linux/psci.h
index a8a15613c157..e5cedc986049 100644
--- a/include/linux/psci.h
+++ b/include/linux/psci.h
@@ -42,6 +42,7 @@ struct psci_operations {
enum smccc_version smccc_version;
 };
 
+extern enum psci_conduit psci_get_conduit(void);
 extern struct psci_operations psci_ops;
 
 #if defined(CONFIG_ARM_PSCI_FW)
-- 
2.17.1



[RFC PATCH v4 3/5] psci: Add hvc call service for ptp_kvm.

2019-09-26 Thread Jianyong Wu
This patch is the base of ptp_kvm for arm64.
The ptp_kvm module will make an hvc call to request this service,
which returns the host real time and counter cycle to the guest.

Signed-off-by: Jianyong Wu 
---
 include/linux/arm-smccc.h | 12 
 virt/kvm/arm/psci.c   | 18 ++
 2 files changed, 30 insertions(+)

diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index a6e4d3e3d10a..bc0cdad10f35 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -94,6 +94,7 @@
 
 /* KVM "vendor specific" services */
 #define ARM_SMCCC_KVM_FUNC_FEATURES0
+#define ARM_SMCCC_KVM_PTP  1
 #define ARM_SMCCC_KVM_FUNC_FEATURES_2  127
 #define ARM_SMCCC_KVM_NUM_FUNCS128
 
@@ -103,6 +104,17 @@
   ARM_SMCCC_OWNER_VENDOR_HYP,  \
   ARM_SMCCC_KVM_FUNC_FEATURES)
 
+/*
+ * This ID is used for the virtual ptp kvm clock. It passes the second
+ * and nanosecond values of the host real time, plus the system counter,
+ * back to the guest via vcpu registers.
+ */
+#define ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID   \
+   ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+  ARM_SMCCC_SMC_32,\
+  ARM_SMCCC_OWNER_VENDOR_HYP,  \
+  ARM_SMCCC_KVM_PTP)
+
 #ifndef __ASSEMBLY__
 
 #include 
diff --git a/virt/kvm/arm/psci.c b/virt/kvm/arm/psci.c
index 0debf49bf259..3f30fc42a5ca 100644
--- a/virt/kvm/arm/psci.c
+++ b/virt/kvm/arm/psci.c
@@ -392,6 +392,8 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
u32 func_id = smccc_get_function(vcpu);
u32 val[4] = {};
u32 option;
+   u64 cycles;
+   struct system_time_snapshot systime_snapshot;
 
val[0] = SMCCC_RET_NOT_SUPPORTED;
 
@@ -431,6 +433,22 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
break;
+   /*
+* This is used for the virtual ptp kvm clock. Four
+* values are passed back:
+* reg0 stores high 32-bit host ktime;
+* reg1 stores low 32-bit host ktime;
+* reg2 stores high 32-bit difference of host cycles and cntvoff;
+* reg3 stores low 32-bit difference of host cycles and cntvoff.
+*/
+   case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
+   ktime_get_snapshot(&systime_snapshot);
+   val[0] = systime_snapshot.real >> 32;
+   val[1] = systime_snapshot.real << 32 >> 32;
+   cycles = systime_snapshot.cycles - vcpu_vtimer(vcpu)->cntvoff;
+   val[2] = cycles >> 32;
+   val[3] = cycles << 32 >> 32;
+   break;
default:
return kvm_psci_call(vcpu);
}
-- 
2.17.1



[RFC PATCH v4 0/6] Enable ptp_kvm for arm64

2019-09-26 Thread Jianyong Wu
kvm ptp aims to provide high-precision time sync between guest
and host in a virtualization environment. This patch set enables kvm ptp
for arm64.
It is based on [1][2][3].

change log:
from v3 to v4:
(1) fix clocksource of ptp_kvm to arch_sys_counter.
(2) move kvm_arch_ptp_get_clock_fn into arm_arch_timer.c
(3) subtract cntvoff before return cycles from host.
(4) use ktime_get_snapshot instead of getnstimeofday and
get_current_counterval to return time and counter value.
(5) split ktime and counter into two 32-bit blocks respectively
to avoid Y2038 issues.
(6) set time compensation to device time as half of the delay of the
hvc call.
(7) add ARM_ARCH_TIMER as dependency of ptp_kvm for
arm64.
from v2 to v3:
(1) fix some issues in commit log.
(2) add some recipients to the send list.
from v1 to v2:
(1) move arch-specific code from arch/ to driver/ptp/
(2) offer mechanism to inform userspace if ptp_kvm service is
available.
(3) separate ptp_kvm code for arm64 into hypervisor part and
guest part.
(4) add API to expose monotonic clock and counter value.
(5) refine code: remove unnecessary parts and restructure.

[1]https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/
commit/?h=kvm/hvc&id=125ea89e4a21e2fc5235410f966a996a1a7148bf
[2]https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/
commit/?h=kvm/hvc&id=464f5a1741e5959c3e4d2be1966ae0093b4dce06
[3]https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/
commit/?h=kvm/hvc&id=6597490e005d0eeca8ed8c1c1d7b4318ee014681

Jianyong Wu (5):
  psci: Export psci_ops.conduit symbol as modules will use it.
  ptp: Reorganize ptp_kvm modules to make it arch-independent.
  psci: Add hvc call service for ptp_kvm.
  ptp: arm64: Enable ptp_kvm for arm64
  kvm: arm64: Add capability check extension for ptp_kvm

 drivers/clocksource/arm_arch_timer.c | 28 +
 drivers/firmware/psci/psci.c |  6 ++
 drivers/ptp/Kconfig  |  2 +-
 drivers/ptp/Makefile |  1 +
 drivers/ptp/{ptp_kvm.c => kvm_ptp.c} | 79 +++--
 drivers/ptp/ptp_kvm_arm64.c  | 73 +++
 drivers/ptp/ptp_kvm_x86.c| 87 
 include/asm-generic/ptp_kvm.h| 12 
 include/linux/arm-smccc.h| 14 -
 include/linux/psci.h |  1 +
 include/uapi/linux/kvm.h |  1 +
 virt/kvm/arm/arm.c   |  1 +
 virt/kvm/arm/psci.c  | 18 ++
 13 files changed, 262 insertions(+), 61 deletions(-)
 rename drivers/ptp/{ptp_kvm.c => kvm_ptp.c} (63%)
 create mode 100644 drivers/ptp/ptp_kvm_arm64.c
 create mode 100644 drivers/ptp/ptp_kvm_x86.c
 create mode 100644 include/asm-generic/ptp_kvm.h

-- 
2.17.1