Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Fri, 23 Sep 2016, Dario Faggioli wrote:
> On Fri, 2016-09-23 at 11:15 +0100, Julien Grall wrote:
> > On 23/09/16 11:05, Peng Fan wrote:
> > > If cluster is not preferred, cpuclass may be a choice, but I
> > > personally prefer a "cluster" split for ARM.
> > >
> > > Thanks,
> > > Peng.
> > >
> > > [1] https://en.wikipedia.org/wiki/ARM_big.LITTLE
> >
> > Please try to have a think on all the use cases, not only yours.
>
> This last line is absolutely true and very important!
>
> That being said, I am a bit lost. So, AFAICT, in order to act properly
> when the user asks for:
>
>     vcpuclass = ["1,2:foo", "0,3:bar"]
>
> we need to decide what "foo" and "bar" are at the xl and libxl level,
> and whether they are the same all the way down to Xen (and if not,
> what's the mapping).

I think "foo" and "bar" need to be "big" and "LITTLE" at the xl level. Given that Xen is the one with the information about which core is big and which is LITTLE, I think the hypervisor should provide the mapping between labels and cpu and cluster indexes.

> We also said it would be nice to support:
>
>     xl cpupool-split --feature=foobar
>
> and hence we also need to decide what foobar is, whether it is in the
> same namespace as foo and bar (i.e., it can be foobar==foo, or
> foobar==bar, etc.), or it is something else, or both.

I would be consistent and always use foobar=bigLITTLE at the xl level.

> Can someone list what the various alternative approaches on the table
> are?

The info available is:
http://lxr.free-electrons.com/source/Documentation/devicetree/bindings/arm/cpus.txt
plus:
http://marc.info/?l=linux-arm-kernel=147308556729426=2

We have cpu and cluster indexes. We have cpu compatible strings, which tell us whether a cpu is an "a53" or an "a15". We have an optional property that tells us the cpu "capacity": higher capacity means "big", lower capacity means "LITTLE". Xen could always deal with cpu and cluster indexes, but provide convenient labels to libxl. For example:

    cpus {
        #size-cells = <0>;
        #address-cells = <1>;

        cpu@0 {
            device_type = "cpu";
            compatible = "arm,cortex-a15";
            reg = <0x0>;
            capacity-dmips-mhz = <1024>;
        };

        cpu@1 {
            device_type = "cpu";
            compatible = "arm,cortex-a15";
            reg = <0x1>;
            capacity-dmips-mhz = <1024>;
        };

        cpu@100 {
            device_type = "cpu";
            compatible = "arm,cortex-a7";
            reg = <0x100>;
            capacity-dmips-mhz = <512>;
        };

        cpu@101 {
            device_type = "cpu";
            compatible = "arm,cortex-a7";
            reg = <0x101>;
            capacity-dmips-mhz = <512>;
        };
    };

The reg property encodes the cpu number and cluster number and matches the value in the MPIDR register. That is what Xen could take as parameter. The mapping between reg and "big" or "LITTLE" and the cpu compatible name, such as "a7", could be returned by a hypercall such as xen_arch_domainconfig.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Thu, 22 Sep 2016, Julien Grall wrote:
> Hi Stefano,
>
> On 22/09/2016 18:31, Stefano Stabellini wrote:
> > On Thu, 22 Sep 2016, Julien Grall wrote:
> > > Hello Peng,
> > >
> > > On 22/09/16 10:27, Peng Fan wrote:
> > > > On Thu, Sep 22, 2016 at 10:50:23AM +0200, Dario Faggioli wrote:
> > > > > On Thu, 2016-09-22 at 14:49 +0800, Peng Fan wrote:
> > > > > > On Wed, Sep 21, 2016 at 08:11:43PM +0100, Julien Grall wrote:
> > > > > A feature like `xl cpupool-biglittle-split' can still be interesting,
> > > >
> > > > "cpupool-cluster-split" may be a better name?
> > >
> > > You seem to assume that a cluster, from the MPIDR point of view, can only
> > > contain the same set of CPUs. I don't think this is part of the
> > > architecture, so this may not be true in the future.
> >
> > Interesting. I also understood that a cluster can only have one kind of
> > cpus. Honestly it would be a little insane for it to be otherwise :-)
>
> I don't think this is insane (or maybe I am insane :)). Clusters usually
> don't share all of the L2 cache (assuming L1 is local to each core) and an
> L3 cache may not be present, so if you move a task from one cluster to
> another you will add latency because the new L2 cache has to be refilled.
>
> The use case of big.LITTLE is that big cores are used for short bursts and
> little cores are used for the rest (e.g. listening to audio, fetching
> mail...). If you want to reduce latency when switching between big and
> little CPUs, you may want to put them within the same cluster.
>
> Also, as mentioned in another thread, you may have a platform with the same
> micro-architecture (e.g. Cortex A-53) but different silicon implementations
> (e.g. to have a different frequency, power efficiency). Here the concept of
> big.LITTLE is more blurred.

Different frequency is fine, we have been able to set per-core frequencies on x86 cpus for a long time now. If they are cores of the same micro-architecture, the cpu frequency doesn't matter; we can deal with them as usual. To me big.LITTLE means: it is technically possible, but very difficult (currently unimplemented), and slower than usual, to move a vcpu across big and LITTLE pcpus. That's why they need to be dealt with in a different way.

If we had big.LITTLE cores in the same cluster, sharing L2 caches, with the same cache line sizes, maybe we could also deal with them as usual, because it wouldn't be much of an issue to migrate a vcpu across big and LITTLE cores. If/when we come across such an architecture, we'll deal with it.

> That's why I am quite reluctant to name (even if it may be more handy to
> the user) "big" and "little" the different CPU sets.

Technically you might be right, but "big.LITTLE" is how the architecture has been advertised to people, so unfortunately we are stuck with the name. We have to deal with it in those terms at least at the xl level. Of course in Xen we are free to do whatever we want.
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
Hi Dario,

On 22/09/16 17:31, Dario Faggioli wrote:
> On Thu, 2016-09-22 at 12:24 +0100, Julien Grall wrote:
> > On 22/09/16 09:43, Dario Faggioli wrote:
> > > Local migration basically --from the vcpu perspective-- means create a
> > > new vcpu, stop the original vcpu, copy the state from original to new,
> > > destroy the original vcpu and start the new one. My point is that this
> > > is not something that can be done within nor initiated by the
> > > scheduler, e.g., during a context switch or a vcpu wakeup!
> >
> > By local migration, I meant from the perspective of the hypervisor. In
> > the hypervisor you have to trap feature registers and other
> > implementation defined registers to show the same value across all the
> > physical CPUs.
>
> You mean we trap feature registers during the (normal) execution of a
> vcpu, because we want Xen to vet what's returned to the guest itself. And
> that migration support, and hence the possibility that the guest has been
> migrated to a cpu different from the one where it was created, is already
> one of the reasons why this is necessary... right?

That's correct.

Regards,

--
Julien Grall
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Fri, 2016-09-23 at 18:05 +0800, Peng Fan wrote:
> We still can introduce cpupool-cluster-split or, as Juergen suggested,
> use "cpupool-split feature=xx" to split the clusters or cpuclasses
> into different cpupools. This is just a feature that is better to have,
> I think.
>
> The reason to include cpupool-cluster-split or the like is to split the
> big and little cores into different cpupools. And right now big and
> little cores are in different cpu clusters in the hardware[1] I can see.

Note that this `cpupool-split' thing is meant to be an aid to the user, to quickly put the system in a state that we think could be a common or relevant setup. For instance, cpupools can be used to partition big NUMA systems, and we thought users may be interested in having one pool per NUMA node, so a helper for doing that quickly (i.e., with just one command) has been provided. That does not mean that it's the only use of cpupools, nor that it's the only --or the only sane-- way to use cpupools on NUMA systems... it's just a speculation, in an attempt to make life easier for users.

In a similar way, if we think that, for instance, creating a 'big pool' and a 'LITTLE pool' would be something common, and/or we (Peng? Stefano?) already have a use case for this, we can well implement a `cpupool-split' variant that does that. *BUT* that does not mean that people must use it, or that they can't do anything else or different with cpupools on ARM! In fact, on a NUMA system, one can completely ignore `cpupool-numa-split', and create whatever pools and assign pcpus to them at will. Or she can actually use `cpupool-numa-split' as a basis, i.e., issue the command, then manually alter the resulting status by doing some more movement of pcpus among the pools the command created.

All this to say that, especially when thinking about this cpupool-split thing, we "only" need to come up with something that we think makes sense, either to be used as is or as a basis, not with the one and only way cpupools and big.LITTLE --or ARM in general-- should interact. In fact:

> I think assigning cores from different clusters into one cpupool is not
> a good idea.

I'd be perfectly fine with this, and with cpupool-split on big.LITTLE cutting pools around cluster boundaries. But I definitely would not want to forbid the user to manually shuffle things around, including ending up in a situation where there are pcpus from different classes/clusters/whatever in the same pool... If that is shooting in his own foot, then so be it!

Thanks and Regards,
Dario

--
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Fri, 2016-09-23 at 11:15 +0100, Julien Grall wrote:
> On 23/09/16 11:05, Peng Fan wrote:
> > If cluster is not preferred, cpuclass may be a choice, but I
> > personally prefer a "cluster" split for ARM.
> >
> > Thanks,
> > Peng.
> >
> > [1] https://en.wikipedia.org/wiki/ARM_big.LITTLE
>
> Please try to have a think on all the use cases, not only yours.

This last line is absolutely true and very important!

That being said, I am a bit lost. So, AFAICT, in order to act properly when the user asks for:

    vcpuclass = ["1,2:foo", "0,3:bar"]

we need to decide what "foo" and "bar" are at the xl and libxl level, and whether they are the same all the way down to Xen (and if not, what's the mapping).

We also said it would be nice to support:

    xl cpupool-split --feature=foobar

and hence we also need to decide what foobar is, whether it is in the same namespace as foo and bar (i.e., it can be foobar==foo, or foobar==bar, etc.), or it is something else, or both.

Can someone list what the various alternative approaches on the table are?

Regards,
Dario

--
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On 23/09/16 11:05, Peng Fan wrote:
On Fri, Sep 23, 2016 at 10:24:37AM +0100, Julien Grall wrote:
Hello Peng,
On 23/09/16 03:14, Peng Fan wrote:
On Thu, Sep 22, 2016 at 07:54:02PM +0100, Julien Grall wrote:
Hi Stefano,
On 22/09/2016 18:31, Stefano Stabellini wrote:
On Thu, 22 Sep 2016, Julien Grall wrote:
Hello Peng,
On 22/09/16 10:27, Peng Fan wrote:
On Thu, Sep 22, 2016 at 10:50:23AM +0200, Dario Faggioli wrote:
On Thu, 2016-09-22 at 14:49 +0800, Peng Fan wrote:
On Wed, Sep 21, 2016 at 08:11:43PM +0100, Julien Grall wrote:

A feature like `xl cpupool-biglittle-split' can still be interesting,

"cpupool-cluster-split" may be a better name?

You seem to assume that a cluster, from the MPIDR point of view, can only contain the same set of CPUs. I don't think this is part of the architecture, so this may not be true in the future.

Interesting. I also understood that a cluster can only have one kind of cpus. Honestly it would be a little insane for it to be otherwise :-)

I don't think this is insane (or maybe I am insane :)). Clusters usually don't share all of the L2 cache (assuming L1 is local to each core) and an L3 cache may not be present, so if you move a task from one cluster to another you will add latency because the new L2 cache has to be refilled.

The use case of big.LITTLE is that big cores are used for short bursts and little cores are used for the rest (e.g. listening to audio, fetching mail...). If you want to reduce latency when switching between big and little CPUs, you may want to put them within the same cluster.

Also, as mentioned in another thread, you may have a platform with the same micro-architecture (e.g. Cortex A-53) but different silicon implementations (e.g. to have a different frequency, power efficiency). Here the concept of big.LITTLE is more blurred.

It is possible that in one cluster different pcpus run at different cpu frequencies. This depends on the hardware design. Some may require all the cores in one cluster to run at the same frequency; some may have a more complicated design that supports different cores running at different frequencies. This is just like having an smp system where different cores can run at different cpu frequencies. I think this is not what big.LITTLE means.

big.LITTLE is a generic term for having "power hungry and powerful" cores (big) together with slower and battery-saving cores (LITTLE). It is not mandatory to have different micro-architectures between big and LITTLE cores. In any case, the interface should not be big.LITTLE specific. We don't want to tie ourselves to one specific architecture.

If all the cores have the same micro-architecture, but for some reason they are put in different clusters, or cpus in one cluster support running at different cpu frequencies, we still can introduce cpupool-cluster-split or, as Juergen suggested, use "cpupool-split feature=xx" to split the clusters or cpuclasses into different cpupools. This is just a feature that is better to have, I think. The reason to include cpupool-cluster-split or the like is to split the big and little cores into different cpupools. And right now big and little cores are in different cpu clusters in the hardware[1] I can see. I think assigning cores from different clusters into one cpupool is not a good idea. I have no idea about future hardware. If cluster is not preferred, cpuclass may be a choice, but I personally prefer a "cluster" split for ARM.

Thanks,
Peng.

[1] https://en.wikipedia.org/wiki/ARM_big.LITTLE

Let me be clear here: the ARM ARM is authoritative, not Wikipedia. The latter will only reflect what is done today, not what could be done. If the ARM ARM does not forbid it, nothing prevents a semiconductor vendor from doing it. I gave an example in the mail you answered. Please try to have a think on all the use cases, not only yours.

Regards,

--
Julien Grall
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Fri, Sep 23, 2016 at 10:24:37AM +0100, Julien Grall wrote:
>Hello Peng,
>
>On 23/09/16 03:14, Peng Fan wrote:
>>On Thu, Sep 22, 2016 at 07:54:02PM +0100, Julien Grall wrote:
>>>Hi Stefano,
>>>
>>>On 22/09/2016 18:31, Stefano Stabellini wrote:
>>>>On Thu, 22 Sep 2016, Julien Grall wrote:
>>>>>Hello Peng,
>>>>>
>>>>>On 22/09/16 10:27, Peng Fan wrote:
>>>>>>On Thu, Sep 22, 2016 at 10:50:23AM +0200, Dario Faggioli wrote:
>>>>>>>On Thu, 2016-09-22 at 14:49 +0800, Peng Fan wrote:
>>>>>>>>On Wed, Sep 21, 2016 at 08:11:43PM +0100, Julien Grall wrote:
>>>>>>>A feature like `xl cpupool-biglittle-split' can still be interesting,
>>>>>>
>>>>>>"cpupool-cluster-split" may be a better name?
>>>>>
>>>>>You seem to assume that a cluster, from the MPIDR point of view, can only
>>>>>contain the same set of CPUs. I don't think this is part of the
>>>>>architecture, so this may not be true in the future.
>>>>
>>>>Interesting. I also understood that a cluster can only have one kind of
>>>>cpus. Honestly it would be a little insane for it to be otherwise :-)
>>>
>>>I don't think this is insane (or maybe I am insane :)). Clusters usually
>>>don't share all of the L2 cache (assuming L1 is local to each core) and an
>>>L3 cache may not be present, so if you move a task from one cluster to
>>>another you will add latency because the new L2 cache has to be refilled.
>>>
>>>The use case of big.LITTLE is that big cores are used for short bursts and
>>>little cores are used for the rest (e.g. listening to audio, fetching
>>>mail...). If you want to reduce latency when switching between big and
>>>little CPUs, you may want to put them within the same cluster.
>>>
>>>Also, as mentioned in another thread, you may have a platform with the same
>>>micro-architecture (e.g. Cortex A-53) but different silicon implementations
>>>(e.g. to have a different frequency, power efficiency). Here the concept of
>>>big.LITTLE is more blurred.
>>
>>It is possible that in one cluster different pcpus run at different cpu
>>frequencies. This depends on the hardware design. Some may require all the
>>cores in one cluster to run at the same frequency; some may have a more
>>complicated design that supports different cores running at different
>>frequencies.
>>
>>This is just like having an smp system where different cores can run at
>>different cpu frequencies. I think this is not what big.LITTLE means.
>
>big.LITTLE is a generic term for having "power hungry and powerful" cores
>(big) together with slower and battery-saving cores (LITTLE).
>
>It is not mandatory to have different micro-architectures between big and
>LITTLE cores.
>
>In any case, the interface should not be big.LITTLE specific. We don't want
>to tie us to one specific architecture.

If all the cores have the same micro-architecture, but for some reason they are put in different clusters, or cpus in one cluster support running at different cpu frequencies, we still can introduce cpupool-cluster-split or, as Juergen suggested, use "cpupool-split feature=xx" to split the clusters or cpuclasses into different cpupools. This is just a feature that is better to have, I think.

The reason to include cpupool-cluster-split or the like is to split the big and little cores into different cpupools. And right now big and little cores are in different cpu clusters in the hardware[1] I can see. I think assigning cores from different clusters into one cpupool is not a good idea. I have no idea about future hardware.

If cluster is not preferred, cpuclass may be a choice, but I personally prefer a "cluster" split for ARM.

Thanks,
Peng.

[1] https://en.wikipedia.org/wiki/ARM_big.LITTLE

>
>Regards,
>
>--
>Julien Grall
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
Hello Peng,

On 23/09/16 03:14, Peng Fan wrote:
On Thu, Sep 22, 2016 at 07:54:02PM +0100, Julien Grall wrote:
Hi Stefano,
On 22/09/2016 18:31, Stefano Stabellini wrote:
On Thu, 22 Sep 2016, Julien Grall wrote:
Hello Peng,
On 22/09/16 10:27, Peng Fan wrote:
On Thu, Sep 22, 2016 at 10:50:23AM +0200, Dario Faggioli wrote:
On Thu, 2016-09-22 at 14:49 +0800, Peng Fan wrote:
On Wed, Sep 21, 2016 at 08:11:43PM +0100, Julien Grall wrote:

A feature like `xl cpupool-biglittle-split' can still be interesting,

"cpupool-cluster-split" may be a better name?

You seem to assume that a cluster, from the MPIDR point of view, can only contain the same set of CPUs. I don't think this is part of the architecture, so this may not be true in the future.

Interesting. I also understood that a cluster can only have one kind of cpus. Honestly it would be a little insane for it to be otherwise :-)

I don't think this is insane (or maybe I am insane :)). Clusters usually don't share all of the L2 cache (assuming L1 is local to each core) and an L3 cache may not be present, so if you move a task from one cluster to another you will add latency because the new L2 cache has to be refilled.

The use case of big.LITTLE is that big cores are used for short bursts and little cores are used for the rest (e.g. listening to audio, fetching mail...). If you want to reduce latency when switching between big and little CPUs, you may want to put them within the same cluster.

Also, as mentioned in another thread, you may have a platform with the same micro-architecture (e.g. Cortex A-53) but different silicon implementations (e.g. to have a different frequency, power efficiency). Here the concept of big.LITTLE is more blurred.

It is possible that in one cluster different pcpus run at different cpu frequencies. This depends on the hardware design. Some may require all the cores in one cluster to run at the same frequency; some may have a more complicated design that supports different cores running at different frequencies. This is just like having an smp system where different cores can run at different cpu frequencies. I think this is not what big.LITTLE means.

big.LITTLE is a generic term for having "power hungry and powerful" cores (big) together with slower and battery-saving cores (LITTLE).

It is not mandatory to have different micro-architectures between big and LITTLE cores.

In any case, the interface should not be big.LITTLE specific. We don't want to tie ourselves to one specific architecture.

Regards,

--
Julien Grall
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Thu, Sep 22, 2016 at 12:21:00PM +0100, Julien Grall wrote:
>>>>According to George's comments, I think we could use affinity to restrict
>>>>little vcpus to be scheduled on little pcpus, and restrict big vcpus to
>>>>big pcpus. There seems to be no need to consider soft affinity; hard
>>>>affinity is enough to handle this. We may need to provide some interface
>>>>to let xl get information such as big.LITTLE or smp, and if it is
>>>>big.LITTLE, which cpus are big and which are little.
>>>>
>>>>For how to differentiate cpus, I am looking at the linaro EAS cpu
>>>>topology code. The code has not been upstreamed :(, but has been merged
>>>>into the google android kernel. I only plan to take some necessary code,
>>>>such as the device tree parsing and cpu topology building, because we
>>>>only need to know the computing capacity of each pcpu. Some docs about
>>>>the EAS piece, including dts node examples:
>>>>https://git.linaro.org/arm/eas/kernel.git/blob/refs/heads/lsk-v4.4-eas-v5.2:/Documentation/devicetree/bindings/scheduler/sched-energy-costs.txt
>>>
>>>I am reluctant to take any non-upstreamed bindings in Xen. There is a
>>>similar series going on on lkml [1].
>>
>>For how to differentiate cpu classes, how about directly using the
>>compatible property of each cpu node?
>
>What do you mean by cpu classes? If it is power, then compatible will not
>help here. You may have a platform with the same core (e.g. Cortex A53)
>but different silicon implementations, so the power efficiency will be
>different.

By cpu classes, I mean cpu clusters. I checked the cpu capacity code[1] you listed; it uses dmips from Dhrystone. But for now what I plan to implement is to block vcpus from being scheduled across big.LITTLE: in my case, a vcpu will be restricted to either A53 or A72 pcpus. In the same cluster, different cores may run at different cpu frequencies, or all the cores may run at the same frequency; this depends on the soc implementation. This requires Xen to choose which pcpu runs the vcpu, with no local migration when scheduling a vcpu on the cpus within one cluster.

Considering power management in the future, dmips needs to be used, but we also need to differentiate cpus from different clusters. So both "dmips + compatible" need to be considered. For cpus in one cluster, we also need to take the dmips info into account for Xen to schedule vcpus between pcpus effectively.

Thanks,
Peng.

[1] https://lwn.net/Articles/699569/

>
>Regards,
>
>--
>Julien Grall
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Thu, Sep 22, 2016 at 07:54:02PM +0100, Julien Grall wrote:
>Hi Stefano,
>
>On 22/09/2016 18:31, Stefano Stabellini wrote:
>>On Thu, 22 Sep 2016, Julien Grall wrote:
>>>Hello Peng,
>>>
>>>On 22/09/16 10:27, Peng Fan wrote:
>>>>On Thu, Sep 22, 2016 at 10:50:23AM +0200, Dario Faggioli wrote:
>>>>>On Thu, 2016-09-22 at 14:49 +0800, Peng Fan wrote:
>>>>>>On Wed, Sep 21, 2016 at 08:11:43PM +0100, Julien Grall wrote:
>>>>>A feature like `xl cpupool-biglittle-split' can still be interesting,
>>>>
>>>>"cpupool-cluster-split" may be a better name?
>>>
>>>You seem to assume that a cluster, from the MPIDR point of view, can only
>>>contain the same set of CPUs. I don't think this is part of the
>>>architecture, so this may not be true in the future.
>>
>>Interesting. I also understood that a cluster can only have one kind of
>>cpus. Honestly it would be a little insane for it to be otherwise :-)
>
>I don't think this is insane (or maybe I am insane :)). Clusters usually
>don't share all of the L2 cache (assuming L1 is local to each core) and an
>L3 cache may not be present, so if you move a task from one cluster to
>another you will add latency because the new L2 cache has to be refilled.
>
>The use case of big.LITTLE is that big cores are used for short bursts and
>little cores are used for the rest (e.g. listening to audio, fetching
>mail...). If you want to reduce latency when switching between big and
>little CPUs, you may want to put them within the same cluster.
>
>Also, as mentioned in another thread, you may have a platform with the same
>micro-architecture (e.g. Cortex A-53) but different silicon implementations
>(e.g. to have a different frequency, power efficiency). Here the concept of
>big.LITTLE is more blurred.

It is possible that in one cluster different pcpus run at different cpu frequencies. This depends on the hardware design. Some may require all the cores in one cluster to run at the same frequency; some may have a more complicated design that supports different cores running at different frequencies.

This is just like having an smp system where different cores can run at different cpu frequencies. I think this is not what big.LITTLE means. For the pcpus in one cluster, Xen needs to choose which pcpu to use for a vcpu, for power efficiency, etc.

Thanks,
Peng.

>
>That's why I am quite reluctant to name (even if it may be more handy to
>the user) "big" and "little" the different CPU sets.
>
>Cheers,
>
>--
>Julien Grall
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Thu, Sep 22, 2016 at 12:29:53PM +0100, Julien Grall wrote:
>Hello Peng,
>
>On 22/09/16 10:27, Peng Fan wrote:
>>On Thu, Sep 22, 2016 at 10:50:23AM +0200, Dario Faggioli wrote:
>>>On Thu, 2016-09-22 at 14:49 +0800, Peng Fan wrote:
>>>>On Wed, Sep 21, 2016 at 08:11:43PM +0100, Julien Grall wrote:
>>>A feature like `xl cpupool-biglittle-split' can still be interesting,
>>
>>"cpupool-cluster-split" may be a better name?
>
>You seem to assume that a cluster, from the MPIDR point of view, can only
>contain the same set of CPUs. I don't think this is part of the
>architecture, so this may not be true in the future.
>
>>>completely orthogonally and independently from the affinity based work,
>>>and this series looks like it can be used to implement that. :-)
>>
>>Agree. All pcpus can be assigned into cpupool0 by default, based on the
>>affinity work.
>
>What do you mean by affinity? From MPIDR?

vcpu hard affinity. When allocating or initializing a vcpu, the hard affinity needs to be initialized.

Thanks,
Peng.

>>We could add one like "cpupool-numa-split" to split different classes of
>>cpus into different cpupools.
>
>Regards,
>
>--
>Julien Grall
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
Hi Stefano,

On 22/09/2016 18:31, Stefano Stabellini wrote:
On Thu, 22 Sep 2016, Julien Grall wrote:
Hello Peng,
On 22/09/16 10:27, Peng Fan wrote:
On Thu, Sep 22, 2016 at 10:50:23AM +0200, Dario Faggioli wrote:
On Thu, 2016-09-22 at 14:49 +0800, Peng Fan wrote:
On Wed, Sep 21, 2016 at 08:11:43PM +0100, Julien Grall wrote:

A feature like `xl cpupool-biglittle-split' can still be interesting,

"cpupool-cluster-split" may be a better name?

You seem to assume that a cluster, from the MPIDR point of view, can only contain the same set of CPUs. I don't think this is part of the architecture, so this may not be true in the future.

Interesting. I also understood that a cluster can only have one kind of cpus. Honestly it would be a little insane for it to be otherwise :-)

I don't think this is insane (or maybe I am insane :)). Clusters usually don't share all of the L2 cache (assuming L1 is local to each core) and an L3 cache may not be present, so if you move a task from one cluster to another you will add latency because the new L2 cache has to be refilled.

The use case of big.LITTLE is that big cores are used for short bursts and little cores are used for the rest (e.g. listening to audio, fetching mail...). If you want to reduce latency when switching between big and little CPUs, you may want to put them within the same cluster.

Also, as mentioned in another thread, you may have a platform with the same micro-architecture (e.g. Cortex A-53) but different silicon implementations (e.g. to have a different frequency, power efficiency). Here the concept of big.LITTLE is more blurred.

That's why I am quite reluctant to name (even if it may be more handy to the user) "big" and "little" the different CPU sets.

Cheers,

--
Julien Grall
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Thu, 22 Sep 2016, Dario Faggioli wrote:
> On Thu, 2016-09-22 at 18:05 +0800, Peng Fan wrote:
> > On Thu, Sep 22, 2016 at 10:50:23AM +0200, Dario Faggioli wrote:
> > > Yes (or I should say, "whatever", as I know nothing about all
> > > this! :-P)
> >
> > One more thing I'd like to ask: do you prefer cpu classes to be ARM
> > specific or ARM/x86 common?
>
> I'm not sure. I'd say that it depends on where we are. I mean, in Xen,
> names can be rather specific, like some codename of the
> chip/core/family/etc.
>
> I'm not sure what this means for you, on ARM, but I guess it would
> depend on what you, Julien and Stefano will come up with and agree on.

Actually it depends on what the x86 maintainers think. For us (ARM maintainers) it makes little difference whether the concept of cpu classes is ARM specific or common.
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Thu, 22 Sep 2016, Julien Grall wrote:
> Hello Peng,
>
> On 22/09/16 10:27, Peng Fan wrote:
> > On Thu, Sep 22, 2016 at 10:50:23AM +0200, Dario Faggioli wrote:
> > > On Thu, 2016-09-22 at 14:49 +0800, Peng Fan wrote:
> > > > On Wed, Sep 21, 2016 at 08:11:43PM +0100, Julien Grall wrote:
> > > A feature like `xl cpupool-biglittle-split' can still be interesting,
> >
> > "cpupool-cluster-split" may be a better name?
>
> You seem to assume that a cluster, from the MPIDR point of view, can only
> contain the same set of CPUs. I don't think this is part of the
> architecture, so this may not be true in the future.

Interesting. I also understood that a cluster can only have one kind of cpus. Honestly it would be a little insane for it to be otherwise :-)
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Thu, 2016-09-22 at 12:24 +0100, Julien Grall wrote:
> On 22/09/16 09:43, Dario Faggioli wrote:
> > Local migration basically --from the vcpu perspective-- means create a
> > new vcpu, stop the original vcpu, copy the state from original to new,
> > destroy the original vcpu and start the new one. My point is that this
> > is not something that can be done within nor initiated by the
> > scheduler, e.g., during a context switch or a vcpu wakeup!
>
> By local migration, I meant from the perspective of the hypervisor. In
> the hypervisor you have to trap feature registers and other
> implementation defined registers to show the same value across all the
> physical CPUs.

You mean we trap feature registers during the (normal) execution of a vcpu, because we want Xen to vet what's returned to the guest itself. And that migration support, and hence the possibility that the guest has been migrated to a cpu different from the one where it was created, is already one of the reasons why this is necessary... right?

If yes, and if that's "all" we need, I think it should be fine.

> You don't need to recreate the vCPU every time you move from one set of
> CPUs to another one. Sorry for the confusion.

No, I am sorry... it's not you causing the confusion, it's probably me knowing too little about ARM, and I did not think of the above when you said "migration". :-)

Regards,
Dario

--
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Thu, 2016-09-22 at 18:05 +0800, Peng Fan wrote: > On Thu, Sep 22, 2016 at 10:50:23AM +0200, Dario Faggioli wrote: > > Yes (or I should say, "whatever", as I know nothing about all > > this! :-P) > > One more thing I'd like to ask, do you prefer cpu classes to be ARM > specific or ARM/X86 > common? > I'm not sure. I'd say that it depends on where we are. I mean, in Xen, names can be rather specific, like some codename of the chip/core/family/etc. I'm not sure what this means for you, on ARM, but I guess it would depend on what you, Julien and Stefano will come up with and agree on. Then, at the toolstack level (xl and libxl) we can have aliases for the various classes, and/or names for specific groups of classes, arranged according to whatever criteria. I also like George's idea of letting a class be picked by its order in the hypervisor hierarchy, if/as soon as we put classes in a hierarchy within the hypervisor. But I'd like to hear others... In the meanwhile, if I were you, I'd start with either "class 0", "class 1", etc., or just use the codename of the chip ("A17", "A15", etc.) Regards, Dario -- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
Hello Peng, On 22/09/16 10:27, Peng Fan wrote: > On Thu, Sep 22, 2016 at 10:50:23AM +0200, Dario Faggioli wrote: >> On Thu, 2016-09-22 at 14:49 +0800, Peng Fan wrote: >>> On Wed, Sep 21, 2016 at 08:11:43PM +0100, Julien Grall wrote: >> A feature like `xl cpupool-biglittle-split' can still be interesting, > "cpupool-cluster-split" maybe a better name? You seem to assume that a cluster, from the MPIDR point of view, can only contain the same set of CPUs. I don't think this is part of the architecture, so this may not be true in the future. >> completely orthogonally and independently from the affinity based work, and this series looks like it can be used to implement that. :-) > Agree. All pcpus default can be assigned into cpupool0 based on the affinity work. What do you mean by affinity? From MPIDR? > We could add one like "cpupool-numa-split" to split different classes of cpus into different cpupools. Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
Hi Dario, On 22/09/16 09:43, Dario Faggioli wrote: > On Wed, 2016-09-21 at 20:28 +0100, Julien Grall wrote: >> On 21/09/2016 16:45, Dario Faggioli wrote: >>> This does not seem to match with what has been said at some point in this thread... And if it's like that, how's that possible, if the pcpus' ISAs are (even only slightly) different? >> Right, at some point I mentioned that the set of errata and features will be different between processors. > Yes, I read that, but wasn't (and still am not) sure about whether or not that meant a vcpu can move freely between classes or not, in the way that the scheduler does that. In fact, you say: >> With a bit of work in Xen, it would be possible to move the vCPU between big and LITTLE cpus. As mentioned above, we could sanitize the features to only enable a common set. You can view the big.LITTLE problem as a local live migration between two kinds of CPUs. > Local migration basically --from the vcpu perspective-- means create a new vcpu, stop the original vcpu, copy the state from original to new, destroy the original vcpu and start the new one. My point is that this is not something that can be done within nor initiated by the scheduler, e.g., during a context switch or a vcpu wakeup! By local migration, I meant from the perspective of the hypervisor. In the hypervisor you have to trap feature registers and other implementation defined registers to show the same value across all the physical CPUs. You don't need to recreate the vCPU every time you move from one set of CPUs to another one. Sorry for the confusion. Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
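The "sanitize the features to only enable a common set" idea can be sketched as follows. This is a simplified illustration, not Xen's actual sanitisation code: it treats a register as a plain set-bit-means-feature-present bitmap, whereas real ARM ID registers carry signed 4-bit fields, so the real logic needs a field-wise minimum rather than a plain AND.

```c
#include <stdint.h>

/* Simplified sketch of the "common feature set" idea: the value that is
 * safe to expose to a vCPU which may run on any physical CPU is the
 * intersection of the per-pCPU register values.  (Illustrative only --
 * real ID-register sanitisation must handle signed per-feature fields.) */
static uint64_t common_feature_bits(const uint64_t *percpu, unsigned int nr_cpus)
{
    uint64_t common = ~(uint64_t)0;

    for (unsigned int i = 0; i < nr_cpus; i++)
        common &= percpu[i];   /* keep only bits every pCPU reports */

    return common;
}
```

A vCPU reading the trapped register would then see `common_feature_bits()` of all pCPUs it might be scheduled on, rather than the value of whichever pCPU it happens to be running on.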
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
Hello Peng, On 22/09/16 10:45, Peng Fan wrote: On Wed, Sep 21, 2016 at 11:15:35AM +0100, Julien Grall wrote: Hello Peng, On 21/09/16 09:38, Peng Fan wrote: On Tue, Sep 20, 2016 at 01:17:04PM -0700, Stefano Stabellini wrote: On Tue, 20 Sep 2016, Julien Grall wrote: On 20/09/2016 20:09, Stefano Stabellini wrote: On Tue, 20 Sep 2016, Julien Grall wrote: On 20/09/2016 12:27, George Dunlap wrote: On Tue, Sep 20, 2016 at 11:03 AM, Peng Fanwrote: On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote: On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote: On Tue, 20 Sep 2016, Dario Faggioli wrote: It is harder to figure out which one is supposed to be big and which one LITTLE. Regardless, we could default to using the first cluster (usually big), which is also the cluster of the boot cpu, and utilize the second cluster only when the user demands it. Why do you think the boot CPU will usually be a big one? In the case of Juno platform it is configurable, and the boot CPU is a little core on r2 by default. In any case, what we care about is differentiate between two set of CPUs. I don't think Xen should care about migrating a guest vCPU between big and LITTLE cpus. So I am not sure why we would want to know that. No, it is not about migrating (at least yet). It is about giving useful information to the user. It would be nice if the user had to choose between "big" and "LITTLE" rather than "class 0x1" and "class 0x100", or even "A7" or "A15". As Dario mentioned in previous email, for dom0 provide like this: dom0_vcpus_big = 4 dom0_vcpus_little = 2 to dom0. If these two no provided, we could let dom0 runs on big pcpus or big.little. Anyway this is not the important point for dom0 only big or big.little. For domU, provide "vcpus.big" and "vcpus.little" in xl configuration file. 
Such as: vcpus.big = 2 vcpus.little = 4 According to George's comments, Then, I think we could use affinity to restrict little vcpus to be scheduled on little pcpus, and restrict big vcpus on big pcpus. Seems there is no need to consider soft affinity; hard affinity is enough to handle this. We may need to provide some interface to let xl get the information such as big.little or smp. if it is big.little, which is big and which is little. For how to differentiate cpus, I am looking at the linaro eas cpu topology code, The code has not been upstreamed :(, but merged into google android kernel. I only plan to take some necessary code, such as device tree parse and cpu topology build, because we only need to know the computing capacity of each pcpu. Some doc about eas piece, including dts node examples: https://git.linaro.org/arm/eas/kernel.git/blob/refs/heads/lsk-v4.4-eas-v5.2:/Documentation/devicetree/bindings/scheduler/sched-energy-costs.txt I am reluctant to take any non-upstreamed bindings in Xen. There is a similar series going on lkml [1]. For how to differentiate cpu classes, how about directly using the compatible property of each cpu node? What do you mean by cpu classes? If it is power, then the compatible will not help here. You may have a platform with the same core (e.g. Cortex-A53) but different silicon implementation, so the power efficiency will be different. Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
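The hard-affinity scheme described above amounts to computing, per vCPU class, the set of pCPUs of the matching class. A toy sketch, with a plain bitmask standing in for Xen's cpumask_t (names here are illustrative, not Xen's API):

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the big/LITTLE hard-affinity idea: a "big" vCPU gets a
 * hard-affinity mask covering only the big pCPUs, and a "LITTLE" vCPU
 * only the LITTLE ones.  pcpu_is_big[i] says whether physical CPU i is
 * a big core; the return value is a bitmask of permitted pCPUs. */
static uint32_t class_affinity_mask(const bool *pcpu_is_big,
                                    unsigned int nr_pcpus, bool vcpu_is_big)
{
    uint32_t mask = 0;

    for (unsigned int i = 0; i < nr_pcpus; i++)
        if (pcpu_is_big[i] == vcpu_is_big)
            mask |= 1u << i;

    return mask;
}
```

With e.g. pCPUs 0-1 big and 2-5 LITTLE, a big vCPU would get mask 0x03 and a LITTLE vCPU mask 0x3c, which is exactly the "restrict little vcpus to little pcpus" behaviour hard affinity provides.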
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
[Trimming the Cc-list quite a bit!] On Thu, 2016-09-22 at 18:09 +0800, Peng Fan wrote: > On Thu, Sep 22, 2016 at 10:51:04AM +0100, George Dunlap wrote: > > I think we should name this however we name the different types of > > cpus. > > i.e., if we're going to call these "cpu classes", then we should > > call > > this "cpupool-cpuclass-split" or something. > > Ok. Got it. > Hey, Peng... non technical thing: can you trim your quotes when replying to emails? What I mean by that is exactly what I've done in this very message (and in any message I write, unless I forget :-D), i.e., remove all the content coming from previous emails in the conversation that is not relevant for what you are actually talking about and replying to. Of course, it's a matter of balance, and there is the risk of removing too much, which then means one would have to open old emails to follow the conversation. But, for instance in this case, I had to hit PgDown _15_ times in my MUA, just to figure out you were saying "Ok. Got it", which is certainly not ideal. :-/ Thanks and Regards, Dario -- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On 22/09/16 11:51, George Dunlap wrote: > On 22/09/16 10:27, Peng Fan wrote: >> On Thu, Sep 22, 2016 at 10:50:23AM +0200, Dario Faggioli wrote: >> "cpupool-cluster-split" maybe a better name? > > I think we should name this however we name the different types of cpus. > i.e., if we're going to call these "cpu classes", then we should call > this "cpupool-cpuclass-split" or something. I'd go with "cpupool-split feature=cpuclass". This can be extended later to e.g.: cpupool-split feature=cpuclass,numa in order to combine it with "cpupool-numa-split" (which will be the same as "cpupool-split feature=numa"). Juergen
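Juergen's proposed syntax is a comma-separated feature list. A sketch of how such a (hypothetical, not yet existing) `feature=cpuclass,numa` argument could be tokenised, using only standard C string functions:

```c
#include <stdbool.h>
#include <string.h>

/* Illustrative parser for the proposed "feature=a,b,..." syntax (this
 * is not an existing xl interface): returns true iff `feature` appears
 * as a whole token in the comma-separated `list`. */
static bool feature_listed(const char *list, const char *feature)
{
    size_t flen = strlen(feature);
    const char *p = list;

    while (*p) {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);

        /* Compare whole tokens so "cpu" does not match "cpuclass". */
        if (len == flen && !strncmp(p, feature, flen))
            return true;
        if (!comma)
            break;
        p = comma + 1;
    }
    return false;
}
```

A "cpupool-split feature=cpuclass,numa" handler would then split along every boundary whose feature name is listed.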
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Thu, 2016-09-22 at 17:27 +0800, Peng Fan wrote: > On Thu, Sep 22, 2016 at 10:50:23AM +0200, Dario Faggioli wrote: > > A feature like `xl cpupool-biglittle-split' can still be > > interesting, > > "cpupool-cluster-split" maybe a better name? > Yeah, sure, whatever! :-D > > > > completely orthogonally and independently from the affinity based > > work, > > and this series looks like it can be used to implement that. :-) > > Agree. All pcpus default can be assigned into cpupool0 based on the > affinity work. > Exactly. If we work on affinity, this cpupool splitting will not be by any means necessary, and must not be done at boot. It will be something that the user can do by himself at any time, and move/create domains inside the various pools, if that's what he wants. > We could add one like "cpupool-numa-split" to split different classes > cpu > into different cpupools. > Yep. Dario -- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Wed, Sep 21, 2016 at 11:15:35AM +0100, Julien Grall wrote: >Hello Peng, > >On 21/09/16 09:38, Peng Fan wrote: >>On Tue, Sep 20, 2016 at 01:17:04PM -0700, Stefano Stabellini wrote: >>>On Tue, 20 Sep 2016, Julien Grall wrote: On 20/09/2016 20:09, Stefano Stabellini wrote: >On Tue, 20 Sep 2016, Julien Grall wrote: >>On 20/09/2016 12:27, George Dunlap wrote: >>>On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan>>>wrote: On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote: >On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote: >>On Tue, 20 Sep 2016, Dario Faggioli wrote: >It is harder to figure out which one is supposed to be >big and which one LITTLE. Regardless, we could default to using the >first cluster (usually big), which is also the cluster of the boot cpu, >and utilize the second cluster only when the user demands it. Why do you think the boot CPU will usually be a big one? In the case of Juno platform it is configurable, and the boot CPU is a little core on r2 by default. In any case, what we care about is differentiate between two set of CPUs. I don't think Xen should care about migrating a guest vCPU between big and LITTLE cpus. So I am not sure why we would want to know that. >>> >>>No, it is not about migrating (at least yet). It is about giving useful >>>information to the user. It would be nice if the user had to choose >>>between "big" and "LITTLE" rather than "class 0x1" and "class 0x100", or >>>even "A7" or "A15". >> >>As Dario mentioned in previous email, >>for dom0 provide like this: >> >>dom0_vcpus_big = 4 >>dom0_vcpus_little = 2 >> >>to dom0. >> >>If these two no provided, we could let dom0 runs on big pcpus or big.little. >>Anyway this is not the important point for dom0 only big or big.little. >> >>For domU, provide "vcpus.big" and "vcpus.little" in xl configuration file. 
>>Such as: >> >>vcpus.big = 2 >>vcpus.little = 4 >> >> >>According to George's comments, >>Then, I think we could use affinity to restrict little vcpus to be scheduled on >>little pcpus, >>and restrict big vcpus on big pcpus. Seems no need to consider soft affinity, >>use hard >>affinity is to handle this. >> >>We may need to provide some interface to let xl get the information such >>as >>big.little or smp. if it is big.little, which is big and which is little. >> >>For how to differentiate cpus, I am looking at the linaro eas cpu topology code, >>The code has not been upstreamed :(, but merged into google android kernel. >>I only plan to take some necessary code, such as device tree parse and >>cpu topology build, because we only need to know the computing capacity of >>each pcpu. >> >>Some doc about eas piece, including dts node examples: >>https://git.linaro.org/arm/eas/kernel.git/blob/refs/heads/lsk-v4.4-eas-v5.2:/Documentation/devicetree/bindings/scheduler/sched-energy-costs.txt > >I am reluctant to take any non-upstreamed bindings in Xen. There is a similar >series going on lkml [1]. For how to differentiate cpu classes, how about directly using the compatible property of each cpu node?

A57_0: cpu@0 {
	compatible = "arm,cortex-a57", "arm,armv8";
	reg = <0x0 0x0>;
	...
};

A53_0: cpu@100 {
	compatible = "arm,cortex-a53", "arm,armv8";
	reg = <0x0 0x100>;
	...
};

Thanks, Peng. > >But it sounds like it is a lot of work for little benefit (i.e. giving a >better name to the set of CPUs). The naming will also not fit if in the >future hardware will have more than 2 kinds of CPUs. > >[...] > >>I am not sure, but we may also need to handle mpidr for ARM, because big and >>little vcpus are supported. > >I am not sure to understand what you mean here. > >Regards, > >[1] https://lwn.net/Articles/699569/ > >-- >Julien Grall
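Peng's compatible-string idea can be sketched as a simple lookup, much like the compute-capability ranking table proposed earlier in the thread. The table contents below are illustrative, not a complete or official list, and — as Julien points out — a compatible string cannot distinguish two different silicon implementations of the same core:

```c
#include <string.h>

/* Illustrative sketch: derive a relative CPU "class" rank directly from
 * a device tree compatible string.  Higher rank = bigger core.  The
 * table is an example only; real systems would need a maintained list,
 * which is exactly the maintenance burden objected to in this thread. */
static int class_from_compatible(const char *compat)
{
    static const struct { const char *compat; int rank; } table[] = {
        { "arm,cortex-a72", 4 },
        { "arm,cortex-a57", 3 },
        { "arm,cortex-a53", 2 },
        { "arm,cortex-a35", 1 },
    };

    for (unsigned int i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (!strcmp(compat, table[i].compat))
            return table[i].rank;

    return 0; /* unknown core */
}
```

Given the A57_0/A53_0 nodes above, this would rank the cpu@0 cluster (A57) above the cpu@100 cluster (A53) — but it says nothing about, e.g., two A53 clusters with different power characteristics.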
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Wed, 2016-09-21 at 20:28 +0100, Julien Grall wrote:
> On 21/09/2016 16:45, Dario Faggioli wrote:
> > This does not seem to match with what has been said at some point in this thread... And if it's like that, how's that possible, if the pcpus' ISAs are (even only slightly) different?
>
> Right, at some point I mentioned that the set of errata and features will be different between processors.

Yes, I read that, but wasn't (and still am not) sure whether that meant a vcpu can move freely between classes, in the way that the scheduler does that. In fact, you say:

> With a bit of work in Xen, it would be possible to move the vCPU between big and LITTLE cpus. As mentioned above, we could sanitize the features to only enable a common set. You can view the big.LITTLE problem as a local live migration between two kinds of CPUs.

Local migration basically --from the vcpu perspective-- means create a new vcpu, stop the original vcpu, copy the state from original to new, destroy the original vcpu and start the new one. My point is that this is not something that can be done within nor initiated by the scheduler, e.g., during a context switch or a vcpu wakeup! And I'm saying this because...

> In your suggestion you don't mention what would happen if the guest configuration does not contain the affinity. Does it mean the vCPU will be scheduled anywhere? A pCPU/class will be chosen randomly?

...in my example there were vcpus for which no set of classes was specified, and I said that it meant those vcpus can run on any pcpu of any class. And this is what I think we should do even in cases where no "vcpuclass" parameter is specified at all. *BUT* that is only possible if moving a vcpu from a pcpu of class A to a pcpu of class B does *NOT* require the steps described above, similar to local migration.
IOW, this is only possible if moving a vcpu from a pcpu of class A to a pcpu of class B *ONLY* requires a context switch. If changing class requires local migration, the scheduler must be told that it should never move vcpus between classes (or sets of classes made of pcpus homogeneous enough that a context switch is sufficient). If changing class is --or can be made to be, with some work in Xen-- just a context switch, then we can have the scheduler moving vcpus between (sets of) classes. It's probably not too big of a deal, wrt the end result (see below), but it changes the implementation a lot.

But, yeah, if changing class can be made simple with some work in Xen, but is not simple/possible **right now**, then this means that, _for_now_, vcpus for which a class is not specified must be assigned to a class (or a set of classes within which the scheduler can freely move vcpus). In future, we can change this, broadening the "default class" as much as seamless migration within its pcpus allows. Hope I made myself clear enough. :-D

> To be honest, I quite like this idea.

:-)

> It could be used as soft/hard affinity for the moment. But can be extended in the future if/when the scheduler gains knowledge of power efficiency and vCPUs can migrate between big and LITTLE.

Yes, exactly, and this is, I think, true in both of the above outlined cases. What I meant when I said it is the implementation, rather than the end result, that changes, is that:
- if complex migration-alike operations are necessary for changing class, migrating between classes (e.g., between big and LITTLE) will have to happen, e.g., in a load and energy management and balancing component implemented above the scheduler itself
- if just a plain context switch is enough, the scheduler can do everything by itself.
But yes, in any case, the model we're coming up with looks to be a very good starting point, because it is orthogonal to and independent from other components and solutions (e.g., cpupools), is pretty simple and basic, and leaves room for future extensions.

Regards, Dario
--
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Wed, Sep 21, 2016 at 08:11:43PM +0100, Julien Grall wrote: >Hi Stefano, > >On 21/09/2016 19:13, Stefano Stabellini wrote: >>On Wed, 21 Sep 2016, Julien Grall wrote: >>>(CC a couple of ARM folks) >>> >>>On 21/09/16 11:22, George Dunlap wrote: On 21/09/16 11:09, Julien Grall wrote: > > >On 20/09/16 21:17, Stefano Stabellini wrote: >>On Tue, 20 Sep 2016, Julien Grall wrote: >>>Hi Stefano, >>> >>>On 20/09/2016 20:09, Stefano Stabellini wrote: On Tue, 20 Sep 2016, Julien Grall wrote: >Hi, > >On 20/09/2016 12:27, George Dunlap wrote: >>On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan >>>>wrote: >>>On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli >>>wrote: On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote: >On Tue, 20 Sep 2016, Dario Faggioli wrote: >>>I'd like to add a computing capability in xen/arm, like this: >>> >>>struct compute_capatiliby >>>{ >>> char *core_name; >>> uint32_t rank; >>> uint32_t cpu_partnum; >>>}; >>> >>>struct compute_capatiliby cc= >>>{ >>> {"A72", 4, 0xd08}, >>> {"A57", 3, 0}, >>> {"A53", 2, 0xd03}, >>> {"A35", 1, ...}, >>>} >>> >>>Then when identify cpu, we decide which cpu is big and which >>>cpu is >>>little >>>according to the computing rank. >>> >>>Any comments? >> >>I think we definitely need to have Xen have some kind of idea >>the >>order between processors, so that the user doesn't need to >>figure out >>which class / pool is big and which pool is LITTLE. Whether >>this >>sort >>of enumeration is the best way to do that I'll let Julien and >>Stefano >>give their opinion. > >I don't think an hardcoded list of processor in Xen is the right >solution. >There are many existing processors and combinations for big.LITTLE >so it >will >nearly be impossible to keep updated. > >I would expect the firmware table (device tree, ACPI) to provide >relevant >data >for each processor and differentiate big from LITTLE core. >Note that I haven't looked at it for now. A good place to start is >looking >at >how Linux does. 
That's right, see Documentation/devicetree/bindings/arm/cpus.txt. It is trivial to identify the two different CPU classes and which cores belong to which class.

>>> The class of the CPU can be found from the MIDR, there is no need to use the device tree/acpi for that. Note that I don't think there is an easy way in ACPI (i.e not in AML) to find out the class.

It is harder to figure out which one is supposed to be big and which one LITTLE. Regardless, we could default to using the first cluster (usually big), which is also the cluster of the boot cpu, and utilize the second cluster only when the user demands it.

>>> Why do you think the boot CPU will usually be a big one? In the case of Juno platform it is configurable, and the boot CPU is a little core on r2 by default.
>>>
>>> In any case, what we care about is differentiate between two set of CPUs. I don't think Xen should care about migrating a guest vCPU between big and LITTLE cpus. So I am not sure why we would want to know that.

>> No, it is not about migrating (at least yet). It is about giving useful information to the user. It would be nice if the user had to choose between "big" and "LITTLE" rather than "class 0x1" and "class 0x100", or even "A7" or "A15".

> I don't think it is wise to assume that we may have only 2 kind of CPUs on the platform. We may have more in the future, if so how would you name them?

I would suggest that internally Xen recognize an arbitrary number of processor "classes", and order them according to more powerful -> less powerful. Then if at some point someone makes a platform with three processors, you can say "class 0", "class 1" or "class 2". "big" would be an alias for "class 0" and "little" would be an alias for "class 1".
>>> As mentioned earlier, there is no upstreamed yet device tree binding to know the "power" of a CPU (see [1])

And in my suggestion, we allow a richer set of labels, so that the user could also be more specific -- e.g., asking for "A15" specifically, for example, and failing to build if there are no A15 cores present, while allowing users to simply write "big" or "little" if they want
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Wed, Sep 21, 2016 at 08:28:32PM +0100, Julien Grall wrote:
> Hi Dario,
>
> On 21/09/2016 16:45, Dario Faggioli wrote:
>> On Wed, 2016-09-21 at 14:06 +0100, Julien Grall wrote:
>>> (CC a couple of ARM folks)
>>
>> Yay, thanks for this! :-)
>>
>>> I had a few discussions and more thoughts about big.LITTLE support in Xen. The main goal of big.LITTLE is power efficiency, by moving tasks around and being able to idle one cluster. All the solutions suggested (including mine) so far can be replicated by hand (except the VPIDR), so they are mostly an automatic way.
>>
>> I'm sorry, how is this (going to be) handled in Linux? Is it that any arbitrary task executing any arbitrary binary code can be run on both big and LITTLE pcpus, depending on the scheduler's and energy management's decisions?
>>
>> This does not seem to match with what has been said at some point in this thread... And if it's like that, how's that possible, if the pcpus' ISAs are (even only slightly) different?
>
> Right, at some point I mentioned that the set of errata and features will be different between processors.
>
> However, it is possible to sanitize the feature registers to expose a common set to the guest. This is what is done in Linux at boot time: only the features common to all the CPUs will be enabled.
>
> This allows a task to migrate between big and LITTLE CPUs seamlessly.
>
>>> This will also remove the real benefits of big.LITTLE because Xen will not be able to migrate vCPUs across clusters for power efficiency.
>>>
>>> If we care about power efficiency, we would have to handle big.LITTLE seamlessly in Xen (i.e. a guest would only see one kind of CPU).
>>
>> Well, I'm a big fan of an approach that leaves the guests' scheduler dumb about things like these (i.e., load balancing, energy efficiency, etc), and hence puts Xen in charge.
>> In fact, on a Xen system, it is only Xen that has all the info necessary to make wise decisions (e.g., the load of the _whole_ host, the effect of any decisions on the _whole_ host, etc).
>>
>> But this case may be a LITTLE.bit ( :-PP ) different.
>>
>> Anyway, I guess I'll wait for your reply to my question above before commenting more.
>>
>>> This raises quite a few problems, nothing insurmountable, similar to migration across two platforms with different micro-architectures (e.g. processors): errata, features supported... The guest would have to know the union of all the errata (this is done so far via the MIDR, so we would need a PV way to do it), and only the intersection of features would be exposed to the guest. This also means the scheduler would have to be modified to handle power efficiency (not strictly necessary at the beginning).
>>>
>>> I agree that such a solution would require some work to implement, although Xen will have better control of the energy consumption of the platform.
>>>
>>> So the question here is, what do we want to achieve with big.LITTLE?
>>
>> Just thinking out loud here. So, instead of "just", as George suggested:
>>
>>   vcpuclass = ["0-1:A35", "2-5:A53", "6-7:A72"]
>>
>> we can allow something like the following (note that I'm tossing out random numbers next to the 'A's):
>>
>>   vcpuclass = ["0-1:A35", "2-5:A53,A17", "6-7:A72,A24,A31", "12-13:A8"]
>>
>> with the following meaning:
>>  - vcpus 0,1 can only run on pcpus of class A35
>>  - vcpus 2,3,4,5 can run on pcpus of class A53 _and_ on pcpus of class A17
>>  - vcpus 6,7 can run on pcpus of class A72, A24, A31
>>  - vcpus 8,9,10,11 --since they're not mentioned-- can run on pcpus of any class
>>  - vcpus 12,13 can only run on pcpus of class A8
>>
>> This will set the "boundaries" for each vcpu.
>> Then, within these boundaries, once in the (Xen's) scheduler, we can implement whatever complex/magic/silly logic we want, e.g.:
>>  - only use a pcpu of class A53 for vcpus that have an average load above 50%
>>  - only use a pcpu of class A31 if there are no idle pcpus of class A24
>>  - only use a pcpu of class A17 for a vcpu if the total system load divided by the vcpu ID gives 42 as result
>>  - whatever
>>
>> This allows us to achieve both the following goals:
>>  - allow Xen to take smart decisions, considering the load and the efficiency of the host as a whole
>>  - allow the guest to take smart decisions, like running lightweight tasks on low power vcpus (which then Xen will run on low power pcpus, at least on a properly configured system)
>>
>> Of course this **requires** that, for instance, vcpu 6 must be able to run on A72, A24 and A31 just fine, i.e., it must be possible for it to block on I/O when executing on an A72 pcpu, and, later, after wakeup, restart executing on an A24 pcpu.

With a bit of work in Xen, it would be possible to move the vCPU between big and LITTLE cpus. As mentioned above, we could sanitize the features to only enable a common set. You can view the big.LITTLE problem as a local live
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Wed, 21 Sep 2016, Julien Grall wrote:
> > > > And in my suggestion, we allow a richer set of labels, so that the user could also be more specific -- e.g., asking for "A15" specifically, for example, and failing to build if there are no A15 cores present, while allowing users to simply write "big" or "little" if they want simplicity / things which work across different platforms.
> > >
> > > Well, before trying to do something clever like that (i.e naming "big" and "little"), we need to have upstreamed bindings available to acknowledge the difference. AFAICT, it is not yet upstreamed for Device Tree (see [1]) and I don't know any static ACPI tables providing the similar information.
> >
> > I like George's idea that "big" and "little" could be just convenience aliases. Of course they are predicated on the necessary device tree bindings being upstream. We don't need [1] to be upstream in Linux, just the binding:
> >
> > http://marc.info/?l=linux-arm-kernel&m=147308556729426&w=2
> >
> > which has already been acked by the relevant maintainers.
>
> This is device tree only. What about ACPI?

ACPI will come along with similar information at some point. When we'll have it, we'll use it.

> > > I had a few discussions and more thoughts about big.LITTLE support in Xen. The main goal of big.LITTLE is power efficiency by moving tasks around and being able to idle one cluster. All the solutions suggested (including mine) so far can be replicated by hand (except the VPIDR), so they are mostly an automatic way. This will also remove the real benefits of big.LITTLE because Xen will not be able to migrate vCPUs across clusters for power efficiency.
> >
> > The goal of the architects of big.LITTLE might have been power efficiency, but of course we are free to use any features that the hardware provides in the best way for Xen and the Xen community.
> This is very dependent on how big.LITTLE has been implemented by the hardware. Some platforms can not run both big and LITTLE cores at the same time. You need a proper switch in the firmware/hypervisor.

Fair enough, that hardware wouldn't benefit from this work.

> > > If we care about power efficiency, we would have to handle big.LITTLE seamlessly in Xen (i.e. a guest would only see one kind of CPU). This raises quite a few problems, nothing insurmountable, similar to migration across two platforms with different micro-architectures (e.g. processors): errata, features supported... The guest would have to know the union of all the errata (this is done so far via the MIDR, so we would need a PV way to do it), and only the intersection of features would be exposed to the guest. This also means the scheduler would have to be modified to handle power efficiency (not strictly necessary at the beginning).
> > >
> > > I agree that such a solution would require some work to implement, although Xen will have better control of the energy consumption of the platform.
> > >
> > > So the question here is, what do we want to achieve with big.LITTLE?
> >
> > I don't think that handling big.LITTLE seamlessly in Xen is the best way to do it in the scenarios where Xen on ARM is being used today. I understand the principles behind it, but I don't think that it will lead to good results in a virtualized environment, where there is more activity and more vcpus than pcpus.
>
> Can you detail why you don't think it will give good results?

I think big.LITTLE works well for cases where you have short clear bursts of activity while most of the time the system is quasi-idle (but not completely idle). Basically like a smartphone.
For other scenarios with more uniform activity patterns, like a server or an infotainment system, big.LITTLE is too big of a hammer to be used for dynamic power saving. In those cases it is more flexible to expose all cores to VMs, so that they can exploit all resources when necessary and idle them when they can (with wfi or a deeper sleep state if possible).

> > What we discussed in this thread so far is actionable, and gives us big.LITTLE support in a short time frame. It is a good fit for Xen on ARM use cases and still leads to lower power consumption with a wise allocation of big and LITTLE vcpus and pcpus to guests.
>
> How would this lead to lower power consumption? If there is nothing running on the processor we would have a wfi loop which will never put the physical CPU in deep sleep.

I expect that by assigning appropriate tasks to big and LITTLE cores, some big cores will be left idle, which will lead to some power saving, especially if we put idle cores in deep sleep (maybe using PSCI?).

> The main advantage of big.LITTLE is to be able to switch off a cluster/cpu when it is not used. To me the main
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On 21/09/2016 20:11, Julien Grall wrote: Hi Stefano, On 21/09/2016 19:13, Stefano Stabellini wrote: On Wed, 21 Sep 2016, Julien Grall wrote: (CC a couple of ARM folks) On 21/09/16 11:22, George Dunlap wrote: On 21/09/16 11:09, Julien Grall wrote: On 20/09/16 21:17, Stefano Stabellini wrote: On Tue, 20 Sep 2016, Julien Grall wrote: Hi Stefano, On 20/09/2016 20:09, Stefano Stabellini wrote: On Tue, 20 Sep 2016, Julien Grall wrote: Hi, On 20/09/2016 12:27, George Dunlap wrote: On Tue, Sep 20, 2016 at 11:03 AM, Peng Fanwrote: On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote: On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote: On Tue, 20 Sep 2016, Dario Faggioli wrote: I'd like to add a computing capability in xen/arm, like this: struct compute_capatiliby { char *core_name; uint32_t rank; uint32_t cpu_partnum; }; struct compute_capatiliby cc= { {"A72", 4, 0xd08}, {"A57", 3, 0}, {"A53", 2, 0xd03}, {"A35", 1, ...}, } Then when identify cpu, we decide which cpu is big and which cpu is little according to the computing rank. Any comments? I think we definitely need to have Xen have some kind of idea the order between processors, so that the user doesn't need to figure out which class / pool is big and which pool is LITTLE. Whether this sort of enumeration is the best way to do that I'll let Julien and Stefano give their opinion. I don't think an hardcoded list of processor in Xen is the right solution. There are many existing processors and combinations for big.LITTLE so it will nearly be impossible to keep updated. I would expect the firmware table (device tree, ACPI) to provide relevant data for each processor and differentiate big from LITTLE core. Note that I haven't looked at it for now. A good place to start is looking at how Linux does. That's right, see Documentation/devicetree/bindings/arm/cpus.txt. 
It is trivial to identify the two different CPU classes and which cores belong to which class. The class of the CPU can be found from the MIDR, there is no need to use the device tree/acpi for that. Note that I don't think there is an easy way in ACPI (i.e not in AML) to find out the class. It is harder to figure out which one is supposed to be big and which one LITTLE. Regardless, we could default to using the first cluster (usually big), which is also the cluster of the boot cpu, and utilize the second cluster only when the user demands it. Why do you think the boot CPU will usually be a big one? In the case of Juno platform it is configurable, and the boot CPU is a little core on r2 by default. In any case, what we care about is differentiate between two set of CPUs. I don't think Xen should care about migrating a guest vCPU between big and LITTLE cpus. So I am not sure why we would want to know that. No, it is not about migrating (at least yet). It is about giving useful information to the user. It would be nice if the user had to choose between "big" and "LITTLE" rather than "class 0x1" and "class 0x100", or even "A7" or "A15". I don't think it is wise to assume that we may have only 2 kind of CPUs on the platform. We may have more in the future, if so how would you name them? I would suggest that internally Xen recognize an arbitrary number of processor "classes", and order them according to more powerful -> less powerful. Then if at some point someone makes a platform with three processors, you can say "class 0", "class 1" or "class 2". "big" would be an alias for "class 0" and "little" would be an alias for "class 1".
As mentioned earlier, there is no upstreamed yet device tree bindings to know the "power" of a CPU (see [1] And in my suggestion, we allow a richer set of labels, so that the user could also be more specific -- e.g., asking for "A15" specifically, for example, and failing to build if there are no A15 cores present, while allowing users to simply write "big" or "little" if they want simplicity / things which work across different platforms. Well, before trying to do something clever like that (i.e naming "big" and "little"), we need to have upstreamed bindings available to acknowledge the difference. AFAICT, it is not yet upstreamed for Device Tree (see [1]) and I don't know any static ACPI tables providing the similar information. I like George's idea that "big" and "little" could be just convenience aliases. Of course they are predicated on the necessary device tree bindings being upstream. We don't need [1] to be upstream in Linux, just the binding: http://marc.info/?l=linux-arm-kernel&m=147308556729426&w=2 which has already been acked by the relevant maintainers. This is device tree only. What about ACPI? I had a few discussions and more thoughts about big.LITTLE support in Xen. The main goal of big.LITTLE is power efficiency by moving tasks around and being able to idle one cluster. All the solutions suggested (including mine) so far, can be replicated
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
Hi Stefano, On 21/09/2016 19:13, Stefano Stabellini wrote: On Wed, 21 Sep 2016, Julien Grall wrote: (CC a couple of ARM folks) On 21/09/16 11:22, George Dunlap wrote: On 21/09/16 11:09, Julien Grall wrote: On 20/09/16 21:17, Stefano Stabellini wrote: On Tue, 20 Sep 2016, Julien Grall wrote: Hi Stefano, On 20/09/2016 20:09, Stefano Stabellini wrote: On Tue, 20 Sep 2016, Julien Grall wrote: Hi, On 20/09/2016 12:27, George Dunlap wrote: On Tue, Sep 20, 2016 at 11:03 AM, Peng Fanwrote: On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote: On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote: On Tue, 20 Sep 2016, Dario Faggioli wrote: I'd like to add a computing capability in xen/arm, like this: struct compute_capatiliby { char *core_name; uint32_t rank; uint32_t cpu_partnum; }; struct compute_capatiliby cc= { {"A72", 4, 0xd08}, {"A57", 3, 0}, {"A53", 2, 0xd03}, {"A35", 1, ...}, } Then when identify cpu, we decide which cpu is big and which cpu is little according to the computing rank. Any comments? I think we definitely need to have Xen have some kind of idea the order between processors, so that the user doesn't need to figure out which class / pool is big and which pool is LITTLE. Whether this sort of enumeration is the best way to do that I'll let Julien and Stefano give their opinion. I don't think an hardcoded list of processor in Xen is the right solution. There are many existing processors and combinations for big.LITTLE so it will nearly be impossible to keep updated. I would expect the firmware table (device tree, ACPI) to provide relevant data for each processor and differentiate big from LITTLE core. Note that I haven't looked at it for now. A good place to start is looking at how Linux does. That's right, see Documentation/devicetree/bindings/arm/cpus.txt. 
It is trivial to identify the two different CPU classes and which cores belong to which class. The class of the CPU can be found from the MIDR; there is no need to use the device tree/ACPI for that. Note that I don't think there is an easy way in ACPI (i.e. not in AML) to find out the class. It is harder to figure out which one is supposed to be big and which one LITTLE. Regardless, we could default to using the first cluster (usually big), which is also the cluster of the boot cpu, and utilize the second cluster only when the user demands it. Why do you think the boot CPU will usually be a big one? In the case of the Juno platform it is configurable, and the boot CPU is a little core on r2 by default. In any case, what we care about is differentiating between two sets of CPUs. I don't think Xen should care about migrating a guest vCPU between big and LITTLE cpus. So I am not sure why we would want to know that. No, it is not about migrating (at least yet). It is about giving useful information to the user. It would be nice if the user had to choose between "big" and "LITTLE" rather than "class 0x1" and "class 0x100", or even "A7" or "A15". I don't think it is wise to assume that we may have only 2 kinds of CPUs on the platform. We may have more in the future; if so, how would you name them? I would suggest that internally Xen recognize an arbitrary number of processor "classes", and order them according to more powerful -> less powerful. Then if at some point someone makes a platform with three processors, you can say "class 0", "class 1" or "class 2". "big" would be an alias for "class 0" and "little" would be an alias for "class 1".
As mentioned earlier, there is no upstreamed device tree binding yet to know the "power" of a CPU (see [1]). And in my suggestion, we allow a richer set of labels, so that the user could also be more specific -- e.g., asking for "A15" specifically, and failing to build if there are no A15 cores present, while allowing users to simply write "big" or "little" if they want simplicity / things which work across different platforms. Well, before trying to do something clever like that (i.e. naming "big" and "little"), we need to have upstreamed bindings available to acknowledge the difference. AFAICT, it is not yet upstreamed for Device Tree (see [1]) and I don't know of any static ACPI table providing similar information. I like George's idea that "big" and "little" could be just convenience aliases. Of course they are predicated on the necessary device tree bindings being upstream. We don't need [1] to be upstream in Linux, just the binding: http://marc.info/?l=linux-arm-kernel=147308556729426=2 which has already been acked by the relevant maintainers. This is device tree only. What about ACPI? I had a few discussions and more thought about big.LITTLE support in Xen. The main goal of big.LITTLE is power efficiency, by moving tasks around and being able to idle one cluster. All the solutions suggested (including mine) so far can be replicated by hand (except the VPIDR), so they are mostly an automatic way.
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Wed, 2016-09-21 at 14:06 +0100, Julien Grall wrote: > (CC a couple of ARM folks) > Yay, thanks for this! :-) > I had a few discussions and more thought about big.LITTLE support in > Xen. > The main goal of big.LITTLE is power efficiency by moving tasks > around > and being able to idle one cluster. All the solutions suggested > (including mine) so far can be replicated by hand (except the VPIDR) > so > they are mostly an automatic way. > I'm sorry, how is this (going to be) handled in Linux? Is it that any arbitrary task executing any arbitrary binary code can be run on both big and LITTLE pcpus, depending on the scheduler's and energy management's decisions? This does not seem to match with what has been said at some point in this thread... And if it's like that, how's that possible, if the pcpus' ISAs are (even only slightly) different? > This will also remove the real > benefits of big.LITTLE because Xen will not be able to migrate vCPUs > across clusters for power efficiency. > > If we care about power efficiency, we would have to handle > big.LITTLE seamlessly > in Xen (i.e. a guest would only see one kind of CPU). > Well, I'm a big fan of an approach that leaves the guests' schedulers dumb about things like these (i.e., load balancing, energy efficiency, etc), and hence puts Xen in charge. In fact, on a Xen system, it is only Xen that has all the info necessary to make wise decisions (e.g., the load of the _whole_ host, the effect of any decisions on the _whole_ host, etc). But this case may be a LITTLE.bit ( :-PP ) different. Anyway, I guess I'll await your reply to my question above before commenting more. > This arises > quite a few problems, nothing insurmountable, similar to migration > across > two platforms with different micro-architectures (e.g. processors): > errata, features supported...
The guest would have to know the union > of > all the errata (this is done so far via the MIDR, so we would need a PV > way > to do it), and only the intersection of features would be exposed to > the > guest. This also means the scheduler would have to be modified to > handle > power efficiency (not strictly necessary at the beginning). > > I agree that such a solution would require some work to implement, > although Xen will have a better control of the energy consumption of > the > platform. > > So the question here is: what do we want to achieve with big.LITTLE? > Just thinking out loud here. So, instead of "just", as George suggested: vcpuclass=["0-1:A35","2-5:A53", "6-7:A72"] we can allow something like the following (note that I'm tossing out random numbers next to the 'A's): vcpuclass = ["0-1:A35", "2-5:A53,A17", "6-7:A72,A24,A31", "12-13:A8"] with the following meaning: - vcpus 0, 1 can only run on pcpus of class A35 - vcpus 2,3,4,5 can run on pcpus of class A53 _and_ on pcpus of class A17 - vcpus 6,7 can run on pcpus of class A72, A24, A31 - vcpus 8,9,10,11 --since they're not mentioned-- can run on pcpus of any class - vcpus 12,13 can only run on pcpus of class A8 This will set the "boundaries" for each vcpu.
Then, within these boundaries, once in the (Xen's) scheduler, we can implement whatever complex/magic/silly logic we want, e.g.: - only use a pcpu of class A53 for vcpus that have an average load above 50% - only use a pcpu of class A31 if there are no idle pcpus of class A24 - only use a pcpu of class A17 for a vcpu if the total system load divided by the vcpu ID gives 42 as result - whatever This allows us to achieve both the following goals: - allow Xen to take smart decisions, considering the load and the efficiency of the host as a whole - allow the guest to take smart decisions, like running lightweight tasks on low power vcpus (which then Xen will run on low power pcpus, at least on a properly configured system) Of course this **requires** that, for instance, vcpu 6 must be able to run on A72, A24 and A31 just fine, i.e., it must be possible for it to block on I/O when executing on an A72 pcpu, and, later, after wakeup, restart executing on an A24 pcpu. If that is not possible, and doing such a vcpu movement, instead of just calling schedule.c:vcpu_migrate() (or equivalent), requires some more complex fiddling, involving local migration --or alike-- techniques, then I honestly don't think this is something that can be solved at the scheduler level anyway... :-O > [1] https://lwn.net/Articles/699569/ > I tried to have a quick look, but I don't have the time right now, and furthermore, it's all about ARM, and I still speak too little ARM to properly understand what's going on... :-( Regards, Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK) signature.asc Description: This is a digitally signed message part ___
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Wed, 2016-09-21 at 20:28 +0800, Peng Fan wrote: > Use this in xl cfg file? > vcpuclass=["0-1:A35","2-5:A53", "6-7:A72"] ? > > I am not sure. If there are more kinds of CPUs, how to handle guest > vcpus, > as we discussed in this thread, we tend to support different classes > of vcpu > for guest. But if there are many kinds of physical CPUs, we also need > to let > guest have so many kinds of virtual cpus? > We don't _need_ to necessarily do that, or not right now. **However**, this is the main point of spending time designing things and/or having the kind of conversation we're having here: i.e., if the design, and the resulting implementation, is generic enough, we may get that for free, which would be great. This seems to me to be the case if we go for George's "vcpuclass=[]" suggestion, and, even better, it doesn't look like it would make the code much more difficult to write or complex (wrt just allowing "vcpus_big" and "vcpus_little"). Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Wed, 2016-09-21 at 10:22 +0100, George Dunlap wrote: > On 21/09/16 09:38, Peng Fan wrote: > > User may change the hard affinity of a vcpu, so we also need to > > block a little > > vcpu from being scheduled to a big physical cpu. Add some checking code in > > xen, > > when changing the hard affinity, check whether the cap of a vcpu is > > compatible > > with the cap of the physical cpus. > Yes, restricting affinity changes will indeed be necessary. Note that this is not a limit of the 'pinning based' implementation. Even if/when we'll have in-scheduler support, trying to pin a LITTLE vcpu to a big pcpu would, AFAIUI, have to fail. I was thinking of some parameter, that we can set from xl (applicable also on non-big.LITTLE or non-heterogeneous configurations), for asking Xen to make the hard-affinity 'immutable'. That would be rather simple to do. But I like Peng's idea of validating hard-affinity against the class even better! :-) > Dario, what do we do with vNUMA / soft affinity? > We do nothing, actually. I mean, for now, we just accept whatever the user asks, which might well be setting the soft, or even hard, affinity of all the domain to a set of nodes where the domain itself does not have any memory. This is actually another case where either immutability or restriction of the changes that we allow to the (soft, in this case) affinity would be useful. I'd always wanted to introduce logic that would at least print a warning if something that looks really bad is being done, but never got down to actually doing that. Maybe this will be the chance to improve wrt this too! :-) Regards, Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
(CC a couple of ARM folks) [...] Well, before trying to do something clever like that (i.e. naming "big" and "little"), we need to have upstreamed bindings available to acknowledge the difference. AFAICT, it is not yet upstreamed for Device Tree (see [1]) and I don't know of any static ACPI table providing similar information. I had a few discussions and more thought about big.LITTLE support in Xen. The main goal of big.LITTLE is power efficiency, by moving tasks around and being able to idle one cluster. All the solutions suggested (including mine) so far can be replicated by hand (except the VPIDR), so they are mostly an automatic way. This will also remove the real benefits of big.LITTLE because Xen will not be able to migrate vCPUs across clusters for power efficiency. If we care about power efficiency, we would have to handle big.LITTLE seamlessly in Xen (i.e. a guest would only see one kind of CPU). This arises quite a few problems, nothing insurmountable, similar to migration across two platforms with different micro-architectures (e.g. processors): errata, features supported... The guest would have to know the union of all the errata (this is done so far via the MIDR, so we would need a PV way to do it), and only the intersection of features would be exposed to the guest.
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Wed, Sep 21, 2016 at 11:09:11AM +0100, Julien Grall wrote: [...] >I don't think it is wise to assume that we may have only 2 kinds of CPUs on >the platform. We may have more in the future; if so, how would you name them? Considering more than 2 kinds of physical cpus, "vcpuclass=["0-1:A35","2-5:A53", "6-7:A72"]" seems easier to handle. Regards, Peng. > >IMHO, asking the user to specify the type of CPUs he wants would be the >easiest way (though a bit difficult for the user) and avoids us having to rely on >non-upstreamed bindings. > >Regards, > >-- >Julien Grall
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Wed, Sep 21, 2016 at 10:22:14AM +0100, George Dunlap wrote: >On 21/09/16 09:38, Peng Fan wrote: >> On Tue, Sep 20, 2016 at 01:17:04PM -0700, Stefano Stabellini wrote: >>> On Tue, 20 Sep 2016, Julien Grall wrote: Hi Stefano, On 20/09/2016 20:09, Stefano Stabellini wrote: > On Tue, 20 Sep 2016, Julien Grall wrote: >> Hi, >> >> On 20/09/2016 12:27, George Dunlap wrote: >>> On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan>>> wrote: On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote: > On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote: >> On Tue, 20 Sep 2016, Dario Faggioli wrote: I'd like to add a computing capability in xen/arm, like this: struct compute_capatiliby { char *core_name; uint32_t rank; uint32_t cpu_partnum; }; struct compute_capatiliby cc= { {"A72", 4, 0xd08}, {"A57", 3, 0}, {"A53", 2, 0xd03}, {"A35", 1, ...}, } Then when identify cpu, we decide which cpu is big and which cpu is little according to the computing rank. Any comments? >>> >>> I think we definitely need to have Xen have some kind of idea the >>> order between processors, so that the user doesn't need to figure out >>> which class / pool is big and which pool is LITTLE. Whether this sort >>> of enumeration is the best way to do that I'll let Julien and Stefano >>> give their opinion. >> >> I don't think an hardcoded list of processor in Xen is the right >> solution. >> There are many existing processors and combinations for big.LITTLE so it >> will >> nearly be impossible to keep updated. >> >> I would expect the firmware table (device tree, ACPI) to provide relevant >> data >> for each processor and differentiate big from LITTLE core. >> Note that I haven't looked at it for now. A good place to start is >> looking >> at >> how Linux does. > > That's right, see Documentation/devicetree/bindings/arm/cpus.txt. 
It is > trivial to identify the two different CPU classes and which cores belong > to which class.t, as The class of the CPU can be found from the MIDR, there is no need to use the device tree/acpi for that. Note that I don't think there is an easy way in ACPI (i.e not in AML) to find out the class. > It is harder to figure out which one is supposed to be > big and which one LITTLE. Regardless, we could default to using the > first cluster (usually big), which is also the cluster of the boot cpu, > and utilize the second cluster only when the user demands it. Why do you think the boot CPU will usually be a big one? In the case of Juno platform it is configurable, and the boot CPU is a little core on r2 by default. In any case, what we care about is differentiate between two set of CPUs. I don't think Xen should care about migrating a guest vCPU between big and LITTLE cpus. So I am not sure why we would want to know that. >>> >>> No, it is not about migrating (at least yet). It is about giving useful >>> information to the user. It would be nice if the user had to choose >>> between "big" and "LITTLE" rather than "class 0x1" and "class 0x100", or >>> even "A7" or "A15". >> >> As Dario mentioned in previous email, >> for dom0 provide like this: >> >> dom0_vcpus_big = 4 >> dom0_vcpus_little = 2 >> >> to dom0. >> >> If these two no provided, we could let dom0 runs on big pcpus or big.little. >> Anyway this is not the important point for dom0 only big or big.little. >> >> For domU, provide "vcpus.big" and "vcpus.little" in xl configuration file. >> Such as: >> >> vcpus.big = 2 >> vcpus.litle = 4 > >FWIW, from a UI perspective, it would be nice if we designed the >interface such that it *can* be used simply (i.e., just "big" or >"little"), but can also be used more flexibly; for instance, specifying >"A15" or "A7" instead. 
> >So maybe have a 'classifier' string; this could start by having just >"big" and "little", but could then be extended to allow fuller ways of >specifying specific kinds of cores. > >To keep the illusion of python syntax, what about something like this: > >vcpuclass=["big=2","little=4"] > >Or would it be better to have a mapping of vcpu to class? > >vcpuclass=["0-1:big","2-5:little"] Both are good -:) > > >> According to George's comments, >> Then, I think we could use affinity to restrict little vcpus to be scheduled on >> little pcpus, >> and restrict big vcpus to big pcpus. Seems no need to consider soft >> affinity; using hard >> affinity is enough to handle this. >> >> We may need to provide some interface to let xl get the information such >> as >> big.little or smp. if it is big.little, which is big and which is little.
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Wed, Sep 21, 2016 at 11:15:35AM +0100, Julien Grall wrote: >Hello Peng, > >On 21/09/16 09:38, Peng Fan wrote: [...] >>For domU, provide "vcpus.big" and "vcpus.little" in the xl configuration file.
>>Such as: >> >>vcpus.big = 2 >>vcpus.little = 4 >> >> >>According to George's comments, >>Then, I think we could use affinity to restrict little vcpus to be scheduled on >>little pcpus, >>and restrict big vcpus to big pcpus. Seems no need to consider soft affinity; >>using hard >>affinity is enough to handle this. >> >>We may need to provide some interface to let xl get the information such >>as >>big.little or smp. if it is big.little, which is big and which is little. >> >>For how to differentiate cpus, I am looking at the Linaro EAS cpu topology code. >>The code has not been upstreamed (:, but merged into the google android kernel. >>I only plan to take some necessary code, such as the device tree parsing and >>cpu topology build, because we only need to know the computing capacity of >>each pcpu. >> >>Some doc about the EAS piece, including dts node examples: >>https://git.linaro.org/arm/eas/kernel.git/blob/refs/heads/lsk-v4.4-eas-v5.2:/Documentation/devicetree/bindings/scheduler/sched-energy-costs.txt > >I am reluctant to take any non-upstreamed bindings in Xen. There is a similar >series going on the lkml [1]. Yeah. I understand :) I'll have a look at this series; it seems simpler than the code in the Linaro tree. Whether the EAS cpu topology code or the series you listed, this is to let us differentiate the cpu classes. This is not the hard point, just what information to get from dts. We need to reach agreement on how to arrange the different cpu classes, I think. Say we get dmips/capacity from dts for each cpu: put the info into cpu_data for each cpu? > >But it sounds like it is a lot of work for little benefit (i.e. giving a >better name to the set of CPUs). The naming will also not fit if in the >future hardware will have more than 2 kinds of CPUs. Oh. Yeah. There is a possibility that a SoC contains, for example, A35 + A53 + A72.. Then xx.big and xx.little seem not enough. On such a SoC, do we still need to support a big.little guest?
We may not call it a big.little guest then, if the guest also needs A35 + A53 + A72 vcpus. Use this in the xl cfg file? vcpuclass=["0-1:A35","2-5:A53", "6-7:A72"] ? I am not sure. If there are more kinds of CPUs, how do we handle guest vcpus? As we discussed in this thread, we tend to support different classes of vcpu for the guest. But if there are many kinds of physical CPUs, do we also need to let the guest have that many kinds of virtual cpus? Anyway, the first step for me is to differentiate the physical cpus and add the info to cpu_data or similar. > >[...] > >>I am not sure, but we may also need to handle mpidr for ARM, because big and >>little vcpus are supported. > >I am not sure to understand what you mean here. For a big.little guest, we need to know which vcpu is in cluster 0 and which is in cluster 1, and also fill in the related values of the MPIDR for the guest. Regards, Peng. > >Regards, > >[1] https://lwn.net/Articles/699569/ > >-- >Julien Grall -- ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
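Peng's MPIDR point can be sketched as follows. The Aff0/Aff1 affinity field layout is architectural, but the helper, its name, and the two-cluster split are illustrative assumptions, not Xen code:

```c
#include <stdint.h>

/* Illustrative sketch: build a guest vMPIDR where Aff1 holds the
 * virtual cluster and Aff0 the core number within that cluster.
 * vcpus below cluster_split land in cluster 0 (e.g. big), the rest
 * in cluster 1 (e.g. LITTLE); the split point is an assumption. */
#define MPIDR_AFF0_SHIFT  0
#define MPIDR_AFF1_SHIFT  8

static uint32_t vmpidr_for_vcpu(unsigned int vcpu_id,
                                unsigned int cluster_split)
{
    uint32_t cluster, core;

    if (vcpu_id < cluster_split) {
        cluster = 0;
        core = vcpu_id;
    } else {
        cluster = 1;
        core = vcpu_id - cluster_split;
    }
    return (cluster << MPIDR_AFF1_SHIFT) | (core << MPIDR_AFF0_SHIFT);
}
```

With a 2-big/4-LITTLE guest, vcpu 2 would read MPIDR affinity 0x100, i.e. core 0 of cluster 1.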
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On 21/09/16 11:09, Julien Grall wrote: > > > On 20/09/16 21:17, Stefano Stabellini wrote: >> On Tue, 20 Sep 2016, Julien Grall wrote: >>> Hi Stefano, >>> >>> On 20/09/2016 20:09, Stefano Stabellini wrote: On Tue, 20 Sep 2016, Julien Grall wrote: > Hi, > > On 20/09/2016 12:27, George Dunlap wrote: >> On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan>> wrote: >>> On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote: On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote: > On Tue, 20 Sep 2016, Dario Faggioli wrote: >>> I'd like to add a computing capability in xen/arm, like this: >>> >>> struct compute_capatiliby >>> { >>>char *core_name; >>>uint32_t rank; >>>uint32_t cpu_partnum; >>> }; >>> >>> struct compute_capatiliby cc= >>> { >>> {"A72", 4, 0xd08}, >>> {"A57", 3, 0}, >>> {"A53", 2, 0xd03}, >>> {"A35", 1, ...}, >>> } >>> >>> Then when identify cpu, we decide which cpu is big and which cpu is >>> little >>> according to the computing rank. >>> >>> Any comments? >> >> I think we definitely need to have Xen have some kind of idea the >> order between processors, so that the user doesn't need to figure out >> which class / pool is big and which pool is LITTLE. Whether this >> sort >> of enumeration is the best way to do that I'll let Julien and Stefano >> give their opinion. > > I don't think an hardcoded list of processor in Xen is the right > solution. > There are many existing processors and combinations for big.LITTLE > so it > will > nearly be impossible to keep updated. > > I would expect the firmware table (device tree, ACPI) to provide > relevant > data > for each processor and differentiate big from LITTLE core. > Note that I haven't looked at it for now. A good place to start is > looking > at > how Linux does. That's right, see Documentation/devicetree/bindings/arm/cpus.txt. 
It is trivial to identify the two different CPU classes and which cores belong to which class. >>> >>> The class of the CPU can be found from the MIDR, there is no need to >>> use the >>> device tree/acpi for that. Note that I don't think there is an easy >>> way in >>> ACPI (i.e not in AML) to find out the class. >>> It is harder to figure out which one is supposed to be big and which one LITTLE. Regardless, we could default to using the first cluster (usually big), which is also the cluster of the boot cpu, and utilize the second cluster only when the user demands it. >>> >>> Why do you think the boot CPU will usually be a big one? In the case >>> of Juno >>> platform it is configurable, and the boot CPU is a little core on r2 by >>> default. >>> >>> In any case, what we care about is differentiate between two set of >>> CPUs. I >>> don't think Xen should care about migrating a guest vCPU between big and >>> LITTLE cpus. So I am not sure why we would want to know that. >> >> No, it is not about migrating (at least yet). It is about giving useful >> information to the user. It would be nice if the user had to choose >> between "big" and "LITTLE" rather than "class 0x1" and "class 0x100", or >> even "A7" or "A15". > > I don't think it is wise to assume that we may have only 2 kind of CPUs > on the platform. We may have more in the future, if so how would you > name them? I would suggest that internally Xen recognize an arbitrary number of processor "classes", and order them according to more powerful -> less powerful. Then if at some point someone makes a platform with three processors, you can say "class 0", "class 1" or "class 2". "big" would be an alias for "class 0" and "little" would be an alias for "class 1".
And in my suggestion, we allow a richer set of labels, so that the user could also be more specific -- e.g., asking for "A15" specifically, for example, and failing to build if there are no A15 cores present, while allowing users to simply write "big" or "little" if they want simplicity / things which work across different platforms. -George
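George's ordered-classes idea could look roughly like this. A hypothetical sketch: the class table, labels, and function are example data for illustration, not an existing Xen or xl interface:

```c
#include <string.h>

/* Hypothetical sketch of George's suggestion: Xen orders processor
 * classes from most to least powerful, and "big"/"little" are just
 * aliases for class 0 and class 1.  The class names (A15/A7) are
 * made-up example data for one platform. */
static const char *classes[] = { "A15", "A7" };  /* index == class id */
#define NR_CLASSES 2

static int resolve_class(const char *label)
{
    int i;

    if (!strcmp(label, "big"))
        return 0;
    if (!strcmp(label, "little"))
        return NR_CLASSES > 1 ? 1 : -1;
    for (i = 0; i < NR_CLASSES; i++)
        if (!strcmp(label, classes[i]))
            return i;
    return -1;  /* e.g. asking for "A15" on a platform without one */
}
```

This gives both behaviours George describes: the portable "big"/"little" spelling, and the strict per-core-name spelling that fails when the requested class is absent.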
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
Hello Peng, On 21/09/16 09:38, Peng Fan wrote: On Tue, Sep 20, 2016 at 01:17:04PM -0700, Stefano Stabellini wrote: On Tue, 20 Sep 2016, Julien Grall wrote: On 20/09/2016 20:09, Stefano Stabellini wrote: On Tue, 20 Sep 2016, Julien Grall wrote: On 20/09/2016 12:27, George Dunlap wrote: On Tue, Sep 20, 2016 at 11:03 AM, Peng Fanwrote: On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote: On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote: On Tue, 20 Sep 2016, Dario Faggioli wrote: It is harder to figure out which one is supposed to be big and which one LITTLE. Regardless, we could default to using the first cluster (usually big), which is also the cluster of the boot cpu, and utilize the second cluster only when the user demands it. Why do you think the boot CPU will usually be a big one? In the case of Juno platform it is configurable, and the boot CPU is a little core on r2 by default. In any case, what we care about is differentiate between two set of CPUs. I don't think Xen should care about migrating a guest vCPU between big and LITTLE cpus. So I am not sure why we would want to know that. No, it is not about migrating (at least yet). It is about giving useful information to the user. It would be nice if the user had to choose between "big" and "LITTLE" rather than "class 0x1" and "class 0x100", or even "A7" or "A15". As Dario mentioned in previous email, for dom0 provide like this: dom0_vcpus_big = 4 dom0_vcpus_little = 2 to dom0. If these two no provided, we could let dom0 runs on big pcpus or big.little. Anyway this is not the important point for dom0 only big or big.little. For domU, provide "vcpus.big" and "vcpus.little" in xl configuration file. Such as: vcpus.big = 2 vcpus.litle = 4 According to George's comments, Then, I think we could use affinity to restrict little vcpus be scheduled on little vcpus, and restrict big vcpus on big vcpus. Seems no need to consider soft affinity, use hard affinity is to handle this. 
We may need to provide some interface to let xl can get the information such as big.little or smp. if it is big.little, which is big and which is little. For how to differentiate cpus, I am looking the linaro eas cpu topology code, The code has not been upstreamed (:, but merged into google android kernel. I only plan to take some necessary code, such as device tree parse and cpu topology build, because we only need to know the computing capacity of each pcpu. Some doc about eas piece, including dts node examples: https://git.linaro.org/arm/eas/kernel.git/blob/refs/heads/lsk-v4.4-eas-v5.2:/Documentation/devicetree/bindings/scheduler/sched-energy-costs.txt I am reluctant to take any non-upstreamed bindings in Xen. There is a similar series going on the lkml [1]. But it sounds like it is a lot of works for little benefits (i.e giving a better name to the set of CPUs). The naming will also not fit if in the future hardware will have more than 2 kind of CPUs. [...] I am not sure, but we may also need to handle mpidr for ARM, because big and little vcpus are supported. I am not sure to understand what you mean here. Regards, [1] https://lwn.net/Articles/699569/ -- Julien Grall
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On 20/09/16 21:17, Stefano Stabellini wrote: On Tue, 20 Sep 2016, Julien Grall wrote: Hi Stefano, On 20/09/2016 20:09, Stefano Stabellini wrote: On Tue, 20 Sep 2016, Julien Grall wrote: Hi, On 20/09/2016 12:27, George Dunlap wrote: On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan wrote: On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote: On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote: On Tue, 20 Sep 2016, Dario Faggioli wrote: I'd like to add a computing capability in xen/arm, like this: struct compute_capatiliby { char *core_name; uint32_t rank; uint32_t cpu_partnum; }; struct compute_capatiliby cc= { {"A72", 4, 0xd08}, {"A57", 3, 0}, {"A53", 2, 0xd03}, {"A35", 1, ...}, } Then when identify cpu, we decide which cpu is big and which cpu is little according to the computing rank. Any comments? I think we definitely need to have Xen have some kind of idea the order between processors, so that the user doesn't need to figure out which class / pool is big and which pool is LITTLE. Whether this sort of enumeration is the best way to do that I'll let Julien and Stefano give their opinion. I don't think an hardcoded list of processor in Xen is the right solution. There are many existing processors and combinations for big.LITTLE so it will nearly be impossible to keep updated. I would expect the firmware table (device tree, ACPI) to provide relevant data for each processor and differentiate big from LITTLE core. Note that I haven't looked at it for now. A good place to start is looking at how Linux does. That's right, see Documentation/devicetree/bindings/arm/cpus.txt. It is trivial to identify the two different CPU classes and which cores belong to which class. The class of the CPU can be found from the MIDR, there is no need to use the device tree/acpi for that. Note that I don't think there is an easy way in ACPI (i.e not in AML) to find out the class. It is harder to figure out which one is supposed to be big and which one LITTLE. 
Regardless, we could default to using the first cluster (usually big), which is also the cluster of the boot cpu, and utilize the second cluster only when the user demands it. Why do you think the boot CPU will usually be a big one? In the case of Juno platform it is configurable, and the boot CPU is a little core on r2 by default. In any case, what we care about is differentiate between two set of CPUs. I don't think Xen should care about migrating a guest vCPU between big and LITTLE cpus. So I am not sure why we would want to know that. No, it is not about migrating (at least yet). It is about giving useful information to the user. It would be nice if the user had to choose between "big" and "LITTLE" rather than "class 0x1" and "class 0x100", or even "A7" or "A15". I don't think it is wise to assume that we may have only 2 kind of CPUs on the platform. We may have more in the future, if so how would you name them? IMHO, asking the user to specify the type of CPUs he wants would be the easiest way (though a bit difficult for the user) and avoids relying on non-upstreamed bindings. Regards, -- Julien Grall
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Tue, 2016-09-20 at 18:03 +0800, Peng Fan wrote: > Hi Dario, > On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote: > > > > On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote: > > > > > > On Tue, 20 Sep 2016, Dario Faggioli wrote: > > > > > > > > And this would work even if/when there is only one cpupool, or > > > > in > > > > general for domains that are in a pool that has both big and > > > > LITTLE > > > > pcpus. Furthermore, big.LITTLE support and cpupools will be > > > > orthogonal, > > > > just like pinning and cpupools are orthogonal right now. I.e., > > > > once > > > > we > > > > will have what I described above, nothing prevents us from > > > > implementing > > > > per-vcpu cpupool membership, and either create the two (or > > > > more!) > > > > big > > > > and LITTLE pools, or from mixing things even more, for more > > > > complex > > > > and > > > > specific use cases. :-) > > > > > > I think that everybody agrees that this is the best long term > > > solution. > > > > > Well, no, that wasn't obvious to me. If that's the case, it's > > already > > something! :-) > > > > > > > > > > > > > > > > > Actually, with the cpupool solution, if you want a guest (or > > > > dom0) > > > > to > > > > actually have both big and LITTLE vcpus, you necessarily have > > > > to > > > > implement per-vcpu (rather than per-domain, as it is now) > > > > cpupool > > > > membership. I said myself it's not impossible, but certainly > > > > it's > > > > some > > > > work... with the scheduler solution you basically get that for > > > > free! > > > > > > > > So, basically, if we use cpupools for the basics of big.LITTLE > > > > support, > > > > there's no way out of it (apart from going implementing > > > > scheduling > > > > support afterwords, but that looks backwards to me, especially > > > > when > > > > thinking at it with the code in mind). 
> > > > > > The question is: what is the best short-term solution we can ask > > > Peng > > > to > > > implement that allows Xen to run on big.LITTLE systems today? > > > Possibly > > > getting us closer to the long term solution, or at least not > > > farther > > > from it? > > > > > So, I still have to look closely at the patches in these series. > > But, > > with Credit2 in mind, if one: > > > > - take advantage of the knowledge of what arch a pcpu belongs > > inside > > > > > the code that arrange the pcpus in runqueues, which means > > we'll end > > up with big runqueues and LITTLE runqueues. I re-wrote that > > code, I > > can provide pointers and help, if necessary; > > - tweak the one or two instances of for_each_runqueue() [*] that > > there > > are in the code into a for_each_runqueue_of_same_class(), > > i.e.: > > Do you have a plan to add this support for big.LITTLE? > > I admit that this is the first time I have looked into the scheduler part. > If I understand wrongly, please correct me. > No, I was not really planning to work on this directly myself... I was only providing opinions and advice. That of course may change, e.g., if we think that it is absolutely and of capital importance for Xen to gain big.LITTLE support in matter of days. :-) That's a bit unlikely at this stage anyway, though, even independently of who'll work on that, given where we stand in Xen 4.8 release process. In any case, I'm happy to help, though, with any kind of advice --as I'm already trying to do-- but also in a more concrete way, on actual code... but I strongly think that it's better if you lead the effort, e.g., by trying to do what we agree upon, and ask immediately, as soon as you get stuck. :-) > There is a runqueue for each physical cpu, and there are several > vcpus in the runqueue. > The scheduler will pick a vcpu in the runqueue to run on the physical > cpu. 
> If you start by "just" using pinning, as I envisioned for early support, and that also George is suggesting as first step, there's going to be nothing to do within Xen and on scheduler's runqueue at all. And it won't actually even be wasted effort, because all the code for parsing and implementing the interface in xl and libxl, will be reusable for when we'll switch to ditch implicit pinning and integrate the mechanism within the scheduler's logic. > A vcpu is bind to a physical cpu when alloc_vcpu, but the vcpu can be > scheduled > or migrated to a different physical cpu. > > Settings cpu soft affinity and hard affinity to restrict vcpus be > scheduled > on specific cpus. Then is there a need to introduce more runqueues? > No, it's all more dynamic and --allow me-- more elegant than this that you describe... But I do understand the fact that you've never looked at scheduling code, so it's ok to not have this clear. :-) > This seems more complicated than cpupool (: > Nah, it's not... It may be a comparable amount of effort, but for a better end result! :-) Regards, Dario -- <> (Raistlin Majere)
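The "just use pinning" first step Dario describes amounts to deriving a hard-affinity mask from the big/LITTLE split. A toy sketch with plain bitmasks; a real implementation would use libxl_bitmap in the toolstack and cpumasks in Xen:

```c
#include <stdint.h>

/* Minimal sketch, assuming at most 32 pcpus so a uint32_t bitmap is
 * enough: big vcpus are pinned to the big pcpus, LITTLE vcpus to
 * everything else.  The function name is made up for illustration. */
static uint32_t hard_affinity(uint32_t big_pcpus, uint32_t all_pcpus,
                              int vcpu_is_big)
{
    return vcpu_is_big ? big_pcpus : (all_pcpus & ~big_pcpus);
}
```

For example, on a 6-pcpu system where pcpus 0-3 are big (mask 0x0F), a LITTLE vcpu gets pinned to pcpus 4-5 (mask 0x30). The point of the sketch is that the scheduler itself needs no change at this stage; only the masks handed to it do.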
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On 21/09/16 09:38, Peng Fan wrote: > On Tue, Sep 20, 2016 at 01:17:04PM -0700, Stefano Stabellini wrote: >> On Tue, 20 Sep 2016, Julien Grall wrote: >>> Hi Stefano, >>> >>> On 20/09/2016 20:09, Stefano Stabellini wrote: On Tue, 20 Sep 2016, Julien Grall wrote: > Hi, > > On 20/09/2016 12:27, George Dunlap wrote: >> On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan>> wrote: >>> On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote: On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote: > On Tue, 20 Sep 2016, Dario Faggioli wrote: >>> I'd like to add a computing capability in xen/arm, like this: >>> >>> struct compute_capatiliby >>> { >>>char *core_name; >>>uint32_t rank; >>>uint32_t cpu_partnum; >>> }; >>> >>> struct compute_capatiliby cc= >>> { >>> {"A72", 4, 0xd08}, >>> {"A57", 3, 0}, >>> {"A53", 2, 0xd03}, >>> {"A35", 1, ...}, >>> } >>> >>> Then when identify cpu, we decide which cpu is big and which cpu is >>> little >>> according to the computing rank. >>> >>> Any comments? >> >> I think we definitely need to have Xen have some kind of idea the >> order between processors, so that the user doesn't need to figure out >> which class / pool is big and which pool is LITTLE. Whether this sort >> of enumeration is the best way to do that I'll let Julien and Stefano >> give their opinion. > > I don't think an hardcoded list of processor in Xen is the right solution. > There are many existing processors and combinations for big.LITTLE so it > will > nearly be impossible to keep updated. > > I would expect the firmware table (device tree, ACPI) to provide relevant > data > for each processor and differentiate big from LITTLE core. > Note that I haven't looked at it for now. A good place to start is looking > at > how Linux does. That's right, see Documentation/devicetree/bindings/arm/cpus.txt. 
It is trivial to identify the two different CPU classes and which cores belong to which class. >>> >>> The class of the CPU can be found from the MIDR, there is no need to use the >>> device tree/acpi for that. Note that I don't think there is an easy way in >>> ACPI (i.e not in AML) to find out the class. >>> It is harder to figure out which one is supposed to be big and which one LITTLE. Regardless, we could default to using the first cluster (usually big), which is also the cluster of the boot cpu, and utilize the second cluster only when the user demands it. >>> >>> Why do you think the boot CPU will usually be a big one? In the case of Juno >>> platform it is configurable, and the boot CPU is a little core on r2 by >>> default. >>> >>> In any case, what we care about is differentiate between two set of CPUs. I >>> don't think Xen should care about migrating a guest vCPU between big and >>> LITTLE cpus. So I am not sure why we would want to know that. >> >> No, it is not about migrating (at least yet). It is about giving useful >> information to the user. It would be nice if the user had to choose >> between "big" and "LITTLE" rather than "class 0x1" and "class 0x100", or >> even "A7" or "A15". > > As Dario mentioned in previous email, > for dom0 provide like this: > > dom0_vcpus_big = 4 > dom0_vcpus_little = 2 > > to dom0. > > If these two no provided, we could let dom0 runs on big pcpus or big.little. > Anyway this is not the important point for dom0 only big or big.little. > > For domU, provide "vcpus.big" and "vcpus.little" in xl configuration file. > Such as: > > vcpus.big = 2 > vcpus.little = 4 FWIW, from a UI perspective, it would be nice if we designed the interface such that it *can* be used simply (i.e., just "big" or "little"), but can also be used more flexibly; for instance, specifying "A15" or "A7" instead. 
So maybe have a 'classifier' string; this could start by having just "big" and "little", but could then be extended to allow fuller ways of specifying specific kinds of cores. To keep the illusion of python syntax, what about something like this: vcpuclass=["big=2","little=4"] Or would it be better to have a mapping of vcpu to class? vcpuclass=["0-1:big","2-5:little"] > According to George's comments, > Then, I think we could use affinity to restrict little vcpus be scheduled on > little vcpus, > and restrict big vcpus on big vcpus. Seems no need to consider soft affinity, > use hard > affinity is to handle this. > > We may need to provide some interface to let xl can get the information such > as > big.little or smp. if it is big.little, which is big and which is little. If it's possible for Xen to order the cpus by class, then there could be a hypercall listing the different classes starting with the largest class. On typical big.LITTLE systems, class 0 would be "big" and class 1 would be "LITTLE".
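George's "0-1:big" mapping could be parsed along these lines. A minimal sketch of the proposed (not implemented) syntax, using plain C string handling rather than xl/libxl's actual parsing helpers:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical parser for one entry of the suggested
 * vcpuclass=["0-1:big","2-5:little"] syntax.  Returns 0 on success,
 * -1 on malformed input or an inverted range. */
static int parse_vcpuclass(const char *entry, unsigned int *first,
                           unsigned int *last, char *cls, size_t len)
{
    char buf[32];

    if (sscanf(entry, "%u-%u:%31s", first, last, buf) != 3 ||
        *first > *last)
        return -1;
    snprintf(cls, len, "%s", buf);
    return 0;
}
```

The class string would then be resolved against whatever label namespace is agreed on ("big", "little", or a specific core name), which is exactly the open question in this thread.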
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Tue, Sep 20, 2016 at 01:17:04PM -0700, Stefano Stabellini wrote: >On Tue, 20 Sep 2016, Julien Grall wrote: >> Hi Stefano, >> >> On 20/09/2016 20:09, Stefano Stabellini wrote: >> > On Tue, 20 Sep 2016, Julien Grall wrote: >> > > Hi, >> > > >> > > On 20/09/2016 12:27, George Dunlap wrote: >> > > > On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan>> > > > wrote: >> > > > > On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote: >> > > > > > On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote: >> > > > > > > On Tue, 20 Sep 2016, Dario Faggioli wrote: >> > > > > I'd like to add a computing capability in xen/arm, like this: >> > > > > >> > > > > struct compute_capatiliby >> > > > > { >> > > > >char *core_name; >> > > > >uint32_t rank; >> > > > >uint32_t cpu_partnum; >> > > > > }; >> > > > > >> > > > > struct compute_capatiliby cc= >> > > > > { >> > > > > {"A72", 4, 0xd08}, >> > > > > {"A57", 3, 0}, >> > > > > {"A53", 2, 0xd03}, >> > > > > {"A35", 1, ...}, >> > > > > } >> > > > > >> > > > > Then when identify cpu, we decide which cpu is big and which cpu is >> > > > > little >> > > > > according to the computing rank. >> > > > > >> > > > > Any comments? >> > > > >> > > > I think we definitely need to have Xen have some kind of idea the >> > > > order between processors, so that the user doesn't need to figure out >> > > > which class / pool is big and which pool is LITTLE. Whether this sort >> > > > of enumeration is the best way to do that I'll let Julien and Stefano >> > > > give their opinion. >> > > >> > > I don't think an hardcoded list of processor in Xen is the right >> > > solution. >> > > There are many existing processors and combinations for big.LITTLE so it >> > > will >> > > nearly be impossible to keep updated. >> > > >> > > I would expect the firmware table (device tree, ACPI) to provide relevant >> > > data >> > > for each processor and differentiate big from LITTLE core. >> > > Note that I haven't looked at it for now. 
A good place to start is >> > > looking >> > > at >> > > how Linux does. >> > >> > That's right, see Documentation/devicetree/bindings/arm/cpus.txt. It is >> > trivial to identify the two different CPU classes and which cores belong >> > to which class. >> >> The class of the CPU can be found from the MIDR, there is no need to use the >> device tree/acpi for that. Note that I don't think there is an easy way in >> ACPI (i.e not in AML) to find out the class. >> >> > It is harder to figure out which one is supposed to be >> > big and which one LITTLE. Regardless, we could default to using the >> > first cluster (usually big), which is also the cluster of the boot cpu, >> > and utilize the second cluster only when the user demands it. >> >> Why do you think the boot CPU will usually be a big one? In the case of Juno >> platform it is configurable, and the boot CPU is a little core on r2 by >> default. >> >> In any case, what we care about is differentiate between two set of CPUs. I >> don't think Xen should care about migrating a guest vCPU between big and >> LITTLE cpus. So I am not sure why we would want to know that. > >No, it is not about migrating (at least yet). It is about giving useful >information to the user. It would be nice if the user had to choose >between "big" and "LITTLE" rather than "class 0x1" and "class 0x100", or >even "A7" or "A15". As Dario mentioned in a previous email, for dom0 provide something like this: dom0_vcpus_big = 4 dom0_vcpus_little = 2 to dom0. If these two are not provided, we could let dom0 run on big pcpus or big.little. Anyway this is not the important point, whether dom0 is big-only or big.little. For domU, provide "vcpus.big" and "vcpus.little" in the xl configuration file. Such as: vcpus.big = 2 vcpus.little = 4 According to George's comments, I think we could then use affinity to restrict little vcpus to be scheduled on little pcpus, and restrict big vcpus to big pcpus. There seems to be no need to consider soft affinity; hard affinity is enough to handle this. 
We may need to provide some interface to let xl get information such as big.little or smp; if it is big.little, which cpus are big and which are little. For how to differentiate cpus, I am looking at the linaro eas cpu topology code. The code has not been upstreamed (:, but merged into the google android kernel. I only plan to take some necessary code, such as the device tree parsing and cpu topology build, because we only need to know the computing capacity of each pcpu. Some doc about the eas piece, including dts node examples: https://git.linaro.org/arm/eas/kernel.git/blob/refs/heads/lsk-v4.4-eas-v5.2:/Documentation/devicetree/bindings/scheduler/sched-energy-costs.txt I pasted partial eas code:

for (i = 0, val = prop->value; i < nstates; i++) {
        cap_states[i].cap = be32_to_cpup(val++);
        cap_states[i].power = be32_to_cpup(val++);
}
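For illustration, once per-cpu capacities are available (however they end up being parsed), a two-way big/LITTLE split can be derived from them. A sketch with made-up capacity values, assuming exactly two classes where the highest capacity marks the big cores:

```c
/* Illustrative only: classify cpus as big (1) or LITTLE (0) by
 * comparing each cpu's capacity against the maximum seen.  Real
 * platforms with more than two classes would need a finer split. */
static void classify_by_capacity(const unsigned int *cap, int n,
                                 int *is_big)
{
    unsigned int max = 0;
    int i;

    for (i = 0; i < n; i++)
        if (cap[i] > max)
            max = cap[i];
    for (i = 0; i < n; i++)
        is_big[i] = (cap[i] == max);
}
```

This is the same information the quoted cap_states parsing yields; only the classification step is added here as an assumption.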
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Tue, 20 Sep 2016, Julien Grall wrote: > Hi Stefano, > > On 20/09/2016 20:09, Stefano Stabellini wrote: > > On Tue, 20 Sep 2016, Julien Grall wrote: > > > Hi, > > > > > > On 20/09/2016 12:27, George Dunlap wrote: > > > > On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan> > > > wrote: > > > > > On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote: > > > > > > On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote: > > > > > > > On Tue, 20 Sep 2016, Dario Faggioli wrote: > > > > > I'd like to add a computing capability in xen/arm, like this: > > > > > > > > > > struct compute_capatiliby > > > > > { > > > > >char *core_name; > > > > >uint32_t rank; > > > > >uint32_t cpu_partnum; > > > > > }; > > > > > > > > > > struct compute_capatiliby cc= > > > > > { > > > > > {"A72", 4, 0xd08}, > > > > > {"A57", 3, 0}, > > > > > {"A53", 2, 0xd03}, > > > > > {"A35", 1, ...}, > > > > > } > > > > > > > > > > Then when identify cpu, we decide which cpu is big and which cpu is > > > > > little > > > > > according to the computing rank. > > > > > > > > > > Any comments? > > > > > > > > I think we definitely need to have Xen have some kind of idea the > > > > order between processors, so that the user doesn't need to figure out > > > > which class / pool is big and which pool is LITTLE. Whether this sort > > > > of enumeration is the best way to do that I'll let Julien and Stefano > > > > give their opinion. > > > > > > I don't think an hardcoded list of processor in Xen is the right solution. > > > There are many existing processors and combinations for big.LITTLE so it > > > will > > > nearly be impossible to keep updated. > > > > > > I would expect the firmware table (device tree, ACPI) to provide relevant > > > data > > > for each processor and differentiate big from LITTLE core. > > > Note that I haven't looked at it for now. A good place to start is looking > > > at > > > how Linux does. > > > > That's right, see Documentation/devicetree/bindings/arm/cpus.txt. 
It is > > trivial to identify the two different CPU classes and which cores belong > > to which class. > > The class of the CPU can be found from the MIDR, there is no need to use the > device tree/acpi for that. Note that I don't think there is an easy way in > ACPI (i.e not in AML) to find out the class. > > > It is harder to figure out which one is supposed to be > > big and which one LITTLE. Regardless, we could default to using the > > first cluster (usually big), which is also the cluster of the boot cpu, > > and utilize the second cluster only when the user demands it. > > Why do you think the boot CPU will usually be a big one? In the case of Juno > platform it is configurable, and the boot CPU is a little core on r2 by > default. > > In any case, what we care about is differentiate between two set of CPUs. I > don't think Xen should care about migrating a guest vCPU between big and > LITTLE cpus. So I am not sure why we would want to know that. No, it is not about migrating (at least yet). It is about giving useful information to the user. It would be nice if the user had to choose between "big" and "LITTLE" rather than "class 0x1" and "class 0x100", or even "A7" or "A15".
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
Hi Stefano, On 20/09/2016 20:09, Stefano Stabellini wrote: On Tue, 20 Sep 2016, Julien Grall wrote: Hi, On 20/09/2016 12:27, George Dunlap wrote: On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan wrote: On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote: On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote: On Tue, 20 Sep 2016, Dario Faggioli wrote: I'd like to add a computing capability in xen/arm, like this: struct compute_capatiliby { char *core_name; uint32_t rank; uint32_t cpu_partnum; }; struct compute_capatiliby cc= { {"A72", 4, 0xd08}, {"A57", 3, 0}, {"A53", 2, 0xd03}, {"A35", 1, ...}, } Then when identify cpu, we decide which cpu is big and which cpu is little according to the computing rank. Any comments? I think we definitely need to have Xen have some kind of idea the order between processors, so that the user doesn't need to figure out which class / pool is big and which pool is LITTLE. Whether this sort of enumeration is the best way to do that I'll let Julien and Stefano give their opinion. I don't think an hardcoded list of processor in Xen is the right solution. There are many existing processors and combinations for big.LITTLE so it will nearly be impossible to keep updated. I would expect the firmware table (device tree, ACPI) to provide relevant data for each processor and differentiate big from LITTLE core. Note that I haven't looked at it for now. A good place to start is looking at how Linux does. That's right, see Documentation/devicetree/bindings/arm/cpus.txt. It is trivial to identify the two different CPU classes and which cores belong to which class. The class of the CPU can be found from the MIDR, there is no need to use the device tree/acpi for that. Note that I don't think there is an easy way in ACPI (i.e not in AML) to find out the class. It is harder to figure out which one is supposed to be big and which one LITTLE. 
Regardless, we could default to using the first cluster (usually big), which is also the cluster of the boot cpu, and utilize the second cluster only when the user demands it. Why do you think the boot CPU will usually be a big one? In the case of Juno platform it is configurable, and the boot CPU is a little core on r2 by default. In any case, what we care about is differentiate between two set of CPUs. I don't think Xen should care about migrating a guest vCPU between big and LITTLE cpus. So I am not sure why we would want to know that. The only thing we need is an identifier for each set (it might be the MIDR or the compatible in the device tree). Note that, as Peng mentioned, Linaro is working on an energy-aware scheduler. So there is a way (maybe not yet upstreamed) to find the CPU topology. Regards, -- Julien Grall
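Julien's MIDR point can be illustrated as follows. The part-number encoding (bits [15:4] of MIDR) and the listed values are the published Arm ones for a few cores, but the function itself is only a sketch, not Xen's cpu identification code:

```c
#include <stdint.h>

/* Sketch: map the MIDR primary part number to a core name.  Bits
 * [15:4] of MIDR hold the part number; e.g. Cortex-A53 is 0xD03 and
 * Cortex-A72 is 0xD08.  Only a handful of cores are listed here. */
#define MIDR_PARTNUM(midr)  (((midr) >> 4) & 0xFFF)

static const char *core_name(uint32_t midr)
{
    switch (MIDR_PARTNUM(midr)) {
    case 0xC07: return "Cortex-A7";
    case 0xC0F: return "Cortex-A15";
    case 0xD03: return "Cortex-A53";
    case 0xD04: return "Cortex-A35";
    case 0xD07: return "Cortex-A57";
    case 0xD08: return "Cortex-A72";
    default:    return "unknown";
    }
}
```

This answers "which class is each cpu" but, as noted in the thread, not "which class is big": that ordering still has to come from somewhere else, e.g. a capacity property or a table.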
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Tue, 20 Sep 2016, Julien Grall wrote: > Hi, > > On 20/09/2016 12:27, George Dunlap wrote: > > On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan wrote: > > > On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote: > > > > On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote: > > > > > On Tue, 20 Sep 2016, Dario Faggioli wrote: > > > I'd like to add a computing capability in xen/arm, like this: > > > > > > struct compute_capability > > > { > > > char *core_name; > > > uint32_t rank; > > > uint32_t cpu_partnum; > > > }; > > > > > > struct compute_capability cc= > > > { > > > {"A72", 4, 0xd08}, > > > {"A57", 3, 0}, > > > {"A53", 2, 0xd03}, > > > {"A35", 1, ...}, > > > } > > > > > > Then when identifying a cpu, we decide which cpu is big and which cpu is little > > > according to the computing rank. > > > > > > Any comments? > > > > I think we definitely need to have Xen have some kind of idea of the > > order between processors, so that the user doesn't need to figure out > > which class / pool is big and which pool is LITTLE. Whether this sort > > of enumeration is the best way to do that I'll let Julien and Stefano > > give their opinion. > > I don't think a hardcoded list of processors in Xen is the right solution. > There are many existing processors and combinations for big.LITTLE so it will > be nearly impossible to keep updated. > > I would expect the firmware table (device tree, ACPI) to provide relevant data > for each processor and differentiate big from LITTLE cores. > Note that I haven't looked at it for now. A good place to start is looking at > how Linux does it. That's right, see Documentation/devicetree/bindings/arm/cpus.txt. It is trivial to identify the two different CPU classes and which cores belong to which class. It is harder to figure out which one is supposed to be big and which one LITTLE. 
Regardless, we could default to using the first cluster (usually big), which is also the cluster of the boot cpu, and utilize the second cluster only when the user demands it.
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Tue, 2016-09-20 at 17:34 +0200, Julien Grall wrote: > On 20/09/2016 12:27, George Dunlap wrote: > > I think we definitely need to have Xen have some kind of idea of the > > order between processors, so that the user doesn't need to figure > > out > > which class / pool is big and which pool is LITTLE. Whether this > > sort > > of enumeration is the best way to do that I'll let Julien and > > Stefano > > give their opinion. > > I don't think a hardcoded list of processors in Xen is the right > solution. There are many existing processors and combinations for > big.LITTLE so it will be nearly impossible to keep updated. > As far as either the scheduler or cpupools go, what's necessary would be: - in Xen, a function (or an array acting as a map, or whatever) to call to know whether pcpu X is big or LITTLE; - at the toolstack level, a hypercall (or a field, bit, whatever in a struct already returned by an existing hypercall) to know the same thing, i.e., whether a pcpu is big or LITTLE. Once we have this, we can do everything. We will probably want to abstract things a bit, and make them as generic as practical, so that the same interface can be used not only for ARM big.LITTLE, but for whatever future heterogeneous cpu arch we'll support... but really, the actual information that we need is "just" that. I've absolutely no idea how such info could be obtained, and I have no ARM big.LITTLE hardware to test on. > I would expect the firmware table (device tree, ACPI) to provide > relevant data for each processor and differentiate big from LITTLE > cores. > Note that I haven't looked at it for now. A good place to start is > looking at how Linux does it. > Makes sense. Regards, Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
Hi, On 20/09/2016 12:27, George Dunlap wrote: On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan wrote: On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote: On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote: On Tue, 20 Sep 2016, Dario Faggioli wrote: I'd like to add a computing capability in xen/arm, like this: struct compute_capability { char *core_name; uint32_t rank; uint32_t cpu_partnum; }; struct compute_capability cc= { {"A72", 4, 0xd08}, {"A57", 3, 0}, {"A53", 2, 0xd03}, {"A35", 1, ...}, } Then when identifying a cpu, we decide which cpu is big and which cpu is little according to the computing rank. Any comments? I think we definitely need to have Xen have some kind of idea of the order between processors, so that the user doesn't need to figure out which class / pool is big and which pool is LITTLE. Whether this sort of enumeration is the best way to do that I'll let Julien and Stefano give their opinion. I don't think a hardcoded list of processors in Xen is the right solution. There are many existing processors and combinations for big.LITTLE so it will be nearly impossible to keep updated. I would expect the firmware table (device tree, ACPI) to provide relevant data for each processor and differentiate big from LITTLE cores. Note that I haven't looked at it for now. A good place to start is looking at how Linux does it. Regards, -- Julien Grall
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Tue, Sep 20, 2016 at 11:03 AM, Peng Fan wrote: > Hi Dario, > On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote: >>On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote: >>> On Tue, 20 Sep 2016, Dario Faggioli wrote: >>> > And this would work even if/when there is only one cpupool, or in >>> > general for domains that are in a pool that has both big and LITTLE >>> > pcpus. Furthermore, big.LITTLE support and cpupools will be >>> > orthogonal, >>> > just like pinning and cpupools are orthogonal right now. I.e., once >>> > we >>> > will have what I described above, nothing prevents us from >>> > implementing >>> > per-vcpu cpupool membership, and either create the two (or more!) >>> > big >>> > and LITTLE pools, or from mixing things even more, for more complex >>> > and >>> > specific use cases. :-) >>> >>> I think that everybody agrees that this is the best long term >>> solution. >>> >>Well, no, that wasn't obvious to me. If that's the case, it's already >>something! :-) >> >>> > >>> > Actually, with the cpupool solution, if you want a guest (or dom0) >>> > to >>> > actually have both big and LITTLE vcpus, you necessarily have to >>> > implement per-vcpu (rather than per-domain, as it is now) cpupool >>> > membership. I said myself it's not impossible, but certainly it's >>> > some >>> > work... with the scheduler solution you basically get that for >>> > free! >>> > >>> > So, basically, if we use cpupools for the basics of big.LITTLE >>> > support, >>> > there's no way out of it (apart from going implementing scheduling >>> > support afterwords, but that looks backwards to me, especially when >>> > thinking at it with the code in mind). >>> >>> The question is: what is the best short-term solution we can ask Peng >>> to >>> implement that allows Xen to run on big.LITTLE systems today? >>> Possibly >>> getting us closer to the long term solution, or at least not farther >>> from it? 
>>> >>So, I still have to look closely at the patches in these series. But, >>with Credit2 in mind, if one: >> >>- take advantage of the knowledge of what arch a pcpu belongs inside >> the code that arranges the pcpus in runqueues, which means we'll end >> up with big runqueues and LITTLE runqueues. I re-wrote that code, I >> can provide pointers and help, if necessary; >>- tweak the one or two instances of for_each_runqueue() [*] that there >> are in the code into a for_each_runqueue_of_same_class(), i.e.: > > Do you have a plan to add this support for big.LITTLE? > > I admit that this is the first time I look into the scheduler part. > If I understand wrongly, please correct me. > > There is a runqueue for each physical cpu, and there are several vcpus in the > runqueue. > The scheduler will pick a vcpu in the runqueue to run on the physical cpu. > > A vcpu is bound to a physical cpu when alloc_vcpu, but the vcpu can be > scheduled > or migrated to a different physical cpu. > > Setting cpu soft affinity and hard affinity restricts vcpus to be scheduled > on specific cpus. Then is there a need to introduce more runqueues? Runqueues are a scheduler-specific thing. The simplest thing to do would be, in the toolstack, to limit the hard affinity of a vcpu to its cpu class (either big or LITTLE). Then the scheduler will simply do the right thing. > This seems more complicated than cpupool (: It's more complicated than simply making 2 cpupools and having any given domain be entirely big or entirely LITTLE. But it's a lot *less* complicated than trying to make a single domain cross two different kinds of cpupools. :-) >>As mentioned in previous mail, and as drafted when replying to Peng, >>the only thing that the user should know is how many big and how many >>LITTLE vcpus she wants (and, potentially, which one would be each). :-) > > Yeah. Comes a new question to me. > > For big.LITTLE, how to decide whether the physical cpu is a big CPU or a little cpu? 
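George's suggestion above, clamping each vcpu's hard affinity to the pcpus of its class in the toolstack, could be sketched as below. This is illustrative only: cpu sets are plain 64-bit masks here rather than the real libxl bitmaps, and the function name is made up.

```c
#include <stdint.h>

/* Toolstack-level sketch: a big vcpu may only run on big pcpus, a
 * LITTLE vcpu only on the remaining ones. The returned mask would be
 * used as the vcpu's hard affinity before the scheduler itself knows
 * anything about cpu classes. */
typedef uint64_t cpumask_t;

static cpumask_t class_affinity(cpumask_t all_cpus, cpumask_t big_cpus,
                                int vcpu_is_big)
{
    return vcpu_is_big ? big_cpus : (all_cpus & ~big_cpus);
}
```

With eight pcpus of which the low four are big (all = 0xff, big = 0x0f), a big vcpu gets affinity 0x0f and a LITTLE one 0xf0, so the scheduler "simply does the right thing" within each class.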
> > I'd like to add a computing capability in xen/arm, like this: > > struct compute_capability > { > char *core_name; > uint32_t rank; > uint32_t cpu_partnum; > }; > > struct compute_capability cc= > { > {"A72", 4, 0xd08}, > {"A57", 3, 0}, > {"A53", 2, 0xd03}, > {"A35", 1, ...}, > } > > Then when identifying a cpu, we decide which cpu is big and which cpu is little > according to the computing rank. > > Any comments? I think we definitely need to have Xen have some kind of idea of the order between processors, so that the user doesn't need to figure out which class / pool is big and which pool is LITTLE. Whether this sort of enumeration is the best way to do that I'll let Julien and Stefano give their opinion. -George
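Peng's ranking-table idea above could be fleshed out as follows. Note this is a sketch: the A57 and A35 part numbers (0xd07 and 0xd04) are filled in for illustration where the original mail left them blank, and the lookup helper is invented here. It also illustrates Julien's objection: every new core would require a table update.

```c
#include <stddef.h>
#include <stdint.h>

/* Ranking table: a higher rank means a "bigger" core. */
struct compute_capability {
    const char *core_name;
    uint32_t rank;
    uint32_t cpu_partnum;
};

static const struct compute_capability cc[] = {
    { "A72", 4, 0xd08 },
    { "A57", 3, 0xd07 },
    { "A53", 2, 0xd03 },
    { "A35", 1, 0xd04 },
};

/* Return the rank for a MIDR part number, or -1 if the part is not
 * in the table (Julien's maintenance concern in a nutshell). */
static int rank_of(uint32_t partnum)
{
    for (size_t i = 0; i < sizeof(cc) / sizeof(cc[0]); i++)
        if (cc[i].cpu_partnum == partnum)
            return (int)cc[i].rank;
    return -1;
}
```

On a hypothetical A72+A53 system, rank_of(0xd08) exceeds rank_of(0xd03), so the A72 cluster would be classified as big.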
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Tue, Sep 20, 2016 at 1:01 AM, Stefano Stabellini wrote: >> Actually, with the cpupool solution, if you want a guest (or dom0) to >> actually have both big and LITTLE vcpus, you necessarily have to >> implement per-vcpu (rather than per-domain, as it is now) cpupool >> membership. I said myself it's not impossible, but certainly it's some >> work... with the scheduler solution you basically get that for free! >> >> So, basically, if we use cpupools for the basics of big.LITTLE support, >> there's no way out of it (apart from going implementing scheduling >> support afterwords, but that looks backwards to me, especially when >> thinking at it with the code in mind). > > The question is: what is the best short-term solution we can ask Peng to > implement that allows Xen to run on big.LITTLE systems today? Possibly > getting us closer to the long term solution, or at least not farther > from it? So remember that there's the *interface* we're providing to the user (specifying vcpus as little or BIG) and the guest OS (how does the guest OS know whether a specific vcpu is little or BIG), and there's the *implementation* of that. For comparison, xl provides a "guest numa" interface; but the Xen schedulers actually don't know anything about NUMA -- they only have the concept of "soft affinity". xl uses soft affinity to implement the NUMA characteristics we want from the scheduler. It seems to me that the best combination of functionality / simplicity would be to provide a way to specify, in xl, which vcpus should be big and LITTLE (something similar to what Dario mentions below); and then to implement that initially only with cpu pinning inside of xl. Then at some point we can extend that to tagging each vcpu with a "pcpu class", and teaching schedulers about those classes, and making sure each vcpu runs only within its own class. 
This effectively amounts to another layer of hard pinning, but one which is a bit more robust (i.e., won't be confused if the user tries to set the hard affinity of a vcpu). That said, if the goal is to get *something* up as quick as humanly possible, implementing "xl cpupool-bigLITTLE-split" (or a 'cpupool_setup=biglittle' Xen command-line option) would probably do the job; but only by limiting domains to having only big or LITTLE vcpus. > Sure, but it needs to be very clear. We cannot ask people to spot > architecture specific flags among the output of `xl info' to be able to > appropriately start a guest. Even what I suggested isn't great as `xl > cpupool-list' isn't a common command to run. Well fundamentally, given that a vcpu has to start and stay on the same class of processors, the user will have to make some action to deviate from the default. I don't fundamentally see any difference between saying "In order to use the LITTLE cpus you have to specify the LITTLE cpupool" and "In order to use the LITTLE cpus you have to specify some vcpus as LITTLE vcpus". -George
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
Hi Dario, On Tue, Sep 20, 2016 at 02:54:06AM +0200, Dario Faggioli wrote: >On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote: >> On Tue, 20 Sep 2016, Dario Faggioli wrote: >> > And this would work even if/when there is only one cpupool, or in >> > general for domains that are in a pool that has both big and LITTLE >> > pcpus. Furthermore, big.LITTLE support and cpupools will be >> > orthogonal, >> > just like pinning and cpupools are orthogonal right now. I.e., once >> > we >> > will have what I described above, nothing prevents us from >> > implementing >> > per-vcpu cpupool membership, and either create the two (or more!) >> > big >> > and LITTLE pools, or from mixing things even more, for more complex >> > and >> > specific use cases. :-) >> >> I think that everybody agrees that this is the best long term >> solution. >> >Well, no, that wasn't obvious to me. If that's the case, it's already >something! :-) > >> > >> > Actually, with the cpupool solution, if you want a guest (or dom0) >> > to >> > actually have both big and LITTLE vcpus, you necessarily have to >> > implement per-vcpu (rather than per-domain, as it is now) cpupool >> > membership. I said myself it's not impossible, but certainly it's >> > some >> > work... with the scheduler solution you basically get that for >> > free! >> > >> > So, basically, if we use cpupools for the basics of big.LITTLE >> > support, >> > there's no way out of it (apart from going implementing scheduling >> > support afterwords, but that looks backwards to me, especially when >> > thinking at it with the code in mind). >> >> The question is: what is the best short-term solution we can ask Peng >> to >> implement that allows Xen to run on big.LITTLE systems today? >> Possibly >> getting us closer to the long term solution, or at least not farther >> from it? >> >So, I still have to look closely at the patches in these series. 
But, >with Credit2 in mind, if one: > >- take advantage of the knowledge of what arch a pcpu belongs inside > the code that arranges the pcpus in runqueues, which means we'll end > up with big runqueues and LITTLE runqueues. I re-wrote that code, I > can provide pointers and help, if necessary; >- tweak the one or two instances of for_each_runqueue() [*] that there > are in the code into a for_each_runqueue_of_same_class(), i.e.: Do you have a plan to add this support for big.LITTLE? I admit that this is the first time I look into the scheduler part. If I understand wrongly, please correct me. There is a runqueue for each physical cpu, and there are several vcpus in the runqueue. The scheduler will pick a vcpu in the runqueue to run on the physical cpu. A vcpu is bound to a physical cpu when alloc_vcpu, but the vcpu can be scheduled or migrated to a different physical cpu. Setting cpu soft affinity and hard affinity restricts vcpus to be scheduled on specific cpus. Then is there a need to introduce more runqueues? This seems more complicated than cpupool (: > >if (is_big(this_cpu)) >{ > for_each_big_runqueue() > { > .. > } >} >else >{ > for_each_LITTLE_runqueue() > { > .. > } >} > >then big.LITTLE support in Credit2 would be done already, and all that >would be left is support for the syntax of new config switches in xl, >and a way of telling, from xl/libxl down to Xen, what arch a vcpu >belongs to, so that it can be associated with one runqueue of the >proper class. > >Thinking to Credit1, we need to make sure that, in load_balance() and >runq_steal(), a LITTLE cpu *only* ever tries to steal work from another >LITTLE cpu, and __never__ from a big cpu (and vice versa). 
And also >that when a vcpu wakes up, and what it has in its v->processor is a >LITTLE pcpu, that only LITTLE processors are considered for being >tickled (I'm less certain of this last part, but it should be more or >less like this). > >Then, of course the same glue and vcpu classification code. > >However, in Credit1, it's possible that a trick like that would affect >the accounting and credit algorithm, and hence provide unfair, or in >general, unexpected results. Credit2 should, OTOH, be a lot more >resilient, wrt that. >> > > The whole process would be more explicit and obvious if we used >> > > cpupools. It would be easier for users to know what is going on -- >> > > they just need to issue an `xl cpupool-list' command and they >> > > would >> > > see >> > > two clearly named pools (something like big-pool and LITTLE- >> > > pool). >> > >> > Well, I guess that, as part of big.LITTLE support, there will be a >> > way >> > to tell what pcpus are big and which are LITTLE anyway, probably >> > both >> > from `xl info' and from `xl cpupool-list -c' (and most likely in >> > other >> > ways too). >> >> Sure, but it needs to be very clear. We cannot ask people to spot >> architecture specific flags among the output of `xl info' to be able >> to >> appropriately start
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Tue, Sep 20, 2016 at 02:11:04AM +0200, Dario Faggioli wrote: >On Mon, 2016-09-19 at 21:33 +0800, Peng Fan wrote: >> On Mon, Sep 19, 2016 at 11:33:58AM +0100, George Dunlap wrote: >> > >> > No, I think it would be a lot simpler to just teach the scheduler >> > about >> > different classes of cpus. credit1 would probably need to be >> > modified >> > so that its credit algorithm would be per-class rather than pool- >> > wide; >> > but credit2 shouldn't need much modification at all, other than to >> > make >> > sure that a given runqueue doesn't include more than one class; and >> > to >> > do load-balancing only with runqueues of the same class. >> >> I try to follow. >> - scheduler needs to be aware of different classes of cpus. ARM >> big.LITTLE cpus. >> >Yes, I think this is essential. > >> - scheduler schedules vcpus on different physical cpus in one >> cpupool. >> >Yep, that's what the scheduler does. And personally, I'd start >implementing big.LITTLE support for a situation where both big and >LITTLE cpus coexist in the same pool. It's great if you have a plan to work on the scheduler part. > >> - different cpu classes need to be in different runqueues. >> >Yes. So, basically, imagine to use vcpu pinning to support big.LITTLE. >I've spoken briefly about this in my reply to Juergen. You probably can >even get something like this up-&-running by writing very few or zero >code (you'll need --for now-- max_dom0_vcpus, dom0_vcpus_pin, and then, >in domain config files, "cpus='...'"). > >Then, the real goal, would be to achieve the same behavior >automatically, by acting on runqueues' arrangement and load balancing >logic in the scheduler(s). > >Anyway, sorry for my ignorance on big.LITTLE, but there's something I'm >missing: _when_ is it that it is (or needs to be) decided whether a >vcpu will run on a big or LITTLE core? Big cores are more powerful than little cores, but consume more power. 
In the Linux kernel, Linaro is working on the EAS scheduler to take advantage of big.LITTLE. http://www.linaro.org/blog/core-dump/energy-aware-scheduling-eas-project/ As discussed, for a big.LITTLE guest OS that has big vcpus and little vcpus, we only need to take care that big vcpus are scheduled on big physical cpus, and little vcpus are scheduled on little physical cpus. So a vcpu is not scheduled between big and little physical cpus. > >Thinking of a bare metal system, I think that cpu X is, for instance, big, and >will always be like that; similarly, cpu Y is LITTLE. > >This makes me think that, for a virtual machine, it is ok to choose/specify at >_domain_creation_ time, which vcpus are big and which vcpus are LITTLE, is >this correct? >If yes, this also means that --whatever way we find to make this happen, >cpupools, scheduler, etc-- the vcpus that we decided are big, must only >be scheduled on actual big pcpus, and the vcpus that we decided are LITTLE, >must only be scheduled on actual LITTLE pcpus, correct again? > >> Then for implementation. >> - When creating a guest, specify the physical cpus that the guest will >> run on. >> >I'd actually do that the other way round. I'd ask the user to specify >how many --and, if that's important-- vcpus are big and how many/which >are LITTLE. > >Knowing that, we also know whether the domain is a big only, LITTLE >only or big.LITTLE one. And we also know on which set of pcpus each set >of vcpus should be restricted to. > >So, basically (but it's just an example) something like this, in the xl >config file of a guest: > >1) big.LITTLE guest, with 2 big and 2 LITTLE vcpus. User doesn't care 
which is which, so a default could be 0,1 big and 2,3 LITTLE: > >vcpus = 4 >vcpus.big = 2 > >2) big.LITTLE guest, with 8 vcpus, of which 0,2,4 and 6 are big: > >vcpus = 8 >vcpus.big = [0, 2, 4, 6] > >Which would be the same as > >vcpus = 8 >vcpus.little = [1, 3, 5, 7] > >3) guest with 4 vcpus, all big: > >vcpus = 4 >vcpus.big = "all" > >Which would be the same as: > >vcpus = 4 >vcpus.little = "none" > >And also the same as just: > >vcpus = 4 > > >Or something like this > >> - If the physical cpus are different cpus, indicate the guest would >> like to be a big.LITTLE guest. >> And have big vcpus and little vcpus. >> >Not liking this as _the_ way of specifying the guest topology, wrt >big.LITTLE-ness (see alternative proposal right above. :-)) > >However, right now we support pinning/affinity already. We certainly >need to decide what to do if, e.g., no vcpus.big or vcpus.little are >present, but the vcpus have hard or soft affinity to some specific >pcpus. > >So, right now, this, in the xl config file: > >cpus = [2, 8, 12, 13, 15, 17] > >means that we want to pin 1-to-1 vcpu 0 to pcpu 2, vcpu 1 to pcpu 8, >vcpu 2 to pcpu 12, vcpu 3 to pcpu 13, vcpu 4 to pcpu 15 and vcpu 5 to >pcpu 17. Now, if cores 2, 8 and 12 are big, and no vcpus.big or >vcpu.little is specified, I'd put forward the assumption that the user >wants vcpus 0, 1 and 2 to be big, and vcpus
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Mon, 2016-09-19 at 17:01 -0700, Stefano Stabellini wrote: > On Tue, 20 Sep 2016, Dario Faggioli wrote: > > And this would work even if/when there is only one cpupool, or in > > general for domains that are in a pool that has both big and LITTLE > > pcpus. Furthermore, big.LITTLE support and cpupools will be > > orthogonal, > > just like pinning and cpupools are orthogonal right now. I.e., once > > we > > will have what I described above, nothing prevents us from > > implementing > > per-vcpu cpupool membership, and either create the two (or more!) > > big > > and LITTLE pools, or from mixing things even more, for more complex > > and > > specific use cases. :-) > > I think that everybody agrees that this is the best long term > solution. > Well, no, that wasn't obvious to me. If that's the case, it's already something! :-) > > > > Actually, with the cpupool solution, if you want a guest (or dom0) > > to > > actually have both big and LITTLE vcpus, you necessarily have to > > implement per-vcpu (rather than per-domain, as it is now) cpupool > > membership. I said myself it's not impossible, but certainly it's > > some > > work... with the scheduler solution you basically get that for > > free! > > > > So, basically, if we use cpupools for the basics of big.LITTLE > > support, > > there's no way out of it (apart from going implementing scheduling > > support afterwords, but that looks backwards to me, especially when > > thinking at it with the code in mind). > > The question is: what is the best short-term solution we can ask Peng > to > implement that allows Xen to run on big.LITTLE systems today? > Possibly > getting us closer to the long term solution, or at least not farther > from it? > So, I still have to look closely at the patches in these series. 
But, with Credit2 in mind, if one: - take advantage of the knowledge of what arch a pcpu belongs inside the code that arranges the pcpus in runqueues, which means we'll end up with big runqueues and LITTLE runqueues. I re-wrote that code, I can provide pointers and help, if necessary; - tweak the one or two instances of for_each_runqueue() [*] that there are in the code into a for_each_runqueue_of_same_class(), i.e.: if (is_big(this_cpu)) { for_each_big_runqueue() { .. } } else { for_each_LITTLE_runqueue() { .. } } then big.LITTLE support in Credit2 would be done already, and all that would be left is support for the syntax of new config switches in xl, and a way of telling, from xl/libxl down to Xen, what arch a vcpu belongs to, so that it can be associated with one runqueue of the proper class. Thinking to Credit1, we need to make sure that, in load_balance() and runq_steal(), a LITTLE cpu *only* ever tries to steal work from another LITTLE cpu, and __never__ from a big cpu (and vice versa). And also that when a vcpu wakes up, and what it has in its v->processor is a LITTLE pcpu, that only LITTLE processors are considered for being tickled (I'm less certain of this last part, but it should be more or less like this). Then, of course the same glue and vcpu classification code. However, in Credit1, it's possible that a trick like that would affect the accounting and credit algorithm, and hence provide unfair, or in general, unexpected results. Credit2 should, OTOH, be a lot more resilient, wrt that. > > > The whole process would be more explicit and obvious if we used > > > cpupools. It would be easier for users to know what is going > > > on -- > > > they just need to issue an `xl cpupool-list' command and they > > > would > > > see > > > two clearly named pools (something like big-pool and LITTLE- > > > pool). 
> > > > > Well, I guess that, as part of big.LITTLE support, there will be a > > way > > to tell what pcpus are big and which are LITTLE anyway, probably > > both > > from `xl info' and from `xl cpupool-list -c' (and most likely in > > other > > ways too). > > Sure, but it needs to be very clear. We cannot ask people to spot > architecture specific flags among the output of `xl info' to be able > to > appropriately start a guest. > As mentioned in previous mail, and as drafted when replying to Peng, the only thing that the user should know is how many big and how many LITTLE vcpus she wants (and, potentially, which one would be each). :-) Regards, Dario -- <> (Raistlin Majere) - Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
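The class-aware runqueue walk Dario sketches above could look roughly like the following. All types and names here are illustrative, not the real Credit2 structures: each runqueue carries a class tag, and balancing only ever considers same-class runqueues, so a LITTLE cpu can never steal work meant for big cores.

```c
#include <stddef.h>

/* Sketch of the for_each_runqueue_of_same_class() idea: tag each
 * runqueue with a cpu class and filter on it during balancing. */
enum cpu_class { CPU_CLASS_BIG, CPU_CLASS_LITTLE };

struct runqueue {
    enum cpu_class cls;
    int load;          /* stand-in for the scheduler's load metric */
};

/* Pick the least-loaded runqueue of the given class, e.g. as a
 * steal/balance candidate; runqueues of the other class are skipped,
 * mirroring the load_balance()/runq_steal() constraint above. */
static const struct runqueue *least_loaded_of_class(
    const struct runqueue *rqs, size_t n, enum cpu_class cls)
{
    const struct runqueue *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (rqs[i].cls != cls)
            continue;              /* never cross into the other class */
        if (!best || rqs[i].load < best->load)
            best = &rqs[i];
    }
    return best;
}
```

A big cpu calling this with CPU_CLASS_BIG only ever sees big runqueues, which is exactly the property Credit1's load_balance()/runq_steal() would need to preserve.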
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Mon, 2016-09-19 at 21:33 +0800, Peng Fan wrote: > On Mon, Sep 19, 2016 at 11:33:58AM +0100, George Dunlap wrote: > > > > No, I think it would be a lot simpler to just teach the scheduler > > about > > different classes of cpus. credit1 would probably need to be > > modified > > so that its credit algorithm would be per-class rather than pool- > > wide; > > but credit2 shouldn't need much modification at all, other than to > > make > > sure that a given runqueue doesn't include more than one class; and > > to > > do load-balancing only with runqueues of the same class. > > I try to follow. > - scheduler needs to be aware of different classes of cpus. ARM > big.LITTLE cpus. > Yes, I think this is essential. > - scheduler schedules vcpus on different physical cpus in one > cpupool. > Yep, that's what the scheduler does. And personally, I'd start implementing big.LITTLE support for a situation where both big and LITTLE cpus coexist in the same pool. > - different cpu classes need to be in different runqueues. > Yes. So, basically, imagine to use vcpu pinning to support big.LITTLE. I've spoken briefly about this in my reply to Juergen. You probably can even get something like this up-&-running by writing very few or zero code (you'll need --for now-- max_dom0_vcpus, dom0_vcpus_pin, and then, in domain config files, "cpus='...'"). Then, the real goal, would be to achieve the same behavior automatically, by acting on runqueues' arrangement and load balancing logic in the scheduler(s). Anyway, sorry for my ignorance on big.LITTLE, but there's something I'm missing: _when_ is it that it is (or needs to be) decided whether a vcpu will run on a big or LITTLE core? Thinking of a bare metal system, I think that cpu X is, for instance, big, and will always be like that; similarly, cpu Y is LITTLE. This makes me think that, for a virtual machine, it is ok to choose/specify at _domain_creation_ time, which vcpus are big and which vcpus are LITTLE, is this correct? 
If yes, this also means that --whatever way we find to make this happen, cpupools, scheduler, etc-- the vcpus that we decided are big, must only be scheduled on actual big pcpus, and the vcpus that we decided are LITTLE, must only be scheduled on actual LITTLE pcpus, correct again? > Then for implementation. > - When creating a guest, specify the physical cpus that the guest will > run on. > I'd actually do that the other way round. I'd ask the user to specify how many --and, if that's important-- vcpus are big and how many/which are LITTLE. Knowing that, we also know whether the domain is a big only, LITTLE only or big.LITTLE one. And we also know on which set of pcpus each set of vcpus should be restricted to. So, basically (but it's just an example) something like this, in the xl config file of a guest: 1) big.LITTLE guest, with 2 big and 2 LITTLE vcpus. User doesn't care which is which, so a default could be 0,1 big and 2,3 LITTLE: vcpus = 4 vcpus.big = 2 2) big.LITTLE guest, with 8 vcpus, of which 0,2,4 and 6 are big: vcpus = 8 vcpus.big = [0, 2, 4, 6] Which would be the same as vcpus = 8 vcpus.little = [1, 3, 5, 7] 3) guest with 4 vcpus, all big: vcpus = 4 vcpus.big = "all" Which would be the same as: vcpus = 4 vcpus.little = "none" And also the same as just: vcpus = 4 Or something like this > - If the physical cpus are different cpus, indicate the guest would > like to be a big.LITTLE guest. > And have big vcpus and little vcpus. > Not liking this as _the_ way of specifying the guest topology, wrt big.LITTLE-ness (see alternative proposal right above. :-)) However, right now we support pinning/affinity already. We certainly need to decide what to do if, e.g., no vcpus.big or vcpus.little are present, but the vcpus have hard or soft affinity to some specific pcpus. 
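Example 1 above (a bare "vcpus.big = 2" count, with no explicit list) could be turned into per-vcpu classes along these lines. The names are illustrative, not libxl code; the default follows Dario's convention that the first N vcpus are big and the rest LITTLE.

```c
/* Sketch: derive per-vcpu classes from a "vcpus.big = <count>"
 * style setting. With no explicit list, the first nr_big vcpus
 * default to big and the remainder to LITTLE. */
enum vcpu_class { VCPU_BIG, VCPU_LITTLE };

static void assign_vcpu_classes(int nr_vcpus, int nr_big,
                                enum vcpu_class *out)
{
    for (int v = 0; v < nr_vcpus; v++)
        out[v] = (v < nr_big) ? VCPU_BIG : VCPU_LITTLE;
}
```

For "vcpus = 4, vcpus.big = 2" this yields vcpus 0,1 big and 2,3 LITTLE, matching the default in example 1; the list forms ("vcpus.big = [0, 2, 4, 6]") would instead set the classes explicitly.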
So, right now, this, in the xl config file:

 cpus = [2, 8, 12, 13, 15, 17]

means that we want to pin, 1-to-1, vcpu 0 to pcpu 2, vcpu 1 to pcpu 8, vcpu 2 to pcpu 12, vcpu 3 to pcpu 13, vcpu 4 to pcpu 15 and vcpu 5 to pcpu 17. Now, if cores 2, 8 and 12 are big, and no vcpus.big or vcpus.little is specified, I'd put forward the assumption that the user wants vcpus 0, 1 and 2 to be big, and vcpus 3, 4 and 5 to be LITTLE. If, instead, vcpus.big or vcpus.little is specified, and there's disagreement, I'd either error out or decide which overrides the other (and print a WARNING about that happening).

Still right now, this:

 cpus = "2-12"

means that all the vcpus of the domain have hard affinity (i.e., are pinned) to pcpus 2-12. And in this case I'd conclude that the user wants all the vcpus to be big. I'm less sure what to do if _only_ soft-affinity is specified (via "cpus_soft="), or if the hard affinity contains both big and LITTLE pcpus, like, e.g.:

 cpus = "2-15"

> - If no physical cpus specified, then the guest may run on big
> cpus or on little cpus. But not both.

Yes. If nothing (or something contradictory) is specified, we
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Tue, 20 Sep 2016, Dario Faggioli wrote: > On Mon, 2016-09-19 at 14:03 -0700, Stefano Stabellini wrote: > > On Mon, 19 Sep 2016, Dario Faggioli wrote: > > > Setting thing up like this, even automatically, either in > > hypervisor or > > > toolstack, is basically already possible (with all the good and bad > > > aspects of pinning, of course). > > > > > > Then, sure (as I said when replying to George), we may want things > > to > > > be more flexible, and we also probably want to be on the safe side > > --if > > > ever some components manages to undo our automatic pinning-- wrt > > the > > > scheduler not picking up work for the wrong architecture... But > > still > > > I'm a bit surprised this did not came up... Julien, Peng, is that > > > because you think this is not doable for any reason I'm missing? > > > > Let's suppose that Xen detects big.LITTLE and pins dom0 vcpus to big > > automatically. How can the user know that she really needs to be > > careful > > in the way she pins the vcpus of new VMs? Xen would also need to pin > > automatically vcpus of new VMs to either big or LITTLE cores, or xl > > would have to do it. > > > Actually doing things with what we currently have for pinning is only > something I've brought up as an example, and (potentially) useful for > proof-of-concept, or very early stage level support. > > In the long run, when thinking to the scheduler based solution, I see > things happening the other way round: you specify in xl config file > (and with boot parameters, for dom0) how many big and how many LITTLE > vcpus you want, and the scheduler will know that it can only schedule > the big ones on big physical cores, and the LITTLE ones on LITTLE > physical cores. > > Note that we're saying 'pinning' (yeah, I know, I did it myself in the > first place :-/), but that would not be an actual 1-to-1 pinning. 
For > instance, if domain X has 4 big pcpus, say 0,1,2,3,4, and the host has > 8 big pcpus, say 8-15, then dXv1, dXv2, dXv3 and dXv4 will only be run > by the scheduler on pcpus 8-15. Any of them, and with migration and > load balancing within the set possible. This is what I'm talking about. > > And this would work even if/when there is only one cpupool, or in > general for domains that are in a pool that has both big and LITTLE > pcpus. Furthermore, big.LITTLE support and cpupools will be orthogonal, > just like pinning and cpupools are orthogonal right now. I.e., once we > will have what I described above, nothing prevents us from implementing > per-vcpu cpupool membership, and either create the two (or more!) big > and LITTLE pools, or from mixing things even more, for more complex and > specific use cases. :-) I think that everybody agrees that this is the best long term solution. > Actually, with the cpupool solution, if you want a guest (or dom0) to > actually have both big and LITTLE vcpus, you necessarily have to > implement per-vcpu (rather than per-domain, as it is now) cpupool > membership. I said myself it's not impossible, but certainly it's some > work... with the scheduler solution you basically get that for free! > > So, basically, if we use cpupools for the basics of big.LITTLE support, > there's no way out of it (apart from going implementing scheduling > support afterwords, but that looks backwards to me, especially when > thinking at it with the code in mind). The question is: what is the best short-term solution we can ask Peng to implement that allows Xen to run on big.LITTLE systems today? Possibly getting us closer to the long term solution, or at least not farther from it? > > The whole process would be more explicit and obvious if we used > > cpupools. 
It would be easier for users to know what is going on --
> > they just need to issue an `xl cpupool-list' command and they would see
> > two clearly named pools (something like big-pool and LITTLE-pool).
>
> Well, I guess that, as part of big.LITTLE support, there will be a way
> to tell which pcpus are big and which are LITTLE anyway, probably both
> from `xl info' and from `xl cpupool-list -c' (and most likely in other
> ways too).

Sure, but it needs to be very clear. We cannot ask people to spot architecture-specific flags in the output of `xl info' to be able to appropriately start a guest. Even what I suggested isn't great, as `xl cpupool-list' isn't a common command to run.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Mon, 2016-09-19 at 14:03 -0700, Stefano Stabellini wrote:
> On Mon, 19 Sep 2016, Dario Faggioli wrote:
> > Setting things up like this, even automatically, either in the
> > hypervisor or the toolstack, is basically already possible (with all
> > the good and bad aspects of pinning, of course).
> >
> > Then, sure (as I said when replying to George), we may want things
> > to be more flexible, and we also probably want to be on the safe
> > side --if ever some component manages to undo our automatic
> > pinning-- wrt the scheduler not picking up work for the wrong
> > architecture... But still I'm a bit surprised this did not come
> > up... Julien, Peng, is that because you think this is not doable
> > for any reason I'm missing?
>
> Let's suppose that Xen detects big.LITTLE and pins dom0 vcpus to big
> cores automatically. How can the user know that she really needs to be
> careful in the way she pins the vcpus of new VMs? Xen would also need
> to pin the vcpus of new VMs to either big or LITTLE cores
> automatically, or xl would have to do it.

Actually, doing things with what we currently have for pinning is only something I've brought up as an example, and (potentially) useful for a proof-of-concept, or very early stage support.

In the long run, when thinking of the scheduler based solution, I see things happening the other way round: you specify in the xl config file (and with boot parameters, for dom0) how many big and how many LITTLE vcpus you want, and the scheduler will know that it can only schedule the big ones on big physical cores, and the LITTLE ones on LITTLE physical cores.

Note that we're saying 'pinning' (yeah, I know, I did it myself in the first place :-/), but that would not be an actual 1-to-1 pinning. For instance, if domain X has 4 big vcpus, say 0,1,2,3, and the host has 8 big pcpus, say 8-15, then dXv1, dXv2, dXv3 and dXv4 will only be run by the scheduler on pcpus 8-15. Any of them, and with migration and load balancing within the set possible.
This is what I'm talking about.

And this would work even if/when there is only one cpupool, or in general for domains that are in a pool that has both big and LITTLE pcpus. Furthermore, big.LITTLE support and cpupools will be orthogonal, just like pinning and cpupools are orthogonal right now. I.e., once we have what I described above, nothing prevents us from implementing per-vcpu cpupool membership, and either creating the two (or more!) big and LITTLE pools, or mixing things even more, for more complex and specific use cases. :-)

Actually, with the cpupool solution, if you want a guest (or dom0) to actually have both big and LITTLE vcpus, you necessarily have to implement per-vcpu (rather than per-domain, as it is now) cpupool membership. I said myself it's not impossible, but it's certainly some work... with the scheduler solution you basically get that for free!

So, basically, if we use cpupools for the basics of big.LITTLE support, there's no way out of it (apart from going and implementing scheduling support afterwards, but that looks backwards to me, especially when thinking about it with the code in mind).

> The whole process would be more explicit and obvious if we used
> cpupools. It would be easier for users to know what is going on --
> they just need to issue an `xl cpupool-list' command and they would see
> two clearly named pools (something like big-pool and LITTLE-pool).

Well, I guess that, as part of big.LITTLE support, there will be a way to tell which pcpus are big and which are LITTLE anyway, probably both from `xl info' and from `xl cpupool-list -c' (and most likely in other ways too).

> We wouldn't have to pin vcpus to cpus automatically in Xen or xl, which
> doesn't sound like fun.

As I tried to say above, it will _look_ like some kind of automatic pinning, but that does not mean it has to be implemented by means of it, or dealt with by the user in the same way.
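To make the "class-wide affinity, not 1-to-1 pinning" distinction concrete, here is a purely illustrative Python sketch using the example numbers above (big pcpus 8-15; the LITTLE set is an assumption, as the example doesn't list it). A big vcpu may be placed on *any* idle big core, so migration and load balancing stay possible within the set:

```python
BIG_PCPUS = set(range(8, 16))     # host big cores 8-15, as in the example
LITTLE_PCPUS = set(range(0, 8))   # assumed: the remaining cores are LITTLE

def eligible_pcpus(vcpu_class, busy=frozenset()):
    """Any idle pcpu of the matching class may run the vcpu: this is
    class-wide affinity with free migration inside the set, not a
    fixed 1-to-1 vcpu->pcpu binding."""
    pool = BIG_PCPUS if vcpu_class == "big" else LITTLE_PCPUS
    return pool - set(busy)
```

So dXv1..dXv4, all of class "big", can each land on any of pcpus 8-15 that happens to be free at the time, which is exactly the behavior described above.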
Regards, Dario

--
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Mon, 19 Sep 2016, Dario Faggioli wrote: > On Mon, 2016-09-19 at 12:23 +0200, Juergen Gross wrote: > > On 19/09/16 12:06, Julien Grall wrote: > > > On 19/09/2016 11:45, George Dunlap wrote: > > > > But expanding the schedulers to know about different classes of > > > > cpus, > > > > and having vcpus specified as running only on specific types of > > > > pcpus, > > > > seems like a more flexible approach. > > > > > > So, if I understand correctly, you would not recommend to extend > > > the > > > number of CPU pool per domain, correct? > > > > Before deciding in which direction to go (multiple cpupools, sub- > > pools, > > kind of implicit cpu pinning) > > > You mention "implicit pinning" here, and I'd like to stress this, > because basically no one (else) in the conversation seem to have > considered it. In fact, it may not necessarily be the best long term > solution, but doing something based on pinning is, IMO, a very > convenient first step (and may well become one of the 'modes' available > to the user for taking advantage of big.LITTLE. > > So, if cpus 0-3 are big and cpus 4,5 are LITTLE, we can: > - for domain X, which wants to run only on big cores, pin all it's > vcpus to pcpus 0-3 > - for domain Y, which wants to run only on LITTLE cores, pin all it's > vcpus to pcpus 4,5 > - for domain Z, which wants its vcpus 0,1 to run on big cores, and > it's vcpus 2,3 to run on LITTLE cores, pin vcpus 0,1 to pcpus 0-3, > and pin vcpus 2,3 to pcpus 4,5 > > Setting thing up like this, even automatically, either in hypervisor or > toolstack, is basically already possible (with all the good and bad > aspects of pinning, of course). > > Then, sure (as I said when replying to George), we may want things to > be more flexible, and we also probably want to be on the safe side --if > ever some components manages to undo our automatic pinning-- wrt the > scheduler not picking up work for the wrong architecture... But still > I'm a bit surprised this did not came up... 
Julien, Peng, is that
> because you think this is not doable for any reason I'm missing?

Let's suppose that Xen detects big.LITTLE and pins dom0 vcpus to big cores automatically. How can the user know that she really needs to be careful in the way she pins the vcpus of new VMs? Xen would also need to pin the vcpus of new VMs to either big or LITTLE cores automatically, or xl would have to do it.

The whole process would be more explicit and obvious if we used cpupools. It would be easier for users to know what is going on -- they just need to issue an `xl cpupool-list' command and they would see two clearly named pools (something like big-pool and LITTLE-pool). We wouldn't have to pin vcpus to cpus automatically in Xen or xl, which doesn't sound like fun.
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Mon, 19 Sep 2016, Peng Fan wrote: > On Mon, Sep 19, 2016 at 11:59:05AM +0200, Julien Grall wrote: > > > > > >On 19/09/2016 11:38, Peng Fan wrote: > >>On Mon, Sep 19, 2016 at 10:53:56AM +0200, Julien Grall wrote: > >>>Hello, > >>> > >>>On 19/09/2016 10:36, Peng Fan wrote: > On Mon, Sep 19, 2016 at 10:09:06AM +0200, Julien Grall wrote: > >Hello Peng, > > > >On 19/09/2016 04:08, van.free...@gmail.com wrote: > >>From: Peng Fan> >> > >>This patchset is to support XEN run on big.little SoC. > >>The idea of the patch is from > >>"https://lists.xenproject.org/archives/html/xen-devel/2016-05/msg00465.html; > >> > >>There are some changes to cpupool and add x86 stub functions to avoid > >>build > >>break. Sending The RFC patchset out is to request for comments to see > >>whether > >>this implementation is acceptable or not. Patchset have been tested > >>based on > >>xen-4.8 unstable on NXP i.MX8. > >> > >>I use Big/Little CPU and cpupool to explain the idea. > >>A pool contains Big CPUs is called Big Pool. > >>A pool contains Little CPUs is called Little Pool. > >>If a pool does not contains any physical cpus, Little CPUs or Big CPUs > >>can be added to the cpupool. But the cpupool can not contain both Little > >>and Big CPUs. The CPUs in a cpupool must have the same cpu type(midr > >>value for ARM). > >>CPUs can not be added to the cpupool which contains cpus that have > >>different cpu type. > >>Little CPUs can not be moved to Big Pool if there are Big CPUs in Big > >>Pool, > >>and versa. Domain in Big Pool can not be migrated to Little Pool, and > >>versa. > >>When XEN tries to bringup all the CPUs, only add CPUs with the same cpu > >>type(same midr value) > >>into cpupool0. > > > >As mentioned in the mail you pointed above, this series is not enough to > >make > >big.LITTLE working on then. Xen is always using the boot CPU to detect > >the > >list of features. With big.LITTLE features may not be the same. 
> > > >And I would prefer to see Xen supporting big.LITTLE correctly before > >beginning to think to expose big.LITTLE to the userspace (via cpupool) > >automatically. > > Do you mean vcpus be scheduled between big and little cpus freely? > >>> > >>>By supporting big.LITTLE correctly I meant Xen thinks that all the cores > >>>has > >>>the same set of features. So the feature detection is only done the boot > >>>CPU. > >>>See processor_setup for instance... > >>> > >>>Moving vCPUs between big and little cores would be a hard task (cache line > >>>issue, and possibly feature) and I don't expect to ever cross this in Xen. > >>>However, I am expecting to see big.LITTLE exposed to the guest (i.e having > >>>big and little vCPUs). > >> > >>big vCPUs scheduled on big Physical CPUs and little vCPUs scheduled on > >>little > >>physical cpus, right? > >>If it is, is there is a need to let Xen think all the cores has the same set > >>of features? > > > >I think you missed my point. The feature registers on big and little cores > >may be different. Currently, Xen is reading the feature registers of the CPU > >boot and wrongly assumes that those features will exists on all CPUs. This is > >not the case and should be fixed before we are getting in trouble. > > > >> > >>Developing big.little guest support, I am not sure how much efforts needed. > >>Is this really needed? > > > >This is not necessary at the moment, although I have seen some interest about > >it. Running a guest only on a little core is a nice beginning, but a guest > >may want to take advantage of big.LITTLE (running hungry app on big one and > >little on small one). > > > >> > >>> > > This patchset is to use cpupool to block the vcpu be scheduled between > big and > little cpus. > > > > >See for instance v->arch.actlr = READ_SYSREG32(ACTLR_EL1). > > Thanks for this. I only expose cpuid to guest, missed actlr. I'll check > the A53 and A72 TRM about AArch64 implementationd defined registers. 
> This actlr can be added to the cpupool_arch_info as midr. > > Reading "vcpu_initialise", seems only MIDR and ACTLR needs to be handled. > Please advise if I missed anything else. > >>> > >>>Have you check the register emulation? > >> > >>Checked midr. Have not checked others. > >>I think I missed some registers in ctxt_switch_to. > >> > >>> > > > > >> > >>Thinking an SoC with 4 A53(cpu[0-3]) + 2 A72(cpu[4-5]), cpu0 is the > >>first one > >>that boots up. When XEN tries to bringup secondary CPUs, add cpu[0-3] to > >>cpupool0 and leave cpu[4-5] not in any cpupool. Then when Dom0 boots up, > >>`xl cpupool-list -c` will show cpu[0-3] in Pool-0. > >> > >>Then use the following script to create a new cpupool and add cpu[4-5] > >>to > >>the cpupool. >
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Mon, 19 Sep 2016, Juergen Gross wrote: > On 19/09/16 12:06, Julien Grall wrote: > > Hi George, > > > > On 19/09/2016 11:45, George Dunlap wrote: > >> On Mon, Sep 19, 2016 at 9:53 AM, Julien Grall> >> wrote: > > As mentioned in the mail you pointed above, this series is not > > enough to > > make > > big.LITTLE working on then. Xen is always using the boot CPU to detect > > the > > list of features. With big.LITTLE features may not be the same. > > > > And I would prefer to see Xen supporting big.LITTLE correctly before > > beginning to think to expose big.LITTLE to the userspace (via cpupool) > > automatically. > > > Do you mean vcpus be scheduled between big and little cpus freely? > >>> > >>> > >>> By supporting big.LITTLE correctly I meant Xen thinks that all the > >>> cores has > >>> the same set of features. So the feature detection is only done the boot > >>> CPU. See processor_setup for instance... > >>> > >>> Moving vCPUs between big and little cores would be a hard task (cache > >>> line > >>> issue, and possibly feature) and I don't expect to ever cross this in > >>> Xen. > >>> However, I am expecting to see big.LITTLE exposed to the guest (i.e > >>> having > >>> big and little vCPUs). > >> > >> So it sounds like the big and LITTLE cores are architecturally > >> different enough that software must be aware of which one it's running > >> on? > > > > That's correct. Each big and LITTLE cores may have different errata, > > different features... > > > > It has also the advantage to let the guest dealing itself with its own > > power efficiency without introducing a specific Xen interface. > > > >> > >> Exposing varying numbers of big and LITTLE vcpus to guests seems like > >> a sensible approach. But at the moment cpupools only allow a domain > >> to be in exactly one pool -- meaning if we use cpupools to control the > >> big.LITTLE placement, you won't be *able* to have guests with both big > >> and LITTLE vcpus. 
> >> > If need to create all the pools, need to decided how many pools need > to be > created. > I thought about this, but I do not come out a good idea. > > The cpupool0 is defined in xen/common/cpupool.c, if need to create many > pools, > need to alloc cpupools dynamically when booting. I would not like to > change a > lot to common code. > >>> > >>> > >>> Why? We should avoid to choose a specific design just because the common > >>> code does not allow you to do it without heavy change. > >>> > >>> We never came across the big.LITTLE problem on x86, so it is normal to > >>> modify the code. > >> > >> Julien is correct; there's no reason we couldn't have a default > >> multiple pools on boot. > >> > The implementation in this patchset I think is an easy way to let > Big and > Little > CPUs all run. > >>> > >>> > >>> I care about having a design allowing an easy use of big.LITTLE on > >>> Xen. Your > >>> solution requires the administrator to know the underlying platform and > >>> create the pool. > >>> > >>> In the solution I suggested, the pools would be created by Xen (and > >>> the info > >>> exposed to the userspace for the admin). > >> > >> FWIW another approach could be the one taken by "xl > >> cpupool-numa-split": you could have "xl cpupool-bigLITTLE-split" or > >> something that would automatically set up the pools. > >> > >> But expanding the schedulers to know about different classes of cpus, > >> and having vcpus specified as running only on specific types of pcpus, > >> seems like a more flexible approach. > > > > So, if I understand correctly, you would not recommend to extend the > > number of CPU pool per domain, correct? > > Before deciding in which direction to go (multiple cpupools, sub-pools, > kind of implicit cpu pinning) I think we should think about the > implications regarding today's interfaces: > > - Do we want to be able to use different schedulers for big/little > (this would mean some cpupool related solution)? 
I'd prefer to
> have only one scheduler type for each domain. :-)
>
> - What about scheduling parameters like weight and cap? How would
> those apply (answer probably influencing pinning solution).
> Remember that especially the downsides of pinning led to the
> introduction of cpupools.

It isn't easy to answer these questions, but there might be a reason to have different schedulers, because they are supposed to run different classes of workloads: big cores are for cpu-intensive tasks (think of V8, the JavaScript engine), while LITTLE cores are for low-key background applications (think of the WhatsApp daemon that runs in the background on your phone).

> - Is big.LITTLE to be expected to be combined with NUMA?

NUMA is not a popular way to design hardware in the ARM ecosystem today. If it did become more widespread, I would expect it to happen on server hardware, where big.LITTLE
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Mon, 2016-09-19 at 12:23 +0200, Juergen Gross wrote:
> On 19/09/16 12:06, Julien Grall wrote:
> > On 19/09/2016 11:45, George Dunlap wrote:
> > > But expanding the schedulers to know about different classes of cpus,
> > > and having vcpus specified as running only on specific types of pcpus,
> > > seems like a more flexible approach.
> >
> > So, if I understand correctly, you would not recommend to extend the
> > number of CPU pools per domain, correct?
>
> Before deciding in which direction to go (multiple cpupools, sub-pools,
> kind of implicit cpu pinning)

You mention "implicit pinning" here, and I'd like to stress this, because basically no one (else) in the conversation seems to have considered it. In fact, it may not necessarily be the best long term solution, but doing something based on pinning is, IMO, a very convenient first step (and may well become one of the 'modes' available to the user for taking advantage of big.LITTLE).

So, if cpus 0-3 are big and cpus 4,5 are LITTLE, we can:
- for domain X, which wants to run only on big cores, pin all its vcpus to pcpus 0-3
- for domain Y, which wants to run only on LITTLE cores, pin all its vcpus to pcpus 4,5
- for domain Z, which wants its vcpus 0,1 to run on big cores, and its vcpus 2,3 to run on LITTLE cores, pin vcpus 0,1 to pcpus 0-3, and pin vcpus 2,3 to pcpus 4,5

Setting things up like this, even automatically, either in the hypervisor or the toolstack, is basically already possible (with all the good and bad aspects of pinning, of course).

Then, sure (as I said when replying to George), we may want things to be more flexible, and we also probably want to be on the safe side --if ever some component manages to undo our automatic pinning-- wrt the scheduler not picking up work for the wrong architecture... But still I'm a bit surprised this did not come up... Julien, Peng, is that because you think this is not doable for any reason I'm missing?
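The three-domain pinning scheme just described amounts to deriving a per-vcpu hard-affinity mask from each vcpu's class. A minimal, purely illustrative Python sketch of that derivation, using the example's pcpu numbering (nothing here is existing Xen code):

```python
BIG = {0, 1, 2, 3}   # big pcpus in the example above
LITTLE = {4, 5}      # LITTLE pcpus in the example above

def hard_affinity(vcpu_classes):
    """Build the per-vcpu hard-affinity masks for the pinning scheme
    described above: each vcpu may run anywhere within its own class."""
    return [BIG if c == "big" else LITTLE for c in vcpu_classes]

# domain Z from the example: vcpus 0,1 are big, vcpus 2,3 are LITTLE
z_masks = hard_affinity(["big", "big", "LITTLE", "LITTLE"])
```

Domains X and Y are just the degenerate cases where every vcpu gets the same mask; in practice the masks would be handed to the existing affinity machinery (what `xl vcpu-pin` manipulates today).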
> I think we should think about the > implications regarding today's interfaces: > I totally agree. (At least) These three things should be very clear, before starting to implement anything: - what is the behavior that we want to achieve, from the point of view of both the hypervisor and the guests - what will be the interface - how this new interface will map and will interact with existing interfaces > - Do we want to be able to use different schedulers for big/little > (this would mean some cpupool related solution)? I'd prefer to > have only one scheduler type for each domain. :-) > Well, this, actually is, IMO, from a behavioral perspective, a nice point in favour of supporting a split-cpupool solution. In fact, I think I can envision scenario and reasons for having different schedulers between big cpus and LITTLE cpus (or same scheduler with different parameters). But then, yes, if we then want a domain to have both big and LITTLE cpus, we'd need to allow a domain to live in more than one cpupool at a time, which means a domain will have multiple schedulers. I don't think this is impossible... almost all the scheduling happens at the vcpu level already. The biggest challenge is probably the interface. _HOWEVER_, I think this is something that can well come later, like in phase 2 or 3, as an enhancement/possibility, instead than be the foundation of big.LITTLE support in Xen. > - What about scheduling parameters like weight and cap? How would > those apply (answer probably influencing pinning solution). > Remember that especially the downsides of pinning led to the > introduction of cpupools. > Very important bit indeed. FWIW, there's already a scheduler that supports per-vcpu parameters (so some glue code, or code from which to take inspiration) is there already. And scheduling happens at the vcpu level anyway. I.e., it would not be to hard to make it possible to pass down to Xen, say, per-vcpu weights. 
Then, at, e.g., the xl level, you specify a set of parameters for big cpus, and another set for LITTLE cpus, and either xl itself or libxl will do the mapping and prepare the per-vcpu values. Again, this is just to say that the "cpupool way" does not look too impossible, and may be interesting. However, although I'd like to think more (and see more thoughts) about designs and possibilities, I still continue to think it should be neither the only nor the first mode that we implement.

> - Is big.LITTLE to be expected to be combined with NUMA?
>
> - Do we need to support live migration for domains containing both
> types of cpus?

Interesting points too.

Regards, Dario

--
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Mon, 2016-09-19 at 11:33 +0100, George Dunlap wrote:
> On 19/09/16 11:06, Julien Grall wrote:
> > So, if I understand correctly, you would not recommend to extend the
> > number of CPU pools per domain, correct?
>
> Well imagine trying to set the scheduling parameters, such as weight,
> which in the past have been per-domain. Now you have to specify
> parameters for a domain in each of the cpupools that it's in.

True, and not really convenient indeed. I think we can think of a way to shape the interface in such a way that it's not too bad to use (provide sane defaults/default behavior, etc), but this should definitely be kept in mind. In general, I agree with Juergen that, before implementing anything, we must come up with a design, bearing in mind both behavior and interface. (I'll reply in some more detail directly to Juergen's email.)

> No, I think it would be a lot simpler to just teach the scheduler about
> different classes of cpus. credit1 would probably need to be modified
> so that its credit algorithm would be per-class rather than pool-wide;
> but credit2 shouldn't need much modification at all, other than to make
> sure that a given runqueue doesn't include more than one class; and to
> do load-balancing only with runqueues of the same class.

If we want to support heterogeneous CPUs, something like this is absolutely necessary. In fact, either we set (and enforce) very strict rules on cpupools and pinning, or we'd end up scheduling stuff built for arch A on a processor of arch B! :-O

The "strict limits" approach may be an option --and this patch is a first example of it-- but it's easy to see that it's very inflexible (cpus can't move between pools, domains can't be migrated, etc).
On the other hand, as soon as we "relax" the constraints a little bit, we absolutely need to modify the scheduler code to prevent bad things from happening. As George is saying, both Credit1 and Credit2 need to be modified in order to make sure that a vcpu that is meant to run on a big cpu is not picked up for execution by a LITTLE cpu. This has to do with tweaking the load balancing code in both of them (e.g., in Credit1, a LITTLE cpu must not steal work from a big cpu). Whether or not it will also be required to change the crediting algorithm remains to be seen. The effect would be similar to some sort of pinning, which indeed does not play well with Credit1's accounting logic... but we can probably see about this along the way (or just focus only on Credit2! :-P)

Regards, Dario

--
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
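The load-balancing tweak mentioned above (a LITTLE cpu must never steal a big vcpu, and vice versa) boils down to one extra class check in the work-stealing path. A purely illustrative Python sketch of the idea, not actual Credit1 code:

```python
def may_steal(idle_pcpu_class, vcpu_class):
    """The guard described above: a pcpu may only steal work of its
    own class, so a LITTLE cpu never picks up a big vcpu."""
    return idle_pcpu_class == vcpu_class

def steal_candidates(idle_pcpu_class, peer_queue):
    """peer_queue: (vcpu_id, vcpu_class) pairs waiting on another pcpu's
    runqueue; return only the ones this idle pcpu is allowed to take."""
    return [v for v, c in peer_queue if may_steal(idle_pcpu_class, c)]
```

In Credit2 the same effect falls out of never mixing classes within a runqueue and only balancing between runqueues of the same class.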
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Mon, Sep 19, 2016 at 11:33:58AM +0100, George Dunlap wrote: >On 19/09/16 11:06, Julien Grall wrote: >> Hi George, >> >> On 19/09/2016 11:45, George Dunlap wrote: >>> On Mon, Sep 19, 2016 at 9:53 AM, Julien Grall>>> wrote: >> As mentioned in the mail you pointed above, this series is not >> enough to >> make >> big.LITTLE working on then. Xen is always using the boot CPU to detect >> the >> list of features. With big.LITTLE features may not be the same. >> >> And I would prefer to see Xen supporting big.LITTLE correctly before >> beginning to think to expose big.LITTLE to the userspace (via cpupool) >> automatically. > > > Do you mean vcpus be scheduled between big and little cpus freely? By supporting big.LITTLE correctly I meant Xen thinks that all the cores has the same set of features. So the feature detection is only done the boot CPU. See processor_setup for instance... Moving vCPUs between big and little cores would be a hard task (cache line issue, and possibly feature) and I don't expect to ever cross this in Xen. However, I am expecting to see big.LITTLE exposed to the guest (i.e having big and little vCPUs). >>> >>> So it sounds like the big and LITTLE cores are architecturally >>> different enough that software must be aware of which one it's running >>> on? >> >> That's correct. Each big and LITTLE cores may have different errata, >> different features... >> >> It has also the advantage to let the guest dealing itself with its own >> power efficiency without introducing a specific Xen interface. > >Well in theory there would be advantages either way -- either to >allowing Xen to automatically add power-saving "smarts" to guests which >weren't programmed for them, or to exposing the power-saving abilities >to guests which were. But it sounds like automatically migrating >between them isn't really an option (or would be a lot more trouble than >it's worth). > I care about having a design allowing an easy use of big.LITTLE on Xen. 
Your solution requires the administrator to know the underlying platform and create the pool. In the solution I suggested, the pools would be created by Xen (and the info exposed to the userspace for the admin).

>>> FWIW another approach could be the one taken by "xl
>>> cpupool-numa-split": you could have "xl cpupool-bigLITTLE-split" or
>>> something that would automatically set up the pools.
>>>
>>> But expanding the schedulers to know about different classes of cpus,
>>> and having vcpus specified as running only on specific types of pcpus,
>>> seems like a more flexible approach.
>>
>> So, if I understand correctly, you would not recommend to extend the
>> number of CPU pools per domain, correct?
>
>Well imagine trying to set the scheduling parameters, such as weight,
>which in the past have been per-domain. Now you have to specify
>parameters for a domain in each of the cpupools that it's in.
>
>No, I think it would be a lot simpler to just teach the scheduler about
>different classes of cpus. credit1 would probably need to be modified
>so that its credit algorithm would be per-class rather than pool-wide;
>but credit2 shouldn't need much modification at all, other than to make
>sure that a given runqueue doesn't include more than one class; and to
>do load-balancing only with runqueues of the same class.

I try to follow.
- The scheduler needs to be aware of different classes of cpus (ARM big.LITTLE cpus).
- The scheduler schedules vcpus on different physical cpus in one cpupool.
- Different cpu classes need to be in different runqueues.

Then for the implementation:
- When creating a guest, specify the physical cpus that the guest will run on.
- If the physical cpus are of different classes, indicate that the guest would like to be a big.LITTLE guest, and have big vcpus and little vcpus.
- If no physical cpus are specified, then the guest may run on big cpus or on little cpus, but not both. How to decide whether it runs on big or little physical cpus?
- For Dom0, I am still not sure: default to big.LITTLE, or something else?

If we use the scheduler to handle the different cpu classes, we do not need to use cpupool to block vcpus from being scheduled onto different physical cpus. And using the scheduler to handle this gives an opportunity to support big.LITTLE guests.

Thanks,
Peng.

> -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
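Peng's last open question above -- which class an unpinned guest should land on -- could be answered by any number of policies. As one purely illustrative possibility (not something proposed in the thread; the enum and helper are made up), the toolstack could default to whichever class currently has more idle pCPUs:

```c
#include <assert.h>

enum cpu_class { CLASS_LITTLE, CLASS_BIG, NR_CLASSES };

/*
 * Hypothetical placement policy: a guest that specified no physical
 * cpus gets all its vcpus in one class -- here, the class with more
 * idle pCPUs at creation time. Ties go to LITTLE to save power.
 */
static enum cpu_class default_class(const int idle_per_class[NR_CLASSES])
{
    return idle_per_class[CLASS_BIG] > idle_per_class[CLASS_LITTLE]
           ? CLASS_BIG : CLASS_LITTLE;
}
```

Any real policy would also have to account for the weight/cap and migration questions Juergen raises later in the thread.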
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Mon, Sep 19, 2016 at 11:59:05AM +0200, Julien Grall wrote: > > >On 19/09/2016 11:38, Peng Fan wrote: >>On Mon, Sep 19, 2016 at 10:53:56AM +0200, Julien Grall wrote: >>>Hello, >>> >>>On 19/09/2016 10:36, Peng Fan wrote: On Mon, Sep 19, 2016 at 10:09:06AM +0200, Julien Grall wrote: >Hello Peng, > >On 19/09/2016 04:08, van.free...@gmail.com wrote: >>From: Peng Fan>> >>This patchset is to support XEN run on big.little SoC. >>The idea of the patch is from >>"https://lists.xenproject.org/archives/html/xen-devel/2016-05/msg00465.html; >> >>There are some changes to cpupool and add x86 stub functions to avoid >>build >>break. Sending The RFC patchset out is to request for comments to see >>whether >>this implementation is acceptable or not. Patchset have been tested based >>on >>xen-4.8 unstable on NXP i.MX8. >> >>I use Big/Little CPU and cpupool to explain the idea. >>A pool contains Big CPUs is called Big Pool. >>A pool contains Little CPUs is called Little Pool. >>If a pool does not contains any physical cpus, Little CPUs or Big CPUs >>can be added to the cpupool. But the cpupool can not contain both Little >>and Big CPUs. The CPUs in a cpupool must have the same cpu type(midr >>value for ARM). >>CPUs can not be added to the cpupool which contains cpus that have >>different cpu type. >>Little CPUs can not be moved to Big Pool if there are Big CPUs in Big >>Pool, >>and versa. Domain in Big Pool can not be migrated to Little Pool, and >>versa. >>When XEN tries to bringup all the CPUs, only add CPUs with the same cpu >>type(same midr value) >>into cpupool0. > >As mentioned in the mail you pointed above, this series is not enough to >make >big.LITTLE working on then. Xen is always using the boot CPU to detect the >list of features. With big.LITTLE features may not be the same. > >And I would prefer to see Xen supporting big.LITTLE correctly before >beginning to think to expose big.LITTLE to the userspace (via cpupool) >automatically. 
Do you mean vcpus can be scheduled between big and little cpus freely?
>>>
>>>By supporting big.LITTLE correctly I meant Xen thinks that all the cores have
>>>the same set of features. So the feature detection is only done on the boot CPU.
>>>See processor_setup for instance...
>>>
>>>Moving vCPUs between big and little cores would be a hard task (cache line
>>>issues, and possibly features) and I don't expect to ever cross this in Xen.
>>>However, I am expecting to see big.LITTLE exposed to the guest (i.e. having
>>>big and little vCPUs).
>>
>>big vCPUs scheduled on big physical CPUs and little vCPUs scheduled on little
>>physical cpus, right?
>>If so, is there a need to let Xen think all the cores have the same set
>>of features?
>
>I think you missed my point. The feature registers on big and little cores
>may be different. Currently, Xen is reading the feature registers of the boot
>CPU and wrongly assumes that those features will exist on all CPUs. This is
>not the case and should be fixed before we are getting in trouble.
>
>>
>>Developing big.LITTLE guest support, I am not sure how much effort is needed.
>>Is this really needed?
>
>This is not necessary at the moment, although I have seen some interest about
>it. Running a guest only on a little core is a nice beginning, but a guest
>may want to take advantage of big.LITTLE (running a hungry app on a big core
>and a light one on a small core).
>
>>
>>>This patchset is to use cpupool to block the vcpu from being scheduled between big and little cpus.
>
>See for instance v->arch.actlr = READ_SYSREG32(ACTLR_EL1).

Thanks for this. I only expose cpuid to guest, missed actlr. I'll check
the A53 and A72 TRM about AArch64 implementation defined registers.
This actlr can be added to the cpupool_arch_info as the midr is.

Reading "vcpu_initialise", it seems only MIDR and ACTLR need to be handled.
Please advise if I missed anything else.
>>>
>>>Have you checked the register emulation?
>>
>>Checked midr. Have not checked others.
>>I think I missed some registers in ctxt_switch_to.
>>
>>Thinking of an SoC with 4 A53 (cpu[0-3]) + 2 A72 (cpu[4-5]), cpu0 is the first one
>>that boots up. When XEN tries to bring up the secondary CPUs, add cpu[0-3] to
>>cpupool0 and leave cpu[4-5] not in any cpupool. Then when Dom0 boots up,
>>`xl cpupool-list -c` will show cpu[0-3] in Pool-0.
>>
>>Then use the following script to create a new cpupool and add cpu[4-5] to
>>the cpupool.
>>#xl cpupool-create name="Pool-A72" sched="credit2"
>>#xl cpupool-cpu-add Pool-A72 4
>>#xl cpupool-cpu-add Pool-A72 5
>>#xl create -d /root/xen/domu-test pool="Pool-A72"
>
>I am a bit confused with these runes. It means that only the first kind of
>CPUs have pool
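The "same midr value" rule the RFC uses for cpupool membership boils down to comparing MIDR_EL1 values. A toy sketch of that check (not the RFC's actual code -- the helper is made up; the MIDR field layout and Cortex part numbers are the architectural/published values):

```c
#include <assert.h>
#include <stdint.h>

/* MIDR_EL1 layout: implementer[31:24] variant[23:20] architecture[19:16]
 *                  partnum[15:4] revision[3:0] */
#define MIDR_PARTNUM(m)  (((m) >> 4) & 0xfff)

#define PART_CORTEX_A53  0xd03   /* published ARM part numbers */
#define PART_CORTEX_A72  0xd08

/*
 * Hypothetical admission check: a pool records the MIDR of its first
 * CPU and only accepts CPUs with an identical MIDR, as described in
 * the RFC cover letter.
 */
static int pool_accepts_cpu(uint32_t pool_midr, uint32_t cpu_midr)
{
    return pool_midr == cpu_midr;
}
```

With this rule an A53 pool (e.g. MIDR 0x410fd034) rejects an A72 CPU (e.g. MIDR 0x410fd080) even though both are ARM-implemented ARMv8-A cores.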
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
Hello Julien, On Mon, Sep 19, 2016 at 10:53:56AM +0200, Julien Grall wrote: >Hello, > >On 19/09/2016 10:36, Peng Fan wrote: >>On Mon, Sep 19, 2016 at 10:09:06AM +0200, Julien Grall wrote: >>>Hello Peng, >>> >>>On 19/09/2016 04:08, van.free...@gmail.com wrote: From: Peng FanThis patchset is to support XEN run on big.little SoC. The idea of the patch is from "https://lists.xenproject.org/archives/html/xen-devel/2016-05/msg00465.html; There are some changes to cpupool and add x86 stub functions to avoid build break. Sending The RFC patchset out is to request for comments to see whether this implementation is acceptable or not. Patchset have been tested based on xen-4.8 unstable on NXP i.MX8. I use Big/Little CPU and cpupool to explain the idea. A pool contains Big CPUs is called Big Pool. A pool contains Little CPUs is called Little Pool. If a pool does not contains any physical cpus, Little CPUs or Big CPUs can be added to the cpupool. But the cpupool can not contain both Little and Big CPUs. The CPUs in a cpupool must have the same cpu type(midr value for ARM). CPUs can not be added to the cpupool which contains cpus that have different cpu type. Little CPUs can not be moved to Big Pool if there are Big CPUs in Big Pool, and versa. Domain in Big Pool can not be migrated to Little Pool, and versa. When XEN tries to bringup all the CPUs, only add CPUs with the same cpu type(same midr value) into cpupool0. >>> >>>As mentioned in the mail you pointed above, this series is not enough to make >>>big.LITTLE working on then. Xen is always using the boot CPU to detect the >>>list of features. With big.LITTLE features may not be the same. >>> >>>And I would prefer to see Xen supporting big.LITTLE correctly before >>>beginning to think to expose big.LITTLE to the userspace (via cpupool) >>>automatically. >> >>Do you mean vcpus be scheduled between big and little cpus freely? 
>
>By supporting big.LITTLE correctly I meant Xen thinks that all the cores have
>the same set of features. So the feature detection is only done on the boot CPU.
>See processor_setup for instance...
>
>Moving vCPUs between big and little cores would be a hard task (cache line
>issues, and possibly features) and I don't expect to ever cross this in Xen.
>However, I am expecting to see big.LITTLE exposed to the guest (i.e. having
>big and little vCPUs).
>
>>
>>This patchset is to use cpupool to block the vcpu from being scheduled between
>>big and little cpus.
>>
>>>
>>>See for instance v->arch.actlr = READ_SYSREG32(ACTLR_EL1).

Back to this. In xen/arch/arm/traps.c, I found that
"
WRITE_SYSREG(HCR_PTW|HCR_BSU_INNER|HCR_AMO|HCR_IMO|HCR_FMO|HCR_VM|
             HCR_TWE|HCR_TWI|HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP|HCR_FB,
             HCR_EL2);
"
The HCR_TIDx bits are not set. HCR_TIDCP is set, but that is used to trap
implementation defined registers. So accessing the cpu feature registers
(excluding the implementation defined ones) in the guest os will not trap
to xen, right? If this is true, the cpu feature registers for a DomU in
Pool-A72 in my case should be correct.

Thanks,
Peng.

>>
>>Thanks for this. I only expose cpuid to guest, missed actlr. I'll check
>>the A53 and A72 TRM about AArch64 implementation defined registers.
>>This actlr can be added to the cpupool_arch_info as the midr is.
>>
>>Reading "vcpu_initialise", it seems only MIDR and ACTLR need to be handled.
>>Please advise if I missed anything else.
>
>Have you checked the register emulation?
>
>>
>>> Thinking of an SoC with 4 A53 (cpu[0-3]) + 2 A72 (cpu[4-5]), cpu0 is the first one that boots up. When XEN tries to bring up the secondary CPUs, add cpu[0-3] to cpupool0 and leave cpu[4-5] not in any cpupool. Then when Dom0 boots up, `xl cpupool-list -c` will show cpu[0-3] in Pool-0. Then use the following script to create a new cpupool and add cpu[4-5] to the cpupool.
#xl cpupool-create name="Pool-A72" sched="credit2"
#xl cpupool-cpu-add Pool-A72 4
#xl cpupool-cpu-add Pool-A72 5
#xl create -d /root/xen/domu-test pool="Pool-A72"
>>>
>>>I am a bit confused with these runes. It means that only the first kind of
>>>CPUs have a pool assigned. Why don't you directly create all the pools at boot
>>>time?
>>
>>If we need to create all the pools, we need to decide how many pools need to be
>>created. I thought about this, but I have not come up with a good idea.
>>
>>The cpupool0 is defined in xen/common/cpupool.c; if we need to create many pools,
>>we need to alloc cpupools dynamically when booting. I would not like to change a
>>lot of common code.
>
>Why? We should avoid choosing a specific design just because the common code
>does not allow you to do it without heavy change.
>
>We never came across the big.LITTLE problem on x86, so it is normal to modify
>the code.
>
>>The implementation in this patchset I think is an easy way to
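Peng's reading of the HCR_EL2 mask quoted above can be sanity-checked against the architectural bit positions. A small sketch (bit positions are from the ARMv8-A architecture; the mask mirrors the one quoted from xen/arch/arm/traps.c): none of the HCR_TID0..TID3 ID-register trap bits are in the mask, so guest reads of the ID_* feature registers are served directly by the hardware -- but note HCR_TAC is in the mask, so ACTLR accesses do trap:

```c
#include <assert.h>
#include <stdint.h>

/* ARMv8-A HCR_EL2 bit positions (architectural) */
#define HCR_VM        (1UL << 0)
#define HCR_SWIO      (1UL << 1)
#define HCR_PTW       (1UL << 2)
#define HCR_FMO       (1UL << 3)
#define HCR_IMO       (1UL << 4)
#define HCR_AMO       (1UL << 5)
#define HCR_FB        (1UL << 9)
#define HCR_BSU_INNER (1UL << 10)
#define HCR_TWI       (1UL << 13)
#define HCR_TWE       (1UL << 14)
#define HCR_TID0      (1UL << 15)  /* trap ID group 0 */
#define HCR_TID1      (1UL << 16)  /* trap ID group 1 */
#define HCR_TID2      (1UL << 17)  /* trap cache ID registers */
#define HCR_TID3      (1UL << 18)  /* trap ID_* feature registers */
#define HCR_TSC       (1UL << 19)  /* trap SMC */
#define HCR_TIDCP     (1UL << 20)  /* trap impl. defined registers */
#define HCR_TAC       (1UL << 21)  /* trap ACTLR accesses */

/* The mask quoted from xen/arch/arm/traps.c */
#define XEN_HCR (HCR_PTW|HCR_BSU_INNER|HCR_AMO|HCR_IMO|HCR_FMO|HCR_VM| \
                 HCR_TWE|HCR_TWI|HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP|HCR_FB)
```

So in a mixed-MIDR setup the guest-visible feature registers would indeed come from whichever physical core the vCPU happens to run on, which is exactly the inconsistency Julien warns about.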
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On 19/09/16 11:06, Julien Grall wrote:
> Hi George,
>
> On 19/09/2016 11:45, George Dunlap wrote:
>> On Mon, Sep 19, 2016 at 9:53 AM, Julien Grall >> wrote:
> As mentioned in the mail you pointed above, this series is not enough to
> make big.LITTLE work on Xen. Xen is always using the boot CPU to detect
> the list of features. With big.LITTLE, features may not be the same.
>
> And I would prefer to see Xen supporting big.LITTLE correctly before
> beginning to think about exposing big.LITTLE to the userspace (via cpupool)
> automatically.

Do you mean vcpus can be scheduled between big and little cpus freely?
>>>
>>> By supporting big.LITTLE correctly I meant Xen thinks that all the
>>> cores have the same set of features. So the feature detection is only
>>> done on the boot CPU. See processor_setup for instance...
>>>
>>> Moving vCPUs between big and little cores would be a hard task (cache
>>> line issues, and possibly features) and I don't expect to ever cross this
>>> in Xen. However, I am expecting to see big.LITTLE exposed to the guest
>>> (i.e. having big and little vCPUs).
>>
>> So it sounds like the big and LITTLE cores are architecturally
>> different enough that software must be aware of which one it's running
>> on?
>
> That's correct. The big and LITTLE cores may have different errata,
> different features...
>
> It also has the advantage of letting the guest deal itself with its own
> power efficiency without introducing a specific Xen interface.

Well in theory there would be advantages either way -- either to
allowing Xen to automatically add power-saving "smarts" to guests which
weren't programmed for them, or to exposing the power-saving abilities
to guests which were. But it sounds like automatically migrating
between them isn't really an option (or would be a lot more trouble than
it's worth).

>>> I care about having a design allowing an easy use of big.LITTLE on
>>> Xen.
>>> Your solution requires the administrator to know the underlying
>>> platform and create the pool.
>>>
>>> In the solution I suggested, the pools would be created by Xen (and
>>> the info exposed to the userspace for the admin).
>>
>> FWIW another approach could be the one taken by "xl
>> cpupool-numa-split": you could have "xl cpupool-bigLITTLE-split" or
>> something that would automatically set up the pools.
>>
>> But expanding the schedulers to know about different classes of cpus,
>> and having vcpus specified as running only on specific types of pcpus,
>> seems like a more flexible approach.
>
> So, if I understand correctly, you would not recommend to extend the
> number of CPU pools per domain, correct?

Well imagine trying to set the scheduling parameters, such as weight,
which in the past have been per-domain. Now you have to specify
parameters for a domain in each of the cpupools that it's in.

No, I think it would be a lot simpler to just teach the scheduler about
different classes of cpus. credit1 would probably need to be modified
so that its credit algorithm would be per-class rather than pool-wide;
but credit2 shouldn't need much modification at all, other than to make
sure that a given runqueue doesn't include more than one class; and to
do load-balancing only with runqueues of the same class.

-George
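George's credit2 suggestion -- one class per runqueue, with load balancing restricted to runqueues of the same class -- can be sketched with a toy model (illustrative only, not Xen's actual credit2 code; all names are made up):

```c
#include <assert.h>
#include <stddef.h>

enum cpu_class { CLASS_LITTLE, CLASS_BIG };

/* Each runqueue serves exactly one class of pcpus. */
struct runqueue {
    enum cpu_class class;
    int load;
};

/*
 * Pick the most loaded runqueue to steal work from, but never look at
 * runqueues of a different class: work queued for big cores stays on
 * big-core runqueues, and vice versa.
 */
static struct runqueue *steal_candidate(struct runqueue *rqs, size_t n,
                                        const struct runqueue *self)
{
    struct runqueue *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (&rqs[i] == self || rqs[i].class != self->class)
            continue;   /* never balance across classes */
        if (best == NULL || rqs[i].load > best->load)
            best = &rqs[i];
    }
    return best;
}
```

The same filter applied at runqueue-assignment time is what keeps a runqueue from ever containing more than one class in the first place.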
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On 19/09/16 12:06, Julien Grall wrote: > Hi George, > > On 19/09/2016 11:45, George Dunlap wrote: >> On Mon, Sep 19, 2016 at 9:53 AM, Julien Grall>> wrote: > As mentioned in the mail you pointed above, this series is not > enough to > make > big.LITTLE working on then. Xen is always using the boot CPU to detect > the > list of features. With big.LITTLE features may not be the same. > > And I would prefer to see Xen supporting big.LITTLE correctly before > beginning to think to expose big.LITTLE to the userspace (via cpupool) > automatically. Do you mean vcpus be scheduled between big and little cpus freely? >>> >>> >>> By supporting big.LITTLE correctly I meant Xen thinks that all the >>> cores has >>> the same set of features. So the feature detection is only done the boot >>> CPU. See processor_setup for instance... >>> >>> Moving vCPUs between big and little cores would be a hard task (cache >>> line >>> issue, and possibly feature) and I don't expect to ever cross this in >>> Xen. >>> However, I am expecting to see big.LITTLE exposed to the guest (i.e >>> having >>> big and little vCPUs). >> >> So it sounds like the big and LITTLE cores are architecturally >> different enough that software must be aware of which one it's running >> on? > > That's correct. Each big and LITTLE cores may have different errata, > different features... > > It has also the advantage to let the guest dealing itself with its own > power efficiency without introducing a specific Xen interface. > >> >> Exposing varying numbers of big and LITTLE vcpus to guests seems like >> a sensible approach. But at the moment cpupools only allow a domain >> to be in exactly one pool -- meaning if we use cpupools to control the >> big.LITTLE placement, you won't be *able* to have guests with both big >> and LITTLE vcpus. >> If need to create all the pools, need to decided how many pools need to be created. I thought about this, but I do not come out a good idea. 
The cpupool0 is defined in xen/common/cpupool.c; if we need to create many pools, we need to alloc cpupools dynamically when booting. I would not like to change a lot of common code.
>>>
>>> Why? We should avoid choosing a specific design just because the common
>>> code does not allow you to do it without heavy change.
>>>
>>> We never came across the big.LITTLE problem on x86, so it is normal to
>>> modify the code.
>>
>> Julien is correct; there's no reason we couldn't have multiple
>> default pools on boot.
>>
>> The implementation in this patchset I think is an easy way to let Big and
>> Little CPUs all run.
>>>
>>> I care about having a design allowing an easy use of big.LITTLE on
>>> Xen. Your solution requires the administrator to know the underlying
>>> platform and create the pool.
>>>
>>> In the solution I suggested, the pools would be created by Xen (and
>>> the info exposed to the userspace for the admin).
>>
>> FWIW another approach could be the one taken by "xl
>> cpupool-numa-split": you could have "xl cpupool-bigLITTLE-split" or
>> something that would automatically set up the pools.
>>
>> But expanding the schedulers to know about different classes of cpus,
>> and having vcpus specified as running only on specific types of pcpus,
>> seems like a more flexible approach.
>
> So, if I understand correctly, you would not recommend to extend the
> number of CPU pools per domain, correct?

Before deciding in which direction to go (multiple cpupools, sub-pools, some kind of implicit cpu pinning) I think we should think about the implications regarding today's interfaces:
- Do we want to be able to use different schedulers for big/little (this would mean some cpupool related solution)? I'd prefer to have only one scheduler type for each domain. :-)
- What about scheduling parameters like weight and cap? How would those apply (answer probably influencing pinning solution)?
Remember that especially the downsides of pinning led to the introduction of cpupools.
- Is big.LITTLE expected to be combined with NUMA?
- Do we need to support live migration for domains containing both types of cpus?

Juergen
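Juergen's weight/cap question can be made concrete with a toy model. If weight stays per-domain but scheduling happens per class, a domain's share within one class would have to be computed against only the domains that also have vcpus in that class (purely illustrative; the struct and helper are made up, not a proposal from the thread):

```c
#include <assert.h>
#include <stddef.h>

/* A domain with a single weight but vcpus possibly in both classes. */
struct dom {
    int weight;
    int has_big;     /* has vcpus on big cores */
    int has_little;  /* has vcpus on LITTLE cores */
};

/*
 * Percentage of one class's cpu time that domain d[idx] receives, if
 * the per-domain weight is applied independently within each class.
 */
static int class_share_pct(const struct dom *d, size_t n, size_t idx, int big)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        if (big ? d[i].has_big : d[i].has_little)
            total += d[i].weight;
    return total ? 100 * d[idx].weight / total : 0;
}
```

Under this interpretation a domain that is alone on the big cores gets 100% there while still splitting the LITTLE cores, which is exactly the kind of asymmetry Juergen is asking whether the existing interface should be able to express.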
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
Hi George, On 19/09/2016 11:45, George Dunlap wrote: On Mon, Sep 19, 2016 at 9:53 AM, Julien Grallwrote: As mentioned in the mail you pointed above, this series is not enough to make big.LITTLE working on then. Xen is always using the boot CPU to detect the list of features. With big.LITTLE features may not be the same. And I would prefer to see Xen supporting big.LITTLE correctly before beginning to think to expose big.LITTLE to the userspace (via cpupool) automatically. Do you mean vcpus be scheduled between big and little cpus freely? By supporting big.LITTLE correctly I meant Xen thinks that all the cores has the same set of features. So the feature detection is only done the boot CPU. See processor_setup for instance... Moving vCPUs between big and little cores would be a hard task (cache line issue, and possibly feature) and I don't expect to ever cross this in Xen. However, I am expecting to see big.LITTLE exposed to the guest (i.e having big and little vCPUs). So it sounds like the big and LITTLE cores are architecturally different enough that software must be aware of which one it's running on? That's correct. Each big and LITTLE cores may have different errata, different features... It has also the advantage to let the guest dealing itself with its own power efficiency without introducing a specific Xen interface. Exposing varying numbers of big and LITTLE vcpus to guests seems like a sensible approach. But at the moment cpupools only allow a domain to be in exactly one pool -- meaning if we use cpupools to control the big.LITTLE placement, you won't be *able* to have guests with both big and LITTLE vcpus. If need to create all the pools, need to decided how many pools need to be created. I thought about this, but I do not come out a good idea. The cpupool0 is defined in xen/common/cpupool.c, if need to create many pools, need to alloc cpupools dynamically when booting. I would not like to change a lot to common code. Why? 
We should avoid choosing a specific design just because the common code does not allow you to do it without heavy change. We never came across the big.LITTLE problem on x86, so it is normal to modify the code. Julien is correct; there's no reason we couldn't have multiple default pools on boot. The implementation in this patchset I think is an easy way to let Big and Little CPUs all run. I care about having a design allowing an easy use of big.LITTLE on Xen. Your solution requires the administrator to know the underlying platform and create the pool. In the solution I suggested, the pools would be created by Xen (and the info exposed to the userspace for the admin). FWIW another approach could be the one taken by "xl cpupool-numa-split": you could have "xl cpupool-bigLITTLE-split" or something that would automatically set up the pools. But expanding the schedulers to know about different classes of cpus, and having vcpus specified as running only on specific types of pcpus, seems like a more flexible approach. So, if I understand correctly, you would not recommend to extend the number of CPU pools per domain, correct?

Regards,

--
Julien Grall
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On 19/09/2016 11:38, Peng Fan wrote: On Mon, Sep 19, 2016 at 10:53:56AM +0200, Julien Grall wrote: Hello, On 19/09/2016 10:36, Peng Fan wrote: On Mon, Sep 19, 2016 at 10:09:06AM +0200, Julien Grall wrote: Hello Peng, On 19/09/2016 04:08, van.free...@gmail.com wrote: From: Peng FanThis patchset is to support XEN run on big.little SoC. The idea of the patch is from "https://lists.xenproject.org/archives/html/xen-devel/2016-05/msg00465.html; There are some changes to cpupool and add x86 stub functions to avoid build break. Sending The RFC patchset out is to request for comments to see whether this implementation is acceptable or not. Patchset have been tested based on xen-4.8 unstable on NXP i.MX8. I use Big/Little CPU and cpupool to explain the idea. A pool contains Big CPUs is called Big Pool. A pool contains Little CPUs is called Little Pool. If a pool does not contains any physical cpus, Little CPUs or Big CPUs can be added to the cpupool. But the cpupool can not contain both Little and Big CPUs. The CPUs in a cpupool must have the same cpu type(midr value for ARM). CPUs can not be added to the cpupool which contains cpus that have different cpu type. Little CPUs can not be moved to Big Pool if there are Big CPUs in Big Pool, and versa. Domain in Big Pool can not be migrated to Little Pool, and versa. When XEN tries to bringup all the CPUs, only add CPUs with the same cpu type(same midr value) into cpupool0. As mentioned in the mail you pointed above, this series is not enough to make big.LITTLE working on then. Xen is always using the boot CPU to detect the list of features. With big.LITTLE features may not be the same. And I would prefer to see Xen supporting big.LITTLE correctly before beginning to think to expose big.LITTLE to the userspace (via cpupool) automatically. Do you mean vcpus be scheduled between big and little cpus freely? By supporting big.LITTLE correctly I meant Xen thinks that all the cores has the same set of features. 
So the feature detection is only done the boot CPU. See processor_setup for instance... Moving vCPUs between big and little cores would be a hard task (cache line issue, and possibly feature) and I don't expect to ever cross this in Xen. However, I am expecting to see big.LITTLE exposed to the guest (i.e having big and little vCPUs). big vCPUs scheduled on big Physical CPUs and little vCPUs scheduled on little physical cpus, right? If it is, is there is a need to let Xen think all the cores has the same set of features? I think you missed my point. The feature registers on big and little cores may be different. Currently, Xen is reading the feature registers of the CPU boot and wrongly assumes that those features will exists on all CPUs. This is not the case and should be fixed before we are getting in trouble. Developing big.little guest support, I am not sure how much efforts needed. Is this really needed? This is not necessary at the moment, although I have seen some interest about it. Running a guest only on a little core is a nice beginning, but a guest may want to take advantage of big.LITTLE (running hungry app on big one and little on small one). This patchset is to use cpupool to block the vcpu be scheduled between big and little cpus. See for instance v->arch.actlr = READ_SYSREG32(ACTLR_EL1). Thanks for this. I only expose cpuid to guest, missed actlr. I'll check the A53 and A72 TRM about AArch64 implementationd defined registers. This actlr can be added to the cpupool_arch_info as midr. Reading "vcpu_initialise", seems only MIDR and ACTLR needs to be handled. Please advise if I missed anything else. Have you check the register emulation? Checked midr. Have not checked others. I think I missed some registers in ctxt_switch_to. Thinking an SoC with 4 A53(cpu[0-3]) + 2 A72(cpu[4-5]), cpu0 is the first one that boots up. When XEN tries to bringup secondary CPUs, add cpu[0-3] to cpupool0 and leave cpu[4-5] not in any cpupool. 
Then when Dom0 boots up, `xl cpupool-list -c` will show cpu[0-3] in Pool-0.

Then use the following script to create a new cpupool and add cpu[4-5] to the cpupool.
#xl cpupool-create name="Pool-A72" sched="credit2"
#xl cpupool-cpu-add Pool-A72 4
#xl cpupool-cpu-add Pool-A72 5
#xl create -d /root/xen/domu-test pool="Pool-A72"

I am a bit confused with these runes. It means that only the first kind of CPUs have a pool assigned. Why don't you directly create all the pools at boot time?

If we need to create all the pools, we need to decide how many pools need to be created. I thought about this, but I have not come up with a good idea.

The cpupool0 is defined in xen/common/cpupool.c; if we need to create many pools, we need to alloc cpupools dynamically when booting. I would not like to change a lot of common code.

Why? We should avoid choosing a specific design just because the common code does not allow you to do it without heavy change. We never came across
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Mon, Sep 19, 2016 at 9:53 AM, Julien Grallwrote: >>> As mentioned in the mail you pointed above, this series is not enough to >>> make >>> big.LITTLE working on then. Xen is always using the boot CPU to detect >>> the >>> list of features. With big.LITTLE features may not be the same. >>> >>> And I would prefer to see Xen supporting big.LITTLE correctly before >>> beginning to think to expose big.LITTLE to the userspace (via cpupool) >>> automatically. >> >> >> Do you mean vcpus be scheduled between big and little cpus freely? > > > By supporting big.LITTLE correctly I meant Xen thinks that all the cores has > the same set of features. So the feature detection is only done the boot > CPU. See processor_setup for instance... > > Moving vCPUs between big and little cores would be a hard task (cache line > issue, and possibly feature) and I don't expect to ever cross this in Xen. > However, I am expecting to see big.LITTLE exposed to the guest (i.e having > big and little vCPUs). So it sounds like the big and LITTLE cores are architecturally different enough that software must be aware of which one it's running on? Exposing varying numbers of big and LITTLE vcpus to guests seems like a sensible approach. But at the moment cpupools only allow a domain to be in exactly one pool -- meaning if we use cpupools to control the big.LITTLE placement, you won't be *able* to have guests with both big and LITTLE vcpus. >> If need to create all the pools, need to decided how many pools need to be >> created. >> I thought about this, but I do not come out a good idea. >> >> The cpupool0 is defined in xen/common/cpupool.c, if need to create many >> pools, >> need to alloc cpupools dynamically when booting. I would not like to >> change a >> lot to common code. > > > Why? We should avoid to choose a specific design just because the common > code does not allow you to do it without heavy change. 
>
> We never came across the big.LITTLE problem on x86, so it is normal to
> modify the code.

Julien is correct; there's no reason we couldn't have multiple
default pools on boot.

>> The implementation in this patchset I think is an easy way to let Big and
>> Little CPUs all run.
>
> I care about having a design allowing an easy use of big.LITTLE on Xen. Your
> solution requires the administrator to know the underlying platform and
> create the pool.
>
> In the solution I suggested, the pools would be created by Xen (and the info
> exposed to the userspace for the admin).

FWIW another approach could be the one taken by "xl
cpupool-numa-split": you could have "xl cpupool-bigLITTLE-split" or
something that would automatically set up the pools.

But expanding the schedulers to know about different classes of cpus,
and having vcpus specified as running only on specific types of pcpus,
seems like a more flexible approach.

-George
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
On Mon, Sep 19, 2016 at 10:53:56AM +0200, Julien Grall wrote: >Hello, > >On 19/09/2016 10:36, Peng Fan wrote: >>On Mon, Sep 19, 2016 at 10:09:06AM +0200, Julien Grall wrote: >>>Hello Peng, >>> >>>On 19/09/2016 04:08, van.free...@gmail.com wrote: From: Peng FanThis patchset is to support XEN run on big.little SoC. The idea of the patch is from "https://lists.xenproject.org/archives/html/xen-devel/2016-05/msg00465.html; There are some changes to cpupool and add x86 stub functions to avoid build break. Sending The RFC patchset out is to request for comments to see whether this implementation is acceptable or not. Patchset have been tested based on xen-4.8 unstable on NXP i.MX8. I use Big/Little CPU and cpupool to explain the idea. A pool contains Big CPUs is called Big Pool. A pool contains Little CPUs is called Little Pool. If a pool does not contains any physical cpus, Little CPUs or Big CPUs can be added to the cpupool. But the cpupool can not contain both Little and Big CPUs. The CPUs in a cpupool must have the same cpu type(midr value for ARM). CPUs can not be added to the cpupool which contains cpus that have different cpu type. Little CPUs can not be moved to Big Pool if there are Big CPUs in Big Pool, and versa. Domain in Big Pool can not be migrated to Little Pool, and versa. When XEN tries to bringup all the CPUs, only add CPUs with the same cpu type(same midr value) into cpupool0. >>> >>>As mentioned in the mail you pointed above, this series is not enough to make >>>big.LITTLE working on then. Xen is always using the boot CPU to detect the >>>list of features. With big.LITTLE features may not be the same. >>> >>>And I would prefer to see Xen supporting big.LITTLE correctly before >>>beginning to think to expose big.LITTLE to the userspace (via cpupool) >>>automatically. >> >>Do you mean vcpus be scheduled between big and little cpus freely? > >By supporting big.LITTLE correctly I meant Xen thinks that all the cores has >the same set of features. 
>So the feature detection is only done on the boot CPU. See processor_setup for instance...
>
>Moving vCPUs between big and little cores would be a hard task (cache line issues, and possibly features) and I don't expect to ever cross this in Xen. However, I am expecting to see big.LITTLE exposed to the guest (i.e. having big and little vCPUs).

Big vCPUs scheduled on big physical CPUs and little vCPUs scheduled on little physical cpus, right? If so, is there a need to let Xen think all the cores have the same set of features? I am not sure how much effort is needed to develop big.LITTLE guest support. Is this really needed?

>>This patchset is to use cpupool to block the vcpu from being scheduled between big and little cpus.
>>
>>>See for instance v->arch.actlr = READ_SYSREG32(ACTLR_EL1).
>>
>>Thanks for this. I only expose cpuid to the guest and missed actlr. I'll check the A53 and A72 TRMs about AArch64 implementation-defined registers. This actlr can be added to the cpupool_arch_info as midr is.
>>
>>Reading "vcpu_initialise", it seems only MIDR and ACTLR need to be handled. Please advise if I missed anything else.
>
>Have you checked the register emulation?

Checked MIDR. Have not checked the others. I think I missed some registers in ctxt_switch_to.
>>If we need to create all the pools at boot, we need to decide how many pools to create. I thought about this, but did not come up with a good idea.
>>
>>cpupool0 is defined in xen/common/cpupool.c; to create many pools, we would need to allocate cpupools dynamically at boot. I would not like to change a lot of common code.
>
>Why? We should avoid choosing a specific design just because the common code does not allow you to do it without heavy change.
>
>We never came across the big.LITTLE problem on x86, so it is normal to modify the code.
>
>>The implementation in this patchset is, I think, an easy way to let Big and Little CPUs all run.
>
>I care about having a design allowing an easy use of big.LITTLE on Xen. Your solution requires the administrator to know the underlying platform and create the pools. In the solution I suggested, the pools would be created by Xen.
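The boot-time pool creation Julien suggests could be sketched roughly as below: walk the booted CPUs and group them by MIDR, creating one pool per CPU type. This is an illustrative sketch only; the struct and function names (fake_pool, pool_for_midr, assign_cpu) and the bound MAX_POOLS are invented for this example and are not Xen's actual API.

```c
#include <stdint.h>

#define MAX_POOLS 4  /* illustrative upper bound on distinct CPU types */

struct fake_pool {
    uint32_t midr;      /* MIDR shared by every CPU in this pool */
    unsigned int ncpus; /* number of CPUs assigned so far */
};

static struct fake_pool pools[MAX_POOLS];
static unsigned int nr_pools;

/* Return the pool index for a CPU's MIDR, creating a pool on first sight. */
static int pool_for_midr(uint32_t midr)
{
    for (unsigned int i = 0; i < nr_pools; i++)
        if (pools[i].midr == midr)
            return (int)i;
    if (nr_pools == MAX_POOLS)
        return -1;  /* no room for another CPU type */
    pools[nr_pools].midr = midr;
    pools[nr_pools].ncpus = 0;
    return (int)nr_pools++;
}

/* Assign one booted CPU to the pool matching its MIDR. */
static int assign_cpu(uint32_t midr)
{
    int p = pool_for_midr(midr);
    if (p >= 0)
        pools[p].ncpus++;
    return p;
}
```

With the 4xA53 + 2xA72 example from the cover letter, this yields two pools without the administrator naming them, which is the behaviour Julien asks for. The MIDR constants used in testing are illustrative values for A53/A72-class cores.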
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
Hello,

On 19/09/2016 10:36, Peng Fan wrote:
>On Mon, Sep 19, 2016 at 10:09:06AM +0200, Julien Grall wrote:
>>Hello Peng,
>>
>>On 19/09/2016 04:08, van.free...@gmail.com wrote:
>>>From: Peng Fan
>>>
>>>This patchset is to support XEN running on a big.little SoC. The idea of the patch is from "https://lists.xenproject.org/archives/html/xen-devel/2016-05/msg00465.html". There are some changes to cpupool, and x86 stub functions are added to avoid build breakage. Sending the RFC patchset out is to request comments on whether this implementation is acceptable. The patchset has been tested based on xen-4.8 unstable on NXP i.MX8.
>>>
>>>I use Big/Little CPU and cpupool to explain the idea. A pool containing Big CPUs is called a Big Pool; a pool containing Little CPUs is called a Little Pool. If a pool does not contain any physical cpus, either Little CPUs or Big CPUs can be added to the cpupool, but the cpupool can not contain both Little and Big CPUs. The CPUs in a cpupool must have the same cpu type (MIDR value for ARM). CPUs can not be added to a cpupool which contains cpus of a different cpu type. Little CPUs can not be moved to a Big Pool if there are Big CPUs in the Big Pool, and vice versa. A domain in a Big Pool can not be migrated to a Little Pool, and vice versa. When XEN tries to bring up all the CPUs, only CPUs with the same cpu type (same MIDR value) are added into cpupool0.
>>
>>As mentioned in the mail you pointed to above, this series is not enough to make big.LITTLE work on Xen. Xen always uses the boot CPU to detect the list of features. With big.LITTLE the features may not be the same.
>>
>>And I would prefer to see Xen supporting big.LITTLE correctly before beginning to think about exposing big.LITTLE to userspace (via cpupool) automatically.
>
>Do you mean vcpus could be scheduled between big and little cpus freely?

By supporting big.LITTLE correctly I meant Xen thinks that all the cores have the same set of features. So the feature detection is only done on the boot CPU. See processor_setup for instance...
Moving vCPUs between big and little cores would be a hard task (cache line issues, and possibly features) and I don't expect to ever cross this in Xen. However, I am expecting to see big.LITTLE exposed to the guest (i.e. having big and little vCPUs).

>This patchset is to use cpupool to block the vcpu from being scheduled between big and little cpus.
>
>>See for instance v->arch.actlr = READ_SYSREG32(ACTLR_EL1).
>
>Thanks for this. I only expose cpuid to the guest and missed actlr. I'll check the A53 and A72 TRMs about AArch64 implementation-defined registers. This actlr can be added to the cpupool_arch_info as midr is.
>
>Reading "vcpu_initialise", it seems only MIDR and ACTLR need to be handled. Please advise if I missed anything else.

Have you checked the register emulation?

>>>Thinking of an SoC with 4 A53 (cpu[0-3]) + 2 A72 (cpu[4-5]), cpu0 is the first one that boots up. When XEN tries to bring up secondary CPUs, add cpu[0-3] to cpupool0 and leave cpu[4-5] not in any cpupool. Then when Dom0 boots up, `xl cpupool-list -c` will show cpu[0-3] in Pool-0.
>>>
>>>Then use the following script to create a new cpupool and add cpu[4-5] to the cpupool.
>>> # xl cpupool-create name="Pool-A72" sched="credit2"
>>> # xl cpupool-cpu-add Pool-A72 4
>>> # xl cpupool-cpu-add Pool-A72 5
>>> # xl create -d /root/xen/domu-test pool="Pool-A72"
>>
>>I am a bit confused with these runes. It means that only the first kind of CPUs has a pool assigned. Why don't you directly create all the pools at boot time?
>
>If we need to create all the pools at boot, we need to decide how many pools to create. I thought about this, but did not come up with a good idea.
>
>cpupool0 is defined in xen/common/cpupool.c; to create many pools, we would need to allocate cpupools dynamically at boot. I would not like to change a lot of common code.

Why? We should avoid choosing a specific design just because the common code does not allow you to do it without heavy change.

We never came across the big.LITTLE problem on x86, so it is normal to modify the code.
>The implementation in this patchset is, I think, an easy way to let Big and Little CPUs all run.

I care about having a design allowing an easy use of big.LITTLE on Xen. Your solution requires the administrator to know the underlying platform and create the pools. In the solution I suggested, the pools would be created by Xen (and the info exposed to the userspace for the admin).

Also, in which pool will a domain be created if none is specified?

>Now `xl cpupool-list -c` shows:
>Name       CPU list
>Pool-0     0,1,2,3
>Pool-A72   4,5
>
>`xl cpupool-list` shows:
>Name       CPUs   Sched    Active   Domain count
>Pool-0        4   credit       y              1
>Pool-A72      2   credit2      y              1
>
>`xl cpupool-cpu-remove Pool-A72 4`, then `xl cpupool-cpu-add Pool-0 4` does not succeed, because Pool-0 contains A53 CPUs, but CPU4 is an A72 CPU.
>
>`xl cpupool-migrate DomU Pool-0` will also fail, because DomU was created in Pool-A72 with A72 vcpus, while Pool-0 has A53 physical cpus.
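The add/remove failures described above follow one rule: a CPU may join a pool only if the pool is empty or the MIDRs match. A minimal sketch of that check, with invented names (pool_info, cpu_compatible_with_pool) that are not the patchset's actual API, and illustrative MIDR constants:

```c
#include <stdint.h>
#include <stdbool.h>

struct pool_info {
    bool has_cpus;  /* does the pool already contain physical CPUs? */
    uint32_t midr;  /* MIDR of the CPUs already in the pool, if any */
};

/* A CPU may be added to a pool iff the pool is empty or the types match. */
static bool cpu_compatible_with_pool(const struct pool_info *p,
                                     uint32_t cpu_midr)
{
    if (!p->has_cpus)
        return true;           /* an empty pool accepts any CPU type */
    return p->midr == cpu_midr; /* otherwise the MIDRs must be equal */
}
```

This is why `xl cpupool-cpu-add Pool-0 4` fails in the example: Pool-0 is non-empty with A53 MIDRs, and CPU4 carries an A72 MIDR.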
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
Hello Julien,

On Mon, Sep 19, 2016 at 10:09:06AM +0200, Julien Grall wrote:
>Hello Peng,
>
>On 19/09/2016 04:08, van.free...@gmail.com wrote:
>>From: Peng Fan
>>
>>This patchset is to support XEN running on a big.little SoC. The idea of the patch is from "https://lists.xenproject.org/archives/html/xen-devel/2016-05/msg00465.html". There are some changes to cpupool, and x86 stub functions are added to avoid build breakage. Sending the RFC patchset out is to request comments on whether this implementation is acceptable. The patchset has been tested based on xen-4.8 unstable on NXP i.MX8.
>>
>>I use Big/Little CPU and cpupool to explain the idea. A pool containing Big CPUs is called a Big Pool; a pool containing Little CPUs is called a Little Pool. If a pool does not contain any physical cpus, either Little CPUs or Big CPUs can be added to the cpupool, but the cpupool can not contain both Little and Big CPUs. The CPUs in a cpupool must have the same cpu type (MIDR value for ARM). CPUs can not be added to a cpupool which contains cpus of a different cpu type. Little CPUs can not be moved to a Big Pool if there are Big CPUs in the Big Pool, and vice versa. A domain in a Big Pool can not be migrated to a Little Pool, and vice versa. When XEN tries to bring up all the CPUs, only CPUs with the same cpu type (same MIDR value) are added into cpupool0.
>
>As mentioned in the mail you pointed to above, this series is not enough to make big.LITTLE work on Xen. Xen always uses the boot CPU to detect the list of features. With big.LITTLE the features may not be the same.
>
>And I would prefer to see Xen supporting big.LITTLE correctly before beginning to think about exposing big.LITTLE to userspace (via cpupool) automatically.

Do you mean vcpus could be scheduled between big and little cpus freely? This patchset is to use cpupool to block the vcpu from being scheduled between big and little cpus.

>See for instance v->arch.actlr = READ_SYSREG32(ACTLR_EL1).

Thanks for this.
I only expose cpuid to the guest and missed actlr. I'll check the A53 and A72 TRMs about AArch64 implementation-defined registers. This actlr can be added to the cpupool_arch_info as midr is.

Reading "vcpu_initialise", it seems only MIDR and ACTLR need to be handled. Please advise if I missed anything else.

>>Thinking of an SoC with 4 A53 (cpu[0-3]) + 2 A72 (cpu[4-5]), cpu0 is the first one that boots up. When XEN tries to bring up secondary CPUs, add cpu[0-3] to cpupool0 and leave cpu[4-5] not in any cpupool. Then when Dom0 boots up, `xl cpupool-list -c` will show cpu[0-3] in Pool-0.
>>
>>Then use the following script to create a new cpupool and add cpu[4-5] to the cpupool.
>> # xl cpupool-create name="Pool-A72" sched="credit2"
>> # xl cpupool-cpu-add Pool-A72 4
>> # xl cpupool-cpu-add Pool-A72 5
>> # xl create -d /root/xen/domu-test pool="Pool-A72"
>
>I am a bit confused with these runes. It means that only the first kind of CPUs has a pool assigned. Why don't you directly create all the pools at boot time?

If we need to create all the pools at boot, we need to decide how many pools to create. I thought about this, but did not come up with a good idea.

cpupool0 is defined in xen/common/cpupool.c; to create many pools, we would need to allocate cpupools dynamically at boot. I would not like to change a lot of common code.

The implementation in this patchset is, I think, an easy way to let Big and Little CPUs all run.

>Also, in which pool will a domain be created if none is specified?
>
>>Now `xl cpupool-list -c` shows:
>>Name       CPU list
>>Pool-0     0,1,2,3
>>Pool-A72   4,5
>>
>>`xl cpupool-list` shows:
>>Name       CPUs   Sched    Active   Domain count
>>Pool-0        4   credit       y              1
>>Pool-A72      2   credit2      y              1
>>
>>`xl cpupool-cpu-remove Pool-A72 4`, then `xl cpupool-cpu-add Pool-0 4` does not succeed, because Pool-0 contains A53 CPUs, but CPU4 is an A72 CPU.
>>
>>`xl cpupool-migrate DomU Pool-0` will also fail, because DomU was created in Pool-A72 with A72 vcpus, while Pool-0 has A53 physical cpus.
>>Patch 1/5:
>>use "cpumask_weight(cpupool0->cpu_valid);" to replace "num_online_cpus()", because num_online_cpus() counts all the online CPUs, but now we only need the Big or Little CPUs.
>
>So if I understand correctly, if the boot CPU is a little CPU, Dom0 will only ever be able to use the little ones. Is that right?

Yeah. Dom0 only uses the little ones.

Thanks,
Peng.

>Regards,
>
>--
>Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
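The Patch 1/5 point is just a counting change: size Dom0's vcpus from the bits set in cpupool0's cpu_valid mask rather than from all online CPUs. A real Xen cpumask is a bitmap array; in this sketch a single 64-bit word stands in for it, and mask_weight() mirrors what cpumask_weight() does (a population count). The names here are illustrative, not Xen's.

```c
#include <stdint.h>

/* Count the set bits in a mask, as cpumask_weight() does for a cpumask. */
static unsigned int mask_weight(uint64_t mask)
{
    unsigned int n = 0;
    while (mask) {
        mask &= mask - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

With cpus 0-5 online but only the four little cpus 0-3 in cpupool0, num_online_cpus() would report 6 while the cpupool0 weight reports 4, which is the number Dom0's max vcpus should follow.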
Re: [Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
Hello Peng,

On 19/09/2016 04:08, van.free...@gmail.com wrote:
>From: Peng Fan
>
>This patchset is to support XEN running on a big.little SoC. The idea of the patch is from "https://lists.xenproject.org/archives/html/xen-devel/2016-05/msg00465.html". There are some changes to cpupool, and x86 stub functions are added to avoid build breakage. Sending the RFC patchset out is to request comments on whether this implementation is acceptable. The patchset has been tested based on xen-4.8 unstable on NXP i.MX8.
>
>I use Big/Little CPU and cpupool to explain the idea. A pool containing Big CPUs is called a Big Pool; a pool containing Little CPUs is called a Little Pool. If a pool does not contain any physical cpus, either Little CPUs or Big CPUs can be added to the cpupool, but the cpupool can not contain both Little and Big CPUs. The CPUs in a cpupool must have the same cpu type (MIDR value for ARM). CPUs can not be added to a cpupool which contains cpus of a different cpu type. Little CPUs can not be moved to a Big Pool if there are Big CPUs in the Big Pool, and vice versa. A domain in a Big Pool can not be migrated to a Little Pool, and vice versa. When XEN tries to bring up all the CPUs, only CPUs with the same cpu type (same MIDR value) are added into cpupool0.

As mentioned in the mail you pointed to above, this series is not enough to make big.LITTLE work on Xen. Xen always uses the boot CPU to detect the list of features. With big.LITTLE the features may not be the same.

And I would prefer to see Xen supporting big.LITTLE correctly before beginning to think about exposing big.LITTLE to userspace (via cpupool) automatically.

See for instance v->arch.actlr = READ_SYSREG32(ACTLR_EL1).

>Thinking of an SoC with 4 A53 (cpu[0-3]) + 2 A72 (cpu[4-5]), cpu0 is the first one that boots up. When XEN tries to bring up secondary CPUs, add cpu[0-3] to cpupool0 and leave cpu[4-5] not in any cpupool. Then when Dom0 boots up, `xl cpupool-list -c` will show cpu[0-3] in Pool-0.
>Then use the following script to create a new cpupool and add cpu[4-5] to the cpupool.
> # xl cpupool-create name="Pool-A72" sched="credit2"
> # xl cpupool-cpu-add Pool-A72 4
> # xl cpupool-cpu-add Pool-A72 5
> # xl create -d /root/xen/domu-test pool="Pool-A72"

I am a bit confused with these runes. It means that only the first kind of CPUs has a pool assigned. Why don't you directly create all the pools at boot time?

Also, in which pool will a domain be created if none is specified?

>Now `xl cpupool-list -c` shows:
>Name       CPU list
>Pool-0     0,1,2,3
>Pool-A72   4,5
>
>`xl cpupool-list` shows:
>Name       CPUs   Sched    Active   Domain count
>Pool-0        4   credit       y              1
>Pool-A72      2   credit2      y              1
>
>`xl cpupool-cpu-remove Pool-A72 4`, then `xl cpupool-cpu-add Pool-0 4` does not succeed, because Pool-0 contains A53 CPUs, but CPU4 is an A72 CPU.
>
>`xl cpupool-migrate DomU Pool-0` will also fail, because DomU was created in Pool-A72 with A72 vcpus, while Pool-0 has A53 physical cpus.
>
>Patch 1/5:
>use "cpumask_weight(cpupool0->cpu_valid);" to replace "num_online_cpus()", because num_online_cpus() counts all the online CPUs, but now we only need the Big or Little CPUs.

So if I understand correctly, if the boot CPU is a little CPU, Dom0 will only ever be able to use the little ones. Is that right?

Regards,

--
Julien Grall
[Xen-devel] [RFC 0/5] xen/arm: support big.little SoC
From: Peng Fan

This patchset is to support XEN running on a big.little SoC. The idea of the patch is from "https://lists.xenproject.org/archives/html/xen-devel/2016-05/msg00465.html". There are some changes to cpupool, and x86 stub functions are added to avoid build breakage. Sending the RFC patchset out is to request comments on whether this implementation is acceptable. The patchset has been tested based on xen-4.8 unstable on NXP i.MX8.

I use Big/Little CPU and cpupool to explain the idea. A pool containing Big CPUs is called a Big Pool; a pool containing Little CPUs is called a Little Pool. If a pool does not contain any physical cpus, either Little CPUs or Big CPUs can be added to the cpupool, but the cpupool can not contain both Little and Big CPUs. The CPUs in a cpupool must have the same cpu type (MIDR value for ARM). CPUs can not be added to a cpupool which contains cpus of a different cpu type. Little CPUs can not be moved to a Big Pool if there are Big CPUs in the Big Pool, and vice versa. A domain in a Big Pool can not be migrated to a Little Pool, and vice versa. When XEN tries to bring up all the CPUs, only CPUs with the same cpu type (same MIDR value) are added into cpupool0.

Thinking of an SoC with 4 A53 (cpu[0-3]) + 2 A72 (cpu[4-5]), cpu0 is the first one that boots up. When XEN tries to bring up secondary CPUs, add cpu[0-3] to cpupool0 and leave cpu[4-5] not in any cpupool. Then when Dom0 boots up, `xl cpupool-list -c` will show cpu[0-3] in Pool-0.

Then use the following script to create a new cpupool and add cpu[4-5] to the cpupool.
 # xl cpupool-create name="Pool-A72" sched="credit2"
 # xl cpupool-cpu-add Pool-A72 4
 # xl cpupool-cpu-add Pool-A72 5
 # xl create -d /root/xen/domu-test pool="Pool-A72"

Now `xl cpupool-list -c` shows:
Name       CPU list
Pool-0     0,1,2,3
Pool-A72   4,5

`xl cpupool-list` shows:
Name       CPUs   Sched    Active   Domain count
Pool-0        4   credit       y              1
Pool-A72      2   credit2      y              1

`xl cpupool-cpu-remove Pool-A72 4`, then `xl cpupool-cpu-add Pool-0 4` does not succeed, because Pool-0 contains A53 CPUs, but CPU4 is an A72 CPU.

`xl cpupool-migrate DomU Pool-0` will also fail, because DomU was created in Pool-A72 with A72 vcpus, while Pool-0 has A53 physical cpus.

Patch 1/5:
use "cpumask_weight(cpupool0->cpu_valid);" to replace "num_online_cpus()", because num_online_cpus() counts all the online CPUs, but now we only need the Big or Little CPUs.

Patch 2/5:
Introduce cpupool_arch_info. For an ARM SoC, we need to add MIDR info to the cpupool. The info will be used in patches [3,4,5]/5.

Patch 3/5:
Check whether it is ok to add a physical cpu to a cpupool. When the cpupool does not contain any physical cpus, it is ok to add a cpu to the cpupool regardless of the cpu type. Also check whether it is ok to move a domain to another cpupool.

Patch 4/5:
Move vpidr from arch_domain to arch_vcpu. The vpidr in arch_domain is initialized in arch_domain_create; at this time, the domain is still in cpupool0, not yet moved to the specified cpupool. We need to initialize vpidr later, but at that late stage there is no method to initialize vpidr in arch_domain, so I moved it to arch_vcpu.

Patch 5/5:
This is to check whether it is ok to move a domain to another cpupool.
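Patches 4/5 and 5/5 together express one invariant: a domain records the CPU type it was created on (via its vCPUs' VPIDR, moved into arch_vcpu), and migration to another pool is allowed only when that type matches the target pool's MIDR. A minimal sketch of the check, with invented names (demo_vcpu, demo_pool, domain_pool_compatible) and illustrative MIDR constants, not the patchset's actual code:

```c
#include <stdint.h>
#include <stdbool.h>

struct demo_vcpu { uint32_t vpidr; };  /* CPU type the vCPU was created with */
struct demo_pool { uint32_t midr; };   /* CPU type of the pool's physical CPUs */

/* A domain may move to a pool only if its vCPUs' VPIDR matches the
 * pool's MIDR, i.e. the target pool has the same physical CPU type. */
static bool domain_pool_compatible(const struct demo_vcpu *v,
                                   const struct demo_pool *p)
{
    return v->vpidr == p->midr;
}
```

This is the rule behind the failing `xl cpupool-migrate DomU Pool-0` above: DomU's vCPUs carry an A72 VPIDR, while Pool-0's MIDR is A53.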
Peng Fan (5):
  xen/arm: domain_build: setting opt_dom0_max_vcpus according to cpupool0 info
  xen: cpupool: introduce cpupool_arch_info
  xen: cpupool: add arch cpupool hook
  xen/arm: move vpidr from arch_domain to arch_vcpu
  xen/arm: cpupool: implement arch_domain_cpupool_compatible

 xen/arch/arm/Makefile         |  1 +
 xen/arch/arm/cpupool.c        | 60 +++
 xen/arch/arm/domain.c         |  9 ---
 xen/arch/arm/domain_build.c   |  3 ++-
 xen/arch/arm/traps.c          |  2 +-
 xen/arch/x86/cpu/Makefile     |  1 +
 xen/arch/x86/cpu/cpupool.c    | 30 ++
 xen/common/cpupool.c          | 30 ++
 xen/include/asm-arm/cpupool.h | 16
 xen/include/asm-arm/domain.h  |  9 ---
 xen/include/asm-x86/cpupool.h | 16
 xen/include/xen/sched-if.h    |  5
 12 files changed, 173 insertions(+), 9 deletions(-)
 create mode 100644 xen/arch/arm/cpupool.c
 create mode 100644 xen/arch/x86/cpu/cpupool.c
 create mode 100644 xen/include/asm-arm/cpupool.h
 create mode 100644 xen/include/asm-x86/cpupool.h

--
2.6.6