Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-25 Thread Yi Sun
On 17-04-25 02:24:40, Jan Beulich wrote:
> >>> On 25.04.17 at 09:15,  wrote:
> > Sorry, this may cause a potential issue and is not a good example. But from a
> > SW view, there is another case where per-socket support is important:
> > real-time scenarios. You may have a real-time domain on one socket that
> > requires different masks (especially for code/data) to guarantee its
> > performance vs. other general-purpose domains running on a different socket.
> > In that case it is a heterogeneous software usage model (rather than
> > heterogeneous hardware), and we should not force the same mask settings on
> > different sockets.
> 
> I don't follow: The COS IDs for the real-time and general purpose
> domains would be different, wouldn't they? Thus there would be
> different masks in use, as intended.
> 
Yes, you are right. But in the case above, the real-time domain only runs on one
socket. If per-socket support is enabled, we can allocate that COS ID to other
domains on the other sockets, which helps make better use of the scarce cache.

> Jan



Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-25 Thread Jan Beulich
>>> On 25.04.17 at 09:15,  wrote:
> Sorry, this may cause a potential issue and is not a good example. But from a
> SW view, there is another case where per-socket support is important:
> real-time scenarios. You may have a real-time domain on one socket that
> requires different masks (especially for code/data) to guarantee its
> performance vs. other general-purpose domains running on a different socket.
> In that case it is a heterogeneous software usage model (rather than
> heterogeneous hardware), and we should not force the same mask settings on
> different sockets.

I don't follow: The COS IDs for the real-time and general purpose
domains would be different, wouldn't they? Thus there would be
different masks in use, as intended.

Jan



Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-25 Thread Yi Sun
On 17-04-24 00:55:38, Jan Beulich wrote:
> >>> On 24.04.17 at 08:40,  wrote:
> > As we discussed on IRC last Friday, here are answers to your two
> > remaining questions:
> > 1. Why is the domain setting designed to be per-socket; is there a reason?
> > Answer: There is a real case from an Intel customer. HSX (Haswell server)
> > and BDX (Broadwell server) processors are plugged into the sockets of a
> > Grantley platform. HSX does not support CAT but BDX does.
> > 
> > You and Chao Peng discussed this before, for the CAT feature enabling patches.
> > Support for such asymmetry was agreed there.
> 
> I don't see any agreement in those threads. The first sub-thread is
> merely mentioning this configuration, while the second is only about
> nr_sockets calculation.
> 
> > http://markmail.org/message/xcq5odezfngszvcb#query:+page:1+mid:smfz7fbatbnxs3ti+state:results
> > http://markmail.org/message/xcq5odezfngszvcb#query:+page:1+mid:wlovqpg7oj63ejte+state:results
> > 
> > 2. Why can the previous settings of the domains not be kept when a socket
> > comes online?
> > Answer: If asymmetric systems are supported, we cannot assume the existing
> > configuration can be applied to the new socket.
> 
> I'm afraid we'd have problems elsewhere if we tried to run Xen on
> a mixed-model system. Unless you can prove PSR/CAT is the only
> relevant (read: affecting Xen's behavior) hardware difference
> between Haswell and Broadwell (for example, isn't the former
> SMEP only, but the latter SMEP+SMAP?), I don't buy this as a
> reason to have more complicated than necessary code.
> 
Sorry, this may cause a potential issue and is not a good example. But from a
SW view, there is another case where per-socket support is important:
real-time scenarios. You may have a real-time domain on one socket that
requires different masks (especially for code/data) to guarantee its
performance vs. other general-purpose domains running on a different socket.
In that case it is a heterogeneous software usage model (rather than
heterogeneous hardware), and we should not force the same mask settings on
different sockets. CLOS are a scarce software resource which should not be
wasted; one of the most important reasons for managing CLOS independently
across sockets is to preserve flexibility in how this scarce resource is used.
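
To make that concrete, here is a minimal sketch (only the per-socket
'psr_cos_ids[]' indexing follows this series; every other name is
illustrative): because COS IDs and masks are tracked per socket, the COS ID
held by the real-time domain on socket 0 does not consume the same COS ID on
socket 1.

/*
 * Illustrative sketch only; apart from the per-socket psr_cos_ids[] idea
 * discussed in this series, all names here are hypothetical.
 */
struct toy_domain {
    unsigned int *psr_cos_ids;        /* one COS ID per socket */
};

static void toy_assign(struct toy_domain *rt, struct toy_domain *gp)
{
    rt->psr_cos_ids[0] = 1;   /* RT domain: COS 1 on socket 0, its own masks */
    gp->psr_cos_ids[0] = 0;   /* GP domain: default COS 0 on socket 0 */
    gp->psr_cos_ids[1] = 1;   /* COS 1 on socket 1 is still free for reuse */
}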

BRs,
Sun Yi



Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-24 Thread Jan Beulich
>>> On 24.04.17 at 08:40,  wrote:
> As we discussed on IRC last Friday, here are answers to your two
> remaining questions:
> 1. Why is the domain setting designed to be per-socket; is there a reason?
> Answer: There is a real case from an Intel customer. HSX (Haswell server)
> and BDX (Broadwell server) processors are plugged into the sockets of a
> Grantley platform. HSX does not support CAT but BDX does.
> 
> You and Chao Peng discussed this before, for the CAT feature enabling patches.
> Support for such asymmetry was agreed there.

I don't see any agreement in those threads. The first sub-thread is
merely mentioning this configuration, while the second is only about
nr_sockets calculation.

> http://markmail.org/message/xcq5odezfngszvcb#query:+page:1+mid:smfz7fbatbnxs3ti+state:results
> http://markmail.org/message/xcq5odezfngszvcb#query:+page:1+mid:wlovqpg7oj63ejte+state:results
> 
> 2. Why can the previous settings of the domains not be kept when a socket
> comes online?
> Answer: If asymmetric systems are supported, we cannot assume the existing
> configuration can be applied to the new socket.

I'm afraid we'd have problems elsewhere if we tried to run Xen on
a mixed-model system. Unless you can prove PSR/CAT is the only
relevant (read: affecting Xen's behavior) hardware difference
between Haswell and Broadwell (for example, isn't the former
SMEP only, but the latter SMEP+SMAP?), I don't buy this as a
reason to have more complicated than necessary code.

Jan




Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-24 Thread Yi Sun
On 17-04-21 00:18:27, Jan Beulich wrote:
> >>> On 20.04.17 at 04:14,  wrote:
> > On 17-04-19 03:00:29, Jan Beulich wrote:
> >> >>> On 19.04.17 at 10:22,  wrote:
> >> > On 17-04-18 05:46:43, Jan Beulich wrote:
> >> >> >>> On 18.04.17 at 12:55,  wrote:
> >> >> > I made a test patch based on v10 and attached it in mail. Could you 
> >> >> > please
> >> >> > help to check it? Thanks!
> >> >> 
> >> >> This looks reasonable at the first glance, albeit I continue to be
> >> >> unconvinced that this is the only (reasonable) way of solving the
> > 
> > Do you have any other suggestion on this? Thanks!
> > 
> >> >> problem. After all we don't have to go through similar hoops for
> >> >> any other of the register state associated with a vCPU. There
> >> > 
> >> > Sorry, I do not understand your meaning clearly.
> >> > 1. DYM the ASSOC register which is set in 'psr_ctxt_switch_to'? In this 
> >> > patch,
> >> >we check 'dom_ids' array to know if the domain's cos id has not been 
> >> > set but
> >> >its 'psr_cos_ids[socket]' already has a non zero value. This case 
> >> > means the
> >> >socket offline has happened so that we have to restore the domain's
> >> >'psr_cos_ids[socket]' to default value 0 which points to default COS 
> >> > MSR.
> >> >I think we have discussed this in previous mails and achieved 
> >> > agreement on
> >> >such logic. 
> >> > 2. DYM the COS MSRs? We have discussed this before. Per your comments, 
> >> > when
> >> >socket is online, the registers values may be modified by FW and 
> >> > others.
> >> >So, we have to restore them to default value which is being done in
> >> >'cat_init_feature'.
> >> > 
> >> > So, what is your exactly meaning here? Thanks!
> >> 
> >> Neither of the two. I'm comparing with COS/PSR-_unrelated_
> >> handling of register state. After all there are other MSRs which
> >> we need to put into the right state after a core comes online.
> >> 
> > For PSR feature, the 'cos_reg_val[]' of the feature is destroyed when socket
> > is offline. The values in it are all 0 when socket is online again. This 
> > causes
> > error when user shows the CBMs. So, we have two options when socket is 
> > online:
> > 1. Read COS MSRs values and save them into 'cos_reg_val[]'.
> > 2. Restore COS MSRs values and 'cos_reg_val[]' values to default CBM.
> > 
> > Per our discussion on v9 patch 05, we decided to adopt option 2.
> 
> Btw., having thought some more about this, putting code into the
> context switch path when there is an alternative is probably the
> wrong thing after all, i.e. if special treatment _is_ really needed,
> doing it in less frequently executed code would likely be better.
> But as before - much depends on clarifying why special treatment
> is needed here, but not elsewhere (and to avoid questions, with
> "elsewhere" I mean outside of PSR/CAT code - there's plenty of
> other CPU register state to take as reference).
> 
Hi, Jan,

As we discussed on IRC last Friday, here are answers to your two
remaining questions:
1. Why is the domain setting designed to be per-socket; is there a reason?
Answer: There is a real case from an Intel customer. HSX (Haswell server)
and BDX (Broadwell server) processors are plugged into the sockets of a
Grantley platform. HSX does not support CAT but BDX does.

You and Chao Peng discussed this before, for the CAT feature enabling patches.
Support for such asymmetry was agreed there.
http://markmail.org/message/xcq5odezfngszvcb#query:+page:1+mid:smfz7fbatbnxs3ti+state:results
http://markmail.org/message/xcq5odezfngszvcb#query:+page:1+mid:wlovqpg7oj63ejte+state:results

2. Why can the previous settings of the domains not be kept when a socket comes online?
Answer: If asymmetric systems are supported, we cannot assume the existing
configuration can be applied to the new socket.
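
For illustration, a hedged sketch of what such asymmetry means for per-socket
code (the 'features[]' array appears in the test patch later in this thread;
the function, parameter and error value here are assumptions): every
per-socket operation has to tolerate a socket on which CAT was never
initialised.

/*
 * Hedged sketch, not the series' code: on an asymmetric platform only the
 * sockets that actually report CAT get a feature node allocated, so callers
 * must handle its absence per socket.
 */
static int toy_check_feat(const struct psr_socket_info *info,
                          unsigned int feat_type)
{
    if ( !info->features[feat_type] )
        return -ENOENT;   /* this feature is not available on this socket */

    return 0;
}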

BRs,
Sun Yi



Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-21 Thread Jan Beulich
>>> On 20.04.17 at 04:14,  wrote:
> On 17-04-19 03:00:29, Jan Beulich wrote:
>> >>> On 19.04.17 at 10:22,  wrote:
>> > On 17-04-18 05:46:43, Jan Beulich wrote:
>> >> >>> On 18.04.17 at 12:55,  wrote:
>> >> > I made a test patch based on v10 and attached it in mail. Could you 
>> >> > please
>> >> > help to check it? Thanks!
>> >> 
>> >> This looks reasonable at the first glance, albeit I continue to be
>> >> unconvinced that this is the only (reasonable) way of solving the
> 
> Do you have any other suggestion on this? Thanks!
> 
>> >> problem. After all we don't have to go through similar hoops for
>> >> any other of the register state associated with a vCPU. There
>> > 
>> > Sorry, I do not understand your meaning clearly.
>> > 1. DYM the ASSOC register which is set in 'psr_ctxt_switch_to'? In this 
>> > patch,
>> >we check 'dom_ids' array to know if the domain's cos id has not been 
>> > set but
>> >its 'psr_cos_ids[socket]' already has a non zero value. This case means 
>> > the
>> >socket offline has happened so that we have to restore the domain's
>> >'psr_cos_ids[socket]' to default value 0 which points to default COS 
>> > MSR.
>> >I think we have discussed this in previous mails and achieved agreement 
>> > on
>> >such logic. 
>> > 2. DYM the COS MSRs? We have discussed this before. Per your comments, when
>> >socket is online, the registers values may be modified by FW and others.
>> >So, we have to restore them to default value which is being done in
>> >'cat_init_feature'.
>> > 
>> > So, what is your exactly meaning here? Thanks!
>> 
>> Neither of the two. I'm comparing with COS/PSR-_unrelated_
>> handling of register state. After all there are other MSRs which
>> we need to put into the right state after a core comes online.
>> 
> For PSR feature, the 'cos_reg_val[]' of the feature is destroyed when socket
> is offline. The values in it are all 0 when socket is online again. This 
> causes
> error when user shows the CBMs. So, we have two options when socket is online:
> 1. Read COS MSRs values and save them into 'cos_reg_val[]'.
> 2. Restore COS MSRs values and 'cos_reg_val[]' values to default CBM.
> 
> Per our discussion on v9 patch 05, we decided to adopt option 2.

Btw., having thought some more about this, putting code into the
context switch path when there is an alternative is probably the
wrong thing after all, i.e. if special treatment _is_ really needed,
doing it in less frequently executed code would likely be better.
But as before - much depends on clarifying why special treatment
is needed here, but not elsewhere (and to avoid questions, with
"elsewhere" I mean outside of PSR/CAT code - there's plenty of
other CPU register state to take as reference).

Jan




Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-21 Thread Jan Beulich
>>> On 20.04.17 at 18:52,  wrote:
> So to summarise: 
> 
> On item 1: it appears that you are looking for some more justification for why
> some of the changes were made, maybe with a rationale for some of the choices
> that were made. Given that this is quite a complex series which has diverged
> quite a lot from the design, the goal is to make it easier for either you (or
> someone else) to sanity-check the proposal, which on the face of things looks
> OK. But you have some doubts and you can't easily check against the design as
> it is out-of-date.
> 
> On item 2: you think something may not be quite right, but you can't really 
> decide until a couple of questions (not quite sure which, but I am sure Yi 
> can locate them) are answered.
> 
> Let me know whether this is actually true. 

Well, afaict both actually boil down to the same single question
regarding the special handling of CAT MSRs after onlining (at
runtime) a core on a socket all of whose cores had been offline,
namely considering that other CPU registers don't require any
such special treatment (in context switch code or elsewhere).

As to the design possibly being out of date - I have to admit I
didn't even check whether the accompanying documentation
has been kept up to date with the actual code changes. The
matter here really isn't with comparing with the design, but
rather whether the design choice (written down or not) was
an appropriate one.

Jan




Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-20 Thread Konrad Rzeszutek Wilk
> If this reaction of mine is not acceptable, all I can do is refrain
> from further looking at this series. And Yi, I certainly apologize
> for perhaps not doing these reviews wholeheartedly, since -
> as also expressed before - I continue to not really view this as
> very important functionality. Yet considering for how long some
> of the versions hadn't been looked at by anyone at all, the
> alternative would have been to simply let it sit further without
> looking at it. I actually take this lack of interest by others as an
> indication that I'm not the only one considering this nice to have,
> but no more.

I do have an interest in this series. And I can certainly give it
a review - but once you get your teeth into a patchset I feel it is a
bit counterintuitive for me to review it too - as you do a much, much
better job than I could have.



Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-20 Thread Lars Kurth

> On 20 Apr 2017, at 14:21, Jan Beulich  wrote:
> 
 On 20.04.17 at 15:02,  wrote:
>>> On 20 Apr 2017, at 10:43, Jan Beulich  wrote:
>>> 
>> On 20.04.17 at 04:14,  wrote:
 On 17-04-19 03:00:29, Jan Beulich wrote:
 On 19.04.17 at 10:22,  wrote:
>> On 17-04-18 05:46:43, Jan Beulich wrote:
>> On 18.04.17 at 12:55,  wrote:
 I made a test patch based on v10 and attached it in mail. Could you 
 please
 help to check it? Thanks!
>>> 

[Item 1]

>>> This looks reasonable at the first glance, albeit I continue to be
>>> unconvinced that this is the only (reasonable) way of solving the
 
 Do you have any other suggestion on this? Thanks!
>>> 
>>> I'm sorry, but no. I'm already spending _much_ more time on this
>>> series than I should be, considering its (afaic) relatively low
>>> priority. I really have to ask you to properly think through both
>>> the data layout and code structure, including considering
>>> alternatives especially in places where you can _anticipate_
>>> controversy.
>> 
>> Jan, can you confirm whether you are happy with the current proposal. You 
>> say "this looks reasonable at the first glance", but then go on to say that 
>> there may be other "reasonable ways" of solving the problem at hand.
>> 
>> This is not very concrete and hard to respond to from a contributors point 
>> of view: it would be good to establish whether a) the v10 proposal in this 
>> area is good enough, b) whether there are any concrete improvements to be 
>> made.

> I understand it's not very concrete, but please understand that with
> the over 100 patches wanting looking at right now it is simply
> impossible for me to give precise suggestions everywhere. I really
> have to be allowed to defer to the originator to come up with
> possible better mechanisms (or defend what there is by addressing
> questions raised),

Jan, I don't object to the principle of deferring issues to a contributor, for
the contributor to defend their viewpoint or to come up with a better mechanism.
I just observed that I could not make a lot of sense of what you were looking
for in this particular review. I am assuming that it would be even harder for Yi.

> especially with - as said - the amount of time spent
> here already having been way higher than is justifiable.

And of course we have passed the 4.9 code freeze, so some of the pressure is 
off. At the same time I understand that because of the upcoming releases you 
need to focus on bug fixes, etc.

> Just go and
> compare v10 with one of the initial versions: Almost all of the data
> layout and code flow have fundamentally changed, mostly based on
> feedback I gave. I'm sorry for saying that, but to me this is an
> indication that things hadn't been thought through well in the design
> phase, i.e. before even submitting a first RFC.

That is good feedback which may contain some valuable lessons. Once we are 
through this (or maybe at the summit) it may be worthwhile to look at what has 
gone wrong and see how we can do better in future.

[Item 2]

>> You say please think through whether "you can _anticipate_ controversy", but 
>> at the same time you can't currently identify/think of any issues. That 
>> seems 
>> to suggest to me that maybe the proposal is good enough. Or that something 
>> is 
>> missing, which has not been articulated. Taking into account language 
>> barriers, more clarity would probably be helpful.
> 
> I've given criteria by which I have the gut feeling (but no more)
> that this isn't the right approach. I'm absolutely fine to be
> convinced that my gut feeling is wrong. That would require to
> simply answer the question I raised multiple times, and which was
> repeatedly "answered" by simply re-stating previously expressed
> facts.

I have not followed the full thread. But it seems that we have a communication
issue here. Normally this happens when expectations don't quite match and one
party (in this case Yi) does not quite get what the other one is looking for. 
Maybe the best approach would be for Yi to get some of these things resolved 
during a short IRC conversation with you. I did see him and others resolve some 
previous issues more effectively on IRC. 

> If this reaction of mine is not acceptable, all I can do is refrain
> from further looking at this series. And Yi, I certainly apologize
> for perhaps not doing these reviews wholeheartedly, since -
> as also expressed before - I continue to not really view this as
> very important functionality.

As I said earlier, I stepped in, as I didn't really understand what was going 
on. I think this is a little clearer now. 

So to summarise: 

On item 1: it appears that you are looking for some more justification for why
some of the changes were made, maybe with a rationale for some of the choices 

Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-20 Thread Jan Beulich
>>> On 20.04.17 at 15:02,  wrote:
>> On 20 Apr 2017, at 10:43, Jan Beulich  wrote:
>> 
> On 20.04.17 at 04:14,  wrote:
>>> On 17-04-19 03:00:29, Jan Beulich wrote:
>>> On 19.04.17 at 10:22,  wrote:
> On 17-04-18 05:46:43, Jan Beulich wrote:
> On 18.04.17 at 12:55,  wrote:
>>> I made a test patch based on v10 and attached it in mail. Could you 
>>> please
>>> help to check it? Thanks!
>> 
>> This looks reasonable at the first glance, albeit I continue to be
>> unconvinced that this is the only (reasonable) way of solving the
>>> 
>>> Do you have any other suggestion on this? Thanks!
>> 
>> I'm sorry, but no. I'm already spending _much_ more time on this
>> series than I should be, considering its (afaic) relatively low
>> priority. I really have to ask you to properly think through both
>> the data layout and code structure, including considering
>> alternatives especially in places where you can _anticipate_
>> controversy.
> 
> Jan, can you confirm whether you are happy with the current proposal. You 
> say "this looks reasonable at the first glance", but then go on to say that 
> there may be other "reasonable ways" of solving the problem at hand.
> 
> This is not very concrete and hard to respond to from a contributors point 
> of view: it would be good to establish whether a) the v10 proposal in this 
> area is good enough, b) whether there are any concrete improvements to be 
> made.

I understand it's not very concrete, but please understand that with
the over 100 patches wanting looking at right now it is simply
impossible for me to give precise suggestions everywhere. I really
have to be allowed to defer to the originator to come up with
possible better mechanisms (or defend what there is by addressing
questions raised), especially with - as said - the amount of time spent
here already having been way higher than is justifiable. Just go and
compare v10 with one of the initial versions: Almost all of the data
layout and code flow have fundamentally changed, mostly based on
feedback I gave. I'm sorry for saying that, but to me this is an
indication that things hadn't been thought through well in the design
phase, i.e. before even submitting a first RFC.

> You say please think through whether "you can _anticipate_ controversy", but 
> at the same time you can't currently identify/think of any issues. That seems 
> to suggest to me that maybe the proposal is good enough. Or that something is 
> missing, which has not been articulated. Taking into account language 
> barriers, more clarity would probably be helpful.

I've given criteria by which I have the gut feeling (but no more)
that this isn't the right approach. I'm absolutely fine to be
convinced that my gut feeling is wrong. That would require to
simply answer the question I raised multiple times, and which was
repeatedly "answered" by simply re-stating previously expressed
facts.

If this reaction of mine is not acceptable, all I can do is refrain
from further looking at this series. And Yi, I certainly apologize
for perhaps not doing these reviews wholeheartedly, since -
as also expressed before - I continue to not really view this as
very important functionality. Yet considering for how long some
of the versions hadn't been looked at by anyone at all, the
alternative would have been to simply let it sit further without
looking at it. I actually take this lack of interest by others as an
indication that I'm not the only one considering this nice to have,
but no more.

Jan




Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-20 Thread Lars Kurth
Apologies for stepping in here. But it feels to me that this thread is at risk 
of becoming unproductive.

> On 20 Apr 2017, at 10:43, Jan Beulich  wrote:
> 
 On 20.04.17 at 04:14,  wrote:
>> On 17-04-19 03:00:29, Jan Beulich wrote:
>> On 19.04.17 at 10:22,  wrote:
 On 17-04-18 05:46:43, Jan Beulich wrote:
 On 18.04.17 at 12:55,  wrote:
>> I made a test patch based on v10 and attached it in mail. Could you 
>> please
>> help to check it? Thanks!
> 
> This looks reasonable at the first glance, albeit I continue to be
> unconvinced that this is the only (reasonable) way of solving the
>> 
>> Do you have any other suggestion on this? Thanks!
> 
> I'm sorry, but no. I'm already spending _much_ more time on this
> series than I should be, considering its (afaic) relatively low
> priority. I really have to ask you to properly think through both
> the data layout and code structure, including considering
> alternatives especially in places where you can _anticipate_
> controversy.

Jan, can you confirm whether you are happy with the current proposal? You say
"this looks reasonable at the first glance", but then go on to say that there
may be other "reasonable ways" of solving the problem at hand.

This is not very concrete and hard to respond to from a contributor's point of
view: it would be good to establish a) whether the v10 proposal in this area is
good enough, and b) whether there are any concrete improvements to be made.

You say please think through whether "you can _anticipate_ controversy", but at 
the same time you can't currently identify/think of any issues. That seems to 
suggest to me that maybe the proposal is good enough. Or that something is 
missing, which has not been articulated. Taking into account language barriers, 
more clarity would probably be helpful.

>> Per our discussion on v9 patch 05, we decided to adopt option 2. Below are
>> what we discussed. FYR.
> 
> Well, we decided option 2 is better than option 1. I'm still
> unconvinced there's not a yet better alternative.

I suppose that is the same type of argument. Aka, we looked at a number of
options and seem to have agreed that one is better than the other. But there is
no clarity as to whether in this case option 2 is good enough, and what concrete
steps would be needed to get to a better alternative.

Of course I may have missed some of the context here in some of the older
threads.

>> If it is not 0 which means the domain's cos id does not need restore
>> to 0, we can directly set it into ASSOC register. That can avoid
>> unnecessary lock. I will send out the test patch again to ask your
>> help to provide review comments (you said there are also 'a number
>> of cosmetics issues').
> 
> And I would hope you would try to eliminate some (if not all) yourself
> first. After all you can easily go over your patch yourself, checking
> for e.g. style violations.

I think this is fair enough.

Best Regards
Lars


Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-20 Thread Jan Beulich
>>> On 20.04.17 at 04:14,  wrote:
> On 17-04-19 03:00:29, Jan Beulich wrote:
>> >>> On 19.04.17 at 10:22,  wrote:
>> > On 17-04-18 05:46:43, Jan Beulich wrote:
>> >> >>> On 18.04.17 at 12:55,  wrote:
>> >> > I made a test patch based on v10 and attached it in mail. Could you 
>> >> > please
>> >> > help to check it? Thanks!
>> >> 
>> >> This looks reasonable at the first glance, albeit I continue to be
>> >> unconvinced that this is the only (reasonable) way of solving the
> 
> Do you have any other suggestion on this? Thanks!

I'm sorry, but no. I'm already spending _much_ more time on this
series than I should be, considering its (afaic) relatively low
priority. I really have to ask you to properly think through both
the data layout and code structure, including considering
alternatives especially in places where you can _anticipate_
controversy.

>> >> problem. After all we don't have to go through similar hoops for
>> >> any other of the register state associated with a vCPU. There
>> > 
>> > Sorry, I do not understand your meaning clearly.
>> > 1. DYM the ASSOC register which is set in 'psr_ctxt_switch_to'? In this 
>> > patch,
>> >we check 'dom_ids' array to know if the domain's cos id has not been 
>> > set but
>> >its 'psr_cos_ids[socket]' already has a non zero value. This case means 
>> > the
>> >socket offline has happened so that we have to restore the domain's
>> >'psr_cos_ids[socket]' to default value 0 which points to default COS 
>> > MSR.
>> >I think we have discussed this in previous mails and achieved agreement 
>> > on
>> >such logic. 
>> > 2. DYM the COS MSRs? We have discussed this before. Per your comments, when
>> >socket is online, the registers values may be modified by FW and others.
>> >So, we have to restore them to default value which is being done in
>> >'cat_init_feature'.
>> > 
>> > So, what is your exactly meaning here? Thanks!
>> 
>> Neither of the two. I'm comparing with COS/PSR-_unrelated_
>> handling of register state. After all there are other MSRs which
>> we need to put into the right state after a core comes online.
>> 
> For PSR feature, the 'cos_reg_val[]' of the feature is destroyed when socket
> is offline. The values in it are all 0 when socket is online again. This 
> causes
> error when user shows the CBMs. So, we have two options when socket is 
> online:
> 1. Read COS MSRs values and save them into 'cos_reg_val[]'.
> 2. Restore COS MSRs values and 'cos_reg_val[]' values to default CBM.

This re-states what you want to do; it does not answer my question.
Along the lines of what you say, for example FS and GS base MSRs
come back as zero too after a socket has been (re-)onlined. We
don't need to go through any hoops there, nevertheless.
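
For reference, a hedged sketch of the kind of handling being alluded to (the
function and parameter are assumptions, not the actual Xen code): such
register state is simply rewritten from the vCPU's saved copy whenever the
vCPU is scheduled, so nothing special is needed when a CPU comes (back)
online.

/*
 * Hedged illustration only: state like the FS base is reloaded from the
 * vCPU's saved value on context switch, making whatever the (re-)onlined
 * CPU currently holds irrelevant.
 */
static void toy_load_fs_base(uint64_t saved_fs_base)
{
    wrmsrl(MSR_FS_BASE, saved_fs_base);
}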

> Per our discussion on v9 patch 05, we decided to adopt option 2. Below are
> what we discussed. FYR.

Well, we decided option 2 is better than option 1. I'm still
unconvinced there's not a yet better alternative.

>> >> are a number of cosmetic issues, but commenting on an attached
>> >> (rather than inlined) patch is a little difficult.
>> >> 
>> > Sorry for that, I will directly send patch out next time.
>> > 
>> >> One remark regarding the locking though: Acquiring a lock in the
>> >> context switch path should be made as low risk of long stalls as
>> >> possible. Therefore you will want to consider using r/w locks
>> >> instead of spin locks here, which would allow parallelism on all
>> >> cores of a socket as long as COS IDs aren't being updated.
>> >> 
>> > In 'psr_ctxt_switch_to', I use the lock only to protect 'write' actions.
>> > So, I do not understand why read-write lock is better? Anything I don't
>> > consider? Please help to point out. Thanks!
>> 
>> Hmm, looking again I can see that r/w locks indeed may not help
>> here. However, what you say still doesn't look correct to me: You
>> acquire the lock depending on _only_ psra->cos_mask being non-
>> zero. This means that all cores on one socket are being serialized
>> here. Quite possibly all you need is for some of the checks done
>> inside the locked region to be replicated (but _not_ moved) to the
>> outer if(), to limit the number of times where the lock is to be
>> acquired.
>> 
> I think your suggestion is to check old_cos outside the lock region.

My suggestion is to check as much state as possible, to prevent
having to acquire the lock whenever possible.

> If it is not 0 which means the domain's cos id does not need restore
> to 0, we can directly set it into ASSOC register. That can avoid
> unnecessary lock. I will send out the test patch again to ask your
> help to provide review comments (you said there are also 'a number
> of cosmetics issues').

And I would hope you would try to eliminate some (if not all) yourself
first. After all you can easily go over your patch yourself, checking
for e.g. style violations.

Jan


Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-19 Thread Yi Sun
On 17-04-19 03:00:29, Jan Beulich wrote:
> >>> On 19.04.17 at 10:22,  wrote:
> > On 17-04-18 05:46:43, Jan Beulich wrote:
> >> >>> On 18.04.17 at 12:55,  wrote:
> >> > I made a test patch based on v10 and attached it in mail. Could you 
> >> > please
> >> > help to check it? Thanks!
> >> 
> >> This looks reasonable at the first glance, albeit I continue to be
> >> unconvinced that this is the only (reasonable) way of solving the

Do you have any other suggestion on this? Thanks!

> >> problem. After all we don't have to go through similar hoops for
> >> any other of the register state associated with a vCPU. There
> > 
> > Sorry, I do not understand your meaning clearly.
> > 1. DYM the ASSOC register which is set in 'psr_ctxt_switch_to'? In this 
> > patch,
> >we check 'dom_ids' array to know if the domain's cos id has not been set 
> > but
> >its 'psr_cos_ids[socket]' already has a non zero value. This case means 
> > the
> >socket offline has happened so that we have to restore the domain's
> >'psr_cos_ids[socket]' to default value 0 which points to default COS MSR.
> >I think we have discussed this in previous mails and achieved agreement 
> > on
> >such logic. 
> > 2. DYM the COS MSRs? We have discussed this before. Per your comments, when
> >socket is online, the registers values may be modified by FW and others.
> >So, we have to restore them to default value which is being done in
> >'cat_init_feature'.
> > 
> > So, what is your exactly meaning here? Thanks!
> 
> Neither of the two. I'm comparing with COS/PSR-_unrelated_
> handling of register state. After all there are other MSRs which
> we need to put into the right state after a core comes online.
> 
For PSR feature, the 'cos_reg_val[]' of the feature is destroyed when socket
is offline. The values in it are all 0 when socket is online again. This causes
error when user shows the CBMs. So, we have two options when socket is online:
1. Read COS MSRs values and save them into 'cos_reg_val[]'.
2. Restore COS MSRs values and 'cos_reg_val[]' values to default CBM.

Per our discussion on v9 patch 05, we decided to adopt option 2. Below are
what we discussed. FYR.

[v9 patch 05]
>> >> > +/* cos=0 is reserved as default cbm(all bits within cbm_len are 
>> >> > 1). */
>> >> > +feat->cos_reg_val[0] = cat_default_val(cat.cbm_len);
>> >> > +/*
>> >> > + * To handle cpu offline and then online case, we need read MSRs 
>> >> > back to
>> >> > + * save values into cos_reg_val array.
>> >> > + */
>> >> > +for ( i = 1; i <= cat.cos_max; i++ )
>> >> > +{
>> >> > +rdmsrl(MSR_IA32_PSR_L3_MASK(i), val);
>> >> > +feat->cos_reg_val[i] = (uint32_t)val;
>> >> > +}
>> >> 
[Jan]
>> >> You mention this in the changes done, but I don't understand why 
>> >> you do this. What meaning to these values have to you? If you want 
>> >> hardware and cached values to match up, the much more conventional 
>> >> way of enforcing this would be to write the values you actually 
>> >> want (normally all zero).
>> >>
[Sun Yi] 
>> > When all cpus on a socket are offline, the free_feature() is called 
>> > to free features resources so that the values saved in 
>> > cos_reg_val[] are lost. When the socket is online again, features 
>> > are allocated again so that cos_reg_val[] members are all 
>> > initialized to 0. Only is cos_reg_val[0] initialized to default value in 
>> > this function in old codes.
>> > 
>> > But domain is still alive so that its cos id on the socket is kept. 
>> > The corresponding MSR value is kept too per test. To make 
>> > cos_reg_val[] values be same as HW to not to mislead user, we 
>> > should read back the valid values on HW into cos_reg_val[].
>> 
[Jan]
>> Okay, I understand the background, but I don't view this solution as 
>> viable: Once the last core on a socket goes offline, all references 
>> to it should be cleaned up. After all what will be brought back 
>> online may be a different physical CPU altogether; you can't assume 
>> MSR values to have survived even if it is the same CPU which comes 
>> back online, as it may have undergone a reset cycle, or BIOS/SMM may 
>> have played with the MSRs.
>> That's even a possibility for a single core coming back online, so 
>> you have to reload MSRs explicitly anyway if implicit reloading (i.e. 
>> once vCPU-s get scheduled onto it) doesn't suffice.
>> 
[Sun Yi]
> So, you think the MSRs values may not be valid after such process and 
> reloading (write MSRs to default value) is needed. If so, I would like 
> to do more operations in 'free_feature()':
> 1. Iterate all domains working on the offline socket to change
>'d->arch.psr_cos_ids[socket]' to COS 0, i.e restore it back to init
>status.
> 2. Restore 'socket_info[socket].cos_ref[]' to all 0.
> 
> These can make the socket's info be totally restored back to init status.
[Jan]
Yes, that's what I think is 
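
Putting the adopted option 2 into code form, a minimal sketch (the function
itself is hypothetical; cat_default_val(), MSR_IA32_PSR_L3_MASK() and
cos_reg_val[] are the names visible in the quoted v9 hunk): when a socket
comes back online, both the hardware COS MSRs and the cached copies are reset
to the default CBM instead of trusting whatever the (possibly reset) hardware
holds.

/*
 * Minimal sketch of option 2 above; only the helper/macro/field names taken
 * from the quoted v9 hunk are real, the rest is illustrative.
 */
static void toy_reset_cos_to_default(struct feat_node *feat,
                                     unsigned int cos_max,
                                     unsigned int cbm_len)
{
    unsigned int i;
    uint32_t def = cat_default_val(cbm_len);

    for ( i = 0; i <= cos_max; i++ )
    {
        wrmsrl(MSR_IA32_PSR_L3_MASK(i), def);   /* reset the hardware ... */
        feat->cos_reg_val[i] = def;             /* ... and the cached copy */
    }
}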

Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-19 Thread Jan Beulich
>>> On 19.04.17 at 10:22,  wrote:
> On 17-04-18 05:46:43, Jan Beulich wrote:
>> >>> On 18.04.17 at 12:55,  wrote:
>> > I made a test patch based on v10 and attached it in mail. Could you please
>> > help to check it? Thanks!
>> 
>> This looks reasonable at the first glance, albeit I continue to be
>> unconvinced that this is the only (reasonable) way of solving the
>> problem. After all we don't have to go through similar hoops for
>> any other of the register state associated with a vCPU. There
> 
> Sorry, I do not understand your meaning clearly.
> 1. DYM the ASSOC register which is set in 'psr_ctxt_switch_to'? In this patch,
>we check 'dom_ids' array to know if the domain's cos id has not been set 
> but
>its 'psr_cos_ids[socket]' already has a non zero value. This case means the
>socket offline has happened so that we have to restore the domain's
>'psr_cos_ids[socket]' to default value 0 which points to default COS MSR.
>I think we have discussed this in previous mails and achieved agreement on
>such logic. 
> 2. DYM the COS MSRs? We have discussed this before. Per your comments, when
>socket is online, the registers values may be modified by FW and others.
>So, we have to restore them to default value which is being done in
>'cat_init_feature'.
> 
> So, what is your exactly meaning here? Thanks!

Neither of the two. I'm comparing with COS/PSR-_unrelated_
handling of register state. After all there are other MSRs which
we need to put into the right state after a core comes online.

>> are a number of cosmetic issues, but commenting on an attached
>> (rather than inlined) patch is a little difficult.
>> 
> Sorry for that, I will directly send patch out next time.
> 
>> One remark regarding the locking though: Acquiring a lock in the
>> context switch path should be made as low risk of long stalls as
>> possible. Therefore you will want to consider using r/w locks
>> instead of spin locks here, which would allow parallelism on all
>> cores of a socket as long as COS IDs aren't being updated.
>> 
> In 'psr_ctxt_switch_to', I use the lock only to protect 'write' actions.
> So, I do not understand why read-write lock is better? Anything I don't
> consider? Please help to point out. Thanks!

Hmm, looking again I can see that r/w locks indeed may not help
here. However, what you say still doesn't look correct to me: You
acquire the lock depending on _only_ psra->cos_mask being non-
zero. This means that all cores on one socket are being serialized
here. Quite possibly all you need is for some of the checks done
inside the locked region to be replicated (but _not_ moved) to the
outer if(), to limit the number of times where the lock is to be
acquired.
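
A minimal sketch of that suggestion (field and helper names are assumptions
loosely based on the test patch earlier in this thread, not the submitted
code): the cheap bitmap check is replicated outside the locked region, so the
lock is only taken the first time a domain is seen on the socket after it came
(back) online.

/* Sketch only; dom_ids/dom_ids_lock follow the test patch, the rest is
 * illustrative. */
static void toy_maybe_restore_cos(struct domain *d,
                                  struct psr_socket_info *info,
                                  unsigned int socket)
{
    /* Cheap pre-check outside the lock. */
    if ( test_bit(d->domain_id, info->dom_ids) )
        return;

    spin_lock(&info->dom_ids_lock);
    /* Re-check (and mark) under the lock before touching shared state. */
    if ( !test_and_set_bit(d->domain_id, info->dom_ids) &&
         d->arch.psr_cos_ids[socket] )
        d->arch.psr_cos_ids[socket] = 0;
    spin_unlock(&info->dom_ids_lock);
}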

Jan




Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-19 Thread Yi Sun
On 17-04-18 05:46:43, Jan Beulich wrote:
> >>> On 18.04.17 at 12:55,  wrote:
> > I made a test patch based on v10 and attached it in mail. Could you please
> > help to check it? Thanks!
> 
> This looks reasonable at the first glance, albeit I continue to be
> unconvinced that this is the only (reasonable) way of solving the
> problem. After all we don't have to go through similar hoops for
> any other of the register state associated with a vCPU. There

Sorry, I do not understand your meaning clearly.
1. DYM the ASSOC register which is set in 'psr_ctxt_switch_to'? In this patch,
   we check the 'dom_ids' array to know if the domain's COS ID has not been set
   but its 'psr_cos_ids[socket]' already has a non-zero value. This case means
   a socket offline has happened, so we have to restore the domain's
   'psr_cos_ids[socket]' to the default value 0, which points to the default
   COS MSR. I think we have discussed this in previous mails and reached
   agreement on such logic.
2. DYM the COS MSRs? We have discussed this before. Per your comments, when a
   socket comes online the register values may have been modified by FW and
   others. So, we have to restore them to the default value, which is done in
   'cat_init_feature'.

So, what exactly do you mean here? Thanks!

> are a number of cosmetic issues, but commenting on an attached
> (rather than inlined) patch is a little difficult.
> 
Sorry for that, I will directly send patch out next time.

> One remark regarding the locking though: Acquiring a lock in the
> context switch path should be made as low risk of long stalls as
> possible. Therefore you will want to consider using r/w locks
> instead of spin locks here, which would allow parallelism on all
> cores of a socket as long as COS IDs aren't being updated.
> 
In 'psr_ctxt_switch_to', I use the lock only to protect 'write' actions.
So, I do not understand why a read-write lock would be better. Is there
anything I have not considered? Please help to point it out. Thanks!

> Jan



Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-18 Thread Jan Beulich
>>> On 18.04.17 at 12:55,  wrote:
> I made a test patch based on v10 and attached it in mail. Could you please
> help to check it? Thanks!

This looks reasonable at the first glance, albeit I continue to be
unconvinced that this is the only (reasonable) way of solving the
problem. After all we don't have to go through similar hoops for
any other of the register state associated with a vCPU. There
are a number of cosmetic issues, but commenting on an attached
(rather than inlined) patch is a little difficult.

One remark regarding the locking though: Acquiring a lock in the
context switch path should be made as low risk of long stalls as
possible. Therefore you will want to consider using r/w locks
instead of spin locks here, which would allow parallelism on all
cores of a socket as long as COS IDs aren't being updated.
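
For reference, a minimal sketch of the r/w lock pattern referred to here (all
names are assumptions; the follow-ups in this thread conclude it may not
actually help in psr_ctxt_switch_to, since that path also writes): readers on
all cores of a socket proceed in parallel, and only COS ID updates take the
lock exclusively.

/* Illustrative sketch only. */
static DEFINE_RWLOCK(toy_cos_lock);

static void toy_reader(void)      /* e.g. looking up COS state */
{
    read_lock(&toy_cos_lock);
    /* many cores on the socket can run this section in parallel */
    read_unlock(&toy_cos_lock);
}

static void toy_writer(void)      /* e.g. psr_set_val() updating COS IDs */
{
    write_lock(&toy_cos_lock);
    /* exclusive: COS ID assignments change here */
    write_unlock(&toy_cos_lock);
}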

Jan




Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-18 Thread Yi Sun
On 17-04-13 05:50:01, Jan Beulich wrote:
> >>> On 13.04.17 at 13:44,  wrote:
> > On 17-04-13 05:31:41, Jan Beulich wrote:
> >> >>> On 13.04.17 at 13:11,  wrote:
> >> > On 17-04-13 04:58:06, Jan Beulich wrote:
> >> >> >>> On 13.04.17 at 12:49,  wrote:
> >> >> > How about a per socket array like this:
> >> >> > uint32_t domain_switch[1024];
> >> >> > 
> >> >> > Every bit represents a domain id. Then, we can handle this case as 
> >> >> > below:
> >> >> > 1. In 'psr_cpu_init()', clear the array to be 0. I think this place 
> >> >> > is enough to
> >> >> >cover socket offline case. We do not need to clear it in 
> >> >> > 'free_socket_resources'.
> >> >> > 
> >> >> > 2. In 'psr_ctxt_switch_to()', test_and_set_bit(domain_id, 
> >> >> > domain_switch) to set
> >> >> >the bit to 1 according to domain_id. If the old value is 0 and the 
> >> >> >'psr_cos_ids[socket]' is not 0, restore 'psr_cos_ids[socket]' to 
> >> >> > be 0.
> >> >> > 
> >> >> > 3. In 'psr_set_val()', test_and_set_bit(domain_id, domain_switch) to 
> >> >> > set the bit
> >> >> >to 1 too. Then, update 'psr_cos_ids[socket]' according to 
> >> >> > find/pick flow.
> >> >> > 
> >> >> > Then, we only use 4KB for one socket.
> >> >> 
> >> >> This looks to come closer to something I'd consider acceptable, but
> >> >> I may not understand your intentions in full yet: For one, there's
> >> >> nowhere you clear the bit (other than presumably during socket
> >> >> cleanup). 
> >> > 
> >> > Actually, clear the array in 'free_socket_resources' has same effect. I 
> >> > can
> >> > move clear action into it.
> >> 
> >> That wasn't my point - I was asking about clearing individual bits.
> >> Point being that if you only ever set bits in the map, you'll likely
> >> end up iterating through all active domains anyway.
> >> 
> > If entering 'free_socket_resources', that means no more actions to
> > the array on this socket except clearing it. Can I just memset this array
> > of the socekt to 0?
> 
> You can, afaict, but unless first you act on the set bits I can't see why
> you would want to track the bits in the first place. Or maybe I'm still
> not understanding your intention, in which case I guess the best you
> can do is simply implement your plan, and we'll discuss it in v11 review.
> 
I made a test patch based on v10 and attached it in mail. Could you please
help to check it? Thanks!

> Jan
diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index a85ea99..ef8d3e9 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -125,6 +125,8 @@ struct feat_node {
 uint32_t cos_reg_val[MAX_COS_REG_CNT];
 };
 
+#define PSR_DOM_IDS_NUM ((DOMID_IDLE + 1) / sizeof(uint32_t))
+
 /*
  * PSR features are managed per socket. Below structure defines the members
  * used to manage these features.
@@ -134,9 +136,11 @@ struct feat_node {
  * COS ID. Every entry of cos_ref corresponds to one COS ID.
  */
 struct psr_socket_info {
-struct feat_node *features[PSR_SOCKET_MAX_FEAT];
 spinlock_t ref_lock;
+spinlock_t dom_ids_lock;
+struct feat_node *features[PSR_SOCKET_MAX_FEAT];
 unsigned int cos_ref[MAX_COS_REG_CNT];
+uint32_t dom_ids[PSR_DOM_IDS_NUM];
 };
 
 struct psr_assoc {
@@ -194,26 +198,11 @@ static void free_socket_resources(unsigned int socket)
 {
 unsigned int i;
 struct psr_socket_info *info = socket_info + socket;
-struct domain *d;
+unsigned long flag;
 
 if ( !info )
 return;
 
-/* Restore domain cos id to 0 when socket is offline. */
-for_each_domain ( d )
-{
-unsigned int cos = d->arch.psr_cos_ids[socket];
-if ( cos == 0 )
-continue;
-
-spin_lock(&info->ref_lock);
-ASSERT(!cos || info->cos_ref[cos]);
-info->cos_ref[cos]--;
-spin_unlock(&info->ref_lock);
-
-d->arch.psr_cos_ids[socket] = 0;
-}
-
 /*
  * Free resources of features. The global feature object, e.g. feat_l3_cat,
  * may not be freed here if it is not added into array. It is simply being
@@ -221,12 +210,17 @@ static void free_socket_resources(unsigned int socket)
  */
 for ( i = 0; i < PSR_SOCKET_MAX_FEAT; i++ )
 {
-if ( !info->features[i] )
-continue;
-
 xfree(info->features[i]);
 info->features[i] = NULL;
 }
+
+spin_lock(&info->ref_lock);
+memset(info->cos_ref, 0, MAX_COS_REG_CNT * sizeof(unsigned int));
+spin_unlock(&info->ref_lock);
+
+spin_lock_irqsave(&info->dom_ids_lock, flag);
+memset(info->dom_ids, 0, PSR_DOM_IDS_NUM * sizeof(uint32_t));
+spin_unlock_irqrestore(&info->dom_ids_lock, flag);
 }
 
 static bool feat_init_done(const struct psr_socket_info *info)
@@ -682,9 +676,34 @@ void psr_ctxt_switch_to(struct domain *d)
 psr_assoc_rmid(&reg, d->arch.psr_rmid);
 
 if ( psra->cos_mask )
-psr_assoc_cos(&reg, d->arch.psr_cos_ids ?
-  

Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-13 Thread Jan Beulich
>>> On 13.04.17 at 13:44,  wrote:
> On 17-04-13 05:31:41, Jan Beulich wrote:
>> >>> On 13.04.17 at 13:11,  wrote:
>> > On 17-04-13 04:58:06, Jan Beulich wrote:
>> >> >>> On 13.04.17 at 12:49,  wrote:
>> >> > How about a per socket array like this:
>> >> > uint32_t domain_switch[1024];
>> >> > 
>> >> > Every bit represents a domain id. Then, we can handle this case as 
>> >> > below:
>> >> > 1. In 'psr_cpu_init()', clear the array to be 0. I think this place is 
>> >> > enough to
>> >> >cover socket offline case. We do not need to clear it in 
>> >> > 'free_socket_resources'.
>> >> > 
>> >> > 2. In 'psr_ctxt_switch_to()', test_and_set_bit(domain_id, 
>> >> > domain_switch) to set
>> >> >the bit to 1 according to domain_id. If the old value is 0 and the 
>> >> >'psr_cos_ids[socket]' is not 0, restore 'psr_cos_ids[socket]' to be 
>> >> > 0.
>> >> > 
>> >> > 3. In 'psr_set_val()', test_and_set_bit(domain_id, domain_switch) to 
>> >> > set the bit
>> >> >to 1 too. Then, update 'psr_cos_ids[socket]' according to find/pick 
>> >> > flow.
>> >> > 
>> >> > Then, we only use 4KB for one socket.
>> >> 
>> >> This looks to come closer to something I'd consider acceptable, but
>> >> I may not understand your intentions in full yet: For one, there's
>> >> nowhere you clear the bit (other than presumably during socket
>> >> cleanup). 
>> > 
>> > Actually, clear the array in 'free_socket_resources' has same effect. I can
>> > move clear action into it.
>> 
>> That wasn't my point - I was asking about clearing individual bits.
>> Point being that if you only ever set bits in the map, you'll likely
>> end up iterating through all active domains anyway.
>> 
> If entering 'free_socket_resources', that means no more actions to
> the array on this socket except clearing it. Can I just memset this array
> of the socket to 0?

You can, afaict, but unless first you act on the set bits I can't see why
you would want to track the bits in the first place. Or maybe I'm still
not understanding your intention, in which case I guess the best you
can do is simply implement your plan, and we'll discuss it in v11 review.
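
To make the alternative concrete, a hedged sketch of "acting on the set bits"
at cleanup time (all names are assumptions; only d->arch.psr_cos_ids[] is from
this series): instead of iterating every domain in the system, only the
domains whose bit was set on this socket are visited.

/* Hedged sketch only. */
#define TOY_MAX_DOMID (DOMID_IDLE + 1)

static DECLARE_BITMAP(toy_domain_switch, TOY_MAX_DOMID);

static void toy_cleanup_marked_domains(unsigned int socket)
{
    unsigned int id;

    for ( id = find_first_bit(toy_domain_switch, TOY_MAX_DOMID);
          id < TOY_MAX_DOMID;
          id = find_next_bit(toy_domain_switch, TOY_MAX_DOMID, id + 1) )
    {
        struct domain *d = rcu_lock_domain_by_id(id);

        if ( d )
        {
            d->arch.psr_cos_ids[socket] = 0;   /* back to the default COS */
            rcu_unlock_domain(d);
        }
    }
}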

Jan




Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-13 Thread Yi Sun
On 17-04-13 05:31:41, Jan Beulich wrote:
> >>> On 13.04.17 at 13:11,  wrote:
> > On 17-04-13 04:58:06, Jan Beulich wrote:
> >> >>> On 13.04.17 at 12:49,  wrote:
> >> > How about a per socket array like this:
> >> > uint32_t domain_switch[1024];
> >> > 
> >> > Every bit represents a domain id. Then, we can handle this case as below:
> >> > 1. In 'psr_cpu_init()', clear the array to be 0. I think this place is 
> >> > enough to
> >> >cover socket offline case. We do not need to clear it in 
> >> > 'free_socket_resources'.
> >> > 
> >> > 2. In 'psr_ctxt_switch_to()', test_and_set_bit(domain_id, domain_switch) 
> >> > to set
> >> >the bit to 1 according to domain_id. If the old value is 0 and the 
> >> >'psr_cos_ids[socket]' is not 0, restore 'psr_cos_ids[socket]' to be 0.
> >> > 
> >> > 3. In 'psr_set_val()', test_and_set_bit(domain_id, domain_switch) to set 
> >> > the bit
> >> >to 1 too. Then, update 'psr_cos_ids[socket]' according to find/pick 
> >> > flow.
> >> > 
> >> > Then, we only use 4KB for one socket.
> >> 
> >> This looks to come closer to something I'd consider acceptable, but
> >> I may not understand your intentions in full yet: For one, there's
> >> nowhere you clear the bit (other than presumably during socket
> >> cleanup). 
> > 
> > Actually, clear the array in 'free_socket_resources' has same effect. I can
> > move clear action into it.
> 
> That wasn't my point - I was asking about clearing individual bits.
> Point being that if you only ever set bits in the map, you'll likely
> end up iterating through all active domains anyway.
> 
If entering 'free_socket_resources', that means no more actions to
the array on this socket except clearing it. Can I just memset this array
of the socket to 0?
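
For reference, a minimal sketch of the per-socket bitmap scheme discussed
above (names and the size derivation are illustrative; the "4KB for one
socket" figure follows from one bit per possible domain ID, i.e.
(DOMID_IDLE + 1) = 32768 bits = 4096 bytes):

/* Illustrative sketch only, not the posted test patch. */
#define TOY_DOM_IDS_WORDS ((DOMID_IDLE + 1) / (8 * sizeof(uint32_t)))  /* 1024 */

struct toy_socket_info {
    uint32_t domain_switch[TOY_DOM_IDS_WORDS];   /* 4KB per socket */
};

/* Mark a domain as seen on this socket; returns whether it was already set. */
static int toy_mark_domain(struct toy_socket_info *info, domid_t domid)
{
    return test_and_set_bit(domid, info->domain_switch);
}

/* Socket offline: forget everything in one go. */
static void toy_socket_offline(struct toy_socket_info *info)
{
    memset(info->domain_switch, 0, sizeof(info->domain_switch));
}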

> Jan



Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-13 Thread Jan Beulich
>>> On 13.04.17 at 13:11,  wrote:
> On 17-04-13 04:58:06, Jan Beulich wrote:
>> >>> On 13.04.17 at 12:49,  wrote:
>> > How about a per socket array like this:
>> > uint32_t domain_switch[1024];
>> > 
>> > Every bit represents a domain id. Then, we can handle this case as below:
>> > 1. In 'psr_cpu_init()', clear the array to be 0. I think this place is 
>> > enough to
>> >cover socket offline case. We do not need to clear it in 
>> > 'free_socket_resources'.
>> > 
>> > 2. In 'psr_ctxt_switch_to()', test_and_set_bit(domain_id, domain_switch) 
>> > to set
>> >the bit to 1 according to domain_id. If the old value is 0 and the 
>> >'psr_cos_ids[socket]' is not 0, restore 'psr_cos_ids[socket]' to be 0.
>> > 
>> > 3. In 'psr_set_val()', test_and_set_bit(domain_id, domain_switch) to set 
>> > the bit
>> >to 1 too. Then, update 'psr_cos_ids[socket]' according to find/pick 
>> > flow.
>> > 
>> > Then, we only use 4KB for one socket.
>> 
>> This looks to come closer to something I'd consider acceptable, but
>> I may not understand your intentions in full yet: For one, there's
>> nowhere you clear the bit (other than presumably during socket
>> cleanup). 
> 
> Actually, clear the array in 'free_socket_resources' has same effect. I can
> move clear action into it.

That wasn't my point - I was asking about clearing individual bits.
Point being that if you only ever set bits in the map, you'll likely
end up iterating through all active domains anyway.

Jan




Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-13 Thread Yi Sun
On 17-04-13 19:11:54, Yi Sun wrote:
> On 17-04-13 04:58:06, Jan Beulich wrote:
> > >>> On 13.04.17 at 12:49,  wrote:
> > > On 17-04-13 03:41:44, Jan Beulich wrote:
> > >> >>> On 13.04.17 at 10:11,  wrote:
> > >> > On 17-04-12 06:42:01, Jan Beulich wrote:
> > >> >> >>> On 12.04.17 at 14:23,  wrote:
> > >> >> > On 17-04-12 03:09:56, Jan Beulich wrote:
> > >> >> >> >>> On 12.04.17 at 07:53,  wrote:
> > >> >> >> > On 17-04-11 09:01:53, Jan Beulich wrote:
> > >> >> >> >> >>> On 01.04.17 at 15:53,  wrote:
> > >> >> >> >> Furthermore I'm not at all convinced this is appropriate to do 
> > >> >> >> >> in the
> > >> >> >> >> context of a CPU_UP_CANCELED / CPU_DEAD notification: If you
> > >> >> >> >> have a few thousand VMs, the loop above may take a while.
> > >> >> >> >> 
> > >> >> >> > Hmm, that may be a potential issue. I have two proposals below. 
> > >> >> >> > Could you
> > >> >> >> > please help to check which one you prefer? Or provide another 
> > >> >> >> > solution?
> > >> >> >> > 
> > >> >> >> > 1. Start a tasklet in free_socket_resources() to restore 
> > >> >> > 'psr_cos_ids[socket]'
> > >> >> >> >of all domains. The action is protected by 'ref_lock' to 
> > >> >> >> > avoid 
> > >> >> > confliction
> > >> >> >> >in 'psr_set_val'. We can reduce 'info->cos_ref[cos]' in 
> > >> >> >> > tasklet or 
> > > memset
> > >> >> >> >the array to 0 in free_socket_resources().
> > >> >> >> > 
> > >> >> >> > 2. Move 'psr_cos_ids[]' from 'domain' to 'psr_socket_info' and 
> > >> >> >> > change 
> > > index
> > >> >> >> >from 'socket' to 'domain_id'. So we keep all domains' COS IDs 
> > >> >> >> > per 
> > > socket
> > >> >> >> >and can memset the array to 0 when socket is offline. But 
> > >> >> >> > here is an 
> > >> >> > issue
> > >> >> >> >that we do not know how many members this array should have. 
> > >> >> >> > I cannot 
> > >> >> > find
> > >> >> >> >a macro something like 'DOMAIN_MAX_NUMBER'. So, I prefer to 
> > >> >> >> > use 
> > >> >> > reallocation
> > >> >> >> >in 'psr_alloc_cos' if the newly created domain's id is bigger 
> > >> >> >> > than 
> > >> >> > current
> > >> >> >> >array number.
> > >> >> >> 
> > >> >> >> The number of domains is limited by the special DOMID_* values.
> > >> >> >> However, allocating an array with 32k entries doesn't sound very
> > >> >> >> reasonable.
> > >> >> > 
> > >> >> > I think 32K entries should be the extreme case. I can allocate e.g. 
> > >> >> > 100 
> > > entries
> > >> >> > when the first domain is created. If a new domain's id exceeds 100, 
> > > reallocate
> > >> >> > another 100 entries. The total number of entries allocated should 
> > >> >> > be less 
> > > than
> > >> >> > 32K. This is a functional requirement which cannot be avoided. How 
> > >> >> > do you 
> > >> >> > think?
> > >> >> 
> > >> >> So how many entries would your array have once I start the 32,000th
> > >> >> domain (having at any one time at most a single one running, besides
> > >> >> Dom0)?
> > >> >> 
> > >> > In such case, we have to keep a 32K array because the domain_id is the 
> > > index to
> > >> > access the array. But this array is per socket so the whole memory 
> > >> > used 
> > > should
> > >> > not be too much.
> > >> 
> > >> We carefully avoid any runtime allocations of order > 0, so if you
> > >> were to set up such an array, you'd need to use vmalloc()/vzalloc().
> > >> But I continue to be unconvinced that we want such a large array
> > >> in the first place.
> > >> 
> > >> > After considering this issue more, I think the original codes might 
> > >> > not be
> > >> > so unacceptable. Per my knowledge, Intel Xeon Phi chip can support at 
> > >> > most
> > >> > 288 CPUs. So, I think the domains running at same time in reality may 
> > >> > not 
> > > be
> > >> > so many (no efficient resources). If this hypothesis is right, a loop 
> > >> > to 
> > > write
> > >> > 'psr_cos_ids[socket]' of every domain to 0 may not take much time. If 
> > >> > I am
> > >> > wrong, please correct me. Thanks!
> > >> 
> > >> What relationship does the number of CPUs have to the number of
> > >> domains on a host? There could be thousands with just a few dozen
> > >> CPUs, provided none or very few of them have high demands on
> > >> CPU resources. Additionally please never forget that system sizes
> > >> basically only ever grow. Plus we wouldn't want a latent issue here
> > >> in case we ever end up needing to widen domain IDs beyond 16 bits.
> > >> 
> > > How about a per socket array like this:
> > > uint32_t domain_switch[1024];
> > > 
> > > Every bit represents a domain id. Then, we can handle this case as below:
> > > 1. In 'psr_cpu_init()', clear the array to be 0. I think this place is 
> > > enough to
> > >cover socket offline case. We do not need to clear it in 
> > > 'free_socket_resources'.
> > > 
> > > 2. In 

Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-13 Thread Yi Sun
On 17-04-13 04:58:06, Jan Beulich wrote:
> >>> On 13.04.17 at 12:49,  wrote:
> > On 17-04-13 03:41:44, Jan Beulich wrote:
> >> >>> On 13.04.17 at 10:11,  wrote:
> >> > On 17-04-12 06:42:01, Jan Beulich wrote:
> >> >> >>> On 12.04.17 at 14:23,  wrote:
> >> >> > On 17-04-12 03:09:56, Jan Beulich wrote:
> >> >> >> >>> On 12.04.17 at 07:53,  wrote:
> >> >> >> > On 17-04-11 09:01:53, Jan Beulich wrote:
> >> >> >> >> >>> On 01.04.17 at 15:53,  wrote:
> >> >> >> >> Furthermore I'm not at all convinced this is appropriate to do in 
> >> >> >> >> the
> >> >> >> >> context of a CPU_UP_CANCELED / CPU_DEAD notification: If you
> >> >> >> >> have a few thousand VMs, the loop above may take a while.
> >> >> >> >> 
> >> >> >> > Hmm, that may be a potential issue. I have two proposals below. 
> >> >> >> > Could you
> >> >> >> > please help to check which one you prefer? Or provide another 
> >> >> >> > solution?
> >> >> >> > 
> >> >> >> > 1. Start a tasklet in free_socket_resources() to restore 
> >> >> > 'psr_cos_ids[socket]'
> >> >> >> >of all domains. The action is protected by 'ref_lock' to avoid 
> >> >> > confliction
> >> >> >> >in 'psr_set_val'. We can reduce 'info->cos_ref[cos]' in tasklet 
> >> >> >> > or 
> > memset
> >> >> >> >the array to 0 in free_socket_resources().
> >> >> >> > 
> >> >> >> > 2. Move 'psr_cos_ids[]' from 'domain' to 'psr_socket_info' and 
> >> >> >> > change 
> > index
> >> >> >> >from 'socket' to 'domain_id'. So we keep all domains' COS IDs 
> >> >> >> > per 
> > socket
> >> >> >> >and can memset the array to 0 when socket is offline. But here 
> >> >> >> > is an 
> >> >> > issue
> >> >> >> >that we do not know how many members this array should have. I 
> >> >> >> > cannot 
> >> >> > find
> >> >> >> >a macro something like 'DOMAIN_MAX_NUMBER'. So, I prefer to use 
> >> >> > reallocation
> >> >> >> >in 'psr_alloc_cos' if the newly created domain's id is bigger 
> >> >> >> > than 
> >> >> > current
> >> >> >> >array number.
> >> >> >> 
> >> >> >> The number of domains is limited by the special DOMID_* values.
> >> >> >> However, allocating an array with 32k entries doesn't sound very
> >> >> >> reasonable.
> >> >> > 
> >> >> > I think 32K entries should be the extreme case. I can allocate e.g. 
> >> >> > 100 
> > entries
> >> >> > when the first domain is created. If a new domain's id exceeds 100, 
> > reallocate
> >> >> > another 100 entries. The total number of entries allocated should be 
> >> >> > less 
> > than
> >> >> > 32K. This is a functional requirement which cannot be avoided. How do 
> >> >> > you 
> >> >> > think?
> >> >> 
> >> >> So how many entries would your array have once I start the 32,000th
> >> >> domain (having at any one time at most a single one running, besides
> >> >> Dom0)?
> >> >> 
> >> > In such case, we have to keep a 32K array because the domain_id is the 
> > index to
> >> > access the array. But this array is per socket so the whole memory used 
> > should
> >> > not be too much.
> >> 
> >> We carefully avoid any runtime allocations of order > 0, so if you
> >> were to set up such an array, you'd need to use vmalloc()/vzalloc().
> >> But I continue to be unconvinced that we want such a large array
> >> in the first place.
> >> 
> >> > After considering this issue more, I think the original codes might not 
> >> > be
> >> > so unacceptable. Per my knowledge, Intel Xeon Phi chip can support at 
> >> > most
> >> > 288 CPUs. So, I think the domains running at same time in reality may 
> >> > not 
> > be
> >> > so many (no efficient resources). If this hypothesis is right, a loop to 
> > write
> >> > 'psr_cos_ids[socket]' of every domain to 0 may not take much time. If I 
> >> > am
> >> > wrong, please correct me. Thanks!
> >> 
> >> What relationship does the number of CPUs have to the number of
> >> domains on a host? There could be thousands with just a few dozen
> >> CPUs, provided none or very few of them have high demands on
> >> CPU resources. Additionally please never forget that system sizes
> >> basically only ever grow. Plus we wouldn't want a latent issue here
> >> in case we ever end up needing to widen domain IDs beyond 16 bits.
> >> 
> > How about a per socket array like this:
> > uint32_t domain_switch[1024];
> > 
> > Every bit represents a domain id. Then, we can handle this case as below:
> > 1. In 'psr_cpu_init()', clear the array to be 0. I think this place is 
> > enough to
> >cover socket offline case. We do not need to clear it in 
> > 'free_socket_resources'.
> > 
> > 2. In 'psr_ctxt_switch_to()', test_and_set_bit(domain_id, domain_switch) to 
> > set
> >the bit to 1 according to domain_id. If the old value is 0 and the 
> >'psr_cos_ids[socket]' is not 0, restore 'psr_cos_ids[socket]' to be 0.
> > 
> > 3. In 'psr_set_val()', test_and_set_bit(domain_id, 

Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-13 Thread Jan Beulich
>>> On 13.04.17 at 12:49,  wrote:
> On 17-04-13 03:41:44, Jan Beulich wrote:
>> >>> On 13.04.17 at 10:11,  wrote:
>> > On 17-04-12 06:42:01, Jan Beulich wrote:
>> >> >>> On 12.04.17 at 14:23,  wrote:
>> >> > On 17-04-12 03:09:56, Jan Beulich wrote:
>> >> >> >>> On 12.04.17 at 07:53,  wrote:
>> >> >> > On 17-04-11 09:01:53, Jan Beulich wrote:
>> >> >> >> >>> On 01.04.17 at 15:53,  wrote:
>> >> >> >> Furthermore I'm not at all convinced this is appropriate to do in 
>> >> >> >> the
>> >> >> >> context of a CPU_UP_CANCELED / CPU_DEAD notification: If you
>> >> >> >> have a few thousand VMs, the loop above may take a while.
>> >> >> >> 
>> >> >> > Hmm, that may be a potential issue. I have two proposals below. 
>> >> >> > Could you
>> >> >> > please help to check which one you prefer? Or provide another 
>> >> >> > solution?
>> >> >> > 
>> >> >> > 1. Start a tasklet in free_socket_resources() to restore 
>> >> > 'psr_cos_ids[socket]'
>> >> >> >of all domains. The action is protected by 'ref_lock' to avoid 
>> >> > confliction
>> >> >> >in 'psr_set_val'. We can reduce 'info->cos_ref[cos]' in tasklet 
>> >> >> > or 
> memset
>> >> >> >the array to 0 in free_socket_resources().
>> >> >> > 
>> >> >> > 2. Move 'psr_cos_ids[]' from 'domain' to 'psr_socket_info' and 
>> >> >> > change 
> index
>> >> >> >from 'socket' to 'domain_id'. So we keep all domains' COS IDs per 
> socket
>> >> >> >and can memset the array to 0 when socket is offline. But here is 
>> >> >> > an 
>> >> > issue
>> >> >> >that we do not know how many members this array should have. I 
>> >> >> > cannot 
>> >> > find
>> >> >> >a macro something like 'DOMAIN_MAX_NUMBER'. So, I prefer to use 
>> >> > reallocation
>> >> >> >in 'psr_alloc_cos' if the newly created domain's id is bigger 
>> >> >> > than 
>> >> > current
>> >> >> >array number.
>> >> >> 
>> >> >> The number of domains is limited by the special DOMID_* values.
>> >> >> However, allocating an array with 32k entries doesn't sound very
>> >> >> reasonable.
>> >> > 
>> >> > I think 32K entries should be the extreme case. I can allocate e.g. 100 
> entries
>> >> > when the first domain is created. If a new domain's id exceeds 100, 
> reallocate
>> >> > another 100 entries. The total number of entries allocated should be 
>> >> > less 
> than
>> >> > 32K. This is a functional requirement which cannot be avoided. How do 
>> >> > you 
>> >> > think?
>> >> 
>> >> So how many entries would your array have once I start the 32,000th
>> >> domain (having at any one time at most a single one running, besides
>> >> Dom0)?
>> >> 
>> > In such case, we have to keep a 32K array because the domain_id is the 
> index to
>> > access the array. But this array is per socket so the whole memory used 
> should
>> > not be too much.
>> 
>> We carefully avoid any runtime allocations of order > 0, so if you
>> were to set up such an array, you'd need to use vmalloc()/vzalloc().
>> But I continue to be unconvinced that we want such a large array
>> in the first place.
>> 
>> > After considering this issue more, I think the original codes might not be
>> > so unacceptable. Per my knowledge, Intel Xeon Phi chip can support at most
>> > 288 CPUs. So, I think the domains running at same time in reality may not 
> be
>> > so many (no efficient resources). If this hypothesis is right, a loop to 
> write
>> > 'psr_cos_ids[socket]' of every domain to 0 may not take much time. If I am
>> > wrong, please correct me. Thanks!
>> 
>> What relationship does the number of CPUs have to the number of
>> domains on a host? There could be thousands with just a few dozen
>> CPUs, provided none or very few of them have high demands on
>> CPU resources. Additionally please never forget that system sizes
>> basically only ever grow. Plus we wouldn't want a latent issue here
>> in case we ever end up needing to widen domain IDs beyond 16 bits.
>> 
> How about a per socket array like this:
> uint32_t domain_switch[1024];
> 
> Every bit represents a domain id. Then, we can handle this case as below:
> 1. In 'psr_cpu_init()', clear the array to be 0. I think this place is enough 
> to
>cover socket offline case. We do not need to clear it in 
> 'free_socket_resources'.
> 
> 2. In 'psr_ctxt_switch_to()', test_and_set_bit(domain_id, domain_switch) to 
> set
>the bit to 1 according to domain_id. If the old value is 0 and the 
>'psr_cos_ids[socket]' is not 0, restore 'psr_cos_ids[socket]' to be 0.
> 
> 3. In 'psr_set_val()', test_and_set_bit(domain_id, domain_switch) to set the 
> bit
>to 1 too. Then, update 'psr_cos_ids[socket]' according to find/pick flow.
> 
> Then, we only use 4KB for one socket.

This looks to come closer to something I'd consider acceptable, but
I may not understand your intentions in full yet: For one, there's
nowhere you clear the bit (other than presumably during socket cleanup).

Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-13 Thread Yi Sun
On 17-04-13 03:41:44, Jan Beulich wrote:
> >>> On 13.04.17 at 10:11,  wrote:
> > On 17-04-12 06:42:01, Jan Beulich wrote:
> >> >>> On 12.04.17 at 14:23,  wrote:
> >> > On 17-04-12 03:09:56, Jan Beulich wrote:
> >> >> >>> On 12.04.17 at 07:53,  wrote:
> >> >> > On 17-04-11 09:01:53, Jan Beulich wrote:
> >> >> >> >>> On 01.04.17 at 15:53,  wrote:
> >> >> >> Furthermore I'm not at all convinced this is appropriate to do in the
> >> >> >> context of a CPU_UP_CANCELED / CPU_DEAD notification: If you
> >> >> >> have a few thousand VMs, the loop above may take a while.
> >> >> >> 
> >> >> > Hmm, that may be a potential issue. I have two proposals below. Could 
> >> >> > you
> >> >> > please help to check which one you prefer? Or provide another 
> >> >> > solution?
> >> >> > 
> >> >> > 1. Start a tasklet in free_socket_resources() to restore 
> >> > 'psr_cos_ids[socket]'
> >> >> >of all domains. The action is protected by 'ref_lock' to avoid 
> >> > confliction
> >> >> >in 'psr_set_val'. We can reduce 'info->cos_ref[cos]' in tasklet or 
> >> >> > memset
> >> >> >the array to 0 in free_socket_resources().
> >> >> > 
> >> >> > 2. Move 'psr_cos_ids[]' from 'domain' to 'psr_socket_info' and change 
> >> >> > index
> >> >> >from 'socket' to 'domain_id'. So we keep all domains' COS IDs per 
> >> >> > socket
> >> >> >and can memset the array to 0 when socket is offline. But here is 
> >> >> > an 
> >> > issue
> >> >> >that we do not know how many members this array should have. I 
> >> >> > cannot 
> >> > find
> >> >> >a macro something like 'DOMAIN_MAX_NUMBER'. So, I prefer to use 
> >> > reallocation
> >> >> >in 'psr_alloc_cos' if the newly created domain's id is bigger than 
> >> > current
> >> >> >array number.
> >> >> 
> >> >> The number of domains is limited by the special DOMID_* values.
> >> >> However, allocating an array with 32k entries doesn't sound very
> >> >> reasonable.
> >> > 
> >> > I think 32K entries should be the extreme case. I can allocate e.g. 100 
> >> > entries
> >> > when the first domain is created. If a new domain's id exceeds 100, 
> >> > reallocate
> >> > another 100 entries. The total number of entries allocated should be 
> >> > less than
> >> > 32K. This is a functional requirement which cannot be avoided. How do 
> >> > you 
> >> > think?
> >> 
> >> So how many entries would your array have once I start the 32,000th
> >> domain (having at any one time at most a single one running, besides
> >> Dom0)?
> >> 
> > In such case, we have to keep a 32K array because the domain_id is the 
> > index to
> > access the array. But this array is per socket so the whole memory used 
> > should
> > not be too much.
> 
> We carefully avoid any runtime allocations of order > 0, so if you
> were to set up such an array, you'd need to use vmalloc()/vzalloc().
> But I continue to be unconvinced that we want such a large array
> in the first place.
> 
> > After considering this issue more, I think the original codes might not be
> > so unacceptable. Per my knowledge, Intel Xeon Phi chip can support at most
> > 288 CPUs. So, I think the domains running at same time in reality may not be
> > so many (no efficient resources). If this hypothesis is right, a loop to 
> > write
> > 'psr_cos_ids[socket]' of every domain to 0 may not take much time. If I am
> > wrong, please correct me. Thanks!
> 
> What relationship does the number of CPUs have to the number of
> domains on a host? There could be thousands with just a few dozen
> CPUs, provided none or very few of them have high demands on
> CPU resources. Additionally please never forget that system sizes
> basically only ever grow. Plus we wouldn't want a latent issue here
> in case we ever end up needing to widen domain IDs beyond 16 bits.
> 
How about a per socket array like this:
uint32_t domain_switch[1024];

Every bit represents a domain id. Then, we can handle this case as below:
1. In 'psr_cpu_init()', clear the array to be 0. I think this place is enough to
   cover socket offline case. We do not need to clear it in 
'free_socket_resources'.

2. In 'psr_ctxt_switch_to()', test_and_set_bit(domain_id, domain_switch) to set
   the bit to 1 according to domain_id. If the old value is 0 and the 
   'psr_cos_ids[socket]' is not 0, restore 'psr_cos_ids[socket]' to be 0.

3. In 'psr_set_val()', test_and_set_bit(domain_id, domain_switch) to set the bit
   to 1 too. Then, update 'psr_cos_ids[socket]' according to find/pick flow.

Then, we only use 4KB for one socket.
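
Roughly, as a standalone model (the names here are illustrative only; the real
code would use the hypervisor's atomic test_and_set_bit() and keep this state
per socket, and dom_cos[] merely stands in for each domain's
psr_cos_ids[socket] on the socket being modelled):

#include <stdint.h>
#include <stdio.h>

#define MAX_DOMID 32768                       /* 1024 * 32 bits = 4KB per socket */

static uint32_t domain_switch[1024];          /* one bit per domain ID */
static unsigned int dom_cos[MAX_DOMID];       /* per-domain COS ID on this socket */

/* Plain (non-atomic) stand-in for test_and_set_bit(). */
static int test_and_set_dom_bit(unsigned int domid)
{
    uint32_t old = domain_switch[domid / 32];

    domain_switch[domid / 32] = old | (1u << (domid % 32));
    return (old >> (domid % 32)) & 1;
}

/* 1. Socket (re-)initialisation: forget which domains we have seen. */
static void socket_init(void)
{
    for ( unsigned int i = 0; i < 1024; i++ )
        domain_switch[i] = 0;
}

/* 2. Context switch: the first time a domain runs after socket init, any
 *    stale COS ID left over from before the socket went offline is reset. */
static void ctxt_switch_to(unsigned int domid)
{
    if ( !test_and_set_dom_bit(domid) && dom_cos[domid] != 0 )
        dom_cos[domid] = 0;
}

/* 3. Setting a value: mark the domain as seen, then assign its (new) COS. */
static void set_val(unsigned int domid, unsigned int new_cos)
{
    test_and_set_dom_bit(domid);
    dom_cos[domid] = new_cos;                 /* real code runs the find/pick flow */
}

int main(void)
{
    dom_cos[5] = 3;      /* COS ID left over from before the socket went offline */
    socket_init();
    ctxt_switch_to(5);   /* the stale value is lazily cleared on first switch-in */
    printf("dom 5 cos after socket re-init: %u\n", dom_cos[5]);   /* prints 0 */
    set_val(5, 2);       /* a later explicit setting works as usual */
    return 0;
}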

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-13 Thread Jan Beulich
>>> On 13.04.17 at 10:11,  wrote:
> On 17-04-12 06:42:01, Jan Beulich wrote:
>> >>> On 12.04.17 at 14:23,  wrote:
>> > On 17-04-12 03:09:56, Jan Beulich wrote:
>> >> >>> On 12.04.17 at 07:53,  wrote:
>> >> > On 17-04-11 09:01:53, Jan Beulich wrote:
>> >> >> >>> On 01.04.17 at 15:53,  wrote:
>> >> >> Furthermore I'm not at all convinced this is appropriate to do in the
>> >> >> context of a CPU_UP_CANCELED / CPU_DEAD notification: If you
>> >> >> have a few thousand VMs, the loop above may take a while.
>> >> >> 
>> >> > Hmm, that may be a potential issue. I have two proposals below. Could 
>> >> > you
>> >> > please help to check which one you prefer? Or provide another solution?
>> >> > 
>> >> > 1. Start a tasklet in free_socket_resources() to restore 
>> > 'psr_cos_ids[socket]'
>> >> >of all domains. The action is protected by 'ref_lock' to avoid 
>> > confliction
>> >> >in 'psr_set_val'. We can reduce 'info->cos_ref[cos]' in tasklet or 
>> >> > memset
>> >> >the array to 0 in free_socket_resources().
>> >> > 
>> >> > 2. Move 'psr_cos_ids[]' from 'domain' to 'psr_socket_info' and change 
>> >> > index
>> >> >from 'socket' to 'domain_id'. So we keep all domains' COS IDs per 
>> >> > socket
>> >> >and can memset the array to 0 when socket is offline. But here is an 
>> > issue
>> >> >that we do not know how many members this array should have. I 
>> >> > cannot 
>> > find
>> >> >a macro something like 'DOMAIN_MAX_NUMBER'. So, I prefer to use 
>> > reallocation
>> >> >in 'psr_alloc_cos' if the newly created domain's id is bigger than 
>> > current
>> >> >array number.
>> >> 
>> >> The number of domains is limited by the special DOMID_* values.
>> >> However, allocating an array with 32k entries doesn't sound very
>> >> reasonable.
>> > 
>> > I think 32K entries should be the extreme case. I can allocate e.g. 100 
>> > entries
>> > when the first domain is created. If a new domain's id exceeds 100, 
>> > reallocate
>> > another 100 entries. The total number of entries allocated should be less 
>> > than
>> > 32K. This is a functional requirement which cannot be avoided. How do you 
>> > think?
>> 
>> So how many entries would your array have once I start the 32,000th
>> domain (having at any one time at most a single one running, besides
>> Dom0)?
>> 
> In such case, we have to keep a 32K array because the domain_id is the index 
> to
> access the array. But this array is per socket so the whole memory used should
> not be too much.

We carefully avoid any runtime allocations of order > 0, so if you
were to set up such an array, you'd need to use vmalloc()/vzalloc().
But I continue to be unconvinced that we want such a large array
in the first place.

> After considering this issue more, I think the original codes might not be
> so unacceptable. Per my knowledge, Intel Xeon Phi chip can support at most
> 288 CPUs. So, I think the domains running at same time in reality may not be
> so many (no efficient resources). If this hypothesis is right, a loop to write
> 'psr_cos_ids[socket]' of every domain to 0 may not take much time. If I am
> wrong, please correct me. Thanks!

What relationship does the number of CPUs have to the number of
domains on a host? There could be thousands with just a few dozen
CPUs, provided none or very few of them have high demands on
CPU resources. Additionally please never forget that system sizes
basically only ever grow. Plus we wouldn't want a latent issue here
in case we ever end up needing to widen domain IDs beyond 16 bits.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-13 Thread Yi Sun
On 17-04-12 06:42:01, Jan Beulich wrote:
> >>> On 12.04.17 at 14:23,  wrote:
> > On 17-04-12 03:09:56, Jan Beulich wrote:
> >> >>> On 12.04.17 at 07:53,  wrote:
> >> > On 17-04-11 09:01:53, Jan Beulich wrote:
> >> >> >>> On 01.04.17 at 15:53,  wrote:
> >> >> > +info->cos_ref[cos]--;
> >> >> > +spin_unlock(&info->ref_lock);
> >> >> > +
> >> >> > +d->arch.psr_cos_ids[socket] = 0;
> >> >> > +}
> >> >> 
> >> >> Overall, while you say in the revision log that this was a suggestion of
> >> >> mine, I don't recall any such (and I've just checked the v9 thread of
> >> >> this patch without finding anything), and hence it's not really clear to
> >> >> me why this is needed. After all you should be freeing info anyway
> >> > 
> >> > We discussed this in v9 05 patch.
> >> 
> >> Ah, that's why I didn't find it.
> >> 
> >> > I paste it below for your convenience to
> >> > check.
> >> > [Sun Yi]:
> >> >   > So, you think the MSRs values may not be valid after such process 
> >> > and 
> >> >   > reloading (write MSRs to default value) is needed. If so, I would 
> >> > like 
> >> >   > to do more operations in 'free_feature()':
> >> >   > 1. Iterate all domains working on the offline socket to change
> >> >   >'d->arch.psr_cos_ids[socket]' to COS 0, i.e restore it back to 
> >> > init
> >> >   >status.
> >> >   > 2. Restore 'socket_info[socket].cos_ref[]' to all 0.
> >> >   > 
> >> >   > These can make the socket's info be totally restored back to init 
> > status.
> >> > 
> >> > [Jan]
> >> >   Yes, that's what I think is needed.
> >> > 
> >> >> (albeit I can't see this happening, which would look to be a bug in
> >> >> patch 5), so getting the refcounts adjusted seems pointless in any
> >> >> event. Whether d->arch.psr_cos_ids[socket] needs clearing I'm not
> >> > 
> >> > We only free resources in 'socket_info[socket]' but do not free itself.
> >> > Below is how we allocate 'socket_info'. So, the 'socket_info[socket]'
> >> > is not a pointer that can be directly freed.
> >> >   socket_info = xzalloc_array(struct psr_socket_info, nr_sockets);
> >> > 
> >> > That is the reason to reduce the 'info->cos_ref[cos]'.
> >> 
> >> I see. But then there's no need to decrement it for each domain
> >> using it, you could simply flush it to zero.
> >> 
> >> >> certain - this may indeed be unavoidable, to match up with
> >> >> psr_alloc_cos() using xzalloc.
> >> >> 
> >> >> Furthermore I'm not at all convinced this is appropriate to do in the
> >> >> context of a CPU_UP_CANCELED / CPU_DEAD notification: If you
> >> >> have a few thousand VMs, the loop above may take a while.
> >> >> 
> >> > Hmm, that may be a potential issue. I have two proposals below. Could you
> >> > please help to check which one you prefer? Or provide another solution?
> >> > 
> >> > 1. Start a tasklet in free_socket_resources() to restore 
> > 'psr_cos_ids[socket]'
> >> >of all domains. The action is protected by 'ref_lock' to avoid 
> > confliction
> >> >in 'psr_set_val'. We can reduce 'info->cos_ref[cos]' in tasklet or 
> >> > memset
> >> >the array to 0 in free_socket_resources().
> >> > 
> >> > 2. Move 'psr_cos_ids[]' from 'domain' to 'psr_socket_info' and change 
> >> > index
> >> >from 'socket' to 'domain_id'. So we keep all domains' COS IDs per 
> >> > socket
> >> >and can memset the array to 0 when socket is offline. But here is an 
> > issue
> >> >that we do not know how many members this array should have. I cannot 
> > find
> >> >a macro something like 'DOMAIN_MAX_NUMBER'. So, I prefer to use 
> > reallocation
> >> >in 'psr_alloc_cos' if the newly created domain's id is bigger than 
> > current
> >> >array number.
> >> 
> >> The number of domains is limited by the special DOMID_* values.
> >> However, allocating an array with 32k entries doesn't sound very
> >> reasonable.
> > 
> > I think 32K entries should be the extreme case. I can allocate e.g. 100 
> > entries
> > when the first domain is created. If a new domain's id exceeds 100, 
> > reallocate
> > another 100 entries. The total number of entries allocated should be less 
> > than
> > 32K. This is a functional requirement which cannot be avoided. How do you 
> > think?
> 
> So how many entries would your array have once I start the 32,000th
> domain (having at any one time at most a single one running, besides
> Dom0)?
> 
In that case, we have to keep a 32K array because the domain_id is the index
used to access the array. But this array is per socket, so the memory used
overall should not be too much.

After considering this issue more, I think the original code might not be
so unacceptable. To my knowledge, an Intel Xeon Phi chip can support at most
288 CPUs. So the number of domains running at the same time in reality may not
be that large (there are not enough resources for more). If this hypothesis is
right, a loop writing 'psr_cos_ids[socket]' of every domain to 0 may not take
much time. If I am wrong, please correct me. Thanks!

Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-12 Thread Jan Beulich
>>> On 12.04.17 at 14:23,  wrote:
> On 17-04-12 03:09:56, Jan Beulich wrote:
>> >>> On 12.04.17 at 07:53,  wrote:
>> > On 17-04-11 09:01:53, Jan Beulich wrote:
>> >> >>> On 01.04.17 at 15:53,  wrote:
>> >> > +info->cos_ref[cos]--;
>> >> > +spin_unlock(&info->ref_lock);
>> >> > +
>> >> > +d->arch.psr_cos_ids[socket] = 0;
>> >> > +}
>> >> 
>> >> Overall, while you say in the revision log that this was a suggestion of
>> >> mine, I don't recall any such (and I've just checked the v9 thread of
>> >> this patch without finding anything), and hence it's not really clear to
>> >> me why this is needed. After all you should be freeing info anyway
>> > 
>> > We discussed this in v9 05 patch.
>> 
>> Ah, that's why I didn't find it.
>> 
>> > I paste it below for your convenience to
>> > check.
>> > [Sun Yi]:
>> >   > So, you think the MSRs values may not be valid after such process and 
>> >   > reloading (write MSRs to default value) is needed. If so, I would like 
>> >   > to do more operations in 'free_feature()':
>> >   > 1. Iterate all domains working on the offline socket to change
>> >   >'d->arch.psr_cos_ids[socket]' to COS 0, i.e restore it back to init
>> >   >status.
>> >   > 2. Restore 'socket_info[socket].cos_ref[]' to all 0.
>> >   > 
>> >   > These can make the socket's info be totally restored back to init 
> status.
>> > 
>> > [Jan]
>> >   Yes, that's what I think is needed.
>> > 
>> >> (albeit I can't see this happening, which would look to be a bug in
>> >> patch 5), so getting the refcounts adjusted seems pointless in any
>> >> event. Whether d->arch.psr_cos_ids[socket] needs clearing I'm not
>> > 
>> > We only free resources in 'socket_info[socket]' but do not free itself.
>> > Below is how we allocate 'socket_info'. So, the 'socket_info[socket]'
>> > is not a pointer that can be directly freed.
>> >   socket_info = xzalloc_array(struct psr_socket_info, nr_sockets);
>> > 
>> > That is the reason to reduce the 'info->cos_ref[cos]'.
>> 
>> I see. But then there's no need to decrement it for each domain
>> using it, you could simply flush it to zero.
>> 
>> >> certain - this may indeed be unavoidable, to match up with
>> >> psr_alloc_cos() using xzalloc.
>> >> 
>> >> Furthermore I'm not at all convinced this is appropriate to do in the
>> >> context of a CPU_UP_CANCELED / CPU_DEAD notification: If you
>> >> have a few thousand VMs, the loop above may take a while.
>> >> 
>> > Hmm, that may be a potential issue. I have two proposals below. Could you
>> > please help to check which one you prefer? Or provide another solution?
>> > 
>> > 1. Start a tasklet in free_socket_resources() to restore 
> 'psr_cos_ids[socket]'
>> >of all domains. The action is protected by 'ref_lock' to avoid 
> confliction
>> >in 'psr_set_val'. We can reduce 'info->cos_ref[cos]' in tasklet or 
>> > memset
>> >the array to 0 in free_socket_resources().
>> > 
>> > 2. Move 'psr_cos_ids[]' from 'domain' to 'psr_socket_info' and change index
>> >from 'socket' to 'domain_id'. So we keep all domains' COS IDs per socket
>> >and can memset the array to 0 when socket is offline. But here is an 
> issue
>> >that we do not know how many members this array should have. I cannot 
> find
>> >a macro something like 'DOMAIN_MAX_NUMBER'. So, I prefer to use 
> reallocation
>> >in 'psr_alloc_cos' if the newly created domain's id is bigger than 
> current
>> >array number.
>> 
>> The number of domains is limited by the special DOMID_* values.
>> However, allocating an array with 32k entries doesn't sound very
>> reasonable.
> 
> I think 32K entries should be the extreme case. I can allocate e.g. 100 
> entries
> when the first domain is created. If a new domain's id exceeds 100, reallocate
> another 100 entries. The total number of entries allocated should be less than
> 32K. This is a functional requirement which cannot be avoided. How do you 
> think?

So how many entries would your array have once I start the 32,000th
domain (having at any one time at most a single one running, besides
Dom0)?

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-12 Thread Yi Sun
On 17-04-12 03:09:56, Jan Beulich wrote:
> >>> On 12.04.17 at 07:53,  wrote:
> > On 17-04-11 09:01:53, Jan Beulich wrote:
> >> >>> On 01.04.17 at 15:53,  wrote:
> >> > +info->cos_ref[cos]--;
> >> > +spin_unlock(&info->ref_lock);
> >> > +
> >> > +d->arch.psr_cos_ids[socket] = 0;
> >> > +}
> >> 
> >> Overall, while you say in the revision log that this was a suggestion of
> >> mine, I don't recall any such (and I've just checked the v9 thread of
> >> this patch without finding anything), and hence it's not really clear to
> >> me why this is needed. After all you should be freeing info anyway
> > 
> > We discussed this in v9 05 patch.
> 
> Ah, that's why I didn't find it.
> 
> > I paste it below for your convenience to
> > check.
> > [Sun Yi]:
> >   > So, you think the MSRs values may not be valid after such process and 
> >   > reloading (write MSRs to default value) is needed. If so, I would like 
> >   > to do more operations in 'free_feature()':
> >   > 1. Iterate all domains working on the offline socket to change
> >   >'d->arch.psr_cos_ids[socket]' to COS 0, i.e restore it back to init
> >   >status.
> >   > 2. Restore 'socket_info[socket].cos_ref[]' to all 0.
> >   > 
> >   > These can make the socket's info be totally restored back to init 
> > status.
> > 
> > [Jan]
> >   Yes, that's what I think is needed.
> > 
> >> (albeit I can't see this happening, which would look to be a bug in
> >> patch 5), so getting the refcounts adjusted seems pointless in any
> >> event. Whether d->arch.psr_cos_ids[socket] needs clearing I'm not
> > 
> > We only free resources in 'socket_info[socket]' but do not free itself.
> > Below is how we allocate 'socket_info'. So, the 'socket_info[socket]'
> > is not a pointer that can be directly freed.
> >   socket_info = xzalloc_array(struct psr_socket_info, nr_sockets);
> > 
> > That is the reason to reduce the 'info->cos_ref[cos]'.
> 
> I see. But then there's no need to decrement it for each domain
> using it, you could simply flush it to zero.
> 
> >> certain - this may indeed be unavoidable, to match up with
> >> psr_alloc_cos() using xzalloc.
> >> 
> >> Furthermore I'm not at all convinced this is appropriate to do in the
> >> context of a CPU_UP_CANCELED / CPU_DEAD notification: If you
> >> have a few thousand VMs, the loop above may take a while.
> >> 
> > Hmm, that may be a potential issue. I have two proposals below. Could you
> > please help to check which one you prefer? Or provide another solution?
> > 
> > 1. Start a tasklet in free_socket_resources() to restore 
> > 'psr_cos_ids[socket]'
> >of all domains. The action is protected by 'ref_lock' to avoid 
> > confliction
> >in 'psr_set_val'. We can reduce 'info->cos_ref[cos]' in tasklet or memset
> >the array to 0 in free_socket_resources().
> > 
> > 2. Move 'psr_cos_ids[]' from 'domain' to 'psr_socket_info' and change index
> >from 'socket' to 'domain_id'. So we keep all domains' COS IDs per socket
> >and can memset the array to 0 when socket is offline. But here is an 
> > issue
> >that we do not know how many members this array should have. I cannot 
> > find
> >a macro something like 'DOMAIN_MAX_NUMBER'. So, I prefer to use 
> > reallocation
> >in 'psr_alloc_cos' if the newly created domain's id is bigger than 
> > current
> >array number.
> 
> The number of domains is limited by the special DOMID_* values.
> However, allocating an array with 32k entries doesn't sound very
> reasonable.

I think 32K entries should be the extreme case. I can allocate e.g. 100 entries
when the first domain is created. If a new domain's id exceeds 100, reallocate
another 100 entries. The total number of entries allocated should be less than
32K. This is a functional requirement which cannot be avoided. What do you think?

> Sadly the other solution doesn't look very attractive
> either, as there'll be quite a bit of synchronization needed (you'd
> have to defer the same socket being re-used by a CPU being
> onlined until you've done the cleanup). This may need some more
> thought, but I can't see myself finding time for this any time soon,
> so I'm afraid I'll have to leave it to you for now.
> 
Yes, the first option needs careful consideration of the synchronization, which
is more complex than in the second option.

> Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-12 Thread Jan Beulich
>>> On 12.04.17 at 07:53,  wrote:
> On 17-04-11 09:01:53, Jan Beulich wrote:
>> >>> On 01.04.17 at 15:53,  wrote:
>> > +info->cos_ref[cos]--;
>> > +spin_unlock(&info->ref_lock);
>> > +
>> > +d->arch.psr_cos_ids[socket] = 0;
>> > +}
>> 
>> Overall, while you say in the revision log that this was a suggestion of
>> mine, I don't recall any such (and I've just checked the v9 thread of
>> this patch without finding anything), and hence it's not really clear to
>> me why this is needed. After all you should be freeing info anyway
> 
> We discussed this in v9 05 patch.

Ah, that's why I didn't find it.

> I paste it below for your convenience to
> check.
> [Sun Yi]:
>   > So, you think the MSRs values may not be valid after such process and 
>   > reloading (write MSRs to default value) is needed. If so, I would like 
>   > to do more operations in 'free_feature()':
>   > 1. Iterate all domains working on the offline socket to change
>   >'d->arch.psr_cos_ids[socket]' to COS 0, i.e restore it back to init
>   >status.
>   > 2. Restore 'socket_info[socket].cos_ref[]' to all 0.
>   > 
>   > These can make the socket's info be totally restored back to init status.
> 
> [Jan]
>   Yes, that's what I think is needed.
> 
>> (albeit I can't see this happening, which would look to be a bug in
>> patch 5), so getting the refcounts adjusted seems pointless in any
>> event. Whether d->arch.psr_cos_ids[socket] needs clearing I'm not
> 
> We only free resources in 'socket_info[socket]' but do not free itself.
> Below is how we allocate 'socket_info'. So, the 'socket_info[socket]'
> is not a pointer that can be directly freed.
>   socket_info = xzalloc_array(struct psr_socket_info, nr_sockets);
> 
> That is the reason to reduce the 'info->cos_ref[cos]'.

I see. But then there's no need to decrement it for each domain
using it, you could simply flush it to zero.

>> certain - this may indeed be unavoidable, to match up with
>> psr_alloc_cos() using xzalloc.
>> 
>> Furthermore I'm not at all convinced this is appropriate to do in the
>> context of a CPU_UP_CANCELED / CPU_DEAD notification: If you
>> have a few thousand VMs, the loop above may take a while.
>> 
> Hmm, that may be a potential issue. I have two proposals below. Could you
> please help to check which one you prefer? Or provide another solution?
> 
> 1. Start a tasklet in free_socket_resources() to restore 'psr_cos_ids[socket]'
>of all domains. The action is protected by 'ref_lock' to avoid confliction
>in 'psr_set_val'. We can reduce 'info->cos_ref[cos]' in tasklet or memset
>the array to 0 in free_socket_resources().
> 
> 2. Move 'psr_cos_ids[]' from 'domain' to 'psr_socket_info' and change index
>from 'socket' to 'domain_id'. So we keep all domains' COS IDs per socket
>and can memset the array to 0 when socket is offline. But here is an issue
>that we do not know how many members this array should have. I cannot find
>a macro something like 'DOMAIN_MAX_NUMBER'. So, I prefer to use 
> reallocation
>in 'psr_alloc_cos' if the newly created domain's id is bigger than current
>array number.

The number of domains is limited by the special DOMID_* values.
However, allocating an array with 32k entries doesn't sound very
reasonable. Sadly the other solution doesn't look very attractive
either, as there'll be quite a bit of synchronization needed (you'd
have to defer the same socket being re-used by a CPU being
onlined until you've done the cleanup). This may need some more
thought, but I can't see myself finding time for this any time soon,
so I'm afraid I'll have to leave it to you for now.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-11 Thread Yi Sun
On 17-04-11 09:01:53, Jan Beulich wrote:
> >>> On 01.04.17 at 15:53,  wrote:
> > --- a/xen/arch/x86/psr.c
> > +++ b/xen/arch/x86/psr.c
> > @@ -157,10 +157,26 @@ static void free_socket_resources(unsigned int socket)
> >  {
> >  unsigned int i;
> >  struct psr_socket_info *info = socket_info + socket;
> > +struct domain *d;
> >  
> >  if ( !info )
> >  return;
> >  
> > +/* Restore domain cos id to 0 when socket is offline. */
> > +for_each_domain ( d )
> > +{
> > +unsigned int cos = d->arch.psr_cos_ids[socket];
> > +if ( cos == 0 )
> 
> Blank line between declaration(s) and statement(s) please.
> 
OK, I will modify the other places that have the same issue.

> > +continue;
> > +
> > +spin_lock(&info->ref_lock);
> > +ASSERT(!cos || info->cos_ref[cos]);
> 
> You've checked cos to be non-zero already above.
> 
Yep. Thanks!

> > +info->cos_ref[cos]--;
> > +spin_unlock(&info->ref_lock);
> > +
> > +d->arch.psr_cos_ids[socket] = 0;
> > +}
> 
> Overall, while you say in the revision log that this was a suggestion of
> mine, I don't recall any such (and I've just checked the v9 thread of
> this patch without finding anything), and hence it's not really clear to
> me why this is needed. After all you should be freeing info anyway

We discussed this in the v9 patch 05 thread. I paste it below for your
convenience to check.
[Sun Yi]:
  > So, you think the MSRs values may not be valid after such process and 
  > reloading (write MSRs to default value) is needed. If so, I would like 
  > to do more operations in 'free_feature()':
  > 1. Iterate all domains working on the offline socket to change
  >'d->arch.psr_cos_ids[socket]' to COS 0, i.e restore it back to init
  >status.
  > 2. Restore 'socket_info[socket].cos_ref[]' to all 0.
  > 
  > These can make the socket's info be totally restored back to init status.

[Jan]
  Yes, that's what I think is needed.

> (albeit I can't see this happening, which would look to be a bug in
> patch 5), so getting the refcounts adjusted seems pointless in any
> event. Whether d->arch.psr_cos_ids[socket] needs clearing I'm not

We only free resources in 'socket_info[socket]' but do not free it itself.
Below is how we allocate 'socket_info'. So, the 'socket_info[socket]'
is not a pointer that can be directly freed.
  socket_info = xzalloc_array(struct psr_socket_info, nr_sockets);

That is the reason to reduce the 'info->cos_ref[cos]'.

> certain - this may indeed be unavoidable, to match up with
> psr_alloc_cos() using xzalloc.
> 
> Furthermore I'm not at all convinced this is appropriate to do in the
> context of a CPU_UP_CANCELED / CPU_DEAD notification: If you
> have a few thousand VMs, the loop above may take a while.
> 
Hmm, that may be a potential issue. I have two proposals below. Could you
please help to check which one you prefer? Or provide another solution?

1. Start a tasklet in free_socket_resources() to restore 'psr_cos_ids[socket]'
   of all domains. The action is protected by 'ref_lock' to avoid conflicting
   with 'psr_set_val'. We can reduce 'info->cos_ref[cos]' in the tasklet or
   memset the array to 0 in free_socket_resources().

2. Move 'psr_cos_ids[]' from 'domain' to 'psr_socket_info' and change the index
   from 'socket' to 'domain_id'. So we keep all domains' COS IDs per socket
   and can memset the array to 0 when the socket goes offline. But there is an
   issue: we do not know how many members this array should have. I cannot find
   a macro like 'DOMAIN_MAX_NUMBER'. So, I prefer to use reallocation in
   'psr_alloc_cos' if the newly created domain's id is bigger than the current
   array size.

> > @@ -574,15 +590,210 @@ int psr_get_val(struct domain *d, unsigned int 
> > socket,
> >  return 0;
> >  }
> >  
> > -int psr_set_l3_cbm(struct domain *d, unsigned int socket,
> > -   uint64_t cbm, enum cbm_type type)
> > +/* Set value functions */
> > +static unsigned int get_cos_num(const struct psr_socket_info *info)
> >  {
> >  return 0;
> >  }
> >  
> > +static int gather_val_array(uint32_t val[],
> > +unsigned int array_len,
> > +const struct psr_socket_info *info,
> > +unsigned int old_cos)
> > +{
> > +return -EINVAL;
> > +}
> > +
> > +static int insert_val_to_array(uint32_t val[],
> 
> As indicated before, I'm pretty convinced "into" would be more
> natural than "to".
> 
Got it. Thanks!

> > +   unsigned int array_len,
> > +   const struct psr_socket_info *info,
> > +   enum psr_feat_type feat_type,
> > +   enum cbm_type type,
> > +   uint32_t new_val)
> > +{
> > +return -EINVAL;
> > +}
> > +
> > +static int find_cos(const uint32_t val[], unsigned int array_len,
> > +enum 

Re: [Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-11 Thread Jan Beulich
>>> On 01.04.17 at 15:53,  wrote:
> --- a/xen/arch/x86/psr.c
> +++ b/xen/arch/x86/psr.c
> @@ -157,10 +157,26 @@ static void free_socket_resources(unsigned int socket)
>  {
>  unsigned int i;
>  struct psr_socket_info *info = socket_info + socket;
> +struct domain *d;
>  
>  if ( !info )
>  return;
>  
> +/* Restore domain cos id to 0 when socket is offline. */
> +for_each_domain ( d )
> +{
> +unsigned int cos = d->arch.psr_cos_ids[socket];
> +if ( cos == 0 )

Blank line between declaration(s) and statement(s) please.

> +continue;
> +
> +spin_lock(&info->ref_lock);
> +ASSERT(!cos || info->cos_ref[cos]);

You've checked cos to be non-zero already above.

> +info->cos_ref[cos]--;
> +spin_unlock(&info->ref_lock);
> +
> +d->arch.psr_cos_ids[socket] = 0;
> +}

Overall, while you say in the revision log that this was a suggestion of
mine, I don't recall any such (and I've just checked the v9 thread of
this patch without finding anything), and hence it's not really clear to
me why this is needed. After all you should be freeing info anyway
(albeit I can't see this happening, which would look to be a bug in
patch 5), so getting the refcounts adjusted seems pointless in any
event. Whether d->arch.psr_cos_ids[socket] needs clearing I'm not
certain - this may indeed be unavoidable, to match up with
psr_alloc_cos() using xzalloc.

Furthermore I'm not at all convinced this is appropriate to do in the
context of a CPU_UP_CANCELED / CPU_DEAD notification: If you
have a few thousand VMs, the loop above may take a while.

> @@ -574,15 +590,210 @@ int psr_get_val(struct domain *d, unsigned int socket,
>  return 0;
>  }
>  
> -int psr_set_l3_cbm(struct domain *d, unsigned int socket,
> -   uint64_t cbm, enum cbm_type type)
> +/* Set value functions */
> +static unsigned int get_cos_num(const struct psr_socket_info *info)
>  {
>  return 0;
>  }
>  
> +static int gather_val_array(uint32_t val[],
> +unsigned int array_len,
> +const struct psr_socket_info *info,
> +unsigned int old_cos)
> +{
> +return -EINVAL;
> +}
> +
> +static int insert_val_to_array(uint32_t val[],

As indicated before, I'm pretty convinced "into" would be more
natural than "to".

> +   unsigned int array_len,
> +   const struct psr_socket_info *info,
> +   enum psr_feat_type feat_type,
> +   enum cbm_type type,
> +   uint32_t new_val)
> +{
> +return -EINVAL;
> +}
> +
> +static int find_cos(const uint32_t val[], unsigned int array_len,
> +enum psr_feat_type feat_type,
> +const struct psr_socket_info *info,
> +spinlock_t *ref_lock)

I don't think I did suggest adding another parameter here. The lock
is accessible from info, isn't it? In which case, as I _did_ suggest,
you should drop const from it instead. But ...

> +{
> +ASSERT(spin_is_locked(ref_lock));

... the assertion seems rather pointless anyway with there just
being a single caller, which very visibly acquires the lock up front.

> +static int pick_avail_cos(const struct psr_socket_info *info,
> +  spinlock_t *ref_lock,

Same here then.

> +int psr_set_val(struct domain *d, unsigned int socket,
> +uint32_t val, enum cbm_type type)
> +{
> +unsigned int old_cos;
> +int cos, ret;
> +unsigned int *ref;
> +uint32_t *val_array;
> +struct psr_socket_info *info = get_socket_info(socket);
> +unsigned int array_len;
> +enum psr_feat_type feat_type;
> +
> +if ( IS_ERR(info) )
> +return PTR_ERR(info);
> +
> +feat_type = psr_cbm_type_to_feat_type(type);
> +if ( feat_type > ARRAY_SIZE(info->features) ||

I think I did point out the off-by-one mistake here in an earlier patch
already.

> + !info->features[feat_type] )
> +return -ENOENT;
> +
> +/*
> + * Step 0:
> + * old_cos means the COS ID current domain is using. By default, it is 0.
> + *
> + * For every COS ID, there is a reference count to record how many 
> domains
> + * are using the COS register corresponding to this COS ID.
> + * - If ref[old_cos] is 0, that means this COS is not used by any domain.
> + * - If ref[old_cos] is 1, that means this COS is only used by current
> + *   domain.
> + * - If ref[old_cos] is more than 1, that mean multiple domains are using
> + *   this COS.
> + */
> +old_cos = d->arch.psr_cos_ids[socket];
> +ASSERT(old_cos < MAX_COS_REG_CNT);
> +
> +ref = info->cos_ref;
> +
> +/*
> + * Step 1:
> + * Gather a value array to store all features cos_reg_val[old_cos].
> + * And, set the input new val into array according to 

[Xen-devel] [PATCH v10 09/25] x86: refactor psr: L3 CAT: set value: implement framework.

2017-04-01 Thread Yi Sun
As the set value flow is the most complicated one in PSR, it is divided
into several patches to make things clearer. This patch implements the
set value framework first, to show the whole picture.

It also changes the domctl interface to make it more general.

To make the set value flow general so that it can support multiple features
at the same time, it includes the steps below:
1. Get the COS ID that the current domain is using.
2. Gather a value array that stores every feature's current value, and
   replace the value of the feature being set with the new input value.
3. Check whether there is already a COS ID on which all features' values
   are the same as the array. If so, we can reuse this COS ID.
4. If we fail to find one, pick an available COS ID. Only a COS ID whose
   ref is 0 or 1 can be picked.
5. Write the feature's MSR according to the COS ID and cbm_type.
6. Update ref according to the COS ID.
7. Save the COS ID into the current domain's psr_cos_ids[socket] so that we
   know which COS the domain is using on the socket.

So, some functions are abstracted here; their callback implementations will be
added in the next patches.
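
As a rough illustration only, the flow above can be modelled by the following
standalone program (array sizes, the pick order, the handling of COS 0, all
locking and the per-socket state are simplified away, and the helper names
only loosely mirror the stubs added by this patch):

#include <stdint.h>
#include <stdio.h>

#define NR_FEATURES 2   /* e.g. L3 CAT and L2 CAT */
#define MAX_COS     4   /* small COS table, enough for the example below */

static uint32_t cos_reg_val[NR_FEATURES][MAX_COS]; /* per-feature COS "registers" */
static unsigned int cos_ref[MAX_COS];              /* how many domains use each COS */

/* Step 3: look for a COS whose values match 'val[]' for every feature. */
static int find_cos(const uint32_t val[NR_FEATURES])
{
    for ( int cos = 0; cos < MAX_COS; cos++ )
    {
        int match = 1;

        for ( int f = 0; f < NR_FEATURES; f++ )
            if ( cos_reg_val[f][cos] != val[f] )
                match = 0;
        if ( match )
            return cos;
    }
    return -1;
}

/* Step 4: pick a COS no other domain depends on (ref 0, or 1 if it is ours). */
static int pick_avail_cos(unsigned int old_cos)
{
    for ( int cos = 1; cos < MAX_COS; cos++ )
        if ( cos_ref[cos] == 0 || (cos == (int)old_cos && cos_ref[cos] == 1) )
            return cos;
    return -1;
}

/* Steps 1-7 for one domain setting feature 'feat' to 'new_val'.
 * '*dom_cos' stands in for the domain's psr_cos_ids[socket]. */
static int set_val(unsigned int *dom_cos, unsigned int feat, uint32_t new_val)
{
    unsigned int old_cos = *dom_cos;              /* step 1 */
    uint32_t val[NR_FEATURES];
    int cos, f;

    for ( f = 0; f < NR_FEATURES; f++ )           /* step 2 */
        val[f] = cos_reg_val[f][old_cos];
    val[feat] = new_val;

    cos = find_cos(val);                          /* step 3 */
    if ( cos < 0 )
    {
        cos = pick_avail_cos(old_cos);            /* step 4 */
        if ( cos < 0 )
            return -1;                            /* no COS available */
        for ( f = 0; f < NR_FEATURES; f++ )       /* step 5: "write the MSRs" */
            cos_reg_val[f][cos] = val[f];
    }

    cos_ref[cos]++;                               /* step 6 */
    if ( old_cos )
        cos_ref[old_cos]--;                       /* COS 0 is the shared default */
    *dom_cos = cos;                               /* step 7 */
    return 0;
}

int main(void)
{
    unsigned int dom1 = 0, dom2 = 0;

    for ( int cos = 0; cos < MAX_COS; cos++ )     /* defaults, as in the example */
    {
        cos_reg_val[0][cos] = 0x7ff;              /* L3 CAT default CBM */
        cos_reg_val[1][cos] = 0xff;               /* L2 CAT default CBM */
    }

    set_val(&dom1, 0, 0x1ff);  /* Dom1: no match, picks COS 1, ref[1] becomes 1 */
    set_val(&dom2, 0, 0x1ff);  /* Dom2: matches COS 1, reuses it, ref[1] becomes 2 */
    printf("dom1=%u dom2=%u ref[1]=%u\n", dom1, dom2, cos_ref[1]);
    return 0;
}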

Here is an example to illustrate the process. The CPU supports
two features, e.g. L3 CAT and L2 CAT. The user wants to set L3 CAT
of Dom1 to 0x1ff.
1. At the initial time, the old_cos of Dom1 is 0. The COS register values
at this time are:
        ---------------------------------
        | COS 0 | COS 1 | COS 2 | ... |
        ---------------------------------
L3 CAT  | 0x7ff | 0x7ff | 0x7ff | ... |
        ---------------------------------
L2 CAT  | 0xff  | 0xff  | 0xff  | ... |
        ---------------------------------

2. Gather the value array and insert the new value into it:
val[0]: 0x1ff
val[1]: 0xff

3. It cannot find a matching COS.

4. Pick COS 1 to store the value set.

5. Write the L3 CAT COS 1 register. The COS register values are now
changed to:
        ---------------------------------
        | COS 0 | COS 1 | COS 2 | ... |
        ---------------------------------
L3 CAT  | 0x7ff | 0x1ff | ...   | ... |
        ---------------------------------
L2 CAT  | 0xff  | 0xff  | ...   | ... |
        ---------------------------------

6. The ref[1] is increased to 1 because Dom1 is using it now.

7. Save 1 to Dom1's psr_cos_ids[socket].

Then, the user wants to set L3 CAT of Dom2 to 0x1ff too. The old_cos
of Dom2 is also 0. Repeat the above flow.

The val array assembled is:
val[0]: 0x1ff
val[1]: 0xff

So, it can find a matching COS, COS 1. Then, it can reuse COS 1
for Dom2.

The ref[1] is increased to 2 now because both Dom1 and Dom2 are
using this COS ID. Set 1 to Dom2's psr_cos_ids[socket].

Another thing that needs to be emphasized is the context switch. When a context
switch happens, 'psr_ctxt_switch_to' is called by the system to get the
domain's COS ID from 'psr_cos_ids[socket]'. But 'psr_cos_ids[socket]'
is set at step 7 above. So there are three scenarios:
1. The user calls the domctl interface on Dom0 to set COS ID 1 for Dom1 into its
   psr_cos_ids[]. Then, Dom1 is scheduled so that 'psr_ctxt_switch_to()' is
   called, which makes COS ID 1 take effect. For this case, no action is needed.

2. Dom1 runs on CPU 1 and COS ID 1 is in effect. At the same time, the user calls
   the domctl interface on Dom0 to set a new COS ID 2 for Dom1 into psr_cos_ids[].
   After the time slice ends and Dom1 is scheduled again, the new COS ID 2 takes
   effect.

3. While a new COS ID is being set into psr_cos_ids[], 'psr_ctxt_switch_to()'
   is called and accesses the same psr_cos_ids[] member through 'psr_assoc_cos'.
   The COS ID is constrained by cos_mask (sketched below) so that it cannot
   exceed cos_max. So even if the COS ID read here is stale, it is still a
   workable ID (within cos_max). The functionality still works; another valid
   CBM is merely in effect for a short time. At the next schedule, the correct
   CBM takes effect.
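
The cos_mask clamping mentioned in scenario 3 works roughly as below (a
standalone sketch only; 'write_assoc_cos' is a made-up name, and the real
'psr_assoc_cos' helper may differ in details):

#include <stdint.h>
#include <stdio.h>

/*
 * IA32_PQR_ASSOC-style register: the low half holds the RMID, the COS ID
 * lives in the masked high bits. Clamping through cos_mask means that even a
 * stale or too-large COS ID selects some valid COS (within cos_max), never a
 * value outside the hardware's range.
 */
static void write_assoc_cos(uint64_t *reg, unsigned int cos, uint64_t cos_mask)
{
    *reg = (*reg & ~cos_mask) | (((uint64_t)cos << 32) & cos_mask);
}

int main(void)
{
    uint64_t reg = 0x1234;                    /* pretend RMID bits are in use */
    uint64_t cos_mask = 0x3ull << 32;         /* two COS bits available */

    write_assoc_cos(&reg, 5, cos_mask);       /* 5 does not fit in the mask ... */
    printf("effective COS = %u\n", (unsigned int)(reg >> 32));  /* ... clamped to 1 */
    return 0;
}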

None of these cases causes a race condition or does harm to the system. The PSR
features set cache capacity for a domain, and a cache setting takes effect
progressively. By the time the cache setting becomes really effective, the time
slice used to schedule the domain may already have passed. So, even if a wrong
COS ID is used to set ASSOC, only another valid CBM is in effect for a short
time during cache preparation. The correct COS ID will take effect shortly
after, so this does not affect the cache capacity setting much.

Signed-off-by: Yi Sun 
---
v10:
- restore domain cos id to 0 when socket is offline.
  (suggested by Jan Beulich)
- check 'psr_cat_op.data' to make sure only lower 32 bits are valid.
  (suggested by Jan Beulich)
- remove unnecessary fixed width type of parameters and variables.
  (suggested by Jan Beulich)
- rename 'insert_new_val_to_array' to 'insert_val_to_array'.
  (suggested by Jan Beulich)
- input 'ref_lock' pointer into functions to check if it has been locked.
  (suggested by Jan Beulich)
- add comment to declare the set process is protected by 'domctl_lock'.