Re: [Xen-devel] BUG: sched=credit2 crashes system when using cpupools

2018-09-12 Thread Steven Haigh
On Thursday, 13 September 2018 1:11:20 AM AEST Dario Faggioli wrote:
> On Thu, 2018-08-30 at 18:49 +1000, Steven Haigh wrote:
> > On 2018-08-30 18:33, Jan Beulich wrote:
> > > Anyway - as Jürgen says, something for the scheduler
> > > maintainers to look into.
> 
> Ok, I'm back.
> 
> > Yep - I just want to confirm that we tested this in BOTH NUMA
> > configurations - and credit2 crashed on both.
> > 
> > I switched back to sched=credit, and it seems to work as expected:
> > # xl cpupool-list
> > Name          CPUs   Sched    Active   Domain count
> > Pool-node0    12     credit   y        3
> > Pool-node1    12     credit   y        0

Hi Dario,

I'll try to clarify below.
 
> Wait, in a previous message, you said: "A machine where we could get
> this working every time shows". Doesn't that mean creating a separate
> pool for node 1 works with both Credit and Credit2, if the node has
> memory?
> 
> I mean, trying to clarify, my understanding is that you have two
> systems:
> 
> system A: node 1 has *no* memory
> system B: both node 0 and node 1 have memory
> 
> Creating a Credit pool with pcpus from node 1 always works on both
> systems.

Correct. With the credit scheduler, the pool split worked correctly on both 
systems.
 
> OTOH, when you try to create a Credit2 pool with pcpus from node 1,
> does it always crash on both systems, or does it work on system B and
> crash on system A?

Both systems crashed when using credit2. We originally thought this was due to 
the different memory layout between the two systems, but that turned out not 
to matter: both systems crashed in the same way.

> I do have a NUMA box with RAM in both nodes (so similar to system B).
> Last time I checked, what you're trying to do worked there, pretty much
> with any scheduler combination, but I'll recheck.
> 
> I don't have a box similar to system A. I'll try to remove some of the
> RAM from that NUMA box, and check what happens.

In theory, if you reproduce what we did, it should crash anyway. The RAM 
layout shouldn't matter.

We set the scheduler on the grub command line with 'sched=credit2', then did 
the split. I didn't try the other combination of booting Dom0 with credit and 
making the pools credit2.

-- 
Steven Haigh

 net...@crc.id.au    https://www.crc.id.au
 +61 (3) 9001 6090 0412 935 897


signature.asc
Description: This is a digitally signed message part.
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] BUG: sched=credit2 crashes system when using cpupools

2018-09-12 Thread Dario Faggioli
On Thu, 2018-08-30 at 18:49 +1000, Steven Haigh wrote:
> On 2018-08-30 18:33, Jan Beulich wrote:
> > Anyway - as Jürgen says, something for the scheduler
> > maintainers to look into.
> 
Ok, I'm back.

> Yep - I just want to confirm that we tested this in BOTH NUMA 
> configurations - and credit2 crashed on both.
> 
> I switched back to sched=credit, and it seems to work as expected:
> # xl cpupool-list
> Name          CPUs   Sched    Active   Domain count
> Pool-node0    12     credit   y        3
> Pool-node1    12     credit   y        0
> 
Wait, in a previous message, you said: "A machine where we could get
this working every time shows". Doesn't that mean creating a separate
pool for node 1 works with both Credit and Credit2, if the node has
memory?

I mean, trying to clarify, my understanding is that you have two
systems:

system A: node 1 has *no* memory
system B: both node 0 and node 1 have memory

Creating a Credit pool with pcpus from node 1 always works on both
systems.

OTOH, when you try to create a Credit2 pool with pcpus from node 1,
does it always crash on both systems, or does it work on system B and
crash on system A?

I do have a NUMA box with RAM in both nodes (so similar to system B).
Last time I checked, what you're trying to do worked there, pretty much
with any scheduler combination, but I'll recheck.

I don't have a box similar to system A. I'll try to remove some of the
RAM from that NUMA box, and check what happens.

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/



Re: [Xen-devel] BUG: sched=credit2 crashes system when using cpupools

2018-08-30 Thread Steven Haigh

On 2018-08-30 18:33, Jan Beulich wrote:

On 30.08.18 at 06:01,  wrote:
Managed to get the same crash log when adding CPUs to Pool-1 as follows:

Create the pool:
(XEN) Initializing Credit2 scheduler
(XEN)  load_precision_shift: 18
(XEN)  load_window_shift: 30
(XEN)  underload_balance_tolerance: 0
(XEN)  overload_balance_tolerance: -3
(XEN)  runqueues arrangement: socket
(XEN)  cap enforcement granularity: 10ms
(XEN) load tracking window length 1073741824 ns

Add the CPUs:
(XEN) Adding cpu 12 to runqueue 0
(XEN)  First cpu on runqueue, activating
(XEN) Removing cpu 12 from runqueue 0
(XEN) Adding cpu 13 to runqueue 0
(XEN) Removing cpu 13 from runqueue 0
(XEN) Adding cpu 14 to runqueue 0
(XEN) Removing cpu 14 from runqueue 0
(XEN) Xen BUG at sched_credit2.c:3452


credit2 still not being the default - do things work if you don't override
the default (of using credit1)? I guess the problem is connected to the
"Removing cpu  from runqueue 0", considering this

BUG_ON(!cpumask_test_cpu(cpu, >active));

is what triggers. Anyway - as Jürgen says, something for the scheduler
maintainers to look into.


Yep - I just want to confirm that we tested this in BOTH NUMA 
configurations - and credit2 crashed on both.


I switched back to sched=credit, and it seems to work as expected:
# xl cpupool-list
Name          CPUs   Sched    Active   Domain count
Pool-node0    12     credit   y        3
Pool-node1    12     credit   y        0

I've updated the subject - as this isn't a NUMA issue at all.

--
Steven Haigh

 net...@crc.id.au    http://www.crc.id.au
 +61 (3) 9001 6090    0412 935 897
