Re: kern.sched.quantum: Creepy, sadistic scheduler

2018-04-20 Thread George Mitchell
On 04/17/18 19:01, George Mitchell wrote:
> On 04/17/18 17:20, EBFE via freebsd-stable wrote:
>> [...]
>> For interactive tasks, there is a "special" tunable:
>> % sysctl kern.sched.interact
>> kern.sched.interact: 10 # default is 30
>> % sysctl -d kern.sched.interact
>> kern.sched.interact: Interactivity score threshold
>>
>> reducing the value from 30 to 10-15 keeps your gui/system responsive,
>> even under high load.
>> [...]
> 
> I suspect my case (make buildworld while running misc/dnetc) doesn't
> qualify.  However, I just completed a SCHED_ULE run with
> preempt_thresh set to 5, and "time make buildworld" reports:
> 7336.748u 677.085s 9:25:19.86 23.6% 27482+473k 42147+431581io 38010pf+0w
> Much closer to SCHED_4BSD!  I'll try preempt_thresh=0 next, and I
> guess I'll at least try preempt_thresh=224 to see how that works
> for me. -- George
> 
I've now done SCHED_ULE runs with preempt_thresh set to 0, 1, 5, 80,
and 224.  The wall clock time is uniformly in the vicinity of 10 hours.
The "time" output is consistent with SCHED_4BSD, but the wall clock
time is really what I care about.

Now I have set kern.sched.preempt_thresh back to the default of 80 and
I am experimenting with kern.sched.interact.  I'm pretty sure that
setting kern.sched.preempt_thresh is not the answer to my problem.
-- George
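
(For reference, the corresponding runtime commands would presumably be
something like

# sysctl kern.sched.preempt_thresh=80
# sysctl kern.sched.interact=15

with the interact value swept through the suggested 10-15 range - and back
to 30 for the default - between buildworld runs.)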





Re: kern.sched.quantum: Creepy, sadistic scheduler

2018-04-18 Thread Eugene Grosbein
19.04.2018 0:59, Peter wrote:

>   thank You very much for Your commenting and reports!
> 
> From what I see, we have (at least) two rather different demands here:
> while George looks at the over-all speed of compute throughput, others are 
> concerned about interactive response.
> 
> My own issue is again a little bit different: I am running this small 
> single-CPU machine as my home-office router, and it also runs a backup 
> service, which involves compressing big files and handling an outgrown 
> database (but that does not need to happen fast, as it's just backup stuff).
> So, my demand is to maintain a good balance between realtime network activity 
> being immediately served, and low-priority batch compute jobs, while still 
> staying responsive to shell-commands - but the over-all compute throughput is 
> not important here.
> 
> But then, I find it very difficult to devise some metrics, by which such a 
> demand could be properly measured, to get compareable figures.

I run a similar system (AMD Geode 500 MHz, i386-compatible) and found that
SCHED_4BSD handles it just fine without any additional non-default configuration:
no extra kernel options (*PREEMPT*), no loader.conf/sysctl.conf tuning.




Re: kern.sched.quantum: Creepy, sadistic scheduler

2018-04-18 Thread Peter

Hi all of You,

  thank You very much for Your commenting and reports!

From what I see, we have (at least) two rather different demands here: 
while George looks at the over-all speed of compute throughput, others 
are concerned about interactive response.


My own issue is again a little bit different: I am running this small 
single-CPU machine as my home-office router, and it also runs a backup 
service, which involves compressing big files and handling an outgrown 
database (but that does not need to happen fast, as it's just backup stuff).
So, my demand is to maintain a good balance between realtime network 
activity being immediately served, and low-priority batch compute jobs, 
while still staying responsive to shell-commands - but the over-all 
compute throughput is not important here.


But then, I find it very difficult to devise metrics by which such
a demand could be properly measured, to get comparable figures.



George Mitchell wrote:

I suspect my case (make buildworld while running misc/dnetc) doesn't
qualify.  However, I just completed a SCHED_ULE run with
preempt_thresh set to 5, and "time make buildworld" reports:
7336.748u 677.085s 9:25:19.86 23.6% 27482+473k 42147+431581io 38010pf+0w
Much closer to SCHED_4BSD!  I'll try preempt_thresh=0 next, and I
guess I'll at least try preempt_thresh=224 to see how that works
for me. -- George



I found that preempt_thresh=0 cannot be used in practice:
When I try to do this on my quad-core desktop and then start four
endless loops to get the cores busy, the (internet) radio will have a
dropout every 2-3 seconds (and there is nothing else running, just a
sleeping icewm and a mostly sleeping firefox)!
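
(A minimal sketch of that test, assuming a 4-core box like this one:

# sysctl kern.sched.preempt_thresh=0
$ for i in 1 2 3 4; do while :; do :; done & done

then start the radio stream and listen for dropouts; afterwards the loops
can be killed with "kill %1 %2 %3 %4" and preempt_thresh restored to its
previous value.)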


So, the (SMP) system *depends* on preemption; it cannot handle streaming
data without it. (@George: Your buildworld test is pure batch load, and
may not be bothered by this effect.)



I think the problem is *not* to be solved by finding a good setting for
preempt_thresh (or other tunables). I think the problem lies deeper,
and these tunables only change its appearance.


I have worked out a writeup explaining my thoughts in detail, and I 
would be glad if You stay tuned and evaluate that.


P.


Re: kern.sched.quantum: Creepy, sadistic scheduler

2018-04-18 Thread Peter

EBFE via freebsd-stable wrote:

On Tue, 17 Apr 2018 09:05:48 -0700
Freddie Cash  wrote:


# Tune for desktop usage
kern.sched.preempt_thresh=224

​Works quite nicely on a 4-core AMD Phenom-II X4 960T Processor
(3010.09-MHz K8-class CPU) running KDE4 using an Nvidia 210 GPU.


For interactive tasks, there is a "special" tunable:
% sysctl kern.sched.interact
kern.sched.interact: 10 # default is 30
% sysctl -d kern.sched.interact
kern.sched.interact: Interactivity score threshold

reducing the value from 30 to 10-15 keeps your gui/system responsive,
even under high load.


Yes, this may improve the "unresponsive desktop" problem, because
threads that are scored as interactive run as realtime threads, ahead
of all regular workload queues.
But it will likely not solve the problem described by George, having two
competing batch jobs. And for my problem as described at the beginning
of the thread, I could probably tune things far enough that my "worker"
thread would be considered interactive, but then it would just toggle
between the realtime and timesharing queues - and while this may make
things better, it will probably not lead to smooth system behaviour.
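
(One way to watch that toggling would be to sample the worker's priority,
using the same kind of loop as in my measurements elsewhere in the thread -
<worker-pid> being a placeholder for the actual PID:

$ while true; do ps -o pid,pri,"%cpu",command -p <worker-pid>; sleep 3; done

A PRI value that drops while the thread is scored interactive and rises
again once it falls back to the timesharing queue would show exactly that
switching.)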


P.


Re: kern.sched.quantum: Creepy, sadistic scheduler

2018-04-17 Thread George Mitchell
On 04/17/18 17:20, EBFE via freebsd-stable wrote:
> On Tue, 17 Apr 2018 09:05:48 -0700
> Freddie Cash  wrote:
> 
>> # Tune for desktop usage
>> kern.sched.preempt_thresh=224
>>
>> ​Works quite nicely on a 4-core AMD Phenom-II X4 960T Processor
>> (3010.09-MHz K8-class CPU) running KDE4 using an Nvidia 210 GPU.
> 
> For interactive tasks, there is a "special" tunable:
> % sysctl kern.sched.interact
> kern.sched.interact: 10 # default is 30
> % sysctl -d kern.sched.interact
> kern.sched.interact: Interactivity score threshold
> 
> reducing the value from 30 to 10-15 keeps your gui/system responsive,
> even under high load.
> [...]

I suspect my case (make buildworld while running misc/dnetc) doesn't
qualify.  However, I just completed a SCHED_ULE run with
preempt_thresh set to 5, and "time make buildworld" reports:
7336.748u 677.085s 9:25:19.86 23.6% 27482+473k 42147+431581io 38010pf+0w
Much closer to SCHED_4BSD!  I'll try preempt_thresh=0 next, and I
guess I'll at least try preempt_thresh=224 to see how that works
for me. -- George





Re: kern.sched.quantum: Creepy, sadistic scheduler

2018-04-17 Thread EBFE via freebsd-stable
On Tue, 17 Apr 2018 09:05:48 -0700
Freddie Cash  wrote:

> # Tune for desktop usage
> kern.sched.preempt_thresh=224
> 
> ​Works quite nicely on a 4-core AMD Phenom-II X4 960T Processor
> (3010.09-MHz K8-class CPU) running KDE4 using an Nvidia 210 GPU.

For interactive tasks, there is a "special" tunable:
% sysctl kern.sched.interact
kern.sched.interact: 10 # default is 30
% sysctl -d kern.sched.interact
kern.sched.interact: Interactivity score threshold

reducing the value from 30 to 10-15 keeps your gui/system responsive,
even under high load.


Re: kern.sched.quantum: Creepy, sadistic scheduler

2018-04-17 Thread Freddie Cash
On Tue, Apr 17, 2018 at 8:49 AM, Kevin Oberman  wrote:

> On Mon, Apr 16, 2018 at 11:56 PM, Eivind Nicolay Evensen <
> eivi...@terraplane.org> wrote:
>
> > On Wed, Apr 04, 2018 at 09:32:58AM -0400, George Mitchell wrote:
> > > On 04/04/18 06:39, Alban Hertroys wrote:
> > > > [...]
> > > > That said, SCHED_ULE (the default scheduler for quite a while now)
> was
> > designed with multi-CPU configurations in mind and there are claims that
> > SCHED_4BSD works better for single-CPU configurations. You may give that
> a
> > try, if you're not already on SCHED_4BSD.
> > > > [...]
> > >
> > > A small, disgruntled community of FreeBSD users who have never seen
> > > proof that SCHED_ULE is better than SCHED_4BSD in any environment
> > > continue to regularly recompile with SCHED_4BSD.  I dread the day when
> > > that becomes impossible, but at least it isn't here yet.  -- George
> >
> > Indeed 4bsd is better in my case aswell. While for some unknown to me
> > reason ule performed a bit better in the 10.x series than before, in 11.x
> > it again is in my case not usable.
> >
> > Mouse freezes for around half a second with even frequency by just moving
> > it around in x11. Using 4bsd instead makes the problem go away.
> > I'm actually very happy that ule became worse again because going
> > back to 4bsd yet again also gave improved performance from other
> > dreadfully slow but (to me) still useful programs, like darktable.
> >
> > With 4bsd, when adjusting shadows and highlights it is possible to see
> > what I do when moving sliders. With ule it has never been better than
> waiting
> > 10-20-30 seconds to see where it was able to read a slider position
> > and update display, when working on images around 10500x10500 greyscale.
> >
> > It's not single cpu/single core either:
> > CPU: AMD FX(tm)-6300 Six-Core Processor  (3817.45-MHz
> K8-class
> > CPU)
>
> My experience has long been that 4BSD works far better for interactive, X
> based systems than ULE. Even on 10 I saw long, annoying pauses with ULE and
> I don't se those with 4BSD. I'd really like to see it better known that
> this is often the case. BTW, my system is 2 core/4 thread Sandybridge.
> ​
>

​The following has been suggested multiple times over the years on various
mailing lists as the "solution" to making ULE work well for interactive
tasks like running X-based desktops (in /etc/sysctl.conf):​

# Tune for desktop usage
kern.sched.preempt_thresh=224

​Works quite nicely on a 4-core AMD Phenom-II X4 960T Processor
(3010.09-MHz K8-class CPU) running KDE4 using an Nvidia 210 GPU.
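
(The same value can also be tried at runtime, without a reboot, presumably
just:

# sysctl kern.sched.preempt_thresh=224

and then kept in /etc/sysctl.conf as above once it proves helpful.)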

-- 
Freddie Cash
fjwc...@gmail.com


Re: kern.sched.quantum: Creepy, sadistic scheduler

2018-04-17 Thread Kevin Oberman
On Mon, Apr 16, 2018 at 11:56 PM, Eivind Nicolay Evensen <
eivi...@terraplane.org> wrote:

> On Wed, Apr 04, 2018 at 09:32:58AM -0400, George Mitchell wrote:
> > On 04/04/18 06:39, Alban Hertroys wrote:
> > > [...]
> > > That said, SCHED_ULE (the default scheduler for quite a while now) was
> designed with multi-CPU configurations in mind and there are claims that
> SCHED_4BSD works better for single-CPU configurations. You may give that a
> try, if you're not already on SCHED_4BSD.
> > > [...]
> >
> > A small, disgruntled community of FreeBSD users who have never seen
> > proof that SCHED_ULE is better than SCHED_4BSD in any environment
> > continue to regularly recompile with SCHED_4BSD.  I dread the day when
> > that becomes impossible, but at least it isn't here yet.  -- George
>
> Indeed 4bsd is better in my case aswell. While for some unknown to me
> reason
> ule performed a bit better in the 10.x series than before, in 11.x
> it again is in my case not usable.
>
> Mouse freezes for around half a second with even frequency by just moving
> it around in x11. Using 4bsd instead makes the problem go away.
> I'm actually very happy that ule became worse again because going
> back to 4bsd yet again also gave improved performance from other
> dreadfully slow but (to me) still useful programs, like darktable.
>
> With 4bsd, when adjusting shadows and highlights it is possible to see
> what I
> do when moving sliders. With ule it has never been better than waiting
> 10-20-30 seconds to see where it was able to read a slider position
> and update display, when working on images around 10500x10500 greyscale.
>
> It's not single cpu/single core either:
> CPU: AMD FX(tm)-6300 Six-Core Processor  (3817.45-MHz K8-class
> CPU)
>
>
>
>
> --
> Eivind
>

My experience has long been that 4BSD works far better than ULE for
interactive, X-based systems. Even on 10 I saw long, annoying pauses with ULE,
and I don't see those with 4BSD. I'd really like to see it better known that
this is often the case. BTW, my system is a 2-core/4-thread Sandy Bridge.
--
Kevin Oberman, Part time kid herder and retired Network Engineer
E-mail: rkober...@gmail.com
PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683


Re: kern.sched.quantum: Creepy, sadistic scheduler

2018-04-17 Thread Eivind Nicolay Evensen
On Wed, Apr 04, 2018 at 09:32:58AM -0400, George Mitchell wrote:
> On 04/04/18 06:39, Alban Hertroys wrote:
> > [...]
> > That said, SCHED_ULE (the default scheduler for quite a while now) was 
> > designed with multi-CPU configurations in mind and there are claims that 
> > SCHED_4BSD works better for single-CPU configurations. You may give that a 
> > try, if you're not already on SCHED_4BSD.
> > [...]
> 
> A small, disgruntled community of FreeBSD users who have never seen
> proof that SCHED_ULE is better than SCHED_4BSD in any environment
> continue to regularly recompile with SCHED_4BSD.  I dread the day when
> that becomes impossible, but at least it isn't here yet.  -- George

Indeed, 4BSD is better in my case as well. While for some reason unknown to me
ULE performed a bit better in the 10.x series than before, in 11.x
it is again not usable in my case.

The mouse freezes for around half a second at regular intervals just from moving
it around in X11. Using 4BSD instead makes the problem go away.
I'm actually very happy that ULE became worse again, because going
back to 4BSD yet again also gave improved performance in other
dreadfully slow but (to me) still useful programs, like darktable.

With 4BSD, when adjusting shadows and highlights it is possible to see what I
do when moving sliders. With ULE it has never been better than waiting
10-20-30 seconds for it to read a slider position
and update the display, when working on images around 10500x10500 greyscale.

It's not a single-CPU/single-core machine either:
CPU: AMD FX(tm)-6300 Six-Core Processor  (3817.45-MHz K8-class CPU)




-- 
Eivind


Re: kern.sched.quantum: Creepy, sadistic scheduler

2018-04-08 Thread Julian Elischer

On 7/4/18 10:21 pm, Peter wrote:

Julian Elischer wrote:
for a single CPU you really should compile a kernel with SMP turned 
off

and 4BSD scheduler.

ULE is just trying too hard to do stuff you don't need.


Julian,

if we agree on this, I am fine.
(This implies that SCHED_4BSD will *not* be retired for an 
indefinite time!)


There is no reason to retire it.
We implemented a scheduler interface that both schedulers stick to.



I tested yesterday, and SCHED_4BSD doesn't show the annoying behaviour.
SMP seems to be no problem (and I need that), but PREEMPTION is 
definitely related to the problem (see my other message sent now).


P.






more data: SCHED_ULE+PREEMPTION is the problem (was: kern.sched.quantum: Creepy, sadistic scheduler)

2018-04-07 Thread Peter

Hi all,
 in the meantime I did some tests and found the following:


A. The Problem:
---
On a single CPU, there are -exactly- two processes runnable:
One is doing mostly compute without I/O - this can be a compressing job
or similar; in the tests I used simply an endless loop. Let's call this
the "piglet".

The other is doing frequent file reads, but also some compute in between -
this can be a backup job traversing the FS, or a postgres VACUUM, or
some fast compressor like lz4. Let's call this the "worker".


It then happens that the piglet gets 99% CPU, while the worker gets only 
0.5% CPU and makes nearly no progress at all.


Investigation shows that the worker makes precisely one I/O per
timeslice (timeslice as defined in kern.sched.quantum) - or two I/Os on a
mirrored ZFS.
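
A minimal sketch of such a pair, as used in the measurements below
(DATABASE_TEST_FILE stands for any big file living on ZFS):

$ while true; do :; done &            (the piglet: pure compute, no I/O)
$ lz4 DATABASE_TEST_FILE /dev/null    (the worker: frequent reads plus some compute)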



B. Findings:

1. Filesystem

I could never reproduce this when reading from plain UFS. Only when 
reading from ZFS (direct or via l2arc).


2. Machine

The problem originally appeared on a Pentium III @ 1 GHz. I was able to
reproduce it on an i5-3570T, given the following measures:

 * configure the BIOS to use only one CPU
 * reduce the speed: "dev.cpu.0.freq=200"

I did see the problem also when running at full speed (which means it
happens there as well), but could not reproduce it reliably.


3. kern.sched.preempt_thresh

I could make the problem disappear by changing kern.sched.preempt_thresh
from the default 80 to either 11 (i5-3570T) or 7 (p3) or smaller. This
seems to correspond to the priority of the disk interrupt threads, which
run at intr:12 (i5-3570T) or intr:8 (p3).
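
(The change itself is just a runtime sysctl, e.g.

# sysctl kern.sched.preempt_thresh=11

and something like "top -SH" should - presumably - show the priorities the
intr threads are running at, to pick a suitable threshold.)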


4. dynamic behaviour

Here the piglet is already running as PID=2119. Then we can watch the 
dynamic behaviour as follows (on i5-3570T@200MHz):


a. with kern.sched.preempt_thresh=80

$ lz4 DATABASE_TEST_FILE /dev/null & while true;
  do ps -o pid,pri,"%cpu",command -p 2119,$!
  sleep 3
done

[1] 6073
 PID PRI %CPU COMMAND
6073  20  0.0 lz4 DATABASE_TEST_FILE /dev/null
2119 100 91.0 -bash (bash)
 PID PRI %CPU COMMAND
6073  76 15.0 lz4 DATABASE_TEST_FILE /dev/null
2119  95 74.5 -bash (bash)
 PID PRI %CPU COMMAND
6073  52 19.0 lz4 DATABASE_TEST_FILE /dev/null
2119  94 71.5 -bash (bash)
 PID PRI %CPU COMMAND
6073  52 16.0 lz4 DATABASE_TEST_FILE /dev/null
2119  95 76.5 -bash (bash)
 PID PRI %CPU COMMAND
6073  52 14.0 lz4 DATABASE_TEST_FILE /dev/null
2119  96 80.0 -bash (bash)
 PID PRI %CPU COMMAND
6073  52 12.5 lz4 DATABASE_TEST_FILE /dev/null
2119  96 82.5 -bash (bash)
 PID PRI %CPU COMMAND
6073  74 10.0 lz4 DATABASE_TEST_FILE /dev/null
2119  98 86.5 -bash (bash)
 PID PRI %CPU COMMAND
6073  52  8.0 lz4 DATABASE_TEST_FILE /dev/null
2119  98 89.0 -bash (bash)
 PID PRI %CPU COMMAND
6073  52  7.0 lz4 DATABASE_TEST_FILE /dev/null
2119  98 90.5 -bash (bash)
 PID PRI %CPU COMMAND
6073  52  6.5 lz4 DATABASE_TEST_FILE /dev/null
2119  99 91.5 -bash (bash)

b. with kern.sched.preempt_thresh=11

 PID PRI %CPU COMMAND
4920  21  0.0 lz4 DATABASE_TEST_FILE /dev/null
2119 101 93.5 -bash (bash)
 PID PRI %CPU COMMAND
4920  78 20.0 lz4 DATABASE_TEST_FILE /dev/null
2119  94 70.5 -bash (bash)
 PID PRI %CPU COMMAND
4920  82 34.5 lz4 DATABASE_TEST_FILE /dev/null
2119  88 54.0 -bash (bash)
 PID PRI %CPU COMMAND
4920  85 42.5 lz4 DATABASE_TEST_FILE /dev/null
2119  86 45.0 -bash (bash)
 PID PRI %CPU COMMAND
4920  85 43.5 lz4 DATABASE_TEST_FILE /dev/null
2119  86 44.5 -bash (bash)
 PID PRI %CPU COMMAND
4920  85 43.0 lz4 DATABASE_TEST_FILE /dev/null
2119  85 45.0 -bash (bash)
 PID PRI %CPU COMMAND
4920  85 43.0 lz4 DATABASE_TEST_FILE /dev/null
2119  85 45.5 -bash (bash)

From this we can see that in case b. both processes balance out nicely
and meet at equal CPU shares, whereas in case a., after about 10 seconds
(the first three records), they move to opposite ends of the scale and
stay there.


From this I might suppose that some kind of miscalculation or
misadjustment of the task priorities is happening here.


P.


Re: kern.sched.quantum: Creepy, sadistic scheduler

2018-04-07 Thread Peter

Julian Elischer wrote:

for a single CPU you really should compile a kernel with SMP turned off
and 4BSD scheduler.

ULE is just trying too hard to do stuff you don't need.


Julian,

if we agree on this, I am fine.
(This implies that SCHED_4BSD will *not* be retired for an indefinite time!)

I tested yesterday, and SCHED_4BSD doesn't show the annoying behaviour.
SMP seems to be no problem (and I need that), but PREEMPTION is 
definitely related to the problem (see my other message sent now).


P.


Re: kern.sched.quantum: Creepy, sadistic scheduler

2018-04-07 Thread Julian Elischer

On 4/4/18 9:32 pm, George Mitchell wrote:

On 04/04/18 06:39, Alban Hertroys wrote:

[...]
That said, SCHED_ULE (the default scheduler for quite a while now) was designed 
with multi-CPU configurations in mind and there are claims that SCHED_4BSD 
works better for single-CPU configurations. You may give that a try, if you're 
not already on SCHED_4BSD.
[...]

A small, disgruntled community of FreeBSD users who have never seen
proof that SCHED_ULE is better than SCHED_4BSD in any environment
continue to regularly recompile with SCHED_4BSD.  I dread the day when
that becomes impossible, but at least it isn't here yet.  -- George

for a single CPU you really should compile a kernel with SMP turned 
off and 4BSD scheduler.


ULE is just trying too hard to do stuff you don't need.
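
A minimal custom kernel config for that could look roughly like the
following sketch (the name SINGLECPU is arbitrary, and it assumes the
stock GENERIC as a base):

include GENERIC
ident   SINGLECPU

nooptions   SMP
nooptions   SCHED_ULE
options     SCHED_4BSD

Then build and install it the usual way, e.g. "make buildkernel
KERNCONF=SINGLECPU" followed by "make installkernel KERNCONF=SINGLECPU"
in /usr/src.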




Re: kern.sched.quantum: Creepy, sadistic scheduler

2018-04-06 Thread Peter

Eugene Grosbein wrote:


I see no reason to use SCHED_ULE for such single-core systems; use SCHED_4BSD.


Nitpicking: it is not a single-core system, it's a dual-socket machine that
for now is equipped with only one chip; the other is on the shelf.


But seriously, I am currently working my way through the design papers
for SCHED_ULE and the SMP stuff, and I tend to agree with You and
George in that I do not really need these features.


Nevertheless, I think the system should have proper behaviour *by
default*, or otherwise there should be a hint in the docs about what to do.
That's the reason why I raise this issue - if the matter can be fixed,
that's great, but if we come to the conclusion that
small/single-core/CPU-bound/whatever systems are better off with
SCHED_4BSD, then that's perfectly fine as well. Or maybe those
systems should disable preemption? I currently don't know, but I hope we
can figure this out, as the problem is clearly visible.


P.


Re: kern.sched.quantum: Creepy, sadistic scheduler

2018-04-04 Thread Eugene Grosbein
04.04.2018 21:16, Peter wrote:

> // With nCPU compute-bound processes running, with SCHED_ULE, any other
> // process that is interactive (which to me means frequently waiting for
> // I/O) gets ABYSMAL performance -- over an order of magnitude worse
> // than it gets with SCHED_4BSD under the same conditions. --
> https://lists.freebsd.org/pipermail/freebsd-stable/2011-December/064984.html
> 
> And this describes quite exactly what I perceive.
> Now, I would like to ask: what has been done about this issue?

I see no reason to use SCHED_ULE for such single-core systems; use SCHED_4BSD.




Re: kern.sched.quantum: Creepy, sadistic scheduler

2018-04-04 Thread Peter

Andriy Gapon wrote:

Not everyone has a postgres server and a suitable database.
Could you please devise a test scenario that demonstrates the problem and that
anyone could run?



Alright, simple things first: I can reproduce the effect without 
postgres, with regular commands. I run this on my database file:


# lz4 2058067.1 /dev/null

And I get this throughput:

pool      alloc   free   read  write   read  write
cache         -      -      -      -      -      -
  ada1s4  7.08G  10.9G    889      0  7.07M  42.3K

  PID USERNAME   PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
51298 root        87    0 16184K  7912K RUN      1:00  51.60% lz4

I start the piglet:

$ while true; do :; done

And, same effect:

pool      alloc   free   read  write   read  write
cache         -      -      -      -      -      -
  ada1s4  7.08G  10.9G     10      0  82.0K      0

  PID USERNAME   PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
 1911 admin       98    0  7044K  2860K RUN     65:48  89.22% bash
51298 root        52    0 16184K  7880K RUN      0:05   0.59% lz4


It does *not* happen with plain "cat" instead of "lz4".
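
(The control run was presumably just the same file through cat, e.g.

# cat 2058067.1 > /dev/null

which does the same reads but essentially no compute.)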

What may or may not have an influence on it: the respective filesystem 
is block=8k, and is 100% resident in l2arc.


What is also interesting: I started trying this with "tar" (no effect,
behaves properly), then with "tar --lz4". In the latter case "tar"
starts "lz4" as a sub-process, so we have three processes in play -
and in that case the effect happens, but to a lesser extent: about 75 I/Os
per second.
So it seems quite clear that this has something to do with the logic
inside the scheduler.




Re: kern.sched.quantum: Creepy, sadistic scheduler

2018-04-04 Thread Peter

Andriy Gapon wrote:

On 04/04/2018 03:52, Peter wrote:

Lets run an I/O-active task, e.g, postgres VACUUM that would
continuousely read from big files (while doing compute as well [1]):


Not everyone has a postgres server and a suitable database.
Could you please devise a test scenario that demonstrates the problem and that
anyone could run?



Andriy,

and maybe nobody has such an old system anymore that is CPU-bound instead
of I/O-bound. I'd rather think about reproducing it on my Ivy Bridge.


I know for sure that it is *not* specifically dependent on postgres.
What I posted was the case where an endless-loop piglet starves a
postgres VACUUM - and there we see a very pronounced effect of almost
a factor of 100.
When I first clearly discovered it (after a long time of a gut feeling
that something behaved strangely), it was a postgres pg_dump (which does
compression, i.e. is CPU-bound) as the piglet starving a bacula-fd
backup that was scanning the filesystem.

So, there is a general rule: we have one process that is a CPU hog, and
another process that does periodic I/O (but also *some* compute), and
- important! - nothing else.


If we understand the logic of the scheduler, that information should
already suffice for some logical verification *eg* - but I will see if I
can reproduce it on the Ivy Bridge machine and/or get a testcase
together. May take a while.


P.


Re: kern.sched.quantum: Creepy, sadistic scheduler

2018-04-04 Thread Andriy Gapon
On 04/04/2018 03:52, Peter wrote:
> Lets run an I/O-active task, e.g, postgres VACUUM that would
> continuousely read from big files (while doing compute as well [1]):

Not everyone has a postgres server and a suitable database.
Could you please devise a test scenario that demonstrates the problem and that
anyone could run?

-- 
Andriy Gapon


Re: kern.sched.quantum: Creepy, sadistic scheduler

2018-04-04 Thread George Mitchell
On 04/04/18 10:34, Peter wrote:
> [...] It does not make sense to me if now we state that
> we cannot do it anymore because single-CPU is uncommon today.
> [...]
+1.  -- George





Re: kern.sched.quantum: Creepy, sadistic scheduler

2018-04-04 Thread Peter

Hi Alban!

Alban Hertroys wrote:

Occasionally I noticed that the system would not quickly process the
tasks i need done, but instead prefer other, longrunning tasks. I
figured it must be related to the scheduler, and decided it hates me.


If it hated you, it would behave much worse.


That's encouraging :)  But I would say, running a job 100 times slower
than expected is quite an amount of hate for my taste.



A closer look shows the behaviour as follows (single CPU):


A single CPU? That's becoming rare! Is that a VM? Old hardware? Something 
really specific?


I don't plug in another CPU because there is no need to. Yes, it's old
hardware:

CPU: Intel Pentium III (945.02-MHz 686-class CPU)
ACPI APIC Table: 

If I had bought new hardware, this one would now be rotting in Africa, and
I would have new hardware idling along that is Spectre/Meltdown-affected
nevertheless.



Lets run an I/O-active task, e.g, postgres VACUUM that would


And you're running a multi-process database server on it no less. That is
going to hurt,


I'm running a lot more than only that on it. But it's all private use, 
idling most of the time.



no matter how well the scheduler works.


Maybe. But this post is not about my personal expectations of overall
performance - it is about a specific behaviour that is not how a
scheduler is expected to behave - no matter if we're on a PDP-11 or on a
Kaby Lake.



Now, as usual, the "root-cause" questions arise: What exactly does
this "quantum"? Is this solution a workaround, i.e. actually something
else is wrong, and has it tradeoff in other situations? Or otherwise,
why is such a default value chosen, which appears to be ill-deceived?

The docs for the quantum parameter are a bit unsatisfying - they say
its the max num of ticks a process gets - and what happens when
they're exhausted? If by default the endless loop is actually allowed
to continue running for 94k ticks (or 94ms, more likely) uninterrupted,
then that explains the perceived behaviour - buts thats certainly not
what a scheduler should do when other procs are ready to run.


I can answer this from the operating systems course I followed recently. This 
does not apply to FreeBSD specifically, it is general job scheduling theory. I 
still need to read up on SCHED_ULE to see how the details were implemented 
there. Or are you using the older SCHED_4BSD?


I'm using the default scheduler, which is ULE. I would not go
non-default without reason. (But it seems a reason is just appearing now.)



Now, that would cause a much worse situation in your example case. The endless 
loop would keep running once it gets the CPU and would never release it. No 
other process would ever get a turn again. You wouldn't even be able to get 
into such a system in that state using remote ssh.

That is why the scheduler has this "quantum", which limits the maximum time the 
CPU will be assigned to a specific job. Once the quantum has expired (with the job 
unfinished), the scheduler removes the job from the CPU, puts it back on the ready queue 
and assigns the next job from that queue to the CPU.
That's why you seem to get better performance with a smaller value for the 
quantum; the endless loop gets forcibly interrupted more often.


Good description. Only my (old-fashioned) understanding was that this is
the purpose of the HZ value: to give control back to the kernel, so that
a new decision can be made.
So, I would not have been surprised to see 200 I/Os for postgres
(kern.hz=200), but what I see is 9 I/Os (which indeed corresponds to a
"quantum" of 94 ms).


But then, we were able to do all this nicely on single-CPU machines for 
almost four decades. It does not make sense to me if now we state that 
we cannot do it anymore because single-CPU is uncommon today.
(Yes indeed, we also cannot fly to the moon anymore, because today 
nobody seems to recall how that stuff was built. *headbangwall*)



This changing of the active job however, involves a context switch for the CPU. 
Memory, registers, file handles, etc. that were required by the previous job 
needs to be put aside and replaced by any such resources related to the new job 
to be run. That uses up time and does nothing to progress the jobs that are 
waiting for the CPU. Hence, you don't want the quantum to be too small either, 
or you'll end up spending significant time switching contexts.


Yep. My understanding was that I could influence this behaviour via the
HZ value, so as to trade off responsiveness against performance. Obviously
that was wrong.

From Your writing, it seems the "quantum" is indeed the correct place
to tune this. (But I will still have to ponder a while about the knob 
mentioned by Stefan, concerning preemption, which seems to magically 
resolve the issue.)



That said, SCHED_ULE (the default scheduler for quite a while now) was designed 
with multi-CPU configurations in mind and there are claims that SCHED_4BSD 
works better for single-CPU configurations. You may 

Re: kern.sched.quantum: Creepy, sadistic scheduler

2018-04-04 Thread Peter

George Mitchell wrote:

On 04/04/18 06:39, Alban Hertroys wrote:

[...]
That said, SCHED_ULE (the default scheduler for quite a while now) was designed 
with multi-CPU configurations in mind and there are claims that SCHED_4BSD 
works better for single-CPU configurations. You may give that a try, if you're 
not already on SCHED_4BSD.
[...]


A small, disgruntled community of FreeBSD users who have never seen
proof that SCHED_ULE is better than SCHED_4BSD in any environment
continue to regularly recompile with SCHED_4BSD.  I dread the day when
that becomes impossible, but at least it isn't here yet.  -- George



Yes *laugh*, I found a very lengthy and mind-boggling discussion from 
back in 2011. And I found that You made this statement somewhere there:


// With nCPU compute-bound processes running, with SCHED_ULE, any other
// process that is interactive (which to me means frequently waiting for
// I/O) gets ABYSMAL performance -- over an order of magnitude worse
// than it gets with SCHED_4BSD under the same conditions. --
https://lists.freebsd.org/pipermail/freebsd-stable/2011-December/064984.html

And this describes quite exactly what I perceive.
Now, I would like to ask: what has been done about this issue?

P.


Re: kern.sched.quantum: Creepy, sadistic scheduler

2018-04-04 Thread George Mitchell
On 04/04/18 06:39, Alban Hertroys wrote:
> [...]
> That said, SCHED_ULE (the default scheduler for quite a while now) was 
> designed with multi-CPU configurations in mind and there are claims that 
> SCHED_4BSD works better for single-CPU configurations. You may give that a 
> try, if you're not already on SCHED_4BSD.
> [...]

A small, disgruntled community of FreeBSD users who have never seen
proof that SCHED_ULE is better than SCHED_4BSD in any environment
continue to regularly recompile with SCHED_4BSD.  I dread the day when
that becomes impossible, but at least it isn't here yet.  -- George





Try setting kern.sched.preempt_thresh != 0 (was: Re: kern.sched.quantum: Creepy, sadistic scheduler)

2018-04-04 Thread Stefan Esser
Am 04.04.18 um 12:39 schrieb Alban Hertroys:
> 
>> On 4 Apr 2018, at 2:52, Peter  wrote:
>>
>> Occasionally I noticed that the system would not quickly process the
>> tasks i need done, but instead prefer other, longrunning tasks. I
>> figured it must be related to the scheduler, and decided it hates me.
> 
> If it hated you, it would behave much worse.
> 
>> A closer look shows the behaviour as follows (single CPU):
> 
> A single CPU? That's becoming rare! Is that a VM? Old hardware? Something 
> really specific?
> 
>> Lets run an I/O-active task, e.g, postgres VACUUM that would
> 
> And you're running a multi-process database server on it no less. That is 
> going to hurt, no matter how well the scheduler works.
> 
>> continuousely read from big files (while doing compute as well [1]):
>>> pool      alloc   free   read  write   read  write
>>> cache         -      -      -      -      -      -
>>>   ada1s4  7.08G  10.9G  1.58K      0  12.9M      0
>>
>> Now start an endless loop:
>> # while true; do :; done
>>
>> And the effect is:
>>> pool      alloc   free   read  write   read  write
>>> cache         -      -      -      -      -      -
>>>   ada1s4  7.08G  10.9G      9      0  76.8K      0
>>
>> The VACUUM gets almost stuck! This figures with WCPU in "top":
>>
>>>   PID USERNAME   PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
>>> 85583 root        99    0  7044K  1944K RUN      1:06  92.21% bash
>>> 53005 pgsql       52    0   620M 91856K RUN      5:47   0.50% postgres
>>
>> Hacking on kern.sched.quantum makes it quite a bit better:
>> # sysctl kern.sched.quantum=1
>> kern.sched.quantum: 94488 -> 7874
>>
>>> pool      alloc   free   read  write   read  write
>>> cache         -      -      -      -      -      -
>>>   ada1s4  7.08G  10.9G    395      0  3.12M      0
>>
>>>   PID USERNAME   PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
>>> 85583 root        94    0  7044K  1944K RUN      4:13  70.80% bash
>>> 53005 pgsql       52    0   276M 91856K RUN      5:52  11.83% postgres
>>
>>
>> Now, as usual, the "root-cause" questions arise: What exactly does
>> this "quantum"? Is this solution a workaround, i.e. actually something
>> else is wrong, and has it tradeoff in other situations? Or otherwise,
>> why is such a default value chosen, which appears to be ill-deceived?
>>
>> The docs for the quantum parameter are a bit unsatisfying - they say
>> its the max num of ticks a process gets - and what happens when
>> they're exhausted? If by default the endless loop is actually allowed
>> to continue running for 94k ticks (or 94ms, more likely) uninterrupted,
>> then that explains the perceived behaviour - buts thats certainly not
>> what a scheduler should do when other procs are ready to run.
> 
> I can answer this from the operating systems course I followed recently. This 
> does not apply to FreeBSD specifically, it is general job scheduling theory. 
> I still need to read up on SCHED_ULE to see how the details were implemented 
> there. Or are you using the older SCHED_4BSD?
> Anyway...
> 
> Jobs that are ready to run are collected on a ready queue. Since you have a 
> single CPU, there can only be a single job active on the CPU. When that job 
> is finished, the scheduler takes the next job in the ready queue and assigns 
> it to the CPU, etc.

I'm guessing that the problem is caused by kern.sched.preempt_thresh=0, which
prevents preemption of low priority processes by interactive or I/O bound
processes.

For a quick test try:

# sysctl kern.sched.preempt_thresh=1

to see whether it makes a difference. The value 1 is unreasonably low, but it
has the most visible effect in that any higher priority process can steal the
CPU from any lower priority one (high priority corresponds to low PRI values
as displayed by ps -l or top).

Reasonable values might be in the range of 80 to 224 depending on the system
usage scenario (that's what I found to have been suggested on the mailing lists).

Higher values result in less preemption.
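
A value that proves helpful at runtime can then be made persistent in
/etc/sysctl.conf, e.g.:

kern.sched.preempt_thresh=224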

Regards, STefan


Re: kern.sched.quantum: Creepy, sadistic scheduler

2018-04-04 Thread Alban Hertroys

> On 4 Apr 2018, at 2:52, Peter  wrote:
> 
> Occasionally I noticed that the system would not quickly process the
> tasks i need done, but instead prefer other, longrunning tasks. I
> figured it must be related to the scheduler, and decided it hates me.

If it hated you, it would behave much worse.

> A closer look shows the behaviour as follows (single CPU):

A single CPU? That's becoming rare! Is that a VM? Old hardware? Something 
really specific?

> Lets run an I/O-active task, e.g, postgres VACUUM that would

And you're running a multi-process database server on it no less. That is going 
to hurt, no matter how well the scheduler works.

> continuousely read from big files (while doing compute as well [1]):
> > pool      alloc   free   read  write   read  write
> > cache         -      -      -      -      -      -
> >   ada1s4  7.08G  10.9G  1.58K      0  12.9M      0
> 
> Now start an endless loop:
> # while true; do :; done
> 
> And the effect is:
> > pool      alloc   free   read  write   read  write
> > cache         -      -      -      -      -      -
> >   ada1s4  7.08G  10.9G      9      0  76.8K      0
> 
> The VACUUM gets almost stuck! This figures with WCPU in "top":
> 
> >   PID USERNAME   PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
> > 85583 root        99    0  7044K  1944K RUN      1:06  92.21% bash
> > 53005 pgsql       52    0   620M 91856K RUN      5:47   0.50% postgres
> 
> Hacking on kern.sched.quantum makes it quite a bit better:
> # sysctl kern.sched.quantum=1
> kern.sched.quantum: 94488 -> 7874
> 
> > pool      alloc   free   read  write   read  write
> > cache         -      -      -      -      -      -
> >   ada1s4  7.08G  10.9G    395      0  3.12M      0
> 
> >   PID USERNAME   PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
> > 85583 root        94    0  7044K  1944K RUN      4:13  70.80% bash
> > 53005 pgsql       52    0   276M 91856K RUN      5:52  11.83% postgres
> 
> 
> Now, as usual, the "root-cause" questions arise: What exactly does
> this "quantum"? Is this solution a workaround, i.e. actually something
> else is wrong, and has it tradeoff in other situations? Or otherwise,
> why is such a default value chosen, which appears to be ill-deceived?
> 
> The docs for the quantum parameter are a bit unsatisfying - they say
> its the max num of ticks a process gets - and what happens when
> they're exhausted? If by default the endless loop is actually allowed
> to continue running for 94k ticks (or 94ms, more likely) uninterrupted,
> then that explains the perceived behaviour - buts thats certainly not
> what a scheduler should do when other procs are ready to run.

I can answer this from the operating systems course I followed recently. This 
does not apply to FreeBSD specifically, it is general job scheduling theory. I 
still need to read up on SCHED_ULE to see how the details were implemented 
there. Or are you using the older SCHED_4BSD?
Anyway...

Jobs that are ready to run are collected on a ready queue. Since you have a 
single CPU, there can only be a single job active on the CPU. When that job is 
finished, the scheduler takes the next job in the ready queue and assigns it to 
the CPU, etc.

Now, that would cause a much worse situation in your example case. The endless 
loop would keep running once it gets the CPU and would never release it. No 
other process would ever get a turn again. You wouldn't even be able to get 
into such a system in that state using remote ssh.

That is why the scheduler has this "quantum", which limits the maximum time the 
CPU will be assigned to a specific job. Once the quantum has expired (with the 
job unfinished), the scheduler removes the job from the CPU, puts it back on 
the ready queue and assigns the next job from that queue to the CPU.
That's why you seem to get better performance with a smaller value for the 
quantum; the endless loop gets forcibly interrupted more often.

This changing of the active job however, involves a context switch for the CPU. 
Memory, registers, file handles, etc. that were required by the previous job 
needs to be put aside and replaced by any such resources related to the new job 
to be run. That uses up time and does nothing to progress the jobs that are 
waiting for the CPU. Hence, you don't want the quantum to be too small either, 
or you'll end up spending significant time switching contexts. That gets worse 
when the job involves system calls, which are handled by the kernel, which is 
also a process that needs to be switched (and Meltdown made that worse, because 
more rigorous clean-up is necessary to prevent peeks into sections of memory 
that were owned by the kernel process previously).

The "correct" value for the quantum depends on your type of workload. 
PostgreSQL's auto-vacuum is a typical background process that will probably (I 
didn't verify) request to be run at a lower priority, giving other, more 
important, jobs more chance 

kern.sched.quantum: Creepy, sadistic scheduler

2018-04-03 Thread Peter

Occasionally I noticed that the system would not quickly process the
tasks I need done, but instead prefer other, long-running tasks. I
figured it must be related to the scheduler, and decided it hates me.


A closer look shows the behaviour as follows (single CPU):

Let's run an I/O-active task, e.g. a postgres VACUUM that would
continuously read from big files (while doing compute as well [1]):
> pool      alloc   free   read  write   read  write
> cache         -      -      -      -      -      -
>   ada1s4  7.08G  10.9G  1.58K      0  12.9M      0

Now start an endless loop:
# while true; do :; done

And the effect is:
> pool      alloc   free   read  write   read  write
> cache         -      -      -      -      -      -
>   ada1s4  7.08G  10.9G      9      0  76.8K      0

The VACUUM gets almost stuck! This shows up in the WCPU column in "top":

>   PID USERNAME   PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
> 85583 root        99    0  7044K  1944K RUN      1:06  92.21% bash
> 53005 pgsql       52    0   620M 91856K RUN      5:47   0.50% postgres

Hacking on kern.sched.quantum makes it quite a bit better:
# sysctl kern.sched.quantum=1
kern.sched.quantum: 94488 -> 7874

> pool      alloc   free   read  write   read  write
> cache         -      -      -      -      -      -
>   ada1s4  7.08G  10.9G    395      0  3.12M      0

>   PID USERNAME   PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
> 85583 root        94    0  7044K  1944K RUN      4:13  70.80% bash
> 53005 pgsql       52    0   276M 91856K RUN      5:52  11.83% postgres


Now, as usual, the "root-cause" questions arise: What exactly does
this "quantum" do? Is this solution a workaround, i.e. is actually something
else wrong, and does it have tradeoffs in other situations? Or otherwise,
why is such a default value chosen, which appears to be ill-conceived?

The docs for the quantum parameter are a bit unsatisfying - they say
it's the max number of ticks a process gets - and what happens when
they're exhausted? If by default the endless loop is actually allowed
to continue running for 94k ticks (or 94 ms, more likely) uninterrupted,
then that explains the perceived behaviour - but that's certainly not
what a scheduler should do when other procs are ready to run.

11.1-RELEASE-p7, kern.hz=200. Switching tickless mode on or off does
not influence the matter. Starting the endless loop with "nice" does
not influence the matter.
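
(For completeness, the default can be restored afterwards with

# sysctl kern.sched.quantum=94488

i.e. the value reported before the change.)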


[1]
A pure-I/O job without compute load, like "dd", does not show
this behaviour. Also, when other tasks are running, the unjust
behaviour is not so strongly pronounced.
