Propose New Linux Scheduling Policies

2019-07-12 Thread Mitchell Erblich



Proposed New Linux Scheduling Policies, July 2019

Version 0.9

 

 

include/uapi/linux/sched.h

 

#define SCHED_INTERACTIVE 7

#define SCHED_UNEQUAL 8

 

Each of these new scheduling policies has been implemented in one or more 
other UNIX operating systems, past and present, to support a new 
scheduling behaviour. Isn't it about time that Linux supported these in the 
mainline OS?

 

SCHED_INTERACTIVE is meant to be used for tasks that are infrequently runnable 
but, once runnable, should run with an absolute minimum of latency. An example 
is the mouse-pointer task. Currently this behaviour is inferred from past 
running behaviour: a task qualifies once a minimum percentage of use of its 
runnable time has occurred, and it can therefore lose its interactive status 
as it runs. Common sense says that this type of task should always be 
interactive.
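
For illustration, here is a minimal userspace sketch of opting a task into the 
proposed policy via sched_setattr(2). The struct layout follows the 
sched_setattr(2) manual page (glibc provides no wrapper), the SCHED_INTERACTIVE 
value is the one proposed above rather than a mainline constant, and on a 
kernel without the policy the call simply fails with EINVAL:

    #define _GNU_SOURCE
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    /* As in the sched_setattr(2) man page; glibc does not export this. */
    struct sched_attr {
        uint32_t size;
        uint32_t sched_policy;
        uint64_t sched_flags;
        int32_t  sched_nice;
        uint32_t sched_priority;
        uint64_t sched_runtime;
        uint64_t sched_deadline;
        uint64_t sched_period;
    };

    #define SCHED_INTERACTIVE 7   /* proposed value, not yet in uapi */

    int main(void)
    {
        struct sched_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.sched_policy = SCHED_INTERACTIVE;

        /* No glibc wrapper exists, so invoke the raw syscall;
         * pid 0 means the calling thread. */
        if (syscall(SYS_sched_setattr, 0, &attr, 0) == -1)
            perror("sched_setattr");
        return 0;
    }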

 

SCHED_UNEQUAL is meant to support an unfair scheduling policy for tasks that 
SHOULD run infrequently but, when runnable, SHOULD run to completion. This is 
more of a scalability flag: the number of runnable tasks may increase, 
limiting this task's runnable time within a specific time window, and/or the 
number of objects the task is dealing with may grow so that two times or more 
the executing cycles are necessary for the task to run to completion. An 
example is a routing task that processes Link State Advertisements (LSAs) so 
that a change to the routing table can be accomplished within a minimum of 
time. Without this, network routing loops and black holes can exist for short 
periods of time, losing packets and/or generating unnecessary control- and 
data-packet exchange.

 

Manual pages: Any manual page such as sched_setattr(2) SHOULD be updated to 
acknowledge these new scheduling policies.

 

Applications: Tasks that display output about tasks / scheduling minimally need 
to be recompiled and made aware of these new scheduling policies.

 

Inheritance: It is not believed that these new policies SHOULD be inherited 
across clone(2) / fork(2).

 

Weighting: The currently suggested / proposed change is to implement a 
simplified differential weighting policy when SCHED_UNEQUAL is specified. A 
possible future change COULD support multiple UNEQUAL scheduling policies by 
combining this with an existing metric; that is left for a later proposal, 
sometime after this proposed change is accepted.
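
As one reading of what a "simplified differential weighting policy" could 
mean, the toy calculation below gives a SCHED_UNEQUAL task a doubled weight 
inside a fixed scheduling window. The window length, the nice-0 weight of 1024 
(borrowed from CFS for familiarity), and the 2x boost are illustrative 
assumptions, not part of this proposal:

    #include <stdio.h>

    #define WINDOW_US     6000u    /* length of one scheduling window */
    #define NORMAL_WEIGHT 1024u    /* CFS nice-0 weight, for familiarity */

    /* Slice of the window a task earns, proportional to its weight. */
    static unsigned int slice_us(unsigned int weight, unsigned int total_weight)
    {
        return (unsigned int)((unsigned long long)WINDOW_US * weight / total_weight);
    }

    int main(void)
    {
        unsigned int nr_normal = 3;                       /* competing fair tasks */
        unsigned int unequal_weight = 2 * NORMAL_WEIGHT;  /* assumed 2x boost */
        unsigned int total = nr_normal * NORMAL_WEIGHT + unequal_weight;

        printf("normal task slice : %u us\n", slice_us(NORMAL_WEIGHT, total));
        printf("UNEQUAL task slice: %u us\n", slice_us(unequal_weight, total));
        return 0;
    }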

 

Security / Denial Of Service (DoS) issues:

SCHED_INTERACTIVE: Code must be implemented to prevent SCHED_INTERACTIVE 
tasks from being repeatedly inserted onto CPU cores, which would deny other 
tasks the chance to execute.

 

SCHED_UNEQUAL: Currently, only when multiple tasks are runnable and 
consume the entire window of CPU cycles does the unfairness matter; running 
this task two or three times in a row SHOULD, in the worst case, add less than 
8% in CPU cycles.

 

To mitigate any scheduling disturbances, setting these flags should 
be limited to an administrator, i.e. a root-type user with the specific 
capabilities.

 

TBDs: If Linux’s kernel.org sees that this feature is wanted in the near and/or 
distant future, then “UNKNOWNs / TBDs” or issues should be listed here.

 

This notice is not a proposal for kernel.org code integration at this time; it 
is intended as a Copyright Notice and to gauge general consensus on whether 
this feature would be beneficial for future inclusion into Linux's kernel.org.

 

Copyright Notice:

While this two-page introduction to these new scheduling policies is being 
proposed, it is actually a minimal awareness document: it should establish that 
a code change has been implemented within public-domain source in the past, and 
allow a determination of whether this feature is wanted by kernel.org.

 

While the actual implementation code that uses these flags is proprietary (as 
most modified OS code is, being coded / changed by multiple engineers), without 
general acceptance of this change, anything beyond an inception / concept 
proposal specifying new scheduling policy flags to kernel.org is unnecessary 
at this time.

 

Warranty:

Usage and incorporation of minimally tested / unknown / untested code is 
"AS-IS", with no warranties expressed or implied.

 

Reference for a past scheduler:

SunOS, within SVR4.x, used the equivalent of SCHED_INTERACTIVE (the IA 
scheduling class).

 

VMware and others: support a weighted process scheduler that allows UNEQUAL 
scheduling.

 

 

Mitchell Erblich
UNIX Kernel Engineer / Developer
erbli...@earthlink.net





mm/page_alloc.c : Intent to clone zone_watermark_ok()

2015-07-22 Thread Mitchell Erblich
Group,

This is a notification that local changes are believed to be 
needed to zone_watermark_ok() in mm/page_alloc.c for a family of embedded 
devices.

The embedded device has no secondary storage, so cleaning 
dirty pages is not an option, and crossing below the min watermark levels is 
unwarranted.

This function seems to contain the necessary logic, with minor 
modifications, and thus the intent is to duplicate most of its functionality.

The local change is to create a new family of 
zone_watermark_num() functions, based on the family of zone_watermark_ok(), 
that instead return an int: the number of free pages.

A positive return value is the number of free pages above the 
watermark at the given order, and a negative value the number below it.
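
A minimal sketch of what such a zone_watermark_num() could look like, modeled 
loosely on the zone_watermark_ok() of that era; the per-order free-area walk 
of the real function is omitted here, so treat this as illustrative rather 
than as a drop-in patch:

    /* Returns the signed distance from the watermark in pages: a positive
     * value is the number of free pages above the mark at this order, a
     * negative value the number below it.  zone_page_state(), NR_FREE_PAGES
     * and lowmem_reserve[] are real mm internals; the simplifications are
     * this sketch's own. */
    static long zone_watermark_num(struct zone *z, unsigned int order,
                                   unsigned long mark, int classzone_idx)
    {
        long free_pages = zone_page_state(z, NR_FREE_PAGES);

        /* Discount the pages the allocation itself would consume. */
        free_pages -= (1 << order) - 1;

        return free_pages - (long)mark - (long)z->lowmem_reserve[classzone_idx];
    }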

Please inform me whether an equivalent function is already 
present, and in what release.

Thank you,
Mitchell Erblich
OS Engineer


Re: Linux Kernel Scheduling Addition Notification : Hybrid Sleepers and Unfair scheduling

2015-03-18 Thread Mitchell Erblich
Group,

As a contractor or employee, the code that I write while 
employed is owned by the company that I work for.

It is then up to them / legal / management, etc. whether they 
offer that code implementation to kernel.org or an ISP, insert it into their 
release, or whatever….

My notification is to say: here is a minimal TOI, with additional 
explanation where necessary; I never saw a patch offered; the code exists; and 
I would be willing to repeat it with a different implementation, for FREE, as 
an individual.

If the code is within a networking area, then maybe a 
simple Request For Enhancement (RFE) that explains functionality, and probably 
some minimal API, is appropriate; but the direction is NOT to offer an 
implementation of code.

Thus, this is just a legal statement that needs repeating each 
and every time that I make my offer.

How rude is that?

Thus, Mike, your statement is totally uncalled for and 
inappropriate… and being behind an email address does not excuse that.

Mitchell Erblich
Kernel Engineer



On Mar 18, 2015, at 8:59 PM, Mike Galbraith  wrote:

> On Wed, 2015-03-18 at 20:43 -0700, Mitchell Erblich wrote:
> 
>> This proposal was ONLY to resolve the legal issue with public domain
>> code of notification when a patch was not offered.…
> 
> Ah, so completely off topic here.. but then you knew that.  How rude.
> 
>   -Mike
> 



Re: Linux Kernel Scheduling Addition Notification : Hybrid Sleepers and Unfair scheduling

2015-03-18 Thread Mitchell Erblich

On Mar 18, 2015, at 7:38 PM, Mike Galbraith  wrote:

> On Wed, 2015-03-18 at 16:25 -0700, Mitchell Erblich wrote:
> 
>> 
>> SCHED_IA
>>  Over 10 years ago, System V Release 4 was enhanced with additional
>> features by Sun Microsystems. One of the more minor extensions dealt
>> with the subdivision of process’s scheduling characteristics and was
>> known as he INTERACTIVE /IA scheduling class. This scheduling class
>> was targeted to frequent sleepers, with the mouse icon being one the
>> first processes/tasks..
>> 
>>  Linux has no explicit SCHED_IA scheduling policy, but does alter
>> scheduling characteristics based on some sleep behavior (example:
>> GENTLE_FAIR_SLEEPERS) within the fair share scheduling configuration
>> option.
> 
> That's about fairness, it levels the playing field sleepers vs hogs.
> 
>> Processes / tasks that are CPU bound that fit into a SLEEPER behavior
>> can have a hybrid behavior over time where during any one scheduling
>> period, it may consume its variable length allocated time. This can
>> alter its expected short latency to be current / ONPROC. To simplify
>> the implementation, it is suggested that SCHED_IA be a sub scheduling
>> policy of SCHED_NORMAL. Shouldn’t an administrator be able to classify
>> that the NORMAL long term behavior of a task, be one as a FIXED
>> sleeper?
> 
> Nope, we definitely don't want a SCHED_IA class.
> 
> Your box can't tell if you're interacting or not even if you explicitly
> define something as 'interactive'.  If I stare in fascination at an
> eye-candy screen-saver, it becomes 'interactive'.  If I'm twiddling my
> thumbs waiting for a kbuild whatever other CPU intensive job to finish,
> that job becomes the 'interactive' thing of import.  The last thing I
> want is to have to squabble with every crack-head programmer on the
> planet who thinks his/her cool app should get special treatment.
> 
> You can’t get there from here.

This proposal was ONLY to resolve the legal issue, via public notification, of 
code for which a patch was not offered.…

Any “crack-head programmer” can still, if he has the CAPABILITY, 
change the scheduling policy to RT-FIFO or RT-RR to give his app special 
treatment… So this proposal does not mitigate or change that treatment, 
assuming that the same CAPABILITY is checked.

POSIX ONLY specifies RT tasks as creating a level of unfairness, whereas 
some APPLICATION-SPECIFIC uses of Linux need an infrequently executed task to 
run more than what is currently FAIR.

So, the Linux scheduler ALREADY determines whether a task slept during a 
time window, and if it DIDN’T, it is penalized versus the tasks that did sleep. 
It is effectively a dynamic scheduling behaviour, where sometimes your 
interactive task does not fully sleep. Thus, why not still treat it as an IA 
task, because of the nature/functionality of the task?

It could be generating tones/music through the audio/speaker driver of 
the system, where you want a consistent minimal latency, else you get warble 
in your music. Thus, if an admin KNOWS that a task is essentially 
INTERACTIVE (a mouse-pointer driver, an audio driver, etc.), and the admin or 
a startup script has the CAPABILITY, then the task can be set so that the 
INTERACTIVE task is ALWAYS/CONSISTENTLY treated as an INTERACTIVE task.

This change allows one or more tasks to behave more consistently, 
independently of the number of tasks being scheduled during the time window 
and the number of CPUs/cores, without any other special scheduling.

This was an SVR4.x feature within the SunOS kernel and a number of other 
SVR4.x UNIX kernels, settable via priocntl(1) (which Linux does not support) 
or a startup script.

>> 
>>  Thus, the first Proposal is to explicitly support the SCHED_IA
>> scheduling policy within the Linux kernel. After kernel support, any
>> application that has the same functionality as priocntl(1) then needs
>> to be altered to support this new scheduling policy.
>> 
>> 
>> Note: Administrator in this context should be a task with a UID, EUID,
>> GID, EGID, etc, that has the proper CAPABILITY to alter scheduling
>> behavior.
> 
>> 
>> SCHED_UNFAIR
> 
> Snip.. we already have truckloads of bandwidth control.
> 
>   -Mike
> 

This is NOT bandwidth control. It allocates an increase in the task’s time 
slice versus other tasks…

SCHED_UNFAIR pertains to a task that can sometimes run and consume an 
UNfair number of CPU cycles. With the routing protocol OSPFv2/v3, a well-known 
link-state task is specified by function; but regular tasks that INFREQUENTLY 
execute, like a file system check (fsck), become a boot bottleneck if actually 
doing work and unable to do that work without a number of context switches. 
With this functionality, a 30% or greater latency decrease can occur, 
depending on the number of tasks contending for the time window.

Yes, if a TASK_UNFAIR policy always executes in every time

Linux Kernel Scheduling Addition Notification : Hybrid Sleepers and Unfair scheduling

2015-03-18 Thread Mitchell Erblich
Please note that this proposal is from this engineer and not from the company 
he works for.

This SHOULD also fulfill any legal notification of work done but not 
submitted to the Linux kernel.


Transfer of Information : Notification & Proposal of Feasibility to Support 
System V Release 4 De Facto Standard Scheduling Extensions, etc. within Linux
——-


SCHED_IA
Over 10 years ago, System V Release 4 was enhanced with additional 
features by Sun Microsystems. One of the more minor extensions dealt with the 
subdivision of a process’s scheduling characteristics and was known as the 
INTERACTIVE (IA) scheduling class. This scheduling class was targeted at 
frequent sleepers, with the mouse pointer being one of the first 
processes/tasks.

Linux has no explicit SCHED_IA scheduling policy, but it does alter 
scheduling characteristics based on some sleep behaviour (example: 
GENTLE_FAIR_SLEEPERS) within the fair-share scheduling configuration option. 
Processes / tasks that are CPU-bound yet fit a SLEEPER behaviour can show a 
hybrid behaviour over time: during any one scheduling period, such a task may 
consume its variable-length allocated time, which can lengthen the short 
latency it otherwise expects before becoming current / ONPROC. To simplify 
the implementation, it is suggested that SCHED_IA be a sub-scheduling policy 
of SCHED_NORMAL. Shouldn’t an administrator be able to classify the NORMAL 
long-term behaviour of a task as that of a FIXED sleeper?

Thus, the first Proposal is to explicitly support the SCHED_IA 
scheduling policy within the Linux kernel. After kernel support, any 
application that has the same functionality as priocntl(1) then needs to be 
altered to support this new scheduling policy.
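
As a sketch, the core of such a priocntl(1)-like utility might reduce to the 
call below; the SCHED_IA value is purely hypothetical, since the policy exists 
in no mainline header, and the proper CAPABILITY noted below (CAP_SYS_NICE) 
would be required:

    #include <sched.h>
    #include <stdio.h>
    #include <sys/types.h>

    #define SCHED_IA 7    /* hypothetical value; not in any mainline header */

    /* Move an existing task into the proposed IA policy. */
    static int set_interactive(pid_t pid)
    {
        struct sched_param sp = { .sched_priority = 0 };  /* non-RT: priority 0 */

        if (sched_setscheduler(pid, SCHED_IA, &sp) == -1) {
            perror("sched_setscheduler");  /* EINVAL on kernels without SCHED_IA */
            return -1;
        }
        return 0;
    }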


Note: Administrator in this context should be a task with a UID, EUID, GID, 
EGID, etc, that has the proper CAPABILITY to alter scheduling behavior.


SCHED_UNFAIR
UNIX / Linux scheduling for the most part attempts to 
achieve some level of process / task scheduling fairness, within the Linux 
scheduler via the fair-share scheduling class. Exceptions do exist, but they 
are not discussed below. In general this type of scheduling is acceptable in 
a generic implementation, but it has weaknesses when UNIX / Linux is moved 
into a different environment. Many companies use UNIX / Linux in heavy 
networking environments where one or more tasks can infrequently attempt to 
consume more than their fair share within a window scheduling period.

This proposal acknowledges that “nice” and a few other 
scheduling workarounds do not always suffice to allow this temporary 
inequality to exist. Yes, a cpumask could be set that allows only certain 
tasks to run on specific nodes; however, the implied assumption is that these 
tasks only infrequently need inequality, i.e. a greater “time slice” than 
their fair share. A network-protocol example is a convergence task that needs 
to run until it is finished; until that happens, needed routing changes will 
not occur. The time window in which all tasks need to run SHOULD not need 
this task in consecutive time windows, so over longer periods of time it 
still fulfills the fair scheduling policy. Again, the proper CAPABILITY needs 
to be specified with the priocntl(1)-like application, as running many such 
tasks per CPU COULD then affect the performance of the system.

Thus, explicit support for a new SCHED_UNFAIR scheduling policy 
is proposed / suggested within the Linux kernel. Again, it can be a 
sub-scheduling policy of SCHED_NORMAL.

If there is an expressed need / want for this type of 
functionality to be patched into a git tree, please inform this engineer 
whether any additional information should be provided, since this minimal TOI 
document is architecture / enhancement oriented and does not go into details 
of a possible implementation of the above functionality.

Thank you,
Mitchell Erblich
UNIX Kernel Engineer


TOI: introducing Split Lists hybrids of doubly linked lists within the Linux Kernel

2015-03-10 Thread Mitchell Erblich
 in this suggestion. For the skeptical: add elements into 
a tree with the values from 1 to 100. Is the tree balanced without doing extra 
work? Dynamic upgrades from split lists into skip lists should be a 
simple matter, and could be configured via a /proc tunable if wanted. A proper 
split list or skip list should show 16x, 32x, 64x or more performance 
improvement in finding the location to locate/search, add, or delete an 
element, compared with the currently supported linked list.
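
The message does not define the split-list layout, so the sketch below is one 
plausible minimal reading: an ordinary doubly linked list plus an "express 
lane" array holding every SPLIT-th node, which cuts an ordered search by 
roughly that factor. Every name in it is invented for illustration:

    #include <stddef.h>

    #define SPLIT 16

    struct snode {
        int key;
        struct snode *prev, *next;
    };

    struct split_list {
        struct snode *head;
        struct snode *lane[64];   /* every SPLIT-th node, maintained on insert */
        size_t nlanes;
    };

    /* Jump to the last express entry whose key <= key, then walk the
     * underlying list: roughly O(n/SPLIT + SPLIT) instead of O(n). */
    static struct snode *split_find(struct split_list *l, int key)
    {
        struct snode *n = l->head;

        for (size_t i = 0; i < l->nlanes && l->lane[i]->key <= key; i++)
            n = l->lane[i];
        while (n && n->key < key)
            n = n->next;
        return (n && n->key == key) ? n : NULL;
    }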

Of course, all new functions will co-exist, and only engineers who are 
comfortable with using the new functions will then possibly see performance 
improvements. And only if these lists are frequently walked (possibly as a 
bottleneck) will the performance improvements be noticeable.

This engineer is willing to submit an initial prototype of the above 
logic functions to be incorporated within the Linux Kernel if/when there is a 
request to do so.

Sincerely, Mitchell Erblich
Kernel Engineer


Re: maturity and status and attributes, oh my!

2007-08-31 Thread Mitchell Erblich
"Robert P. J. Day" wrote:
> 
>   at the risk of driving everyone here totally bonkers, i'm going to
> take one last shot at explaining what i was thinking of when i first
> proposed this whole "maturity level" thing.  and, just so you know,
> the major reason i'm so cranked up about this is that i'm feeling just
> a little territorial -- i was the one who first started nagging people
> to consider this idea, so i'm a little edgy when i see folks finally
> giving it some serious thought but appearing to get ready to implement
> it entirely incorrectly in a way that's going to ruin it irreparably
> and make it utterly useless.
> 
>   this isn't just about defining a single feature called "maturity".
> it's about defining a general mechanism so that you can add entirely
> new (what i call) "attributes" to kernel features.  one attribute
> could be "maturity", which could take one of a number of possible
> values.  another could be "status", with the same restrictions.
> heck, you could define the attribute "colour", and decide that various
> kernel features could be labelled as (at most) one of "red", "green"
> and "chartreuse."  that's what i mean by an "attribute", and
> attributes would have two critical and non-negotiable properties:
<<< snip >>>
> 
>   but i hope i've flogged this thoroughly to the point where people
> can see what i'm driving at.  once you see (as in simon's patch) how
> to add the first attribute, it's trivial to simply duplicate that code
> to add as many more as you want.
> 
> rday
> 
> --
> 
> Robert P. J. Day
> Linux Consulting, Training and Annoying Kernel Pedantry
> Waterloo, Ontario, CANADA
> 
> http://crashcourse.ca
> 
Robert Day,

If I can interpret what you are asking about, and change it a bit:

Don't you think that maturity can ALSO be defined as the 
   number of known bugs and their priority / severity against an 
   architecture-dependent or -independent item?

   Would this suffice, and wouldn't it be easier to maintain?
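
As a sketch of that definition (entirely illustrative; nothing like this 
exists in the kernel), maturity could be scored mechanically from open bugs 
weighted by severity:

    /* Hypothetical maturity metric: open bugs per severity level for one
     * architecture-dependent or -independent item.  Weights are guesses. */
    struct feature_bugs {
        unsigned int low, medium, high, critical;
    };

    /* Higher score = less mature; a zero score could map to "stable". */
    static unsigned int maturity_score(const struct feature_bugs *f)
    {
        return f->low + 2 * f->medium + 4 * f->high + 8 * f->critical;
    }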

   Mitchell Erblich


Re: [RFC] : mm : / Patch / code : Suggestion :snip kswapd _page_from_freelist() : No more no page failures. (WHY????)

2007-08-28 Thread Mitchell Erblich
Nick Piggin wrote:
>
> [EMAIL PROTECTED] wrote:
> > [EMAIL PROTECTED]
> > Sent: Friday, August 24, 2007 3:11 PM
> > Subject: Re: [RFC] : mm : / Patch / code : Suggestion :snip kswapd &
> > get_page_from_freelist() : No more no page failures.
> >
> > Mailer added a HTML subpart and chopped the earlier email :^(
>
> Hi Mitchell,
>
> Is it possible to send suggestions in the form of a unified diff, even
> if you haven't even compiled it (just add a note to let people know).
>
> Secondly, we already have a (supposedly working) system of asynch
> reclaim, with buffering and hysteresis. I don't exactly understand
> what problem you think it has that would be solved by rechecking
> watermarks after allocating a page.
>
> When we're in the (min,low) watermark range, we'll wake up kswapd
> _before_ allocating anything, so what is better about the change to
> wake up kswapd after allocating? Can you perhaps come up with an
> example situation also to make this more clear?
>
> Overhead of wakeup_kswapd isn't too much of a problem: if we _should_
> be waking it up when we currently aren't, then we should be calling
> it. However the extra checking in the allocator fastpath is something
> we want to avoid if possible, because this can be a really hot path.
>
> Thanks,
> Nick
>
> --
> SUSE Labs, Novell Inc.
> -

Nick Piggin, et al,

First, diffs would generate a lot of noise, since I rip out and insert
a lot of code based on whether I think the code is REALLY
needed for MY TEST environment. These suggestions are
basically minimal merge suggestions between my
development environment and the public Linux tree.

Now the why for this SUGGESTION/PATCH...

> When we're in the (min,low) watermark range, we'll wake up kswapd
> _before_ allocating anything, so what is better about the change to
> wake up kswapd after allocating? Can you perhaps come up with an
> example situation also to make this more clear?

Answer:
Will GFP_ATOMIC allocations be failing at that point? If yes, then why
not allow kswapd to attempt to prevent this condition from occurring?
The existing code reads that the first call to get_page_from_freelist()
has returned no page. Now you are going to start up something that
will at best take milliseconds to start helping out. Won't it first
grab some pages to do its work? So we are going to be lower
in free memory right when it starts up. Right?

So, before the change, with high memory consumption/pressure,
various GFP_xxx allocations would fail or take an excessive
amount of time, due simply to low memory and/or
slub/slab consumption and/or a first failure of
get_page_from_freelist() in a low-free-memory condition.

Once the above condition occurs, the perception is that the
current mainline Linux code then, on demand, increases its
effort to find some memory. However, while this is happening,
the system is in a low-memory bind: various performance
parameters are affected, and some allocations are
sleeping, being delayed, or failing outright.

What I can see is that the CURRENT approaches that allow a new class
of GFP_xxx allocations to succeed while in low memory,
the try-again philosophy, waking up kswapd, etc., are all AFTER the
fact, while something is WAITING for the memory. This
wait is in effect a SYNCHRONOUS wait for memory.

   Assuming that kswapd is really what is mostly needed:
   execute it BEFORE (JUST IN TIME), to PREVENT low
   memory, since I/O needs pages while GFP_ATOMIC
allocations fail and other GFP allocations sleep.

  The SUGGESTION is to
   take the fraction of a microsecond longer in the fast path to see
   whether kswapd needs to be started up, and to ATTEMPT to prevent
   the SLOW PATH and low/min memory from occurring.

The 2x low memory is
to allow some scalability and to give kswapd ENOUGH time to do what
it needs to do, since I expect a minimum number of milliseconds
before it can move us away from low free memory. As the
amount of memory in a system increases, this could probably
be decreased somewhat, to maybe 1.25x.

IF the above is good, then the issue is how to optimize the heck
out of the check.

Mitchell Erblich





Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Mitchell Erblich
Jan-Bernd Themann,

  IMO, a box must be aware of the speed of all its interfaces, whether
each interface implements tail-drop or RED or XYZ, the latency to
access the packet, etc.

  Then, when a packet arrives, a timer is started for interrupt
coalescing, and awaiting packets are processed; if tail-drop is
implemented, it is possible to wait until the input FIFO fills to a
specific point before starting the timer.

  This may maximize the number of packets per interrupt. And realize
that the worst case of an interrupt per packet is wirespeed pings
(echo request/reply) of 64 bytes per packet.
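
A toy model of the arming decision described above; every threshold is an 
assumption, and a real driver would hang this off its RX interrupt path:

    #include <stdbool.h>

    #define FIFO_ARM_THRESHOLD 16   /* assumed fill level before arming */

    /* Returns true when the driver should (re)arm its coalescing timer. */
    static bool should_arm_coalesce_timer(unsigned int fifo_fill,
                                          bool tail_drop, bool timer_armed)
    {
        if (timer_armed)
            return false;
        /* With tail-drop, the FIFO can safely be allowed to fill further
         * first, maximizing packets handled per interrupt. */
        if (tail_drop)
            return fifo_fill >= FIFO_ARM_THRESHOLD;
        return fifo_fill > 0;       /* otherwise arm on the first packet */
    }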

   Mitchell Erblich






Re: [RFC] : mm : / Patch / code : Suggestion :snip kswapd & get_page_from_freelist() : No more no page failures.

2007-08-24 Thread Mitchell Erblich
From: Mitchell Erblich
To: Peter Zijlstra
Cc: Andrew Morton ; [EMAIL PROTECTED] ; linux-kernel@vger.kernel.org ;
[EMAIL PROTECTED]
Sent: Friday, August 24, 2007 3:11 PM
Subject: Re: [RFC] : mm : / Patch / code : Suggestion :snip kswapd &
get_page_from_freelist() : No more no page failures.

Mailer added an HTML subpart and chopped the earlier email :^(


Peter Zijlstra wrote:
>
> On Thu, 2007-08-23 at 02:35 -0700, Mitchell Erblich wrote:
> > Group,
> >
> > On the infrequent condition of failing to recieve a page from the
> > freelists, one of the things you do is call
wakeup_kswapd()(exception of
> > NUMA or GFP_THISNODE).
> >
> > Asuming that wakeup_kswapd() does what we want, this call is
> > such a high overhead call that you want to make sure that the
> > call is infrequent.
>
> It just wakes up a thread, it doesn't actually wait for anything.
> So the function is actually rather cheap.
>
> > My initial guess is that it REALLY needs to re-populate the
> > freelists just before they/it is used up. However, the simple change
> > is being suggested NOW.
>
> kswapd will only stop once it has reached the high watermarks
>
> > Assuming that on avg that the order value will be used, you should
> > increase the order to cover two allocs of that same level of order,
> > thus the +1. If on the chance that later page_alloc() calls need
> > fewer pages (smaller order) then the extra pages will be available
> > for more page_allocs(). If later calls have larger orders, hopefully
> > the latency between the calls is great enough that other parts of
> > the system will respond to the low memory / on the freelist(s).
>
> by virtue of kswapd only stopping reclaim when it reaches the high
> watermark you already have that it will free more than one page (its
> started when we're below the low watermark, so it'll free at least
> high-min pages).
>
> Changing the order has quite a different impact, esp now that we have
> lumpy reclaim.
>
> > Line 1265 within function __alloc_pages(), mm/page_alloc.c
> >
> > wakeup_kswapd(*z, order);
> >   to
> > wakeup_kswapd(*z, order + 1);
> >
> > In addition, isn't a call needed to determine that the
> > freelist(s) are almost empty, but are still returning a page?
>
> didn't we just do that by finding out that ALLOC_WMARK_LOW fails to
> return a page?
>
Peter Zijlstra, et al,

This reply is long. In summary: the order only affects whether
max_order is larger or smaller, and this makes sure that we are
getting enough memory. When I get to kswapd, I will look at whether
I really think, IMO, that it SHOULD quit before it reaches high mem,
as you seem to say above. IMO, cached memory is good. Freelists are
good. No-page returns from get_page_from_freelist() are bad. And
doing something after a bad thing happens doesn't seem to be the
right thing to do.

> didn't we just do that by finding out that ALLOC_WMARK_LOW fails to
> return a page?

Yes, but I am talking about checking prematurely, while we are
still returning pages. When we first start not returning a page,
things are starting to fail. Thus, this fairly KISS code is there to
stop us from getting to the no-page part! It calls two modified,
renamed functions just a few lines above the current call to
wakeup_kswapd, in what I term the pre-call sequence. And the code
itself is shorter than all this explanation, because it is a
different way of thinking about the problem, IMO.

   Please realize that this is a prototype and
is intended to be very close, but it needs to be reviewed by multiple
people to identify whether the extra overhead in the fast path is
worth the tradeoff of waking up kswapd BEFORE free memory drops to
the point at which it is NORMALLY awoken.

I will assume that 2x low memory is more than ample time to wake up
kswapd and allow it to clean any dirty pages, thus preventing any
low-memory condition from occurring.

Group,

Summary:

> > In addition, isn't a call needed to determine that the
> > freelist(s) are almost empty, but are still returning a page?

These suggestions are intended to
decrease the chance of dropping below low_memory
by waking up kswapd before we run out of pages
on the page freelist and/or affect MOST normal
page allocations.

The simplistic fix is to add the lightweight check prototyped/
suggested below, which checks whether we are about to drop
below the low watermark (free_pages / 2). If we
are within 2 * lowmem_reserve, then we should start up kswapd
with the shortened wakeup_short_swapd(), which removes the
call to zone_watermark_ok() because it was already done.
So the shortened zone_water_check() is done because we
aren't in the middle of an allocation.

Thus, if we can start kswapd soon enough, the chance of
reaching the no-page goto got_pg should be diminished.
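
A reconstruction, from the description above, of roughly what the pre-call 
check could look like in a 2.6.2x-era __alloc_pages() fast path. 
wakeup_short_swapd() and zone_water_check() are the proposed helpers; their 
bodies are not shown in this thread, so this sketch approximates them with 
existing calls and is not the actual prototype:

    /* Pre-call check: while get_page_from_freelist() is still returning
     * pages, wake kswapd once free pages fall within 2x of the low
     * watermark, rather than waiting for an outright no-page failure.
     * zone->pages_low and wakeup_kswapd() are as in 2.6.2x mm/. */
    static inline void zone_water_check(struct zone *z, unsigned int order)
    {
        unsigned long free = zone_page_state(z, NR_FREE_PAGES);

        /* 2x the low watermark: lead time for reclaim to catch up
         * before any allocation actually fails. */
        if (free < 2 * z->pages_low)
            wakeup_kswapd(z, order);   /* stand-in for wakeup_short_swapd() */
    }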


[RFC] : mm : / Patch / Suggestion : Add 1 order or agressiveness to wakeup_kswapd() : 1 line / 1 arg change

2007-08-23 Thread Mitchell Erblich
Group,

On the infrequent condition of failing to receive a page from the
freelists, one of the things done is to call wakeup_kswapd() (with the
exception of NUMA or GFP_THISNODE).

Assuming that wakeup_kswapd() does what we want, this call is
such a high-overhead call that you want to make sure that the
call is infrequent.
My initial guess is that it REALLY needs to re-populate the
freelists just before they are used up. However, the simple change
is being suggested NOW.

Assuming that on avg that the order value will be used, you should
increase the order to cover two allocs of that same level of order,
thus the +1. If on the chance that later page_alloc() calls need
fewer pages (smaller order) then the extra pages will be available
for more page_allocs(). If later calls have larger orders, hopefully
the latency between the calls is great enough that other parts of
the system will respond to the low memory / on the freelist(s).

Line 1265 within function __alloc_pages(), mm/page_alloc.c

wakeup_kswapd(*z, order);
  to
wakeup_kswapd(*z, order + 1);

In addition, isn't a call needed to determine that the
freelist(s) are almost empty, but are still returning a page?

Thus, could a lightweight call be done after a NORMAL
page is received at line 1250 (if (page)) from
get_page_from_freelist()? Or it could be embedded within the
function call path that get_page_from_freelist() uses.

We could call wakeup_kswapd() or an equivalent proactively.
The idea is that the call should check for 2x of the LOW_MEMORY
equivalent on the freelists and re-populate them.
Then, HOPEFULLY, the 2nd call to get_page_from_freelist() would
be obsolete.
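
A sketch of where such a call might sit, against the 2.6.2x
__alloc_pages() shape referenced above; check_freelist_low() is an
invented name for the 2x-LOW test, not existing code:

	/* Sketch: inside __alloc_pages(), at the "if (page)" success
	 * path (line 1250 referenced above).  check_freelist_low() is
	 * hypothetical: it would compare the zone's free pages against
	 * 2x the low watermark and wake kswapd while pages are still
	 * being returned. */
	page = get_page_from_freelist(gfp_mask, order,
				      zonelist, ALLOC_WMARK_LOW);
	if (page) {
		if (check_freelist_low(zonelist, order))
			wakeup_kswapd(*z, order + 1);	/* the +1 suggested above */
		goto got_pg;
	}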

When I come up with it, I will suggest it to the group.

Mitchell Erblich
FYI: My kernel is different enough that I cannot validate this change
against the 2.6.2x git.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: QUESTION: RT & SCHED & fork: ?MISSING EQUIV of task_new_fair for RT tasks.

2007-08-15 Thread Mitchell Erblich
Mike Galbraith wrote:
> 
> On Tue, 2007-08-14 at 12:28 -0700, Mitchell Erblich wrote:
> > Group, Ingo Molnar, etc,
> >
> > Why does the rt sched_class contain fewer elements than fair?
> > missing is the RT for .task_new.
> 
> No class specific initialization needs to be done for RT tasks.
> 
> -Mike


Mike, et al,

one time:  I was told that this group likes bottom posts.

The logic of class-independent code calling class-dependent
scheduling code assumes that all functions are present
   in ALL the class-dependent sections.

Minimally, if I agree with your above statement, I would assume
that the function should still exist as a null-type function. However,
in reality, a lot of RT class-specific init is done. Just currently
none of it is done in this non-existent function.
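
A minimal sketch of the null-function idea against the 2.6.2x
sched_rt.c quoted in the original post below; task_new_rt() does not
exist in that tree and is shown only as the suggested stub:

	/* Suggested stub only (sketch): gives rt_sched_class a
	 * .task_new entry.  In the 2.6.2x fork path a NULL .task_new
	 * falls back to activate_task(), so a stub would have to
	 * enqueue the child itself; the assumed equivalent is shown. */
	static void task_new_rt(struct rq *rq, struct task_struct *p)
	{
		activate_task(rq, p, 0);
	}

	/* ...and in rt_sched_class: */
		.task_new	= task_new_rt,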

Mitchell Erblich
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


QUESTION: RT & SCHED & fork: ?MISSING EQUIV of task_new_fair for RT tasks.

2007-08-14 Thread Mitchell Erblich
Group, Ingo Molnar, etc,

Why does the rt sched_class contain fewer elements than fair?
Missing is the RT equivalent for .task_new.

called by do_fork(): fork.c , which calls wake_up_new_task() via
if (!(clone_flags & CLONE_STOPPED))
  wake_up_new_task(p, clone_flags);

which is in sched.c and calls
if (!p->sched_class->task_new ||

   and
 p->sched_class->task_new(rq, p, now);


sched_rt.c:
static struct sched_class rt_sched_class __read_mostly = {
	.enqueue_task		= enqueue_task_rt,
	.dequeue_task		= dequeue_task_rt,
	.yield_task		= yield_task_rt,

	.check_preempt_curr	= check_preempt_curr_rt,

	.pick_next_task		= pick_next_task_rt,
	.put_prev_task		= put_prev_task_rt,

	.load_balance		= load_balance_rt,

	.task_tick		= task_tick_rt,
};


sched_fair.c:
struct sched_class fair_sched_class __read_mostly = {
	.enqueue_task		= enqueue_task_fair,
	.dequeue_task		= dequeue_task_fair,
	.yield_task		= yield_task_fair,

	.check_preempt_curr	= check_preempt_curr_fair,

	.pick_next_task		= pick_next_task_fair,
	.put_prev_task		= put_prev_task_fair,

	.load_balance		= load_balance_fair,

	.set_curr_task		= set_curr_task_fair,
	.task_tick		= task_tick_fair,
	.task_new		= task_new_fair,
};

The missing rt equivalent item is:
/*
 * Share the fairness runtime between parent and child, thus the
 * total amount of pressure for CPU stays equal - new tasks
 * get a chance to run but frequent forkers are not allowed to
 * monopolize the CPU. Note: the parent runqueue is locked,
 * the child is not running yet.
 */
static void task_new_fair(struct rq *rq, struct task_struct *p)
{

Mitchell Erblich

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


minor Suggested cleanup: RT / sched : Have RT tasks use PF_LESS_THROTTLE flag

2007-08-14 Thread Mitchell Erblich
Group, Ingo Molnar, and Dmitry, et al,

In mm/page-writeback.c : get_dirty_limits()

Shouldn't the PF_LESS_THROTTLE flag be
scheduling class independent?

The suggestion below is to use the flag for
RT tasks.

-- to drop the rt_task() call within the function

if (tsk->flags & PF_LESS_THROTTLE || rt_task(tsk)) {

1) becomes
if (tsk->flags & PF_LESS_THROTTLE)

Then 
2) in kernel/sched.c set:
   p->flags |= PF_LESS_THROTTLE before the break;

case SCHED_FIFO:
case SCHED_RR:
  p->sched_class = _sched_class;
  p->flags |= PF_LESS_THROTTLE;
break;

3) Unset it in case sched class changed; before the break
case SCHED_NORMAL:
case SCHED_BATCH:
case SCHED_IDLE:
   p->sched_class = _sched_class;
   p->flags &= ~PF_LESS_THROTTLE;
   break;

4) set the flag in rt_mutex_setprio(), with braces
if (rt_prio(prio)) {
 p->sched_class = _sched_class;
 p->flags |= PF_LESS_THROTTLE;

 
 }  else


5) Am I missing anything?
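
Pulling steps 2) and 3) together, a sketch against the 2.6.2x
__setscheduler() policy switch in kernel/sched.c (untested
illustration only):

	switch (p->policy) {
	case SCHED_NORMAL:
	case SCHED_BATCH:
	case SCHED_IDLE:
		p->sched_class = &fair_sched_class;
		p->flags &= ~PF_LESS_THROTTLE;	/* step 3: clear on class change */
		break;
	case SCHED_FIFO:
	case SCHED_RR:
		p->sched_class = &rt_sched_class;
		p->flags |= PF_LESS_THROTTLE;	/* step 2: RT implies less throttling */
		break;
	}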

Mitchell Erblich

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Question : sched_rt.c : Loss of stats?? requeue_task_rt() does not call update_curr_rt() which updates stats

2007-08-08 Thread Mitchell Erblich
sched_rt.c : requeue_task_rt()

The comment states the problem: requeue, no dequeue.
"Put task to the end of the run list without the overhead of dequeue
followed by enqueue."

dequeue_task_rt() updates stats; without calling it,
the stat update is skipped.

Thus, shouldn't  requeue_task_rt() call
  update_curr_rt(rq); ???
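
From memory of the 2.6.2x sched_rt.c, the suggested change would look
roughly like this (the original body is just the list_move_tail()):

	static void requeue_task_rt(struct rq *rq, struct task_struct *p)
	{
		struct rt_prio_array *array = &rq->rt.active;

		update_curr_rt(rq);	/* suggested addition: keep stats current */
		list_move_tail(&p->run_list, array->queue + p->prio);
	}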

Mitchell Erblich.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Question: sched_rt.c : is RT check needed within a RT func? dequeue_task_rt() calls update_curr_rt() which checks for priority of RR or FIFO :

2007-08-08 Thread Mitchell Erblich
1) * Possible wasted stats overhead during dequeue..
sched_rt.c: 
Is RT check needed within a RT func? 
dequeue_task_rt() calls update_curr_rt() 
which checks for priority of RR or FIFO.

WITHIN..
static inline void update_curr_rt(struct rq *rq)

are the two lines..
if (!task_has_rt_policy(curr))
return;

Generally, if I am reading this right, what
RT task is neither FIFO nor RR?

Thus, I think those two lines could be removed.
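
For reference, the shape of the function in question, from memory of
the 2.6.2x source (the two marked lines are the ones proposed for
removal):

	static inline void update_curr_rt(struct rq *rq)
	{
		struct task_struct *curr = rq->curr;
		u64 delta_exec;

		if (!task_has_rt_policy(curr))	/* <-- proposed for removal: */
			return;			/* <-- callers are RT-only   */

		delta_exec = rq->clock - curr->se.exec_start;
		if (unlikely((s64)delta_exec < 0))
			delta_exec = 0;

		curr->se.sum_exec_runtime += delta_exec;
		curr->se.exec_start = rq->clock;
	}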

2) nit
The comment within sched_rt.c,
"Adding/removing a task to/from a priority array:",
is placed before dequeue_task_rt(), while
enqueue_task_rt() is placed above the comment.

Thus, the comment should be moved above enqueue_task_rt().

Mitchell Erblich




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Question: RT scheduler : task_tick_rt(struct rq *rq, struct task_struct *p) : decreases overhead when rq->nr_running == 1

2007-08-08 Thread Mitchell Erblich
Ingo Molnar and group,

First, RT/RR tasks are not deprecated.  This
change simply removes sched overhead while
only 1 RT/RR task is runnable per rq.

The code size increase is minor and is a very
straightforward change..

THUS...

My assumption is / was to hand you/Ingo Molnar this change
because,
* you have the latest scheduler git tree and do periodic
   pull requests, so I consider you the default scheduler
   maintainer    : ^ )
* my scheduler tree is more developmental,
* and I wish to first suggest minor changes over time to see
   what scheduler changes are more acceptable.

  Mitchell Erblich
--


Ingo Molnar wrote:
>
> * Mitchell Erblich <[EMAIL PROTECTED]> wrote:
>
> > After
> > p->time_slice = static_prio_timeslice(p->static_prio);
> >
> > Why isn't there a check like
> > if  (rq->nr_running == 1)
> >  return;
> >
> > Which would remove the need for any rescheduling or requeueing...
>
> your change is a possible optimization, but this is a pretty rare
> codepath because the overwhelming majority of RT apps uses SCHED_FIFO.
> Plus, the time_slice going down to 0 is a relatively rare event even for
> SCHED_RR tasks. And if we have only a single SCHED_RR task, why is it
> SCHED_RR to begin with? So this is on several levels an uncommon
> workload and by adding a check like that we'd increase the codesize. But
> ... no strong feelings against this optimization - if you send a proper
> patch we can apply it, it certainly makes sense from a logic POV.
>
> Ingo
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Question: RT scheduler : task_tick_rt(struct rq *rq, struct task_struct *p) : decreases overhead when rq->nr_running == 1

2007-08-07 Thread Mitchell Erblich
Ingo Molnar and group,

If there is a single RT task on this rq, then why not just reset the
timeslice and return? Just MAYBE decreasing rescheduling
overhead.

Thus

After
p->time_slice = static_prio_timeslice(p->static_prio);

Why isn't there a check like
if  (rq->nr_running == 1)
 return;

Which would remove the need for any rescheduling
or requeueing...

Mitchell Erblich
FYI: below is believed to be a snapshot of the current / original function
-


+static void task_tick_rt(struct rq *rq, struct task_struct *p)
+{
+ /*
+  * RR tasks need a special form of timeslice management.
+  * FIFO tasks have no timeslices.
+  */
+ if (p->policy != SCHED_RR)
+  return;
+
+ if (--p->time_slice)
+  return;
+
+ p->time_slice = static_prio_timeslice(p->static_prio);
+ set_tsk_need_resched(p);
+
+ /* put it at the end of the queue: */
+ requeue_task_rt(rq, p);
+}
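
With the suggested check added, the function would look like this
(sketch only; whether the resched/requeue can really be skipped when
nr_running == 1 is exactly the question being asked):

	static void task_tick_rt(struct rq *rq, struct task_struct *p)
	{
		/*
		 * RR tasks need a special form of timeslice management.
		 * FIFO tasks have no timeslices.
		 */
		if (p->policy != SCHED_RR)
			return;

		if (--p->time_slice)
			return;

		p->time_slice = static_prio_timeslice(p->static_prio);

		/* Suggested: alone on the rq, no resched/requeue needed. */
		if (rq->nr_running == 1)
			return;

		set_tsk_need_resched(p);

		/* put it at the end of the queue: */
		requeue_task_rt(rq, p);
	}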
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: about modularization

2007-08-06 Thread Mitchell Erblich
Rene,

Of the uni-processor systems currently that can run Linux, I would not
doubt that 99.% are uni-core. It will probably be
3-5 years minimum before multi-core processors have any
decent percentage of systems.

And I am not suggesting not supporting them. I am only suggesting,
wrt the scheduler, bringing the system up with a default scheduler
and then loading additional functionality based on the hardware/software
requirements of the system.

Thus, the fallout MIGHT be a uni-processor CFS that would not migrate
tasks between multiple CPUs; as additional processors are brought
online, migration could be enabled, and gang-type scheduling or
whatever could then be used.

IMO, if there is a fault (because of heat, etc.), the user would
rather bring up the system in a degraded mode. The same reason applies
to... boot -s.

Mitchell Erblich
--


Rene Herman wrote:
>
> On 08/06/2007 10:20 PM, Mitchell Erblich wrote:
>
> > Thus, a hybrid schedular approach could be taken
> > that would default to a single uni-processor schedular
>
> What a brilliant idea in a world where buying a non multi core CPU is
> getting to be only somewhat easier than a non SMT one...
>
> Rene.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: about modularization

2007-08-06 Thread Mitchell Erblich
Ingo Molnar and group,

If we just concentrate on CPU schedulers...

 IMO, POSIX requirements almost guarantee
 the support for modularization. The different
 task scheds allow a set of task-class-specific
 funcs to be generated. The question is whether
 those modular schedulers will ALWAYS consume
 kernel footprint space.

The arg for modularization is really that it targets
optional hardware and decreases the kernel
footprint size. So here is an arg to support only one
default scheduler and take the rest of the sched
code and modularize that.

  IMO, ONLY some envs REQUIRE RT
  sched and some envs REQUIRE MP
  (multi-core and multi-processor) scheduling.
  I question whether the core kernel needs
  this support.

This additional capability could be removed
from the growing kernel footprint, and
additional schedulers could be kept in the src
code base with increasingly minimal effort if full
modularization support were added.

Thus, a hybrid scheduler approach could be taken
that would default to a single uni-processor scheduler
(ex: CFS), and the other schedulers could be
modularized.

Mitchell Erblich








Ingo Molnar wrote:
>
> * T. J. Brumfield <[EMAIL PROTECTED]> wrote:
>
> > 1 - Can someone please explain why the kernel can be modular in every
> > other aspect, including offering a choice of IO schedulers, but not
> > kernel schedulers?
>
> that's a fundamental misconception. If you boot into a distro kernel on
> a typical PC, about half of the kernel code that the box runs in any
> moment will be in modules, half of it is in the "kernel core". For
> example, on a random laptop:
>
>  $ echo `lsmod | cut -c1-30 | cut -d' ' -f2-` | sed 's/Size //' |
>sed 's/ /+/g' | bc
>  2513784
>
> i.e. 2.5 MB of modules. The core kernel's size:
>
>  $ dmesg | grep 'kernel code'
>  Memory: 2053212k/2087808k available (2185k kernel code, 33240k reserved,
1174k data, 244k init, 1170304k highmem)
>
> 2.1 MB of kernel core code. (of course the total body of "possible
> drivers" is 10 times larger than that of the core kernel - but the
> fundamental 'variety' is not.)
>
> most of the modules are for stuff where there is a significant physical
> difference between the components they support. Drivers for different
> pieces of hardware. Filesystem drivers for different on-disk physical
> layouts. Network protocol drivers for different on-wire formats. The
> sanest technological decision there is clearly to modularize.
>
> And note that often it's not even about choice there: the user's system
> has a particular piece of hardware, to which there is usually one
> primary driver. The user does not have any real 'choice' over the
> modularization here, it's largely a technological act to make the
> kernel's footprint smaller.
>
> But the kernel core, which does not depend as much on the physical
> properties of the stuff it supports (it depends on the physics of the
> machine of course, but those rules are mostly shared between all
> machines of that architecture), and is fundamentally influenced by the
> syscall API (which is not modular either) and by our OS design
> decisions, has much less reason to be modularized.
>
> The core kernel was always non-modular, and it depends on the technical
> details whether we want to or _have to_ modularize something so that it
> becomes modular to the user too. For example we dont have 'competing',
> modular versions of the IPv4 stack. Neither of the VFS. Nor of timers,
> futexes, nor of locking code or of the CPU scheduler. But we can switch
> out any of those implementations from the core kernel, and did so
> numerous times in the past and will do so in the future.
>
> CPU schedulers are as core kernel code as it gets - you cannot even boot
> without having a CPU scheduler. IO schedulers, although similar in name,
> are quite different beasts from CPU schedulers, and they are somewhere
> between the core kernel and drivers. They are not 'physical drivers' (an
> IO scheduler can drive any disk), nor are they fully 'core kernel code'
> in the sense of a kernel not even being able to boot without them. Also,
> disks are physically different from CPUs, in a way which works _against_
> the user-modularization of CPU schedulers. (there are also many other
> differences which have been pointed out in the past)
>
> In any case, the IO subsystem maintainers decided to modularize IO
> schedulers, and that's their decision. One of the authors of the IO
> scheduler code said it on lkml recently that while modularization of IO
> scheduler had advantages too, in retrospect he wishes they would not
> have made IO schedulers modular and now that decision cannot be undone.
> So even that much different situation was far from a clear decision, and
> some negative effects can be felt today too, in form of having two
> primary IO schedulers but not having one IO scheduler that works well in
> all cases. For CPU schedulers


Re: Question: Scheduler 'Exit' or modularization of scheduler?

2007-07-31 Thread Mitchell Erblich
Paul Robinson,

Solaris's SunOS SVR4.x has a modular scheduler / dispatcher;
however, I believe that the time-share dispatcher is
actually a frozen base and is not replaceable. I do
believe that some of its scheduling characteristics are
modifiable.

It is my understanding that CFS is the new scheduler;
however, it is separate from the RT (real-time) FIFO and
RR schedulers.

Currently, I have an initial suggestion out to lkml to see if
there is any support for an interactive task class. It allows
a root user to classify a set of tasks as interactive.
If there is acceptance, I plan to propose a patch / set of
changes to support this new task class by the end of Aug.

IFF, then maybe in the future...
Paul Robinson, if you feel that you are up to the task
of modularizing the scheduler / dispatcher to do what
you think should be done, I would suggest submitting
a prototype to Ingo, et al., and seeing what the response
is.


Mitchell Erblich


Paul Robinson wrote:
> 
> There has been a considerable amount of talk and many news
articles on
> some websites because of the inclusion of the CFS scheduler either
as a
> replacement for the old scheduler or instead of using the SD
scheduler,
> some people apparently feel that one or the other of these is not
right
> in some contexts or in some environments.  I'm not completely
clear on
> what is going on or exactly what the complaint is.  But I
personally
> would like to try to toss in my 0.02 Euro in an attempt to offer
some
> light and less heat to the dilemma and offer a suggestion.
> 
> If my ignorance of the subject is too obvious, please excuse me, I
might
> not have that much experience in the subject.  I've only been
> programming for 27 years and I hope to get better at it with more
practice.
> 
> So, there are two questions about which I am wondering.   They may
be
> somewhat related but the methodology for each are different and
the
> method of implementation would be different.
> 
> 1. Could it be possible to design the interface to the scheduler
either
> that there is an 'exit' (my mainframe history precedes me) in
which one
> can issue a monitor call such that the system changes to an
alternate
> scheduler, possibly as part of the boot process?  Thus there might
be a
> default scheduler but it is possible to invoke an alternative one.
> 
> 2.  Could the scheduler be such that it be designed as a system
loadable
> module rather than as a monolithic part of the code, such that the
> particular scheduler is a specific file and is simply installed at
boot
> time, and if someone wants a different scheduler, they can simply
create
> a new one, rename the existing one to something else, name theirs
to
> whatever the scheduler's name is, then shutdown and reboot the
machine?
> 
> I am thinking that the system scheduler is an integral part of a
> time-shared operating system, it would be memory resident while
the
> machine is operating, thus it only has to be on disk during start
up and
> is not in use while the system is in normal operation and could be
> replaced at any time (subject to the usual caveat that the system
has to
> be shutdown and rebooted to cause the scheduling mechanism to be
changed
> to the new one.).
> 
> If such a capacity were available, or perhaps if such capacity can
be
> implemented at some point in the future, this would solve one of
the
> more critical issues, since people needing more finely tuned
scheduling
> facilities can use one different from the common one, or 'roll
their
> own' if they need something really special.
> 
> I am also thinking this sort of a capacity would be extremely
useful
> either in virtualization issues, in running other operating
systems (or
> copies of Linux) as guest operating systems under Linux vis-a-vis
Xen,
> or in respect to real-time versions of Linux, such that if someone
needs
> to grant certain processes high priority, and the rest everything
that's
> left over, then they could do that simply by writing the scheduler
to
> the interface definition.
> 
> Of course, I could be completely wrong on this point and this is
not a
> partitionable feature, that it's not possible to have the job
scheduler
> loaded from a secondary module at boot time.  (This may be one of
the
> reasons why there have been problems with non-monolithic kernels
being
> unavailable for general use except in extremely limited cases.)
> 
> Or I could be wrong in that this issue isn't that important and
most of
> the noise over the issue is a small and vocal minority complaining
about
> a marginal and unimportant issue.   Of course, this sort of
situation is
> probably the case with 90% of all traffic on usenet, newsgroups and
> mailing lists, so what else is new?
> 
> Or, and this is the big one, that this feature already exists in the
> Linux kernel and the method of scheduler invocation is already
> modularized for boot-time


scheduler : No code : New interactive (ia) sched class : Part 1

2007-07-30 Thread Mitchell Erblich
- Comments for or against the SCHED_IA class in the generic
  Linux kernel source tree.
- General comments
 
With Ingo Molnar's 2.6.22 latest inclusions into the Linux
source base, he has opened an opportunity to add additional
task classes.
 
This informal 1-pager is to try to identify whether there
is support for an interactive (ia) task class. POSIX explicitly
defines only three classes. However, it states that
implementations may define additional schedulers/classes.
 
  This 1-pager is to address the support for an additional
scheduler class, SCHED_IA.
Currently interactive tasks are derived from
SCHED_NORMAL based on sleep factors and generate bonuses/credits
to alter their scheduling behaviour. However, this developer
believes that other tasks with different behaviours exist that
COULD also be classified as an ia task.
 
This developer's preconceived belief is that the ia class could
generally be supported with a minimal amount of effort, would be a
benefit to the Linux kernel, and would generally support the
following rules:
 
* allow root / SUSER (Super USER) to determine that a task
  should be classified as an ia (interactive) task, without
  eviction based on scheduling behaviours,
 
* allow SCHED_IA priorities to be placed between the real-time
  schedulers and SCHED_NORMAL/OTHER,
 
* derive the interactive behaviour based on its nice value
  and/or its initial combined priority,
 
* TBD: identify whether the priority should be fixed or
  have limited variability within the ia interactive class,
 
* user-level command support to identify that a task is an
  interactive (ia) task (see the sketch after the note below),
 
* etc. (fork behaviours),
 
Note: this 1-pager is not suggesting the removal or adjustment
  of interactive support within the SCHED_NORMAL/OTHER,
  currently supported CFS environment.
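
As a purely illustrative sketch of the user-level support, assuming a
hypothetical SCHED_IA policy constant (the value 6 is invented here;
nothing is reserved):

	#include <sys/types.h>
	#include <sched.h>
	#include <stdio.h>

	#define SCHED_IA 6	/* invented value for this sketch only */

	/* Root classifies an existing task as interactive through the
	 * existing sched_setscheduler(2) interface; SCHED_IA itself is
	 * not in any kernel tree. */
	int make_interactive(pid_t pid)
	{
		struct sched_param sp = { .sched_priority = 0 };

		if (sched_setscheduler(pid, SCHED_IA, &sp) == -1) {
			perror("sched_setscheduler(SCHED_IA)");
			return -1;
		}
		return 0;
	}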
 

- Comments for or against the SCHED_IA class in the generic
  Linux kernel source tree.
- General comments
 
Mitchell Erblich
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

