RE: Bug 71331 - mlock yields processor to lower priority process
> Generally real-time applications should not be doing mlock calls during
> their real-time execution for that reason. The required memory regions
> should be locked during startup so that this kind of execution delay can
> be avoided at runtime.

Total agreement on this.

Regards,
Bud Davis
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Bug 71331 - mlock yields processor to lower priority process
On 21/03/14 08:50 AM, jimmie.da...@l-3com.com wrote:
>
> From: Mike Galbraith [umgwanakikb...@gmail.com]
> Sent: Friday, March 21, 2014 9:41 AM
> To: Davis, Bud @ SSG - Link
> Cc: oneu...@suse.de; artem_fetis...@epam.com; pet...@infradead.org; kosaki.motoh...@jp.fujitsu.com; linux-kernel@vger.kernel.org
> Subject: RE: Bug 71331 - mlock yields processor to lower priority process
>
> On Fri, 2014-03-21 at 14:01 +, jimmie.da...@l-3com.com wrote:
>
>> If you call mlock () from a SCHED_FIFO task, you expect it to return
>> when done. You don't expect it to block, and your task to be
>> pre-empted.
>
> Say some of your pages are sitting in an nfs swapfile orbiting Neptune,
> how do they get home, and what should we do meanwhile?
>
> -Mike
>
> Two options.
>
> #1. Return with a status value of EAGAIN.
>
> or
>
> #2. Don't return until you can do it.
>
> If SCHED_FIFO is used, and mlock() is called, the intention of the user
> is very clear. Run this task until it is completed or it blocks (and
> until a bit ago, mlock() did not block).

Returning EAGAIN is not something that the API definition from POSIX
allows for; that value is only for indicating a failure.

If the memory that is being locked is not currently residing in RAM, then
the memory will need to be swapped in before the call returns, which
clearly cannot be done without blocking. Thus mlock can potentially block,
and that has not changed. Whether or not any kernel behavior has changed
to cause this to happen in some cases where it didn't previously, the fact
remains that this is allowed behavior.

Generally real-time applications should not be doing mlock calls during
their real-time execution for that reason. The required memory regions
should be locked during startup so that this kind of execution delay can
be avoided at runtime.

> SCHED_FIFO users don't care about fairness. They want the system to do
> what it is told.
>
> regards,
> Bud Davis
RE: Bug 71331 - mlock yields processor to lower priority process
On Thu, 2014-03-27 at 04:20 +, jimmie.da...@l-3com.com wrote:

> The example code submitted into bugzilla (chase back on the thread a
> bit, there is a reference) shows the problem.
>
> Two threads, TaskA (high priority) and TaskB (low priority). Assigned
> to the same processor, explicitly for the guarantee that only one of
> them can execute at a time.

Your priority based serialization guarantee does not exist. Tasks can
be and are put to sleep. When that happens, a lower priority runnable
task will run. Whether you like that fact or not, it remains a fact.

If you don't want your lower priority task to run, why do you wake it?

-Mike
RE: Bug 71331 - mlock yields processor to lower priority process
-----Original Message-----
From: Andy Lutomirski [mailto:l...@amacapital.net]
Sent: Wednesday, March 26, 2014 7:40 PM
To: Davis, Bud @ SSG - Link; umgwanakikb...@gmail.com
Cc: oneu...@suse.de; artem_fetis...@epam.com; pet...@infradead.org; kosaki.motoh...@jp.fujitsu.com; linux-kernel@vger.kernel.org
Subject: Re: Bug 71331 - mlock yields processor to lower priority process

On 03/21/2014 07:50 AM, jimmie.da...@l-3com.com wrote:
>
> From: Mike Galbraith [umgwanakikb...@gmail.com]
> Sent: Friday, March 21, 2014 9:41 AM
> To: Davis, Bud @ SSG - Link
> Cc: oneu...@suse.de; artem_fetis...@epam.com; pet...@infradead.org;
> kosaki.motoh...@jp.fujitsu.com; linux-kernel@vger.kernel.org
> Subject: RE: Bug 71331 - mlock yields processor to lower priority process
>
> On Fri, 2014-03-21 at 14:01 +, jimmie.da...@l-3com.com wrote:
>
>> If you call mlock () from a SCHED_FIFO task, you expect it to return
>> when done. You don't expect it to block, and your task to be
>> pre-empted.
>
> Say some of your pages are sitting in an nfs swapfile orbiting Neptune,
> how do they get home, and what should we do meanwhile?
>
> -Mike
>
> Two options.
>
> #1. Return with a status value of EAGAIN.
>
> or
>
> #2. Don't return until you can do it.
>
> If SCHED_FIFO is used, and mlock() is called, the intention of the user
> is very clear. Run this task until it is completed or it blocks (and
> until a bit ago, mlock() did not block).
>
> SCHED_FIFO users don't care about fairness. They want the system to do
> what it is told.

I use mlock in real-time processes, but I do it in a separate thread.

Seriously, though, what do you expect the kernel to do? When you call
mlock on a page that isn't present, the kernel will *read* that page.
mlock will, therefore, block until the IO finishes.

Some time around 3.9, the behavior changed a little bit: IIRC mlock used
to hold mmap_sem while sleeping. Or maybe just mmap with MCL_FUTURE did
that. In any case, the mlock code is less lock-happy than it was.

Is it possible that you have two threads, and the non-mlock-calling
thread got blocked behind mlock, so it looked better?

--Andy

===

Andy,

The example code submitted into bugzilla (chase back on the thread a bit,
there is a reference) shows the problem.

Two threads, TaskA (high priority) and TaskB (low priority). Assigned to
the same processor, explicitly for the guarantee that only one of them can
execute at a time.

TaskA becomes eligible to run. As part of its processing (which normally
ends with a call to sem_wait()), it calls mlock(). TaskA then blocks, and
TaskB begins running.

But wait, the system is designed so that TaskA will run until it is done
(thus SCHED_FIFO and a priority higher than TaskB's). TaskA, a higher
priority task, is suspended and TaskB starts running. And in the code that
led me on this endeavor :) {consisting of a lot of Ada threads}, the
result was a segfault due to data half-processed by TaskA.

This is what I call 'blocking': the thread is no longer running and the
scheduler puts someone else on the processor. I don't mean 'takes a long
time until it returns'. Taking a long time is fine; the system design
relies on priority based scheduling and cpu affinity to ensure ordered
access to application data.

mlock() now blocks. I don't care how long mlock() takes; what I care about
is the lower priority process pre-empting me.

Only a limited number of syscalls block; those that do are documented and
usually have a way to obtain blocking or non-blocking behavior.

Can I change the system to deal with mlock() being a blocking syscall?
Yes, but this is a situation where working code that meets the API has
stopped working.

Thanks for looking at it.

Regards,
Bud Davis
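The arrangement described in this message (threads pinned to one CPU and run under SCHED_FIFO) could be set up roughly as follows. This is only a sketch for Linux: the helper name `pin_to_cpu_fifo` is hypothetical and does not come from the bugzilla test case. As the replies in this thread point out, the setup does not by itself serialize access to shared data once the higher priority thread sleeps inside a syscall.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* for CPU_ZERO, CPU_SET and sched_setaffinity() */
#endif
#include <assert.h>
#include <errno.h>
#include <sched.h>

/*
 * Hypothetical helper (name is an assumption): pin the calling thread to
 * one CPU and switch it to SCHED_FIFO at the given priority.
 * Returns 0 on success or a negative errno value on failure.
 */
int pin_to_cpu_fifo(int cpu, int priority)
{
    cpu_set_t set;
    struct sched_param param = { .sched_priority = priority };

    /* Reject bad arguments before changing anything. */
    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -EINVAL;
    if (priority < sched_get_priority_min(SCHED_FIFO) ||
        priority > sched_get_priority_max(SCHED_FIFO))
        return -EINVAL;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    assert(CPU_COUNT(&set) == 1);   /* sanity check: exactly one CPU chosen */

    /* On Linux a pid of 0 means the calling thread. */
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        return -errno;

    /* Usually needs CAP_SYS_NICE or a suitable RLIMIT_RTPRIO. */
    if (sched_setscheduler(0, SCHED_FIFO, &param) != 0)
        return -errno;

    return 0;
}
```

Each thread would call something like `pin_to_cpu_fifo(0, prio)` with its own priority. Without the required privileges the call is expected to fail, typically with EPERM.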
Re: Bug 71331 - mlock yields processor to lower priority process
On 03/21/2014 07:50 AM, jimmie.da...@l-3com.com wrote:
>
> From: Mike Galbraith [umgwanakikb...@gmail.com]
> Sent: Friday, March 21, 2014 9:41 AM
> To: Davis, Bud @ SSG - Link
> Cc: oneu...@suse.de; artem_fetis...@epam.com; pet...@infradead.org;
> kosaki.motoh...@jp.fujitsu.com; linux-kernel@vger.kernel.org
> Subject: RE: Bug 71331 - mlock yields processor to lower priority process
>
> On Fri, 2014-03-21 at 14:01 +, jimmie.da...@l-3com.com wrote:
>
>> If you call mlock () from a SCHED_FIFO task, you expect it to return
>> when done. You don't expect it to block, and your task to be
>> pre-empted.
>
> Say some of your pages are sitting in an nfs swapfile orbiting Neptune,
> how do they get home, and what should we do meanwhile?
>
> -Mike
>
> Two options.
>
> #1. Return with a status value of EAGAIN.
>
> or
>
> #2. Don't return until you can do it.
>
> If SCHED_FIFO is used, and mlock() is called, the intention of the user
> is very clear. Run this task until it is completed or it blocks (and
> until a bit ago, mlock() did not block).
>
> SCHED_FIFO users don't care about fairness. They want the system to do
> what it is told.

I use mlock in real-time processes, but I do it in a separate thread.

Seriously, though, what do you expect the kernel to do? When you call
mlock on a page that isn't present, the kernel will *read* that page.
mlock will, therefore, block until the IO finishes.

Some time around 3.9, the behavior changed a little bit: IIRC mlock used
to hold mmap_sem while sleeping. Or maybe just mmap with MCL_FUTURE did
that. In any case, the mlock code is less lock-happy than it was.

Is it possible that you have two threads, and the non-mlock-calling
thread got blocked behind mlock, so it looked better?

--Andy
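One way to read "I do it in a separate thread" is sketched below. It is an illustration under assumptions, not Andy's actual code: the struct and function names are hypothetical. A helper thread makes the potentially sleeping mlock() call and publishes the result through an atomic flag that the time-critical thread can poll without blocking.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical request handed to the helper thread. */
struct lock_request {
    void *addr;
    size_t len;
    atomic_int done;   /* 0 = pending, 1 = finished */
    int error;         /* errno from mlock(), valid once done == 1 */
};

/* Runs in an ordinary helper thread; it is allowed to sleep in mlock(). */
void *lock_worker(void *arg)
{
    struct lock_request *req = arg;

    req->error = (mlock(req->addr, req->len) == 0) ? 0 : errno;
    atomic_store(&req->done, 1);   /* publish the result */
    return NULL;
}

/* Starts the helper. Returns 0 or a pthread error number. */
int start_lock_request(struct lock_request *req, void *addr, size_t len,
                       pthread_t *tid)
{
    assert(req != NULL && addr != NULL && len > 0 && tid != NULL);
    req->addr = addr;
    req->len = len;
    req->error = 0;
    atomic_init(&req->done, 0);
    return pthread_create(tid, NULL, lock_worker, req);
}

/* Non-blocking check that the time-critical thread can poll. */
int lock_request_finished(struct lock_request *req)
{
    return atomic_load(&req->done);
}
```

If the helper is pinned to the same CPU at a lower priority, it only makes progress while the real-time thread is asleep, so the region should be requested well before it is needed. Whether the helper's mlock() can still delay other threads of the process through shared address-space locks depends on the kernel version; the message above only recalls that the code became less lock-happy around 3.9.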
RE: Bug 71331 - mlock yields processor to lower priority process
> Mike,
>
> There are several problem domains where you protect critical sections by
> assigning multiple threads to a single CPU and use priorities
> and SCHED_FIFO to ensure data integrity.
>
> In this kind of design you don't make many syscalls. The ones you do make,
> have to be clearly understood
> if they block.
>
> So, yes, I expect that a SCHED_FIFO task, that uses a subset of syscalls
> known to be non-blocking, will not block.
>
> If it is not 'unstoppable', then there is a defect in the OS.
>
> In the past, a call to mlock() was known to be OK. It would not block. It
> might take a while, but it would run to completion. It does not
> do that any more.

False. mlock has been blockable since it was born. mlock and mlockall
need to allocate memory by definition. That can trigger VM activity, and
it may block. At least, on Linux.

lru_add_drain_all() is not the only place that waits. Even if we remove
it, mlock can still block. I don't think this discussion makes sense.

> If mlock() is now a blocking call, then fine. It only needs to be called on
> occasion, and this can be accounted for in the application

Now? I have not seen any recent change.

Note: I'm not sure whether Artem's use-case is good or bad. I am only
saying that a false assumption doesn't make for a good discussion.
RE: Bug 71331 - mlock yields processor to lower priority process
From: Mike Galbraith [umgwanakikb...@gmail.com]
Sent: Friday, March 21, 2014 9:41 AM
To: Davis, Bud @ SSG - Link
Cc: oneu...@suse.de; artem_fetis...@epam.com; pet...@infradead.org; kosaki.motoh...@jp.fujitsu.com; linux-kernel@vger.kernel.org
Subject: RE: Bug 71331 - mlock yields processor to lower priority process

On Fri, 2014-03-21 at 14:01 +, jimmie.da...@l-3com.com wrote:

> If you call mlock () from a SCHED_FIFO task, you expect it to return
> when done. You don't expect it to block, and your task to be
> pre-empted.

Say some of your pages are sitting in an nfs swapfile orbiting Neptune,
how do they get home, and what should we do meanwhile?

-Mike

Two options.

#1. Return with a status value of EAGAIN.

or

#2. Don't return until you can do it.

If SCHED_FIFO is used, and mlock() is called, the intention of the user is
very clear. Run this task until it is completed or it blocks (and until a
bit ago, mlock() did not block).

SCHED_FIFO users don't care about fairness. They want the system to do
what it is told.

regards,
Bud Davis
RE: Bug 71331 - mlock yields processor to lower priority process
On Fri, 2014-03-21 at 14:01 +, jimmie.da...@l-3com.com wrote:

> If you call mlock () from a SCHED_FIFO task, you expect it to return
> when done. You don't expect it to block, and your task to be
> pre-empted.

Say some of your pages are sitting in an nfs swapfile orbiting Neptune,
how do they get home, and what should we do meanwhile?

-Mike
RE: Bug 71331 - mlock yields processor to lower priority process
From: Mike Galbraith [umgwanakikb...@gmail.com]
Sent: Friday, March 21, 2014 8:14 AM
To: Davis, Bud @ SSG - Link
Cc: artem_fetis...@epam.com; pet...@infradead.org; kosaki.motoh...@jp.fujitsu.com; linux-kernel@vger.kernel.org
Subject: RE: Bug 71331 - mlock yields processor to lower priority process

On Fri, 2014-03-21 at 12:18 +, jimmie.da...@l-3com.com wrote:

> As the submitter of the bug, let me give you my perspective.
> SCHED_FIFO means run my task until it blocks or a higher priority task
> pre-empts it. Period.

It blocked.

> mlock() doesn't block. check the man page.

I don't see that specified. (or how it could be, but what do I know,
IANIPL)

> Any other way and you are not able to use priority based scheduling.

Sure you can, allocate and lock down resources before entering critical
sections.

If you think donning a SCHED_FIFO super-suit should make your task
unstoppable, you're gonna be very disappointed. Fact is if your
Juggernaut bumps ever so gently into a contended sleeping variety lock
(and in the rt kernel that means nearly every lock), it will block.

-Mike

Mike,

There are several problem domains where you protect critical sections by
assigning multiple threads to a single CPU and use priorities and
SCHED_FIFO to ensure data integrity.

In this kind of design you don't make many syscalls. For the ones you do
make, it has to be clearly understood whether they block.

So, yes, I expect that a SCHED_FIFO task that uses a subset of syscalls
known to be non-blocking will not block.

If it is not 'unstoppable', then there is a defect in the OS.

In the past, a call to mlock() was known to be OK. It would not block. It
might take a while, but it would run to completion. It does not do that
any more.

If mlock() is now a blocking call, then fine. It only needs to be called
on occasion, and this can be accounted for in the application design.

Does write() block? Yes, the man page talks all about it.
Does clock_gettime() block? No, blocking is not mentioned in the man page.

Blocking behaviour is rare; when it exists, it is documented.

My point is, this is either a defect to be fixed, or a change that
warrants updating the documentation.

regards,
Bud Davis
RE: Bug 71331 - mlock yields processor to lower priority process
From: Oliver Neukum [oneu...@suse.de]
Sent: Friday, March 21, 2014 8:35 AM
To: Davis, Bud @ SSG - Link
Cc: umgwanakikb...@gmail.com; artem_fetis...@epam.com; pet...@infradead.org; kosaki.motoh...@jp.fujitsu.com; linux-kernel@vger.kernel.org
Subject: Re: Bug 71331 - mlock yields processor to lower priority process

On Fri, 2014-03-21 at 12:18 +, jimmie.da...@l-3com.com wrote:
>
> >How is that different from any other time a task has to yield the CPU
> >for a bit? While your high priority task is blocked for whatever
> >reason, a lower priority task gets to use the CPU.
> >
>
> As the submitter of the bug, let me give you my perspective. SCHED_FIFO
> means run my task until it blocks or a higher priority task pre-empts it.
> Period.
>
> mlock() doesn't block. check the man page.

It guarantees that all pages be in RAM. That means it has to read
them in if they aren't. How could it do that without blocking?

Regards
Oliver

--

Oliver,

I would assume it would touch some flag bits on every page. As part of the
thread of execution that called it.

If you call mlock () from a SCHED_FIFO task, you expect it to return when
done. You don't expect it to block, and your task to be pre-empted.

For many years it returned when finished. Now, it blocks. This makes code
that used to work, not work. I consider it a defect.

regards,
Bud Davis
Re: Bug 71331 - mlock yields processor to lower priority process
On Fri, 2014-03-21 at 12:18 +, jimmie.da...@l-3com.com wrote:
>
> >How is that different from any other time a task has to yield the CPU
> >for a bit? While your high priority task is blocked for whatever
> >reason, a lower priority task gets to use the CPU.
> >
>
> As the submitter of the bug, let me give you my perspective. SCHED_FIFO
> means run my task until it blocks or a higher priority task pre-empts it.
> Period.
>
> mlock() doesn't block. check the man page.

It guarantees that all pages be in RAM. That means it has to read
them in if they aren't. How could it do that without blocking?

Regards
Oliver
RE: Bug 71331 - mlock yields processor to lower priority process
On Fri, 2014-03-21 at 12:18 +, jimmie.da...@l-3com.com wrote:

> As the submitter of the bug, let me give you my perspective.
> SCHED_FIFO means run my task until it blocks or a higher priority task
> pre-empts it. Period.

It blocked.

> mlock() doesn't block. check the man page.

I don't see that specified. (or how it could be, but what do I know,
IANIPL)

> Any other way and you are not able to use priority based scheduling.

Sure you can, allocate and lock down resources before entering critical
sections.

If you think donning a SCHED_FIFO super-suit should make your task
unstoppable, you're gonna be very disappointed. Fact is if your
Juggernaut bumps ever so gently into a contended sleeping variety lock
(and in the rt kernel that means nearly every lock), it will block.

-Mike
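The advice to "allocate and lock down resources before entering critical sections" might look like the following in C. This is a minimal sketch assuming Linux and a hypothetical helper name, `lock_region_at_startup`, which is not taken from the thread or the bug report. Any sleeping inside mlock() then happens during initialisation, before the time-critical phase starts.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical startup helper (the name is an assumption, not from the
 * thread). Call it before the time-critical phase begins.
 * Returns 0 on success or a negative errno value on failure.
 */
int lock_region_at_startup(void *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    /* mlock() may sleep while pages are brought into RAM, which is
     * acceptable here because nothing time-critical is running yet. */
    if (mlock(buf, len) != 0)
        return -errno;

    /* Write to the buffer so every page has been touched at least once. */
    memset(buf, 0, len);
    return 0;
}
```

A process-wide alternative is `mlockall(MCL_CURRENT | MCL_FUTURE)`, also called once at startup. Either call can fail with ENOMEM or EPERM when the RLIMIT_MEMLOCK limit is too low for an unprivileged process, so the return value should be checked.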
RE: Bug 71331 - mlock yields processor to lower priority process
>How is that different from any other time a task has to yield the CPU
>for a bit? While your high priority task is blocked for whatever
>reason, a lower priority task gets to use the CPU.

As the submitter of the bug, let me give you my perspective. SCHED_FIFO
means run my task until it blocks or a higher priority task pre-empts it.
Period.

mlock() doesn't block. check the man page.

Any other way and you are not able to use priority based scheduling.

--bud davis
Re: Bug 71331 - mlock yields processor to lower priority process
On Fri, 2014-03-21 at 23:02 +0300, Artem Fetishev wrote:
> Hi all,
>
> I am looking at a use-case when a real-time task (B) of higher
> priority is sometimes preempted by another real-time task (A) of lower
> priority. Well, B is not really preempted. It calls mlockall() which
> forces task B to yield the CPU. Under certain conditions, mlockall()
> calls lru_add_drain_all() which schedules a deferred work and wants
> the calling task to wait until that work is complete by putting the
> task into TASK_UNINTERRUPTIBLE state and calling schedule_timeout().
>
> Tasks utilize SCHED_FIFO policy.
>
> See details here: https://bugzilla.kernel.org/show_bug.cgi?id=71331
>
> Besides mlockall, there are other kernel paths which make use of
> lru_add_drain_all() and schedule_timeout(), so I guess there are bunch
> of other syscalls which may lead to the above use-case.
>
> So the question is: is above use-case an expected behavior of
> real-time tasks or is it a bug in mlockall (i.e. it should not
> interrupt a real-time process)?

How is that different from any other time a task has to yield the CPU
for a bit? While your high priority task is blocked for whatever
reason, a lower priority task gets to use the CPU.

The bad thing is that in this case, your high priority task becomes
dependent upon kworker threads all over the box, with no mechanism to
guarantee that any of them will ever run. No PI-boost to the rescue,
nada, say byebye to determinism.

That's true any time you depend upon some generic proxy. Nothing tracks
IO for instance, to make sure your IO is handled all the way through the
chain by proxies of your priority. What happens if say kjournald is
preempted by a low priority SCHED_FIFO hog.. nobody needing kjournald to
make progress goes anywhere, SCHED_FIFO 99 may as well be SCHED_IDLE.

In short, yes, I think this is the expected behavior. Don't do things
that grow dependencies upon generic kernel proxies at critical times.

-Mike
RE: Bug 71331 - mlock yields processor to lower priority process
> How is that different from any other time a task has to yield the CPU
> for a bit? While your high priority task is blocked for whatever reason,
> a lower priority task gets to use the CPU.

As the submitter of the bug, let me give you my perspective.

SCHED_FIFO means run my task until it blocks or a higher priority task
pre-empts it. Period.

mlock() doesn't block. check the man page.

Any other way and you are not able to use priority based scheduling.

--bud davis
RE: Bug 71331 - mlock yields processor to lower priority process
On Fri, 2014-03-21 at 12:18 +, jimmie.da...@l-3com.com wrote:
> As the submitter of the bug, let me give you my perspective.
>
> SCHED_FIFO means run my task until it blocks or a higher priority task
> pre-empts it. Period.

It blocked.

> mlock() doesn't block. check the man page.

I don't see that specified. (or how it could be, but what do I know,
IANIPL)

> Any other way and you are not able to use priority based scheduling.

Sure you can, allocate and lock down resources before entering critical
sections.

If you think donning a SCHED_FIFO super-suit should make your task
unstoppable, you're gonna be very disappointed. Fact is if your
Juggernaut bumps ever so gently into a contended sleeping variety lock
(and in the rt kernel that means nearly every lock), it will block.

-Mike
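Editor's illustration: the advice above ("allocate and lock down resources before entering critical sections") can be sketched in C roughly as follows. This is a minimal sketch, not code from the thread. The helper names clamp_fifo_priority() and rt_startup() are hypothetical; mlockall(), sched_setscheduler() and sched_get_priority_min()/max() are the standard calls. It assumes the process has the privileges and RLIMIT_MEMLOCK headroom needed, and reports a negative errno value otherwise.

```c
#include <errno.h>
#include <sched.h>
#include <string.h>
#include <sys/mman.h>

/* Clamp a requested priority into the valid SCHED_FIFO range. */
static int clamp_fifo_priority(int wanted)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);

    if (wanted < lo)
        return lo;
    if (wanted > hi)
        return hi;
    return wanted;
}

/*
 * Hypothetical startup helper: lock memory first, while still running
 * under the default scheduling policy (this step may block), and only
 * then switch to SCHED_FIFO. Returns 0 on success or a negative errno.
 */
static int rt_startup(int wanted_priority)
{
    struct sched_param sp;

    /* Lock everything mapped now and everything mapped later. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return -errno;

    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = clamp_fifo_priority(wanted_priority);

    /* pid 0 selects the calling thread. */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        return -errno;

    return 0;
}
```

The ordering is the point of the sketch: any delay inside mlockall() is paid before the task takes its real-time priority, so it cannot show up as a lower priority task running in the middle of a critical section.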
Re: Bug 71331 - mlock yields processor to lower priority process
On Fri, 2014-03-21 at 12:18 +, jimmie.da...@l-3com.com wrote:
> > How is that different from any other time a task has to yield the CPU
> > for a bit? While your high priority task is blocked for whatever
> > reason, a lower priority task gets to use the CPU.
>
> As the submitter of the bug, let me give you my perspective.
>
> SCHED_FIFO means run my task until it blocks or a higher priority task
> pre-empts it. Period.
>
> mlock() doesn't block. check the man page.

It guarantees that all pages be in RAM. That means it has to read them
in if they aren't. How could it do that without blocking?

Regards
Oliver
RE: Bug 71331 - mlock yields processor to lower priority process
> From: Oliver Neukum [oneu...@suse.de]
> Sent: Friday, March 21, 2014 8:35 AM
> To: Davis, Bud @ SSG - Link
> Cc: umgwanakikb...@gmail.com; artem_fetis...@epam.com; pet...@infradead.org; kosaki.motoh...@jp.fujitsu.com; linux-kernel@vger.kernel.org
> Subject: Re: Bug 71331 - mlock yields processor to lower priority process
>
> On Fri, 2014-03-21 at 12:18 +, jimmie.da...@l-3com.com wrote:
> > > How is that different from any other time a task has to yield the
> > > CPU for a bit? While your high priority task is blocked for whatever
> > > reason, a lower priority task gets to use the CPU.
> >
> > As the submitter of the bug, let me give you my perspective.
> >
> > SCHED_FIFO means run my task until it blocks or a higher priority
> > task pre-empts it. Period.
> >
> > mlock() doesn't block. check the man page.
>
> It guarantees that all pages be in RAM. That means it has to read them
> in if they aren't. How could it do that without blocking?
>
> Regards
> Oliver

Oliver,

I would assume it would touch some flag bits on every page. As part of
the thread of execution that called it.

If you call mlock () from a SCHED_FIFO task, you expect it to return
when done. You don't expect it to block, and your task to be pre-empted.

For many years it returned when finished. Now, it blocks. This makes
code that used to work, not work. I consider it a defect.

regards,
Bud Davis
RE: Bug 71331 - mlock yields processor to lower priority process
> From: Mike Galbraith [umgwanakikb...@gmail.com]
> Sent: Friday, March 21, 2014 8:14 AM
> To: Davis, Bud @ SSG - Link
> Cc: artem_fetis...@epam.com; pet...@infradead.org; kosaki.motoh...@jp.fujitsu.com; linux-kernel@vger.kernel.org
> Subject: RE: Bug 71331 - mlock yields processor to lower priority process
>
> On Fri, 2014-03-21 at 12:18 +, jimmie.da...@l-3com.com wrote:
> > As the submitter of the bug, let me give you my perspective.
> >
> > SCHED_FIFO means run my task until it blocks or a higher priority
> > task pre-empts it. Period.
>
> It blocked.
>
> > mlock() doesn't block. check the man page.
>
> I don't see that specified. (or how it could be, but what do I know,
> IANIPL)
>
> > Any other way and you are not able to use priority based scheduling.
>
> Sure you can, allocate and lock down resources before entering critical
> sections.
>
> If you think donning a SCHED_FIFO super-suit should make your task
> unstoppable, you're gonna be very disappointed. Fact is if your
> Juggernaut bumps ever so gently into a contended sleeping variety lock
> (and in the rt kernel that means nearly every lock), it will block.
>
> -Mike

Mike,

There are several problem domains where you protect critical sections by
assigning multiple threads to a single CPU and use priorities and
SCHED_FIFO to ensure data integrity.

In this kind of design you don't make many syscalls. The ones you do
make, have to be clearly understood if they block.

So, yes, I expect that a SCHED_FIFO task, that uses a subset of syscalls
known to be non-blocking, will not block. If it is not 'unstoppable',
then there is a defect in the OS.

In the past, a call to mlock() was known to be OK. It would not block.
It might take a while, but it would run to completion. It does not do
that any more.

If mlock() is now a blocking call, then fine. It only needs to be called
on occasion, and this can be accounted for in the application design.

Does write() block ? Yes, the man pages talks all about it.
Does clock_gettime() block ? No, blocking is not mentioned in the man page.

Blocking behaviour is rare, when it exists it is documented.

My point is, this is either a defect to be fixed, or a change that
warrants updating the documentation.

regards,
Bud Davis
RE: Bug 71331 - mlock yields processor to lower priority process
On Fri, 2014-03-21 at 14:01 +, jimmie.da...@l-3com.com wrote:
> If you call mlock () from a SCHED_FIFO task, you expect it to return
> when done. You don't expect it to block, and your task to be
> pre-empted.

Say some of your pages are sitting in an nfs swapfile orbiting Neptune,
how do they get home, and what should we do meanwhile?

-Mike
RE: Bug 71331 - mlock yields processor to lower priority process
> From: Mike Galbraith [umgwanakikb...@gmail.com]
> Sent: Friday, March 21, 2014 9:41 AM
> To: Davis, Bud @ SSG - Link
> Cc: oneu...@suse.de; artem_fetis...@epam.com; pet...@infradead.org; kosaki.motoh...@jp.fujitsu.com; linux-kernel@vger.kernel.org
> Subject: RE: Bug 71331 - mlock yields processor to lower priority process
>
> On Fri, 2014-03-21 at 14:01 +, jimmie.da...@l-3com.com wrote:
> > If you call mlock () from a SCHED_FIFO task, you expect it to return
> > when done. You don't expect it to block, and your task to be
> > pre-empted.
>
> Say some of your pages are sitting in an nfs swapfile orbiting Neptune,
> how do they get home, and what should we do meanwhile?
>
> -Mike

Two options.

#1. Return with a status value of EAGAIN.

or

#2. Don't return until you can do it.

If SCHED_FIFO is used, and mlock() is called, the intention of the user
is very clear. Run this task until it is completed or it blocks (and
until a bit ago, mlock() did not block).

SCHED_FIFO users don't care about fairness. They want the system to do
what it is told.

regards,
Bud Davis
RE: Bug 71331 - mlock yields processor to lower priority process
> Mike,
>
> There are several problem domains where you protect critical sections
> by assigning multiple threads to a single CPU and use priorities and
> SCHED_FIFO to ensure data integrity.
>
> In this kind of design you don't make many syscalls. The ones you do
> make, have to be clearly understood if they block.
>
> So, yes, I expect that a SCHED_FIFO task, that uses a subset of
> syscalls known to be non-blocking, will not block. If it is not
> 'unstoppable', then there is a defect in the OS.
>
> In the past, a call to mlock() was known to be OK. It would not block.
> It might take a while, but it would run to completion. It does not do
> that any more.

False. mlock has been blockable since it was born. mlock and mlockall
need to allocate memory by definition. That can trigger VM activity,
and it may block. At least, on Linux.

lru_add_drain_all() is not the only place where it waits. Even if we
remove it, mlock can still block. I don't think this discussion makes
sense.

> If mlock() is now a blocking call, then fine. It only needs to be
> called on occasion, and this can be accounted for in the application

Now? I have not seen any recent change.

Note: I'm not sure whether Artem's use-case is good or bad. I am only
saying that a false assumption doesn't make for a good discussion.