Re: nsh_fileapps and usage of sched_lock()

2023-10-25 Thread Ville Juven
 Hi,

Thanks for the responses, they are very much appreciated.

I suspected that the scheduler lock can be removed without too severe side
effects (I made a PR of it already). I made a PR of removing it from
nsh_fileapps already since I think that is potentially a more critical
place to fix due to the file I/O. Loading the process into memory can take
a relatively long time.

I did notice that the builtin apps do the same but deemed that as
not-so-critical because the builtins are already in memory. I did not
immediately fix it due to regression mitigation but I can fix the builtin
apps as well if you wish.

Br,
Ville Juven

On Wed, Oct 25, 2023 at 5:55 PM Gregory Nutt  wrote:

> On 10/25/2023 8:48 AM, Gregory Nutt wrote:
> > On 10/25/2023 8:18 AM, Alan C. Assis wrote:
> >> On 10/25/23, Nathan Hartman  wrote:
> >>> On Wed, Oct 25, 2023 at 5:16 AM Ville Juven
> wrote:
> >>>
>  Hi all,
> 
>  I noticed that when spawning a new task/process from a file in
>  nsh_fileapps, the scheduler is locked prior to calling posix_spawn(),
>  which
>  does the file loading into memory etc.
> 
>  I noticed one issue with this; when the file size is large (in the
> order
>  of
>  MB) the scheduler is locked for very long periods at a time, in the
> order
>  of hundreds of milliseconds.
> The same logic exists in apps/nshlib/nsh_builtin.c.  In fact, it looks
> like one was just cloned from the other.  Both should behave the same way.
>


Re: nsh_fileapps and usage of sched_lock()

2023-10-25 Thread Gregory Nutt

On 10/25/2023 8:48 AM, Gregory Nutt wrote:

On 10/25/2023 8:18 AM, Alan C. Assis wrote:

On 10/25/23, Nathan Hartman  wrote:

On Wed, Oct 25, 2023 at 5:16 AM Ville Juven  wrote:


Hi all,

I noticed that when spawning a new task/process from a file in
nsh_fileapps, the scheduler is locked prior to calling posix_spawn(),
which
does the file loading into memory etc.

I noticed one issue with this; when the file size is large (in the order
of
MB) the scheduler is locked for very long periods at a time, in the order
of hundreds of milliseconds.
The same logic exists in apps/nshlib/nsh_builtin.c.  In fact, it looks 
like one was just cloned from the other.  Both should behave the same way.


Re: nsh_fileapps and usage of sched_lock()

2023-10-25 Thread Gregory Nutt

On 10/25/2023 8:18 AM, Alan C. Assis wrote:

On 10/25/23, Nathan Hartman  wrote:

On Wed, Oct 25, 2023 at 5:16 AM Ville Juven  wrote:


Hi all,

I noticed that when spawning a new task/process from a file in
nsh_fileapps, the scheduler is locked prior to calling posix_spawn(),
which
does the file loading into memory etc.

I noticed one issue with this; when the file size is large (in the order
of
MB) the scheduler is locked for very long periods at a time, in the order
of hundreds of milliseconds.



This sounds like a bug. The scheduler should not be locked during IO-bound
operations, since there is no way to know how long they will take. Loading
from flash could take hundreds of milliseconds (which is already terrible)
but imagine a different scenario where loading from a network with
connection problems outside of the device could lock the device for many
seconds!


If I understood this comment correctly:

   /* Lock the scheduler in an attempt to prevent the application from
* running until waitpid() has been called.
*/

then maybe instead of forcing a sched_lock() we could change the
task_state to TSTATE_TASK_INACTIVE or some other that prevent the task
to be scheduled again before the posix_spawnp() get finished.

BR,

Alan


I think that the sched_lock() is not necessary.   Notice that the this 
is only "an attempt" to keep the application from running and 
executing.  Without the sched_lock(), the task may run and exit before 
there is a change to call waitpid() which should effect only the user 
experience.


A good test to make sure that still works would be remove the 
sched_lock/unlock and add a test case like:


   int main(int argc, char **argv)
   {
  return 0;
   }

That is the case that the logic is trying to avoid, but it seems like it 
should work fine.  Subsequent logic handles this case (but provides no 
use feedback).  See comments:


  /* Wait for the application to exit.  We did lock the scheduler
   * above, but that does not guarantee that the application 
did not
   * already run to completion in the case where I/O was 
redirected.

   * Here the scheduler will be unlocked while waitpid is waiting
   * and if the application has not yet run, it will now be able to
   * do so.
   *
   * NOTE: WUNTRACED does nothing in the default case, but in the
   * case the where CONFIG_SIG_SIGSTOP_ACTION=y, the file app
   * may also be stopped.  In that case WUNTRACED will force
   * waitpid() to return with ECHILD.
   */

And

  /* If the child thread doesn't exist, waitpid() will 
return the

   * error ECHLD.  Since we know that the task was successfully
   * started, this must be one of the cases described above; we
   * have to assume that the task already exit'ed. In this 
case,

   * we have no idea if the application ran successfully or not
   * (because NuttX does not retain exit status of child 
tasks).

   * Let's assume that is did run successfully.
   */

So it looks to me like the case where the program exits is already handled.





Re: nsh_fileapps and usage of sched_lock()

2023-10-25 Thread Alan C. Assis
On 10/25/23, Nathan Hartman  wrote:
> On Wed, Oct 25, 2023 at 5:16 AM Ville Juven  wrote:
>
>> Hi all,
>>
>> I noticed that when spawning a new task/process from a file in
>> nsh_fileapps, the scheduler is locked prior to calling posix_spawn(),
>> which
>> does the file loading into memory etc.
>>
>> I noticed one issue with this; when the file size is large (in the order
>> of
>> MB) the scheduler is locked for very long periods at a time, in the order
>> of hundreds of milliseconds.
>
>
>
> This sounds like a bug. The scheduler should not be locked during IO-bound
> operations, since there is no way to know how long they will take. Loading
> from flash could take hundreds of milliseconds (which is already terrible)
> but imagine a different scenario where loading from a network with
> connection problems outside of the device could lock the device for many
> seconds!
>

If I understood this comment correctly:

  /* Lock the scheduler in an attempt to prevent the application from
   * running until waitpid() has been called.
   */

then maybe instead of forcing a sched_lock() we could change the
task_state to TSTATE_TASK_INACTIVE or some other that prevent the task
to be scheduled again before the posix_spawnp() get finished.

BR,

Alan


Re: nsh_fileapps and usage of sched_lock()

2023-10-25 Thread Nathan Hartman
On Wed, Oct 25, 2023 at 5:16 AM Ville Juven  wrote:

> Hi all,
>
> I noticed that when spawning a new task/process from a file in
> nsh_fileapps, the scheduler is locked prior to calling posix_spawn(), which
> does the file loading into memory etc.
>
> I noticed one issue with this; when the file size is large (in the order of
> MB) the scheduler is locked for very long periods at a time, in the order
> of hundreds of milliseconds.



This sounds like a bug. The scheduler should not be locked during IO-bound
operations, since there is no way to know how long they will take. Loading
from flash could take hundreds of milliseconds (which is already terrible)
but imagine a different scenario where loading from a network with
connection problems outside of the device could lock the device for many
seconds!

More below...


So my question is, is the scheduler lock strictly necessary in this case ?
> The only reason I could surmise from the comment below is that waitpid is
> not performed on a stale pid (or even a possibly re-taken pid ?):
>
>   /* Lock the scheduler in an attempt to prevent the application from
>* running until waitpid() has been called.
>*/
>
>   sched_lock();
>
> A follow-up question obviously is, what happens if the scheduler lock is
> removed ? Will something bad happen and if so, is there a way to mitigate
> this (other than the sched_lock())?



I have not studied the code but conceptually the file should be loaded (or
other IO operations) and only when ready to perform scheduler operation the
scheduler should be locked for the minimal length of time.

More below...


My reason for removing the scheduler lock is obviously that I lose all
> real-time assurances of the OS when I'm starting a process, e.g. I start
> losing sensor data during the process load time.



This is exactly why the scheduler should not be locked for extended lengths
of time.

Nathan


nsh_fileapps and usage of sched_lock()

2023-10-25 Thread Ville Juven
Hi all,

I noticed that when spawning a new task/process from a file in
nsh_fileapps, the scheduler is locked prior to calling posix_spawn(), which
does the file loading into memory etc.

I noticed one issue with this; when the file size is large (in the order of
MB) the scheduler is locked for very long periods at a time, in the order
of hundreds of milliseconds.

So my question is, is the scheduler lock strictly necessary in this case ?
The only reason I could surmise from the comment below is that waitpid is
not performed on a stale pid (or even a possibly re-taken pid ?):

  /* Lock the scheduler in an attempt to prevent the application from
   * running until waitpid() has been called.
   */

  sched_lock();

A follow-up question obviously is, what happens if the scheduler lock is
removed ? Will something bad happen and if so, is there a way to mitigate
this (other than the sched_lock())?

My reason for removing the scheduler lock is obviously that I lose all
real-time assurances of the OS when I'm starting a process, e.g. I start
losing sensor data during the process load time.

Any and all help on this is greatly appreciated.

BR,
Ville Juven / pussuw on github