Re: [Rd] User interrupts parallel execution. Why it works or why not?

2021-07-20 Thread Jiefei Wang
Thanks for your explanation. This makes a lot of sense! SIGINT
handling is a blind spot for me; this introduction looks perfect!

Best,
Jiefei

On Tue, Jul 20, 2021 at 4:31 PM Tomas Kalibera  wrote:
>
> Hi Jiefei,
>
> when you run the cluster "automatically" in your terminal and press
> Ctrl-C on Unix, both the master and the worker processes get the SIGINT
> signal, because they belong to the same foreground process group. So you
> are also directly interrupting the worker process.
>
> When you run the cluster "manually", that is, the master in one terminal
> window and the worker in another, they are in different process groups,
> and if you press Ctrl-C in the terminal running the master, only the
> master receives the SIGINT signal, not the worker.
>
> If you want to dig into the sources, look for SIGINT handling in R,
> the onintrEx() function, etc. A good source on signal handling is, e.g.,
> http://www.linusakesson.net/programming/tty/
>
> Best
> Tomas
>
> On 7/20/21 9:55 AM, Jiefei Wang wrote:
> > Hi all,
> >
> > I just noticed this interesting problem a few days ago, but I cannot
> > find an answer to it. Say you have a long-running job on a cluster
> > made by the parallel package and, for some reason, you decide to stop
> > the execution by pressing Ctrl+C in the terminal or the Stop button in
> > RStudio. After the interrupt, is the cluster still valid or not?
> > Below is a simple example:
> >
> > library(parallel)
> > cl <- makeCluster(1)
> > ## run and interrupt it
> > parLapply(cl, 1, function(i){Sys.sleep(10);Sys.getpid()})
> > ## run another apply function to check the cluster status
> > parLapply(cl, 1, function(i)i)
> >
> > From my tests, the answer is yes. The worker is interrupted
> > immediately and the cluster is ready for the next command, but when I
> > create the worker manually, things seem different.
> >
> > library(parallel)
> > cl <- makeCluster(1, manual = TRUE)
> > ## makeCluster() prints the command for starting the worker by hand in another terminal
> > ## run and interrupt it
> > parLapply(cl, 1, function(i){Sys.sleep(10);Sys.getpid()})
> > ## run another apply function to check the cluster status
> > parLapply(cl, 1, function(i)i)
> >
> > It seems that the worker does not know the manager has been
> > interrupted and keeps running the current task. I have to wait 10
> > seconds before I get the result of the last line of code, and
> > the returned value is the PID from the first apply call.
> >
> > Both cases are reasonable, but it is surprising to see them side
> > by side. I started to wonder how the user interrupt is handled, so I
> > looked at the code in the parallel package. However, there seems to
> > be no related code: there is no tryCatch() call in the manager's code
> > to handle the user interrupt, yet the worker just magically knows it
> > should stop the current execution.
> >
> > I see this behavior on both Windows and Ubuntu. It is somewhat beyond
> > my knowledge, so I wonder if anyone can help me with it. Does the
> > cluster support user interrupts? Why does the above code work in one
> > case and not in the other? Many thanks!
> >
> > Best,
> > Jiefei
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel



Re: [Rd] User interrupts parallel execution. Why it works or why not?

2021-07-20 Thread Tomas Kalibera

Hi Jiefei,

when you run the cluster "automatically" in your terminal and press
Ctrl-C on Unix, both the master and the worker processes get the SIGINT
signal, because they belong to the same foreground process group. So you
are also directly interrupting the worker process.


When you run the cluster "manually", that is, the master in one terminal
window and the worker in another, they are in different process groups,
and if you press Ctrl-C in the terminal running the master, only the
master receives the SIGINT signal, not the worker.
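One way to check the process-group explanation above on your own machine is to print the process group ID of the master and of each worker. The sketch below assumes a Unix system with a POSIX `ps` on the PATH and R started from a terminal (not RStudio); the helper `pgid()` is hypothetical, written only for this illustration. If the explanation holds on your system, the master and the automatically started workers should report the same process group ID.

```r
library(parallel)

## Hypothetical helper: process group ID of a process, read with the POSIX
## command `ps -o pgid= -p PID` (Unix only; assumes `ps` is on the PATH).
pgid <- function(pid) {
  out <- system2("ps", c("-o", "pgid=", "-p", pid), stdout = TRUE)
  as.integer(trimws(out))
}

cl <- makeCluster(2)                                   # workers started "automatically"
master_pid <- Sys.getpid()
worker_pids <- unlist(clusterEvalQ(cl, Sys.getpid()))  # ask each worker for its PID

pids <- c(master_pid, worker_pids)
groups <- vapply(pids, pgid, integer(1))               # one process group ID per process
print(data.frame(role = c("master", rep("worker", length(worker_pids))),
                 pid = pids, pgid = groups))

stopCluster(cl)
```

A worker started by hand with `makeCluster(1, manual = TRUE)` runs in the foreground process group of its own terminal, so one would expect its `pgid` to differ from the master's, which is why Ctrl-C in the master's terminal does not reach it.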


If you want to dig into the sources, look for SIGINT handling in R,
the onintrEx() function, etc. A good source on signal handling is, e.g.,
http://www.linusakesson.net/programming/tty/


Best
Tomas



[Rd] User interrupts parallel execution. Why it works or why not?

2021-07-20 Thread Jiefei Wang
Hi all,

I just noticed this interesting problem a few days ago, but I cannot
find an answer to it. Say you have a long-running job on a cluster
made by the parallel package and, for some reason, you decide to stop
the execution by pressing Ctrl+C in the terminal or the Stop button in
RStudio. After the interrupt, is the cluster still valid or not?
Below is a simple example:

library(parallel)
cl <- makeCluster(1)
## run and interrupt it
parLapply(cl, 1, function(i){Sys.sleep(10);Sys.getpid()})
## run another apply function to check the cluster status
parLapply(cl, 1, function(i)i)

From my tests, the answer is yes. The worker is interrupted
immediately and the cluster is ready for the next command, but when I
create the worker manually, things seem different.

library(parallel)
cl <- makeCluster(1, manual = TRUE)
## makeCluster() prints the command for starting the worker by hand in another terminal
## run and interrupt it
parLapply(cl, 1, function(i){Sys.sleep(10);Sys.getpid()})
## run another apply function to check the cluster status
parLapply(cl, 1, function(i)i)

It seems that the worker does not know the manager has been
interrupted and keeps running the current task. I have to wait 10
seconds before I get the result of the last line of code, and
the returned value is the PID from the first apply call.

Both cases are reasonable, but it is surprising to see them side
by side. I started to wonder how the user interrupt is handled, so I
looked at the code in the parallel package. However, there seems to
be no related code: there is no tryCatch() call in the manager's code
to handle the user interrupt, yet the worker just magically knows it
should stop the current execution.

I see this behavior on both Windows and Ubuntu. It is somewhat beyond
my knowledge, so I wonder if anyone can help me with it. Does the
cluster support user interrupts? Why does the above code work in one
case and not in the other? Many thanks!

Best,
Jiefei
