Re: Sporadic delays in task execution
Hi Junkai,

We are using 0.8.1.

Dimuthu

On Thu, Mar 21, 2019 at 12:14 AM Xue Junkai wrote:

> Hi Dimuthu,
>
> What's the version of Helix you are using?
>
> Best,
>
> Junkai
Re: Sporadic delays in task execution
Hi Dimuthu,

What's the version of Helix you are using?

Best,

Junkai

On Wed, Mar 20, 2019 at 8:54 PM DImuthu Upeksha wrote:

> Hi Helix Dev,
>
> We are again seeing this delay in task execution. Please have a look at
> the screencast [1] of the logs printed by the participant (top shell) and
> the controller (bottom shell). When I recorded this, there were about
> 90 - 100 workflows pending execution. As you can see, some tasks were
> suddenly executed and then the participant froze for about 30 seconds
> before executing the next set of tasks. I can see some WARN logs in the
> controller log. I feel like this 30 second delay is some sort of a
> pattern. What do you think is the reason for this? I can provide more
> information by turning on verbose logs on the controller if you want.
>
> [1] https://youtu.be/3EUdSxnIxVw
>
> Thanks
> Dimuthu

--
Junkai Xue
Re: Sporadic delays in task execution
Hi Helix Dev,

We are again seeing this delay in task execution. Please have a look at
the screencast [1] of the logs printed by the participant (top shell) and
the controller (bottom shell). When I recorded this, there were about
90 - 100 workflows pending execution. As you can see, some tasks were
suddenly executed and then the participant froze for about 30 seconds
before executing the next set of tasks. I can see some WARN logs in the
controller log. I feel like this 30 second delay is some sort of a
pattern. What do you think is the reason for this? I can provide more
information by turning on verbose logs on the controller if you want.

[1] https://youtu.be/3EUdSxnIxVw

Thanks
Dimuthu

On Thu, Oct 4, 2018 at 4:46 PM DImuthu Upeksha wrote:

> Hi Junkai,
>
> I'm CCing the Airavata dev list as this is directly related to the
> project.
>
> I just went through ZooKeeper paths like //EXTERNALVIEW and
> //CONFIGS/RESOURCE, as I have noticed that the Helix controller
> periodically monitors the children of those paths even though all the
> workflows have moved into a saturated state like COMPLETED or STOPPED.
> In our case, we have a lot of completed workflows piled up in those
> paths. I believe that Helix clears up those resources after some TTL.
> What I did was write an external spectator [1] that continuously
> monitors for saturated workflows and clears up their resources before
> the controller does so after the TTL. After that, we didn't see such
> delays in workflow execution and everything seems to be running
> smoothly. However, we are continuously monitoring our deployments for
> any adverse effect introduced by that improvement.
>
> Please let us know if we are doing something wrong in this improvement,
> or whether there is a better way to achieve this directly through the
> Helix task framework.
>
> [1]
> https://github.com/apache/airavata/blob/staging/modules/airavata-helix/helix-spectator/src/main/java/org/apache/airavata/helix/impl/controller/WorkflowCleanupAgent.java
>
> Thanks
> Dimuthu
>
> On Tue, Oct 2, 2018 at 1:12 PM Xue Junkai wrote:
>
>> Could you please check the log for how long each pipeline stage takes?
>>
>> Also, did you set an expiry for workflows? Have they been piled up for
>> a long time? How long does each workflow take to complete?
>>
>> best,
>>
>> Junkai
>>
>> On Wed, Sep 26, 2018 at 8:52 AM DImuthu Upeksha <
>> dimuthu.upeks...@gmail.com>
>> wrote:
>>
>> > Hi Junkai,
>> >
>> > Average load is about 10 - 20 workflows per minute. In some cases it's
>> > less than that. However, based on the observations, I feel like it
>> > does not depend on the load and it is sporadic. Are there particular
>> > log lines that I can filter in the controller and participant to
>> > capture the timeline of a workflow, so that I can figure out which
>> > component is malfunctioning? We use Helix v 0.8.1.
>> >
>> > Thanks
>> > Dimuthu
>> >
>> > On Tue, Sep 25, 2018 at 5:19 PM Xue Junkai wrote:
>> >
>> > > Hi Dimuthu,
>> > >
>> > > At what rate are you submitting workflows? Usually, workflow
>> > > scheduling is very fast. And which version of Helix are you using?
>> > >
>> > > Best,
>> > >
>> > > Junkai
>> > >
>> > > On Tue, Sep 25, 2018 at 8:58 AM DImuthu Upeksha <
>> > > dimuthu.upeks...@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi Folks,
>> > > >
>> > > > We have noticed some delays between workflow submission and the
>> > > > actual pickup by participants, and it seems like that delay is
>> > > > somewhat constant at around 2 - 3 minutes. We used to continuously
>> > > > submit workflows and after 2 - 3 minutes, a bulk of workflows is
>> > > > picked up by the participant and executed. Then it remains silent
>> > > > for the next 2 - 3 minutes even if we submit more workflows. It's
>> > > > like the participant is picking up workflows at discrete time
>> > > > intervals. I'm not sure whether this is an issue with the
>> > > > controller or the participant. Do you have any experience with
>> > > > this sort of behavior?
>> > > >
>> > > > Thanks
>> > > > Dimuthu
>> > > >
>> > >
>> > >
>> > > --
>> > > Junkai Xue
>> > >
>> >
>>
>>
>> --
>> Junkai Xue
>>
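
The cleanup approach described in the Oct 4, 2018 message (watch for workflows in a saturated state and remove them before the controller's TTL does) can be sketched roughly as below. This is only an illustration in Python, not the WorkflowCleanupAgent code linked in the thread and not the Helix API: the names find_terminal_workflows, cleanup_once, list_states and delete_workflow are hypothetical, and the cluster is replaced by an in-memory dict. The set of terminal states is limited to the two named in the message.

```python
from typing import Callable, Dict, List

# States treated as terminal ("saturated" in the thread's wording).
# COMPLETED and STOPPED are the two named in the message; this set is an
# assumption of the sketch, and any other state is left untouched.
TERMINAL_STATES = {"COMPLETED", "STOPPED"}


def find_terminal_workflows(states: Dict[str, str]) -> List[str]:
    """Return the names of workflows in a terminal state, sorted by name."""
    return sorted(name for name, state in states.items()
                  if state in TERMINAL_STATES)


def cleanup_once(list_states: Callable[[], Dict[str, str]],
                 delete_workflow: Callable[[str], None]) -> List[str]:
    """Run one cleanup pass and return the names that were deleted.

    list_states and delete_workflow are hypothetical hooks; in a real
    deployment they would wrap whatever client talks to the cluster.
    """
    deleted = []
    for name in find_terminal_workflows(list_states()):
        try:
            delete_workflow(name)
            deleted.append(name)
        except Exception as exc:
            # Keep going so that one failed delete does not block the pass.
            print(f"could not delete {name}: {exc}")
    return deleted


if __name__ == "__main__":
    # In-memory stand-in for the cluster state, for demonstration only.
    store = {"wf-1": "COMPLETED", "wf-2": "IN_PROGRESS", "wf-3": "STOPPED"}
    removed = cleanup_once(lambda: dict(store), lambda name: store.pop(name))
    print(removed)
    print(sorted(store))
```

A periodic agent would call cleanup_once on a timer. Passing the listing and deletion steps in as functions keeps the selection logic testable without a running cluster.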