Re: Sporadic delays in task execution

2019-03-20 Thread DImuthu Upeksha
Hi Junkai,

We are using Helix 0.8.1.

Dimuthu

On Thu, Mar 21, 2019 at 12:14 AM Xue Junkai  wrote:

> Hi Dimuthu,
>
> What's the version of Helix you are using?
>
> Best,
>
> Junkai

Re: Sporadic delays in task execution

2019-03-20 Thread Xue Junkai
Hi Dimuthu,

What's the version of Helix you are using?

Best,

Junkai

On Wed, Mar 20, 2019 at 8:54 PM DImuthu Upeksha 
wrote:

> Hi Helix Dev,
>
> We are again seeing this delay in task execution. Please have a look at the
> screencast [1] of the logs printed by the participant (top shell) and the
> controller (bottom shell). When I recorded this, there were about 90 - 100
> workflows pending execution. As you can see, some tasks were suddenly
> executed, and then the participant froze for about 30 seconds before
> executing the next set of tasks. I can also see some WARN entries in the
> controller log. This 30-second delay seems to follow a pattern. What do you
> think is the reason for this? I can provide more information by turning on
> verbose logging on the controller if you want.
>
> [1] https://youtu.be/3EUdSxnIxVw
>
> Thanks
> Dimuthu


-- 
Junkai Xue


Re: Sporadic delays in task execution

2019-03-20 Thread DImuthu Upeksha
Hi Helix Dev,

We are again seeing this delay in task execution. Please have a look at the
screencast [1] of the logs printed by the participant (top shell) and the
controller (bottom shell). When I recorded this, there were about 90 - 100
workflows pending execution. As you can see, some tasks were suddenly
executed, and then the participant froze for about 30 seconds before
executing the next set of tasks. I can also see some WARN entries in the
controller log. This 30-second delay seems to follow a pattern. What do you
think is the reason for this? I can provide more information by turning on
verbose logging on the controller if you want.

[1] https://youtu.be/3EUdSxnIxVw

Thanks
Dimuthu

On Thu, Oct 4, 2018 at 4:46 PM DImuthu Upeksha 
wrote:

> Hi Junkai,
>
> I'm CCing the Airavata dev list as this is directly related to the project.
>
> I went through ZooKeeper paths like //EXTERNALVIEW and
> //CONFIGS/RESOURCE because I noticed that the Helix controller
> periodically monitors the children of those paths even after all the
> workflows have moved into a terminal state such as COMPLETED or STOPPED.
> In our case, a lot of completed workflows have piled up under those paths.
> I believe Helix cleans up those resources after some TTL. What I did was
> write an external spectator [1] that continuously monitors for workflows
> in a terminal state and cleans up their resources before the controller
> would do so after the TTL. Since then, we haven't seen such delays in
> workflow execution, and everything seems to run smoothly. However, we are
> continuously monitoring our deployments for any adverse effects introduced
> by this change.
>
> Please let us know if we are doing anything wrong with this approach, or
> whether there is a better way to achieve this directly through the Helix
> task framework.
>
> [1]
> https://github.com/apache/airavata/blob/staging/modules/airavata-helix/helix-spectator/src/main/java/org/apache/airavata/helix/impl/controller/WorkflowCleanupAgent.java
>
> Thanks
> Dimuthu
>
> On Tue, Oct 2, 2018 at 1:12 PM Xue Junkai  wrote:
>
>> Could you please check the logs for how long each pipeline stage takes?
>>
>> Also, did you set an expiry for the workflows? Have they been piling up
>> for a long time? How long does each workflow take to complete?
>>
>> Best,
>>
>> Junkai
>>
>> On Wed, Sep 26, 2018 at 8:52 AM DImuthu Upeksha <
>> dimuthu.upeks...@gmail.com>
>> wrote:
>>
>> > Hi Junkai,
>> >
>> > The average load is about 10 - 20 workflows per minute, and in some cases
>> > less than that. However, based on our observations, I feel the delay does
>> > not depend on the load; it is sporadic. Are there particular log lines
>> > that I can filter in the controller and participant to capture the
>> > timeline of a workflow, so that I can figure out which component is
>> > malfunctioning? We use Helix v0.8.1.
>> >
>> > Thanks
>> > Dimuthu
>> >
>> > On Tue, Sep 25, 2018 at 5:19 PM Xue Junkai 
>> wrote:
>> >
>> > > Hi Dimuthu,
>> > >
>> > > At what rate are you submitting workflows? Usually, workflow
>> > > scheduling is very fast. And which version of Helix are you using?
>> > >
>> > > Best,
>> > >
>> > > Junkai
>> > >
>> > > On Tue, Sep 25, 2018 at 8:58 AM DImuthu Upeksha <
>> > > dimuthu.upeks...@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi Folks,
>> > > >
>> > > > We have noticed delays between workflow submission and the
>> > > > participants actually picking the workflows up, and the delay seems
>> > > > fairly constant at around 2 - 3 minutes. We kept submitting workflows
>> > > > continuously, and after 2 - 3 minutes a bulk of workflows was picked
>> > > > up by the participant and executed. Then it remained silent for the
>> > > > next 2 - 3 minutes even when we submitted more workflows. It's as if
>> > > > the participant picks up workflows at discrete time intervals. I'm not
>> > > > sure whether this is an issue with the controller or the participant.
>> > > > Do you have any experience with this sort of behavior?
>> > > >
>> > > > Thanks
>> > > > Dimuthu
>> > > >
>> > >
>> > >
>> > > --
>> > > Junkai Xue
>> > >
>> >
>>
>>
>> --
>> Junkai Xue
>>
>
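The cleanup approach described in the Oct 4 message (an external agent that deletes workflows once they reach a terminal state), combined with the expiry question from the Oct 2 reply, can be sketched roughly as below. This is a minimal, self-contained Java illustration under stated assumptions. It is not the Airavata WorkflowCleanupAgent linked in that message and not Helix code. The WorkflowStore interface, the WorkflowInfo class, the State enum and the grace period are all hypothetical stand-ins. In a real deployment, listing workflows, reading their state and deleting them would presumably go through Helix's TaskDriver (for example getWorkflows(), getWorkflowContext(name) and delete(name)); check the exact calls against the Helix version in use.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class WorkflowCleanupSketch {

    // Hypothetical, simplified workflow states (Helix's own TaskState enum has more values).
    enum State { NOT_STARTED, IN_PROGRESS, STOPPED, COMPLETED, FAILED, ABORTED }

    // States this sketch treats as terminal ("saturated"); adjust to your own policy.
    static final Set<State> TERMINAL =
            EnumSet.of(State.STOPPED, State.COMPLETED, State.FAILED, State.ABORTED);

    // Hypothetical snapshot of one workflow: its state and finish time in epoch millis.
    static final class WorkflowInfo {
        final State state;
        final long finishTimeMs; // -1 if the finish time is unknown

        WorkflowInfo(State state, long finishTimeMs) {
            this.state = state;
            this.finishTimeMs = finishTimeMs;
        }
    }

    // Hypothetical stand-in for the cluster-facing calls a real agent would make.
    interface WorkflowStore {
        Map<String, WorkflowInfo> list();  // snapshot of all workflows
        void delete(String workflow);      // remove one workflow and its resources
    }

    // Eligible once the workflow is terminal and its grace period (expiry) has elapsed.
    static boolean isEligible(WorkflowInfo info, long nowMs, long graceMs) {
        return TERMINAL.contains(info.state)
                && info.finishTimeMs >= 0
                && info.finishTimeMs + graceMs <= nowMs;
    }

    // One cleanup pass: deletes every eligible workflow and returns the deleted names.
    static List<String> cleanupOnce(WorkflowStore store, long nowMs, long graceMs) {
        List<String> deleted = new ArrayList<>();
        for (Map.Entry<String, WorkflowInfo> entry : store.list().entrySet()) {
            if (isEligible(entry.getValue(), nowMs, graceMs)) {
                store.delete(entry.getKey());
                deleted.add(entry.getKey());
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        Map<String, WorkflowInfo> data = new LinkedHashMap<>();
        data.put("wf-done-old", new WorkflowInfo(State.COMPLETED, 1_000L));
        data.put("wf-done-recent", new WorkflowInfo(State.COMPLETED, 9_500L));
        data.put("wf-running", new WorkflowInfo(State.IN_PROGRESS, -1L));
        data.put("wf-stopped", new WorkflowInfo(State.STOPPED, 2_000L));

        // In-memory store; list() returns a copy so deleting during iteration is safe.
        WorkflowStore store = new WorkflowStore() {
            @Override
            public Map<String, WorkflowInfo> list() {
                return new LinkedHashMap<>(data);
            }

            @Override
            public void delete(String workflow) {
                data.remove(workflow);
            }
        };

        // Pretend "now" is t = 10 s and use a 5 s grace period.
        List<String> deleted = cleanupOnce(store, 10_000L, 5_000L);
        System.out.println("deleted=" + deleted);
        System.out.println("remaining=" + data.keySet());
    }
}
```

Running main prints deleted=[wf-done-old, wf-stopped] and remaining=[wf-done-recent, wf-running]. A real agent would call cleanupOnce periodically (for example from a ScheduledExecutorService). In this sketch the grace period plays a role similar to a workflow expiry: a short value removes finished workflows sooner, while a longer one keeps them around for inspection.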