Re: Sporadic delays in task execution

2019-03-24 Thread DImuthu Upeksha
Hi Lee,

Thanks for letting us know. We would be happy to try out the latest version.
Can you please point me to those known issues (JIRA or GitHub issues) in the
latest version? Then we can decide whether they might affect our use case.

Thanks
Dimuthu


Re: Sporadic delays in task execution

2019-03-22 Thread Hunter Lee
Let me add a caveat to my previous email. Although it comes with
scalability improvements, there are currently a few known issues with the
latest version. We'd encourage you to check back to make sure your current
usage isn't affected.

Hunter


Re: Sporadic delays in task execution

2019-03-22 Thread Hunter Lee
No problem. If you have further questions, let us know what kind of load
you're putting on Helix as well. The newest version of Helix contains Task
Framework 2.0, and has greater scalability in scheduling tasks, so you
might want to consider using the newest version as well.

Hunter


Re: Sporadic delays in task execution

2019-03-22 Thread DImuthu Upeksha
Hi Lee,

Thanks for the trick. I didn't know that we can poke the controller like
that :) However, tasks are now moving smoothly in our staging setup. This
behavior shows up from time to time and gets resolved automatically within a
few hours. I can't find a particular pattern; my best guess is that it
happens when the load is high. I will put some load on the testing setup,
see if I can reproduce the issue, then try your instructions and get back
to you.

Thanks
Dimuthu


Re: Sporadic delays in task execution

2019-03-21 Thread Hunter Lee
Hi Dimuthu,

What Junkai meant by touching the IdealState is this:

1) Use ZooInspector to log into ZK
2) Locate the IDEALSTATES/ path
3) Grab any ZNode under that path, modify it (just add a whitespace), and
save
4) This will trigger a ZK callback, which should tell the Helix Controller to
rebalance/schedule things
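
The same poke could probably be scripted instead of done by hand in ZooInspector. Below is a minimal sketch, not a tested procedure. It assumes a client object with kazoo-style `get(path)` and `set(path, value, version=...)` methods. The `InMemoryClient` class, the cluster name, and the resource name are hypothetical and exist only so the sketch can be dry-run without a ZooKeeper server. It writes the same bytes back rather than adding a whitespace, on the assumption that any successful set bumps the znode version and fires the data watch.

```python
"""Sketch: "touch" an IdealState znode so the controller gets a ZK callback."""


def touch_znode(client, path):
    """Rewrite a znode's data unchanged and return the version that was read.

    `client` is assumed to expose kazoo-style get(path) -> (data, stat)
    and set(path, value, version=...) methods.
    """
    data, stat = client.get(path)
    # Passing the version we just read makes the write fail rather than
    # overwrite a concurrent update made by the controller.
    client.set(path, data, version=stat.version)
    return stat.version


class InMemoryClient:
    """Hypothetical stand-in for a ZooKeeper client, for dry runs only."""

    class Stat:
        def __init__(self, version):
            self.version = version

    def __init__(self, nodes):
        # Map each path to (data, version), starting at version 0.
        self._nodes = {p: (d, 0) for p, d in nodes.items()}
        self.events = []  # paths whose data watch would have fired

    def get(self, path):
        data, version = self._nodes[path]
        return data, self.Stat(version)

    def set(self, path, value, version=-1):
        _, current = self._nodes[path]
        if version not in (-1, current):
            raise RuntimeError("version mismatch")
        self._nodes[path] = (value, current + 1)
        self.events.append(path)


if __name__ == "__main__":
    # Hypothetical cluster and resource names.
    path = "/MyCluster/IDEALSTATES/MyWorkflow"
    client = InMemoryClient({path: b'{"id": "MyWorkflow"}'})
    touch_znode(client, path)
    print(client.events)
```

Against a real cluster, a kazoo `KazooClient` (after `start()`) would take the place of `InMemoryClient`. Whether the controller then reschedules promptly is exactly what this check is meant to reveal.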


Re: Sporadic delays in task execution

2019-03-21 Thread DImuthu Upeksha
Hi Junkai,

What do you mean by touching the ideal state to trigger an event? I didn't
quite get what you said. Is that like creating some path in ZooKeeper?
Workflows are eventually scheduled, but the problem is that scheduling is
very slow due to that 30s freeze.

Thanks
Dimuthu

On Thu, Mar 21, 2019 at 2:26 PM Xue Junkai  wrote:

> Can you try one thing? Touch the ideal state to trigger an event. If
> workflows are not scheduled, it should scheduling has problem.
>
> Best,
>
> Junkai
>
> On Wed, Mar 20, 2019 at 10:31 PM DImuthu Upeksha <
> dimuthu.upeks...@gmail.com> wrote:
>
>> Hi Junkai,
>>
>> We are using 0.8.1
>>
>> Dimuthu
>>
>> On Thu, Mar 21, 2019 at 12:14 AM Xue Junkai  wrote:
>>
>> > Hi Dimuthu,
>> >
>> > What's the version of Helix you are using?
>> >
>> > Best,
>> >
>> > Junkai
>> >
>> > On Wed, Mar 20, 2019 at 8:54 PM DImuthu Upeksha <
>> > dimuthu.upeks...@gmail.com>
>> > wrote:
>> >
>> > > Hi Helix Dev,
>> > >
>> > > We are again seeing this delay in task execution. Please have a look
>> at
>> > the
>> > > screencast [1] of logs printed in participant (top shell) and
>> controller
>> > > (bottom shell). When I record this, there were about 90 - 100
>> workflows
>> > > pending to be executed. As you can see some tasks were suddenly
>> executed
>> > > and then participant freezed for about 30 seconds before executing
>> next
>> > set
>> > > of tasks. I can see some WARN logs on controller log. I feel like
>> this 30
>> > > second delay is some sort of a pattern. What do you think as the
>> reason
>> > for
>> > > this? I can provide you more information by turning on verbose logs on
>> > > controller if you want.
>> > >
>> > > [1] https://youtu.be/3EUdSxnIxVw
>> > >
>> > > Thanks
>> > > Dimuthu
>> > >
>> > > On Thu, Oct 4, 2018 at 4:46 PM DImuthu Upeksha <
>> > dimuthu.upeks...@gmail.com
>> > > >
>> > > wrote:
>> > >
>> > > > Hi Junkai,
>> > > >
>> > > > I'm CCing Airavata dev list as this is directly related to the
>> project.
>> > > >
>> > > > I just went through the zookeeper path like /> > Name>/EXTERNALVIEW,
>> > > > //CONFIGS/RESOURCE as I have noticed that helix
>> > controller
>> > > is
>> > > > periodically monitoring for the children of those paths even though
>> all
>> > > the
>> > > > Workflows have moved into a saturated state like COMPLETED and
>> STOPPED.
>> > > In
>> > > > our case, we have a lot of completed workflows piled up in those
>> > paths. I
>> > > > believe that helix is clearing up those resources after some TTL.
>> What
>> > I
>> > > > did was writing an external spectator [1] that continuously monitors
>> > for
>> > > > saturated workflows and clearing up resources before controller does
>> > that
>> > > > after a TTL. After that, we didn't see such delays in workflow
>> > execution
>> > > > and everything seems to be running smoothly. However we are
>> > continuously
>> > > > monitoring our deployments for any form of adverse effect
>> introduced by
>> > > > that improvement.
>> > > >
>> > > > Please let us know if we are doing something wrong in this
>> improvement
>> > or
>> > > > is there any better way to achieve this directly through helix task
>> > > > framework.
>> > > >
>> > > > [1]
>> > > >
>> > >
>> >
>> https://github.com/apache/airavata/blob/staging/modules/airavata-helix/helix-spectator/src/main/java/org/apache/airavata/helix/impl/controller/WorkflowCleanupAgent.java
>> > > >
>> > > > Thanks
>> > > > Dimuthu
>> > > >
>> > > > On Tue, Oct 2, 2018 at 1:12 PM Xue Junkai 
>> > wrote:
>> > > >
>> > > >> Could you please check the log of how long for each pipeline stage
>> > > takes?
>> > > >>
>> > > >> Also, did you set expiry for workflows? Are they piled up for long
>> > time?
>> > > >> How long for each workflow completes?
>> > > >>
>> > > >> best,
>> > > >>
>> > > >> Junkai
>> > > >>
>> > > >> On Wed, Sep 26, 2018 at 8:52 AM DImuthu Upeksha <
>> > > >> dimuthu.upeks...@gmail.com>
>> > > >> wrote:
>> > > >>
>> > > >> > Hi Junkai,
>> > > >> >
>> > > >> > Average load is like 10 - 20 workflows per minutes. In some cases
>> > it's
>> > > >> less
>> > > >> > than that However based on the observations, I feel like it does
>> not
>> > > >> depend
>> > > >> > on the load and it is sporadic. Is there a particular log lines
>> > that I
>> > > >> can
>> > > >> > filter in controller and participant to capture the timeline of
>> > > >> workflow so
>> > > >> > that I can figure out which which component is malfunctioning? We
>> > use
>> > > >> helix
>> > > >> > v 0.8.1.
>> > > >> >
>> > > >> > Thanks
>> > > >> > Dimuthu
>> > > >> >
>> > > >> > On Tue, Sep 25, 2018 at 5:19 PM Xue Junkai > >
>> > > >> wrote:
>> > > >> >
>> > > >> > > Hi Dimuthu,
>> > > >> > >
>> > > >> > > At which rate, you are keep submitting workflows? Usually,
>> > Workflow
>> > > >> > > scheduling is very fast. And which version of Helix you are
>> using?
>> > > >> > >
>> > > >> > > Best,
>> > > >> > >
>> > > >> > > Junkai
>> > > >> > >
>> > > >> > > On Tue, Sep 25, 2018 at 8:58 AM DImuthu Upeksha <
>> > > >> 

Re: Sporadic delays in task execution

2019-03-21 Thread Xue Junkai
Can you try one thing? Touch the ideal state to trigger an event. If the
workflows are still not scheduled, it would mean the scheduling has a problem.

Best,

Junkai

On Wed, Mar 20, 2019 at 10:31 PM DImuthu Upeksha 
wrote:

> Hi Junkai,
>
> We are using 0.8.1
>
> Dimuthu

Re: Sporadic delays in task execution

2019-03-20 Thread DImuthu Upeksha
Hi Junkai,

We are using 0.8.1

Dimuthu

On Thu, Mar 21, 2019 at 12:14 AM Xue Junkai  wrote:

> Hi Dimuthu,
>
> What's the version of Helix you are using?
>
> Best,
>
> Junkai

Re: Sporadic delays in task execution

2019-03-20 Thread Xue Junkai
Hi Dimuthu,

What's the version of Helix you are using?

Best,

Junkai

On Wed, Mar 20, 2019 at 8:54 PM DImuthu Upeksha 
wrote:

> Hi Helix Dev,
>
> We are again seeing this delay in task execution. Please have a look at the
> screencast [1] of the logs printed by the participant (top shell) and the
> controller (bottom shell). When I recorded this, there were about 90 - 100
> workflows pending execution. As you can see, some tasks were suddenly
> executed and then the participant froze for about 30 seconds before
> executing the next set of tasks. I can see some WARN logs in the controller
> log. I feel like this 30-second delay is some sort of pattern. What do you
> think is the reason for this? I can provide more information by turning on
> verbose logging on the controller if you want.
>
> [1] https://youtu.be/3EUdSxnIxVw
>
> Thanks
> Dimuthu


-- 
Junkai Xue


Re: Sporadic delays in task execution

2019-03-20 Thread DImuthu Upeksha
Hi Helix Dev,

We are again seeing this delay in task execution. Please have a look at the
screencast [1] of the logs printed by the participant (top shell) and the
controller (bottom shell). When I recorded this, there were about 90 - 100
workflows pending execution. As you can see, some tasks were suddenly
executed and then the participant froze for about 30 seconds before
executing the next set of tasks. I can see some WARN logs in the controller
log. I feel like this 30-second delay is some sort of pattern. What do you
think is the reason for this? I can provide more information by turning on
verbose logging on the controller if you want.
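
One way to check whether the roughly 30-second pauses really form a
pattern is to scan the participant log for gaps between consecutive
timestamps. The sketch below assumes each log line starts with a
"YYYY-MM-DD HH:MM:SS,mmm" timestamp; that format, the 20-second threshold
and the sample lines are assumptions, not taken from the actual logs.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout at the start of each log line.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TIMESTAMP_WIDTH = 23  # len("2019-03-20 20:54:01,123")


def find_gaps(lines, threshold=timedelta(seconds=20)):
    """Return (previous, next, gap) for consecutive timestamps at least
    `threshold` apart. Lines without a leading timestamp are skipped."""
    stamps = []
    for line in lines:
        try:
            stamps.append(
                datetime.strptime(line[:TIMESTAMP_WIDTH], TIMESTAMP_FORMAT))
        except ValueError:
            continue  # e.g. stack-trace continuation lines
    return [(a, b, b - a) for a, b in zip(stamps, stamps[1:])
            if b - a >= threshold]


if __name__ == "__main__":
    sample = [
        "2019-03-20 20:54:01,000 INFO task A started",
        "2019-03-20 20:54:02,500 INFO task B started",
        "    at some.stack.Frame",
        "2019-03-20 20:54:32,500 INFO task C started",
    ]
    for start, end, gap in find_gaps(sample):
        print(f"silent from {start} to {end} ({gap.total_seconds():.1f}s)")
```

If the reported gaps cluster around one value, that points towards a
periodic trigger rather than load.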

[1] https://youtu.be/3EUdSxnIxVw

Thanks
Dimuthu

On Thu, Oct 4, 2018 at 4:46 PM DImuthu Upeksha 
wrote:

> Hi Junkai,
>
> I'm CCing the Airavata dev list as this is directly related to the project.
>
> I just went through ZooKeeper paths like //EXTERNALVIEW and
> //CONFIGS/RESOURCE, as I have noticed that the Helix controller
> periodically monitors the children of those paths even though all the
> workflows have moved into a saturated state like COMPLETED or STOPPED. In
> our case, we have a lot of completed workflows piled up in those paths. I
> believe that Helix clears up those resources after some TTL. What I did
> was write an external spectator [1] that continuously monitors for
> saturated workflows and clears up their resources before the controller
> does so after the TTL. After that, we didn't see such delays in workflow
> execution, and everything seems to be running smoothly. However, we are
> continuously monitoring our deployments for any adverse effects
> introduced by that improvement.
>
> Please let us know if we are doing something wrong in this improvement, or
> if there is a better way to achieve this directly through the Helix task
> framework.
>
> [1]
> https://github.com/apache/airavata/blob/staging/modules/airavata-helix/helix-spectator/src/main/java/org/apache/airavata/helix/impl/controller/WorkflowCleanupAgent.java
>
> Thanks
> Dimuthu


Re: Sporadic delays in task execution

2018-10-04 Thread DImuthu Upeksha
Hi Junkai,

I'm CCing the Airavata dev list as this is directly related to the project.

I just went through ZooKeeper paths like //EXTERNALVIEW and
//CONFIGS/RESOURCE, as I have noticed that the Helix controller
periodically monitors the children of those paths even though all the
workflows have moved into a saturated state like COMPLETED or STOPPED. In
our case, we have a lot of completed workflows piled up in those paths. I
believe that Helix clears up those resources after some TTL. What I did
was write an external spectator [1] that continuously monitors for
saturated workflows and clears up their resources before the controller
does so after the TTL. After that, we didn't see such delays in workflow
execution, and everything seems to be running smoothly. However, we are
continuously monitoring our deployments for any adverse effects
introduced by that improvement.
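
The core of the cleanup idea described above can be sketched as follows.
This is not the code in [1]; it is a simplified illustration in which the
workflow states come from a plain dictionary and `delete` is a
hypothetical callback standing in for whatever removal call a deployment
uses. The set of states treated as saturated is taken from this message
and may need adjusting, for example if STOPPED workflows should be kept
so they can be resumed.

```python
# States treated as "saturated" here; taken from the message above.
TERMINAL_STATES = {"COMPLETED", "STOPPED"}


def select_for_cleanup(workflow_states):
    """Return the names of workflows whose state is terminal, sorted so
    that the result is deterministic."""
    return sorted(name for name, state in workflow_states.items()
                  if state in TERMINAL_STATES)


def cleanup(workflow_states, delete):
    """Call `delete(name)` for every saturated workflow and return the
    names that were removed. `delete` is a hypothetical callback."""
    removed = []
    for name in select_for_cleanup(workflow_states):
        delete(name)
        removed.append(name)
    return removed


if __name__ == "__main__":
    states = {
        "workflow_a": "COMPLETED",
        "workflow_b": "IN_PROGRESS",
        "workflow_c": "STOPPED",
    }
    print(cleanup(states, lambda name: print("deleting", name)))
```

A real agent would run this on a timer and read the states from the
cluster instead of a dictionary.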

Please let us know if we are doing something wrong in this improvement, or
if there is a better way to achieve this directly through the Helix task
framework.

[1]
https://github.com/apache/airavata/blob/staging/modules/airavata-helix/helix-spectator/src/main/java/org/apache/airavata/helix/impl/controller/WorkflowCleanupAgent.java

Thanks
Dimuthu

On Tue, Oct 2, 2018 at 1:12 PM Xue Junkai  wrote:

> Could you please check the log for how long each pipeline stage takes?
>
> Also, did you set an expiry for the workflows? Have they been piled up for
> a long time? How long does each workflow take to complete?
>
> best,
>
> Junkai
>
> On Wed, Sep 26, 2018 at 8:52 AM DImuthu Upeksha <
> dimuthu.upeks...@gmail.com>
> wrote:
>
> > Hi Junkai,
> >
> > The average load is about 10 - 20 workflows per minute. In some cases
> > it's less than that. However, based on my observations, I feel like it
> > does not depend on the load and is sporadic. Are there particular log
> > lines that I can filter in the controller and participant to capture
> > the timeline of a workflow, so that I can figure out which component is
> > malfunctioning? We use Helix v0.8.1.
> >
> > Thanks
> > Dimuthu
> >
> > On Tue, Sep 25, 2018 at 5:19 PM Xue Junkai  wrote:
> >
> > > Hi Dimuthu,
> > >
> > > At what rate are you submitting workflows? Usually, workflow
> > > scheduling is very fast. And which version of Helix are you using?
> > >
> > > Best,
> > >
> > > Junkai
> > >
> > > On Tue, Sep 25, 2018 at 8:58 AM DImuthu Upeksha <
> > > dimuthu.upeks...@gmail.com>
> > > wrote:
> > >
> > > > Hi Folks,
> > > >
> > > > We have noticed some delays between workflow submission and actual
> > > > pickup by participants, and the delay seems to be somewhat constant
> > > > at around 2 - 3 minutes. We submit workflows continuously, and after
> > > > 2 - 3 minutes a batch of workflows is picked up by the participant
> > > > and executed. Then it remains silent for the next 2 - 3 minutes even
> > > > if we submit more workflows. It's like the participant is picking up
> > > > workflows at discrete time intervals. I'm not sure whether this is an
> > > > issue with the controller or the participant. Do you have any
> > > > experience with this sort of behavior?
> > > >
> > > > Thanks
> > > > Dimuthu
> > > >
> > >
> > >
> > > --
> > > Junkai Xue
> > >
> >
>
>
> --
> Junkai Xue
>