Hi Dimuthu -

1. In the Task Framework, tasks are units of work that are mutually
independent; that is, Helix schedules A and B without considering any
dependency between them. When you say task A depends on task B, what exactly
do you mean? Is task A blocking (sleeping) until some condition is met, for
example by making a remote call? Or are these actually jobs, where job A
depends on job B in the JobDAG?
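To make the distinction concrete, here is a rough sketch (plain Java, not Helix
code) of why a dependency inside one workflow matters: the critical path of a
job DAG is the shortest possible wall-clock time for that workflow, no matter
how many participants you add. The class name, method names and the 3 s / 5 s
durations are made up for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration (not Helix code): the critical path of a job DAG is the
// minimum wall-clock time of one workflow, even with unlimited participants.
final class CriticalPath {
    // Length of the longest duration-weighted path through the DAG (assumes no cycles).
    static double criticalPath(Map<String, Double> durations, Map<String, List<String>> parents) {
        Map<String, Double> memo = new HashMap<>();
        double longest = 0;
        for (String job : durations.keySet()) {
            longest = Math.max(longest, finishTime(job, durations, parents, memo));
        }
        return longest;
    }

    // Earliest finish of `job`: it can start only after every parent has finished.
    private static double finishTime(String job, Map<String, Double> durations,
                                     Map<String, List<String>> parents, Map<String, Double> memo) {
        Double cached = memo.get(job);
        if (cached != null) {
            return cached;
        }
        double start = 0;
        for (String parent : parents.getOrDefault(job, List.of())) {
            start = Math.max(start, finishTime(parent, durations, parents, memo));
        }
        double end = start + durations.get(job);
        memo.put(job, end);
        return end;
    }

    public static void main(String[] args) {
        Map<String, Double> seconds = Map.of("A", 3.0, "B", 5.0); // made-up durations
        // Independent tasks: no edges, so A and B may overlap.
        System.out.println(criticalPath(seconds, Map.of()));                  // 5.0
        // JobDAG edge B -> A: A cannot start until B completes.
        System.out.println(criticalPath(seconds, Map.of("A", List.of("B")))); // 8.0
    }
}
```

Note that the edges only exist inside a single workflow. With 1000 separate
workflows there are no edges between them, so the per-workflow critical path
should bound latency, not overall throughput.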

2. Could you share the configs you are using, i.e. your WorkflowConfig and
JobConfig? How are you modeling your workload: 1 workflow, 1 job, 2 tasks?

3. Let's also confirm that all 3 instances are live in the cluster. When you
boot up the cluster, do you see 3 ZNodes under LIVEINSTANCES, 3 ZNodes under
CONFIGS/PARTICIPANT, and 3 directories in /INSTANCES?
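If it helps, the three listings can be compared mechanically. This is a
hypothetical helper, assuming you have already collected the child names of
/<CLUSTER_NAME>/LIVEINSTANCES, /<CLUSTER_NAME>/CONFIGS/PARTICIPANT and
/<CLUSTER_NAME>/INSTANCES (for example with `ls` in zkCli). The class, method
and participant names are made up; any name it returns is registered in some
places but not all of them.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper (not part of Helix): cross-checks the instance names found
// under LIVEINSTANCES, CONFIGS/PARTICIPANT and INSTANCES.
final class InstanceCheck {
    // Returns every name that is missing from at least one of the three listings.
    static Set<String> missingSomewhere(Collection<String> live, Collection<String> configs,
                                        Collection<String> instances) {
        Set<String> all = new TreeSet<>();
        all.addAll(live);
        all.addAll(configs);
        all.addAll(instances);
        Set<String> missing = new TreeSet<>();
        for (String name : all) {
            if (!live.contains(name) || !configs.contains(name) || !instances.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Made-up listings: participant_3 is configured but not live.
        List<String> live = List.of("participant_1", "participant_2");
        List<String> configs = List.of("participant_1", "participant_2", "participant_3");
        List<String> instances = List.of("participant_1", "participant_2", "participant_3");
        System.out.println(missingSomewhere(live, configs, instances)); // [participant_3]
    }
}
```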

As Kishore said, in theory the throughput should increase as you add nodes
to the cluster. My hunch is that some combination of configs and dependency
settings is 'linearizing' the workload, which would explain why the execution
time is almost the same in every case.
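A back-of-the-envelope model shows why linearization would flatten the
numbers. This is a rough lower bound, not Helix code: it assumes the total busy
time of all workflows is spread evenly over whatever concurrency is actually
usable, and it ignores scheduling overhead. The slots per participant (10), the
external cap (8, e.g. a limit on concurrent Thrift or SSH sessions) and the 2 s
per workflow are invented numbers, not measurements.

```java
import java.util.Locale;

// Hypothetical model (not Helix code): lower bound on the time to drain N workflows.
final class ThroughputModel {
    // workflows:           number of submitted workflows
    // participants:        live participants
    // slotsPerParticipant: tasks one participant can run at once (assumed value)
    // externalCap:         concurrent calls a shared dependency accepts (assumed value)
    // secondsPerWorkflow:  busy time of task B plus task A for one workflow
    static double lowerBoundSeconds(int workflows, int participants, int slotsPerParticipant,
                                    int externalCap, double secondsPerWorkflow) {
        // Usable concurrency is the smaller of the cluster's slots and the external limit.
        int usable = Math.min(participants * slotsPerParticipant, externalCap);
        return workflows * secondsPerWorkflow / usable;
    }

    public static void main(String[] args) {
        for (int p = 1; p <= 3; p++) {
            double uncapped = lowerBoundSeconds(1000, p, 10, Integer.MAX_VALUE, 2.0);
            double capped = lowerBoundSeconds(1000, p, 10, 8, 2.0);
            System.out.printf(Locale.ROOT, "participants=%d  uncapped=%.1fs  capped=%.1fs%n",
                    p, uncapped, capped);
        }
    }
}
```

Running it prints 200.0s, 100.0s and 66.7s for the uncapped case, but 250.0s
for all three participant counts once something caps concurrency at 8. If your
timings look like the second column, the bottleneck is likely outside the
participant count.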

Hunter

On Thu, Apr 4, 2019 at 3:38 PM DImuthu Upeksha <dimuthu.upeks...@gmail.com>
wrote:

> Hi Kishore,
>
> There are two tasks (A [1], B [2]). I submit 1000 workflows at a time, each
> of which includes both task A and task B. Task A depends on task B. In both
> tasks, I connect to a Thrift API to fetch some data, and in task B there is
> also a remote SSH call to a compute host.
>
> In the first test:
> 1 Controller
> 1 Participant
> 1 Zookeeper
>
> In the second test:
> 1 Controller
> 2 Participants
> 1 Zookeeper
>
> In the third test:
> 1 Controller
> 3 Participants
> 1 Zookeeper
>
> However, in all three cases the time to complete all 1000 submitted
> workflows was almost the same. In fact, the 2nd and 3rd cases took a little
> more time than the 1st.
>
> I understand that there are lots of moving parts in this scenario (Thrift
> API performance, SSH client delays); however, I need to know whether I have
> set up the cluster correctly. Are there any additional steps to follow when
> adding a new participant? In my case, I just created a copy of the 1st
> participant, changed the participant name and started it.
>
> [1]
> https://github.com/apache/airavata/blob/staging/modules/airavata-helix/helix-spectator/src/main/java/org/apache/airavata/helix/impl/task/submission/DefaultJobSubmissionTask.java
> [2]
> https://github.com/apache/airavata/blob/staging/modules/airavata-helix/helix-spectator/src/main/java/org/apache/airavata/helix/impl/task/env/EnvSetupTask.java
>
> Thanks
> Dimuthu
>
> On Thu, Apr 4, 2019 at 5:34 PM kishore g <g.kish...@gmail.com> wrote:
>
> > It should, ideally, but it may depend on what happens within each task.
> > Can you give more information about the setup (how many nodes, tasks,
> > etc.)?
> >
> > On Thu, Apr 4, 2019 at 2:15 PM DImuthu Upeksha <dimuthu.upeks...@gmail.com>
> > wrote:
> >
> > > Hi Folks,
> > >
> > > In the Task Framework, is the throughput of executed tasks expected to
> > > improve significantly if I add a new participant to the cluster? I'm
> > > asking because I'm seeing almost the same throughput with one participant
> > > and with two participants. I'm using Helix 0.8.4 for this setup.
> > >
> > > Thanks
> > > Dimuthu
> > >
> >
>
