Hi Kishore/Kanak,

Thanks very much for the guidance. I have tested the feature with a minimal number of nodes and it works as expected, but I have not yet done exhaustive testing.
I have a question: is there a way to get the resulting data back to the client for consolidation or aggregation, as an optional data object alongside the TaskResult object (which currently carries only status and info)? For example, returning 1 or 2 KB of results from each of 5 task participants. This is similar to the map-reduce concept, but on a real-time basis, giving the client the opportunity to consolidate results.

Regards,
Maha

On Aug 22, 2014, at 8:05 AM, kishore g <[email protected]> wrote:

Not sure if you are subscribed to the mailing list

---------- Forwarded message ----------
From: "Kanak Biscuitwala" <[email protected]>
Date: Aug 21, 2014 10:02 AM
Subject: RE: Helix parallelism
To: "[email protected]" <[email protected]>
Cc:

Yes, you can use the task framework, which hasn't been released yet but will be soon. For more on the task framework, you can read this blog post: http://engineering.linkedin.com/distributed-systems/ad-hoc-task-management-apache-helix

You can submit a job with 1000 tasks using either Java or YAML.
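To make the question concrete, here is a minimal sketch of the client-side consolidation being asked about. The `Status`/`TaskResultLike` types are local stand-ins that mirror the status-plus-info shape of Helix's TaskResult, not Helix classes, and packing a small serialized payload into the info string is an assumed workaround, not a confirmed Helix feature:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Sketch: each participant packs a small (1-2 KB) result payload into the
 * "info" string of its task result, and the client merges the payloads.
 * The Status/TaskResultLike types below are stand-ins, not Helix classes.
 */
public class ResultAggregation {

    enum Status { COMPLETED, ERROR }

    // Stand-in for Helix's TaskResult (which carries status + info only).
    static final class TaskResultLike {
        final Status status;
        final String info; // small serialized payload, e.g. "query1=42"

        TaskResultLike(Status status, String info) {
            this.status = status;
            this.info = info;
        }
    }

    // Client-side consolidation: merge the per-task payloads into one map.
    static Map<String, String> aggregate(List<TaskResultLike> results) {
        Map<String, String> merged = new TreeMap<>();
        for (TaskResultLike r : results) {
            if (r.status != Status.COMPLETED) {
                continue; // skip failed tasks; Helix would retry them
            }
            String[] kv = r.info.split("=", 2);
            merged.put(kv[0], kv[1]);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<TaskResultLike> fromParticipants = new ArrayList<>();
        fromParticipants.add(new TaskResultLike(Status.COMPLETED, "query1=42"));
        fromParticipants.add(new TaskResultLike(Status.COMPLETED, "query2=7"));
        fromParticipants.add(new TaskResultLike(Status.ERROR, "query3=boom"));
        System.out.println(aggregate(fromParticipants)); // {query1=42, query2=7}
    }
}
```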
The YAML specification of this job would look something like:

name: MyWorkflow
jobs:
  - name: RunQueries
    command: RunQuery  # The command corresponding to Task callbacks
    jobConfigMap: {    # Arbitrary key-value pairs to pass to all tasks in this job
      k1: "v1",
      k2: "v2"
    }
    numConcurrentTasksPerInstance: 200  # Max parallelism per instance
    tasks:  # Schedule 1000 tasks, each responsible for aggregating requests for a chunk of partitions
      - taskConfigMap: {  # Arbitrary key-value pairs to pass to this task
          query: "query1"
        }
      - taskConfigMap: {
          query: "query2"
        }
      - taskConfigMap: {
          query: "query3"
        }
      # Repeat for the remaining 997 tasks

You can also see this class for an example of how to build jobs in Java: https://github.com/apache/helix/blob/master/helix-core/src/test/java/org/apache/helix/integration/task/TestIndependentTaskRebalancer.java

Then you just need to implement a Task callback and register it on each of the instances, and Helix will take care of assignment and retries.

Date: Thu, 21 Aug 2014 09:07:11 -0700
Subject: Helix parallelism
From: [email protected]
To: [email protected]

Hi,

I just started looking at Helix's ability to execute tasks in parallel, spread evenly across the cluster's instances and resources. I have a requirement to execute many different queries in parallel. Can Helix help in this case? For example:

1. I have some 1000 different queries to be executed.
2. I have 5 nodes configured in the Helix cluster, each capable of executing a set of queries.
3. I need Helix to distribute these 1000 different queries evenly across the 5 nodes (200 per node), take care of re-executing failed queries, and notify the controller when the job is done.

Can someone help me understand how Helix can solve this kind of problem?

Regards,
Maha
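The Task callback mentioned above can be sketched as follows. This is a simplified, self-contained illustration: the `Task`/`TaskResult` shapes mirror the run/cancel interface of the Helix task framework but are local stand-ins, and `RunQueryTask` with its fake query execution is hypothetical. In a real deployment you would implement Helix's own Task interface and register a factory for the "RunQuery" command on each instance:

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Sketch of a "RunQuery" task callback. The Task/TaskResult types below
 * are local stand-ins that mirror the Helix task framework's run/cancel
 * interface; a real implementation would use the Helix classes instead.
 */
public class RunQueryExample {

    enum Status { COMPLETED, CANCELED, ERROR }

    static final class TaskResult {
        final Status status;
        final String info;

        TaskResult(Status status, String info) {
            this.status = status;
            this.info = info;
        }
    }

    interface Task {
        TaskResult run();   // invoked once per assigned task
        void cancel();      // invoked if the job is stopped
    }

    // One task per query; the query string comes from the task's config
    // (the taskConfigMap entries in the YAML above).
    static final class RunQueryTask implements Task {
        private final String query;
        private final AtomicBoolean canceled = new AtomicBoolean(false);

        RunQueryTask(Map<String, String> taskConfig) {
            this.query = taskConfig.get("query");
        }

        @Override
        public TaskResult run() {
            if (canceled.get()) {
                return new TaskResult(Status.CANCELED, query);
            }
            // Hypothetical query execution; a real task would hit a datastore.
            String rows = "rows-for-" + query;
            return new TaskResult(Status.COMPLETED, rows);
        }

        @Override
        public void cancel() {
            canceled.set(true);
        }
    }

    public static void main(String[] args) {
        Task t = new RunQueryTask(Map.of("query", "query1"));
        TaskResult r = t.run();
        System.out.println(r.status + " " + r.info); // COMPLETED rows-for-query1
    }
}
```

With 1000 such tasks and numConcurrentTasksPerInstance set to 200, Helix handles the 200-per-node assignment and the retry of failed tasks that the original question asks about.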
