Yes, you can use the task framework. It hasn't been released yet, but it will
be soon. For more on the task framework, see this blog post:
http://engineering.linkedin.com/distributed-systems/ad-hoc-task-management-apache-helix
You can submit a job with 1000 tasks using either Java or YAML.
The YAML specification of this job would look something like:
name: MyWorkflow
jobs:
    - name: RunQueries
      command: RunQuery # The command corresponding to Task callbacks
      jobConfigMap: { # Arbitrary key-value pairs to pass to all tasks in this job
        k1: "v1",
        k2: "v2"
      }
      numConcurrentTasksPerInstance: 200 # Max parallelism per instance
      tasks: # Schedule 1000 tasks, each responsible for aggregating requests for a chunk of partitions
        - taskConfigMap: { # Arbitrary key-value pairs to pass to this task
            query: "query1"
          }
        - taskConfigMap: {
            query: "query2"
          }
        - taskConfigMap: {
            query: "query3"
          }
        # Repeat for remaining 997 tasks
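
Writing out 1000 task entries by hand is tedious, so you may prefer to generate
the file. Here is a minimal sketch in plain Java that prints a specification
with the same keys as the one above. The class and method names are
hypothetical, the query strings (query1 through query1000) are placeholders,
and the jobConfigMap is left out for brevity. It does not use any Helix API.

```java
import java.util.ArrayList;
import java.util.List;

class WorkflowYamlGenerator {
    /** Builds the workflow YAML with one taskConfigMap entry per query. */
    static String buildYaml(List<String> queries, int maxTasksPerInstance) {
        StringBuilder sb = new StringBuilder();
        sb.append("name: MyWorkflow\n");
        sb.append("jobs:\n");
        sb.append("    - name: RunQueries\n");
        sb.append("      command: RunQuery\n");
        sb.append("      numConcurrentTasksPerInstance: ")
          .append(maxTasksPerInstance).append("\n");
        sb.append("      tasks:\n");
        for (String query : queries) {
            sb.append("        - taskConfigMap: { query: \"")
              .append(escape(query)).append("\" }\n");
        }
        return sb.toString();
    }

    /** Escapes backslashes and double quotes for a double-quoted YAML scalar. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        // Placeholder queries; substitute the real 1000 query strings here.
        List<String> queries = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) {
            queries.add("query" + i);
        }
        System.out.print(buildYaml(queries, 200));
    }
}
```

Redirect the output to a file and submit that file as the job specification.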

You can also see this class for an example of how to build jobs in Java: 
https://github.com/apache/helix/blob/master/helix-core/src/test/java/org/apache/helix/integration/task/TestIndependentTaskRebalancer.java
Then you just need to implement a Task callback and register it on each of the 
instances, and Helix will take care of assignment and retries.
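
To give a feel for the shape of such a callback, here is a rough sketch in
plain Java. It deliberately does not use the Helix classes, since the task
framework API is not released yet and may differ: the Status enum, the
TaskCallback interface, the query runner, and the retry loop are all
hypothetical stand-ins. In a real deployment Helix, not your code, would do
the assignment and the retrying.

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.Predicate;

class QueryTaskSketch {
    /** Hypothetical stand-in for the status a task reports back. */
    enum Status { COMPLETED, ERROR }

    /** Hypothetical stand-in for a task callback; one instance per task. */
    interface TaskCallback {
        Status run();
    }

    /** Runs the query named under "query" in this task's config map. */
    static final class RunQueryTask implements TaskCallback {
        private final Map<String, String> taskConfig;
        private final Predicate<String> queryRunner; // returns true on success

        RunQueryTask(Map<String, String> taskConfig, Predicate<String> queryRunner) {
            this.taskConfig = taskConfig;
            this.queryRunner = queryRunner;
        }

        @Override
        public Status run() {
            String query = taskConfig.get("query");
            if (query == null) {
                return Status.ERROR; // nothing to run
            }
            try {
                return queryRunner.test(query) ? Status.COMPLETED : Status.ERROR;
            } catch (RuntimeException e) {
                return Status.ERROR; // report failure so the task can be retried
            }
        }
    }

    /** Stand-in for the framework's retry behavior: rerun until success. */
    static Status runWithRetries(TaskCallback task, int maxAttempts) {
        Status status = Status.ERROR;
        for (int i = 0; i < maxAttempts && status != Status.COMPLETED; i++) {
            status = task.run();
        }
        return status;
    }

    public static void main(String[] args) {
        Map<String, String> config = Collections.singletonMap("query", "query1");
        // Placeholder runner that treats every non-empty query as a success.
        TaskCallback task = new RunQueryTask(config, q -> !q.isEmpty());
        System.out.println(runWithRetries(task, 3));
    }
}
```

The point of the sketch is that the callback only runs one query and reports
success or failure; deciding where it runs and whether to run it again is left
to the framework.
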
Date: Thu, 21 Aug 2014 09:07:11 -0700
Subject: Helix parallelism
From: [email protected]
To: [email protected]

Hi,
I just started looking at Helix's ability to execute tasks in parallel, spread
evenly across the instances and resources in a cluster.
I have a requirement to execute a number of different queries in parallel to
solve a problem. Can Helix help in this case?

For example:
1. I have some 1000 different queries to be executed.
2. I have 5 nodes configured in the Helix cluster, each capable of executing a
   set of queries.
3. I need Helix to distribute these 1000 queries equally across the 5 nodes
   (200 per node), take care of re-executing any failed queries, and notify
   the controller when the job is done.

Can someone help me understand how Helix can solve this kind of problem?
Regards,
Maha
