Hi Maha,

The info field is meant for lightweight progress metadata. If you need to store something more sophisticated, you can use HelixManager#getHelixPropertyStore for small (kilobyte-scale) state, or you can store pointers in the property store to a different store that can handle more data.
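For example, here is a rough sketch of stashing a small per-task result in the property store and reading it back on the client for consolidation. The "/RESULTS" path and the "result" field name are just illustrative choices for this sketch, not anything Helix prescribes:

import org.apache.helix.AccessOption;
import org.apache.helix.HelixManager;
import org.apache.helix.ZNRecord;
import org.apache.helix.store.zk.ZkHelixPropertyStore;

public class ResultStore {
  // Writes a small (kilobyte-scale) result under an application-chosen path.
  public static void writeResult(HelixManager manager, String taskId, String resultJson) {
    ZkHelixPropertyStore<ZNRecord> store = manager.getHelixPropertyStore();
    ZNRecord record = new ZNRecord(taskId);
    // Keep this small; each record ultimately lives in a ZooKeeper znode.
    record.setSimpleField("result", resultJson);
    store.set("/RESULTS/" + taskId, record, AccessOption.PERSISTENT);
  }

  // Reads a task's result back on the client side for aggregation.
  public static String readResult(HelixManager manager, String taskId) {
    ZkHelixPropertyStore<ZNRecord> store = manager.getHelixPropertyStore();
    ZNRecord record = store.get("/RESULTS/" + taskId, null, AccessOption.PERSISTENT);
    return record == null ? null : record.getSimpleField("result");
  }
}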
Kanak
________________________________
> From: [email protected]
> Subject: Fwd: Helix parallelism
> Date: Wed, 3 Sep 2014 10:49:41 -0700
> To: [email protected]
>
> Hi Kishore/Kanak,
>
> Thanks very much for the guidance. I have tested the feature with a
> minimal number of nodes and it works as expected, though I have not yet
> tested it exhaustively.
>
> I have a question: is there a way to get the resulting data back to the
> client for consolidation or aggregation, as an optional data object
> alongside the TaskResult object's status and info? For example,
> returning 1 or 2 KB of results from each of 5 task participants. This is
> similar to the map-reduce concept, but on a real-time basis, giving the
> client the opportunity to consolidate results.
>
> Regards,
> Maha
>
> On Aug 22, 2014, at 8:05 AM, kishore g <[email protected]> wrote:
>
> Not sure if you are subscribed to the mailing list
>
> ---------- Forwarded message ----------
> From: "Kanak Biscuitwala" <[email protected]>
> Date: Aug 21, 2014 10:02 AM
> Subject: RE: Helix parallelism
> To: "[email protected]" <[email protected]>
> Cc:
>
> Yes, you can use the task framework, which hasn't been released yet but
> will be soon. For more on the task framework, you can read this blog
> post:
> http://engineering.linkedin.com/distributed-systems/ad-hoc-task-management-apache-helix
>
> You can submit a job with 1000 tasks using either Java or YAML.
>
> The YAML specification of this job would look something like:
>
> name: MyWorkflow
> jobs:
>   - name: RunQueries
>     command: RunQuery  # The command corresponding to Task callbacks
>     jobConfigMap: {  # Arbitrary key-value pairs to pass to all tasks in this job
>       k1: "v1",
>       k2: "v2"
>     }
>     numConcurrentTasksPerInstance: 200  # Max parallelism per instance
>     tasks:  # Schedule 1000 tasks, each responsible for aggregating requests for a chunk of partitions
>       - taskConfigMap: {  # Arbitrary key-value pairs to pass to this task
>           query: "query1"
>         }
>       - taskConfigMap: {
>           query: "query2"
>         }
>       - taskConfigMap: {
>           query: "query3"
>         }
>       # Repeat for the remaining 997 tasks
>
> You can also see this class for an example of how to build jobs in Java:
> https://github.com/apache/helix/blob/master/helix-core/src/test/java/org/apache/helix/integration/task/TestIndependentTaskRebalancer.java
>
> Then you just need to implement a Task callback and register it on each
> of the instances, and Helix will take care of assignment and retries.
> ________________________________
> Date: Thu, 21 Aug 2014 09:07:11 -0700
> Subject: Helix parallelism
> From: [email protected]
> To: [email protected]
>
> Hi,
>
> I just started looking at Helix's ability to execute tasks in parallel,
> spread evenly across the cluster's instances and resources.
>
> I have a requirement to execute many different queries in parallel.
> Can Helix help in this case?
>
> For example:
> 1. I have some 1000 different queries to be executed.
> 2. I have 5 nodes configured in the Helix cluster, each capable of
>    executing a set of queries.
> 3. I need Helix to distribute these 1000 different queries evenly
>    across the 5 nodes (200 per node), take care of re-executing any
>    failed queries, and notify the controller when the job is done.
>
> Can someone help me understand how Helix can solve this kind of problem?
>
> Regards,
> Maha
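To make the Java option above concrete, here is a rough, untested sketch of building the same 1000-task job programmatically. The builder methods have moved around between Helix releases, so treat TestIndependentTaskRebalancer (linked above) as the authoritative reference:

import java.util.ArrayList;
import java.util.List;

import org.apache.helix.HelixManager;
import org.apache.helix.task.JobConfig;
import org.apache.helix.task.TaskConfig;
import org.apache.helix.task.TaskDriver;
import org.apache.helix.task.Workflow;

public class SubmitQueries {
  public static void submit(HelixManager manager) {
    // One TaskConfig per query; mirrors the taskConfigMap entries in the YAML.
    List<TaskConfig> taskConfigs = new ArrayList<TaskConfig>();
    for (int i = 1; i <= 1000; i++) {
      taskConfigs.add(new TaskConfig.Builder()
          .addConfig("query", "query" + i)
          .build());
    }

    JobConfig.Builder job = new JobConfig.Builder()
        .setCommand("RunQuery") // must match the registered TaskFactory name
        .setNumConcurrentTasksPerInstance(200)
        .addTaskConfigs(taskConfigs);

    Workflow workflow = new Workflow.Builder("MyWorkflow")
        .addJob("RunQueries", job)
        .build();

    // Hand the workflow to the controller for scheduling.
    new TaskDriver(manager).start(workflow);
  }
}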
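And the Task callback itself, plus its registration on each participant, might look roughly like this. QueryTask and executeQuery are placeholders of my own; only the "RunQuery" name has to match the job's command:

import java.util.HashMap;
import java.util.Map;

import org.apache.helix.HelixManager;
import org.apache.helix.participant.StateMachineEngine;
import org.apache.helix.task.Task;
import org.apache.helix.task.TaskCallbackContext;
import org.apache.helix.task.TaskFactory;
import org.apache.helix.task.TaskResult;
import org.apache.helix.task.TaskStateModelFactory;

public class QueryTask implements Task {
  private final String query;

  public QueryTask(TaskCallbackContext context) {
    // Each task receives its own taskConfigMap (e.g. query: "query1" from the YAML).
    this.query = context.getTaskConfig().getConfigMap().get("query");
  }

  @Override
  public TaskResult run() {
    try {
      String summary = executeQuery(query);
      // info is lightweight progress metadata; keep larger results elsewhere.
      return new TaskResult(TaskResult.Status.COMPLETED, summary);
    } catch (Exception e) {
      // Returning ERROR lets Helix retry the task per the job's retry settings.
      return new TaskResult(TaskResult.Status.ERROR, e.getMessage());
    }
  }

  @Override
  public void cancel() {
    // Stop the in-flight query here if possible.
  }

  private String executeQuery(String query) {
    return "ok:" + query; // stand-in for your actual query logic
  }

  // Register the factory on each participant, before manager.connect(),
  // so Helix can instantiate tasks when it assigns them.
  public static void register(HelixManager participant) {
    Map<String, TaskFactory> factories = new HashMap<String, TaskFactory>();
    factories.put("RunQuery", new TaskFactory() {
      @Override
      public Task createNewTask(TaskCallbackContext context) {
        return new QueryTask(context);
      }
    });
    StateMachineEngine engine = participant.getStateMachineEngine();
    engine.registerStateModelFactory("Task", new TaskStateModelFactory(participant, factories));
  }
}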
