1. Currently, Helix ensures an even distribution of partitions within a resource, not across resources. Is it possible for you to add the tasks as part of the same resource?

2 & 3. Yes, you can start the controller as part of your process. But since you said you launch this on Kubernetes every 5 minutes, I suggest keeping the controller and ZooKeeper running all the time. Controllers are lightweight, and you can get away with an entry-level container spec. It's OK to launch Helix participants every 5 minutes.
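To make the first point concrete, here is a toy sketch in plain Java. It does not use Helix and is not Helix's actual rebalancer; it only assumes that each resource is balanced on its own, with no knowledge of other resources. The class name `PartitionSpreadSketch`, the method `assign`, and the resource and node names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: NOT Helix code and NOT Helix's real placement algorithm.
class PartitionSpreadSketch {

    // Spread the partitions of ONE resource round-robin over the nodes.
    // Each call starts from the first node and knows nothing about
    // partitions that belong to other resources.
    static Map<String, List<String>> assign(String resource, int partitions, List<String> nodes) {
        Map<String, List<String>> byNode = new LinkedHashMap<>();
        for (String node : nodes) {
            byNode.put(node, new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            String node = nodes.get(p % nodes.size());
            byNode.get(node).add(resource + "_" + p);
        }
        return byNode;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("node1", "node2", "node3");

        // One resource per file, one partition each: every resource is
        // balanced in isolation, so node1 ends up with all of them.
        for (String file : Arrays.asList("fileA", "fileB", "fileC")) {
            System.out.println(assign(file, 1, nodes));
        }

        // One resource with three partitions: the partitions are spread
        // across the three nodes.
        System.out.println(assign("files", 3, nodes));
    }
}
```

In this model, three single-partition resources all land on node1, while one resource with three partitions gives each node one partition. That matches the behaviour described in the question.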
You should consider using the Helix Task Framework <http://helix.apache.org/0.6.7-docs/tutorial_task_framework.html>. It provides all the functionality you need.

On Mon, Jun 19, 2017 at 7:24 AM, Shekhar Bansal <[email protected]> wrote:

> I have a standalone Java app (containerised). It reads data from HDFS, does
> some transformations, and writes the data to remote storage. I want to make
> it scalable by launching multiple instances of this Java app. My problem is
> how to assign tasks among these instances. Can Helix solve this problem?
>
> If yes, can you please help me with the following:
>
> 1. I referred to the Helix quickstart example and created 1 resource per
>    file, but node1 got assigned master for all resources. Is it because of
>    the simple StateModelDefinition used in the quickstart example, am I
>    using it the wrong way, or is it a limitation of Helix?
> 2. I want to avoid running a separate controller process. If I start a
>    controller as part of setup, will Helix be able to elect a master
>    controller (in standalone mode)? Is it advisable to run tens of
>    controllers in distributed mode?
> 3. I schedule my app every five minutes using a Kubernetes cron. Is it
>    advisable to use Helix for such short-lived processes?
>
> Thanks
> Shekhar
