You might also want to look at Gobblin, which uses Helix in a very similar way and is actually used to read data from HDFS, apply transformations, and load the results into a remote store.
Shirshanka

> On Jun 19, 2017, at 11:11 AM, kishore g <[email protected]> wrote:
>
> That should work.
>
>> On Mon, Jun 19, 2017 at 9:14 AM, Shekhar Bansal <[email protected]>
>> wrote:
>> Thanks a lot Kishor.
>> I think I can treat the HDFS directory as a resource and the modulo of
>> each filename's hash as the tasks. Is there any better way of doing it
>> in Helix?
>>
>> Thanks
>> Shekhar
>>
>>
>> On Monday, June 19, 2017 8:15 PM, kishore g <[email protected]> wrote:
>>
>>
>> Currently, Helix ensures even distribution of partitions within a
>> resource, not across resources. Is it possible for you to add tasks as
>> part of the same resource?
>>
>> Yes, you can start the controller as part of your process. But since you
>> said you launch this on Kubernetes every 5 minutes, I suggest keeping
>> the controller and ZooKeeper running all the time. Controllers are
>> lightweight and you can get away with a very entry-level container
>> spec. It's OK to launch Helix Participants every 5 minutes.
>>
>> You should consider using the Helix Task Framework. It provides all the
>> functionality you need.
>>
>>
>> On Mon, Jun 19, 2017 at 7:24 AM, Shekhar Bansal <[email protected]>
>> wrote:
>> I have a standalone Java app (containerised). It reads data from HDFS,
>> does some transformations, and writes the data to remote storage. I want
>> to make it scalable by launching multiple instances of this Java app. My
>> problem is how to assign tasks among these instances. Can Helix solve
>> this problem?
>>
>> If yes, can you please help me with the following:
>>
>> I referred to the Helix quickstart example and created 1 resource per
>> file, but node1 got assigned master for all resources. Is it because of
>> the simple StateModelDefinition used in the quickstart example, or am I
>> using it the wrong way, or is it some limitation of Helix?
>>
>> I want to avoid running a separate controller process. If I start the
>> controller as part of setup, will Helix be able to elect a master
>> controller (in standalone mode)? Is it advisable to run tens of
>> controllers in distributed mode?
>>
>> I schedule my app every five minutes using a Kubernetes cron. Is it
>> advisable to use Helix for such short-lived processes?
>>
>>
>> Thanks
>> Shekhar
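For reference, the hash-modulo idea discussed in the thread (one resource for the HDFS directory, with each file mapped to a partition by the modulo of its filename's hash) could look roughly like the sketch below. It is plain Java and makes no Helix calls. The class name FilePartitioner, its method names, and the sample file names are all hypothetical, and how a partition index is then tied to a Helix partition or task is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Helix: groups file names into a fixed
// number of partitions using the modulo of each name's hash code.
public class FilePartitioner {

    // Returns a partition index in the range [0, numPartitions).
    // Math.floorMod keeps the result non-negative even when hashCode() is negative.
    public static int partitionFor(String fileName, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        return Math.floorMod(fileName.hashCode(), numPartitions);
    }

    // Groups file names by partition index; partitions with no files are omitted.
    public static Map<Integer, List<String>> groupByPartition(List<String> fileNames,
                                                              int numPartitions) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (String name : fileNames) {
            int partition = partitionFor(name, numPartitions);
            groups.computeIfAbsent(partition, k -> new ArrayList<>()).add(name);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Sample file names are made up for illustration.
        List<String> files = Arrays.asList(
                "part-00000.avro", "part-00001.avro", "part-00002.avro", "part-00003.avro");
        groupByPartition(files, 4).forEach((partition, names) ->
                System.out.println("partition " + partition + " -> " + names));
    }
}
```

Because String.hashCode() is deterministic, every instance computes the same file-to-partition mapping without coordination. The split is only as even as the hash values of the actual file names, and changing the partition count moves most files to a different partition.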
