You might also want to look at Gobblin, which uses Helix in a very similar way 
and is actually used to read data from HDFS, apply transformations, and load it 
into a remote store. 

Shirshanka


> On Jun 19, 2017, at 11:11 AM, kishore g <[email protected]> wrote:
> 
> That should work.
> 
>> On Mon, Jun 19, 2017 at 9:14 AM, Shekhar Bansal <[email protected]> 
>> wrote:
>> Thanks a lot, Kishore.
>> I think I can treat the HDFS directory as the resource and the modulo of each 
>> filename's hash as the task. Is there a better way of doing this in Helix?
>> 
>> Thanks
>> Shekhar
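
A minimal sketch of the hash-modulo assignment described above, assuming plain Java with no Helix dependency. The class name FilePartitioner, its method names, and the sample file names are hypothetical and only illustrate how a filename could be mapped to a fixed number of partitions (or tasks) of a single resource. How those partitions are then registered with Helix is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: maps file names to a fixed set of partition indexes.
public class FilePartitioner {

    // Returns a partition index in [0, numPartitions).
    // Math.floorMod keeps the result non-negative even when hashCode() is negative.
    static int partitionFor(String fileName, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        return Math.floorMod(fileName.hashCode(), numPartitions);
    }

    // Groups file names by partition index so each worker can pick up
    // the files that belong to the partitions it owns.
    static Map<Integer, List<String>> groupByPartition(List<String> fileNames, int numPartitions) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (String name : fileNames) {
            int partition = partitionFor(name, numPartitions);
            groups.computeIfAbsent(partition, k -> new ArrayList<>()).add(name);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Sample file names are made up for illustration.
        List<String> files = Arrays.asList("events-001.avro", "events-002.avro", "events-003.avro");
        groupByPartition(files, 4)
            .forEach((partition, names) -> System.out.println("partition " + partition + " -> " + names));
    }
}
```

Because the mapping depends only on the filename and the partition count, every instance computes the same assignment without coordination. Changing the partition count remaps most files, so it would have to stay fixed for the duration of a run.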
>> 
>> 
>> On Monday, June 19, 2017 8:15 PM, kishore g <[email protected]> wrote:
>> 
>> 
>> Currently, Helix ensures even distribution of partitions within a resource, 
>> not across resources. Is it possible for you to add tasks as part of the 
>> same resource?
>> 2 & 3: Yes, you can start the controller as part of your process. But since you 
>> said you launch this on Kubernetes every 5 minutes, I suggest keeping the 
>> controller and ZooKeeper running all the time. Controllers are lightweight, 
>> and you can get away with an entry-level container spec. It's OK to 
>> launch Helix participants every 5 minutes.
>> You should consider using the Helix Task Framework. It provides all the 
>> functionality you need.
>> 
>> 
>> On Mon, Jun 19, 2017 at 7:24 AM, Shekhar Bansal <[email protected]> 
>> wrote:
>> I have a standalone Java app (containerised). It reads data from HDFS, does 
>> some transformations, and writes the data to remote storage. I want to make it 
>> scalable by launching multiple instances of this Java app. My problem is how 
>> to assign tasks among these instances. Can Helix solve this problem?
>> 
>> If yes, can you please help me with the following:
>> 1. I referred to the Helix quickstart example and created 1 resource per file, 
>>    but node1 got assigned master for all resources. Is that because of the 
>>    simple StateModelDefinition used in the quickstart example, am I using it 
>>    the wrong way, or is it a limitation of Helix?
>> 2. I want to avoid running a separate controller process. If I start the 
>>    controller as part of setup, will Helix be able to elect a master 
>>    controller (in standalone mode)? Is it advisable to run tens of 
>>    controllers in distributed mode?
>> 3. I schedule my app every five minutes using Kubernetes cron. Is it 
>>    advisable to use Helix for such short-lived processes?
>> 
>> 
>> Thanks
>> Shekhar
>> 
>> 
>> 
> 
