persistent services in Hadoop

John Lilley Wed, 25 Jun 2014 13:49:12 -0700

We are an ISV that currently ships a data-quality/integration suite running as 
a native YARN application.  We are finding several use cases that would benefit 
from being able to manage a per-node persistent service.  MapReduce has its 
"shuffle auxiliary service", but it isn't straightforward to add auxiliary 
services because they cannot be loaded from HDFS, so we'd have to manage the 
distribution of JARs across nodes (please tell me if I'm wrong here...).  Given 
that, is there a preferred method for managing persistent services on a Hadoop 
cluster?  We could have an AM that requests a set of YARN containers, waits 
until YARN grants one on each node, and restarts any that fail, but that 
doesn't really fit the AM/container structure very well.  I've also read about 
Slider, which looks interesting.  Other ideas?
--john
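
For illustration, the AM-based option above comes down to tracking which nodes currently lack a live service container.  Below is a minimal sketch of that bookkeeping in plain Java.  The class and method names (PerNodeServiceTracker, containerStarted, and so on) are hypothetical and not part of YARN.  In a real AM, each node returned by nodesNeedingContainer() would presumably become a host-specific container request (for example an AMRMClient.ContainerRequest with locality relaxation disabled), and the two callbacks would be driven by the allocation and completion events the AM receives.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Bookkeeping for a "one service container per node" application master.
 * Hypothetical helper: none of these names come from YARN itself.
 */
public class PerNodeServiceTracker {
    // Every node that should host exactly one service instance.
    private final Set<String> targetNodes = new TreeSet<>();
    // Container id -> node on which that container is currently running.
    private final Map<String, String> running = new TreeMap<>();

    public PerNodeServiceTracker(Collection<String> nodes) {
        targetNodes.addAll(nodes);
    }

    /** Record that a container was granted and started on the given node. */
    public void containerStarted(String containerId, String node) {
        running.put(containerId, node);
    }

    /** Record that a container exited; its node becomes uncovered again. */
    public void containerCompleted(String containerId) {
        running.remove(containerId);
    }

    /** Nodes with no live service container; the AM should (re)request these. */
    public Set<String> nodesNeedingContainer() {
        Set<String> missing = new TreeSet<>(targetNodes);
        missing.removeAll(running.values());
        return missing;
    }

    public static void main(String[] args) {
        PerNodeServiceTracker tracker =
                new PerNodeServiceTracker(Arrays.asList("node1", "node2", "node3"));
        tracker.containerStarted("container_01", "node1");
        tracker.containerStarted("container_02", "node2");
        tracker.containerStarted("container_03", "node3");
        tracker.containerCompleted("container_02"); // simulate a failure on node2
        System.out.println("Re-request on: " + tracker.nodesNeedingContainer());
    }
}
```

The sketch deliberately leaves out the awkward parts (nodes joining or leaving the cluster, a node that never has free capacity, AM restarts), which are the reasons this approach does not fit the AM/container structure well.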
