Hi all,
Currently, when I start up my cluster, I use the packaged scripts for starting Hadoop (v2.6.0) and HBase (0.98.9-hadoop2). This results in the following processes running:

HADOOP:
  hadoop --- /usr/java/jre1.7.0_65/bin/java -Dproc_namenode --- org.apache.hadoop.hdfs.server.namenode.NameNode
  hadoop --- /usr/java/jre1.7.0_65/bin/java -Dproc_datanode --- org.apache.hadoop.hdfs.server.datanode.DataNode
  hadoop --- /usr/java/jre1.7.0_65/bin/java -Dproc_secondarynamenode --- org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
  hadoop --- /usr/java/jre1.7.0_65/bin/java -Dproc_resourcemanager --- org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
  hadoop --- /usr/java/jre1.7.0_65/bin/java -Dproc_nodemanager --- org.apache.hadoop.yarn.server.nodemanager.NodeManager

HBASE:
  hadoop --- /usr/java/jre1.7.0_65/bin/java -Dproc_master --- org.apache.hadoop.hbase.master.HMaster
  hadoop --- /usr/java/jre1.7.0_65/bin/java -Dproc_regionserver --- org.apache.hadoop.hbase.regionserver.HRegionServer

My understanding is that the Hadoop processes ResourceManager and NodeManager are part of Hadoop YARN (correct me if I'm wrong).

If all I want to do with my cluster is put/get/scan directly with HBase and upsert/select/join with Phoenix, do I need to have YARN running? Does Phoenix use the YARN framework to distribute parts of queries, for example when I count the number of rows in a table? Or will it just scan through and count the rows directly in HBase?

Thanks!
Matt
