Hi all,

We're currently moving to Hadoop 2 (years behind, I know) and debating how to
handle job resource management with YARN, given that nearly 100% of our jobs
are maps over HBase tables and a large portion also reduce into HBase. While
YARN adequately manages the resources of the machine its tasks run on, for
TableMapper jobs the resources are actually consumed on the remote
RegionServer, which YARN doesn't seem to be able to account for. On Hadoop 1
we implemented things such as per-job concurrent task limits to help deal
with this, but that seems hard to do in Hadoop 2. Does anyone have best
practices or ideas for handling an all-HBase workload that is heavily bound
by I/O and by RegionServer memory and RPC capacity? Thanks in advance!

--Ian
