Hi Marc,

On Sat, Aug 25, 2012 at 12:56 AM, Marc Sturlese <[email protected]> wrote:
> The reasons for that would be:
> -After running full compaction, HFiles end up in the RS nodes, so would
> achieve data locality.
> -As I have replication factor 3 and just 2 Hbase nodes, I know that no map
> task would try to read in the RS nodes. The reduce tasks will write first in
> the node where they exist (which will never be a RS node).
> -So, in the RS I would end up having the Hbase tables and block replicas of
> the MR jobs that will never be read (as Maps do data locality and at least a
> replica of each block will be in a MR node)

Just to keep in mind: all HBase read/write requests go through the RS.
The HDFS data blocks held by the RS aren't directly accessed by any
client (the RS is THE data server for HBase clients).

> In case this would work, if I add more nodes with RS and datanode, could I
> guarantee that no map task would ever read in them? (assuming that a reduce
> task always writes first in the node where it exists, correct me if I'm
> wrong please as I'm not sure about this).

Yes, you can guarantee this to a certain extent. If some tasks lack
data locality (due to scheduling constraints), a few blocks may be read
from the DNs on the RS nodes, but that shouldn't have a big impact, given
that a good MR scheduler usually helps avoid having to do that.

Alternatively, you can also consider running low-slotted TTs on the RS
machines, to make use of them but in a safer way.

--
Harsh J
