How would you define "low-slotted"? A reduced slot count to avoid a high number of mappers?
On Sat, Aug 25, 2012 at 3:32 PM, Harsh J <[email protected]> wrote:
> Hi Marc,
>
> On Sat, Aug 25, 2012 at 12:56 AM, Marc Sturlese <[email protected]> wrote:
>> The reasons for that would be:
>> - After running a full compaction, HFiles end up on the RS nodes, so I would achieve data locality.
>> - As I have a replication factor of 3 and just 2 HBase nodes, I know that no map task would try to read from the RS nodes. The reduce tasks will write first to the node where they run (which will never be an RS node).
>> - So, on the RS nodes I would end up with the HBase tables and block replicas of the MR jobs that will never be read (as maps use data locality and at least one replica of each block will be on an MR node).
>
> Just to keep in mind: all HBase read/write requests are made via the RS. The blocks of HDFS data held by the RS are not directly accessed by any client (the RS is THE data server for HBase clients).
>
>> In case this works, if I add more nodes with an RS and a datanode, could I guarantee that no map task would ever read from them? (Assuming that a reduce task always writes first to the node where it runs; please correct me if I'm wrong, as I'm not sure about this.)
>
> Yes, you can guarantee this to a certain extent. If data locality is absent for some tasks (due to scheduling constraints), a few blocks may be read from the RS nodes' DNs, but that shouldn't have a big impact, given that a good MR scheduler usually helps avoid it.
>
> Alternatively, you can also consider running low-slotted TTs to make use of the RS machines, but in a safer way.
>
> --
> Harsh J

--
Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me
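
For context, a "low-slotted" TT is presumably a TaskTracker configured with very few map/reduce slots, so the RS machines contribute some MR capacity without being saturated by tasks. A minimal sketch of what that could look like in mapred-site.xml on the RS machines, assuming Hadoop 1.x (MRv1) property names; the slot counts of 1 are illustrative, not a recommendation:

    <!-- mapred-site.xml on the RS machines (Hadoop 1.x / MRv1).
         One slot per task type keeps the TT "low-slotted". -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>1</value>   <!-- default is 2 -->
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>1</value>   <!-- default is 2 -->
    </property>

To verify where block replicas actually land (and hence whether map tasks could ever be scheduled against the RS nodes' datanodes), something like "hadoop fsck /path -files -blocks -locations" lists the datanodes holding each replica.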
