Hi all, I have a small Hadoop and HBase cluster with 4 nodes, all acting as both datanodes and regionservers, with replication set to 3. I am bulk loading data into HBase using the importtsv program, writing heavily to one table that initially had no data and only 1 region. I'll call this TableA.
I also already had a table in HBase (TableB) with about 400 regions, evenly distributed across the four nodes. Here is the behavior I am observing during the bulk import: initially, one regionserver was assigned the regions for TableA, so it received all the initial write requests. When the region counts became unbalanced across the four nodes, regions from TableB (my old table) were reassigned to the other regionservers instead of any regions from my newer table (TableA). As a result, that one node keeps receiving all the requests, which is slowing down my import.

How does HBase decide which regions to reassign when balancing, or is the choice relatively arbitrary? Is there anything I can do at this point to force regions of TableA to be assigned to other regionservers?
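The behavior described above is consistent with a balancer that looks only at how many regions each server holds, not at which table those regions belong to. The sketch below is a toy model under that assumption, not HBase's actual balancer code: the function `balance_by_count`, the server names `rs1`..`rs4`, and the rule of moving the oldest region first are all hypothetical simplifications. With 100 TableB regions per server plus 8 new TableA regions on `rs1`, evening out the counts moves only TableB regions, and all TableA regions stay on the hot server.

```python
def balance_by_count(assignment):
    """Toy, count-only balancer (hypothetical, not HBase code).

    assignment maps server name -> list of (table, region_id) tuples.
    Regions are moved from the busiest server to the least busy one until
    per-server region counts differ by at most one. The table a region
    belongs to is never considered.
    """
    moves = []
    while True:
        busiest = max(assignment, key=lambda s: len(assignment[s]))
        idlest = min(assignment, key=lambda s: len(assignment[s]))
        if len(assignment[busiest]) - len(assignment[idlest]) <= 1:
            return moves
        # Assumption for illustration: take the oldest region on the busy
        # server, which here is always an old TableB region.
        region = assignment[busiest].pop(0)
        assignment[idlest].append(region)
        moves.append((region, busiest, idlest))


# 4 servers, 100 TableB regions each; rs1 also hosts 8 new TableA regions.
cluster = {f"rs{i}": [("tableB", f"b{i}-{j}") for j in range(100)] for i in range(1, 5)}
cluster["rs1"] += [("tableA", f"a-{j}") for j in range(8)]

moves = balance_by_count(cluster)
moved_tables = {region[0] for region, _, _ in moves}
tablea_on_rs1 = sum(1 for table, _ in cluster["rs1"] if table == "tableA")
print(len(moves), moved_tables, tablea_on_rs1)
```

In this model every server ends up with 102 regions, so the cluster looks balanced by count, yet all write traffic for TableA still lands on `rs1`.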
