Hi folks,

Here is how I understand the scan flow (a regular sequential scan from key A to key B):

- ZooKeeper is contacted for the RegionServer that hosts the -ROOT- region.
- The -ROOT- RS is contacted, and it gives you the RS for .META.
- .META. is contacted, and it gives you all regions for keys from A to B, e.g., A to A1 resides in reg1, A1 to A2 in reg2, and A2 to B in reg3.

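To make the last lookup step concrete, here is a toy Python sketch of how I picture the .META. query. It does not use the HBase client API. The META table, the region names, the server names (rs1 to rs4) and the function regions_for_scan are all made up for illustration, and listing exactly one server per region is an assumption of the sketch, not something I have confirmed.

```python
from bisect import bisect_right

# Hypothetical stand-in for .META.: (start_key, region_name, region_server),
# sorted by start key. Each region ends where the next one starts.
# One server per region is an assumption made for this sketch.
META = [
    ("", "reg0", "rs4"),
    ("A", "reg1", "rs1"),
    ("A1", "reg2", "rs2"),
    ("A2", "reg3", "rs3"),
    ("B", "reg4", "rs1"),
]


def regions_for_scan(start_key, stop_key):
    """Return the (region, server) pairs covering [start_key, stop_key)."""
    starts = [row[0] for row in META]
    # Index of the last region whose start key is <= start_key.
    i = bisect_right(starts, start_key) - 1
    hits = []
    # Walk forward while the region starts before the scan's stop key.
    while i < len(META) and META[i][0] < stop_key:
        hits.append((META[i][1], META[i][2]))
        i += 1
    return hits


if __name__ == "__main__":
    print(regions_for_scan("A", "B"))
```

With this made-up table, a scan from A to B touches reg1, reg2 and reg3 in key order, each on whatever server the table lists for it.
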
Now, if HDFS replication is set to 3, there must be 3 RSs that have reg1, and likewise for reg2 and reg3. So how does the client figure out which RS to go to? Or am I completely wrong here?

As a follow-up, if reg2 is present on RS1, RS2 and RS3, does the client get all the data from A1 to A2 from a single RS, or is there some sort of splitting, where A1 to A11 comes from RS1, A11 to A12 from RS2, and A12 to A2 from RS3? That would be faster, right?

Put another way, if my scan covers only one region, which is hosted on three RegionServers, does the data come from all three RSs or just one of them?

Thanks a lot,
Nick
