How does scan work internally? Does it make use of multi-threading/replication?

IGZ Nick Mon, 18 Jun 2012 11:17:02 -0700

Hi folks,

Here is how I understand the scan flow (a regular sequential scan from key
A to key B):
- ZooKeeper is contacted for the RegionServer that hosts the -ROOT- region.
- The -ROOT- RS is contacted, and it gives you the RS for .META.
- The .META. RS is contacted, and it gives you all regions for keys from A
to B - e.g., A to A1 resides in reg1, A1 to A2 in reg2, A2 to B in reg3.
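
To make the last step concrete, here is a rough Python sketch of the lookup as I picture it. It is only a toy model, not the HBase client API: META, regions_for_scan and the server names rs1..rs3 are made up for illustration, and it lists a single server per region purely for simplicity (whether that matches reality is part of my question).

```python
from bisect import bisect_right

# Toy stand-in for .META.: (start_key, end_key, region, server), sorted by
# start_key. All names here are hypothetical, for illustration only.
META = [
    ("A", "A1", "reg1", "rs1"),
    ("A1", "A2", "reg2", "rs2"),
    ("A2", "B", "reg3", "rs3"),
]


def regions_for_scan(meta, start_row, stop_row):
    """Return the regions overlapping the key range [start_row, stop_row)."""
    starts = [entry[0] for entry in meta]
    # The region holding start_row is the last one whose start key <= start_row.
    i = max(bisect_right(starts, start_row) - 1, 0)
    hits = []
    # Walk forward while regions still begin before the scan's stop row.
    while i < len(meta) and meta[i][0] < stop_row:
        hits.append(meta[i])
        i += 1
    return hits


if __name__ == "__main__":
    for start_key, end_key, region, server in regions_for_scan(META, "A", "B"):
        print(f"[{start_key}, {end_key}) -> {region} on {server}")
```

Keys are compared as plain strings here, so A1 < A11 < A12 < A2, which is the ordering I am assuming in the rest of this mail.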


Now, if HDFS replication is set to 3, there must be 3 RSs that each have
reg1, and likewise for reg2 and reg3. So how does the client figure out
which RS to go to? Or am I completely wrong here?

As a follow-up, if reg2 is present on RS1, RS2 and RS3, does the
client get all the data from A1 to A2 from a single RS, or is there some
sort of splitting, where A1 to A11 comes from RS1, A11 to A12 from RS2 and
A12 to A2 from RS3? That would be faster, right? Put another way, if my
scan covers only one region, which is hosted on three RegionServers,
does the data come in from all 3 RSs or just one of them?

Thanks a lot,
Nick
