Thank you all for your valuable responses. Lars, you are right. The root of the problem was my misunderstanding of which system handles the blocks. I was thinking about data locality in HDFS, which works transparently underneath HBase. Things make a lot more sense now :)
Ben

On Thu, Jun 21, 2012 at 9:45 PM, Michael Segel <[email protected]> wrote:

> While data locality is nice, you may see it becoming less of a bonus or issue.
>
> With Co-processors available, indexing becomes viable. So you may see things where within the M/R you process a row from table A, maybe hit an index to find a value in table B, and then do some processing.
>
> There's more to it, like using the intersection of indexes across a common key to join data, which blows the whole issue of data locality out of the water. But that's a longer discussion requiring a whiteboard, and some alcohol or coffee.... ;-)
>
>
> On Jun 21, 2012, at 5:07 AM, Lars George wrote:
>
> > Hi Ben,
> >
> > According to your fsck dump, the first copy is located on hadoop-143, which has all the blocks for the region. So if you check, I would assume that the region is currently open and served by hadoop-143, right?
> >
> > The TableInputFormat getSplits() will report that server to the MapReduce framework, so the task would run on that node to access the data locally. *Only* if you have speculative execution turned on will you have a task that is run on another random node in parallel, which would need to do heaps of remote reads. That is why it is recommended to turn that off for HBase and MapReduce in combination.
> >
> > Lars
> >
> > On Jun 21, 2012, at 6:57 AM, Ben Kim wrote:
> >
> >> Hi Lars,
> >> I appreciate your reply a lot.
> >>
> >> As you said, a regionserver writes its HFiles so that all data blocks are located on the same physical machine, unless the regionserver fails.
> >> I ran the following hadoop command to see the block locations of an HFile:
> >>
> >> *hadoop fsck /hbase/testtable/9488ef7fbd23b62b9bf85b722c015e90/testcf/08dc1940944b4952b23f0cbee51bcea8 -files -locations -blocks*
> >>
> >> Here is the output...
> >>
> >>> FSCK started by hadoop from /203.235.211.142 for path /hbase/testtable/9488ef7fbd23b62b9bf85b722c015e90/testcf/08dc1940944b4952b23f0cbee51bcea8 at Thu Jun 21 13:40:11 KST 2012
> >>> /hbase/testtable/9488ef7fbd23b62b9bf85b722c015e90/testcf/08dc1940944b4952b23f0cbee51bcea8 727156659 bytes, 11 block(s): OK
> >>> 0. blk_1832396139416350298_1296638 len=67108864 repl=3 [hadoop-145:50010, hadoop-143:50010, hadoop-144:50010]
> >>> 1. blk_8910330590545256327_1296640 len=67108864 repl=3 [hadoop-143:50010, hadoop-157:50010, hadoop-159:50010]
> >>> 2. blk_-3868612696419011016_1296640 len=67108864 repl=3 [hadoop-145:50010, hadoop-143:50010, hadoop-156:50010]
> >>> 3. blk_-7551946394410945015_1296640 len=67108864 repl=3 [hadoop-145:50010, hadoop-143:50010, hadoop-157:50010]
> >>> 4. blk_-1875839158119319613_1296640 len=67108864 repl=3 [hadoop-145:50010, hadoop-143:50010, hadoop-146:50010]
> >>> 5. blk_-6953623390282045248_1296640 len=67108864 repl=3 [hadoop-143:50010, hadoop-157:50010, hadoop-144:50010]
> >>> 6. blk_-3016727256928339770_1296640 len=67108864 repl=3 [hadoop-143:50010, hadoop-146:50010, hadoop-159:50010]
> >>> 7. blk_3526351456802007773_1296640 len=67108864 repl=3 [hadoop-143:50010, hadoop-160:50010, hadoop-156:50010]
> >>> 8. blk_5134681308608742320_1296640 len=67108864 repl=3 [hadoop-145:50010, hadoop-143:50010, hadoop-144:50010]
> >>> 9. blk_-6875541109589395450_1296640 len=67108864 repl=3 [hadoop-145:50010, hadoop-143:50010, hadoop-156:50010]
> >>> 10. blk_-553661064097182668_1296640 len=56068019 repl=3 [hadoop-143:50010, hadoop-146:50010, hadoop-159:50010]
> >>>
> >>> Status: HEALTHY
> >>> Total size: 727156659 B
> >>> Total dirs: 0
> >>> Total files: 1
> >>> Total blocks (validated): 11 (avg. block size 66105150 B)
> >>> Minimally replicated blocks: 11 (100.0 %)
> >>> Over-replicated blocks: 0 (0.0 %)
> >>> Under-replicated blocks: 0 (0.0 %)
> >>> Mis-replicated blocks: 0 (0.0 %)
> >>> Default replication factor: 3
> >>> Average block replication: 3.0
> >>> Corrupt blocks: 0
> >>> Missing replicas: 0 (0.0 %)
> >>> Number of data-nodes: 9
> >>> Number of racks: 1
> >>> FSCK ended at Thu Jun 21 13:40:11 KST 2012 in 4 milliseconds
> >>
> >> As you see, data blocks of the HFile are stored across two different datanodes (hadoop-145 and hadoop-143).
> >>
> >> Let's say a map task runs on hadoop-145 and needs to access block 7. Then the map task needs to remotely access block 7 on the hadoop-143 server. Almost half of the data blocks are stored & accessed remotely. Judging from the above example, it's hard to say that data locality is being applied to HBase.
> >>
> >> Ben
> >>
> >>
> >> On Fri, Jun 15, 2012 at 5:21 PM, Lars George <[email protected]> wrote:
> >>
> >>> Hi Ben,
> >>>
> >>> See inline...
> >>>
> >>> On Jun 15, 2012, at 6:56 AM, Ben Kim wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I've been posting questions to the mailing list quite often lately, and here goes another one, about data locality.
> >>>> I read the excellent blog post about data locality that Lars George wrote at http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html
> >>>>
> >>>> I understand data locality in HBase as locating a region on a region server where most of its data blocks reside.
> >>>
> >>> The opposite is happening, i.e. the region server process triggers for all data it writes to be located on the same physical machine.
> >>>
> >>>> So that way fast data access is guaranteed when running a MR job, because each map/reduce task is run for each region on the tasktracker where the region co-locates.
> >>>
> >>> Correct.
> >>>
> >>>> But what if the data blocks of the region are evenly spread over multiple region servers?
> >>>
> >>> This will not happen unless the original server fails. Then the region is moved to another that now needs to do a lot of remote reads over the network. This is why there is work being done to allow for custom placement policies in HDFS. That way you can store the entire region and all copies as complete units on three data nodes. In case of a failure you can then move the region to one of the two copies. This is not available yet though, but it is being worked on (so I heard).
> >>>
> >>>> Does a MR task have to remotely access the data blocks from other regionservers?
> >>>
> >>> For the above failure case, it would be the region server accessing the remote data, yes.
> >>>
> >>>> How good is HBase at locating data blocks where a region resides?
> >>>
> >>> That is again the wrong way around. HBase has no clue as to where blocks reside, nor does it know that the file system in fact uses separate blocks. HBase stores files; HDFS does the block magic under the hood, transparent to HBase.
> >>>
> >>>> Also, is it correct to say that if I set a smaller data block size data locality gets worse, and if the data block size gets bigger data locality gets better?
> >>>
> >>> This is not applicable here; I am assuming this stems from the above confusion about which system is handling the blocks, HBase or HDFS. See above.
> >>>
> >>> HTH,
> >>> Lars
> >>>
> >>>>
> >>>> Best regards,
> >>>> --
> >>>>
> >>>> *Benjamin Kim*
> >>>> *benkimkimben at gmail*
> >>>
> >>>
> >>
> >>
> >> --
> >>
> >> *Benjamin Kim*
> >> *benkimkimben at gmail*
> >
> >

--
*Benjamin Kim*
*benkimkimben at gmail*
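[Editor's note: for readers checking locality on their own clusters, the per-datanode tally behind Ben's observation can be automated. The sketch below is a hypothetical helper (the function name and sample are made up, not part of the thread) that parses lines from a `hadoop fsck ... -blocks -locations` dump and counts how many of the file's blocks have a replica on each datanode; a host holding a replica of every block, like hadoop-143 above, can serve the whole HFile locally.]

```python
import re
from collections import Counter

def replica_counts(fsck_output):
    """Count, per datanode host, how many blocks have a replica there.

    Expects block lines of the form produced by `hadoop fsck -blocks -locations`:
      0. blk_123_456 len=67108864 repl=3 [host-a:50010, host-b:50010, host-c:50010]
    Returns (Counter of host -> block count, total number of blocks seen).
    """
    counts = Counter()
    total = 0
    for line in fsck_output.splitlines():
        m = re.match(r"\d+\.\s+blk_\S+\s+len=\d+\s+repl=\d+\s+\[(.*)\]", line.strip())
        if not m:
            continue  # skip header/summary lines
        total += 1
        for replica in m.group(1).split(","):
            host = replica.strip().split(":")[0]  # drop the :50010 port
            counts[host] += 1
    return counts, total

# Two block lines from Ben's dump, as a small worked sample:
sample = """\
0. blk_1832396139416350298_1296638 len=67108864 repl=3 [hadoop-145:50010, hadoop-143:50010, hadoop-144:50010]
1. blk_8910330590545256327_1296640 len=67108864 repl=3 [hadoop-143:50010, hadoop-157:50010, hadoop-159:50010]
"""
counts, total = replica_counts(sample)
print(counts["hadoop-143"], "of", total)  # → 2 of 2: hadoop-143 holds a replica of every block
```

Run over the full 11-block dump, this shows hadoop-143 at 11 of 11 while every other host holds only a subset, which is exactly the pattern Lars describes: the region server's own datanode always receives one replica of each block it writes.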
