I don't think splitting will help. Adding more mappers per TaskTracker will use more resources (heap memory).

By the way, how can I view the average region size? This is what I found in the web UI:

ServerName                      Num. Stores  Num. Storefiles  Storefile Size Uncompressed  Storefile Size  Index Size  Bloom Size
mphbase1,60020,1403089429986    38           119              75857m                       75882mb         57839k      189016k
mphbase2,60020,1403089433406    37           124              92252m                       92281mb         69323k      248532k
mphbase3,60020,1403088177813    40           53               35603m                       35613mb         29471k      63010k
mphbase4,60020,1403088177030    38           118              64880m                       64898mb         50425k      167232k
mphbase5,60020,1403088177302    38           99               52233m                       52250mb         39737k      138680k
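A rough estimate can be worked out from the numbers above. The sketch below only averages the "Storefile Size" column over the "Num. Stores" column, so it gives an average per store, not per region. It assumes the "mb" values are megabytes and that a store is one column family of one region; with a single column family per table the two averages would be the same, but the 191 stores listed here are more than the 80 regions of the table, so other tables or extra column families are probably included.

```python
# (server, num_stores, storefile_size_mb) copied from the web UI figures above.
# Assumption: the "mb" suffix means megabytes.
servers = [
    ("mphbase1", 38, 75882),
    ("mphbase2", 37, 92281),
    ("mphbase3", 40, 35613),
    ("mphbase4", 38, 64898),
    ("mphbase5", 38, 52250),
]

total_mb = sum(size_mb for _, _, size_mb in servers)
total_stores = sum(stores for _, stores, _ in servers)

# Average size of one store across the whole cluster.
avg_store_mb = total_mb / total_stores

print(f"total storefile size: {total_mb} MB")
print(f"total stores:         {total_stores}")
print(f"average per store:    {avg_store_mb:.1f} MB")
```

This comes to roughly 1.6 GB per store. It is only an approximation of the region size for the table in question, since the per-server figures are not broken down by table.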
On Thu, Jun 26, 2014 at 9:11 PM, Ted Yu <[email protected]> wrote:
> 80 regions over 5 nodes - that's 16 per server.
>
> How big is average region size ?
> Have you considered splitting existing regions ?
>
> Cheers
>
> On Jun 26, 2014, at 12:34 AM, Li Li <[email protected]> wrote:
>
>> my table has about 700 million rows and about 80 regions. each task
>> tracker is configured with 4 mappers and 4 reducers at the same time.
>> The hadoop/hbase cluster has 5 nodes so at the same time, it has 20
>> mappers running. it takes more than an hour to finish mapper stage.
>> The hbase cluster's load is very low, about 2,000 request per second.
>> I think one mapper for a region is too small. How can I run more than
>> one mapper for a region so that it can take full advantage of
>> computing resources?
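One way to get more than one mapper per region, as asked in the original question, is to cut each region's row-key range into several sub-ranges and hand each sub-range to its own input split. The sketch below shows only the range arithmetic, in plain Python rather than the HBase API. The function name split_key_range is hypothetical. It assumes row keys are spread fairly evenly when read as byte strings, and that both boundaries are non-empty (the first and last regions of a table have an empty boundary and would need an explicit start or end key).

```python
def split_key_range(start, end, parts):
    """Cut the row-key range [start, end) into up to `parts` contiguous sub-ranges.

    Keys are treated as big-endian integers after right-padding with zero
    bytes to a common width. This is an illustrative assumption, not HBase code.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    if not start or not end or start >= end:
        raise ValueError("need non-empty keys with start < end")

    width = max(len(start), len(end))
    lo = int.from_bytes(start.ljust(width, b"\x00"), "big")
    hi = int.from_bytes(end.ljust(width, b"\x00"), "big")

    # Evenly spaced boundaries between lo and hi, both ends included.
    bounds = [lo + (hi - lo) * i // parts for i in range(parts + 1)]
    keys = [b.to_bytes(width, "big") for b in bounds]

    # Keep the caller's exact outer boundaries instead of the padded ones.
    keys[0], keys[-1] = start, end

    # Drop empty sub-ranges, which appear when the range is narrower than `parts`.
    return [(a, b) for a, b in zip(keys[:-1], keys[1:]) if a < b]


if __name__ == "__main__":
    for sub_start, sub_end in split_key_range(b"row000", b"row999", 4):
        print(sub_start, sub_end)
```

In an actual job this logic would presumably live in a custom TableInputFormat whose getSplits() turns each per-region split into several smaller ones. Whether it helps depends on whether the mappers are CPU-bound or waiting on the region servers.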
