Furthermore, what can we do if a table has 25 online regions? Can we safely set the caching to a larger value, or is a split still necessary as well?
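For reference, this is roughly what I mean by "setting the caching" - either through the job configuration key mentioned below (hbase.client.scanner.caching) or directly on the Scan. The table name, the RowKeyMapper, and the value of 5000 are just placeholders for illustration, not my actual job:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class ScanCachingExample {

  // Placeholder mapper: just emits the row key of every row it scans.
  static class RowKeyMapper extends TableMapper<Text, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws java.io.IOException, InterruptedException {
      context.write(new Text(row.get()), NullWritable.get());
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();

    // Option 1: set scanner caching for the whole job via the config key.
    conf.setInt("hbase.client.scanner.caching", 5000);  // placeholder value

    // Option 2: set it on the Scan that feeds the TableMapper.
    Scan scan = new Scan();
    scan.setCaching(5000);       // rows fetched per RPC from the region server
    scan.setCacheBlocks(false);  // don't fill the block cache during a full scan

    Job job = new Job(conf, "scan-caching-example");
    job.setJarByClass(ScanCachingExample.class);
    TableMapReduceUtil.initTableMapperJob(
        "my_table",              // placeholder table name
        scan,
        RowKeyMapper.class,
        Text.class,
        NullWritable.class,
        job);
    job.setNumReduceTasks(0);    // map-only, just for the illustration
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Either option works; a value set on the Scan applies only to that scan, while the config key is the default for every scanner the job's clients create.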
On Mon, Aug 26, 2013 at 2:42 PM, Pavan Sudheendra <[email protected]> wrote:

> Hi Ashwanth, thanks for the reply.
>
> I went to the HBase web UI and saw that my table has 1 online region.
> Can you please guide me on how to do the split on this table? I see the
> UI asking for a region key and a split button. How many splits can I make
> exactly? Can I give two different keys and assume that the table is then
> split into 3: one region from the beginning to key1, one from key1 to key2,
> and one from key2 to the end?
>
>
> On Mon, Aug 26, 2013 at 2:36 PM, Ashwanth Kumar <[email protected]> wrote:
>
>> setCaching sets the value via the API; the other way is to set it in the
>> job configuration using the key "hbase.client.scanner.caching".
>>
>> I just realized that, given you have just 1 region, caching wouldn't help
>> much in reducing the time. Splitting might be an ideal solution. Based on
>> the heap space available to each mapper task, try playing with that 1500 value.
>>
>> Word of caution: if you increase it too much, you might see
>> ScannerTimeoutException in your TaskTracker (TT) logs.
>>
>>
>> On Mon, Aug 26, 2013 at 2:29 PM, Pavan Sudheendra <[email protected]> wrote:
>>
>>> Hi Ashwanth,
>>> My caching is set to 1500:
>>>
>>> scan.setCaching(1500);
>>> scan.setCacheBlocks(false);
>>>
>>> Can I set the number of splits via an API?
>>>
>>>
>>> On Mon, Aug 26, 2013 at 2:22 PM, Ashwanth Kumar <[email protected]> wrote:
>>>
>>>> To answer your question - go to the HBase web UI and you can initiate a
>>>> manual split on the table.
>>>>
>>>> But before you do that, maybe you can try increasing your client caching
>>>> value (hbase.client.scanner.caching) in your job.
>>>>
>>>>
>>>> On Mon, Aug 26, 2013 at 2:09 PM, Pavan Sudheendra <[email protected]> wrote:
>>>>
>>>> > What is the input split of the HBase table in this job status?
>>>> >
>>>> > map() completion: 0.0
>>>> > reduce() completion: 0.0
>>>> > Counters: 24
>>>> >         File System Counters
>>>> >                 FILE: Number of bytes read=0
>>>> >                 FILE: Number of bytes written=216030
>>>> >                 FILE: Number of read operations=0
>>>> >                 FILE: Number of large read operations=0
>>>> >                 FILE: Number of write operations=0
>>>> >                 HDFS: Number of bytes read=116
>>>> >                 HDFS: Number of bytes written=0
>>>> >                 HDFS: Number of read operations=1
>>>> >                 HDFS: Number of large read operations=0
>>>> >                 HDFS: Number of write operations=0
>>>> >         Job Counters
>>>> >                 Launched map tasks=1
>>>> >                 Data-local map tasks=1
>>>> >                 Total time spent by all maps in occupied slots (ms)=3332
>>>> >         Map-Reduce Framework
>>>> >                 Map input records=45570
>>>> >                 Map output records=45569
>>>> >                 Map output bytes=4682237
>>>> >                 Input split bytes=116
>>>> >                 Combine input records=0
>>>> >                 Combine output records=0
>>>> >                 Spilled Records=0
>>>> >                 CPU time spent (ms)=1142950
>>>> >                 Physical memory (bytes) snapshot=475811840
>>>> >                 Virtual memory (bytes) snapshot=1262202880
>>>> >                 Total committed heap usage (bytes)=370343936
>>>> >
>>>> > My table has 80,000 rows.
>>>> > Is there any way to increase the number of input splits, since it takes
>>>> > nearly 30 minutes for the map tasks to complete? Very unusual.
>>>> >
>>>> >
>>>> > --
>>>> > Regards-
>>>> > Pavan
>>>>
>>>>
>>>> --
>>>> Ashwanth Kumar / ashwanthkumar.in
>>>
>>>
>>> --
>>> Regards-
>>> Pavan
>>
>>
>> --
>> Ashwanth Kumar / ashwanthkumar.in
>
>
> --
> Regards-
> Pavan



--
Regards-
Pavan
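A follow-up note on the split question in the thread above (two keys giving three regions): each split point turns one region into two, so splitting a single-region table at key1 and then key2 leaves three regions, and since TableInputFormat creates one map task per region, that is also what raises the number of input splits for the job. Besides the web UI, the split can be requested from the HBase shell (split 'my_table', 'key1') or from the HBaseAdmin API. A minimal sketch; the table name and the two keys are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ManualSplitExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      // Each call splits the region containing the given key at that key;
      // afterwards the table covers [start, key1), [key1, key2), [key2, end).
      admin.split("my_table", "key1");  // placeholder table name and keys
      admin.split("my_table", "key2");
      // Splits are asynchronous; in practice, wait for the first split to
      // finish (watch the table page in the web UI) before issuing the next.
    } finally {
      admin.close();
    }
  }
}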
