Furthermore, what can we do if a table has 25 online regions? Can we
safely set caching to a bigger number? Is a split necessary as well?
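
For context, here is roughly how the scan and job are wired up on my side.
This is only a sketch; the table name, mapper class, and output key/value
types are placeholders from my own code:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;

    Configuration conf = HBaseConfiguration.create();
    // Job-wide equivalent of scan.setCaching(): rows fetched per RPC
    // from the region server.
    conf.set("hbase.client.scanner.caching", "1500");

    Job job = new Job(conf, "my-table-scan");   // placeholder job name

    Scan scan = new Scan();
    scan.setCaching(1500);        // per-scan override of the caching value
    scan.setCacheBlocks(false);   // avoid polluting the block cache on a full scan

    TableMapReduceUtil.initTableMapperJob(
        "my_table",               // placeholder table name
        scan,
        MyMapper.class,           // placeholder mapper
        Text.class,               // placeholder map output key class
        Text.class,               // placeholder map output value class
        job);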


On Mon, Aug 26, 2013 at 2:42 PM, Pavan Sudheendra <[email protected]> wrote:

> Hi Ashwanth, thanks for the reply.
>
> I went to the HBase Web UI and saw that my table had 1 online region.
> Can you please guide me as to how to do the split on this table? I see the
> UI asking for a region key and a split button. How many splits can I make
> exactly? Can I give two different 'keys' and assume that the table is now
> split into 3? One from the beginning to key1, key1 to key2, and key2 to the rest?
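>
> Would something like the following, via HBaseAdmin.split(), be equivalent to
> the Web UI split? (Only a sketch -- the table name and keys below are made up.)
>
>     import org.apache.hadoop.hbase.HBaseConfiguration;
>     import org.apache.hadoop.hbase.client.HBaseAdmin;
>
>     HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
>     // Two split points should leave the table with 3 regions:
>     // [start, key1), [key1, key2), [key2, end)
>     // Note that split requests are asynchronous, so in practice the second
>     // request may need to wait until the first one has completed.
>     admin.split("my_table", "key1");   // placeholder table name and key
>     admin.split("my_table", "key2");
>     admin.close();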
>
>
> On Mon, Aug 26, 2013 at 2:36 PM, Ashwanth Kumar <
> [email protected]> wrote:
>
>> setCaching sets the value via the API; the other way is to set it in the
>> job configuration using the key "hbase.client.scanner.caching".
>>
>> I just realized that, given you have just 1 region, caching wouldn't help
>> much in reducing the time. Splitting might be the better solution. Based on
>> the heap space available to each mapper task, try playing with that 1500 value.
>>
>> A word of caution: if you increase it too much, you might see a
>> ScannerTimeoutException in your TaskTracker logs.
>>
>>
>> On Mon, Aug 26, 2013 at 2:29 PM, Pavan Sudheendra <[email protected]> wrote:
>>
>>> Hi Ashwanth,
>>> My caching is set to 1500:
>>>
>>> scan.setCaching(1500);
>>> scan.setCacheBlocks(false);
>>>
>>> Can I set the number of splits via an API?
>>>
>>>
>>> On Mon, Aug 26, 2013 at 2:22 PM, Ashwanth Kumar <
>>> [email protected]> wrote:
>>>
>>>> To answer your question: go to the HBase Web UI and you can initiate a
>>>> manual split on the table.
>>>>
>>>> But before you do that, maybe you can try increasing your client caching
>>>> value (hbase.client.scanner.caching) in your job.
>>>>
>>>>
>>>> On Mon, Aug 26, 2013 at 2:09 PM, Pavan Sudheendra <[email protected]> wrote:
>>>>
>>>> > What is the input split of the HBase Table in this job status?
>>>> >
>>>> > map() completion: 0.0
>>>> > reduce() completion: 0.0
>>>> > Counters: 24
>>>> >         File System Counters
>>>> >                 FILE: Number of bytes read=0
>>>> >                 FILE: Number of bytes written=216030
>>>> >                 FILE: Number of read operations=0
>>>> >                 FILE: Number of large read operations=0
>>>> >                 FILE: Number of write operations=0
>>>> >                 HDFS: Number of bytes read=116
>>>> >                 HDFS: Number of bytes written=0
>>>> >                 HDFS: Number of read operations=1
>>>> >                 HDFS: Number of large read operations=0
>>>> >                 HDFS: Number of write operations=0
>>>> >         Job Counters
>>>> >                 Launched map tasks=1
>>>> >                 Data-local map tasks=1
>>>> >                 Total time spent by all maps in occupied slots (ms)=3332
>>>> >         Map-Reduce Framework
>>>> >                 Map input records=45570
>>>> >                 Map output records=45569
>>>> >                 Map output bytes=4682237
>>>> >                 Input split bytes=116
>>>> >                 Combine input records=0
>>>> >                 Combine output records=0
>>>> >                 Spilled Records=0
>>>> >                 CPU time spent (ms)=1142950
>>>> >                 Physical memory (bytes) snapshot=475811840
>>>> >                 Virtual memory (bytes) snapshot=1262202880
>>>> >                 Total committed heap usage (bytes)=370343936
>>>> >
>>>> >
>>>> > My table has 80,000 rows.
>>>> > Is there any way to increase the number of input splits, since it takes
>>>> > nearly 30 minutes for the map tasks to complete? Very unusual.
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Regards-
>>>> > Pavan
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Ashwanth Kumar / ashwanthkumar.in
>>>>
>>>
>>>
>>>
>>> --
>>> Regards-
>>> Pavan
>>>
>>
>>
>>
>> --
>>
>> Ashwanth Kumar / ashwanthkumar.in
>>
>>
>
>
> --
> Regards-
> Pavan
>



-- 
Regards-
Pavan
