Hi Ashwanth, thanks for the reply. I went to the HBase Web UI and saw that my table has 1 online region. Can you please guide me on how to split this table? The UI asks for a region key and has a split button. How many splits can I make exactly? If I give two different keys, can I assume the table is then split into 3 regions: one from the beginning to key1, one from key1 to key2, and one from key2 to the end?
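For reference, this is what I think the equivalent of the UI's split button would look like through the Admin API - just a rough sketch, so please correct me if I've got it wrong ("my_table", "key1" and "key2" are placeholders for my actual table name and row keys):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class ManualSplit {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      // Each call splits the region that currently contains the given key,
      // at that key. Splits are asynchronous, so I assume I should wait for
      // the first split to show up in the Web UI before issuing the second.
      admin.split(Bytes.toBytes("my_table"), Bytes.toBytes("key1"));
      admin.split(Bytes.toBytes("my_table"), Bytes.toBytes("key2"));
    } finally {
      admin.close();
    }
  }
}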
On Mon, Aug 26, 2013 at 2:36 PM, Ashwanth Kumar <[email protected]> wrote:

> setCaching sets the value via the API; the other way is to set it in the
> job configuration using the key "hbase.client.scanner.caching".
>
> I just realized that, given you have just 1 region, caching wouldn't help
> much in reducing the time. Splitting might be an ideal solution. Based on
> the heap space of every mapper task, try playing with that 1500 value.
>
> Word of caution: if you increase it too much, you might see
> ScannerTimeoutException in your TT logs.
>
>
> On Mon, Aug 26, 2013 at 2:29 PM, Pavan Sudheendra <[email protected]> wrote:
>
>> Hi Ashwanth,
>> My caching is set to 1500:
>>
>> scan.setCaching(1500);
>> scan.setCacheBlocks(false);
>>
>> Can I set the number of splits via an API?
>>
>>
>> On Mon, Aug 26, 2013 at 2:22 PM, Ashwanth Kumar <[email protected]> wrote:
>>
>>> To answer your question - go to the HBase Web UI and you can initiate a
>>> manual split on the table.
>>>
>>> But before you do that, maybe you can try increasing your client
>>> caching value (hbase.client.scanner.caching) in your job.
>>>
>>>
>>> On Mon, Aug 26, 2013 at 2:09 PM, Pavan Sudheendra <[email protected]> wrote:
>>>
>>> > What is the input split of the HBase table in this job status?
>>> >
>>> > map() completion: 0.0
>>> > reduce() completion: 0.0
>>> > Counters: 24
>>> >   File System Counters
>>> >     FILE: Number of bytes read=0
>>> >     FILE: Number of bytes written=216030
>>> >     FILE: Number of read operations=0
>>> >     FILE: Number of large read operations=0
>>> >     FILE: Number of write operations=0
>>> >     HDFS: Number of bytes read=116
>>> >     HDFS: Number of bytes written=0
>>> >     HDFS: Number of read operations=1
>>> >     HDFS: Number of large read operations=0
>>> >     HDFS: Number of write operations=0
>>> >   Job Counters
>>> >     Launched map tasks=1
>>> >     Data-local map tasks=1
>>> >     Total time spent by all maps in occupied slots (ms)=3332
>>> >   Map-Reduce Framework
>>> >     Map input records=45570
>>> >     Map output records=45569
>>> >     Map output bytes=4682237
>>> >     Input split bytes=116
>>> >     Combine input records=0
>>> >     Combine output records=0
>>> >     Spilled Records=0
>>> >     CPU time spent (ms)=1142950
>>> >     Physical memory (bytes) snapshot=475811840
>>> >     Virtual memory (bytes) snapshot=1262202880
>>> >     Total committed heap usage (bytes)=370343936
>>> >
>>> > My table has 80,000 rows.
>>> > Is there any way to increase the number of input splits, since it
>>> > takes nearly 30 minutes for the map tasks to complete? Very unusual.
>>> >
>>> > --
>>> > Regards-
>>> > Pavan
>>>
>>>
>>> --
>>> Ashwanth Kumar / ashwanthkumar.in
>>
>>
>> --
>> Regards-
>> Pavan
>
>
> --
> Ashwanth Kumar / ashwanthkumar.in

--
Regards-
Pavan
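P.S. Just to double-check that I'm applying your earlier suggestion correctly, here is how I plan to set the scanner caching once in the job configuration instead of on the Scan object (a minimal sketch of my driver; "analyze-my-table" is just a placeholder job name):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.mapreduce.Job;

public class JobSetup {
  public static Job createJob() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Same effect as scan.setCaching(1500), but every mapper's scanner
    // picks it up because it lives in the job configuration.
    conf.setInt("hbase.client.scanner.caching", 1500);
    Job job = new Job(conf, "analyze-my-table");
    // ... then TableMapReduceUtil.initTableMapperJob(...) and the usual
    // reducer/output setup, unchanged from my current job.
    return job;
  }
}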
