scan.setCaching() sets the value through the API; the other way is to set it
in the job configuration using the key "hbase.client.scanner.caching".
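
A rough sketch of the two options side by side, assuming the classic
org.apache.hadoop.hbase client and a TableMapper-based job. The table name
"mytable", the job name and MyMapper are placeholders of mine, not names from
this thread. As far as I know the value set on the Scan wins when both are
present, so normally you would pick one.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class CachingSetup {

    // Placeholder mapper (hypothetical); a real job would do its per-row work here.
    static class MyMapper extends TableMapper<Text, Text> {
        @Override
        protected void map(ImmutableBytesWritable key, Result value, Context context)
                throws IOException, InterruptedException {
            // per-row work goes here
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();

        // Option 1: set the key in the job configuration.
        // Do this before creating the Job, because the Job copies the Configuration.
        conf.setInt("hbase.client.scanner.caching", 1500);

        // new Job(conf, name) is deprecated on newer Hadoop versions but still available.
        Job job = new Job(conf, "scan-job");
        job.setJarByClass(CachingSetup.class);

        // Option 2: set it on the Scan through the API.
        Scan scan = new Scan();
        scan.setCaching(1500);
        scan.setCacheBlocks(false); // keep a full-table MR scan out of the block cache

        TableMapReduceUtil.initTableMapperJob(
                "mytable", scan, MyMapper.class, Text.class, Text.class, job);

        // Map-only job with no file output, to keep the sketch minimal.
        job.setNumReduceTasks(0);
        job.setOutputFormatClass(NullOutputFormat.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```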

I just realized that, given you have just 1 region, caching wouldn't help
much in reducing the time. Splitting the table is probably the better
solution. You can still try tuning that 1500 value, based on the heap space
available to each mapper task.
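
To put a number on why more caching has little left to give: each scanner
call returns up to "caching" rows, so the number of round trips is roughly
rows / caching. A small sketch with the figures from this thread (80,000 rows,
caching of 1500); the class and method names are made up for illustration.

```java
public class ScanRoundTrips {

    // Scanner calls needed to fetch `rows` rows when each call
    // returns up to `caching` rows (ceiling division).
    static long roundTrips(long rows, int caching) {
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 80_000; // table size mentioned in the thread
        for (int caching : new int[] {1, 100, 1500, 5000}) {
            System.out.println("caching=" + caching + " -> "
                    + roundTrips(rows, caching) + " round trips");
        }
    }
}
```

At 1500 the whole table comes back in about 54 calls, which suggests the 30
minutes is going into per-row work inside the single mapper rather than into
RPC overhead. More regions, and therefore more mappers, is what attacks that.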

A word of caution: if you increase it too much, you might see
ScannerTimeoutException in your TaskTracker (TT) logs.
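
The reason is that the mapper only goes back to the region server once it has
worked through the whole cached batch, and if that takes longer than the
scanner lease, the next call fails. A back-of-the-envelope sketch, assuming
the roughly 30 minutes for 80,000 rows reported below is spread evenly over
the rows, and assuming a 60,000 ms lease (the default I remember for
hbase.regionserver.lease.period; please check your cluster's setting). The
names are illustrative only.

```java
public class ScannerLeaseBudget {

    // Average milliseconds of mapper work per row.
    static double msPerRow(long totalMs, long rows) {
        return (double) totalMs / rows;
    }

    // Time spent between two scanner calls when a full batch is cached.
    static double batchMs(int caching, double msPerRow) {
        return caching * msPerRow;
    }

    // Largest caching value whose batch still fits within the lease.
    static int maxSafeCaching(long leaseMs, double msPerRow) {
        return (int) Math.floor(leaseMs / msPerRow);
    }

    public static void main(String[] args) {
        long totalMs = 30L * 60 * 1000; // ~30 minutes, from the thread
        long rows = 80_000;             // table size, from the thread
        long leaseMs = 60_000;          // ASSUMED lease period, verify on your cluster

        double perRow = msPerRow(totalMs, rows);
        System.out.println("ms per row: " + perRow);
        System.out.println("ms per batch at caching=1500: " + batchMs(1500, perRow));
        System.out.println("caching limit for this lease: " + maxSafeCaching(leaseMs, perRow));
    }
}
```

Under those assumptions a batch of 1500 rows takes about 34 seconds to
process, so there is some headroom, but not a lot: somewhere past 2,600 rows
per batch the lease would run out.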


On Mon, Aug 26, 2013 at 2:29 PM, Pavan Sudheendra <[email protected]> wrote:

> Hi Ashwanth,
> My caching is set to 1500 ..
>
> scan.setCaching(1500);
> scan.setCacheBlocks(false);
>
> Can i set the number of splits via an API?
>
>
> On Mon, Aug 26, 2013 at 2:22 PM, Ashwanth Kumar <
> [email protected]> wrote:
>
>> To answer your question - Go to HBase Web UI and you can initiate a manual
>> split on the table.
>>
>> But, before you do that. May be you can try increasing your client caching
>> value (hbase.client.scanner.caching) in your Job.
>>
>>
>> On Mon, Aug 26, 2013 at 2:09 PM, Pavan Sudheendra <[email protected]
>> >wrote:
>>
>> > What is the input split of the HBase Table in this job status?
>> >
>> > map() completion: 0.0
>> > reduce() completion: 0.0
>> > Counters: 24
>> >         File System Counters
>> >                 FILE: Number of bytes read=0
>> >                 FILE: Number of bytes written=216030
>> >                 FILE: Number of read operations=0
>> >                 FILE: Number of large read operations=0
>> >                 FILE: Number of write operations=0
>> >                 HDFS: Number of bytes read=116
>> >                 HDFS: Number of bytes written=0
>> >                 HDFS: Number of read operations=1
>> >                 HDFS: Number of large read operations=0
>> >                 HDFS: Number of write operations=0
>> >         Job Counters
>> >                 Launched map tasks=1
>> >                 Data-local map tasks=1
>> >                 Total time spent by all maps in occupied slots (ms)=3332
>> >         Map-Reduce Framework
>> >                 Map input records=45570
>> >                 Map output records=45569
>> >                 Map output bytes=4682237
>> >                 Input split bytes=116
>> >                 Combine input records=0
>> >                 Combine output records=0
>> >                 Spilled Records=0
>> >                 CPU time spent (ms)=1142950
>> >                 Physical memory (bytes) snapshot=475811840
>> >                 Virtual memory (bytes) snapshot=1262202880
>> >                 Total committed heap usage (bytes)=370343936
>> >
>> >
>> > My table has 80,000 rows..
>> > Is there any way to increase the number of input splits since it takes
>> > nearly 30 mins for the map tasks to complete.. very unusual.
>> >
>> >
>> >
>> > --
>> > Regards-
>> > Pavan
>> >
>>
>>
>>
>> --
>>
>> Ashwanth Kumar / ashwanthkumar.in
>>
>
>
>
> --
> Regards-
> Pavan
>



-- 

Ashwanth Kumar / ashwanthkumar.in
