Re: Parallel Scanner

ramkrishna vasudevan Sun, 19 Feb 2017 21:30:07 -0800

Hi Anil,

HBase does not provide parallel scans directly. If you know the start and
end keys of the table's regions, you can create parallel scans in your
application code.


In your code snippet (quoted below), the intent is right: you get the
required regions and can issue parallel scans from your app.
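As a rough illustration of the range-splitting step, here is a plain-Java sketch that runs without a cluster. It assumes you already have the sorted region start keys (in the HBase client, for instance, from `RegionLocator#getStartEndKeys()`). The class and method names are hypothetical. Empty byte arrays mean "unbounded", following HBase's convention for start and stop rows.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public final class RegionRangeSplitter {

    /** Unsigned lexicographic comparison, the ordering HBase uses for row keys. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    /**
     * Splits [scanStart, scanStop) into one {from, to} pair per region it overlaps.
     * regionStartKeys must be sorted, with the first region starting at the empty key.
     * An empty scanStop (or an empty region end) means "end of table".
     */
    static List<byte[][]> split(byte[][] regionStartKeys, byte[] scanStart, byte[] scanStop) {
        List<byte[][]> ranges = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.length; i++) {
            byte[] regionStart = regionStartKeys[i];
            // A region ends where the next one starts; the last region is unbounded.
            byte[] regionEnd = (i + 1 < regionStartKeys.length) ? regionStartKeys[i + 1] : new byte[0];
            // Skip regions that end at or before the scan's start row.
            if (regionEnd.length > 0 && compare(regionEnd, scanStart) <= 0) continue;
            // Stop once a region starts at or after the scan's stop row.
            if (scanStop.length > 0 && compare(regionStart, scanStop) >= 0) break;
            // Clamp the region boundaries to the scan's own range.
            byte[] from = compare(regionStart, scanStart) > 0 ? regionStart : scanStart;
            byte[] to;
            if (regionEnd.length == 0) to = scanStop;
            else if (scanStop.length == 0) to = regionEnd;
            else to = compare(regionEnd, scanStop) < 0 ? regionEnd : scanStop;
            ranges.add(new byte[][] {from, to});
        }
        return ranges;
    }

    public static void main(String[] args) {
        byte[][] regionStarts = {
            new byte[0], "g".getBytes(StandardCharsets.UTF_8), "p".getBytes(StandardCharsets.UTF_8)
        };
        byte[] start = "c".getBytes(StandardCharsets.UTF_8);
        byte[] stop = "t".getBytes(StandardCharsets.UTF_8);
        for (byte[][] r : split(regionStarts, start, stop)) {
            System.out.println(new String(r[0], StandardCharsets.UTF_8)
                    + " -> " + new String(r[1], StandardCharsets.UTF_8));
        }
        // prints "c -> g", "g -> p", "p -> t" on separate lines
    }
}
```

Each returned pair can then back its own scan. The clamping does the same job as the first-region and last-region checks in the quoted snippet.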

One thing to watch out for: if a region splits, its start and end rows
change, so it is better to fetch the regions again every time before you
issue a scan. Does that make sense to you?
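A sketch of the "fetch the regions every time" advice, again in plain Java. The `Supplier` stands in for a fresh region lookup and is called once per run, immediately before the scans are submitted. `RangeScanner` is a hypothetical interface standing in for whatever issues one region-bounded scan and loads its rows into the IMDB. Both are assumptions for illustration, not HBase APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

public final class ParallelRegionScan {

    /** Hypothetical: scans [startRow, stopRow) and returns the number of rows loaded. */
    interface RangeScanner {
        long scan(String startRow, String stopRow) throws Exception;
    }

    static long scanAll(Supplier<List<String[]>> currentRanges, RangeScanner scanner, int threads)
            throws InterruptedException, ExecutionException {
        // Re-read the region boundaries right before scanning, since splits move them.
        List<String[]> ranges = currentRanges.get();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (String[] range : ranges) {
                // One task per region range; tasks run concurrently on the pool.
                Callable<Long> task = () -> scanner.scan(range[0], range[1]);
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // blocks; rethrows a failed scan as ExecutionException
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Supplier<List<String[]>> ranges = () -> List.of(
                new String[] {"c", "g"}, new String[] {"g", "p"}, new String[] {"p", "t"});
        RangeScanner fake = (start, stop) -> 10; // pretend every range yields 10 rows
        System.out.println(scanAll(ranges, fake, 3)); // prints 30
    }
}
```

The fixed-size pool bounds how many region scans hit the cluster at once, and `Future#get` surfaces any failed scan instead of silently dropping it.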

Regards
Ram

On Sat, Feb 18, 2017 at 1:44 PM, Anil <anilk...@gmail.com> wrote:

> Hi,
>
> I am building a use case where I have to load HBase data into an in-memory
> database (IMDB). I scan each region and load its data into the IMDB.
>
> I am looking at parallel scanners (https://issues.apache.org/jira/browse/HBASE-8504,
> HBASE-1935) to reduce the load time, but HTable#getRegionsInRange(byte[] startKey,
> byte[] endKey, boolean reload) is deprecated and HBASE-1935 is still open.
>
> I see that the Connection returned by ConnectionFactory is an
> HConnectionImplementation by default and creates HTable instances.
>
> Do you see any issues with using the HTable behind the Table instance?
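>     // Look up the regions covering the scan's range (reload = true skips
>     // the cached locations and fetches fresh ones).
>     List<HRegionLocation> regions =
>         hTable.getRegionsInRange(scans.getStartRow(), scans.getStopRow(), true);
>
>     for (int i = 0; i < regions.size(); i++) {
>         HRegionInfo info = regions.get(i).getRegionInfo();
>         // The first range starts at the scan's start row and the last one
>         // ends at the scan's stop row; the rest follow region boundaries.
>         byte[] startRow = (i == 0) ? scans.getStartRow() : info.getStartKey();
>         byte[] endRow = (i == regions.size() - 1)
>             ? scans.getStopRow() : info.getEndKey();
>         // ... create and submit a Scan over [startRow, endRow) here
>     }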
>
> Are there any alternatives for achieving a parallel scan?
>
> Thanks
>
