Re: newAPIHadoopRDD Multiple scan result return from Hbase

Jeetendra Gangele Sun, 05 Apr 2015 14:59:04 -0700

I am already using STARTROW and ENDROW in HBase from newAPIHadoopRDD.
Can I do something similar with an RDD? Let's say I use filter on the RDD to
get only those records that match the same criteria as the STARTROW and
STOPROW. Will that be much faster than querying HBase?
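Regarding filtering the RDD instead: below is a minimal sketch, assuming the row keys are available as `byte[]`, of a predicate that mirrors the Scan bounds (start row inclusive, stop row exclusive, empty array meaning unbounded, keys compared lexicographically as unsigned bytes). `RowKeyRange` is a hypothetical helper, and `Arrays.compareUnsigned` needs Java 9 or later.

```java
import java.util.Arrays;

/**
 * Hypothetical helper: row-key range check with Scan-like bounds.
 * Start row inclusive, stop row exclusive, empty array = unbounded.
 */
final class RowKeyRange {
    private RowKeyRange() {}

    static boolean inRange(byte[] key, byte[] startRow, byte[] stopRow) {
        // Empty start row means "from the first row of the table".
        boolean afterStart = startRow.length == 0
                || Arrays.compareUnsigned(key, startRow) >= 0;
        // Empty stop row means "to the last row"; otherwise the bound is exclusive.
        boolean beforeStop = stopRow.length == 0
                || Arrays.compareUnsigned(key, stopRow) < 0;
        return afterStart && beforeStop;
    }
}
```

With an RDD from `newAPIHadoopRDD` (keys of type `ImmutableBytesWritable`), it could be used as `rdd.filter(t -> RowKeyRange.inRange(t._1().copyBytes(), start, stop))`. Such a filter only runs after every row has been read out of HBase and shipped to Spark, whereas Scan start/stop rows let HBase skip rows outside the range, so for a single lookup the RDD filter is unlikely to be faster; the picture may change if the RDD is cached and queried many times.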


On 6 April 2015 at 03:15, Ted Yu <[email protected]> wrote:

> bq. HBase scan operation like scan StartROW and EndROW in RDD?
>
> I don't think RDD supports the concept of a start row and end row.
>
> In HBase, please take a look at the following methods of Scan:
>
>   public Scan setStartRow(byte[] startRow)
>
>   public Scan setStopRow(byte[] stopRow)
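The half-open semantics of these two setters (start row inclusive, stop row exclusive, empty stop row meaning end of table) can be illustrated with a toy in-memory model. This is a hypothetical sketch, not HBase code: `SortedTableModel` stands in for a table whose rows are kept sorted by key, which is what lets a bounded scan jump to the start row instead of reading every row. Requires Java 9+ for `Arrays.compareUnsigned`.

```java
import java.util.Arrays;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model (not HBase code) of a table whose rows are sorted by key. */
final class SortedTableModel {
    // Row keys ordered as unsigned bytes, like HBase row keys.
    private final NavigableMap<byte[], String> rows =
            new TreeMap<byte[], String>(Arrays::compareUnsigned);

    void put(byte[] row, String value) {
        rows.put(row, value);
    }

    /** Rows in [startRow, stopRow); an empty stopRow means "to the end". */
    NavigableMap<byte[], String> scan(byte[] startRow, byte[] stopRow) {
        if (stopRow.length == 0) {
            // Seek to startRow, then read to the end of the table.
            return rows.tailMap(startRow, true);
        }
        // Seek to startRow and stop before stopRow; rows outside are never visited.
        return rows.subMap(startRow, true, stopRow, false);
    }
}
```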
>
> Cheers
>
> On Sun, Apr 5, 2015 at 2:35 PM, Jeetendra Gangele <[email protected]>
> wrote:
>
>> I have a 2 GB HBase table where the data is stored as key/value pairs
>> (only one column per key), and the keys are unique.
>>
>> What I am thinking is to load the complete HBase table into an RDD and then
>> do operations like scans in the RDD rather than in HBase.
>> Can I do an HBase scan operation like scan StartROW and EndROW in RDD?
>>
>> The first step in my job will be to load the complete data into an RDD.
>>
>>
>>
>> On 6 April 2015 at 02:45, Ted Yu <[email protected]> wrote:
>>
>>> You do need to apply the patch since 0.96 doesn't have this feature.
>>>
>>> For JavaSparkContext.newAPIHadoopRDD, can you check the region server
>>> metrics to see where the overhead might be (compared to creating a scan
>>> and firing the query with the native client)?
>>>
>>> Thanks
>>>
>>> On Sun, Apr 5, 2015 at 2:00 PM, Jeetendra Gangele <[email protected]>
>>> wrote:
>>>
>>>> That's true, I checked the MultiRowRangeFilter and it serves my need.
>>>> Do I need to apply the patch for this, since I am using HBase
>>>> version 0.96?
>>>>
>>>> Also, I have noticed that JavaSparkContext.newAPIHadoopRDD is slow
>>>> compared to creating a scan and firing the query directly. Is there any
>>>> reason for that?
>>>>
>>>>
>>>>
>>>>
>>>> On 6 April 2015 at 01:57, Ted Yu <[email protected]> wrote:
>>>>
>>>>> Looks like MultiRowRangeFilter would serve your need.
>>>>>
>>>>> See HBASE-11144.
>>>>>
>>>>> HBase 1.1 would be released in May.
>>>>>
>>>>> You can also backport it to the HBase release you're using.
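HBASE-11144's filter takes a list of row ranges. Conceptually, the ranges can be sorted and merged once so that each row key needs only a binary search. The sketch below models that idea with hypothetical names (`MultiRangeModel`, `Range`) and [start, stop) bounds only; it is not the HBase implementation. Non-empty start and stop rows are assumed, and Java 9+ is required for `Arrays.compareUnsigned`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Conceptual model of filtering rows against several row ranges (not HBase code). */
final class MultiRangeModel {
    /** One [start, stop) row range: start inclusive, stop exclusive. */
    static final class Range {
        final byte[] start;
        byte[] stop;

        Range(byte[] start, byte[] stop) {
            this.start = start;
            this.stop = stop;
        }
    }

    private final List<Range> merged = new ArrayList<>();

    MultiRangeModel(List<Range> input) {
        List<Range> sorted = new ArrayList<>(input);
        sorted.sort((a, b) -> Arrays.compareUnsigned(a.start, b.start));
        for (Range r : sorted) {
            Range last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && Arrays.compareUnsigned(r.start, last.stop) <= 0) {
                // Overlapping or adjacent: widen the previous range if needed.
                if (Arrays.compareUnsigned(r.stop, last.stop) > 0) {
                    last.stop = r.stop;
                }
            } else {
                // Copy so that widening never mutates the caller's ranges.
                merged.add(new Range(r.start, r.stop));
            }
        }
    }

    /** Binary search over the sorted, disjoint ranges. */
    boolean contains(byte[] key) {
        int lo = 0;
        int hi = merged.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            Range r = merged.get(mid);
            if (Arrays.compareUnsigned(key, r.start) < 0) {
                hi = mid - 1; // key sorts before this range
            } else if (Arrays.compareUnsigned(key, r.stop) >= 0) {
                lo = mid + 1; // key is at or past this range's stop row
            } else {
                return true;  // start <= key < stop
            }
        }
        return false;
    }

    int rangeCount() {
        return merged.size();
    }
}
```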
>>>>>
>>>>> On Sat, Apr 4, 2015 at 8:45 AM, Jeetendra Gangele <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Here is the conf object I pass as the first parameter of the API.
>>>>>> But I want to pass multiple scans: I have 4 STARTROW/STOPROW ranges
>>>>>> for the same table, and with the code below I can only get the result
>>>>>> for one STARTROW and ENDROW.
>>>>>>
>>>>>> Configuration conf = DBConfiguration.getConf();
>>>>>>
>>>>>> // int scannerTimeout = (int) conf.getLong(
>>>>>> //     HConstants.HBASE_REGIONSERVER_LEASE_PERIOD_KEY, -1);
>>>>>> // System.out.println("lease timeout on server is " + scannerTimeout);
>>>>>>
>>>>>> // Client scanner timeout; read here but not used further below.
>>>>>> int scannerTimeout = (int) conf.getLong(
>>>>>>     "hbase.client.scanner.timeout.period", -1);
>>>>>> // conf.setLong("hbase.client.scanner.timeout.period", 60000L);
>>>>>> conf.set(TableInputFormat.INPUT_TABLE, TABLE_NAME);
>>>>>>
>>>>>> Scan scan = new Scan();
>>>>>> scan.addFamily(FAMILY);
>>>>>> // KeyOnlyFilter strips cell values and FirstKeyOnlyFilter keeps only
>>>>>> // the first cell of each row, so this scan returns row keys only.
>>>>>> FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
>>>>>> filterList.addFilter(new KeyOnlyFilter());
>>>>>> filterList.addFilter(new FirstKeyOnlyFilter());
>>>>>> scan.setFilter(filterList);
>>>>>>
>>>>>> scan.setCacheBlocks(false); // don't fill the block cache during a full scan
>>>>>> scan.setCaching(10);        // rows fetched per RPC
>>>>>> scan.setBatch(1000);        // max cells per Result
>>>>>> scan.setSmall(false);
>>>>>> // TableInputFormat expects the Scan serialized as a string.
>>>>>> conf.set(TableInputFormat.SCAN,
>>>>>>     DatabaseUtils.convertScanToString(scan));
>>>>>> return conf;
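One possible workaround on 0.96 without the patch, sketched under the assumption that the four ranges are known up front: compute a single range that covers all of them, set it on the one Scan with `setStartRow`/`setStopRow`, and drop rows that fall in the gaps afterwards. `CoveringRange` is a hypothetical helper; each range is a `{startRow, stopRow}` pair, and an empty stop row means end of table, as in Scan. Requires Java 9+.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: one [start, stop) range covering several ranges. */
final class CoveringRange {
    private CoveringRange() {}

    /** Each entry is {startRow, stopRow}; returns {minStart, maxStop}. */
    static byte[][] cover(List<byte[][]> ranges) {
        if (ranges.isEmpty()) {
            throw new IllegalArgumentException("no ranges given");
        }
        byte[] start = ranges.get(0)[0];
        byte[] stop = ranges.get(0)[1];
        for (byte[][] r : ranges) {
            // An empty start row already sorts first, so plain min works.
            if (Arrays.compareUnsigned(r[0], start) < 0) {
                start = r[0];
            }
            // An empty stop row is unbounded, so once seen it always wins.
            if (stop.length != 0
                    && (r[1].length == 0 || Arrays.compareUnsigned(r[1], stop) > 0)) {
                stop = r[1];
            }
        }
        return new byte[][] {start, stop};
    }
}
```

The covering scan reads every row between the lowest start and the highest stop, so if the four ranges are far apart it may be cheaper to build one RDD per range (each with its own Scan) and combine them with `JavaSparkContext.union`.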
>>>>>>
>>>>>> On 4 April 2015 at 20:54, Jeetendra Gangele <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> Can we get the results of multiple scans from HBase with
>>>>>>> JavaSparkContext.newAPIHadoopRDD?
>>>>>>>
>>>>>>> The first parameter of this method takes a Configuration object, to
>>>>>>> which I have added a filter. But how can I query multiple scans of the
>>>>>>> same table while calling this API only once?
>>>>>>>
>>>>>>> regards
>>>>>>> jeetendra
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>>
>>
>
