I am already using STARTROW and ENDROW in HBase via newAPIHadoopRDD. Can I do something similar with an RDD? Say, use a filter on the RDD to get only those records that match the same criteria given by STARTROW and STOPROW. Will that be much faster than querying HBase?
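
To make the question concrete, here is a rough sketch (plain Java, no Spark or HBase dependency) of the range check I would have to pass to a filter over the RDD's row keys. The class and method names are made up for illustration. The only behaviour assumed from HBase is that row keys sort as unsigned bytes, that the start row is inclusive and the stop row exclusive, and that an empty start or stop row means "unbounded".

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: emulates a Scan's start/stop row check on keys already in memory.
public class RowRangeFilterSketch {

    // Lexicographic comparison of row keys as unsigned bytes.
    public static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // True if startRow <= key < stopRow; an empty bound is treated as unbounded.
    public static boolean inRange(byte[] key, byte[] startRow, byte[] stopRow) {
        boolean afterStart = startRow.length == 0 || compareKeys(key, startRow) >= 0;
        boolean beforeStop = stopRow.length == 0 || compareKeys(key, stopRow) < 0;
        return afterStart && beforeStop;
    }

    // True if the key falls in at least one {startRow, stopRow} pair.
    public static boolean inAnyRange(byte[] key, List<byte[][]> ranges) {
        for (byte[][] range : ranges) {
            if (inRange(key, range[0], range[1])) {
                return true;
            }
        }
        return false;
    }

    public static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two illustrative ranges; the key values are invented for the example.
        List<byte[][]> ranges = Arrays.asList(
                new byte[][] {bytes("row100"), bytes("row200")},
                new byte[][] {bytes("row500"), bytes("row600")});
        for (String k : new String[] {"row150", "row200", "row550", "row700"}) {
            System.out.println(k + " kept: " + inAnyRange(bytes(k), ranges));
        }
    }
}
```

With Spark, a predicate like this would run against every row already loaded into the RDD, whereas a Scan with start and stop rows only reads the matching key range from the region servers.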

On 6 April 2015 at 03:15, Ted Yu <yuzhih...@gmail.com> wrote:

> bq. HBase scan operation like scan StartROW and EndROW in RDD?
>
> I don't think RDD supports the concept of start row and end row.
>
> In HBase, please take a look at the following methods of Scan:
>
> public Scan setStartRow(byte [] startRow) {
>
> public Scan setStopRow(byte [] stopRow) {
>
> Cheers
>
> On Sun, Apr 5, 2015 at 2:35 PM, Jeetendra Gangele <gangele...@gmail.com>
> wrote:
>
>> I have a 2GB HBase table where the data is stored in the form of key and
>> value (only one column per key), and the key is also unique.
>>
>> What I am thinking is to load the complete HBase table into an RDD and then
>> do operations like scan in the RDD rather than in HBase.
>> Can I do an HBase scan operation like scan StartROW and EndROW in RDD?
>>
>> The first step in my job will be to load the complete data into an RDD.
>>
>> On 6 April 2015 at 02:45, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> You do need to apply the patch since 0.96 doesn't have this feature.
>>>
>>> For JavaSparkContext.newAPIHadoopRDD, can you check region server
>>> metrics to see where the overhead might be (compared to creating a scan
>>> and firing the query using the native client)?
>>>
>>> Thanks
>>>
>>> On Sun, Apr 5, 2015 at 2:00 PM, Jeetendra Gangele <gangele...@gmail.com>
>>> wrote:
>>>
>>>> That's true, I checked MultiRowRangeFilter and it serves my need.
>>>> Do I need to apply the patch for this, since I am using HBase
>>>> version 0.96?
>>>>
>>>> Also, I have checked that when I use JavaSparkContext.newAPIHadoopRDD it is
>>>> slow compared to creating a scan and firing the query. Is there any reason?
>>>>
>>>> On 6 April 2015 at 01:57, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> Looks like MultiRowRangeFilter would serve your need.
>>>>>
>>>>> See HBASE-11144.
>>>>>
>>>>> HBase 1.1 would be released in May.
>>>>>
>>>>> You can also backport it to the HBase release you're using.
>>>>>
>>>>> On Sat, Apr 4, 2015 at 8:45 AM, Jeetendra Gangele <gangele...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Here is my conf object, passed as the first parameter of the API.
>>>>>> But here I want to pass multiple scans, meaning I have 4 criteria for
>>>>>> STARTROW and STOPROW in the same table.
>>>>>> Using the code below I can get the result for one STARTROW and ENDROW.
>>>>>>
>>>>>> Configuration conf = DBConfiguration.getConf();
>>>>>>
>>>>>> // int scannerTimeout = (int) conf.getLong(
>>>>>> //     HConstants.HBASE_REGIONSERVER_LEASE_PERIOD_KEY, -1);
>>>>>> // System.out.println("lease timeout on server is"+scannerTimeout);
>>>>>>
>>>>>> int scannerTimeout = (int) conf.getLong(
>>>>>>     "hbase.client.scanner.timeout.period", -1);
>>>>>> // conf.setLong("hbase.client.scanner.timeout.period", 60000L);
>>>>>> conf.set(TableInputFormat.INPUT_TABLE, TABLE_NAME);
>>>>>> Scan scan = new Scan();
>>>>>> scan.addFamily(FAMILY);
>>>>>> FilterList filterList = new FilterList(Operator.MUST_PASS_ALL);
>>>>>> filterList.addFilter(new KeyOnlyFilter());
>>>>>> filterList.addFilter(new FirstKeyOnlyFilter());
>>>>>> scan.setFilter(filterList);
>>>>>>
>>>>>> scan.setCacheBlocks(false);
>>>>>> scan.setCaching(10);
>>>>>> scan.setBatch(1000);
>>>>>> scan.setSmall(false);
>>>>>> conf.set(TableInputFormat.SCAN,
>>>>>>     DatabaseUtils.convertScanToString(scan));
>>>>>> return conf;
>>>>>>
>>>>>> On 4 April 2015 at 20:54, Jeetendra Gangele <gangele...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> Can we get the result of multiple scans from HBase
>>>>>>> using JavaSparkContext.newAPIHadoopRDD?
>>>>>>>
>>>>>>> This method's first parameter takes a configuration object, where I have
>>>>>>> added a filter. But how can I query multiple scans from the same table
>>>>>>> while calling this API only once?
>>>>>>>
>>>>>>> regards
>>>>>>> jeetendra