I have a 2 GB HBase table where the data is stored as key/value pairs (only one column per key), and the keys are unique.
I am thinking of loading the complete HBase table into an RDD and then doing operations such as scans on the RDD rather than on HBase. Can I do an HBase-style scan with a start row and end row on an RDD? The first step in my job will be to load the complete data into the RDD.

On 6 April 2015 at 02:45, Ted Yu <yuzhih...@gmail.com> wrote:

> You do need to apply the patch since 0.96 doesn't have this feature.
>
> For JavaSparkContext.newAPIHadoopRDD, can you check region server metrics
> to see where the overhead might be (compared to creating scan and firing
> query using native client) ?
>
> Thanks
>
> On Sun, Apr 5, 2015 at 2:00 PM, Jeetendra Gangele <gangele...@gmail.com>
> wrote:
>
>> That's true, I checked the MultiRowRangeFilter and it serves my need.
>> Do I need to apply the patch for this, since I am using HBase
>> version 0.96?
>>
>> Also, I have checked that when I use JavaSparkContext.newAPIHadoopRDD it is
>> slow compared to creating a scan and firing the query. Is there any reason?
>>
>> On 6 April 2015 at 01:57, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> Looks like MultiRowRangeFilter would serve your need.
>>>
>>> See HBASE-11144.
>>>
>>> HBase 1.1 would be released in May.
>>>
>>> You can also backport it to the HBase release you're using.
>>>
>>> On Sat, Apr 4, 2015 at 8:45 AM, Jeetendra Gangele <gangele...@gmail.com>
>>> wrote:
>>>
>>>> Here is my conf object, passed as the first parameter of the API.
>>>> But here I want to pass multiple scans, meaning I have 4 criteria for
>>>> START ROW and STOP ROW in the same table.
>>>> Using the code below I can get the result for one STARTROW and ENDROW.
>>>>
>>>> Configuration conf = DBConfiguration.getConf();
>>>>
>>>> // int scannerTimeout = (int) conf.getLong(
>>>> //     HConstants.HBASE_REGIONSERVER_LEASE_PERIOD_KEY, -1);
>>>> // System.out.println("lease timeout on server is"+scannerTimeout);
>>>>
>>>> int scannerTimeout = (int) conf.getLong(
>>>>     "hbase.client.scanner.timeout.period", -1);
>>>> // conf.setLong("hbase.client.scanner.timeout.period", 60000L);
>>>> conf.set(TableInputFormat.INPUT_TABLE, TABLE_NAME);
>>>> Scan scan = new Scan();
>>>> scan.addFamily(FAMILY);
>>>> FilterList filterList = new FilterList(Operator.MUST_PASS_ALL);
>>>> filterList.addFilter(new KeyOnlyFilter());
>>>> filterList.addFilter(new FirstKeyOnlyFilter());
>>>> scan.setFilter(filterList);
>>>>
>>>> scan.setCacheBlocks(false);
>>>> scan.setCaching(10);
>>>> scan.setBatch(1000);
>>>> scan.setSmall(false);
>>>> conf.set(TableInputFormat.SCAN,
>>>>     DatabaseUtils.convertScanToString(scan));
>>>> return conf;
>>>>
>>>> On 4 April 2015 at 20:54, Jeetendra Gangele <gangele...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> Can we get the results of multiple scans
>>>>> from JavaSparkContext.newAPIHadoopRDD from HBase?
>>>>>
>>>>> This method's first parameter takes a configuration object where I have
>>>>> added a filter, but how can I query multiple scans from the same table
>>>>> calling this API only once?
>>>>>
>>>>> regards
>>>>> jeetendra
>>>>>
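
To make the question at the top concrete, here is a rough sketch of what a start-row/stop-row scan over data already held in memory could look like. It is not Spark or HBase code: it uses a plain java.util.TreeMap as a stand-in for the loaded table, and the names RowRangeScan, RowRange and scan are hypothetical, made up for this illustration. It assumes row keys are ASCII strings (so Java string order matches HBase byte order) and follows the HBase Scan convention of an inclusive start row and an exclusive stop row. On an RDD of (key, value) pairs the same range test could presumably be applied as a filter predicate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class RowRangeScan {

    /** One [startRow, stopRow) range: start inclusive, stop exclusive. */
    static final class RowRange {
        final String startRow;
        final String stopRow;

        RowRange(String startRow, String stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }
    }

    /**
     * Returns the entries whose key falls in any of the given ranges,
     * in the order the ranges are listed. Overlapping ranges would
     * yield duplicates; this sketch assumes the ranges are disjoint.
     * subMap throws IllegalArgumentException if startRow > stopRow.
     */
    static List<Map.Entry<String, String>> scan(
            NavigableMap<String, String> table, List<RowRange> ranges) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (RowRange r : ranges) {
            // The sorted map lets us jump to the range instead of testing every key.
            out.addAll(table.subMap(r.startRow, true, r.stopRow, false).entrySet());
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for the table after it has been loaded into memory.
        NavigableMap<String, String> table = new TreeMap<>();
        table.put("a01", "v1");
        table.put("a02", "v2");
        table.put("b01", "v3");
        table.put("b02", "v4");
        table.put("c01", "v5");

        List<RowRange> ranges = Arrays.asList(
                new RowRange("a01", "a03"),
                new RowRange("b02", "c00"));

        for (Map.Entry<String, String> e : scan(table, ranges)) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

With the sample data above, the two ranges select a01, a02 and b02. Passing four RowRange objects would cover the four start/stop criteria mentioned earlier in the thread in a single pass over the loaded data.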