In 0.94, there is an optimization in StoreFileScanner.requestSeek() where a real seek is only done when seekTimestamp > maxTimestampInFile.
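To take advantage of that, the selecting job can restrict its Scan to the time range written since the previous run, as suggested below. Just a sketch: the table name "records", the column setup and the lastRunTime bookkeeping are placeholders for whatever your setup uses, not something I know about your cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class NewRecordsJob {

  // Mapper only sees rows with cells inside the scan's time range.
  static class NewRecordMapper extends TableMapper<ImmutableBytesWritable, Result> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws java.io.IOException, InterruptedException {
      // ... your selection logic for "new" records goes here ...
      context.write(row, value);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "select-new-records");
    job.setJarByClass(NewRecordsJob.class);

    // Hypothetical: timestamp persisted at the end of the previous run.
    long lastRunTime = Long.parseLong(args[0]);

    Scan scan = new Scan();
    scan.setTimeRange(lastRunTime, Long.MAX_VALUE); // only cells written since the last run
    scan.setCaching(500);        // bigger batches per RPC for a full MR scan
    scan.setCacheBlocks(false);  // don't pollute the block cache from MapReduce

    TableMapReduceUtil.initTableMapperJob(
        "records",               // hypothetical source table
        scan,
        NewRecordMapper.class,
        ImmutableBytesWritable.class,
        Result.class,
        job);
    job.setNumReduceTasks(0);    // map-only sketch
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

With the time range set, store files whose newest cell is older than lastRunTime can be skipped on the server side, so the job no longer pays for the old records on every run.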
I suggest upgrading to 0.94.4 so that you can utilize this facility.

On Fri, Feb 8, 2013 at 11:04 AM, Ted Yu <[email protected]> wrote:
> bq. in a cluster of 2 nodes +1 master
> I assume you're limited by hardware in that regard.
>
> bq. job selects these new records
> Have you used a time-range scan?
>
> Cheers
>
> On Fri, Feb 8, 2013 at 10:59 AM, <[email protected]> wrote:
>
>> Hi,
>>
>> The rationale is that I have a mapred job that constantly adds new
>> records to an hbase table.
>> The next mapred job selects these new records, but it must iterate over
>> all records and check whether each is a candidate for selection.
>> Since there are too many old records, iterating through them in a cluster
>> of 2 nodes + 1 master takes about 2 days. So I thought splitting them into
>> two tables would reduce this time, and as soon as I figure out that there
>> are no more new records left in one of the new tables I will not run the
>> mapred job on it.
>>
>> Currently, we have 7 regions including ROOT and META.
>>
>> Thanks.
>> Alex.
>>
>> -----Original Message-----
>> From: Ted Yu <[email protected]>
>> To: user <[email protected]>
>> Sent: Fri, Feb 8, 2013 10:40 am
>> Subject: Re: split table data into two or more tables
>>
>> May I ask the rationale behind this?
>> Were you aiming for higher write throughput?
>>
>> Please also tell us how many regions you have in the current table.
>>
>> Thanks
>>
>> BTW please consider upgrading to 0.94.4
>>
>> On Fri, Feb 8, 2013 at 10:36 AM, <[email protected]> wrote:
>>
>> > Hello,
>> >
>> > I wondered if there is a way of splitting data from one table into two
>> > or more tables in hbase with identical schemas, i.e. if table A has 100M
>> > records, put 50M into table B, 50M into table C and delete table A.
>> > Currently, I use hbase-0.92.1 and hadoop-1.4.0.
>> >
>> > Thanks.
>> > Alex.
