Hi Tao, also, if you are thinking about time series, you can take a look at OpenTSDB: http://opentsdb.net/
JM

2014-04-21 11:56 GMT-04:00 Ted Yu <[email protected]>:

> There're several alternatives.
> One of which is HBaseWD:
>
> http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
>
> You can also take a look at Phoenix.
>
> Cheers
>
>
> On Mon, Apr 21, 2014 at 8:04 AM, Tao Xiao <[email protected]> wrote:
>
> > I have a big table and rows will be added to this table each day. I want to
> > run a MapReduce job over this table and select rows of several days as the
> > job's input data. How can I achieve this?
> >
> > If I prefix the rowkey with the date, I can easily select one day's data as
> > the job's input, but this will cause a hot spot problem because hundreds of
> > millions of rows will be added to this table each day and the data will
> > probably go to a single region server. A secondary index would be good for
> > queries but not good for a batch processing job.
> >
> > Are there any other ways?
> >
> > Are there any other frameworks which can achieve this goal more easily?
> > Shark? Stinger? HSearch?
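
To make the date-prefix trade-off from the quoted question concrete, here is a minimal sketch of the general "salted" (bucket-prefixed) rowkey idea that the HBaseWD post discusses. It is not HBaseWD's actual API: the function names, the `|` separator, the two-digit bucket prefix, and the bucket count of 16 are all assumptions made for illustration. Writes for one day are spread over several key ranges, and a job covering several days scans one range per bucket.

```python
import hashlib
from datetime import date, timedelta

NUM_BUCKETS = 16  # assumed value; the two-digit prefix below supports up to 100


def salted_rowkey(day: date, record_id: str, num_buckets: int = NUM_BUCKETS) -> bytes:
    """Build a rowkey of the form <bucket>|<yyyymmdd>|<record_id>.

    The bucket is derived from a hash of record_id, so rows written on the
    same day land in num_buckets different key ranges instead of one.
    """
    digest = hashlib.md5(record_id.encode("utf-8")).digest()
    bucket = digest[0] % num_buckets
    return f"{bucket:02d}|{day:%Y%m%d}|{record_id}".encode("utf-8")


def scan_ranges(start_day: date, end_day: date, num_buckets: int = NUM_BUCKETS):
    """Return one (start_row, stop_row) pair per bucket.

    Each pair covers start_day through end_day inclusive; stop_row is
    exclusive, matching the usual start/stop-row scan convention.
    """
    stop_day = end_day + timedelta(days=1)
    return [
        (
            f"{b:02d}|{start_day:%Y%m%d}".encode("utf-8"),
            f"{b:02d}|{stop_day:%Y%m%d}".encode("utf-8"),
        )
        for b in range(num_buckets)
    ]


if __name__ == "__main__":
    print(salted_rowkey(date(2014, 4, 21), "row-1"))
    for start, stop in scan_ranges(date(2014, 4, 21), date(2014, 4, 23))[:2]:
        print(start, stop)
```

The cost of this layout is that a multi-day job needs one scan per bucket rather than a single contiguous scan, so the bucket count is a balance between write distribution and the number of input ranges.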
