You can use FuzzyRowFilter <http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FuzzyRowFilter.html> to do that.
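
To give a rough idea of what the fuzzy mask does, here is a plain-Java sketch of the matching rule described in the FuzzyRowFilter javadoc (mask byte 0 = this position must match, mask byte 1 = this position may be anything). It deliberately does not call the HBase API. The class name `FuzzyMatchSketch`, the helpers `matches` and `ascii`, and the 4-character stand-in for the md5 prefix are all made up for the illustration. Note that a mask pins fixed positions; it does not express a numeric range, so a time window has to be written as the leading digits the wanted timestamps share.

```java
import java.nio.charset.StandardCharsets;

public class FuzzyMatchSketch {

    // Hypothetical helper, not part of HBase: true when every position whose
    // mask byte is 0 holds the same byte in rowKey and pattern. Positions whose
    // mask byte is 1 are wildcards. Only the positions covered by the pattern
    // are compared (a choice made for this sketch).
    static boolean matches(byte[] rowKey, byte[] pattern, byte[] mask) {
        if (rowKey.length < pattern.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if (mask[i] == 0 && rowKey[i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }

    static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Assumed key layout: 4-char hash prefix + 13-digit millisecond timestamp.
        // The '?' bytes are placeholders; the mask marks them as "not fixed".
        byte[] pattern = ascii("????13575");
        byte[] mask = {1, 1, 1, 1, 0, 0, 0, 0, 0};

        // Timestamp starts with 13575, any hash prefix is accepted.
        System.out.println(matches(ascii("a1b21357538400000"), pattern, mask));
        // Timestamp starts with 13612, so the fixed positions differ.
        System.out.println(matches(ascii("ffff1361288601717"), pattern, mask));
    }
}
```

With the real filter you would pass the pattern and mask as a pair of byte arrays; check the javadoc of your HBase version for the exact constructor.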
Have a look at this link <http://blog.sematext.com/2012/08/09/consider-using-fuzzyrowfilter-when-in-need-for-secondary-indexes-in-hbase/>. You might find it helpful.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Tue, Feb 19, 2013 at 11:20 PM, Paul van Hoven <[email protected]> wrote:

> Yeah it worked fine.
>
> But as I understand: If I prefix my row key with something like
>
> md5-hash + timestamp
>
> then the rowkeys are probably evenly distributed but how would I
> then perform a scan restricted to a specific time range?
>
>
> 2013/2/19 Mohammad Tariq <[email protected]>:
> > No. before the timestamp. All the row keys which are identical go to the
> > same region. This is the default HBase behavior and is meant to make the
> > performance better. But sometimes the machine gets overloaded with reads
> > and writes because we get concentrated on that particular machine. For
> > example timeseries data. So it's better to hash the keys in order to make
> > them go to all the machines equally. HTH
> >
> > BTW, did that range query work??
> >
> >
> > On Tue, Feb 19, 2013 at 9:54 PM, Paul van Hoven <[email protected]> wrote:
> >
> >> Hey Tariq,
> >>
> >> thanks for your quick answer. I'm not sure if I got the idea in the
> >> second part of your answer. You mean if I use a timestamp as a rowkey I
> >> should append a hash like this:
> >>
> >> 1357279200000+MD5HASH
> >>
> >> and then the data would be distributed more equally?
> >>
> >>
> >> 2013/2/19 Mohammad Tariq <[email protected]>:
> >> > Hello Paul,
> >> >
> >> > Try this and see if it works :
> >> > scan.setStartRow(Bytes.toBytes(startDate.getTime() + ""));
> >> > scan.setStopRow(Bytes.toBytes(endDate.getTime() + 1 + ""));
> >> >
> >> > Also try not to use TS as the rowkey, as it may lead to RS hotspotting.
> >> > Just add a hash to your rowkeys so that data is distributed evenly on all
> >> > the RSs.
> >> >
> >> >
> >> > On Tue, Feb 19, 2013 at 9:41 PM, Paul van Hoven <[email protected]> wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> I'm currently playing with hbase. The design of the rowkey seems to be
> >> >> critical.
> >> >>
> >> >> The rowkey for a certain database table of mine is:
> >> >>
> >> >> timestamp+ipaddress
> >> >>
> >> >> It looks something like this when performing a scan on the table in the
> >> >> shell:
> >> >> hbase(main):012:0> scan 'ToyDataTable'
> >> >> ROW                            COLUMN+CELL
> >> >>  1357020000000+192.168.178.9   column=CF:SampleCol, timestamp=1361288601717, value=Entry_1 = 2013-01-01 07:00:00
> >> >>
> >> >> Since I got several rows for different timestamps I'd like to restrict a
> >> >> scan to just a region of the table, for example from 2013-01-07 to
> >> >> 2013-01-09. Previously I only had a timestamp as the rowkey and I
> >> >> could restrict the rowkey like that:
> >> >>
> >> >> SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
> >> >> Date startDate = formatter.parse("2013-01-07 07:00:00");
> >> >> Date endDate = formatter.parse("2013-01-10 07:00:00");
> >> >>
> >> >> HTableInterface toyDataTable = pool.getTable("ToyDataTable");
> >> >> Scan scan = new Scan( Bytes.toBytes( startDate.getTime() ),
> >> >>                       Bytes.toBytes( endDate.getTime() ) );
> >> >>
> >> >> But this no longer works with my new design.
> >> >>
> >> >> Is there a way to tell the scan object to filter the rows with respect
> >> >> to the timestamp, or do I have to use a filter object?
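
For anyone reading this thread later: the string-based start/stop rows suggested above work with the `timestamp+ipaddress` design because HBase orders row keys as unsigned bytes, and the keys in the shell output are stored as text. The sketch below is plain Java (9 or later, for `Arrays.compareUnsigned`) with no HBase dependency. `RowKeyRangeSketch`, `utf8`, `bigEndian` and `inRange` are hypothetical names invented for this example, and `bigEndian` is only assumed to mirror the 8-byte form that `Bytes.toBytes(long)` produces.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RowKeyRangeSketch {

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // 8-byte big-endian encoding of a long (stand-in for Bytes.toBytes(long)).
    static byte[] bigEndian(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Start row inclusive, stop row exclusive, unsigned lexicographic byte order.
    static boolean inRange(byte[] row, byte[] start, byte[] stop) {
        return Arrays.compareUnsigned(row, start) >= 0
                && Arrays.compareUnsigned(row, stop) < 0;
    }

    public static void main(String[] args) {
        long day = 86_400_000L;
        long base = 1357020000000L;      // timestamp of the example row in the thread
        long start = base + 6 * day;     // range start
        long end = base + 9 * day;       // range end

        // A text row key that falls one day after the range start.
        byte[] row = utf8((start + day) + "+192.168.178.9");

        // Text bounds, as in the setStartRow/setStopRow suggestion: row is inside.
        System.out.println(inRange(row, utf8(start + ""), utf8((end + 1) + "")));

        // Binary long bounds against a text key: the leading 0x00 bytes sort
        // before the digit characters, so the row is not inside the range.
        System.out.println(inRange(row, bigEndian(start), bigEndian(end)));
    }
}
```

The text comparison only behaves like a numeric one while all timestamps have the same number of digits. The `+ 1` on the stop row is what lets keys carrying exactly the end timestamp fall inside the exclusive upper bound.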
