Another good point.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com

On Fri, Feb 22, 2013 at 3:45 AM, Asaf Mesika <[email protected]> wrote:

> An easier way is to place one byte before the timestamp, which is called a
> bucket. You can calculate it by taking the timestamp modulo the number of
> buckets. We are now in the process of field testing it.
>
> On Tuesday, February 19, 2013, Paul van Hoven wrote:
>
> > Yeah it worked fine.
> >
> > But as I understand: If I prefix my row key with something like
> >
> > md5-hash + timestamp
> >
> > then the rowkeys are probably evenly distributed, but how would I then
> > perform a scan restricted to a specific time range?
> >
> > 2013/2/19 Mohammad Tariq <[email protected]>:
> > > No, before the timestamp. All the row keys which are identical go to the
> > > same region. This is the default HBase behavior and is meant to make the
> > > performance better. But sometimes the machine gets overloaded with reads
> > > and writes because we get concentrated on that particular machine. For
> > > example timeseries data. So it's better to hash the keys in order to make
> > > them go to all the machines equally. HTH
> > >
> > > BTW, did that range query work??
> > >
> > > Warm Regards,
> > > Tariq
> > > https://mtariq.jux.com/
> > > cloudfront.blogspot.com
> > >
> > > On Tue, Feb 19, 2013 at 9:54 PM, Paul van Hoven <[email protected]> wrote:
> > >
> > >> Hey Tariq,
> > >>
> > >> thanks for your quick answer. I'm not sure if I got the idea in the
> > >> second part of your answer. You mean if I use a timestamp as a rowkey I
> > >> should append a hash like this:
> > >>
> > >> 1357279200000+MD5HASH
> > >>
> > >> and then the data would be distributed more equally?
> > >>
> > >> 2013/2/19 Mohammad Tariq <[email protected]>:
> > >> > Hello Paul,
> > >> >
> > >> > Try this and see if it works:
> > >> > scan.setStartRow(Bytes.toBytes(startDate.getTime() + ""));
> > >> > scan.setStopRow(Bytes.toBytes(endDate.getTime() + 1 + ""));
> > >> >
> > >> > Also try not to use TS as the rowkey, as it may lead to RS hotspotting.
> > >> > Just add a hash to your rowkeys so that data is distributed evenly on all
> > >> > the RSs.
> > >> >
> > >> > Warm Regards,
> > >> > Tariq
> > >> > https://mtariq.jux.com/
> > >> > cloudfront.blogspot.com
> > >> >
> > >> > On Tue, Feb 19, 2013 at 9:41 PM, Paul van Hoven <[email protected]> wrote:
> > >> >
> > >> >> Hi,
> > >> >>
> > >> >> I'm currently playing with hbase. The design of the rowkey seems to be
> > >> >> critical.
> > >> >>
> > >> >> The rowkey for a certain database table of mine is:
> > >> >>
> > >> >> timestamp+ipaddress
> > >> >>
> > >> >> It looks something like this when performing a scan on the table in the
> > >> >> shell:
> > >> >> hbase(main):012:0> scan 'ToyDataTable'
> > >> >> ROW COLUMN+CELL
> > >> >> 1357020000000+192.168.178.9 column=CF:SampleCol,
> > >> >> timestamp=1361288601717, value=Entry_1 = 2013-01-01 07:00:00
> > >> >>
> > >> >> Since I got several rows for different timestamps I'd like to restrict a
> > >> >> scan to just a range of the table, for example from 2013-01-07 to
> > >> >> 2013-01-09.
> > >> >> Previously I only had a timestamp as the rowkey and I
> > >> >> could restrict the rowkey like that:
> > >> >>
> > >> >> SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
> > >> >> Date startDate = formatter.parse("2013-01-07 07:00:00");
> > >> >> Date endDate = formatter.parse("2013-01-10 07:00:00");
> > >> >>
> > >> >> HTableInterface toyDataTable = pool.getTable("ToyDataTable");
> > >> >> Scan scan = new Scan(Bytes.toBytes(startDate.getTime()),
> > >> >>         Bytes.toBytes(endDate.getTime()));
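
For anyone reading this thread later: the bucket idea Asaf describes also answers Paul's question about time-range scans, because a range query on bucketed keys can be run as one scan per bucket. Below is a minimal sketch in plain Java, with no HBase dependency. The class name `BucketedKeys`, the bucket count of 16, and the key layout (one bucket byte followed by an 8-byte big-endian timestamp) are assumptions made for illustration. They are not taken from the thread and are not an HBase API.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: builds bucket-prefixed row keys.
final class BucketedKeys {
    // Assumed bucket count; it has to stay fixed once data has been written.
    static final int NUM_BUCKETS = 16;

    // Bucket byte = timestamp modulo the number of buckets.
    static byte bucketFor(long timestampMillis) {
        return (byte) Math.floorMod(timestampMillis, (long) NUM_BUCKETS);
    }

    // Assumed row key layout: [1 bucket byte][8-byte big-endian timestamp].
    static byte[] rowKey(byte bucket, long timestampMillis) {
        return ByteBuffer.allocate(1 + Long.BYTES)
                .put(bucket)
                .putLong(timestampMillis)
                .array();
    }

    // Key to use when writing a row for the given timestamp.
    static byte[] rowKey(long timestampMillis) {
        return rowKey(bucketFor(timestampMillis), timestampMillis);
    }

    // One {startRow, stopRow} pair per bucket; the stop row is exclusive.
    static List<byte[][]> scanRanges(long startMillis, long endMillisExclusive) {
        List<byte[][]> ranges = new ArrayList<>();
        for (int b = 0; b < NUM_BUCKETS; b++) {
            ranges.add(new byte[][] {
                rowKey((byte) b, startMillis),
                rowKey((byte) b, endMillisExclusive)
            });
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Example bounds: 6 and 9 days after the sample row's timestamp in the thread.
        long start = 1357020000000L + 6 * 86_400_000L;
        long end = 1357020000000L + 9 * 86_400_000L;
        for (byte[][] range : scanRanges(start, end)) {
            System.out.println("bucket " + range[0][0] + ": [" + start + ", " + end + ")");
        }
    }
}
```

Each pair could then be passed to the same `new Scan(startRow, stopRow)` constructor used in the original question, with the per-bucket results merged on the client. Rows come back ordered by time within a bucket but not across buckets, so a final sort by timestamp is probably needed if global ordering matters.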
