Thanks for your response. 30 million rows is the best case :-)

A couple of questions about using [fieldA][time] as my key:

Would I have to insert in order? If not, how would HBase know when to stop
instead of scanning the entire table?

What would a query actually look like if my key was [fieldA][time]?

As a matter of fact, I can make 100% of my queries fit this form. I will
leave the other 5% out of my project/schema.

On Thu, Aug 25, 2011 at 10:13 AM, Ian Varley <[email protected]> wrote:

> Rita,
>
> There's no need to create separate tables here--the table is really just a
> "namespace" for keys. A better option would probably be having one table
> with "[fieldA][time]" (the two fields concatenated) as your row key. Then,
> you can seek directly to the start of your records in constant time, and
> then scan forward until you get to the end of the data (linear time in the
> size of data you expect to get back).
>
> The downside of this is that for the 5% of your queries that aren't in this
> form, you may have to do a full table scan. (Alternatively, you could also
> maintain secondary indexes that help you get the data back with less than a
> full table scan; that would depend on the nature of the queries).
>
> In general, a good rule of thumb when designing a schema in HBase is, think
> first about how you'd ideally like to access the data. Then structure the
> data to match that access pattern. (This is obviously not ideal if you have
> lots of different access patterns, but then, that's what relational
> databases are for. Most commercial relational DBs wouldn't blink at doing
> analytical queries against 30 million rows.)
>
> Ian
>
> On Aug 25, 2011, at 9:03 AM, Rita wrote:
>
> > Hello,
> >
> > I am trying to solve a time related problem. I can certainly use opentsdb
> > for this but was wondering if anyone had a clever way to create this type
> > of schema.
> >
> > I have an inventory table,
> >
> > time (unix epoch), fieldA, fieldB, data
> >
> > There are about 30 million of these entries.
> >
> > 95% of my queries will look like this:
> > show me where fieldA=zCORE from range [1314180693 to now]
> >
> > for fieldA, there is a possibility of 4000 unique items.
> > for fieldB, there is a possibility of 2 unique items (bool).
> >
> > So, I was thinking of creating 4000*2 tables and placing the data like
> > that so I can easily scan.
> >
> > Any thoughts about this? Will HBase freak out if I have 8000 tables?
>
> --
> --- Get your facts first, then you can distort them as you please.--

--
--- Get your facts first, then you can distort them as you please.--
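
To make the seek-then-scan idea from the quoted reply concrete, here is a minimal sketch in Python. It does not use the HBase client at all: `SortedTable` is a hypothetical toy stand-in for a store that keeps rows sorted by key, and `make_key`, the `\x00` separator, and the 10-digit zero padding are all assumptions made for illustration. It shows that rows can be inserted in any order as long as the store sorts them by key, and that a range query only needs a start key and a stop key.

```python
import bisect

SEP = "\x00"  # hypothetical delimiter; assumes fieldA never contains this character


def make_key(field_a, epoch):
    # Zero-pad the timestamp so that string order matches numeric order.
    return f"{field_a}{SEP}{epoch:010d}"


class SortedTable:
    """Toy stand-in for a table whose rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []   # row keys, always kept in sorted order
        self._rows = {}   # row key -> value

    def put(self, key, value):
        # Insertion order does not matter: the key is placed in sorted position.
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def scan(self, start_key, stop_key):
        # Seek to the first key >= start_key, then walk forward until stop_key.
        i = bisect.bisect_left(self._keys, start_key)
        while i < len(self._keys) and self._keys[i] < stop_key:
            key = self._keys[i]
            yield key, self._rows[key]
            i += 1


def query(table, field_a, t_start, t_end):
    # Inclusive time range: the stop key is exclusive, so use t_end + 1.
    start_key = make_key(field_a, t_start)
    stop_key = make_key(field_a, t_end + 1)
    return [value for _, value in table.scan(start_key, stop_key)]


table = SortedTable()
# Rows deliberately inserted out of order.
for field_a, epoch, data in [
    ("zCORE", 1314180700, "c"),
    ("aCORE", 1314180695, "x"),
    ("zCORE", 1314180600, "a"),
    ("zCORE", 1314180693, "b"),
    ("zEDGE", 1314180800, "y"),
]:
    table.put(make_key(field_a, epoch), data)

result = query(table, "zCORE", 1314180693, 1314190000)
print(result)
```

In HBase terms, the same idea would be a scan bounded by a start row and a stop row built from the composite key; the scan ends at the stop row, so it never touches rows for other fieldA values.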
