On Thu, Mar 27, 2014 at 3:02 PM, Otis Gospodnetic < [email protected]> wrote:

> Hi,
>
> I wanted to extract the following in a separate thread:
>
>> I was going to ask about partitioning as a way to handle (querying
>> against) large volumes of data. This is related to my Q above about
>> date-based partitioning. But I'm wondering if one can go further.
>> Partitioning by date, partitioning by tenant, but then also partitioning
>> by some other columns, which would be different for each type of data being
>> inserted. e.g. for sales data maybe the partitions would be date, tenantID,
>> but then also customerCountry, customerGender, etc. For performance
>> metrics data maybe it would be date, tenantID, but then also environment
>> (prod vs. dev), or applicationType (e.g. my HBase cluster performance
>> metrics vs. my Tomcat performance metrics), and so on.
>
> Essentially, a secondary index is declaring a partitioning. The indexed
> columns make up the row key, which in HBase determines the partitioning.
>
> Aha! Hmmm. But, as far as I know, how one constructs the key is.... the
> key. That is, doesn't one typically construct the key based on access
> patterns?
>
> How would that work in the scenario I described in my other email -
> unknown number of columns and ad-hoc SQL queries?
>
> How do you handle the above without having to create all possible
> combinations of columns (to anticipate any sort of query) and having to
> insert N rows in the index table for each 1 row in the primary table?
> Don't you have to do that in order to handle any ad-hoc query one may
> choose to run?

That's true - you'd want to selectively add indexes, based on anticipated
access patterns. It's similar to the RDBMS world in that regard.

> Thanks,
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
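To make the trade-off discussed in this thread concrete, here is a minimal Python sketch of an HBase-style secondary index row key. It assumes a hypothetical encoding (indexed column values first, then the primary key, joined by a zero byte) and uses an in-memory sorted list in place of an HBase table. The helper names `index_row_key`, `prefix_scan` and `all_combination_indexes` are illustrative only, not any library's API, and the sample rows are made up.

```python
import bisect
from itertools import combinations

# Hypothetical separator between key components; assumes no value contains it.
SEP = b"\x00"


def index_row_key(row, indexed_cols, pk_cols):
    """Concatenate the indexed column values, then the primary key columns.

    Leading with the indexed columns makes index rows sort (and therefore
    partition) by those columns; appending the primary key keeps keys unique.
    """
    return SEP.join(str(row[c]).encode() for c in indexed_cols + pk_cols)


def prefix_scan(sorted_keys, leading_values):
    """Return index keys whose leading components equal leading_values.

    An equality filter on a leading prefix of the indexed columns maps to the
    contiguous key range [start, stop), found here with binary search.
    """
    start = SEP.join(str(v).encode() for v in leading_values) + SEP
    stop = start[:-1] + b"\x01"  # smallest key greater than every key with this prefix
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


def all_combination_indexes(cols):
    """The naive 'index every combination' scheme: one index per non-empty subset."""
    return [list(c) for r in range(1, len(cols) + 1) for c in combinations(cols, r)]


if __name__ == "__main__":
    sales = [  # hypothetical sample rows
        {"tenantID": "t1", "date": "2014-03-27", "customerCountry": "HR", "id": "1"},
        {"tenantID": "t1", "date": "2014-03-27", "customerCountry": "US", "id": "2"},
        {"tenantID": "t2", "date": "2014-03-26", "customerCountry": "HR", "id": "3"},
    ]
    index_cols = ["tenantID", "customerCountry"]
    keys = sorted(index_row_key(r, index_cols, ["id"]) for r in sales)
    # tenantID = 't1' AND customerCountry = 'HR' becomes a single range scan.
    print(prefix_scan(keys, ["t1", "HR"]))

    cols = ["date", "tenantID", "customerCountry", "customerGender"]
    idx = all_combination_indexes(cols)
    # Each primary-table insert also writes one row into every index table.
    print(len(idx), "indexes ->", 1 + len(idx), "rows written per inserted row")
```

Only filters that pin a leading prefix of the indexed columns turn into one range scan; a filter on customerCountry alone cannot use this index that way, which is why the key (and the set of indexes) follows anticipated access patterns. The naive count (2^k - 1 indexes for k columns, 15 for the four above) also overstates what a careful selection needs, since one index can serve several column combinations through its prefixes.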
