Re: Secondary index row explosion due to N key combos to handle ad-hoc queries?

James Taylor Thu, 27 Mar 2014 19:36:07 -0700

On Thu, Mar 27, 2014 at 3:02 PM, Otis Gospodnetic <
[email protected]> wrote:


> Hi,
>
> I wanted to extract the following in a separate thread:
>
> I was going to ask about partitioning as a way to handle (querying
>> against) large volumes of data.  This is related to my Q above about
>> date-based partitioning.  But I'm wondering if one can go further.
>>  Partitioning by date, partitioning by tenant, but then also partitioning
>> by some other columns, which would be different for each type of data being
>> inserted. e.g. for sales data maybe the partitions would be date, tenantID,
>> but then also customerCountry, customerGender, etc.  For performance
>> metrics data maybe it would be date, tenantID, but then also environment
>> (prod vs. dev), or applicationType (e.g. my HBase cluster performance
>> metrics vs. my Tomcat performance metrics), and so on.
>>
>
> > Essentially, a secondary index is declaring a partitioning. The indexed
> > columns make up the row key, which in HBase determines the partitioning.
>
> Aha!  Hmmm.  But, as far as I know, how one constructs the key is.... the
> key.  That is, doesn't one typically construct the key based on access
> patterns?
>
> How would that work in the scenario I described in my other email:
> an unknown number of columns and ad-hoc SQL queries?
>
> How do you handle the above without having to create all possible
> combinations of columns (to anticipate any sort of query) and having to
> insert N rows in the index table for each 1 row in the primary table?
>  Don't you have to do that in order to handle any ad-hoc query one may
> choose to run?
>

That's true - you'd want to selectively add indexes, based on anticipated
access patterns. It's similar to the RDBMS world in that regard.
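To make the trade-off concrete, here is a minimal Python sketch. It assumes an index row key built by concatenating the indexed column values followed by the data row's primary key, with a zero byte as separator. That layout is only similar in spirit to a composite index key; `index_row_key`, `usable_prefix`, `naive_index_count` and the sample columns are hypothetical and are not Phoenix or HBase API. The sketch shows that an index narrows a scan only on its leading columns, and it compares the 2^n - 1 index rows per data row produced by indexing every column subset with the handful produced by indexes chosen from anticipated queries.

```python
import math

SEP = "\x00"  # assumed separator between key parts (hypothetical)


def index_row_key(row, indexed_cols, pk_col="pk"):
    """Build one index row key: the indexed column values come first, so they
    decide sort order (and therefore region placement); the data row's primary
    key is appended so the key is unique and points back to the data row."""
    return SEP.join([str(row[c]) for c in indexed_cols] + [str(row[pk_col])])


def usable_prefix(indexed_cols, filter_cols):
    """Count the leading index columns pinned by equality filters.
    Only this leading prefix can bound a row-key range scan."""
    n = 0
    for col in indexed_cols:
        if col not in filter_cols:
            break
        n += 1
    return n


def naive_index_count(n_cols):
    """One index per non-empty subset of n_cols columns: 2^n - 1 indexes,
    i.e. 2^n - 1 index rows written for every data row."""
    return sum(math.comb(n_cols, k) for k in range(1, n_cols + 1))


# Hypothetical data row using columns mentioned in the thread.
row = {"date": "2014-03-27", "tenantID": "t42", "customerCountry": "HR",
       "customerGender": "F", "pk": "order-1001"}

# Indexes chosen from anticipated access patterns (hypothetical choice).
chosen = [("tenantID", "date"), ("tenantID", "customerCountry", "date")]

for idx in chosen:
    print(idx, repr(index_row_key(row, idx)))

print(naive_index_count(4))  # 15 index rows per data row if every subset is indexed
print(len(chosen))           # 2 index rows per data row with selective indexes
print(usable_prefix(chosen[0], {"tenantID", "customerGender"}))  # 1: scan bounded by tenantID only
print(usable_prefix(chosen[1], {"customerCountry"}))             # 0: leading column not filtered
```

With 10 candidate columns the every-subset approach would already write 1023 index rows per data row, which is why indexes are added selectively, as with an RDBMS. A query whose filters do not cover an index's leading column falls back to a broader scan.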


> Thanks,
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
