I plan on storing my data in HTable and then querying it using Elasticsearch. 
I'm new to both technologies, so it would be great to have some guidance on 
how to set up my data models.


The primary table that will be queried will potentially have hundreds of 
millions of rows, with each user having a variable amount of data that can run 
into the millions. Each row will mostly consist of maybe 30 key/value fields 
that represent different states, plus hundreds of boolean fields.


Most of the querying will be ad hoc, real-time queries where I need the boolean 
fields aggregated into percentages after filtering by date, state conditions, 
and some arbitrary set of conditions on the booleans. The other common type of 
query would filter by date and state conditions only, again with the booleans 
aggregated into percentages.


So my basic question is what to do with the boolean fields. On a given row, 
only 20-50 of the hundreds of fields are likely to be set to true. I don't 
understand the query language yet, so I don't know whether I can just have a 
single "booleans" column holding an array of all the true booleans and query 
against that.
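
To make the idea concrete, here is a rough sketch in plain Python of the row 
shape and the result I'm after. It is not HBase or Elasticsearch code; the 
field names ("plan", "b1", "b7", "b9"), the dates, and the helper 
boolean_percentages are all made up for illustration, and it assumes each 
boolean appears at most once per row.

```python
from collections import Counter
from datetime import date

# Hypothetical rows: a date, a few state fields, and a list holding
# only the booleans that are true for that row.
rows = [
    {"date": date(2013, 1, 5), "state": {"plan": "free"}, "booleans": ["b1", "b7"]},
    {"date": date(2013, 1, 9), "state": {"plan": "free"}, "booleans": ["b1"]},
    {"date": date(2013, 2, 1), "state": {"plan": "paid"}, "booleans": ["b7", "b9"]},
    {"date": date(2013, 1, 20), "state": {"plan": "free"}, "booleans": ["b1", "b7", "b9"]},
]


def boolean_percentages(rows, start, end, state, required=()):
    """Filter by date range, state values, and required true booleans,
    then return the percentage of matching rows that have each boolean set."""
    matched = [
        r for r in rows
        if start <= r["date"] <= end
        and all(r["state"].get(k) == v for k, v in state.items())
        and set(required) <= set(r["booleans"])
    ]
    if not matched:
        return {}
    # Count how many matching rows list each boolean as true.
    counts = Counter(b for r in matched for b in r["booleans"])
    return {b: 100.0 * n / len(matched) for b, n in counts.items()}


# Rows from January with plan == "free" and b1 set to true.
result = boolean_percentages(
    rows, date(2013, 1, 1), date(2013, 1, 31), {"plan": "free"}, required=["b1"]
)
```

Here three rows match, so b1 comes out at 100%, b7 at roughly 66.7%, and b9 at 
roughly 33.3%. What I don't know is whether this kind of filter-then-count 
over an array field can be expressed directly in a query.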


If I do have to create a column for each boolean field, does it make sense to 
put those columns in their own column family?


