I plan on storing my data in HBase (via HTable) and then querying it using Elasticsearch. The problem is that I'm new to both technologies, and it would be great to have some guidance on how to set up my data models.
The primary table that will be queried against will have potentially hundreds of millions of rows, with each user contributing a variable amount of data, up into the millions of rows. Each row is mostly made up of maybe 30 key/value fields that represent different states, plus hundreds of boolean fields.

Most of the querying will be ad hoc, real-time queries where I need the boolean fields aggregated into percentages, filtered by date, state conditions, and some arbitrary set of conditions on the booleans. The other common type of query is simply filtering by date and state conditions, again with the booleans aggregated into percentages.

So my basic question is what to do with the boolean fields: on any given row, only about 20-50 of the hundreds of fields are likely to be true. I don't understand the query language yet, so I don't know whether I can just have a single "booleans" column holding an array of all the true booleans and query against that. If I do have to create a column for each boolean field, does it make sense for those to go in their own column family?
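
To make the question concrete, here is a rough sketch of what I imagine on the Elasticsearch side (I haven't written any of this yet, so the index name `events`, the field names `user_id`, `event_date`, `state`, and `true_flags`, and the use of the official Python client against a recent Elasticsearch are all just placeholders and assumptions on my part):

```python
# Rough sketch only -- index/field names and the overall shape are assumptions.
from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes a local node on the default port

# One document per row: the ~30 key/value state fields stay as ordinary fields,
# and instead of hundreds of boolean columns I would store just the flags that
# are true as an array of strings (mapped as a keyword / not-analyzed field so
# it can be used in term filters and terms aggregations).
doc = {
    "user_id": "12345",
    "event_date": "2014-06-01",
    "state": "active",
    "true_flags": ["flag_a", "flag_b", "flag_c"],  # only the 20-50 true ones
}
es.index(index="events", body=doc)

# An ad hoc query: filter by a date range, a state condition, and one boolean
# condition, then count how often each flag appears among the matching rows so
# the counts can be turned into percentages.
query = {
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {"range": {"event_date": {"gte": "2014-01-01", "lte": "2014-06-30"}}},
                {"term": {"state": "active"}},
                {"term": {"true_flags": "flag_a"}},
            ]
        }
    },
    "aggs": {
        "flag_counts": {"terms": {"field": "true_flags", "size": 500}}
    },
}
resp = es.search(index="events", body=query)

total = resp["hits"]["total"]["value"]  # rows matching the filters (ES 7+ shape)
for bucket in resp["aggregations"]["flag_counts"]["buckets"]:
    print(bucket["key"], bucket["doc_count"] / total)  # fraction of rows with each flag
```

If an array field like `true_flags` can't support that kind of filtering and aggregation, that's when I'd fall back to one column per boolean, which is where the column-family question above comes in.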
