Yup, that's the intended use case. You have the flexibility to decide which column families make sense to group together. The only "cost" of changing your mind is the time it takes to re-compact your data.
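
For illustration, here is a rough sketch of the one-locality-group-per-column-family layout. It uses only plain Java and prints the Accumulo shell commands you would run; the table name, the column family names, and the "lg_" group-name prefix are all made up for the example. With Accumulo on the classpath, the same map (with the families as Hadoop Text) is what you would hand to TableOperations.setLocalityGroups, followed by a compaction so existing files get rewritten.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

class LocalityGroupSketch {

    // Build one locality group per column family.
    // The "lg_" prefix is a hypothetical naming choice, not an Accumulo convention.
    static Map<String, Set<String>> onePerFamily(List<String> families) {
        Map<String, Set<String>> groups = new LinkedHashMap<>();
        for (String family : families) {
            groups.put("lg_" + family, Set.of(family));
        }
        return groups;
    }

    // Render the groups as an Accumulo shell "setgroups" command:
    //   setgroups -t <table> <group>=<family>[,<family>] ...
    static String toSetGroupsCommand(String table, Map<String, Set<String>> groups) {
        StringJoiner cmd = new StringJoiner(" ");
        cmd.add("setgroups").add("-t").add(table);
        for (Map.Entry<String, Set<String>> e : groups.entrySet()) {
            cmd.add(e.getKey() + "=" + String.join(",", e.getValue()));
        }
        return cmd.toString();
    }

    public static void main(String[] args) {
        // Hypothetical table and column families.
        Map<String, Set<String>> groups = onePerFamily(List.of("name", "age", "city"));
        System.out.println(toSetGroupsCommand("mytable", groups));
        // New groups only apply to files written afterwards, so re-compact
        // the table (-w waits for the compaction to finish).
        System.out.println("compact -t mytable -w");
    }
}
```

Changing your mind later just means building a different map and compacting again.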

There is one concern that comes to mind. Although creating many locality groups speeds up reads of specific columns, it slows down reads of _all_ columns. So you can use this trick to make Accumulo act more like a columnar database, but beware that you will take a performance hit if you still have a use case that reads more than one or two columns at a time.

Does that make sense?

On 10/19/17 5:50 PM, Mohammad Kargar wrote:
AFAIK in Accumulo we can use "locality groups" to group sets of columns together on disk, which would make it more like a column-oriented database. Considering that "locality groups" are defined per column family, I was wondering: what if we treat column families like column qualifiers (creating one column family per qualifier) and assign each one to a different locality group? This way all the data in a given column would be stored next to each other on disk, which makes it easier for analytical applications to query the data.

Any thoughts?

Thanks,
Mohammad
