Re: Accumulo as a Column Storage

Mohammad Kargar Fri, 20 Oct 2017 06:57:53 -0700

Is that "dozens, and possibly hundreds, of locality groups" per table or per
Accumulo instance?


On Thu, Oct 19, 2017 at 6:05 PM, Christopher <ctubb...@apache.org> wrote:

> There's no expected scaling issue with having each column qualifier in its
> own unique column family, regardless of how large the number of these
> becomes. I've ingested random data like this before for testing, and it
> works fine.
>
> However, there may be an issue trying to create a very large number of
> locality groups. Locality groups are named, and you must explicitly
> configure them to store particular column families. That configuration is
> typically stored in ZooKeeper, and the configuration storage (in ZooKeeper,
> and/or in your conf/accumulo-site.xml file) does not scale as well as the
> data storage (HDFS) does. Where and how it will break is probably
> system-dependent and not directly known (at least, not known by me). I
> would expect dozens, and possibly hundreds, of locality groups to work
> okay, but thousands seems like it's too many (but I haven't tried).
>
>
> On Thu, Oct 19, 2017 at 6:47 PM Mohammad Kargar <mkar...@phemi.com> wrote:
>
>> That makes sense. So this means that there's no limit on, or concern about,
>> having a potentially large number of column families (each holding only one
>> column qualifier), right?
>>
>> On Thu, Oct 19, 2017 at 3:06 PM, Josh Elser <els...@apache.org> wrote:
>>
>>> Yup, that's the intended use case. You have the flexibility to determine
>>> what column families make sense to group together. Your only "cost" in
>>> changing your mind is the speed at which you can re-compact your data.
>>>
>>> There is one concern which comes to mind. Though making many locality
>>> groups does increase the speed at which you can read from specific columns,
>>> it decreases the speed at which you can read from _all_ columns. So, you
>>> can do this trick to make Accumulo act more like a columnar database, but
>>> beware that you're going to have an impact if you still have a use-case
>>> where you read more than just one or two columns at a time.
>>>
>>> Does that make sense?
>>>
>>>
>>> On 10/19/17 5:50 PM, Mohammad Kargar wrote:
>>>
>>>> AFAIK in Accumulo we can use "locality groups" to group sets of columns
>>>> together on disk, which would make it more like a column-oriented database.
>>>> Considering that "locality groups" are per column family, I was wondering:
>>>> what if we treat column families like column qualifiers (creating one
>>>> column family per qualifier) and assign each to a different
>>>> locality group? This way all the data in a given column will be next to
>>>> each other on disk, which makes it easier for analytical applications to
>>>> query the data.
>>>>
>>>> Any thoughts?
>>>>
>>>> Thanks,
>>>> Mohammad
>>>>
>>>>
>>
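
For reference, the layout discussed in this thread (one column family per
logical column, each in its own locality group) could be sketched roughly as
below. This is only an illustration, not something posted by the participants:
the function name, the "lg_" group-name prefix, and the example family names
are hypothetical. The property keys are assumed to follow Accumulo's per-table
locality group settings ("table.group.<name>" and "table.groups.enabled"), so
check them against the documentation for your version before relying on them.

```python
from typing import Dict, Iterable


def one_group_per_family(families: Iterable[str], prefix: str = "lg_") -> Dict[str, str]:
    """Build table properties that place each column family in its own locality group.

    Assumption: locality groups are configured through per-table properties
    named "table.group.<group>" (value: comma-separated column families)
    and "table.groups.enabled" (value: comma-separated group names).
    """
    props: Dict[str, str] = {}
    group_names = []
    # Deduplicate and sort so the generated configuration is deterministic.
    for family in sorted(set(families)):
        group = prefix + family  # hypothetical naming scheme
        props["table.group." + group] = family
        group_names.append(group)
    props["table.groups.enabled"] = ",".join(group_names)
    return props


if __name__ == "__main__":
    # Hypothetical column families, one per logical column.
    for key, value in one_group_per_family(["age", "name", "zip"]).items():
        print(f"{key}={value}")
```

Every group adds entries to the table's configuration, which is the part that
Christopher notes does not scale the way the data storage does, so the number
of generated properties is worth watching. As Josh points out, data already
written only moves into the new groups once it has been re-compacted.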
