I've been trying to understand Accumulo more deeply as we use it more. To
supplement the online documentation and source, I've been referencing some
blog articles on HBase (Lars George has written several), the HBase docs,
and the BigTable paper.
I'm curious, though, about some of the ways Accumulo deviates from BigTable
and HBase.
The questions I have right now are:
1. Is the RFile format close to HFile version 1 or HFile version 2, or is
it at this point really its own thing? I found good documentation on
HFile, but I haven't yet found similar documentation on RFile. There's the
source code, but I haven't dug into that yet.
2. I understand that HBase doesn't do well with too many column families.
However, creating too many column families in HBase is unlikely anyway,
because (I believe) you can't create them dynamically. Accumulo does let
you create column families dynamically, but I wonder whether this comes at
a cost. Is there a benefit to using fewer column families where possible
in Accumulo? Or is the cost of a column family roughly the same as that of
a column qualifier?
3. One way families might differ from qualifiers relates to HBase's
recommendation to keep column family names short to avoid wasting storage.
That should apply to Accumulo as well, right?
4. In supporting dynamic column families, was there a design trade-off with
respect to the original BigTable or current HBase design? What might be a
benefit of doing it the other way?
Thanks,
Sukant