Couple of clarifications:
* Identical rowIDs will colocate data in the same tablet, but not
necessarily the same file. Tablets can have multiple files.
* Locality groups will colocate data within a file, not necessarily in
its own file. The RFile format supports multiple "regions" within a file,
which correspond to locality groups.
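To make the "regions within one file" point concrete, here is a toy model. It is a hypothetical sketch, not the real RFile format or reader. It assigns each column family to a named locality group, or to a catch-all default group, lays cells out per group, and counts how many cells a scan restricted to one family must read. In the real client API, locality groups are configured per table via `TableOperations.setLocalityGroups(String, Map<String, Set<Text>>)`. The names `LocalityGroupSketch`, `layout`, `cellsRead` and the `"family:qualifier"` cell encoding are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model (hypothetical, NOT the real RFile format) of a single file whose
// cells are split into one region per locality group.
public class LocalityGroupSketch {
    // Catch-all region: any family not assigned to a named group lands here,
    // similar in spirit to Accumulo's default locality group.
    static final String DEFAULT_GROUP = "(default)";

    // Find the locality group a family belongs to, or the default group.
    static String groupOf(Map<String, Set<String>> groups, String family) {
        for (Map.Entry<String, Set<String>> g : groups.entrySet()) {
            if (g.getValue().contains(family)) {
                return g.getKey();
            }
        }
        return DEFAULT_GROUP;
    }

    // Lay out "family:qualifier" cells into regions keyed by locality group name.
    static Map<String, List<String>> layout(Map<String, Set<String>> groups, List<String> cells) {
        Map<String, List<String>> regions = new TreeMap<>();
        for (String cell : cells) {
            String family = cell.substring(0, cell.indexOf(':'));
            regions.computeIfAbsent(groupOf(groups, family), k -> new ArrayList<>()).add(cell);
        }
        return regions;
    }

    // Cells a scan fetching only 'family' has to read: just the region of that
    // family's group, rather than every cell in the file.
    static int cellsRead(Map<String, Set<String>> groups, List<String> cells, String family) {
        return layout(groups, cells).getOrDefault(groupOf(groups, family), List.of()).size();
    }

    public static void main(String[] args) {
        List<String> cells = List.of(
                "meta:title", "meta:author", "content:body", "content:html", "stats:views");
        Map<String, Set<String>> groups = Map.of("small", Set.of("meta"));
        System.out.println(layout(groups, cells));
        System.out.println(cellsRead(groups, cells, "meta"));   // 2: only the "small" region
        System.out.println(cellsRead(Map.of(), cells, "meta")); // 5: everything sits in the default region
    }
}
```

The point of the model is the last two lines: with `meta` in its own group, a scan for that family skips the bulky `content` cells entirely, while with no groups configured everything shares the default region.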
To David's original question, I like to think of the family/qualifier
breakdown in the general case as follows: the family is used for a
coarse grouping of similar data while the qualifier is used as some
name/identifier for the value.
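A small sketch of that family/qualifier split, assuming a simplified in-memory model rather than the Accumulo client API: keys sort by (row, family, qualifier), so all qualifiers of one family in a row are contiguous and can be read as a group. Visibility and timestamp are ignored. In the real API the corresponding pieces would be `Mutation.put(Text family, Text qualifier, Value value)` on write and `fetchColumnFamily(Text)` on a scanner; `FamilyQualifierSketch`, `CellKey` and `fetchFamily` below are hypothetical names for illustration only.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.Objects;
import java.util.TreeMap;

// Toy model of a sorted Accumulo-style key space. NOT the Accumulo API: it only
// mimics (row, family, qualifier) ordering and ignores visibility/timestamp.
public class FamilyQualifierSketch {

    // Hypothetical simplified key.
    static final class CellKey implements Comparable<CellKey> {
        final String row;
        final String family;
        final String qualifier;

        CellKey(String row, String family, String qualifier) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
        }

        // Sort by row, then family, then qualifier, so one row's family is contiguous.
        @Override
        public int compareTo(CellKey o) {
            int c = row.compareTo(o.row);
            if (c != 0) return c;
            c = family.compareTo(o.family);
            return c != 0 ? c : qualifier.compareTo(o.qualifier);
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof CellKey && compareTo((CellKey) other) == 0;
        }

        @Override
        public int hashCode() {
            return Objects.hash(row, family, qualifier);
        }
    }

    // Returns qualifier -> value for one (row, family): the family is the coarse
    // group and each qualifier names a value inside it.
    static Map<String, String> fetchFamily(NavigableMap<CellKey, String> table, String row, String family) {
        Map<String, String> out = new TreeMap<>();
        // The empty qualifier sorts first, so this starts at the beginning of the group.
        for (Map.Entry<CellKey, String> e : table.tailMap(new CellKey(row, family, ""), true).entrySet()) {
            CellKey k = e.getKey();
            if (!k.row.equals(row) || !k.family.equals(family)) {
                break; // walked past the (row, family) group
            }
            out.put(k.qualifier, e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<CellKey, String> table = new TreeMap<>();
        table.put(new CellKey("user1", "contact", "name"), "Ada");
        table.put(new CellKey("user1", "contact", "phone"), "555-0100");
        table.put(new CellKey("user1", "address", "city"), "Springfield");
        table.put(new CellKey("user1", "address", "zip"), "12345");
        System.out.println(fetchFamily(table, "user1", "contact")); // prints {name=Ada, phone=555-0100}
    }
}
```

With the "column name in the family, empty qualifier" layout, each field would be its own family, so reading a whole logical group would mean asking for several families instead of one.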
Accumulo's flexibility in how the data model is implemented
(specifically, the ability to store any column family in a table via the
default locality group) lets you implement much more advanced "schemas"
in Accumulo, but the above is definitely the "typical" case if you look
at BigTable-style usage in general, IMO.
Andrew Wells wrote:
On the surface it adds an additional level of specification/grouping.
The potential benefit we have in Accumulo is that, along with the fact
that identical rowIDs are guaranteed to be in the same file, you can
use locality groups to place specific column families into the same
file as well, providing faster scans when looking for a specific column
family.
On Wed, May 27, 2015 at 9:05 AM, David Patterson <[email protected]> wrote:
I've been trying to understand the difference between the two column
name parts -- column family and column qualifier. I don't understand
the value of putting the column name in the columnFamily and an
"empty text" (new Text(new byte[0])) in the column qualifier,
vs. putting a non-unique column name in the column family and the
distinct column name in the column qualifier position.
I can sort-of understand the distinction if I have multiple distinct
kinds of data in my data collection. I could use the column family
part to determine how to interpret the rest of the data (what
columns I can expect, etc.). But, that kind of data could also be
handled with multiple databases.
Any guidance would be appreciated.
Thanks.
Davie Patterson
--
*Andrew George Wells*
*Software Engineer*
*[email protected]*