Depending on the amount of data, you could run a scan -c for the colfams you want, awk out the colqual, and dump that to a file. Afterwards, you could sort and uniq it (or just use sort -u).
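For anyone who would rather not fiddle with awk, here is a rough Java sketch of the same "pull out the qualifier, then sort and uniq" step; a TreeSet does the sort and the de-duplication in one go. It assumes the shell prints each entry as "row family:qualifier [visibility]    value", so the second whitespace-separated token is family:qualifier. Qualifiers containing spaces would need smarter parsing. The input could come from something like accumulo shell -u user -p pass -e "scan -t mytable -c cityofbirth -np", where the table name and credentials are placeholders (check the flags against your shell version). The class and method names are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

/**
 * Hypothetical helper: reads Accumulo shell scan output from stdin
 * (assumed format "row family:qualifier [visibility]    value") and
 * prints each distinct qualifier for one column family, sorted.
 */
public class UniqueQualifiers {

    /** Returns the qualifier if the line belongs to the given family, else null. */
    static String qualifierOf(String line, String family) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 2) {
            return null; // blank or malformed line
        }
        String prefix = family + ":";
        String column = tokens[1]; // assumed to be "family:qualifier"
        if (!column.startsWith(prefix)) {
            return null; // other family, prompt lines, etc.
        }
        return column.substring(prefix.length());
    }

    /** Collects qualifiers into a TreeSet: sorted and de-duplicated (sort | uniq). */
    static Set<String> uniqueQualifiers(List<String> lines, String family) {
        Set<String> result = new TreeSet<>();
        for (String line : lines) {
            String q = qualifierOf(line, family);
            if (q != null) {
                result.add(q);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String family = args.length > 0 ? args[0] : "cityofbirth";
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            List<String> lines = in.lines().collect(Collectors.toList());
            for (String q : uniqueQualifiers(lines, family)) {
                System.out.println(q);
            }
        }
    }
}
```

Holding the distinct set in memory is fine for a one-off list of cities; if the number of distinct qualifiers were huge, the MR route below is the safer bet.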

The MR example would be pretty simple too, following the same idea as above. It's very similar to your run-of-the-mill wordcount. AccumuloInputFormat will let you fetch just the colfams you're interested in.

map:
    foreach Key in the fetched colfams:
        emit colqual

reduce:
    emit one instance of each colqual
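To make the map/reduce outline above concrete, here is a single-process Java simulation of the data flow. It is not Hadoop or Accumulo code: the Cell class is a hypothetical stand-in for an Accumulo Key, and the shuffle is modeled with a TreeMap that groups identical qualifiers the way Hadoop's sort/shuffle would. In a real job the mapper would get its input from AccumuloInputFormat, which would restrict the scan to the fetched colfams up front rather than filtering in the mapper.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Local simulation of the distinct-qualifier MR job (illustrative only, not the Hadoop API). */
public class DistinctQualifierSim {

    /** Hypothetical stand-in for an Accumulo key: row, column family, column qualifier. */
    static final class Cell {
        final String row;
        final String family;
        final String qualifier;

        Cell(String row, String family, String qualifier) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
        }
    }

    /** Map: for each cell in the wanted family, emit (qualifier, 1), wordcount style. */
    static List<Map.Entry<String, Integer>> map(List<Cell> input, String family) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (Cell c : input) {
            if (c.family.equals(family)) { // fetching colfams would normally do this up front
                out.add(new AbstractMap.SimpleEntry<>(c.qualifier, 1));
            }
        }
        return out;
    }

    /** Shuffle: group emitted values by key, sorted by key. */
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> mapped) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapped) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        return grouped;
    }

    /** Reduce: each distinct qualifier arrives once with all its values; emit the key once. */
    static List<String> reduce(Map<String, List<Integer>> grouped) {
        return new ArrayList<>(grouped.keySet());
    }

    public static void main(String[] args) {
        // Sample rows taken from the question in this thread.
        List<Cell> table = Arrays.asList(
                new Cell("123", "cityofbirth", "LosAngeles"),
                new Cell("133", "cityofbirth", "Brisbane"),
                new Cell("222", "cityofbirth", "London"),
                new Cell("124", "cityofbirth", "London"),
                new Cell("124", "cityofbirth", "London"));
        // Prints Brisbane, London, LosAngeles, one per line.
        for (String q : reduce(shuffle(map(table, "cityofbirth")))) {
            System.out.println(q);
        }
    }
}
```

Summing the grouped values in reduce instead of ignoring them would give a per-city count, which is exactly wordcount.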

On 1/14/14, 6:06 PM, Dickson, Matt MR wrote:
*UNOFFICIAL*

Just for simplicity: this is a one-off request for management, so I was
hoping to just scan via the shell and output to a file.
If I need to do it via an MR job I can do it that way, and would be keen
to hear any suggestions.

------------------------------------------------------------------------
*From:* David Medinets
*Sent:* Wednesday, 15 January 2014 09:36
*To:* accumulo-user
*Subject:* Re: List of unique qualifiers [SEC=UNOFFICIAL]

Why the restriction to the shell environment? A nice map-reduce job
would be ideal for this task.


On Tue, Jan 14, 2014 at 5:30 PM, Dickson, Matt MR wrote:


    *UNOFFICIAL*

    Hi,
    I need to extract a list of unique qualifier values in a table from
    the Accumulo shell. For every column there is a column family that
    identifies a specific qualifier, e.g. 'cityofbirth'. I would like to
    get a unique list of all cities that are listed in the qualifier
    against 'cityofbirth' for all rows.
    e.g., if I had a table with
    Rowid  Family       Qual
    123    cityofbirth  LosAngeles
    133    cityofbirth  Brisbane
    222    cityofbirth  London
    124    cityofbirth  London
    124    cityofbirth  London
    I want a list that is just:
    LosAngeles
    London
    Brisbane
    Any suggestions on how to achieve this from the shell would be great.
    Thanks in advance.
    Matt

