Re: List of unique qualifiers [SEC=UNOFFICIAL]

Eric Newton Wed, 15 Jan 2014 17:27:48 -0800

You could create a locality group for your column family.  However, you
would need to recompact the table to see the benefit, and the benefit might
not be there if that column family makes up a major portion of the data.


But!  If you can afford to recompact once, keeping this data in its own
locality group may also pay off for future computations/compactions.

Offline mode also provides some speed-up.  A colleague of mine just cut a
2-day MR job to ~4.5 hours using this technique (along with other changes).
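
For reference, here is a minimal plain-Java sketch of the filter-then-dedupe
logic discussed in the quoted thread below: keep only entries whose column
family is 'cityofbirth', then collect the distinct qualifiers.  It does not
use the Accumulo API at all; the class name, method name, and the
{row, family, qualifier} array layout are assumptions made for illustration,
and the 'name' entry is a hypothetical row added only to show the filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Accumulo or its examples.
class UniqueQualifiers {

    // Returns the distinct qualifiers stored under the given column family,
    // in first-seen order. Each entry is {row, family, qualifier}.
    public static List<String> uniqueQualifiers(List<String[]> entries, String family) {
        Set<String> seen = new LinkedHashSet<>();
        for (String[] entry : entries) {
            // "Mapper" step: skip entries from other column families.
            if (family.equals(entry[1])) {
                // "Reducer" step: the set silently drops duplicates.
                seen.add(entry[2]);
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String[]> entries = Arrays.asList(
            new String[] {"123", "cityofbirth", "LosAngeles"},
            new String[] {"133", "cityofbirth", "Brisbane"},
            new String[] {"222", "cityofbirth", "London"},
            new String[] {"124", "cityofbirth", "London"},
            new String[] {"124", "cityofbirth", "London"},
            // Hypothetical entry from another family, to show the filter.
            new String[] {"123", "name", "Example"});

        for (String qualifier : uniqueQualifiers(entries, "cityofbirth")) {
            System.out.println(qualifier);
        }
    }
}
```

In a real MR job the family check would ideally happen before the data
reaches the mapper, which is what the question below is asking about; the
sketch only shows the logic, not where it should run.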



On Wed, Jan 15, 2014 at 7:29 PM, Dickson, Matt MR <
[email protected]> wrote:

>  *UNOFFICIAL*
> Thanks Keith.  I've run a simple MR job based on the UniqueColumns
> example, but due to the size of the table it is taking a very long time.
> Is it possible to pre-filter the data that goes to the MR job based on
> column family, e.g. only run the MR job on columns with the column family
> 'cityofbirth'?  I am currently going through every column in the table and
> checking the column family in the mapper, which is slow.
>
>
>
>  ------------------------------
> *From:* Keith Turner [mailto:[email protected]]
> *Sent:* Wednesday, 15 January 2014 12:06
> *To:* [email protected]
>
> *Subject:* Re: List of unique qualifiers [SEC=UNOFFICIAL]
>
>
>
>
> On Tue, Jan 14, 2014 at 6:06 PM, Dickson, Matt MR <
> [email protected]> wrote:
>
>>  *UNOFFICIAL*
>> Just for simplicity: this is a one-off request for management, so I was
>> hoping to just scan via the shell and output to a file.
>>
>> If I need to do it via an MR job I can do it that way, and would be keen
>> to hear any suggestions.
>>
>
> You could modify the following example in 1.4 to suit your needs.
>
>
> src/examples/simple/src/main/java/org/apache/accumulo/examples/simple/mapreduce/UniqueColumns.java
>
>
>>
>>  ------------------------------
>> *From:* David Medinets [mailto:[email protected]]
>> *Sent:* Wednesday, 15 January 2014 09:36
>> *To:* accumulo-user
>> *Subject:* Re: List of unique qualifiers [SEC=UNOFFICIAL]
>>
>>   Why the restriction to the shell environment? A nice map-reduce job
>> would be ideal for this task.
>>
>>
>> On Tue, Jan 14, 2014 at 5:30 PM, Dickson, Matt MR <
>> [email protected]> wrote:
>>
>>>  *UNOFFICIAL*
>>> Hi,
>>>
>>> I need to extract a list of unique qualifier values on a table from the
>>> Accumulo shell.  For every column there is a column family that identifies
>>> a specific qualifier, e.g. 'cityofbirth'.  I would like to get a unique list
>>> of all cities that are listed in the qualifier against 'cityofbirth' for
>>> all rows.
>>>
>>> E.g., if I had a table with
>>>
>>> Rowid                Family            Qual
>>> 123                   cityofbirth         LosAngeles
>>> 133                   cityofbirth         Brisbane
>>> 222                   cityofbirth         London
>>> 124                   cityofbirth         London
>>> 124                   cityofbirth         London
>>>
>>> I want a list that is just:
>>> LosAngeles
>>> London
>>> Brisbane
>>>
>>> Any suggestions on how to achieve this from the shell would be great.
>>>
>>> Thanks in advance.
>>> Matt
>>>
>>>
>>>
>>>
>>
>>
>
