You could create a locality group for your column family. However, you would only see the benefit after the table has been recompacted, and the benefit may not materialize at all if that column family holds a large share of the data.
But if you can afford to recompact once, and keeping this data in its own locality group proves useful, it may keep paying off in future computations and compactions. Offline mode also provides some speed-up. A colleague of mine just brought a 2-day MR job down to ~4.5 hours using this technique (along with other changes).

On Wed, Jan 15, 2014 at 7:29 PM, Dickson, Matt MR <[email protected]> wrote:

> *UNOFFICIAL*
> Thanks Keith. I've run a simple MR job based on the UniqueColumns
> example, but due to the size of the table this is taking a very long time.
> Is it possible to pre-filter the data that goes to the MR job based on
> family, e.g. only run the MR job on columns with the specific column family
> 'cityofbirth'? I am currently going through every column in the table and
> checking the column family in the mapper ... slow.
>
> ------------------------------
> *From:* Keith Turner [mailto:[email protected]]
> *Sent:* Wednesday, 15 January 2014 12:06
> *To:* [email protected]
> *Subject:* Re: List of unique qualifiers [SEC=UNOFFICIAL]
>
> On Tue, Jan 14, 2014 at 6:06 PM, Dickson, Matt MR <[email protected]> wrote:
>
>> *UNOFFICIAL*
>> Just for simplicity, this is a one-off request for management, so I was
>> hoping to just scan via the shell and output to a file.
>>
>> If I need to do it via an MR job I can do it that way and would be keen to
>> hear any suggestions.
>
> You could modify the following example in 1.4 to suit your needs.
>
> src/examples/simple/src/main/java/org/apache/accumulo/examples/simple/mapreduce/UniqueColumns.java
>
>> ------------------------------
>> *From:* David Medinets [mailto:[email protected]]
>> *Sent:* Wednesday, 15 January 2014 09:36
>> *To:* accumulo-user
>> *Subject:* Re: List of unique qualifiers [SEC=UNOFFICIAL]
>>
>> Why the restriction to the shell environment? A nice map-reduce job
>> would be ideal for this task.
>>
>> On Tue, Jan 14, 2014 at 5:30 PM, Dickson, Matt MR <[email protected]> wrote:
>>
>>> *UNOFFICIAL*
>>> Hi,
>>>
>>> I need to extract a list of unique qualifier values on a table from the
>>> Accumulo shell. For every column there is a column family that identifies
>>> a specific qualifier, e.g. 'cityofbirth'. I would like to get a unique list
>>> of all cities that are listed in the qualifier against 'cityofbirth' for
>>> all rows.
>>>
>>> E.g., if I had a table with
>>>
>>> Rowid  Family       Qual
>>> 123    cityofbirth  LosAngeles
>>> 133    cityofbirth  Brisbane
>>> 222    cityofbirth  London
>>> 124    cityofbirth  London
>>> 124    cityofbirth  London
>>>
>>> I want a list that is just:
>>> LosAngeles
>>> London
>>> Brisbane
>>>
>>> Any suggestions on how to achieve this from the shell would be great.
>>>
>>> Thanks in advance.
>>> Matt
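For reference, the computation the original question asks for (keep only entries whose column family is 'cityofbirth', emit the qualifier, drop duplicates) can be sketched in plain Java over the example rows from the question. This is a hypothetical, in-memory stand-in for a mapper/reducer pair, not the UniqueColumns example and not Accumulo's API: the `Entry` class and `uniqueQualifiers` method are made up for illustration, and the extra 'name' entry is invented only to show the family filter at work. The family check below is the same per-entry test Matt describes doing in the mapper; the point of a locality group is to avoid reading the other families' data in the first place.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class UniqueQualifiersSketch {

    // Hypothetical stand-in for an Accumulo key: row, column family, column qualifier.
    static final class Entry {
        final String row;
        final String family;
        final String qualifier;

        Entry(String row, String family, String qualifier) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
        }
    }

    // "Map" step restricted to one family, then a "reduce" step that keeps each qualifier once.
    static Set<String> uniqueQualifiers(List<Entry> entries, String family) {
        Set<String> unique = new TreeSet<>(); // sorted, duplicate-free
        for (Entry e : entries) {
            if (e.family.equals(family)) { // skip every other column family
                unique.add(e.qualifier);   // the set silently drops repeats
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<Entry> table = Arrays.asList(
            new Entry("123", "cityofbirth", "LosAngeles"),
            new Entry("133", "cityofbirth", "Brisbane"),
            new Entry("222", "cityofbirth", "London"),
            new Entry("124", "cityofbirth", "London"),
            new Entry("124", "cityofbirth", "London"),
            new Entry("123", "name", "Hypothetical")); // invented entry in another family
        for (String city : uniqueQualifiers(table, "cityofbirth")) {
            System.out.println(city);
        }
    }
}
```

Because a `TreeSet` is used, running `main` prints Brisbane, London and LosAngeles, one per line, in sorted order; the entry in the 'name' family is ignored.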
