I would be surprised if anybody has tested more than a dozen or two locality groups or placed more than a dozen or two column families in any one locality group.
On Fri, Aug 14, 2015, 01:28 Daniel Ruiz <[email protected]> wrote:

> Thanks...We landed up doing just that. Correct having a bunch of random
> data does not fit well with locality groups. I did have another question
> though you mentioned a "small discrete set". What would you consider
> small? Would you recommend for example against having several thousand
> locality groups in a table?
>
> V/r,
> -Daniel
>
> -----Original Message-----
> From: Christopher [mailto:[email protected]]
> Sent: Wednesday, August 12, 2015 3:08 PM
> To: Accumulo User List <[email protected]>
> Subject: Re: Fetch Taking Longer Than Expected
>
> The schema shown above doesn't quite look like it's well-suited for
> locality groups, though. The CF field looks like it's a composition of
> an attribute name and that attribute's value. To take advantage of
> locality groups with that schema, you'd have to have a locality group
> for every attribute name/value combination, which would probably not
> work well.
>
> If you want to take advantage of locality groups, you'll probably want
> to make your CFs a small, discrete set (like just attribute names).
> So, if you push the attribute value into the CQ, you could at the very
> least limit your search to the locality containing the particular
> attribute name you are searching for.
>
> If you really want efficient searches based on attribute name/value
> combinations, you're going to want to put this up the row (at the
> beginning of your row), so your data is ordered (indexed) by that. You
> could do this in a secondary index (which could be in a different
> table, a different segment of this table, or in a separate locality
> group in this table).
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
>
> On Wed, Aug 12, 2015 at 11:20 AM, Josh Elser <[email protected]> wrote:
> > Yup, that would be expected.
> >
> > Remember that doing `scan -c ...` is an unbounded search over your entire
> > table. So, it takes approximately 3 minutes to read your GUIDIndexTable.
> >
> > Because you have a single locality group, all of the columns in your table
> > are grouped together.
> >
> > One exercise that may be interesting for yourself is to create a locality
> > group that has your specific column family in it, compact your
> > GUIDIndexTable, and rerun your `scan -c` query. The speed should be similar
> > to your exact scan. Removing the locality group and re-compacting the table
> > should return the query time back to the slow 3 minutes.
> >
> > Does that make sense?
> >
> > Daniel Ruiz wrote:
> >>
> >> Hi All,
> >>
> >> I am having an issue where column fetches are taking over a minute on
> >> 1.6.3. I don’t believe this should be case and my experience in the past
> >> supports the idea that fetches should be very fast.
> >>
> >> For example we doing a scan on the table gives results instantly but
> >> doing a scan -c vesselmmsitext=2706758566 takes 2 minutes and 44 seconds
> >> (plus or minus 1 second).
> >>
> >> Figure 1.1. Generated Test Data on GUIDIndexTable
> >>
> >> Here is the table config
> >>
> >> SCOPE   | NAME | VALUE
> >> --------+------+------
> >> default | table.balancer | org.apache.accumulo.server.master.balancer.DefaultLoadBalancer
> >> default | table.bloom.enabled | false
> >> default | table.bloom.error.rate | 0.5%
> >> default | table.bloom.hash.type | murmur
> >> default | table.bloom.key.functor | org.apache.accumulo.core.file.keyfunctor.RowFunctor
> >> default | table.bloom.load.threshold | 1
> >> default | table.bloom.size | 1048576
> >> default | table.cache.block.enable | false
> >> default | table.cache.index.enable | true
> >> default | table.classpath.context |
> >> default | table.compaction.major.everything.idle | 1h
> >> default | table.compaction.major.ratio | 3
> >> default | table.compaction.minor.idle | 5m
> >> default | table.compaction.minor.logs.threshold | 3
> >> table   | table.constraint.1 | org.apache.accumulo.core.constraints.DefaultKeySizeConstraint
> >> default | table.failures.ignore | false
> >> default | table.file.blocksize | 0B
> >> default | table.file.compress.blocksize | 100K
> >> default | table.file.compress.blocksize.index | 128K
> >> default | table.file.compress.type | gz
> >> default | table.file.max | 15
> >> default | table.file.replication | 0
> >> default | table.file.type | rf
> >> default | table.formatter | org.apache.accumulo.core.util.format.DefaultFormatter
> >> default | table.groups.enabled |
> >> default | table.interepreter | org.apache.accumulo.core.util.interpret.DefaultScanInterpreter
> >> table   | table.iterator.majc.AgeOffIterator##GUIDIndexTable | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
> >> table   | table.iterator.majc.AgeOffIterator##GUIDIndexTable.opt.ttl | 2592000000
> >> table   | table.iterator.majc.vers | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
> >> table   | table.iterator.majc.vers.opt.maxVersions | 1
> >> table   | table.iterator.minc.AgeOffIterator##GUIDIndexTable | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
> >> table   | table.iterator.minc.AgeOffIterator##GUIDIndexTable.opt.ttl | 2592000000
> >> table   | table.iterator.minc.vers | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
> >> table   | table.iterator.minc.vers.opt.maxVersions | 1
> >> table   | table.iterator.scan.AgeOffIterator##GUIDIndexTable | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
> >> table   | table.iterator.scan.AgeOffIterator##GUIDIndexTable.opt.ttl | 2592000000
> >> table   | table.iterator.scan.vers | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
> >> table   | table.iterator.scan.vers.opt.maxVersions | 1
> >> default | table.majc.compaction.strategy | org.apache.accumulo.tserver.compaction.DefaultCompactionStrategy
> >> default | table.scan.max.memory | 512K
> >> table   | @override | 1M
> >> default | table.security.scan.visibility.default |
> >> default | table.split.threshold | 1G
> >> default | table.walog.enabled | true
> >>
> >> More Table Info:
> >>
> >> GUIDIndexTable <http://107.23.12.24:50095/tables?t=f>  ONLINE  2  0  82.56M  810.00K  159
> >>
> >> Please let me know if I am doing something wrong to if there is more
> >> information you need.
> >>
> >> V/r,
> >>
> >> -Daniel
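
To make the schema suggestion from the quoted thread a bit more concrete, here is a minimal Python sketch of the two re-keyings Christopher describes: pushing the attribute value out of the CF and into the CQ, and building a secondary-index entry with the name/value pair at the front of the row. It does not touch the Accumulo API. The (row, cf, cq) tuples, the guid-* rows, the second MMSI value, the vesselnametext family, the NUL separator, and the helper names are all assumptions made up for illustration; only the vesselmmsitext=2706758566 column family comes from the thread.

```python
def rekey(entry):
    """Move the attribute value from the column family into the qualifier.

    Assumes the CF has the form "name=value", as in the thread's scan example.
    """
    row, cf, cq = entry
    name, sep, attr_value = cf.partition("=")
    if not sep:
        # CF carries no value; leave the entry unchanged.
        return entry
    # Keep any existing qualifier after the value (NUL separator is an assumption).
    new_cq = attr_value if not cq else attr_value + "\x00" + cq
    return (row, name, new_cq)


def index_entry(entry):
    """Build a secondary-index key with the name/value pair leading the row."""
    row, cf, _cq = entry
    return (cf, "", row)


# Hypothetical sample data in the original layout: CF = "attribute=value".
entries = [
    ("guid-0001", "vesselmmsitext=2706758566", ""),
    ("guid-0002", "vesselmmsitext=1111111111", ""),
    ("guid-0002", "vesselnametext=EXAMPLE", ""),
]

rekeyed = sorted(rekey(e) for e in entries)
index = sorted(index_entry(e) for e in entries)

# Distinct column families before and after the change. Each distinct CF
# would need its own locality group membership to be isolated on disk.
families_before = {cf for _, cf, _ in entries}
families_after = {cf for _, cf, _ in rekeyed}

if __name__ == "__main__":
    print(len(families_before), "families before,", len(families_after), "after")
    for key in index:
        print(key)
```

In the original layout the number of distinct CFs grows with the number of distinct values, which is why one locality group per CF does not scale. After re-keying, the CF set is just the attribute names, and the index entries sort by name/value so an exact-match lookup becomes a bounded row range rather than a full-table `scan -c`.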
