Yes, Josh is right. Sorry if my wording led to any unnecessary confusion.

On Fri, Aug 14, 2015, 12:04 Josh Elser <[email protected]> wrote:
> "Small" might also be misleading. A locality group can be a good
> way to separate a large collection of data from an actually small number
> of other records. Discrete yes, but the data itself does not need to be
> small to put it into a locality group.
>
> Christopher wrote:
> > I would be surprised if anybody has tested more than a dozen or two
> > locality groups or placed more than a dozen or two column families in
> > any one locality group.
> >
> >
> > On Fri, Aug 14, 2015, 01:28 Daniel Ruiz <[email protected]> wrote:
> >
> >     Thanks... We ended up doing just that. Correct, having a bunch of
> >     random data does not fit well with locality groups. I did have
> >     another question, though: you mentioned a "small discrete set". What
> >     would you consider small? Would you recommend, for example, against
> >     having several thousand locality groups in a table?
> >
> >     V/r,
> >     -Daniel
> >
> >     -----Original Message-----
> >     From: Christopher [mailto:[email protected]]
> >     Sent: Wednesday, August 12, 2015 3:08 PM
> >     To: Accumulo User List <[email protected]>
> >     Subject: Re: Fetch Taking Longer Than Expected
> >
> >     The schema shown above doesn't quite look like it's well-suited for
> >     locality groups, though. The CF field looks like it's a composition of
> >     an attribute name and that attribute's value. To take advantage of
> >     locality groups with that schema, you'd have to have a locality group
> >     for every attribute name/value combination, which would probably not
> >     work well.
> >
> >     If you want to take advantage of locality groups, you'll probably want
> >     to make your CFs a small, discrete set (like just attribute names).
> >     So, if you push the attribute value into the CQ, you could at the very
> >     least limit your search to the locality group containing the particular
> >     attribute name you are searching for.
> >
> >     If you really want efficient searches based on attribute name/value
> >     combinations, you're going to want to put this up the row (at the
> >     beginning of your row), so your data is ordered (indexed) by that. You
> >     could do this in a secondary index (which could be in a different
> >     table, a different segment of this table, or in a separate locality
> >     group in this table).
> >
> >     --
> >     Christopher L Tubbs II
> >     http://gravatar.com/ctubbsii
> >
> >
> >     On Wed, Aug 12, 2015 at 11:20 AM, Josh Elser <[email protected]> wrote:
> >     > Yup, that would be expected.
> >     >
> >     > Remember that doing `scan -c ...` is an unbounded search over your
> >     > entire table. So, it takes approximately 3 minutes to read your
> >     > GUIDIndexTable. Because you have a single locality group, all of the
> >     > columns in your table are grouped together.
> >     >
> >     > One exercise that may be interesting for yourself is to create a
> >     > locality group that has your specific column family in it, compact
> >     > your GUIDIndexTable, and rerun your `scan -c` query. The speed should
> >     > be similar to your exact scan. Removing the locality group and
> >     > re-compacting the table should return the query time back to the
> >     > slow 3 minutes.
> >     >
> >     > Does that make sense?
> >     >
> >     > Daniel Ruiz wrote:
> >     >>
> >     >> Hi All,
> >     >>
> >     >> I am having an issue where column fetches are taking over a minute
> >     >> on 1.6.3. I don't believe this should be the case, and my experience
> >     >> in the past supports the idea that fetches should be very fast.
> >     >>
> >     >> For example, doing a scan on the table gives results instantly, but
> >     >> doing a scan -c vesselmmsitext=2706758566 takes 2 minutes and 44
> >     >> seconds (plus or minus 1 second).
> >     >>
> >     >> [Figure 1.1. Generated Test Data on GUIDIndexTable]
> >     >>
> >     >> Here is the table config
> >     >>
> >     >> --------+------------------------------------------------------------+-----------------------------------------------------------------
> >     >> SCOPE   | NAME                                                       | VALUE
> >     >> --------+------------------------------------------------------------+-----------------------------------------------------------------
> >     >> default | table.balancer                                             | org.apache.accumulo.server.master.balancer.DefaultLoadBalancer
> >     >> default | table.bloom.enabled                                        | false
> >     >> default | table.bloom.error.rate                                     | 0.5%
> >     >> default | table.bloom.hash.type                                      | murmur
> >     >> default | table.bloom.key.functor                                    | org.apache.accumulo.core.file.keyfunctor.RowFunctor
> >     >> default | table.bloom.load.threshold                                 | 1
> >     >> default | table.bloom.size                                           | 1048576
> >     >> default | table.cache.block.enable                                   | false
> >     >> default | table.cache.index.enable                                   | true
> >     >> default | table.classpath.context                                    |
> >     >> default | table.compaction.major.everything.idle                     | 1h
> >     >> default | table.compaction.major.ratio                               | 3
> >     >> default | table.compaction.minor.idle                                | 5m
> >     >> default | table.compaction.minor.logs.threshold                      | 3
> >     >> table   | table.constraint.1                                         | org.apache.accumulo.core.constraints.DefaultKeySizeConstraint
> >     >> default | table.failures.ignore                                      | false
> >     >> default | table.file.blocksize                                       | 0B
> >     >> default | table.file.compress.blocksize                              | 100K
> >     >> default | table.file.compress.blocksize.index                        | 128K
> >     >> default | table.file.compress.type                                   | gz
> >     >> default | table.file.max                                             | 15
> >     >> default | table.file.replication                                     | 0
> >     >> default | table.file.type                                            | rf
> >     >> default | table.formatter                                            | org.apache.accumulo.core.util.format.DefaultFormatter
> >     >> default | table.groups.enabled                                       |
> >     >> default | table.interepreter                                         | org.apache.accumulo.core.util.interpret.DefaultScanInterpreter
> >     >> table   | table.iterator.majc.AgeOffIterator##GUIDIndexTable         | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
> >     >> table   | table.iterator.majc.AgeOffIterator##GUIDIndexTable.opt.ttl | 2592000000
> >     >> table   | table.iterator.majc.vers                                   | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
> >     >> table   | table.iterator.majc.vers.opt.maxVersions                   | 1
> >     >> table   | table.iterator.minc.AgeOffIterator##GUIDIndexTable         | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
> >     >> table   | table.iterator.minc.AgeOffIterator##GUIDIndexTable.opt.ttl | 2592000000
> >     >> table   | table.iterator.minc.vers                                   | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
> >     >> table   | table.iterator.minc.vers.opt.maxVersions                   | 1
> >     >> table   | table.iterator.scan.AgeOffIterator##GUIDIndexTable         | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
> >     >> table   | table.iterator.scan.AgeOffIterator##GUIDIndexTable.opt.ttl | 2592000000
> >     >> table   | table.iterator.scan.vers                                   | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
> >     >> table   | table.iterator.scan.vers.opt.maxVersions                   | 1
> >     >> default | table.majc.compaction.strategy                             | org.apache.accumulo.tserver.compaction.DefaultCompactionStrategy
> >     >> default | table.scan.max.memory                                      | 512K
> >     >> table   | @override                                                  | 1M
> >     >> default | table.security.scan.visibility.default                     |
> >     >> default | table.split.threshold                                      | 1G
> >     >> default | table.walog.enabled                                        | true
> >     >> --------+------------------------------------------------------------+-----------------------------------------------------------------
> >     >>
> >     >> More Table Info:
> >     >>
> >     >> GUIDIndexTable <http://107.23.12.24:50095/tables?t=f>  ONLINE  2  0  82.56M  810.00K  159
> >     >>
> >     >> Please let me know if I am doing something wrong or if there is more
> >     >> information you need.
> >     >>
> >     >> V/r,
> >     >>
> >     >> -Daniel
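
The schema advice in this thread (keep column families to a small, discrete set, and lead the row with the attribute name/value when you need to search on it) can be illustrated with a toy model of a sorted key space. What follows is a minimal Python sketch, not Accumulo code: the three key layouts, the `\x00` separator, the helper names, and every sample record other than the `vesselmmsitext=2706758566` pair from the original post are assumptions made purely for illustration.

```python
from bisect import bisect_left

# Hypothetical source records: (guid, attribute name, attribute value).
# Only vesselmmsitext=2706758566 comes from the thread; the rest is made up.
RECORDS = [
    ("guid-001", "vesselmmsitext", "2706758566"),
    ("guid-001", "vesselname", "AURORA"),
    ("guid-002", "vesselmmsitext", "1111111111"),
    ("guid-003", "vesselmmsitext", "2706758566"),
]

def original_key(guid, name, value):
    # Layout described in the thread: the CF is "name=value", so the number
    # of distinct CFs grows with the number of distinct values.
    return (guid, f"{name}={value}", "")

def locality_key(guid, name, value):
    # Value pushed into the CQ: the CFs are now just the attribute names,
    # a small discrete set that could be mapped onto locality groups.
    return (guid, name, value)

def index_key(guid, name, value):
    # Assumed secondary-index layout: the searched term leads the row,
    # so entries for one name/value pair sort next to each other.
    return (f"{name}\x00{value}", "", guid)

def distinct_cfs(keys):
    """Collect the distinct column families used by an iterable of keys."""
    return {cf for _, cf, _ in keys}

def lookup(index, name, value):
    """Seek to the term in a sorted index and read until the row changes."""
    row = f"{name}\x00{value}"
    start = bisect_left(index, (row, "", ""))
    guids = []
    for entry_row, _, guid in index[start:]:
        if entry_row != row:
            break
        guids.append(guid)
    return guids

index = sorted(index_key(*rec) for rec in RECORDS)

print(len(distinct_cfs(original_key(*rec) for rec in RECORDS)))
print(sorted(distinct_cfs(locality_key(*rec) for rec in RECORDS)))
print(lookup(index, "vesselmmsitext", "2706758566"))
```

In this model the original layout needs one column family per name/value pair, while the second layout needs only one per attribute name. The index layout turns the search into a seek plus a short contiguous read instead of a pass over every entry, which is the property a bounded scan relies on.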
