"Small" might also be misleading. A locality group can be a good
way to separate a large collection of data from an actually small number
of other records. Discrete, yes, but the data itself does not need to be
small to put it into a locality group.
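To make the "large collection vs. small number of records" point concrete, here is a toy cost model in Java. It is not how Accumulo reads its files. It only assumes that, after a compaction, each locality group is stored separately and that fetching one column family reads the whole group holding that family, with unassigned families sharing the default group. The class and method names, the group name "vessel", and the entry counts are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Toy cost model, NOT Accumulo's implementation: assumes that after a
 * compaction each locality group is stored separately, and that fetching
 * one column family reads every entry in the group holding that family.
 * Families not assigned to a named group share the "default" group.
 */
public class LocalityGroupCost {

    /** Group holding this CF, or "default" if no named group lists it. */
    static String groupOf(String cf, Map<String, Set<String>> groups) {
        for (Map.Entry<String, Set<String>> g : groups.entrySet()) {
            if (g.getValue().contains(cf)) {
                return g.getKey();
            }
        }
        return "default";
    }

    /** Entries read when fetching one CF, given each entry's CF. */
    static long entriesRead(List<String> entryFamilies, String fetchedCf,
                            Map<String, Set<String>> groups) {
        String target = groupOf(fetchedCf, groups);
        long read = 0;
        for (String cf : entryFamilies) {
            if (groupOf(cf, groups).equals(target)) {
                read++;
            }
        }
        return read;
    }

    public static void main(String[] args) {
        String wanted = "vesselmmsitext=2706758566";
        List<String> families = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            families.add("other"); // the large collection
        }
        for (int i = 0; i < 10; i++) {
            families.add(wanted); // the small set we actually want
        }

        Map<String, Set<String>> noGroups = Map.of();
        Map<String, Set<String>> oneGroup = Map.of("vessel", Set.of(wanted));

        System.out.println(entriesRead(families, wanted, noGroups)); // 100010
        System.out.println(entriesRead(families, wanted, oneGroup)); // 10
    }
}
```

The large "other" data does not need to be small for this to help; what matters is that the wanted family is isolated from it.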
Christopher wrote:
I would be surprised if anybody has tested more than a dozen or two
locality groups or placed more than a dozen or two column families in
any one locality group.
On Fri, Aug 14, 2015, 01:28 Daniel Ruiz <[email protected]> wrote:
Thanks... We ended up doing just that. You're correct that having a bunch
of random data does not fit well with locality groups. I did have
another question, though: you mentioned a "small discrete set". What
would you consider small? Would you recommend, for example, against
having several thousand locality groups in a table?
V/r,
-Daniel
-----Original Message-----
From: Christopher [mailto:[email protected]]
Sent: Wednesday, August 12, 2015 3:08 PM
To: Accumulo User List <[email protected]>
Subject: Re: Fetch Taking Longer Than Expected
The schema shown above doesn't quite look like it's well-suited for
locality groups, though. The CF field looks like it's a composition of
an attribute name and that attribute's value. To take advantage of
locality groups with that schema, you'd have to have a locality group
for every attribute name/value combination, which would probably not
work well.
If you want to take advantage of locality groups, you'll probably want
to make your CFs a small, discrete set (like just attribute names).
So, if you push the attribute value into the CQ, you could at the very
least limit your search to the locality group containing the particular
attribute name you are searching for.
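A minimal Java sketch of that key rewrite, assuming plain strings stand in for the row, CF, and CQ of an Accumulo Key. `KeyParts`, `splitComposite`, and the sample rows are hypothetical, and keeping any existing CQ after a NUL separator is an assumption. It shows the set of distinct CFs (the candidates for locality groups) collapsing from one per name/value pair to one per attribute name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/**
 * Sketch only: rewrite a composite "attribute=value" column family into
 * CF = attribute name and CQ = attribute value. KeyParts and splitComposite
 * are hypothetical names; KeyParts stands in for the row/CF/CQ of a Key.
 */
public class SplitColumnFamily {

    static final class KeyParts {
        final String row;
        final String cf;
        final String cq;

        KeyParts(String row, String cf, String cq) {
            this.row = row;
            this.cf = cf;
            this.cq = cq;
        }
    }

    /** Moves the value half of "name=value" out of the CF and into the CQ. */
    static KeyParts splitComposite(KeyParts in) {
        int eq = in.cf.indexOf('=');
        if (eq < 0) {
            return in; // not a composite CF; leave it unchanged
        }
        String name = in.cf.substring(0, eq);
        String value = in.cf.substring(eq + 1);
        // Assumption: any existing CQ is kept after a NUL separator.
        String cq = in.cq.isEmpty() ? value : value + '\0' + in.cq;
        return new KeyParts(in.row, name, cq);
    }

    /** The distinct CFs, i.e. the candidates for locality-group membership. */
    static Set<String> distinctFamilies(List<KeyParts> keys) {
        Set<String> cfs = new TreeSet<>();
        for (KeyParts k : keys) {
            cfs.add(k.cf);
        }
        return cfs;
    }

    public static void main(String[] args) {
        List<KeyParts> before = new ArrayList<>();
        before.add(new KeyParts("guid-001", "vesselmmsitext=2706758566", ""));
        before.add(new KeyParts("guid-002", "vesselmmsitext=1111111111", ""));
        before.add(new KeyParts("guid-003", "vesselname=ALPHA", ""));

        List<KeyParts> after = new ArrayList<>();
        for (KeyParts k : before) {
            after.add(splitComposite(k));
        }
        System.out.println(distinctFamilies(before).size()); // 3
        System.out.println(distinctFamilies(after));         // [vesselmmsitext, vesselname]
    }
}
```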
If you really want efficient searches based on attribute name/value
combinations, you're going to want to put them in the row (at the
beginning of your row ID), so your data is ordered (indexed) by them. You
could do this in a secondary index (which could be in a different
table, a different segment of this table, or a separate locality
group in this table).
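A sketch of the secondary-index idea, with a `java.util.TreeMap` standing in for a sorted table. The index row layout (`name=value`, a NUL byte, then the GUID) and the class and method names are hypothetical. In Accumulo the equivalent lookup would be a scan over a `Range` on the index table, not a `subMap` call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Sketch of a secondary index whose row ID starts with "name=value".
 * A TreeMap stands in for a lexicographically sorted table; IndexSketch,
 * indexRow, put and lookup are hypothetical names, not Accumulo APIs.
 */
public class IndexSketch {
    private final NavigableMap<String, String> index = new TreeMap<>();

    /** Row = "name=value" + NUL + guid, so each (value, guid) pair is unique. */
    static String indexRow(String name, String value, String guid) {
        return name + "=" + value + '\0' + guid;
    }

    void put(String guid, String name, String value) {
        index.put(indexRow(name, value, guid), guid);
    }

    /**
     * Bounded lookup: only rows in ["name=value\0", "name=value\1") are
     * visited, so a longer value such as "...66" + "1" is excluded.
     */
    List<String> lookup(String name, String value) {
        String start = name + "=" + value + '\0';
        String end = name + "=" + value + '\1';
        return new ArrayList<>(index.subMap(start, true, end, false).values());
    }

    public static void main(String[] args) {
        IndexSketch idx = new IndexSketch();
        idx.put("guid-001", "vesselmmsitext", "2706758566");
        idx.put("guid-002", "vesselmmsitext", "1111111111");
        idx.put("guid-003", "vesselmmsitext", "2706758566");
        System.out.println(idx.lookup("vesselmmsitext", "2706758566")); // [guid-001, guid-003]
    }
}
```

Because the index is sorted by the name/value combination first, the lookup touches only the matching rows instead of reading the whole table, which is the point of putting the combination at the beginning of the row.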
--
Christopher L Tubbs II
http://gravatar.com/ctubbsii
On Wed, Aug 12, 2015 at 11:20 AM, Josh Elser <[email protected]> wrote:
> Yup, that would be expected.
>
> Remember that doing `scan -c ...` is an unbounded search over your entire
> table. So, it takes approximately 3 minutes to read your GUIDIndexTable.
> Because you have a single locality group, all of the columns in your table
> are grouped together.
>
> One exercise that may be interesting for yourself is to create a locality
> group that has your specific column family in it, compact your
> GUIDIndexTable, and rerun your `scan -c` query. The speed should be similar
> to your exact scan. Removing the locality group and re-compacting the table
> should return the query time to the slow 3 minutes.
>
> Does that make sense?
>
> Daniel Ruiz wrote:
>>
>> Hi All,
>>
>> I am having an issue where column fetches are taking over a minute on
>> 1.6.3. I don’t believe this should be the case, and my experience in the
>> past supports the idea that fetches should be very fast.
>>
>> For example, doing a scan on the table gives results instantly, but doing
>> a `scan -c vesselmmsitext=2706758566` takes 2 minutes and 44 seconds
>> (plus or minus 1 second).
>>
>> Figure 1.1. Generated Test Data on GUIDIndexTable
>>
>> Here is the table config:
>>
>> --------+------------------------------------------------------------+-----------------------------------------------------------------
>> SCOPE   | NAME                                                       | VALUE
>> --------+------------------------------------------------------------+-----------------------------------------------------------------
>> default | table.balancer                                             | org.apache.accumulo.server.master.balancer.DefaultLoadBalancer
>> default | table.bloom.enabled                                        | false
>> default | table.bloom.error.rate                                     | 0.5%
>> default | table.bloom.hash.type                                      | murmur
>> default | table.bloom.key.functor                                    | org.apache.accumulo.core.file.keyfunctor.RowFunctor
>> default | table.bloom.load.threshold                                 | 1
>> default | table.bloom.size                                           | 1048576
>> default | table.cache.block.enable                                   | false
>> default | table.cache.index.enable                                   | true
>> default | table.classpath.context                                    |
>> default | table.compaction.major.everything.idle                     | 1h
>> default | table.compaction.major.ratio                               | 3
>> default | table.compaction.minor.idle                                | 5m
>> default | table.compaction.minor.logs.threshold                      | 3
>> table   | table.constraint.1                                         | org.apache.accumulo.core.constraints.DefaultKeySizeConstraint
>> default | table.failures.ignore                                      | false
>> default | table.file.blocksize                                       | 0B
>> default | table.file.compress.blocksize                              | 100K
>> default | table.file.compress.blocksize.index                        | 128K
>> default | table.file.compress.type                                   | gz
>> default | table.file.max                                             | 15
>> default | table.file.replication                                     | 0
>> default | table.file.type                                            | rf
>> default | table.formatter                                            | org.apache.accumulo.core.util.format.DefaultFormatter
>> default | table.groups.enabled                                       |
>> default | table.interepreter                                         | org.apache.accumulo.core.util.interpret.DefaultScanInterpreter
>> table   | table.iterator.majc.AgeOffIterator##GUIDIndexTable         | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
>> table   | table.iterator.majc.AgeOffIterator##GUIDIndexTable.opt.ttl | 2592000000
>> table   | table.iterator.majc.vers                                   | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
>> table   | table.iterator.majc.vers.opt.maxVersions                   | 1
>> table   | table.iterator.minc.AgeOffIterator##GUIDIndexTable         | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
>> table   | table.iterator.minc.AgeOffIterator##GUIDIndexTable.opt.ttl | 2592000000
>> table   | table.iterator.minc.vers                                   | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
>> table   | table.iterator.minc.vers.opt.maxVersions                   | 1
>> table   | table.iterator.scan.AgeOffIterator##GUIDIndexTable         | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
>> table   | table.iterator.scan.AgeOffIterator##GUIDIndexTable.opt.ttl | 2592000000
>> table   | table.iterator.scan.vers                                   | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
>> table   | table.iterator.scan.vers.opt.maxVersions                   | 1
>> default | table.majc.compaction.strategy                             | org.apache.accumulo.tserver.compaction.DefaultCompactionStrategy
>> default | table.scan.max.memory                                      | 512K
>> table   | @override                                                  | 1M
>> default | table.security.scan.visibility.default                     |
>> default | table.split.threshold                                      | 1G
>> default | table.walog.enabled                                        | true
>> --------+------------------------------------------------------------+-----------------------------------------------------------------
>>
>> More Table Info:
>>
>> GUIDIndexTable <http://107.23.12.24:50095/tables?t=f>: ONLINE | 2 | 0 | 82.56M | 810.00K | 159
>>
>> Please let me know if I am doing something wrong or if there is more
>> information you need.
>>
>> V/r,
>>
>> -Daniel
>>
>