No, I was not recommending locality groups as a solution to the problem, but using them to illustrate why the query was taking a long time.

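1. Run the query and observe that it is slow.
2. Change the config (create the locality group and compact).
3. Run the query again and observe that it is fast.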

I should have been clearer that I was not recommending locality groups as a solution to slow scans. The solution is to not run an unbounded `scan -c` and expect it to be fast.

Christopher wrote:
The schema shown above doesn't quite look like it's well-suited for
locality groups, though. The CF field looks like it's a composition of
an attribute name and that attribute's value. To take advantage of
locality groups with that schema, you'd have to have a locality group
for every attribute name/value combination, which would probably not
work well.

If you want to take advantage of locality groups, you'll probably want
to make your CFs a small, discrete set (like just attribute names).
So, if you push the attribute value into the CQ, you could at the very
least limit your search to the locality group containing the particular
attribute name you are searching for.
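A minimal sketch of the re-keying Christopher describes, assuming the existing column family always has the form `name=value` (as `vesselmmsitext=2706758566` suggests). `splitFamily` and `familiesAfterSplit` are hypothetical helpers, not Accumulo APIs; the point is that the number of distinct column families shrinks to the number of attribute names, which is small enough to map onto locality groups.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/**
 * Illustration only: re-key a composite "name=value" column family into a
 * family (the attribute name) and a qualifier (the attribute value).
 */
public class SplitCompositeFamily {

    /** Returns {family, qualifier}; hypothetical helper, not an Accumulo API. */
    static String[] splitFamily(String compositeFamily) {
        int eq = compositeFamily.indexOf('=');
        if (eq < 0) {
            throw new IllegalArgumentException("expected name=value: " + compositeFamily);
        }
        // Everything before the first '=' becomes the CF, the rest the CQ.
        return new String[] {compositeFamily.substring(0, eq), compositeFamily.substring(eq + 1)};
    }

    /** Distinct column families after re-keying: one per attribute name. */
    static TreeSet<String> familiesAfterSplit(List<String> compositeFamilies) {
        TreeSet<String> families = new TreeSet<>();
        for (String cf : compositeFamilies) {
            families.add(splitFamily(cf)[0]);
        }
        return families;
    }

    public static void main(String[] args) {
        List<String> oldFamilies = Arrays.asList(
            "vesselmmsitext=2706758566", "vesselmmsitext=123456789", "color=red", "color=blue");
        String[] kv = splitFamily("vesselmmsitext=2706758566");
        System.out.println("family=" + kv[0] + " qualifier=" + kv[1]);
        // Four distinct old families collapse to two: [color, vesselmmsitext]
        System.out.println(familiesAfterSplit(oldFamilies));
    }
}
```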

If you really want efficient searches based on attribute name/value
combinations, you're going to want to put the name/value pair up in the
row (at the beginning of your row ID), so your data is ordered (indexed)
by it. You could do this in a secondary index (which could be in a
different table, a different segment of this table, or in a separate
locality group in this table).
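The secondary-index idea can be sketched with a toy model. Here a `TreeSet` stands in for a sorted index table, and the row layout `name=value` + NUL + GUID is an assumption for illustration, not something from this thread. Because the name/value pair leads the row, a lookup becomes a bounded range read over only the matching rows instead of a filter over the whole table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/**
 * Toy secondary index (not Accumulo code): the row is "name=value" + NUL + guid,
 * so all entries for one name/value pair are contiguous in sort order.
 */
public class AttributeIndex {
    static final char SEP = '\u0000';
    final TreeSet<String> rows = new TreeSet<>();

    void add(String name, String value, String guid) {
        rows.add(name + "=" + value + SEP + guid);
    }

    /** Bounded read over [name=value + '\0', name=value + '\1'): touches only matches. */
    List<String> lookup(String name, String value) {
        String start = name + "=" + value + SEP;
        String end = name + "=" + value + '\u0001';
        List<String> guids = new ArrayList<>();
        for (String row : rows.subSet(start, end)) {
            // Strip the name=value prefix and separator, leaving the GUID.
            guids.add(row.substring(start.length()));
        }
        return guids;
    }

    public static void main(String[] args) {
        AttributeIndex idx = new AttributeIndex();
        idx.add("vesselmmsitext", "2706758566", "guid-b");
        idx.add("vesselmmsitext", "2706758566", "guid-a");
        idx.add("vesselmmsitext", "123456789", "guid-c");
        idx.add("color", "red", "guid-a");
        System.out.println(idx.lookup("vesselmmsitext", "2706758566")); // [guid-a, guid-b]
    }
}
```

The NUL separator keeps a value such as `27067585660` from falling inside the range for `2706758566`.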

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii


On Wed, Aug 12, 2015 at 11:20 AM, Josh Elser wrote:
Yup, that would be expected.

Remember that doing `scan -c ...` is an unbounded search over your entire
table, so it takes approximately 3 minutes to read all of GUIDIndexTable.
Because you have a single locality group, all of the columns in your table
are stored together, and every key-value pair must be read and checked
against the requested column family.
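To make the "unbounded search" point concrete, here is a simplified toy model in plain `java.util` (not the Accumulo client API, and not how a tablet server is literally implemented) of a table with one locality group. A column-family fetch with no range has to walk every cell and filter, while a row-bounded scan seeks straight to the matching keys. The `guid%05d` rows and the two families per row are made-up data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/**
 * Toy model (not Accumulo code) of a table with a single locality group:
 * every cell sits in one sorted sequence, ordered by row first.
 */
public class UnboundedColumnScan {
    static final char SEP = '\u0000';
    static final TreeMap<String, String> TABLE = new TreeMap<>();
    static int entriesRead;

    /** Two cells per row; only targetRow carries the family we search for. */
    static void load(int rows, int targetRow) {
        TABLE.clear();
        for (int i = 0; i < rows; i++) {
            String row = String.format("guid%05d", i);
            TABLE.put(row + SEP + "color=red", "");
            String vessel = (i == targetRow) ? "vesselmmsitext=2706758566" : "vesselmmsitext=0";
            TABLE.put(row + SEP + vessel, "");
        }
    }

    /** Like `scan -c family` with no range: every cell is read, then filtered. */
    static List<String> scanByFamily(String family) {
        entriesRead = 0;
        List<String> rows = new ArrayList<>();
        for (String key : TABLE.keySet()) {
            entriesRead++;
            int sep = key.indexOf(SEP);
            if (key.substring(sep + 1).equals(family)) {
                rows.add(key.substring(0, sep));
            }
        }
        return rows;
    }

    /** Like `scan -r row`: seeks to the row and reads only its cells. */
    static List<String> scanByRow(String row) {
        entriesRead = 0;
        List<String> families = new ArrayList<>();
        // Every key of this row sorts in [row + '\0', row + '\1').
        for (String key : TABLE.subMap(row + SEP, row + '\u0001').keySet()) {
            entriesRead++;
            families.add(key.substring(row.length() + 1));
        }
        return families;
    }

    public static void main(String[] args) {
        load(10_000, 42);
        List<String> hits = scanByFamily("vesselmmsitext=2706758566");
        System.out.println("scan -c: " + hits + ", entries read: " + entriesRead); // 20000 read
        List<String> cells = scanByRow("guid00042");
        System.out.println("scan -r: " + cells + ", entries read: " + entriesRead); // 2 read
    }
}
```

Both calls find the same row, but the family fetch reads every cell in the table to do it, which is why its cost tracks table size rather than result size.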

One exercise you may find interesting is to create a locality group
that contains your specific column family, compact your GUIDIndexTable,
and rerun your `scan -c` query. The speed should be similar to your
exact scan. Removing the locality group and re-compacting the table
should bring the query time back to the slow 3 minutes.
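The exercise can be mimicked with a deliberately simplified toy model in plain `java.util` (not Accumulo's RFile or locality group code). Each `compact()` rewrites the data partitioned by whatever group config is in effect at that moment, and a single-family fetch reads only the group holding that family. The class, method, and group names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model (not Accumulo's implementation) of locality groups. Each compact()
 * rewrites all data partitioned by the group config in effect at that moment,
 * and a single-family fetch reads only the group that holds that family.
 */
public class LocalityGroupDemo {
    static final String DEFAULT_GROUP = "<default>";

    final List<String[]> cells = new ArrayList<>();            // {row, family}
    final Map<String, String> groupConfig = new HashMap<>();    // family -> group
    Map<String, String> configAtLastCompaction = new HashMap<>();
    final Map<String, List<String[]>> files = new HashMap<>();  // group -> cells on "disk"
    int entriesRead;

    void put(String row, String family) {
        cells.add(new String[] {row, family});
    }

    /** Rough analogue of configuring a locality group; applies only after compact(). */
    void setGroup(String group, String family) {
        groupConfig.put(family, group);
    }

    void clearGroups() {
        groupConfig.clear();
    }

    /** Rewrite all cells, partitioned by the current group config. */
    void compact() {
        configAtLastCompaction = new HashMap<>(groupConfig);
        files.clear();
        for (String[] cell : cells) {
            String group = configAtLastCompaction.getOrDefault(cell[1], DEFAULT_GROUP);
            files.computeIfAbsent(group, g -> new ArrayList<>()).add(cell);
        }
    }

    /** Like `scan -c family`: reads every cell in the one group holding the family. */
    List<String> fetchFamily(String family) {
        entriesRead = 0;
        List<String> rows = new ArrayList<>();
        String group = configAtLastCompaction.getOrDefault(family, DEFAULT_GROUP);
        for (String[] cell : files.getOrDefault(group, new ArrayList<>())) {
            entriesRead++;
            if (cell[1].equals(family)) {
                rows.add(cell[0]);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        LocalityGroupDemo table = new LocalityGroupDemo();
        for (int i = 0; i < 10_000; i++) {
            table.put("guid" + i, "color=red");
            table.put("guid" + i, i == 42 ? "vesselmmsitext=2706758566" : "vesselmmsitext=0");
        }
        String family = "vesselmmsitext=2706758566";

        table.compact(); // single default locality group
        table.fetchFamily(family);
        System.out.println("one group:       read " + table.entriesRead); // 20000

        table.setGroup("vessel", family); // dedicated group, then compact
        table.compact();
        table.fetchFamily(family);
        System.out.println("dedicated group: read " + table.entriesRead); // 1

        table.clearGroups(); // remove the group and re-compact
        table.compact();
        table.fetchFamily(family);
        System.out.println("group removed:   read " + table.entriesRead); // 20000
    }
}
```

Note that the toy gives `vesselmmsitext=2706758566` a group of its own only because the current schema encodes the value in the family, which is exactly the mismatch Christopher points out.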

Does that make sense?

Daniel Ruiz wrote:
Hi All,

I am having an issue where column fetches are taking over a minute on
Accumulo 1.6.3. I don’t believe this should be the case, and my past
experience supports the idea that fetches should be very fast.

For example, a plain `scan` on the table returns results instantly, but
`scan -c vesselmmsitext=2706758566` takes 2 minutes and 44 seconds
(plus or minus 1 second).

Figure 1.1. Generated Test Data on GUIDIndexTable

Here is the table config:

--------+------------------------------------------------------------+-----------------------------------------------------------------
SCOPE   | NAME                                                       | VALUE
--------+------------------------------------------------------------+-----------------------------------------------------------------
default | table.balancer                                             | org.apache.accumulo.server.master.balancer.DefaultLoadBalancer
default | table.bloom.enabled                                        | false
default | table.bloom.error.rate                                     | 0.5%
default | table.bloom.hash.type                                      | murmur
default | table.bloom.key.functor                                    | org.apache.accumulo.core.file.keyfunctor.RowFunctor
default | table.bloom.load.threshold                                 | 1
default | table.bloom.size                                           | 1048576
default | table.cache.block.enable                                   | false
default | table.cache.index.enable                                   | true
default | table.classpath.context                                    |
default | table.compaction.major.everything.idle                     | 1h
default | table.compaction.major.ratio                               | 3
default | table.compaction.minor.idle                                | 5m
default | table.compaction.minor.logs.threshold                      | 3
table   | table.constraint.1                                         | org.apache.accumulo.core.constraints.DefaultKeySizeConstraint
default | table.failures.ignore                                      | false
default | table.file.blocksize                                       | 0B
default | table.file.compress.blocksize                              | 100K
default | table.file.compress.blocksize.index                        | 128K
default | table.file.compress.type                                   | gz
default | table.file.max                                             | 15
default | table.file.replication                                     | 0
default | table.file.type                                            | rf
default | table.formatter                                            | org.apache.accumulo.core.util.format.DefaultFormatter
default | table.groups.enabled                                       |
default | table.interepreter                                         | org.apache.accumulo.core.util.interpret.DefaultScanInterpreter
table   | table.iterator.majc.AgeOffIterator##GUIDIndexTable         | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
table   | table.iterator.majc.AgeOffIterator##GUIDIndexTable.opt.ttl | 2592000000
table   | table.iterator.majc.vers                                   | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
table   | table.iterator.majc.vers.opt.maxVersions                   | 1
table   | table.iterator.minc.AgeOffIterator##GUIDIndexTable         | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
table   | table.iterator.minc.AgeOffIterator##GUIDIndexTable.opt.ttl | 2592000000
table   | table.iterator.minc.vers                                   | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
table   | table.iterator.minc.vers.opt.maxVersions                   | 1
table   | table.iterator.scan.AgeOffIterator##GUIDIndexTable         | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
table   | table.iterator.scan.AgeOffIterator##GUIDIndexTable.opt.ttl | 2592000000
table   | table.iterator.scan.vers                                   | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
table   | table.iterator.scan.vers.opt.maxVersions                   | 1
default | table.majc.compaction.strategy                             | org.apache.accumulo.tserver.compaction.DefaultCompactionStrategy
default | table.scan.max.memory                                      | 512K
table   | @override                                                  | 1M
default | table.security.scan.visibility.default                     |
default | table.split.threshold                                      | 1G
default | table.walog.enabled                                        | true
--------+------------------------------------------------------------+-----------------------------------------------------------------

More Table Info:

GUIDIndexTable (http://107.23.12.24:50095/tables?t=f) | ONLINE | 2 | 0 | 82.56M | 810.00K | 159

Please let me know if I am doing something wrong or if there is more
information you need.

V/r,

-Daniel
