OK, so I understand now: you choose the index with the smallest average number of matches per key. Unfortunately this doesn't work out so well for me. I am running a query against the "edges" columnfamily of a graph database, which should return the edges whose source and target labels both equal given values.
I have about 30M edges, and on average the target labels have more matching rows than the source labels, so (as I understand it) the source-label index is the one that gets chosen. Unfortunately, in the case at hand there are 2 matches on the target label and about 100K on the source label, and I have 5000 similar queries to perform for the overall task.

What I think you should be doing is the following: open an iterator on the matching keys for each of the indexes; the inner loop would then pick an iterator at random and pull a match from it. This would ensure that the expected number of entries examined is a small multiple (# of other indexes) of the number of entries in the index with the most "precision".

Then, if you want, you can optimize by using the overall statistics to set the initial probabilities. But as you process the query you should mix these initial probabilities with probabilities proportional to the actual fraction of overall matches generated by a given index. (I guess you could control the speed of mixing using the standard deviations of the initial key counts.)

I know you have a new type of index in the works... but it doesn't look like "trunk" has any modifications for "scan", and presumably the strategy I just described is pretty general (it doesn't depend on histograms, etc.).

Does it sound like a good idea?

-- Shaun

On Feb 6, 2011, at 12:15 AM, Jonathan Ellis wrote:

> ColumnFamilyStore.scan
>
> On Sat, Feb 5, 2011 at 10:32 PM, Shaun Cutts <sh...@cuttshome.net> wrote:
>> Thanks for the response!
>>
>> So.. I *may* have a bug to report (at least I can generate radically
>> different response times based on expression order with a multiply indexed
>> columnfamily), but first I'll have to upgrade to a stable version (currently
>> I have 7.0rc2 installed).
>>
>> I was also wondering where the code that does this is... is it in
>>
>> java.org.apache.cassandra.db.columniterator.IndexedSliceReader?
>>
>> Thanks,
>>
>> -- Shaun
>>
>> On Feb 5, 2011, at 2:39 PM, Jonathan Ellis wrote:
>>
>>> On Sat, Feb 5, 2011 at 8:48 AM, Shaun Cutts <sh...@cuttshome.net> wrote:
>>>> Hello,
>>>> I'm wondering if cassandra is sensitive to the order of index expressions
>>>> in (pycassa call) get_indexed_slices?
>>>
>>> No.
>>>
>>>> If I have several column indexes available, will it attempt to optimize the
>>>> order?
>>>
>>> Yes.
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
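P.S. In case it helps, here is a rough Python sketch of the randomized strategy I proposed at the top of this message. It is only an illustration under my own assumptions, not Cassandra code: intersect_indexes, row_matches, priors and prior_weight are names I made up, and using a smoothed per-index hit rate is just one possible way to do the "mixing". The loop can stop as soon as any one iterator runs dry, because by then every row matching that expression has been checked.

```python
import random


def intersect_indexes(index_iters, row_matches, priors=None,
                      prior_weight=10.0, rng=None):
    """Find keys matching all expressions by sampling the index iterators.

    index_iters:  dict mapping an index name to an iterable of the row keys
                  that match that index's expression (hypothetical input).
    row_matches:  callable(key) -> bool, True if the row satisfies *all*
                  expressions (stands in for reading and checking the row).
    priors:       optional dict of initial selection weights per index,
                  e.g. derived from overall statistics; uniform if omitted.
    prior_weight: how many "virtual pulls" the prior counts for; larger
                  values make the observed hit rates take over more slowly.

    Returns (matching_keys, number_of_index_entries_examined).
    """
    rng = rng or random.Random()
    names = list(index_iters)
    iters = {n: iter(index_iters[n]) for n in names}
    priors = priors or {n: 1.0 / len(names) for n in names}
    pulls = {n: 0 for n in names}   # entries pulled from each index
    hits = {n: 0 for n in names}    # pulled entries that were full matches
    checked = {}                    # key -> result of row_matches(key)
    results = []

    while True:
        # Mix the prior with the observed hit rate of each index.
        weights = [(prior_weight * priors[n] + hits[n])
                   / (prior_weight + pulls[n]) for n in names]
        name = rng.choices(names, weights=weights)[0]
        try:
            key = next(iters[name])
        except StopIteration:
            # One index is exhausted: every row matching its expression
            # has been checked, so no further full matches can exist.
            return results, sum(pulls.values())
        pulls[name] += 1
        if key not in checked:
            checked[key] = row_matches(key)
            if checked[key]:
                results.append(key)
        if checked[key]:
            hits[name] += 1
```

With a very selective target index and a very unselective source index, the loop ends shortly after the target iterator is drained, so only a small part of the source index is ever touched.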