Re: [PERFORM] Guidelines on best indexing strategy for varying searches on 20+ columns

Niels Kristian Schjødt Mon, 30 Jun 2014 00:34:27 -0700

Thanks for your suggestions, very useful. See comments inline:

Den 25/06/2014 kl. 23.48 skrev Merlin Moncure <mmonc...@gmail.com>:


> On Wed, Jun 25, 2014 at 3:48 AM, Niels Kristian Schjødt
> <nielskrist...@autouncle.com> wrote:
>> Hi,
>> I’m running a search engine for cars. It’s backed by a postgresql 9.3 
>> installation.
>> 
>> Now I’m unsure about the best approach/strategy on doing index optimization 
>> for the fronted search.
>> 
>> The problem:
>> 
>> The table containing the cars holds a around 1,5 million rows. People that 
>> searches for cars needs different criteria to search by. Some search by 
>> brand/model, some by year, some by mileage, some by price and some by 
>> special equipment etc. etc. - and often they combine a whole bunch of 
>> criteria together. Of cause some, like brand/mode and price, are used more 
>> frequently than others. In total we offer: 9 category criteria like 
>> brand/model or body type, plus 5 numeric criteria like price or mileage, 
>> plus 12 boolean criteria like equipment. Lastly people can order the results 
>> by different columns (year, price, mileage and a score we create about the 
>> cars). By default we order by our own generated score.
>> 
>> What I’ve done so far:
>> 
>> I have analyzed the usage of the criteria “lightly”, and created a few 
>> indexes (10). Among those, are e.g. indexes on price, mileage and a combined 
>> index on brand/model. Since we are only interested in showing results for 
>> cars which is actually for sale, the indexes are made as partial indexes on 
>> a sales state column.
>> 
>> Questions:
>> 
>> 1. How would you go about analyzing and determining what columns should be 
>> indexed, and how?
> 
> mainly frequency of access.
> 
>> 2. What is the best strategy when optimizing indexes for searches happening 
>> on 20 + columns, where the use and the combinations varies a lot? (To just 
>> index everything, to index some of the columns, to do combined indexes, to 
>> only do single column indexes etc. etc.)
> 
> don't make 20 indexes.   consider installing pg_trgm (for optimized
> LIKE searching) or hstore (for optmized key value searching) and then
> using GIST/GIN for multiple attribute search.  with 9.4 we have
> another fancy technique to explore: jsonb searching via GIST/GIN.

Interesting, do you have any good resources on this approach?
> 
>> 3. I expect that it does not make sense to index all columns?
> 
> well, maybe.  if you only ever search one column at a time, then it
> might make sense.  but if you need to search arbitrary criteria and
> frequently combine a large number, then no -- particularly if your
> dataset is very large and individual criteria are not very selective.

So, to just clarify: I’m often combining a large number of search criteria and 
the individual criteria is often not very selective, in that case, are you 
arguing for or against indexing all columns? :-)
> 
>> 4. I expect it does not make sense to index boolean columns?
> 
> in general, no.  an important exception is if you are only interested
> in true or false and the number of records that have that interesting
> value is tiny relative to the size of the table.  in that case, a
> partial index can be used for massive optimization.

Thanks, hadn’t been thinking about using partial indexes here as an option.
> 
>> 5. Is it better to do a combined index on 5 frequently used columns rather 
>> than having individual indexes on each of them?
> 
> Only if you search those 5 columns together a significant portion of the time.
> 
>> 6. Would it be a goof idea to have all indexes sorted by my default sorting?
> 
> index order rarely matters.  if you always search values backwards and
> the table is very large you may want to consider it.  unfortunately
> this often doesn't work for composite indexes so sometimes we must
> explore the old school technique of reversing the value.
> 
>> 7. Do you have so experiences with other approaches that could greatly 
>> improve performance (e.g. forcing indexes to stay in memory etc.)?
> 
> as noted above, fancy indexing is the first place to look.   start
> with pg_trgm (for like optmization), hstore, and the new json stuff.
> the big limitation you will hit is that that most index strategies, at
> least fo the prepackaged stuff will support '=', or partial string
> (particularly with pg_trgm like), but not > or <: for range operations
> you have to post process the search or try to work the index from
> another angle.
> 
> merlin

Re: [PERFORM] Guidelines on best indexing strategy for varying searches on 20+ columns

Reply via email to