Thanks for your suggestions, very useful. See comments inline: Den 25/06/2014 kl. 23.48 skrev Merlin Moncure <mmonc...@gmail.com>:
> On Wed, Jun 25, 2014 at 3:48 AM, Niels Kristian Schjødt > <nielskrist...@autouncle.com> wrote: >> Hi, >> I’m running a search engine for cars. It’s backed by a postgresql 9.3 >> installation. >> >> Now I’m unsure about the best approach/strategy on doing index optimization >> for the fronted search. >> >> The problem: >> >> The table containing the cars holds a around 1,5 million rows. People that >> searches for cars needs different criteria to search by. Some search by >> brand/model, some by year, some by mileage, some by price and some by >> special equipment etc. etc. - and often they combine a whole bunch of >> criteria together. Of cause some, like brand/mode and price, are used more >> frequently than others. In total we offer: 9 category criteria like >> brand/model or body type, plus 5 numeric criteria like price or mileage, >> plus 12 boolean criteria like equipment. Lastly people can order the results >> by different columns (year, price, mileage and a score we create about the >> cars). By default we order by our own generated score. >> >> What I’ve done so far: >> >> I have analyzed the usage of the criteria “lightly”, and created a few >> indexes (10). Among those, are e.g. indexes on price, mileage and a combined >> index on brand/model. Since we are only interested in showing results for >> cars which is actually for sale, the indexes are made as partial indexes on >> a sales state column. >> >> Questions: >> >> 1. How would you go about analyzing and determining what columns should be >> indexed, and how? > > mainly frequency of access. > >> 2. What is the best strategy when optimizing indexes for searches happening >> on 20 + columns, where the use and the combinations varies a lot? (To just >> index everything, to index some of the columns, to do combined indexes, to >> only do single column indexes etc. etc.) > > don't make 20 indexes. consider installing pg_trgm (for optimized > LIKE searching) or hstore (for optmized key value searching) and then > using GIST/GIN for multiple attribute search. with 9.4 we have > another fancy technique to explore: jsonb searching via GIST/GIN. Interesting, do you have any good resources on this approach? > >> 3. I expect that it does not make sense to index all columns? > > well, maybe. if you only ever search one column at a time, then it > might make sense. but if you need to search arbitrary criteria and > frequently combine a large number, then no -- particularly if your > dataset is very large and individual criteria are not very selective. So, to just clarify: I’m often combining a large number of search criteria and the individual criteria is often not very selective, in that case, are you arguing for or against indexing all columns? :-) > >> 4. I expect it does not make sense to index boolean columns? > > in general, no. an important exception is if you are only interested > in true or false and the number of records that have that interesting > value is tiny relative to the size of the table. in that case, a > partial index can be used for massive optimization. Thanks, hadn’t been thinking about using partial indexes here as an option. > >> 5. Is it better to do a combined index on 5 frequently used columns rather >> than having individual indexes on each of them? > > Only if you search those 5 columns together a significant portion of the time. > >> 6. Would it be a goof idea to have all indexes sorted by my default sorting? > > index order rarely matters. if you always search values backwards and > the table is very large you may want to consider it. unfortunately > this often doesn't work for composite indexes so sometimes we must > explore the old school technique of reversing the value. > >> 7. Do you have so experiences with other approaches that could greatly >> improve performance (e.g. forcing indexes to stay in memory etc.)? > > as noted above, fancy indexing is the first place to look. start > with pg_trgm (for like optmization), hstore, and the new json stuff. > the big limitation you will hit is that that most index strategies, at > least fo the prepackaged stuff will support '=', or partial string > (particularly with pg_trgm like), but not > or <: for range operations > you have to post process the search or try to work the index from > another angle. > > merlin