[PERFORM] partitioning versus clustering
Postgres Performance Wizards,

I am a few years into developing and operating a system underpinned by Postgres that sees the arrival of a significant number of events around the clock. It is not an epic amount of data, but it is enough to be challenging for me, particularly because system downtime is not very palatable and the data is retained quasi-indefinitely.

I have various tables that house different kinds of events, and in addition to looking at small numbers of rows, users often want to generate summary reports over large swaths of data spanning days or weeks. At present these reports can trigger index scans that take minutes to service, and because the report parameters are user specified, pre-generating them is infeasible.

Generally the rows in these tables are write-once, but they contain a pointer to the related BLOB from which they were constructed, and every now and again some new field in the originating BLOB becomes of interest. That causes me to alter the table and then sweep the full table with corresponding updates, violating its otherwise INSERT-only nature.

These event tables generally have an "event time" column that is indexed and which is an obvious candidate for either partitioning or clustering of the table. I'm trying to make sense of which is the better option for me. As best I can tell, the decision points are as follows...

PARTITIONING

Pros:

* no outage; data just starts flowing into new partitions seamlessly
* allows more control over where the data goes, creating retrieval parallelization opportunities
* "clustering" cannot be inadvertently undone in a way that requires scheduled downtime to repair
* probably more resilient when the "event time" differs from the time at which I processed the event

Cons:

* does not deal with legacy data without extra migration (over time this becomes less relevant)
* requires some kind of background process to manage partition creation (a rough sketch of the inheritance-based setup I have in mind is at the end of this message)
* partition size will affect performance, and choosing that size is not an exact science

CLUSTERING

Pros:

* no particularly custom development work on my part
* once done, it puts all existing data in a good state for efficient querying without extra work

Cons:

* will lock up the system for the duration of the CLUSTER command
* somehow need to make sure that ANALYZE commands run often enough
* does not give me much control over the underlying storage layout
* may have problems when the occasional mass UPDATE is done
* unclear whether a VACUUM FULL is required to prevent subsequent un-clustered-ness stemming from the mass-UPDATE operations, despite a fill factor of 100
* could generate a huge number of WAL segments to archive
* could be sabotaged by the "event time" property not being well correlated with the time the event is processed when upstream systems have momentary issues

As far as questions to the group go:

* Is my understanding of the pros and cons of the options reasonably correct and comprehensive?
* What has governed your decisions in making such a choice on past projects of your own?
* If I go the clustering route, will the occasional mass update really mess with things, requiring a re-cluster and possibly even a full vacuum (to prevent re-un-clustering)?
* Might it make more sense to cluster when the "event time" property is the time that I processed the event, but partition when it is the time that the event occurred in some other system?
* Is actually running a CLUSTER command necessary to get the performance benefits, or, if the table is already in a good physical order, is that enough for query execution to yield nice sequential access to the disk?

Many thanks in advance for your insights...

-- AWG
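P.S.  For concreteness, here is a minimal sketch of the inheritance-based partitioning I have in mind (this is on a pre-declarative-partitioning Postgres, so it is child tables with CHECK constraints plus constraint exclusion). The table and column names are made up for illustration, and the real tables carry more columns:

  -- Hypothetical parent table; children inherit its columns but not its indexes.
  CREATE TABLE events (
      event_time  timestamptz NOT NULL,
      blob_ref    text
  );

  -- One child per month; the CHECK constraint is what lets constraint
  -- exclusion (enabled for inheritance children by default since 8.4 via
  -- the "partition" setting) skip children whose range cannot match a
  -- query's event_time predicate.
  CREATE TABLE events_2014_04 (
      CHECK (event_time >= '2014-04-01' AND event_time < '2014-05-01')
  ) INHERITS (events);

  CREATE INDEX events_2014_04_event_time_idx ON events_2014_04 (event_time);

  -- Something still has to create next month's child ahead of time and
  -- route inserts into it (a BEFORE INSERT trigger on the parent, or the
  -- application writing directly to the correct child).

Whether monthly is the right granularity is exactly the "partition size is not a science" question above.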
[PERFORM] query against large table not using sensible index to find very small amount of data
I have a fairly large table (~100M rows), let's call it "events", and among other things it has a couple of columns that we'll call entity_type_id (an integer) and published_at (a timestamp). It has, among others, indices on (published_at) and on (entity_type_id, published_at). A very common query against this table is of the form...

SELECT * FROM events WHERE entity_type_id = XXX ORDER BY published_at DESC LIMIT 25;

... to get the most recent 25 events from the table for a given type of entity, and generally the query planner does the expected thing of using the two-part index on (entity_type_id, published_at). Every now and again, though, I have found the query planner deciding that it ought to use the single-column (published_at) index. This can, unsurprisingly, result in horrendous performance if events for a given entity type are rare, as we end up with a very long walk of an index.

I had this happen again yesterday, and I noticed something of particular interest pertaining to the event. Specifically, the query was for an entity type that the system had seen for the first time only one day prior, and furthermore the events table had not been analyzed by the statistics collector for a couple of weeks.

My intuition is that the query planner, when working with an enormous table and encountering an entity type that the statistics collector had never previously seen, would assume that the number of rows in the events table of that entity type would be very small, and therefore the two-part index on (entity_type_id, published_at) would be the right choice. Nonetheless, an EXPLAIN was showing usage of the (published_at) index, and since there were only ~20 rows in the entire events table for that entity type, the queries were getting the worst possible execution imaginable, i.e. reading in the whole table to find the rows that hit, but doing it with the random I/O of an index walk.

As an experiment, I ran a VACUUM ANALYZE on the events table and then re-ran the EXPLAIN of the query, and... same query plan again... Maybe, for whatever issue I am having, the random sampling nature of the statistics collector made it unhelpful, i.e. in its sampling of the ~100M rows it never hit a single row that had the new entity type specified?

Other possibly relevant pieces of information... The entity type column has a cardinality in the neighborhood of a couple dozen. Meanwhile, for some of the entity types there is a large and ongoing number of events, and for other entity types there is a smaller and more sporadic number of events. Every now and again a new entity type shows up.

I can't understand why the query planner would make this choice. Maybe it has gotten ideas into its head about the distribution of the data? Or maybe there is a subtle bug that my data set is triggering? Or maybe I need to turn some knobs on statistics collection? Or maybe it's all of these things together? I worry that even if there is a knob-turning exercise that helps, we're still going to get burned whenever a new entity type shows up until we re-run ANALYZE, assuming that I can find a fix that involves tweaking statistics collection.

I just can't fathom how it would ever be the case that Postgres's choice of index usage in this case would make sense. It doesn't even slot cleanly into the problem space of "why did Postgres do a sequential scan instead of an index scan?".
If you're doing a query of the described form and the entity type is specified, wouldn't the two-part index theoretically _always_ yield better performance than the one-part index? Maybe I have a flawed understanding of the cost of using various indexes? Maybe there is something analogous between the sequential-versus-index-scan choice and the one-part-versus-two-part-index choice?

FWIW, we're running on 8.4.X, using the out-of-the-box default_statistics_target setting, and haven't dabbled with setting table-level statistics configurations.

Thoughts? Recommended reading?

-- AWG
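P.S.  In case it helps, this is the sort of statistics knob turning I have been contemplating; the target value of 1000 is just an example, not a recommendation:

  -- Raise the per-column statistics target so that ANALYZE samples enough
  -- rows to have a chance of noticing rare entity types, then re-gather.
  ALTER TABLE events ALTER COLUMN entity_type_id SET STATISTICS 1000;
  ANALYZE events;

My worry stands, though: however large the sample, a brand new entity type will not appear in the statistics until the next ANALYZE runs.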
Re: [PERFORM] query against large table not using sensible index to find very small amount of data
Your understanding of the utility of multi-part indices does not jibe with my own. While I agree that a partial index might be in order here, that ought just be a performance optimization that lowers the footprint of the index from an index-size and index-maintenance standpoint, not something that governs when the index is used for an item whose entity type rarely comes up in the table. If a couple of the entity types were to constitute 80% of the events, then using a partial index would reduce the performance strain of maintaining the index by 80%, but this ought not govern the query planner's behavior when doing queries on entity types that were not among those.

My general understanding of the utility of multi-part indices is that they come into play when some number of the leading columns appear in the query as fixed values and a subsequent column appears as part of a ranging operation. I know that a b-tree structure isn't exactly the same as a binary tree, but it is roughly equivalent for the purposes of our conversation... I believe you can think of multi-part indices as (roughly) equivalent either to nested binary trees, or to a binary tree whose keys are the concatenation of the various columns. In the former case, doing a range scan would be a matter of hopping through the nested trees until you got to the terminal range scan operation, and in the latter case it would be a matter of finding the first node in the tree that fell within the values for your concatenation and then walking through the tree. Yes, that's not exactly what happens with a b-tree, but it's pretty similar, the main differences being performance optimizations, I believe.

Given that, I don't understand how a multi-part index in which the column over which I intend to range comes _earlier_ than the column(s) that I intend to have be fixed would be helpful. This is especially true given that the timestamp columns are at the granularity of _milliseconds_ and my data set sees a constant stream of inputs with bursts up to ~100 events per second. I think what you are describing could only make sense if the date column were at a large granularity, e.g. hours or days. Or maybe I have missed something... (I spell out the two index shapes I am comparing below the quoted text.)

-- AWG

On Tue, Apr 08, 2014 at 01:39:41PM +, Shaun Thomas wrote:
>
> > Other possibly relevant pieces of information... The entity type
> > column has a cardinality in the neighborhood of a couple dozen.
> > Meanwhile, for some of the entity types there is a large and ongoing
> > number of events, and for other entity types there is a smaller and
> > more sporadic number of events. Every now and again a new entity
> > type shows up.
>
> With that as the case, I have two questions for you:
>
> 1. Why do you have a low cardinality column as the first column in an index?
> 2. Do you have any queries at all that only use the entity type as the only
> where clause?
>
> I agree that the planner is probably wrong here, but these choices aren't
> helping. The low cardinality of the first column causes very large buckets
> that don't limit results very well at all. Combined with the order-by clause,
> the planner really wants to walk the date index backwards to find results
> instead. I would do a couple of things.
>
> First, remove the type/date index. Next, do a count of each type in the table
> with something like this:
>
> SELECT type_id, count(1)
> FROM my_table
> GROUP BY 1
>
> Any type that is more than 20% of the table will probably never be useful in
> an index.
> At this point, you have a choice. You can create a new index with
> date and type *in that order* or create a new partial index on date and type
> that also ignores the top matches. For instance, if you had a type that was
> 90% of the values, this would be my suggestion:
>
> CREATE INDEX idx_foo_table_date_event_type_part
>     ON foo_table (event_date, event_type)
>     WHERE event_type != 14;
>
> Or whatever. If the IDs are basically evenly distributed, it won't really
> matter.
>
> In any case, index order matters. The planner wants to restrict data as
> quickly as possible. If you provide an order clause, it wants to read the
> index in that order. Your specified type as the first column disrupts that,
> so it has to fetch the values first, which is usually more expensive. Even if
> that's wrong in your particular case, planner stats are not precise enough to
> know that.
>
> Either way, try moving the indexes around. I can't think of many indexes in
> our database where I have the low cardinality value as the first column.
> Databases have an easier time managing many shallow buckets of values than a
> few deep ones.
>
> --
> Shaun Thomas
> OptionsHouse | 141 W. Jackson Blvd | Suite 400 | Chicago IL, 60604
> 312-676-8870
> stho...@optionshouse.com
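To make sure we are debating the same two shapes, here is how I read the alternatives (the index names and the literal 14 are made up, following your example):

  -- What I have today: the equality column first, then the ORDER BY column.
  -- For WHERE entity_type_id = X ORDER BY published_at DESC LIMIT 25 the
  -- planner can descend straight to the subtree for X and read its newest
  -- 25 entries backwards.
  CREATE INDEX events_type_published_idx ON events (entity_type_id, published_at);

  -- What I understand you to be suggesting: timestamp first, optionally made
  -- partial to exclude the very common types. The same query would have to
  -- walk this index in published_at order and filter on type, which is
  -- exactly the long walk I am trying to avoid for the rare types.
  CREATE INDEX events_published_type_idx ON events (published_at, entity_type_id)
      WHERE entity_type_id != 14;

If I have misread your suggestion, that comparison is where my confusion lies.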
Re: [PERFORM] query against large table not using sensible index to find very small amount of data
planner can act with better knowledge of the specified entity_type_id value's prevalence in the events table, even though this would feel a little bit like a hack.

Any insights would be greatly appreciated. (The specific EXPLAIN and pg_stats checks I am looking at are below Tom's quoted reply.)

-- AWG

On Tue, Apr 08, 2014 at 09:55:38AM -0400, Tom Lane wrote:
> "Andrew W. Gibbs" writes:
> > A very common query against this table is of the form...
> >
> > SELECT * FROM events WHERE entity_type_id = XXX ORDER BY published_at DESC
> > LIMIT 25;
> >
> > ... to get the most recent 25 events from the table for a given type
> > of entity, and generally the query planner does the expected thing of
> > using the two-part index on (entity_type_id, published_at). Every now
> > and again, though, I have found the query planner deciding that it
> > ought to use the single-column (published_at) index.
>
> What is the estimated row count according to EXPLAIN when it does that,
> versus when it chooses the better plan?
>
> > FWIW, we're running on 8.4.X and using the out-of-the-box
> > default_statistics_target setting and haven't dabbled with setting
> > table level statistics configurations.
>
> 8.4.X is due to reach EOL in July, so you really ought to be thinking
> about an upgrade. It's not clear from the given info whether this issue
> is fixable with stats configuration adjustments, is a bug already fixed
> in later versions, or neither, but we're unlikely to make any significant
> changes in the 8.4 planner code at this point...
>
> regards, tom lane
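For reference, these are the checks I am running to answer Tom's question about estimated row counts and to see whether the new type shows up in the gathered statistics at all (the 42 is a stand-in for the problematic entity_type_id):

  -- The rows= estimate on the chosen index scan node is the number Tom is
  -- asking about (plain EXPLAIN, so the slow query is not actually executed).
  EXPLAIN
  SELECT * FROM events
   WHERE entity_type_id = 42
   ORDER BY published_at DESC
   LIMIT 25;

  -- Whether the sampled statistics have ever seen the new type: if it is
  -- absent from most_common_vals, the planner falls back to a generic
  -- estimate based on n_distinct rather than a per-value frequency.
  SELECT n_distinct, most_common_vals, most_common_freqs
    FROM pg_stats
   WHERE tablename = 'events' AND attname = 'entity_type_id';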