Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUCvariable
The plural seems better to me; there's no such thing as a solitary synchronized scan, no? The whole point of the feature is to affect the behavior of multiple scans. +1. The plural is important IMHO. ok, good. As I stated earlier, I don't really like this argument (we already broke badly designed applications a few times in the past) but we really need a way to guarantee that the execution of a query is stable and doesn't depend on external factors. And the original problem was to guarantee that pg_dump builds a dump as identical as possible to the existing data by ignoring external factors. It's now the case with your patch. The fact that it allows us not to break existing applications relying too much on physical ordering is a nice side effect though :). One more question. It would be possible that a session that turned off the synchronized_seqscans still be a pack leader for other later sessions. Do/should we consider that ? The procedure would be: start from page 0 iff no other pack is present fill the current scan position for others Andreas ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUCvariable
On Wed, Jan 30, 2008 at 10:56:47AM +0100, Zeugswetter Andreas ADI SD wrote: The plural seems better to me; there's no such thing as a solitary synchronized scan, no? The whole point of the feature is to affect the behavior of multiple scans. +1. The plural is important IMHO. ok, good. As I stated earlier, I don't really like this argument (we already broke badly designed applications a few times in the past) but we really need a way to guarantee that the execution of a query is stable and doesn't depend on external factors. And the original problem was to guarantee that pg_dump builds a dump as identical as possible to the existing data by ignoring external factors. It's now the case with your patch. The fact that it allows us not to break existing applications relying too much on physical ordering is a nice side effect though :). One more question. It would be possible that a session that turned off the synchronized_seqscans still be a pack leader for other later sessions. Do/should we consider that ? The procedure would be: start from page 0 iff no other pack is present fill the current scan position for others I think that allowing other scans to use the scan started by a query that disabled the sync scans would have value. It would prevent these types of queries from completely tanking the I/O. +1 Ken ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Zeugswetter Andreas ADI SD escribió: The plural seems better to me; there's no such thing as a solitary synchronized scan, no? The whole point of the feature is to affect the behavior of multiple scans. +1. The plural is important IMHO. ok, good. Hmm, if you guys are going to add another GUC variable, please hurry because we have to translate the description text. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Alvaro Herrera [EMAIL PROTECTED] writes: Hmm, if you guys are going to add another GUC variable, please hurry because we have to translate the description text. Yeah, I'm going to put it in today --- just the on/off switch. Any discussions of exposing threshold parameters will have to wait for 8.4. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes: One more question. It would be possible that a session that turned off the synchronized_seqscans still be a pack leader for other later sessions. Do/should we consider that ? Seems like a reasonable thing to consider ... for 8.4. I'm not willing to go poking the syncscan code that much at this late point in the 8.3 cycle. (I'm not sure if it's been mentioned yet on -hackers, but the current plan is to freeze 8.3.0 tomorrow evening.) regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUCvariable
On Wed, 2008-01-30 at 13:07 -0500, Tom Lane wrote: Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes: One more question. It would be possible that a session that turned off the synchronized_seqscans still be a pack leader for other later sessions. Do/should we consider that ? Seems like a reasonable thing to consider ... for 8.4. Definitely. I thought about this the other day and decided it had some strange behaviour in some circumstances, so wouldn't be desirable overall. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Simon Riggs wrote: On Wed, 2008-01-30 at 18:42 +, Heikki Linnakangas wrote: It's even worse than that. Elsewhere in this thread Simon mentioned a partitioned table, where each partition on its own is smaller than the threshold, but you're seq scanning several partitions and the total size of the seq scans is larger than memory size. In that scenario, you would want BAS and synchronized scans, but even a per-table setting wouldn't cut it. For synchronized scans to help in the partitioned situation, I guess you'd want to synchronize across partitions. If someone is already scanning partition 5, you'd want to start from that partition and join the pack, instead of starting from partition 1. You're right, but in practice its not quite that bad with the multi-table route. When you have partitions you generally exclude most of them, with typically 1-2 per query, usually different ones. Yep. And in that case, you *don't'* want BAS or sync scans to kick in, because you're only accessing a relatively small chunk of data, and it's worthwhile to cache it. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUCvariable
It's a good point that we don't want pg_dump to screw up the cluster order, but that's the only use case I've seen this far for disabling sync scans. Even that wouldn't matter much if our estimate for clusteredness didn't get screwed up by a table that looks like this: 5 6 7 8 9 1 2 3 4 I do think the guc to turn it off is useful, only I don't understand the reasoning that pg_dump needs it to maintain the basic clustered property. Sorry, but I don't grok this at all. Why the heck would we care if we have 2 parts of the table perfectly clustered, because we started in the middle ? Surely our stats collector should recognize such a table as perfectly clustered. Does it not ? We are talking about one breakage in the readahead logic here, this should only bring the clustered property from 100% to some 99.99% depending on table size vs readahead window. Andreas ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes: Sorry, but I don't grok this at all. Why the heck would we care if we have 2 parts of the table perfectly clustered, because we started in the middle ? Surely our stats collector should recognize such a table as perfectly clustered. Does it not ? We are talking about one breakage in the readahead logic here, this should only bring the clustered property from 100% to some 99.99% depending on table size vs readahead window. Well clusteredness is used or could be used for a few different heuristics, not all of which this would be quite as well satisfied as readahead. But for the most common application, namely trying to figure out whether index probes for sequential ids will be sequential i/o or random i/o you're right. Currently the statistic we use to estimate this is the correlation of the column value with the physical location on disk. That's not a perfect metric for estimating how much random i/o would be needed to scan the table in index order though. It would be great if Postgres picked up a serious statistics geek who could pipe up in discussions like this with how about using the Euler-Jacobian Centroid or some such thing. If you have any suggestions of what metric to use and how to calculate the info we need from it that would be great. One suggestion from a long way back was scanning the index and counting how many times the item pointer moves backward to an earlier block. That would still require a full index scan though. And it doesn't help for columns which aren't indexed though I'm not sure we need this info for columns which aren't indexed. It's also not clear how to interpolate from that the amount of random access a given query would perform. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUCvariable
+1. If we go with 'enable_sync_seqcans' for 8.3, and in a future release cycle we do test the cases Simon described above and we agree we need to do a fine tune to benefit from this feature, we will need to deprecate 'enable_sync_seqscans' and invent another one (sync_seqscans_threshold). Looking at this perpective, IMHO we should go with the number (0.25) instead of the boolean. Surely the risk-of-needing-to-deprecate argument applies ten times more strongly to a number than a boolean. Yes, I would expect the tuning to be more system than user specific. So imho a boolean userset would couple well with a tuning guc, that may usefully not be userset (if we later discover a need for tuning at all). so +1 for the bool. synchronize[d]_seqscan sounds a bit better in my ears than the plural synchronize_seqscans. To me the latter somehow suggests influece on the whole cluster, probably not worth further discussion though, so if someone says no, ok. Andreas ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUCvariable
On Tue, Jan 29, 2008 at 10:40:40AM +0100, Zeugswetter Andreas ADI SD wrote: It's a good point that we don't want pg_dump to screw up the cluster order, but that's the only use case I've seen this far for disabling sync scans. Even that wouldn't matter much if our estimate for clusteredness didn't get screwed up by a table that looks like this: 5 6 7 8 9 1 2 3 4 I do think the guc to turn it off is useful, only I don't understand the reasoning that pg_dump needs it to maintain the basic clustered property. Sorry, but I don't grok this at all. Why the heck would we care if we have 2 parts of the table perfectly clustered, because we started in the middle ? Surely our stats collector should recognize such a table as perfectly clustered. Does it not ? We are talking about one breakage in the readahead logic here, this should only bring the clustered property from 100% to some 99.99% depending on table size vs readahead window. Andreas Andreas, I agree with your logic. If the process that PostgreSQL uses to determine how clustered a table is that breaks with such a layout, we may need to see what should be changed to make it work. Having had pg_dump cause a database to grind to a halt, I would definitely like the option of using the synchronized scans even for clustered tables. Ken ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Euler Taveira de Oliveira [EMAIL PROTECTED] writes: +1. If we go with 'enable_sync_seqcans' for 8.3, and in a future release cycle we do test the cases Simon described above and we agree we need to do a fine tune to benefit from this feature, we will need to deprecate 'enable_sync_seqscans' and invent another one (sync_seqscans_threshold). Looking at this perpective, IMHO we should go with the number (0.25) instead of the boolean. Surely the risk-of-needing-to-deprecate argument applies ten times more strongly to a number than a boolean. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUCvariable
On Tue, 2008-01-29 at 10:55 +, Gregory Stark wrote: Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes: Sorry, but I don't grok this at all. Why the heck would we care if we have 2 parts of the table perfectly clustered, because we started in the middle ? Surely our stats collector should recognize such a table as perfectly clustered. Does it not ? We are talking about one breakage in the readahead logic here, this should only bring the clustered property from 100% to some 99.99% depending on table size vs readahead window. Well clusteredness is used or could be used for a few different heuristics, not all of which this would be quite as well satisfied as readahead. But for Can you give an example? Treating a file as a circular structure does not impose any significant cost that I can see. It would be great if Postgres picked up a serious statistics geek who could pipe up in discussions like this with how about using the Euler-Jacobian Centroid or some such thing. If you have any suggestions of what metric to use and how to calculate the info we need from it that would be great. Agreed. One suggestion from a long way back was scanning the index and counting how many times the item pointer moves backward to an earlier block. That would An interesting metric. As you say, we really need a statistician to definitively say what the correct metrics are, and what kind of sampling we need to make good estimates. still require a full index scan though. And it doesn't help for columns which aren't indexed though I'm not sure we need this info for columns which aren't indexed. It's also not clear how to interpolate from that the amount of random access a given query would perform. I don't think clusteredness has any meaning at all in postgres for an unindexed column. I suppose a table could be clustered without an index, but currently there's no way to do that in postgresql. Regards, Jeff Davis ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Simon Riggs wrote: And if you have a partitioned table with partitions inconveniently sized? You'd need to *reduce* shared_buffers specifically to get synch scans and BAS to kick in. Or increase partition size. Both of which reduce the impact of the benefits we've added. I don't think the argument that a table is smaller than shared buffers therefore it is already in shared buffers holds true in all cases. I/O does matter. +1. If we go with 'enable_sync_seqcans' for 8.3, and in a future release cycle we do test the cases Simon described above and we agree we need to do a fine tune to benefit from this feature, we will need to deprecate 'enable_sync_seqscans' and invent another one (sync_seqscans_threshold). Looking at this perpective, IMHO we should go with the number (0.25) instead of the boolean. -- Euler Taveira de Oliveira http://www.timbira.com/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUCvariable
On Tue, Jan 29, 2008 at 1:09 PM, in message [EMAIL PROTECTED], Tom Lane [EMAIL PROTECTED] wrote: Or is someone prepared to argue that there are no applications out there that will be broken if the same query, against the same unchanging table, yields different results from one trial to the next? If geqo kicks in, we're already there, aren't we? Isn't an application which counts on the order of result rows without specifying ORDER BY fundamentally broken? -Kevin ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Kevin Grittner [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] wrote: Or is someone prepared to argue that there are no applications out there that will be broken if the same query, against the same unchanging table, yields different results from one trial to the next? If geqo kicks in, we're already there, aren't we? Yup, and that's one of the reasons we have a way to turn geqo off. (geqo is actually a good precedent for this --- notice that it has an on/off switch that's separate from its tuning knobs.) Isn't an application which counts on the order of result rows without specifying ORDER BY fundamentally broken? No doubt, but if it's always worked before, people are going to be unhappy anyway. Also, it's not just ordering that's at stake. Try regression=# create table foo as select x from generate_series(1,100) x; SELECT regression=# select * from foo limit 1; x --- 1 2 3 4 regression=# select * from foo limit 1; x --- 7233 7234 7235 7236 regression=# select * from foo limit 1; x --- 14465 14466 14467 14468 Now admittedly we've never promised LIMIT without ORDER BY to be well-defined either, but not everybody reads the fine print. This case is particularly nasty because at smaller LIMIT values the result *is* consistent, so you might never notice the problem while testing. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes: synchronize[d]_seqscan sounds a bit better in my ears than the plural synchronize_seqscans. The plural seems better to me; there's no such thing as a solitary synchronized scan, no? The whole point of the feature is to affect the behavior of multiple scans. BTW, so far as the rest of the thread goes, I'm not necessarily opposed to exposing the switchover threshold as a tunable. But I think it needs more thought to design than we can give it in time for 8.3 (because of the interaction with the buffer access strategy stuff). Also I don't like having pg_dump manipulating a tuning parameter. I don't see anything wrong with having both an on/off feature switch and a tunable in future releases. The feature switch can be justified on grounds of backwards compatibility quite independently of whether pg_dump uses it. Or is someone prepared to argue that there are no applications out there that will be broken if the same query, against the same unchanging table, yields different results from one trial to the next? regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Ron Mayer [EMAIL PROTECTED] writes: Tom Lane wrote: Or is someone prepared to argue that there are no applications out there that will be broken if the same query, against the same unchanging table, yields different results from one trial to the next? Won't even autovacuum analyze cause this too if the new stats changes the plan? Given that the table is unchanging, that's moderately unlikely to happen (especially for select * from foo ;-)) regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Tom Lane wrote: Kevin Grittner [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] wrote: Or is someone prepared to argue that there are no applications out there that will be broken if the same query, against the same unchanging table, yields different results from one trial to the next? Won't even autovacuum analyze cause this too if the new stats changes the plan? ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUCvariable
On Jan 29, 2008 8:09 PM, Tom Lane [EMAIL PROTECTED] wrote: Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes: synchronize[d]_seqscan sounds a bit better in my ears than the plural synchronize_seqscans. The plural seems better to me; there's no such thing as a solitary synchronized scan, no? The whole point of the feature is to affect the behavior of multiple scans. +1. The plural is important IMHO. BTW, so far as the rest of the thread goes, I'm not necessarily opposed to exposing the switchover threshold as a tunable. But I think it needs more thought to design than we can give it in time for 8.3 (because of the interaction with the buffer access strategy stuff). +1. The current patch is simple and so far in the cycle, I really think we should keep it that way. The feature switch can be justified on grounds of backwards compatibility quite independently of whether pg_dump uses it. Or is someone prepared to argue that there are no applications out there that will be broken if the same query, against the same unchanging table, yields different results from one trial to the next? As I stated earlier, I don't really like this argument (we already broke badly designed applications a few times in the past) but we really need a way to guarantee that the execution of a query is stable and doesn't depend on external factors. And the original problem was to guarantee that pg_dump builds a dump as identical as possible to the existing data by ignoring external factors. It's now the case with your patch. The fact that it allows us not to break existing applications relying too much on physical ordering is a nice side effect though :). -- Guillaume ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Simon Riggs wrote: On Mon, 2008-01-28 at 16:21 -0500, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: Rather than having a boolean GUC, we should have a number and make the parameter synchronised_scan_threshold. This would open up a can of worms I'd prefer not to touch, having to do with whether the buffer-access-strategy behavior should track that or not. As the note in heapam.c says, * If the table is large relative to NBuffers, use a bulk-read access * strategy and enable synchronized scanning (see syncscan.c). Although * the thresholds for these features could be different, we make them the * same so that there are only two behaviors to tune rather than four. It's a bit late in the cycle to be revisiting that choice. Now we do already have three behaviors to worry about (BAS on and syncscan off) but throwing in a randomly settable knob will take it back to four, and we have no idea how that fourth case will behave. The other tack we could take (having the one GUC variable control both thresholds) is not good since it will result in pg_dump trashing the buffer cache. OK, good points. I'm still concerned that the thresholds gets higher as we increase shared_buffers. We may be removing performance features as fast as we gain performance when we set shared_buffers higher. Might we agree that the threshold should be fixed at 8MB, rather than varying upwards as we try to tune? Synchronized scans, and the bulk-read strategy, don't help if the table fits in cache. If it fits in shared buffers, you're better off keeping it there, than swap pages between the OS cache and shared buffers, or spend any effort synchronizing scans. That's why we agreed back then that the threshold should be X% of shared_buffers. It's a good point that we don't want pg_dump to screw up the cluster order, but that's the only use case I've seen this far for disabling sync scans. Even that wouldn't matter much if our estimate for clusteredness didn't get screwed up by a table that looks like this: 5 6 7 8 9 1 2 3 4 Now, maybe there's more use cases where you'd want to tune the threshold, but I'd like to see some before we add more knobs. To benefit from a lower threshold, you'd need to have a table large enough that its cache footprint matters, but is still smaller than 25% of shared_buffers, and have seq scans on it. In that scenario, you might benefit from a lower threshold, because that would leave some shared_buffers free for other use. Even that is quite hand-wavey; the buffer cache LRU algorithm handles that kind of scenarios reasonably well already, and whether or not To benefit from a larger threshold, you'd need to have a table larger than 25% of shared_buffers, but still smaller than shared_buffers, and seq scan it often enough that you want to keep it in shared buffers. If you're frequently seq scanning a table of that size, you're most likely suffering from a bad plan. Even then, the performance difference shouldn't be that great, the table surely fits in OS cache anyway, with typical shared_buffers settings. Tables that are seq scanned are typically very small, like a summary table with just a few rows, or huge tables in a data warehousing system. Between the extremes, I don't think the threshold actually has a very big impact. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUCvariable
On Mon, 2008-01-28 at 23:13 +, Heikki Linnakangas wrote: Tables that are seq scanned are typically very small, like a summary table with just a few rows, or huge tables in a data warehousing system. Between the extremes, I don't think the threshold actually has a very big impact. And if you have a partitioned table with partitions inconveniently sized? You'd need to *reduce* shared_buffers specifically to get synch scans and BAS to kick in. Or increase partition size. Both of which reduce the impact of the benefits we've added. I don't think the argument that a table is smaller than shared buffers therefore it is already in shared buffers holds true in all cases. I/O does matter. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUCvariable
On Mon, 2008-01-28 at 23:13 +, Heikki Linnakangas wrote: It's a good point that we don't want pg_dump to screw up the cluster order, but that's the only use case I've seen this far for disabling sync scans. Even that wouldn't matter much if our estimate for clusteredness didn't get screwed up by a table that looks like this: 5 6 7 8 9 1 2 3 4 It doesn't seem like there is any reason for the estimate to get confused, but it apparently does. I loaded a test table with a similar distribution to your example, and it shows a correlation of about -0.5, but it should be as good as something near -1 or +1. I am not a statistics expert, but it seems like a better measurement would be: what is the chance that, if the tuples are close together in index order, the corresponding heap tuples are close together?. The answer to that question in your example is very likely, so there would be no problem. Is there a reason we don't do this? Regards, Jeff Davis ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Jeff Davis wrote: On Mon, 2008-01-28 at 23:13 +, Heikki Linnakangas wrote: clusteredness didn't get screwed up by a table that looks like this: 5 6 7 8 9 1 2 3 4 ...test table with a similar distribution to your example, and it shows a correlation of about -0.5, but it should be as good as something near -1 or +1. I am not a statistics expert, but it seems like a better measurement would be: what is the chance that, if the tuples are close together in index order, the corresponding heap tuples are close together?. Same applies for data clustered by zip-code. All rows for any State or City or County or SchoolZone are close together on the same pages; yet postgres's stats think they're totally unclustered. The answer to that question in your example is very likely, so there would be no problem. Is there a reason we don't do this? I've been tempted to do things like update pg_statistic set stanumbers3='{1.0}' where starelid=2617 and staattnum=7; after every analyze when I have data like this from tables clustered by zip. Seems it'd help more plans than it hurts, but haven't been brave enough to try in production. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster