Re: [PERFORM] Understanding histograms
Tom, Thank you for your prompt reply. On Tue, Apr 29, 2008 at 10:19 PM, Tom Lane [EMAIL PROTECTED] wrote: Len Shapiro [EMAIL PROTECTED] writes: 1. Why does Postgres come up with a negative n_distinct? It's a fractional representation. Per the docs: stadistinct float4 The number of distinct nonnull data values in the column. A value greater than zero is the actual number of distinct values. A value less than zero is the negative of a fraction of the number of rows in the table (for example, a column in which values appear about twice on the average could be represented by stadistinct = -0.5). A zero value means the number of distinct values is unknown I asked about n_distinct, whose documentation reads in part The negated form is used when ANALYZE believes that the number of distinct values is likely to increase as the table grows. and I asked about why ANALYZE believes that the number of distinct values is likely to increase. I'm unclear why you quoted to me the documentation on stadistinct. The rows=2 estimate makes sense when const = 1 or 5, but it makes no sense to me for other values of const not in the MVC list. For example, if I run the query EXPLAIN SELECT * from sailors where rank = -1000; Postgres still gives an estimate of row=2. I'm not sure what estimate you'd expect instead? Instead I would expect an estimate of rows=0 for values of const that are not in the MCV list and not in the histogram. When the histogram has less than the maximum number of entries, implying (I am guessing here) that all non-MCV values are in the histogram list, this seems like a simple strategy and has the virtue of being accurate. Where in the source is the code that manipulates the histogram? The code has a built in assumption that no value not present in the MCV list can be more frequent than the last member of the MCV list, so it's definitely not gonna guess *more* than 2. That's interesting. Where is this in the source code? Thanks for all your help. All the best, Len Shapiro regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
On Wed, Apr 30, 2008 at 11:09 AM, Gauri Kanekar [EMAIL PROTECTED] wrote: relid |relname | n_tup_ins | n_tup_upd | n_tup_hot_upd | n_dead_tup ---++---+---+---+ 16461 | table1 | 0 | 8352496 | 5389 |8351242 Hmm.. So indeed there are very few HOT updates. What is the fillfactor you are using for these tests ? If its much less than 100, the very low percentage of HOT updates would make me guess that you are updating one of the index columns. Otherwise at least the initial updates until you fill up the free space should be HOT. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
table1 structure : idintegernot null codeintegernot null cridintegernot null statuscharacter varying(1)default 'A'::character varying delta1bigintdefault 0 delta2bigintdefault 0 delta3bigintdefault 0 delta4bigintdefault 0 tz_idintegerdefault 0 Indexes: idx1 PRIMARY KEY, btree (id) idx2 UNIQUE, btree (code, crid) idx3 btree (tz_id) idx4 btree (status) code as crid are foreign key. update table1 set delta1 = 100 where code/100 =999; On Wed, Apr 30, 2008 at 12:16 PM, Gauri Kanekar [EMAIL PROTECTED] wrote: fillfactor is set to 80 as you suggested. delta* fields r updated and these fields are no where related to any of the index fields. On Wed, Apr 30, 2008 at 12:13 PM, Pavan Deolasee [EMAIL PROTECTED] wrote: On Wed, Apr 30, 2008 at 11:09 AM, Gauri Kanekar [EMAIL PROTECTED] wrote: relid |relname | n_tup_ins | n_tup_upd | n_tup_hot_upd | n_dead_tup ---++---+---+---+ 16461 | table1 | 0 | 8352496 | 5389 |8351242 Hmm.. So indeed there are very few HOT updates. What is the fillfactor you are using for these tests ? If its much less than 100, the very low percentage of HOT updates would make me guess that you are updating one of the index columns. Otherwise at least the initial updates until you fill up the free space should be HOT. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Regards Gauri -- Regards Gauri
Re: [PERFORM] Replication Syatem
On Wed, Apr 30, 2008 at 12:16 PM, Gauri Kanekar [EMAIL PROTECTED] wrote: fillfactor is set to 80 as you suggested. delta* fields r updated and these fields are no where related to any of the index fields. That's weird. With that fillfactor, you should have a very high percentage of HOT update ratio. It could be a very special case that we might be looking at. I think a self contained test case or a very detail explanation of the exact usage is what we need to explain this behavior. You may also try dropping non-critical indexes and test again. Btw, I haven't been able to reproduce this at my end. With the given indexes and kind of updates, I get very high percentage of HOT updates. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
Gauri Kanekar wrote: HOT doesn't seems to be working in our case. This is table1 structure : idintegernot null codeintegernot null cridintegernot null statuscharacter varying(1)default 'A'::character varying delta1bigintdefault 0 delta2bigintdefault 0 delta3bigintdefault 0 delta4bigintdefault 0 tz_idintegerdefault 0 Indexes: idx1 PRIMARY KEY, btree (id) idx2 UNIQUE, btree (code, crid) idx3 btree (tz_id) idx4 btree (status) code as crid are foreign key. Here delta* fields get updated through out the day. and most of the time it may update the same row again n again. table1 contains around 12843694 records. Now not understanding y HOT don't work in our case. Changed fillfactor to 80, 75,70 but nothing seems to work. Did you dump and reload the table after setting the fill factor? It only affects newly inserted data. Another possibility is that there's a long running transaction in the background, preventing HOT/vacuum from reclaiming the dead tuples. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
Heikki Linnakangas wrote: Did you dump and reload the table after setting the fill factor? It only affects newly inserted data. VACUUM FULL or CLUSTER should do the job too, right? After all, they recreate the table so they must take the fillfactor into account. -- Craig Ringer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
Craig Ringer wrote: Heikki Linnakangas wrote: Did you dump and reload the table after setting the fill factor? It only affects newly inserted data. VACUUM FULL or CLUSTER should do the job too, right? After all, they recreate the table so they must take the fillfactor into account. CLUSTER, yes. VACUUM FULL won't move tuples around just to make room for the fillfactor. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
Please keep list in the loop. On Wed, Apr 30, 2008 at 6:45 PM, Gauri Kanekar [EMAIL PROTECTED] wrote: Hi, We have recreated the indices with fillfactor set to 80, which has improved HOT a little, Wait. Did you say, you recreated the indexes with fill factor ? That's no help for HOT. You need to recreate the TABLEs with a fill factor. And as Heikki pointed out, you need to dump and reload, just altering the table won't affect the current data. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Understanding histograms
Len Shapiro [EMAIL PROTECTED] writes: I asked about n_distinct, whose documentation reads in part The negated form is used when ANALYZE believes that the number of distinct values is likely to increase as the table grows. and I asked about why ANALYZE believes that the number of distinct values is likely to increase. I'm unclear why you quoted to me the documentation on stadistinct. n_distinct is just a view of stadistinct. I assumed you'd poked around in the code enough to know that ... The rows=2 estimate makes sense when const = 1 or 5, but it makes no sense to me for other values of const not in the MVC list. I'm not sure what estimate you'd expect instead? Instead I would expect an estimate of rows=0 for values of const that are not in the MCV list and not in the histogram. Surely that's not very sane? The MCV list plus histogram generally don't include every value in the table. IIRC the estimate for values not present in the MCV list is (1 - sum(MCV frequencies)) divided by (n_distinct - number of MCV entries), which amounts to assuming that all values not present in the MCV list occur equally often. The weak spot of course is that the n_distinct estimate may be pretty inaccurate. Where in the source is the code that manipulates the histogram? commands/analyze.c builds it, and most of the estimation with it happens in utils/adt/selfuncs.c. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
Pavan Deolasee [EMAIL PROTECTED] writes: That's weird. With that fillfactor, you should have a very high percentage of HOT update ratio. It could be a very special case that we might be looking at. He's testing update table1 set delta1 = 100 where code/100 =999; so all the rows being updated fall into a contiguous range of code values. If the table was loaded in such a way that those rows were also physically contiguous, then the updates would be localized and would very soon run out of freespace on those pages. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
[EMAIL PROTECTED] (Frank Ch. Eigler) writes: Tom Lane [EMAIL PROTECTED] writes: Also, you need to make sure you have the FSM parameters set high enough so that all the free space found by a VACUUM run can be remembered. Would it be difficult to arrange FSM parameters to be automatically set from the VACUUM reclaim results? Yeah, because the problem is that FSM is kept in shared memory which cannot be resized on-the-fly. In retrospect, trying to keep FSM in shared memory was a spectacularly bad idea (one for which I take full blame). There is work afoot to push it out to disk so that the whole problem goes away; so I don't see much point in worrying about band-aid solutions. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Postgres replication
Hi all, looking for a HA master/master or master/slave replication solution. Our setup consists of two databases and we want to use them both for queries. Aside from pgpool II there seems no advisable replication solution. But the problem seems to be that we will have a single point of failure with pgpool. slony also has the disadvantage not to cover a real failover solution. Are there any other manageable and well tested tools/setups for our scenario? Best regards Gernot -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
On Wed, Apr 30, 2008 at 8:16 PM, Tom Lane [EMAIL PROTECTED] wrote: Pavan Deolasee [EMAIL PROTECTED] writes: That's weird. With that fillfactor, you should have a very high percentage of HOT update ratio. It could be a very special case that we might be looking at. He's testing It's She :-) Oh yes. Apologies if I sounded harsh; did not mean that. I was just completely confused why she is not seeing the HOT updates. update table1 set delta1 = 100 where code/100 =999; so all the rows being updated fall into a contiguous range of code values. If the table was loaded in such a way that those rows were also physically contiguous, then the updates would be localized and would very soon run out of freespace on those pages. Yeah, that seems like the pattern. I tested with the similar layout and a fill factor 80. The initial few bulk updates had comparatively less HOT updates (somewhere 20-25%), But within 4-5 iterations of updating the same set of rows, HOT updates were 90-95%. That's because after few iterations (and because of non-HOT updates) the tuples get scattered in various blocks, thus improving chances of HOT updates. I guess the reason probably is that she is using fill factor for indexes and not heap, but she hasn't yet confirmed. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
We have tried fillfactor for indices and it seems to work. Need to try fillfactor for table. May for that reason the bulk update queries don't get the advantage of HOT :) On Wed, Apr 30, 2008 at 9:45 PM, Pavan Deolasee [EMAIL PROTECTED] wrote: On Wed, Apr 30, 2008 at 8:16 PM, Tom Lane [EMAIL PROTECTED] wrote: Pavan Deolasee [EMAIL PROTECTED] writes: That's weird. With that fillfactor, you should have a very high percentage of HOT update ratio. It could be a very special case that we might be looking at. He's testing It's She :-) Oh yes. Apologies if I sounded harsh; did not mean that. I was just completely confused why she is not seeing the HOT updates. update table1 set delta1 = 100 where code/100 =999; so all the rows being updated fall into a contiguous range of code values. If the table was loaded in such a way that those rows were also physically contiguous, then the updates would be localized and would very soon run out of freespace on those pages. Yeah, that seems like the pattern. I tested with the similar layout and a fill factor 80. The initial few bulk updates had comparatively less HOT updates (somewhere 20-25%), But within 4-5 iterations of updating the same set of rows, HOT updates were 90-95%. That's because after few iterations (and because of non-HOT updates) the tuples get scattered in various blocks, thus improving chances of HOT updates. I guess the reason probably is that she is using fill factor for indexes and not heap, but she hasn't yet confirmed. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Regards Gauri
Re: [PERFORM] Postgres replication
On Wed, 30 Apr 2008, Gernot Schwed wrote: Hi all, looking for a HA master/master or master/slave replication solution. Our setup consists of two databases and we want to use them both for queries. Aside from pgpool II there seems no advisable replication solution. But the problem seems to be that we will have a single point of failure with pgpool. slony also has the disadvantage not to cover a real failover solution. Are there any other manageable and well tested tools/setups for our scenario? I'm about to setup a similar config and what I was intending to do is to run pgpool on both boxes and use heartbeat (from http://linux-ha.org ) to move an IP address from one box to the other. clients connect to this virtual IP and then pgpool will distribute the connections to both systems from there. David Lang -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Understanding histograms
On Wed, 2008-04-30 at 10:43 -0400, Tom Lane wrote: Instead I would expect an estimate of rows=0 for values of const that are not in the MCV list and not in the histogram. Surely that's not very sane? The MCV list plus histogram generally don't include every value in the table. IIRC the estimate for values not present in the MCV list is (1 - sum(MCV frequencies)) divided by (n_distinct - number of MCV entries), which amounts to assuming that all values not present in the MCV list occur equally often. The weak spot of course is that the n_distinct estimate may be pretty inaccurate. My understanding of Len's question is that, although the MCV list plus the histogram don't include every distinct value in the general case, they do include every value in the specific case where the histogram is not full. Essentially, this seems like using the histogram to extend the MCV list such that, together, they represent all distinct values. This idea only seems to help when the number of distinct values is greater than the max size of MCVs, but less than the max size of MCVs plus histogram bounds. I'm not sure how much of a gain this is, because right now that could be accomplished by increasing the statistics for that column (and therefore all of your distinct values would fit in the MCV list). Also the statistics aren't guaranteed to be perfectly up-to-date, so an estimate of zero might be risky. Regards, Jeff Davis -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Understanding histograms
Jeff Davis [EMAIL PROTECTED] writes: On Wed, 2008-04-30 at 10:43 -0400, Tom Lane wrote: Surely that's not very sane? The MCV list plus histogram generally don't include every value in the table. My understanding of Len's question is that, although the MCV list plus the histogram don't include every distinct value in the general case, they do include every value in the specific case where the histogram is not full. I don't believe that's true. It's possible that a small histogram means that you are seeing every value that was in ANALYZE's sample, but it's a mighty long leap from that to the assumption that there are no other values in the table. In any case that seems more an artifact of the implementation than a property the histogram would be guaranteed to have. ... the statistics aren't guaranteed to be perfectly up-to-date, so an estimate of zero might be risky. Right. As a matter of policy we never estimate less than one matching row; and I've seriously considered pushing that up to at least two rows except when we see that the query condition matches a unique constraint. You can get really bad join plans from overly-small estimates. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Understanding histograms
Tom Lane [EMAIL PROTECTED] writes: Right. As a matter of policy we never estimate less than one matching row; and I've seriously considered pushing that up to at least two rows except when we see that the query condition matches a unique constraint. You can get really bad join plans from overly-small estimates. This is something that needs some serious thought though. In the case of partitioned tables I've seen someone get badly messed up plans because they had a couple hundred partitions each of which estimated to return 1 row. In fact of course they all returned 0 rows except the correct partition. (This was in a join so no constraint exclusion) -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Please ignore ...
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Someone on this list has one of those 'confirm your email' filters on their mailbox, which is bouncing back messages ... this is an attempt to try and narrow down the address that is causing this ... - -- Marc G. FournierHub.Org Hosting Solutions S.A. (http://www.hub.org) Email . [EMAIL PROTECTED] MSN . [EMAIL PROTECTED] Yahoo . yscrappy Skype: hub.orgICQ . 7615664 -BEGIN PGP SIGNATURE- Version: GnuPG v2.0.8 (FreeBSD) iEYEARECAAYFAkgZRAAACgkQ4QvfyHIvDvNHrwCcDdlkjAXSyfyOBa5vgfLVOrSb JyoAn005bSbY6lnyjGmlOQzj7fSMNSKV =n5PC -END PGP SIGNATURE- -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Understanding histograms
Gregory Stark [EMAIL PROTECTED] writes: This is something that needs some serious thought though. In the case of partitioned tables I've seen someone get badly messed up plans because they had a couple hundred partitions each of which estimated to return 1 row. In fact of course they all returned 0 rows except the correct partition. (This was in a join so no constraint exclusion) Yeah, one of the things we need to have a serious partitioning solution is to get the planner's estimation code to understand what's happening there. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Please ignore ...
On Thu, 01 May 2008 01:16:00 -0300 Marc G. Fournier [EMAIL PROTECTED] wrote: Someone on this list has one of those 'confirm your email' filters on their Argh! Why do people think that it is OK to make their spam problem everyone else's problem? Whenever I see one of those I simply blackhole the server sending them. People, please, I know the spam you get isn't your fault but it isn't my fault either. You clean up your mailbox and I'll clean up mine. -- D'Arcy J.M. Cain [EMAIL PROTECTED] | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance