Re: [HACKERS] Incremental Backup Script
-Original Message- From: Zach Bagnall [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 04, 2006 4:42 AM To: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Incremental Backup Script Gregor: can you explain how to identify the current file? I had implemented a backup and restore script for PITR but stumbled at this point. The page above does not specify how this is to be done. I have found a way from http://archives.postgresql.org/pgsql-admin/2005-10/msg00059.php. I have not tried it, but it seems straightforward. I appreciate the addition of PITR - it's better than nothing (nothing being full dumps) in some respects. Ideally, we need to be able to dump deltas for a single database. Yes, it is not a replacement for an incremental backup, especially due to the fact that WALs may be orders of magnitude larger than a delta. In practice, restoration using the PITR method is awkward. Yes, what I am planning to do: 1) drop the corrupted database 2) restore the base backup 3) replay all incremental backups (in the sense of my original mail) Gregor Zeitlinger LUCAS Product Development Torex Retail Solutions GmbH Schwedenstr. 9, D-13359 Berlin Tel. +49 (0) 30 49901-243 Fax +49 (0) 30 49901-139 Mailto:[EMAIL PROTECTED] http://www.torexretail.de ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
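[Editor's note] For readers following along, the three-step restore plan above maps onto PostgreSQL's standard archive-recovery procedure (base backup plus archived WAL). The sketch below only illustrates that sequence; it is not Gregor's script. The paths, the archive location, the use of pg_ctl, and the reading of "incremental backups" as "archived WAL segments" are all assumptions for illustration.

#!/usr/bin/env python
# Rough sketch of the restore sequence described above: stop the server,
# put the base backup in place, point recovery at the WAL archive, restart.
# All paths and commands here are assumptions -- a real restore must also
# preserve any WAL segments that were never archived.
import shutil, subprocess

PGDATA      = "/var/lib/postgresql/data"   # assumed data directory
BASE_BACKUP = "/backups/base/data"         # assumed unpacked base backup
WAL_ARCHIVE = "/backups/wal_archive"       # assumed WAL archive directory

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.check_call(cmd)

# 1) stop the cluster containing the corrupted database
run(["pg_ctl", "-D", PGDATA, "stop", "-m", "fast"])

# 2) replace the data directory with the base backup
shutil.rmtree(PGDATA)
shutil.copytree(BASE_BACKUP, PGDATA)

# 3) ask the server to replay archived WAL on startup (optionally up to a
#    recovery_target_time just before the corruption happened)
with open(PGDATA + "/recovery.conf", "w") as f:
    f.write("restore_command = 'cp %s/%%f \"%%p\"'\n" % WAL_ARCHIVE)

run(["pg_ctl", "-D", PGDATA, "start"])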
Re: [HACKERS] psql readline win32
On Jan 2, 2006, at 4:00 AM, Magnus Hagander wrote: Me, I'm not fully happy with psql on win32. I want my tab completion! (which the gui tools don't do either, from what I can tell. At least pgadmin doesn't. Yet.) Mine has tab completion adapted from psql :). There are also commands for specific completion types, e.g. complete table, complete function, etc. I hope to have a beta out soon with 8.1 psql and updated tab completion for the new commands (roles, etc). John DeSoi, Ph.D. http://pgedit.com/ Power Tools for PostgreSQL ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] psql readline win32
John DeSoi schrieb: On Jan 2, 2006, at 4:00 AM, Magnus Hagander wrote: Me, I'm not fully happy with psql on win32. I want my tab completion! (which the gui tools don't do either, from what I can tell. At least pgadmin doesn't. Yet.) Mine has tab completion adapted from psql :). There are also commands for specific completion types, e.g. complete table, complete function, etc. I hope to have a beta out soon with 8.1 psql and updated tab completion for the new commands (roles, etc). Great! I once experimented with dropdowns in textarea too but lost grip a bit. ---(end of broadcast)--- TIP 6: explain analyze is your friend
[HACKERS] postmaster/postgres options assimilation plan
Here's the plan for assimilating the command-line options of the postmaster and postgres options. I reported earlier on a couple of conflict areas; here is the full plan: * Remove: postmaster -a -b -m -M These options have done nothing forever. * postmaster options added to postgres: -h -i -k -l -n These options will not have any useful effects, but their behavior is consistent if you do, say, SHOW listen_addresses. * postgres options added to postmaster: -e -E -f -O -P -t -W Using -E with postmaster does nothing, though. * Renamed options (because of conflicts): postgres -N is now postgres -j (mostly internal use) postgres -o is now postgres -r (mostly internal use) (postmaster -o is obsolete but still works for compatibility; postgres -o will get you an error.) postgres -p is now postgres -y (internal use only) postmaster -S now sets work_mem, like postgres -S does. The (deprecated) silent mode can be obtained using the long option --silent-mode=on. postmaster -s is now postmaster -T (expert/hacker use only) For the options -O, -P, and -W I have added three new GUC variables allow_system_table_mods (PGC_SIGHUP), ignore_system_indexes (PGC_BACKEND), connection_startup_delay (PGC_BACKEND); mostly to simplify the postmaster-postgres communication. Except for a few odd exceptions, all command line arguments now map to setting a GUC variable. Comments? ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] postmaster/postgres options assimilation plan
Peter Eisentraut [EMAIL PROTECTED] writes: Here's the plan for assimilating the command-line options of the postmaster and postgres options. ... * postmaster options added to postgres: -h -i -k -l -n These options will not have any useful effects, but their behavior is consistent if you do, say, SHOW listen_addresses. Did you mean -N rather than -n here? For the options -O, -P, and -W I have added three new GUC variables allow_system_table_mods (PGC_SIGHUP), ignore_system_indexes (PGC_BACKEND), connection_startup_delay (PGC_BACKEND); mostly to simplify the postmaster-postgres communication. Might as well let allow_system_table_mods be PGC_BACKEND too. The dangers it poses are not made less by turning it on in all backends at the same time. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[HACKERS] Improving N-Distinct estimation by ANALYZE
Improving N-Distinct estimation
===============================
v1.1

OBJECTIVES

Answer these points...
- Are there any performance issues that can be directly attributed to mis-estimation of N-Distinct (D) by the ANALYZE command?
- If so, can we do better than we currently achieve? How?
- Would altering the estimate of D cause problems in other places?

Comments are sought on the problem and the possible solutions. Please get involved if you can help with detailed analyses on this topic.

SUMMARY

The estimation of D is difficult and imprecise. The current method works well in many cases, yet breaks down *badly* in one particular very common use case of database design: a large dependent table with a multi-column Primary Key. In some cases the estimate of D *decreases* as the size of the table increases, though the estimate of D is an underestimate in almost all cases, whatever the table design.

PostgreSQL cvstip currently seriously underestimates D for very large dependent tables with rows that are clustered around one of the columns of a multi-column Primary Key. The mis-estimation identified can lead to poor system performance should the mis-estimation lead to the use of HashAggregate or general mis-selection of optimal join plans.

An example of this is the orders and lineitem tables from DBT-3/TPC-H, which have a 1:M relationship. There are M lineitems associated with each order and all are inserted in one transaction into lineitem. If M is relatively large, then problems may ensue. A problem SQL statement may be something like:

SELECT l_orderkey, sum(l_extendedprice) FROM lineitem GROUP BY l_orderkey;

This issue could have an impact on any large table in a 1:M relationship where the actual number of distinct values, D, is much larger than the sample size, n (D >> n). This can also affect associative tables where more than one 1:M relationship exists to separate tables, such as Fact tables in a star schema. The issue is alleviated by setting the column statistics target higher, though this merely increases the size range of the table over which problems may occur.

There are a number of ways we can improve on the current estimates, using techniques suggested in later statistical research. Some proposals are made and comments are sought on the problem and the possible solutions.

It is possible that we need more than one estimate of D for various purposes. We might potentially need a low estimate of D for use in join planning, whereas a higher estimate would reduce the risk from hash table operations. This approach might be taken initially to allow us to implement improved estimators without throwing out many previously good plans.

WHAT WE CURRENTLY DO WITH ANALYZE

Notation
D = estimate of the number of distinct values (aka n_distinct)
N = number of rows in table
n = number of rows in sample

Sampling method
* Fixed sample size, no matter how big the table, following Chaudhuri et al's 1998 paper on sample size sufficiency for statistics histograms.
* Sample blocks = sample rows = 300 * col stats target

Results
* Count rows/value for all values observed in sample:
  f1 = number of unique values in sample (values occurring exactly once)
  d = number of distinct values in sample
* If f1 == n => assume unique: D = N and scale with N
  else if f1 == 0 => assume D = d
  else => apply the Haas-Stokes [1998] estimator
* If D > 10% of N => scale with N

[There are a variety of techniques selected from Haas-Stokes [1998], Chaudhuri et al [1998], Vitter and Knuth. Sometimes these authors have discussed the same subject and come up with different answers, so you need to be careful to say which reference you mean when discussing these.]

ISSUES

1. Estimation of D; mentioned above and covered in more detail below (see ESTIMATES OF D FOR DEPENDENT TABLES).

2. The sample size calculation correctly follows Chaudhuri et al [1998] when the number of rows in the table is 1 million. However, smaller tables are overestimated and larger tables are underestimated. The sample size should be multiplied by 2.3 (i.e. ln(10)) for every x10 larger table size; e.g. a 100 million row table requires a sample size 4.6 times larger to have the same accuracy for histogram selection.

OBSERVATIONS

1. *All* methods of statistical analysis are improved by larger sample fractions. The D estimator method currently in use shows an optimum of accuracy and sample fraction at around 5% of a table, as shown in the author's original paper [Haas Stokes (1998)]. The current implementation's error rates climb higher as table size increases.

2. In terms of I/O, ANALYZE is relatively inefficient, since it uses a row sampling technique rather than a block sampling technique. This would translate directly into a performance drop from large sample ratios, but since we currently use a fixed sample size this problem is not yet visible for larger tables. With a 2GB table, we would typically sample 1% of the blocks, yet around 0.025 - 0.05% of the rows.

3. Large values of statistics target (up to 1000) could cause a number of problems with statistics catalog table growth (mentioned on -perform). Setting these values appropriately can take significant effort. Automatic scaling of such parameters is desirable.
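[Editor's note] To make the decision rules in the Results section above concrete, here is a small Python sketch of that per-column logic. The exact Haas-Stokes expression used (the smoothed jackknife form n*d / (n - f1 + f1*n/N)) is my reading of what analyze.c applies and should be treated as an assumption; the surrounding branches follow the rules quoted above.

def estimate_ndistinct(sample_counts, N):
    # sample_counts: occurrences in the sample of each value observed there
    # N: (estimated) total number of rows in the table
    # Returns a pg_statistic-style value: a positive absolute estimate of D,
    # or a negative fraction meaning "D scales as |value| * N".
    n = sum(sample_counts.values())                         # rows in sample
    d = len(sample_counts)                                  # distinct values seen
    f1 = sum(1 for c in sample_counts.values() if c == 1)   # values seen exactly once

    if f1 == n:
        # every sampled value was unique: assume a unique column, scale with N
        return -1.0
    if f1 == 0:
        # every sampled value repeated: assume we have already seen them all
        D = float(d)
    else:
        # Haas-Stokes (smoothed jackknife) estimator; this exact form is an
        # assumption about analyze.c -- check the source before relying on it
        D = n * d / (n - f1 + f1 * n / float(N))
        D = min(D, N)

    # the "10% of N" rule above: store D as a fraction of N so the estimate
    # keeps scaling as the table grows
    if D > 0.1 * N:
        return -(D / N)
    return D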
Re: [HACKERS] Improving N-Distinct estimation by ANALYZE
Simon Riggs [EMAIL PROTECTED] writes: [ ... a large amount of analysis based on exactly one test case ... ] I think you are putting too much emphasis on fixing one case and not enough on considering what may happen in other cases ... In general, estimating n-distinct from a sample is just plain a hard problem, and it's probably foolish to suppose we'll ever be able to do it robustly. What we need is to minimize the impact when we get it wrong. So I agree with the comment that we need to finish the unfinished project of making HashAggregate tables expansible, but I'm dubious about the rest of this. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] psql readline win32
Me, I'm not fully happy with psql on win32. I want my tab completion! (which the gui tools don't do either, from what I can tell. At least pgadmin doesn't. Yet.) Mine has tab completion adapted from psql :). There are also commands for specific completion types, e.g. complete table, complete function, etc. Well, yours ain't free, so it doesn't really help me in this case ;-) And doesn't run on Linux, IIRC. Might help me in other cases, though, I'll keep it in mind for that. I hope to have a beta out soon with 8.1 psql and updated tab completion for the new commands (roles, etc). Sounds nice. //Magnus ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] psql readline win32
Magnus Hagander said: Me, I'm not fully happy with psql on win32. I want my tab completion! (which the gui tools don't do either, from what I can tell. At least pgadmin doesn't. Yet.) Mine has tab completion adapted from psql :). There are also commands for specific completion types, e.g. complete table, complete function, etc. Well, yours ain't free, so it doesn't really help me in this case ;-) And doesn't run on Linux, IIRC. Might help me in other cases, though, I'll keep it in mind for that. We really need psql working properly, regardless of other clients, if possible. cheers andrew ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] Vacuum Blocking A Deletion - Why?
We have been having a problem with a long running vacuum on a table blocking deletes. We have a log table on which the only activity is an ongoing series of deletions. The problem is experienced on a slony replicated node, but I don't think that slony is at fault here. My question - why is a vacuum blocking a deletion? Here's the information that I grabbed:

tldlogs=# SELECT * from pg_stat_activity order by query_start limit 1;
-[ RECORD 1 ]-+--------------------------------------
datid         | 17142
datname       | tldlogs
procpid       | 1744920
usesysid      | 104
usename       | vacuser
current_query | VACUUM VERBOSE ANALYZE public.log_2;
query_start   | 2006-01-04 17:43:57.350527+00

tldlogs=# SELECT relation, pid, mode, granted from pg_locks where pid=1744920;
 relation |   pid   |           mode           | granted
----------+---------+--------------------------+---------
    17578 | 1744920 | ShareUpdateExclusiveLock | t
    17578 | 1744920 | ShareUpdateExclusiveLock | t
    17578 | 1744920 | AccessExclusiveLock      | t
          | 1744920 | ExclusiveLock            | t
(4 rows)

tldlogs=# SELECT relation, pid, mode, granted from pg_locks where relation=17578;
 relation |   pid   |           mode           | granted
----------+---------+--------------------------+---------
    17578 | 1425438 | RowExclusiveLock         | f
    17578 | 1744920 | ShareUpdateExclusiveLock | t
    17578 | 1744920 | ShareUpdateExclusiveLock | t
    17578 | 1744920 | AccessExclusiveLock      | t
(4 rows)

orglogs=# SELECT relname from pg_class where oid=17578;
 relname
---------
 log_2
(1 row)

current_query is truncated before the relevant delete from log_2:

tldlogs=# SELECT * from pg_stat_activity where procpid=1425438;
-[ RECORD 1 ]-+--------------------------------------
datid         | 17142
datname       | tldlogs
procpid       | 1425438
usesysid      | 103
usename       | slony
current_query | insert into public.log_1 (trans_on,trans_by,client_trid,server_trid,object,command,xml,rc,server_id,_rserv_ts) values ('2006-01-04 19:18:56+00','1234','abc123','987654321','1','4','epp xmlns=urn:ietf:params:xml:ns:epp-1.0 xml
query_start   | 2006-01-04 19:19:05.837785+00

tldlogs=# SELECT * from pgstattuple('log_2');
-[ RECORD 1 ]------+------------
table_len          | 33264033792
tuple_count        | 9464035
tuple_len          | 6896823100
tuple_percent      | 20.73
dead_tuple_count   | 33165
dead_tuple_len     | 20542608
dead_tuple_percent | 0.06
free_space         | 26050593840
free_percent       | 78.31

gdb backtrace of the vacuum process:

prompt$ gdb attach 1744920
GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as powerpc-ibm-aix5.1.0.0...attach: A file or directory in the path name does not exist..
Attaching to process 1744920
[Switching to Thread 1]
0xd01f1a74 in read () from /usr/lib/libc.a(shr.o)
(gdb) bt
#0  0xd01f1a74 in read () from /usr/lib/libc.a(shr.o)
#1  0x10039098 in FileRead (file=992, buffer=0x303c8480 , amount=537051144) at fd.c:963
#2  0x100abc40 in mdread (reln=0x203931e0, blocknum=3583554, buffer=0x303c8480 ) at md.c:433
#3  0x100aaa64 in smgrread (which=-1, reln=0x203931e0, blocknum=3583554, buffer=0x Address 0x out of bounds) at smgr.c:316
#4  0x1002bac0 in ReadBufferInternal (reln=0x203931e0, blockNum=3583554, bufferLockHeld=0 '\0') at bufmgr.c:230
#5  0x1019c550 in count_nondeletable_pages (onerel=0x203931e0, vacrelstats=0x20325bc8) at vacuumlazy.c:901
#6  0x1019c2b0 in lazy_truncate_heap (onerel=0x203931e0, vacrelstats=0x20325bc8) at vacuumlazy.c:795
#7  0x1019b120 in lazy_vacuum_rel (onerel=0x203931e0, vacstmt=0x) at vacuumlazy.c:181
#8  0x10194694 in vacuum_rel (relid=540619232, vacstmt=0x20323a78, expected_relkind=-83 '\u') at vacuum.c:855
#9  0x10193af4 in vacuum (vacstmt=0x20323a78) at vacuum.c:290
#10 0x1017af84 in ProcessUtility (parsetree=0x20323a78, dest=0x203238f0, completionTag=0x2ff1fa30 ) at utility.c:831
#11 0x101758fc in PortalRunUtility (portal=0x20327b10, query=0x20323ab8, dest=0x203238f0, completionTag=0x2ff1fa30 ) at pquery.c:772
#12 0x10175c54 in PortalRunMulti (portal=0x20327b10, dest=0x203238f0, altdest=0x203238f0, completionTag=0x2ff1fa30 ) at pquery.c:836
#13 0x101751f4 in
Re: [HACKERS] Vacuum Blocking A Deletion - Why?
Brad Nicholson [EMAIL PROTECTED] writes: We have been having a problem with a long running vacuum on a table blocking deletes. We have a log table on which the only activity is an on going series of deletions. The problem is experienced on a slony replicated node, but I don't think that slony is at fault here. My question - why is a vacuum blocking a deletion? The problem is evidently that it's got an exclusive lock on the table: 17578 | 1744920 | AccessExclusiveLock | t which is entirely unexpected behavior for plain VACUUM in any release later than 7.1. I have to think that slony does have something to do with it ... would suggest asking the slony folk about this. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Improving N-Distinct estimation by ANALYZE
Tom, In general, estimating n-distinct from a sample is just plain a hard problem, and it's probably foolish to suppose we'll ever be able to do it robustly. What we need is to minimize the impact when we get it wrong. Well, I think it's pretty well proven that to be accurate at all you need to be able to sample at least 5%, even if some users choose to sample less. Also I don't think anyone on this list disputes that the current algorithm is very inaccurate for large tables. Or do they? While I don't think that we can estimate N-distinct completely accurately, I do think that we can get within +/- 5x for 80-90% of all cases, instead of 40-50% of cases like now. We can't be perfectly accurate, but we can be *more* accurate. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 6: explain analyze is your friend
[HACKERS] back-patching locale environment fix
I'm looking at back-patching these fixes into the release branches: http://archives.postgresql.org/pgsql-committers/2005-12/msg00479.php http://archives.postgresql.org/pgsql-committers/2006-01/msg00017.php This is mostly pretty straightforward, except that the patch depends on having unsetenv(), which we don't currently have port support for before 8.0. So I'll need to stick src/port/unsetenv.c into the 7.4 and 7.3 branches and add the appropriate configure tests. This makes the patch a bit larger than I would like, but I don't see a whole lot of alternative: plperl is simply hazardous to the health of your database without this change. Comments, objections? regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Improving N-Distinct estimation by ANALYZE
On Wed, Jan 04, 2006 at 07:10:29PM +, Simon Riggs wrote:

3. We should also apply multi-column heuristics to the estimation of D, once we have estimated all columns. For column groups (pairs, triples etc) that form part of a PK, we know that it must be true that D1 * D2 * ... * Dk >= N. In many cases we will be very confident of our estimate of D when we decide D = d. i.e. When we have two columns, we can use this to infer that D1 >= N/d when D2 = d. So we can do this in any case where we have confident estimates of all but one column; the required information is available at that time. e.g. if line_item primary key ( l_orderkey, l_linenumber ) and we know that there are at most 10 l_linenumber values in the table, then there should be >= N/10 values for l_orderkey, so set it to that if it is lower (only).

Sorry if I'm pointing out the obvious, but I would do this for any unique index, not just a PK. (It should still hold for any unique index, right?)

Also, was an approach of sampling random rows within random blocks considered? Something like:

until row sample size reached
    read random block
    sample x% of rows in that block randomly
done

-- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
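[Editor's note] A small sketch of the multi-column heuristic quoted here, for one unique index: since every index key must be distinct, the product of the per-column distinct counts cannot be smaller than N, so trusted estimates for all but one column put a floor under the remaining one. The function and its names are hypothetical, purely to illustrate the inference.

def apply_unique_index_floor(d_estimates, confident, N):
    # d_estimates: per-column estimates of D for the columns of one unique
    #              index, e.g. {'l_orderkey': 30000, 'l_linenumber': 10}
    # confident:   names of columns whose estimate we trust (e.g. ANALYZE
    #              decided D = d for them)
    # N:           estimated row count of the table
    untrusted = [c for c in d_estimates if c not in confident]
    if len(untrusted) != 1:
        return d_estimates                      # heuristic only applies here
    target = untrusted[0]
    product = 1
    for col, d in d_estimates.items():
        if col != target:
            product *= d
    floor = N / float(product)                  # D1 * ... * Dk >= N
    if d_estimates[target] < floor:
        d_estimates[target] = floor             # only ever raise the estimate
    return d_estimates

# e.g. lineitem with 6,000,000 rows and at most 10 l_linenumber values:
# apply_unique_index_floor({'l_orderkey': 30000, 'l_linenumber': 10},
#                          {'l_linenumber'}, N=6000000)
# raises the l_orderkey estimate to 600,000.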
Re: [HACKERS] Improving N-Distinct estimation by ANALYZE
Simon,

- Are there any performance issues that can be directly attributed to mis-estimation of N-Distinct (D) by the ANALYZE command?

Yes. There's at least one query (maybe two) from TPC-H which bombs because of bad N-distinct estimation, even with stats_target = 1000. Based on my experience with data warehouses, this also occurs in the field.

- If so, can we do better than we currently achieve? How?

Replace the current algorithm and broaden the sample.

- Would altering the estimate of D cause problems in other places?

Unlike Index Cost Estimation, I wouldn't expect it to. We make pretty proper use of D right now, it's just that for some common cases our estimates of D are bad.

The estimation of D is difficult and imprecise. The current method works well in many cases, yet breaks down *badly* in one particular very common use case of database design: a large dependent table with a multi-column Primary Key. In some cases the estimate of D *decreases* as the size of the table increases, though the estimate of D is an underestimate in almost all cases, whatever the table design.

Actually, the current estimator underestimates D for *any* large table's high-cardinality columns, primary key, multi-column, or not. Chaudhuri's calculation seems to be designed to yield the smallest number of cardinal values that could reasonably be expected to yield the provided sample. That is, if the estimate range within a stdev of 2.0 gives from 50,000 to 500,000 distinct values, Chaudhuri picks 50,000.

This conservative approach makes sense when you're only considering join strategies. That is, given an unreliable estimate you want to estimate D low so that you don't wrongly choose a nested loop, the cost of that mistake being much higher than the cost of performing an unnecessary hash join. It's conservative in that sense.

However, PostgreSQL now has a whole set of hash operations and other query types for which a too-low estimate of D causes query lockup. So for these operations, Chaudhuri ceases to be conservative and becomes high-risk. FWIW, my testing with TPCH showed that estimate error is usually OK within +/- 5x. Beyond that you start to get bad query plans. (Yes, I know all of the above begs examples. I'm swamped. I believe I posted examples when I first started talking about n-distinct estimation a year ago.)

So I think it's vital that we look at algorithms designed to deliver us the median estimated D, not the lowest reasonable, in addition to increasing sample size. The block-based estimator functions which Andrew and I looked at seem designed to do that provided a sample of between 1% and 10%.

1. *All* methods of statistical analysis are improved by larger sample fractions. The D estimator method currently in use shows an optimum of accuracy and sample fraction at around 5% of a table, as shown in the author's original paper [Haas Stokes (1998)]. The current implementation's error rates climb higher as table size increases.

I read 5 different papers on ACM about sampling D. All of them were united in saying that you couldn't get even slightly accurate estimates with less than 3% sampling.

2. In terms of I/O, ANALYZE is relatively inefficient, since it uses a row sampling technique rather than a block sampling technique. This would translate directly into a performance drop from large sample ratios, but since we currently use a fixed sample size this problem is not yet visible for larger tables. With a 2GB table, we would typically sample 1% of the blocks, yet around 0.025 - 0.05% of the rows.

This would be a reason to use block-sampling ONLY, rather than hybrid sampling.

3. Large values of statistics target (up to 1000) could cause a number of problems with statistics catalog table growth (mentioned on -perform). Setting these values appropriately can take significant effort. Automatic scaling of such parameters is desirable.

Well, decoupling the n-distinct sample size from the # of heuristics rows would fix that.

4. ANALYZE doesn't use more memory if maintenance_work_mem is set high, nor does it use less if it is set low. Actual memory usage isn't measured at all. With very long rows this is max 24 MB with default stats target settings and BLCKSZ, or 2.4 GB with highest stats target (1000). This probably is of lower importance since stats targets are only usually set higher on larger databases, which typically have larger memory configurations anyway - and very long rows are uncommon because of TOAST. Typical memory usage by ANALYZE would be 1 MB with default settings i.e. maybe as low as a 0.01% sample for a very large table.

Yeah, I think I pointed out that ANALYZE doesn't seem to be using maintenance_mem a year ago. It should, although there should be options to use less than maintenance_mem if the user wants low-impact analyze.

There is a further effect of concern here. We currently apply a lift
Re: [HACKERS] Improving N-Distinct estimation by ANALYZE
On Wed, 2006-01-04 at 14:49 -0500, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: [ ... a large amount of analysis based on exactly one test case ... ] [Hmmm, those are your opinions, not my words. Funny guy ;-) ] The one test case just happens to be a very common 1:M relationship, an example of which occurs within the TPC-H/DBT-3 test database. I picked that so it was an obvious publicly accessible test case, rather than an isolated customer issue that couldn't be aired fully. I was hoping to allow the problem to be explored and improved. I think you are putting too much emphasis on fixing one case and not enough on considering what may happen in other cases ... It's not just one problem, but knowing that you would accept only a detailed analysis, I researched that one case so that it was indisputable. I'm dubious about the rest of this. Excellent. Much better than It's Wrong. I'll write some code and run some tests. Thanks for reading it all; sorry it had to be so long. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Improving N-Distinct estimation by ANALYZE
Josh Berkus josh@agliodbs.com writes: Tom, In general, estimating n-distinct from a sample is just plain a hard problem, and it's probably foolish to suppose we'll ever be able to do it robustly. What we need is to minimize the impact when we get it wrong. Well, I think it's pretty well proven that to be accurate at all you need to be able to sample at least 5%, even if some users choose to sample less. Also I don't think anyone on this list disputes that the current algorithm is very inaccurate for large tables. Or do they? I think it's worse than that. It's proven that to get any kind of accuracy in this kind of estimate you basically have to look at the entire table. Someone posted a URL to a paper that had a very clever way of estimating distinct values. It required scanning the entire table but only kept a sample of the values found using a method that guaranteed the sample was representative not of the entire table but of the distinct values. I think you're right that a reasonable sample size for this kind of estimate is going to be proportional to the table size, not the constant sized sample that regular statistics need. However that same paper has some preliminary verbiage about how previous papers had found that the sample sizes needed were unacceptably large. Something like 90%. Unfortunately I can't find the reference to the paper this moment. I'll look again later. I think it would be interesting to get the algorithm for sampling from that paper in. It would only be able to be used on a VACUUM ANALYZE but currently VACUUMs are necessary anyways. I do wonder what would happen when we get the incremental VACUUMs and the bitmaps to avoid vacuuming pages unnecessarily though. Then it would be less useful. -- greg ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Improving N-Distinct estimation by ANALYZE
On Wed, 2006-01-04 at 17:57 -0600, Jim C. Nasby wrote:

On Wed, Jan 04, 2006 at 07:10:29PM +, Simon Riggs wrote:
3. We should also apply multi-column heuristics to the estimation of D, once we have estimated all columns. For column groups (pairs, triples etc) that form part of a PK, we know that it must be true that D1 * D2 * ... * Dk >= N. In many cases we will be very confident of our estimate of D when we decide D = d. i.e. When we have two columns, we can use this to infer that D1 >= N/d when D2 = d. So we can do this in any case where we have confident estimates of all but one column; the required information is available at that time. e.g. if line_item primary key ( l_orderkey, l_linenumber ) and we know that there are at most 10 l_linenumber values in the table, then there should be >= N/10 values for l_orderkey, so set it to that if it is lower (only).

Sorry if I'm pointing out the obvious, but I would do this for any unique index, not just a PK. (It should still hold for any unique index, right?)

Yes. It's just a less common case to have > 1 unique index.

Also, was an approach of sampling random rows within random blocks considered? Something like:

until row sample size reached
    read random block
    sample x% of rows in that block randomly
done

Yes. I was trying to maintain the existing approach as much as possible, rather than ripping everything out and starting again, which could cause just as many problems as it solves. So evolution, rather than revolution.

The approach I suggested uses the existing technique for selecting random blocks, then either an exhaustive check on all of the rows in a block or the existing random row approach, depending upon available memory. We need to check all of the rows in a reasonable sample of blocks otherwise we might miss clusters of rows in large tables - which is the source of the problems identified. The other reason was to increase the sample size, which is a win in any form of statistics.

Best Regards, Simon Riggs

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Improving N-Distinct estimation by ANALYZE
On Wed, 2006-01-04 at 19:22 -0500, Greg Stark wrote: I think you're right that a reasonable sample size for this kind of estimate is going to be proportional to the table size, not the constant sized sample that regular statistics need. Agreed [I said exactly that in April]; the counter argument at that time was that proportional samples on large tables lead to external sorts in many cases which would lead to unacceptably long run times - since we need to sort the values for each attribute in turn. I've proposed limiting ourselves to maintenance_work_mem (I credit Josh with that idea from April). If you can allocate 1 GB of memory to an ANALYZE then this would be a very large proportion of a medium sized table (or partition). Considering how few rows we sample at the moment, increasing the actual sample size by many 000s would have a very beneficial effect. On Tue, 2005-04-26 at 19:03 -0400, Greg Stark wrote: This one looks *really* good. http://www.aladdin.cs.cmu.edu/papers/pdfs/y2001/dist_sampl.pdf It does require a single full table scan The Distinct Sampling approach you mention would scan the whole table and also use large in-memory hash tables. So it uses more I/O, the same memory and probably less CPU - no sorting required. The technique is the same implementation as a HashAgg, just one that loses rows in a predictable manner when it spills out of memory. It doesn't identify columns that scale with N, nor does it calculate correlation. Thats the same as re-writing Count(Distinct) to use hashing, which is a TODO item. So perhaps you could plan the code to do the Distinct Sampling approach at the same time. Hmmm. I'll think about that. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
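[Editor's note] For readers who have not seen the paper, the following toy sketch shows the general shape of the distinct-sampling idea as Simon describes it: scan every row, keep only values whose hash falls in a shrinking slice of the hash space, and halve the slice (discarding values predictably, like a spilling HashAgg) whenever the in-memory set exceeds its budget. This illustrates the technique only, not the exact algorithm from the cited paper; the class and parameter names are made up.

import hashlib

class DistinctSampler:
    # What we retain is a uniform sample of the *distinct* values seen,
    # not of the rows, so D can be estimated from it directly.
    def __init__(self, max_values=10000):
        self.max_values = max_values      # stand-in for the memory budget
        self.level = 0                    # each level halves the kept fraction
        self.values = set()

    def _keep(self, value):
        h = int(hashlib.md5(repr(value).encode()).hexdigest(), 16) % (1 << 32)
        return h < ((1 << 32) >> self.level)

    def add(self, value):
        if not self._keep(value):
            return
        self.values.add(value)
        while len(self.values) > self.max_values:
            # "spill": raise the level and discard values predictably,
            # like a HashAgg that evicts whole hash-prefix buckets
            self.level += 1
            self.values = {v for v in self.values if self._keep(v)}

    def estimate(self):
        # each retained value stands for about 2**level distinct values
        return len(self.values) * (1 << self.level)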
[HACKERS] Heads up: upcoming back-branch re-releases
Just FYI, the core committee has agreed we need re-releases to fix the locale environment issue and other recent bug fixes. Current thought is to wrap tarballs tomorrow (Thursday) evening, North American eastern time, with public announcement scheduled for Sunday evening or Monday. The longer-than-usual interval is because Dave Page and the other Windows installer team members need a bit of extra time to prepare the Windows version. Any last-minute issues out there? regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Improving N-Distinct estimation by ANALYZE
Sorry to interrupt. The discussion is interesting, but I need some help to follow along.

On Wednesday 2006-01-04 17:07, Josh Berkus wrote:

Simon,

- Are there any performance issues that can be directly attributed to mis-estimation of N-Distinct (D) by the ANALYZE command?

Yes. There's at least one query (maybe two) from TPC-H which bombs because of bad N-distinct estimation, even with stats_target = 1000. Based on my experience with data warehouses, this also occurs in the field.

- If so, can we do better than we currently achieve? How?

Replace the current algorithm and broaden the sample.

Is "replace the algorithm" the same as saying "contextually use some estimate of D that is not Chaudhuri"?

- Would altering the estimate of D cause problems in other places?

Unlike Index Cost Estimation, I wouldn't expect it to. We make pretty proper use of D right now, it's just that for some common cases our estimates of D are bad.

The estimation of D is difficult and imprecise. The current method works well in many cases, yet breaks down *badly* in one particular very common use case of database design: a large dependent table with a multi-column Primary Key. In some cases the estimate of D *decreases* as the size of the table increases, though the estimate of D is an underestimate in almost all cases, whatever the table design.

Actually, the current estimator underestimates D for *any* large table's high-cardinality columns, primary key, multi-column, or not. Chaudhuri's calculation seems to be designed to yield the smallest number of cardinal values that could reasonably be expected to yield the provided sample. That is, if the estimate range within a stdev of 2.0 gives from 50,000 to 500,000 distinct values, Chaudhuri picks 50,000.

This conservative approach makes sense when you're only considering join strategies. That is, given an unreliable estimate you want to estimate D low so that you don't wrongly choose a nested loop, the cost of that mistake being much higher than the cost of performing an unnecessary hash join. It's conservative in that sense.

So Chaudhuri's estimate of D is appropriate (and is working) when making decisions about joins.

However, PostgreSQL now has a whole set of hash operations and other query types for which a too-low estimate of D causes query lockup.

Why?

So for these operations, Chaudhuri ceases to be conservative and becomes high-risk. FWIW, my testing with TPCH showed that estimate error is usually OK within +/- 5x. Beyond that you start to get bad query plans. (yes, I know all of the above begs examples. I'm swamped. I believe I posted examples when I first started talking about n-distinct estimation a year ago)

So I think it's vital that we look at algorithms designed to deliver us the median estimated D, not the lowest reasonable, in addition to increasing sample size. The block-based estimator functions which Andrew and I looked at seem designed to do that provided a sample of between 1% and 10%.

Do you *really* want the median estimate in these cases? Are you certain you do not want something with the opposite behavior of Chaudhuri's estimate so that for small sample sizes the bias is toward a high estimate of D? (Converges on D from the right instead of the left.)

Chaudhuri's    ----D----    needed
 Estimate                  estimate

1. *All* methods of statistical analysis are improved by larger sample fractions. The D estimator method currently in use shows an optimum of accuracy and sample fraction at around 5% of a table, as shown in the author's original paper [Haas Stokes (1998)]. The current implementation's error rates climb higher as table size increases.

I read 5 different papers on ACM about sampling D. All of them were united in saying that you couldn't get even slightly accurate estimates with less than 3% sampling.

These statements are at odds with my admittedly basic understanding of statistics. Isn't the power of a sample more related to the absolute size of the sample than the sample as fraction of the population? Why not just pick a smallish sample size, say about 3000, and apply it to all the tables, even the ones with just a single row (modify appropriately from block sampling).

2. In terms of I/O, ANALYZE is relatively inefficient, since it uses a row sampling technique rather than a block sampling technique. This would translate directly into a performance drop from large sample ratios, but since we currently use a fixed sample size this problem is not yet visible for larger tables. With a 2GB table, we would typically sample 1% of the blocks, yet around 0.025 - 0.05% of the rows.

This would be a reason to use block-sampling ONLY, rather than hybrid sampling.

3. Large values of statistics target (up to 1000) could cause a number of problems with statistics catalog table growth (mentioned on -perform).
Re: [HACKERS] Improving missing FROM-clause entry message
I wrote: I'm thinking about whether we can't improve the message for missing FROM-clause entry to somehow account for situations where the table does exist in the query but it's referenced from an improper place, ... On further investigation, this is arguably a regression in 8.1. Every PG release back to 7.2 has responded to this query with NOTICE: adding missing FROM-clause entry for table a ERROR: JOIN/ON clause refers to a, which is not part of JOIN In 8.1, where add_missing_from defaults to false, you get the first line as an ERROR and so the much-more-useful specific message doesn't appear. I think we need to do something about this. After some thought I've come up with possible ways to handle this. Plan A: when we are about to raise an error in warnAutoRange(), scan the rangetable to see if there are any entries anywhere that could match the specified table name (either by alias or by real table name). If so, don't use the missing FROM-clause entry wording, but instead say something like ERROR: invalid reference to FROM-clause entry for table foo HINT: The entry cannot be referenced from this part of the query. When the match is by real table name and there's an alias, a better HINT might be HINT: You probably should have used the table alias bar. since this would do something useful for the perennial mistake select foo.* from foo f; Plan B: when we are about to raise an error in warnAutoRange(), instead just save the error info in the ParseState struct and keep going. If we get to the end of parsing without detecting any other error, report the missing-FROM error. This would let the specific error message ERROR: JOIN/ON clause refers to a, which is not part of JOIN come out when it's applicable, but not change the behavior otherwise. Plan C: do both. This would give us the most specific error messages possible without major restructuring. A reasonable objection to either Plan A or Plan C is that it will add error strings that are not currently in the translation message files; which wouldn't matter for a HEAD-only patch, but I'd really like to back-patch this into 8.1. Plan B wouldn't change the set of possible messages. If that's not considered a show-stopper, I'd like to go with Plan C. We've certainly got plenty of evidence that this is a confusing error condition, and the more we can do to explain the problem in the error message, the less time will be wasted all around. Comments? Any thoughts about the exact wording of the proposed new messages? regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
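[Editor's note] A rough sketch of the control flow "Plan C" describes, written in Python purely for illustration; the real logic would live in warnAutoRange() and friends in the C parser. The function, the ParseState fields, and the exact message texts below are assumptions based on the wording proposed above.

class ParseError(Exception):
    def __init__(self, message, hint=None):
        super().__init__(message)
        self.hint = hint

def find_rangetable_entry(pstate, refname):
    # assumed helper: look for any rangetable entry whose alias or real
    # table name matches, whether or not it is visible from this scope
    for rte in pstate.rangetable:
        if refname in (rte.alias, rte.relname):
            return rte
    return None

def warn_auto_range(pstate, refname):
    rte = find_rangetable_entry(pstate, refname)
    if rte is not None:
        # Plan A: the table *is* in the query, just not usable from here,
        # so give a specific error instead of "missing FROM-clause entry".
        if rte.alias and rte.alias != refname:
            # matched by real name while an alias exists, as in
            # "select foo.* from foo f"
            hint = 'Perhaps you meant to reference the table alias "%s".' % rte.alias
        else:
            hint = ('There is an entry for table "%s", but it cannot be '
                    'referenced from this part of the query.') % refname
        raise ParseError('invalid reference to FROM-clause entry for table "%s"' % refname,
                         hint=hint)
    # Plan B: remember the generic error and keep parsing; raise it at the
    # end only if nothing more specific (e.g. the JOIN/ON message) came up.
    pstate.deferred_error = 'missing FROM-clause entry for table "%s"' % refname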
[HACKERS] QNX mention in tuplesort.c
There is a mention of QNX in tuplesort.c. Can that code segment be removed now that QNX is removed? -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Improving missing FROM-clause entry message
On Wednesday 04 January 2006 20:37, Tom Lane wrote:
A reasonable objection to either Plan A or Plan C is that it will add error strings that are not currently in the translation message files; which wouldn't matter for a HEAD-only patch, but I'd really like to back-patch this into 8.1. Plan B wouldn't change the set of possible messages.

No objections here, but since I don't use a foreign lang I figure my vote doesn't really matter. I was wondering though if it would be reasonable to try and get some language updates into the patch release?

-- Robert Treat Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Improving missing FROM-clause entry message
Robert Treat [EMAIL PROTECTED] writes: No objections here but since I don't use a foreign lang I figure my vote doesn't really matter. I was wondering though if it would be resonable to try and get some language updates into the patch release? With the current re-release plans it'd take awfully quick response from the translators to get that done for 8.1.2. (I'm not even 100% sure I'll get the code patch done for 8.1.2, though I'll try if there's nothing more important on my plate tomorrow.) But it's reasonable to hope that the translations might catch up in 8.1.3 or later. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] QNX mention in tuplesort.c
Bruce Momjian pgman@candle.pha.pa.us writes: There is a mention of QNX in tuplesort.c. Can that code segment be removed now that QNX is removed? I'd leave it there. We don't have any certainty that the behavior being worked around exists only on QNX4, and it's not like it's costing a lot. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Heads up: upcoming back-branch re-releases
On Wed, 4 Jan 2006, Tom Lane wrote: Any last-minute issues out there? Not a last minute issue or a big deal, but I see no reason for this patch not to be applied to back branches. http://archives.postgresql.org/pgsql-patches/2005-12/msg00128.php It fixes this problem: http://archives.postgresql.org/pgsql-bugs/2005-12/msg00048.php Kris Jurka ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Heads up: upcoming back-branch re-releases
Tom Lane said: Just FYI, the core committee has agreed we need re-releases to fix the locale environment issue and other recent bug fixes. Current thought is to wrap tarballs tomorrow (Thursday) evening, North American eastern time, with public announcement scheduled for Sunday evening or Monday. The longer-than-usual interval is because Dave Page and the other Windows installer team members need a bit of extra time to prepare the Windows version. Any last-minute issues out there? I have not had time to test the locale fix on Windows, and don't know when I will. Can anyone else help (Magnus maybe?) cheers andrew ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Improving missing FROM-clause entry message
Tom Lane wrote: Robert Treat [EMAIL PROTECTED] writes: No objections here but since I don't use a foreign lang I figure my vote doesn't really matter. I was wondering though if it would be resonable to try and get some language updates into the patch release? With the current re-release plans it'd take awfully quick response from the translators to get that done for 8.1.2. (I'm not even 100% sure I'll get the code patch done for 8.1.2, though I'll try if there's nothing more important on my plate tomorrow.) But it's reasonable to hope that the translations might catch up in 8.1.3 or later. Do we have enough time to test the patch before the minor releases? -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Improving missing FROM-clause entry message
Bruce Momjian pgman@candle.pha.pa.us writes: Do we have enough time to test the patch before the minor releases? Sure, it's not like it raises any portability issues. As long as it gives a better error message than before in some common cases, it'll be a step forward, even if we think of further improvements later. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Heads up: upcoming back-branch re-releases
Kris Jurka [EMAIL PROTECTED] writes: Not a last minute issue or a big deal, but I see no reason for this patch not to be applied to back branches. http://archives.postgresql.org/pgsql-patches/2005-12/msg00128.php Well, it hasn't been applied to the *front* branch yet ... but I'll take a look. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Improving N-Distinct estimation by ANALYZE
Simon Riggs [EMAIL PROTECTED] writes: The approach I suggested uses the existing technique for selecting random blocks, then either an exhaustive check on all of the rows in a block or the existing random row approach, depending upon available memory. We need to check all of the rows in a reasonable sample of blocks otherwise we might miss clusters of rows in large tables - which is the source of the problems identified. The other reason was to increase the sample size, which is a win in any form of statistics. Only if your sample is random and independent. The existing mechanism tries fairly hard to ensure that every record has an equal chance of being selected. If you read the entire block and not appropriate samples then you'll introduce systematic sampling errors. For example, if you read an entire block you'll be biasing towards smaller records. I think it would be useful to have a knob to increase the sample size separately from the knob for the amount of data retained in the statistics tables. Though I think you'll be disappointed and find you have to read an unreasonably large sample out of the table before you get more useful distinct estimates. Certainly it's worth testing this in a low impact way like just keeping the existing sample method and dialing up the sample sizes before you try anything that would sacrifice the statistical validity of the more solid estimates. -- greg ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] catalog corruption bug
On Wed, 21 Dec 2005, Tom Lane wrote: Jeremy Drake [EMAIL PROTECTED] writes: We have encountered a very nasty but apparently rare bug which appears to result in catalog corruption. How much of this can you reproduce on 8.1.1? We've fixed a few issues already. We did not see this problem for a while. I upgraded the second gentoo box to show this problem to 8.1.1 basically as soon as the ebuild for it was out. It just started acting up today (but we have not stressed it for a while). It appears to be acting similarly (although corruption which persisted into other backends has not appeared). Here is the error message I currently get on 8.1.1 (names have been changed): DBD::Pg::st execute failed: ERROR: type push_temp already exists CONTEXT: SQL statement CREATE TEMPORARY TABLE push_temp (val text) ON COMMIT DROP PL/pgSQL function push_func line 6 at SQL statement DBD::Pg::st execute failed: ERROR: type push_temp already exists CONTEXT: SQL statement CREATE TEMPORARY TABLE push_temp (val text) ON COMMIT DROP PL/pgSQL function push_func line 6 at SQL statement postgres=# select version(); version -- PostgreSQL 8.1.1 on x86_64-pc-linux-gnu, compiled by GCC x86_64-pc-linux-gnu-gcc (GCC) 3.4.4 (Gentoo 3.4.4-r1, ssp-3.4.4-1.0, pie-8.7.8) (1 row) This was built from the gentoo ebuild version 8.1.0 I'd be even more interested if you can reproduce it on a non-gentoo machine. Gentoo is not noted for stability. This one was also on Gentoo, this time ebuild version 8.1.1. They are applying a couple patches it looks like, one of which looks like it just changes some makefile stuff around, and the other appears to add support for the SH platform in s_lock.h. Unfortunately, I don't have any non-gentoo boxes around which are on a par with these two hardware-wise. Also, I think my test cases I tried to come up with were most likely wrong. This code which is currently croaking is basically amounting to 9 processes calling functions which do SELECT, INSERT, SELECT FOR UPDATE, DELETE, and UPDATE, as well as CREATE TEMP TABLE ... ON COMMIT DROP. ON COMMIT DROP is the only kind of temp table that this code uses. I could probably try to re-arrange the code in such a way that I can send it, if that would be helpful (although I wouldn't want to waste the effort if it wouldn't be helpful). Also, what do you figure are the chances of that plperl locale problem causing this? I would guess pretty slim seeing as I am only using ASCII for my schemas, and all of my databases are SQL_ASCII. I am calling plperl functions in both of the projects which are breaking... Also, if I run the command 'locale' all of the things it prints out are either empty or POSIX -- Take your dying with some seriousness, however. Laughing on the way to your execution is not generally understood by less advanced life forms, and they'll call you crazy. -- Messiah's Handbook: Reminders for the Advanced Soul ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Improving N-Distinct estimation by ANALYZE
Trent,

Sorry to interrupt. The discussion is interesting, but I need some help to follow along.

Thought-out commentary is welcome.

Is "replace the algorithm" the same as saying "contextually use some estimate of D that is not Chaudhuri"?

Yes. I favor a block-based approach like Brutlag, largely because it allows us to increase the sample size without dramatically increasing I/O.

So Chaudhuri's estimate of D is appropriate (and is working) when making decisions about joins.

Some kinds of joins. It avoids, for example, risky use of nested loops.

However, PostgreSQL now has a whole set of hash operations and other query types for which a too-low estimate of D causes query lockup.

Why?

Two specific examples, both of which I've encountered in the field:
1) too-low D will cause an aggregate query to use a hashagg which is larger than memory, resulting in swapping (or disk spill when it's fixed) which makes the query very much slower than if the hashagg was not used.
2) much too-low D will cause the planner to pick a seq scan when it's not necessary, resulting in increased query times.

Do you *really* want the median estimate in these cases? Are you certain you do not want something with the opposite behavior of Chaudhuri's estimate so that for small sample sizes the bias is toward a high estimate of D? (Converges on D from the right instead of the left.)

Chaudhuri's    ----D----    needed
 Estimate                  estimate

Hmmm. Yeah, I see what you mean. True, the ideal approach would be to determine for each query operation whether a too-low D or a too-high D was more risky, and then use the more conservative number. However, that would complicate the query planner enough that I think Tom would leave us. :-p

These statements are at odds with my admittedly basic understanding of statistics. Isn't the power of a sample more related to the absolute size of the sample than the sample as fraction of the population? Why not just pick a smallish sample size, say about 3000, and apply it to all the tables, even the ones with just a single row (modify appropriately from block sampling).

Nope, it's definitely proportional. As a simple example, a sample of 500 rows in a table of 1000 rows should yield stats estimates with 90%+ accuracy. But a sample of 500 rows in a 600,000,000 row table is so small as to be nearly useless; it's quite possible to get all the same value in a random sample of 0.1% even on a column with a D/N of 0.001. If you look at the papers cited, almost all researchers more recent than Chaudhuri use a proportional sample size. And we're taking the fixed-sample-size approach now, and it's not working very well for large tables.

--Josh Berkus

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
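[Editor's note] A quick way to see the point about proportional samples is to simulate how many distinct values a fixed-size sample actually observes. The table sizes and distinct counts below are made up for illustration; the takeaway is that a 500-row sample resolves D well for a 1,000-row table but tells you almost nothing for a 600-million-row one, because nearly every sampled value is a singleton.

import random

def observed_distinct(N, D, n, trials=100):
    # Average number of distinct values an n-row sample sees for a column
    # with D equally frequent values spread over N rows (value = row % D).
    totals = 0
    for _ in range(trials):
        rows = random.sample(range(N), n)       # uniform sample of row positions
        totals += len({r % D for r in rows})
    return totals / float(trials)

# 500 of 1,000 rows, 100 distinct values: the sample sees nearly all of them,
# so almost any estimator lands close to D.
print(observed_distinct(N=1000, D=100, n=500))            # ~100

# 500 of 600,000,000 rows, 6,000,000 distinct values: essentially every
# sampled value is unique, so the sample cannot distinguish D = 6 million
# from D = N, and any estimator is guessing.
print(observed_distinct(N=600000000, D=6000000, n=500))   # ~500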
Re: [HACKERS] Improving N-Distinct estimation by ANALYZE
Greg, Only if your sample is random and independent. The existing mechanism tries fairly hard to ensure that every record has an equal chance of being selected. If you read the entire block and not appropriate samples then you'll introduce systematic sampling errors. For example, if you read an entire block you'll be biasing towards smaller records. Did you read any of the papers on block-based sampling? These sorts of issues are specifically addressed in the algorithms. --Josh ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Improving N-Distinct estimation by ANALYZE
Folks, Nope, it's definitely proportional. As a simple example, a sample of 500 rows in a table of 1000 rows should yeild stats estimates with 90%+ accuracy. But a sample of 500 rows in a 600,000,000 row table is so small as to be nearly useless; it's quite possible to get all the same value in a random sample of 0.1% even on a column with a D/N of 0.001. I meant a D/N of 0.1. Sorry. --Josh ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match