[PERFORM] What about utility to calculate planner cost constants?
I was following the cpu_tuple_cost thread and wondering if it would be possible to make a PQA-style utility to calculate configuration-specific values for the planner cost constants. It could make use of the output of log_(statement|parser|planner|executor)_stats, though I'm not sure that output contains anything useful for this purpose.

Otherwise it could just collect statements, run EXPLAIN ANALYZE on all of them, and then play with the planner cost constants to get the estimated values as close as possible to the actual values. Something like Goal Seek in Excel, if you pardon my reference to MS :).

A somewhat similar project seems to be pgautotune from pgFoundry, but it only considers shared_buffers, sort_mem and vacuum_mem. It also seems to use synthetic data instead of the actual database and actual statements from the log, and it has been inactive for a while.

Tambet

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
       http://www.postgresql.org/docs/faq
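A minimal sketch of the Goal Seek idea above, with an invented workload (in a real tool the page counts and runtimes would come from EXPLAIN ANALYZE output): search for the single cost constant that minimizes the gap between estimated and observed cost.

```python
# Toy sketch of the "Goal Seek" idea: given observed (plan statistics,
# actual runtime) pairs, search for the cost constant that minimizes the
# gap between estimated and actual cost. Workload data here is invented.

# Each sample: (sequential pages read, observed runtime in ms).
workload = [(1000, 210.0), (5000, 1020.0), (200, 45.0)]

def total_error(seq_page_cost):
    """Sum of squared differences between estimated and actual cost."""
    return sum((pages * seq_page_cost - actual) ** 2
               for pages, actual in workload)

def goal_seek(lo=0.0, hi=10.0, iterations=60):
    """Ternary search for the constant minimizing total_error.

    Works here because total_error is a convex quadratic in the constant.
    """
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if total_error(m1) < total_error(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

best = goal_seek()
```

For a single constant the least-squares optimum can also be written in closed form; the search version generalizes more easily to "play with the constants" over several parameters at once.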
Re: [PERFORM] column name is LIMIT
On 3/14/2005 4:26 AM, Qingqing Zhou wrote:
> So is it to make the SQL parser context-sensitive - say the parser will
> understand that in the statement SELECT * FROM LIMIT, LIMIT is just a
> table name instead of a keyword?

More or less, yes. To use a reserved keyword as an identifier (table or column name), it must be enclosed in double quotes. Double quotes are also used to make identifiers case sensitive. So

    select someval, "SOMEVAL", "someVAL" from user;

is a valid query retrieving 3 distinct columns from the table user.

There is a builtin function quote_ident() in PostgreSQL that is supposed to return a properly quoted string, usable as an identifier, for whatever name is passed in. But it fails to do so for all-lower-case names that are reserved keywords. The queries Slony executes on the replicas are constructed using that quoting function, and therefore Slony fails to build valid SQL for replicated tables containing reserved-keyword identifiers.

One solution would be to brute-force quote all identifiers in Slony ... not sure what the side effects performance-wise would be.

Jan

-- 
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
[EMAIL PROTECTED]
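A rough sketch, in Python, of the quoting decision quote_ident() has to make. This is an illustration, not PostgreSQL's actual implementation, and the keyword set is a tiny invented subset of the real reserved-word list:

```python
# Illustrative re-implementation of identifier quoting, NOT PostgreSQL's
# actual quote_ident(). A tiny subset of reserved keywords stands in for
# the real list.
RESERVED = {"limit", "select", "from", "user", "order", "group"}

def quote_ident(name: str) -> str:
    """Double-quote an identifier when quoting is required.

    Quoting is needed when the name is not all lower case, contains
    characters outside [a-z0-9_], starts with a digit, or is a reserved
    keyword -- the last case being the one the message says the builtin
    missed for all-lower-case names.
    """
    safe = (name == name.lower()
            and name.replace("_", "").isalnum()
            and not name[0].isdigit()
            and name not in RESERVED)
    if safe:
        return name
    # Embedded double quotes are doubled, per SQL quoting rules.
    return '"' + name.replace('"', '""') + '"'
```

With this logic, quote_ident("limit") yields "limit" in double quotes, which is exactly the behavior Slony needs for replicated tables named after reserved keywords.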
Re: [PERFORM] column name is LIMIT
Jan Wieck [EMAIL PROTECTED] writes:
> There is a builtin function quote_ident() in PostgreSQL that is supposed
> to return a properly quoted string allowed as an identifier for whatever
> name is passed in. But it fails to do so for all lower case names that
> are reserved keywords.

Not any more ...

			regards, tom lane
Re: [PERFORM] What about utility to calculate planner cost constants?
Tambet,

> I was following the cpu_tuple_cost thread and wondering if it would be
> possible to make a PQA-style utility to calculate configuration-specific
> values for planner cost constants. It could make use of the output of
> log_(statement|parser|planner|executor)_stats, though I'm not sure if
> the output contains anything useful for those purposes.

Yeah, that's something I need to look at.

> Otherwise it could just collect statements, run EXPLAIN ANALYZE for all
> of them and then play with planner cost constants to get the estimated
> values as close as possible to actual values. Something like Goal Seek
> in Excel, if you pardon my reference to MS :).

That's not really practical. There are currently 5 major query tuning parameters, not counting the memory adjustments which really can't be left out. You can't realistically test all combinations of 6 variables.

> A somewhat similar project seems to be pgautotune from pgFoundry, but it
> only considers shared_buffers, sort_mem and vacuum_mem. And it seems to
> use synthetic data instead of actual database and actual statements from
> log. And it has been inactive for a while.

Yeah, pg_autotune is a dead project. Once we got OSDL able to run tests, we came up with some rules of thumb which are more useful than autotune's output. More importantly, the approach doesn't scale to the 15-20 GUCs which we'd realistically want to test.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco
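A quick back-of-the-envelope for the "can't test all combinations" objection: an exhaustive grid over n tuning variables at k trial values each needs k**n benchmark runs, and each run here is a full workload replay, not a cheap function call.

```python
# Back-of-the-envelope for exhaustive grid search over tuning parameters:
# n variables at k trial values each means k**n full benchmark runs.
def grid_runs(variables: int, values_per_variable: int) -> int:
    return values_per_variable ** variables

# 6 variables at a coarse 10 trial values each: a million benchmark runs.
runs_6 = grid_runs(6, 10)
# At the 15-20 GUCs mentioned above, the count becomes astronomical.
runs_20 = grid_runs(20, 10)
```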
Re: [PERFORM] What needs to be done for real Partitioning?
Stacy,

> Luckily they had the chance to work with a truly fantastic DBA (the
> author of an Oracle Press performance tuning book even) before they
> could switch back. He convinced them to make some of their indexes
> global. Performance dramatically improved (compared with both the
> unpartitioned schema, and the partitioned-and-locally-indexed schema),
> and they've since stayed with partitioned tables and a mix of local and
> global indexes.

Hmmm. Wouldn't Greg's suggestion of a bitmap index which holds information on what values are found in what partition also solve this? Without 1/2 of the overhead imposed by global indexes?

I can actually see such a bitmap as being universally useful to the partitioning concept ... for one, it would resolve the whole partition on {value} issue.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] What about utility to calculate planner cost constants?
If by not practical you mean, no one has implemented a multivariable testing approach, I'll agree with you. But multivariable testing is definitely a valid statistical approach to solving just such problems.

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Mar 21, 2005, at 11:51 AM, Josh Berkus wrote:

> That's not really practical. There are currently 5 major query tuning
> parameters, not counting the memory adjustments which really can't be
> left out. You can't realistically test all combinations of 6 variables.
Re: [PERFORM] What needs to be done for real Partitioning?
On Mon, Mar 21, 2005 at 09:55:03AM -0800, Josh Berkus wrote:
> Stacy,
>
> > Luckily they had the chance to work with a truly fantastic DBA (the
> > author of an Oracle Press performance tuning book even) before they
> > could switch back. He convinced them to make some of their indexes
> > global. Performance dramatically improved (compared with both the
> > unpartitioned schema, and the partitioned-and-locally-indexed schema),
> > and they've since stayed with partitioned tables and a mix of local
> > and global indexes.
>
> Hmmm. Wouldn't Greg's suggestion of a bitmap index which holds
> information on what values are found in what partition also solve this?
> Without 1/2 of the overhead imposed by global indexes?
>
> I can actually see such a bitmap as being universally useful to the
> partitioning concept ... for one, it would resolve the whole partition
> on {value} issue.

I suspect both will have their uses. I've read quite a bit about global vs. local indexes in Oracle, and there are definitely cases where global is much better than local. Granted, there are some things with how Oracle handles their catalog, etc. that might make local indexes more expensive for them than they would be for PostgreSQL. It's also not clear how much a 'partition bitmap' index would help.

As for the 'seqscan individual partitions' argument, that's not going to work well at all for a case where you need to hit a relatively small percentage of rows in a relatively large number of partitions. SELECT ... WHERE customer_id = 1 would be a good example of such a query (assuming the table is partitioned on something like invoice_date).

-- 
Jim C. Nasby, Database Consultant [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?
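A toy model of the 'partition bitmap' idea being discussed, with invented table names and data: each partition keeps a summary of which customer_id values it contains, so a query partitioned on invoice_date can still skip partitions that cannot match a customer_id filter.

```python
# Toy model of a per-partition "value bitmap": each partition records
# which values of a column it contains, so queries can skip partitions
# that cannot match. Names and data here are invented for illustration.

partitions = {
    "invoices_2005_01": [(1, "jan-a"), (7, "jan-b")],
    "invoices_2005_02": [(2, "feb-a"), (3, "feb-b")],
    "invoices_2005_03": [(1, "mar-a"), (2, "mar-b")],
}

# The summary structure: partition name -> set of customer_ids present.
value_map = {name: {cid for cid, _ in rows}
             for name, rows in partitions.items()}

def scan_for_customer(cid):
    """Scan only partitions whose summary says cid may be present."""
    hits, scanned = [], []
    for name, rows in partitions.items():
        if cid not in value_map[name]:
            continue          # pruned without touching the partition
        scanned.append(name)
        hits.extend(row for row in rows if row[0] == cid)
    return hits, scanned

rows, scanned = scan_for_customer(1)
```

This is much cheaper to maintain than a global index (it tracks value presence per partition, not individual tuples), but as the message notes, it is unclear how much it helps when the sought value appears in most partitions anyway.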
Re: [PERFORM] What needs to be done for real Partitioning?
On Sat, Mar 19, 2005 at 07:05:53PM -0500, Tom Lane wrote:
> Alvaro Herrera [EMAIL PROTECTED] writes:
> > We probably also need multi-table indexes.
>
> As Josh says, that seems antithetical to the main point of partitioning,
> which is to be able to rapidly remove (and add) partitions of a table.
> If you have to do index cleaning before you can drop a partition, what's
> the point of partitioning?

Why would you need to do index cleaning first? Presumably the code that goes to check a heap tuple that an index pointed at, to ensure that it was visible in the current transaction, would be able to recognize that the partition the tuple was in had been removed, and just ignore that index entry. Granted, you'd need to clean the index up at some point (presumably via vacuum), but it doesn't need to occur at partition drop time.

-- 
Jim C. Nasby, Database Consultant [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?
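A toy model of the lazy-cleanup argument above, with invented names and data: index entries carry a (partition, tuple) reference, lookups silently skip entries whose partition no longer exists, and a later vacuum pass removes them physically.

```python
# Toy model of lazy index cleanup after a partition drop: lookups ignore
# entries pointing into dropped partitions; a later "vacuum" removes them.
# All names and data are invented for illustration.

live_partitions = {"p2005_01", "p2005_02"}   # p2004_12 has been dropped
# A multi-table index: key -> list of (partition, tuple_id) references.
index = {42: [("p2005_01", 7), ("p2005_02", 3), ("p2004_12", 9)]}

def index_lookup(key):
    """Return only entries whose partition still exists."""
    return [(p, t) for p, t in index.get(key, [])
            if p in live_partitions]

def vacuum_index():
    """Physically remove dangling entries, as vacuum would do later."""
    for key in index:
        index[key] = [(p, t) for p, t in index[key]
                      if p in live_partitions]
```

The point of the sketch: correctness only requires the visibility check at lookup time; the physical cleanup can be deferred, so dropping a partition stays fast.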
Re: [PERFORM] What about utility to calculate planner cost constants?
Thomas,

> If by not practical you mean, no one has implemented a multivariable
> testing approach, I'll agree with you. But multivariable testing is
> definitely a valid statistical approach to solving just such problems.

Well, not practical as in: it would take either $10 million in equipment or 10,000 hours, or both.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] What about utility to calculate planner cost constants?
Josh Berkus josh@agliodbs.com writes:
> > Otherwise it could just collect statements, run EXPLAIN ANALYZE for
> > all of them and then play with planner cost constants to get the
> > estimated values as close as possible to actual values. Something
> > like Goal Seek in Excel, if you pardon my reference to MS :).
>
> That's not really practical. There are currently 5 major query tuning
> parameters, not counting the memory adjustments which really can't be
> left out. You can't realistically test all combinations of 6 variables.

I don't think it would be very hard at all, actually. It's just a linear algebra problem: a bunch of independent variables and a system of equations. Solving for values for all of them is a straightforward problem.

Of course in reality these variables aren't actually independent, because the costing model isn't perfect. But that wouldn't be a problem; it would just reduce the accuracy of the results.

What's needed is for the explain plan to total up the costing penalties independently. So the result would be something like

  1000 * random_page_cost + 101 * sequential_page_cost + 2000 * index_tuple_cost + ...

In other words, a tuple like (1000, 101, 2000, ...). EXPLAIN ANALYZE would produce the above tuple along with the resulting time. Some program would then have to gather these values from the log or stats data, collect them into a large linear system, and solve for values that minimize the divergence from the observed times.

(Cost penalties are currently normalized so that sequential_page_cost is 1. That could be maintained, or it could be changed to be normalized to an expected 1ms.)

(Also, EXPLAIN ANALYZE currently has overhead that makes this impractical. Ideally it could subtract out its own overhead so the solutions would be accurate enough to be useful.)

-- 
greg
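A sketch of the fitting step described above, assuming the per-plan penalty tuples and observed runtimes have already been collected (the numbers below are invented, generated from constants of 4.0, 1.0, and 0.005 ms so the fit has a known answer): ordinary least squares via the normal equations, solved with plain Gaussian elimination.

```python
# Sketch of the least-squares fit described above: each observed query
# contributes a row of cost-penalty counts (how many times each planner
# constant was charged) plus its actual runtime; we solve for the
# constants minimizing squared error. All data below is invented.

def solve(M, v):
    """Gaussian elimination with partial pivoting for a square system."""
    n = len(v)
    M = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c]
                              for c in range(r + 1, n))) / M[r][r]
    return x

def least_squares(A, b):
    """Solve min ||Ax - b|| via the normal equations: (A^T A) x = A^T b."""
    n = len(A[0])
    AtA = [[sum(A[r][i] * A[r][j] for r in range(len(A)))
            for j in range(n)] for i in range(n)]
    Atb = [sum(A[r][i] * b[r] for r in range(len(A))) for i in range(n)]
    return solve(AtA, Atb)

# Invented penalty tuples: (random_page, seq_page, index_tuple) counts per
# query, with runtimes generated from constants (4.0, 1.0, 0.005) ms.
A = [[1000, 101, 2000], [50, 5000, 100], [300, 40, 90000], [10, 10, 10]]
b = [4111.0, 5200.5, 1690.0, 50.05]
constants = least_squares(A, b)
```

With more observations than constants the system is overdetermined, which is exactly what the message wants: the imperfect independence of the variables then just shows up as residual error rather than preventing a solution.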