Re: [HACKERS] generic options for explain
David Fetter wrote:

On Mon, May 25, 2009 at 11:02:53AM -0400, Tom Lane wrote:

Robert Haas robertmh...@gmail.com writes: This is all much more complicated than what I proposed, and I fail to see what it buys us.

I'd say that you're just reinforcing the point I made upthread, which is that insisting that XML is the only way to get more detailed information will just create a cottage industry of beating that XML output format into submission. The impression I have is that (to misquote Churchill) XML is the worst option available, except for all the others. We need something that can represent a fairly complex data structure, easily supports addition or removal of particular fields in the structure (including fields not foreseen in the original design), is not hard for programs to parse, and is widely supported --- ie, "not hard" includes you don't have to write your own parser, in most languages. How many realistic alternatives are there?

JSON for one, and it's *much* lighter in just about every way. Cheers, David.

For what it's worth, if this revised output form is destined for consumption by a machine, it really doesn't matter what protocol is used and how 'readable' it is by humans, as long as the protocol can express all the present and anticipated variations of the data without breaking parsers along the way.

While building the Visual Planner tool, I selected XML output for no other reason than it was easy to parse on the receiving end and was hierarchical, making it perfect for representing a plan tree - or thousands. I'm sure other alternatives would have been fine as well. But, once that decision was made, I never had any reason again to look at the XML stream.

If we're worried about the excess 'weight' of XML, I found this to be a non-issue in practice. The output generated by the Visual-Planner-Enabled Postgres server contains MUCH more information than one would typically see with standard EXPLAIN.
The tool returns not only the most-optimal plan, but all discarded plans as well. A four-way join results in output of 24k lines of XML. While it parses nearly instantly, the biggest delay is in the network. And, even this is minimal.

So, why not put ALL interesting data in the EXPLAIN XML feed? I'm not suggesting for this discussion that we include discarded plans, but that we include every piece of data that may be of interest to folks building connecting tools. The parsers can pick and choose what they use easily and, because the feed isn't positional, won't break when additional data is added. A GUC parameter could govern the data included in this variant of EXPLAIN, but even that seems unnecessary. This approach will allow the standard EXPLAIN to evolve in whatever way pleases the humans without interfering with the machines.

Regards, Tom Raney

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
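The non-positional argument above can be sketched quickly. This is a toy consumer, not the actual EXPLAIN XML format under discussion (the tag and attribute names below are invented for illustration): a parser that walks the tree and keeps only the fields it understands will keep working when new data is added to the feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical EXPLAIN XML fragment; element and attribute names are
# invented for illustration, not the actual format under discussion.
plan_xml = """
<explain>
  <plan name="Merge Join" startup-cost="0.52" total-cost="144.24" rows="1000">
    <plan name="Index Scan" total-cost="583.25" rows="1"/>
    <plan name="Index Scan" total-cost="72.25" rows="1000"/>
  </plan>
</explain>
"""

def walk(node, depth=0):
    # Keep only the fields this tool understands; unknown attributes
    # and child elements are simply ignored, so adding data to the
    # feed later cannot break this parser.
    out = [(depth, node.get("name"), float(node.get("total-cost")))]
    for child in node.findall("plan"):
        out.extend(walk(child, depth + 1))
    return out

nodes = walk(ET.fromstring(plan_xml).find("plan"))
```

Because the parser addresses data by name rather than by position, a server that adds (say) discarded-plan elements or new cost attributes leaves old tools untouched.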
Re: [HACKERS] Planner question
Tom Lane wrote:

Tom Raney [EMAIL PROTECTED] writes: Why does the planner consider both input variations of each symmetric merge join? The README says there is not a lot of difference between the two options. When are there any differences?

The righthand side needs to support mark/restore, the left doesn't; so depending on plan types one way might need a helper Materialize node that the other way doesn't. Also, duplicated values are a bit cheaper to process on the left than the right. regards, tom lane

Thank you for the explanation. On a somewhat related issue, I am a bit stumped on the way path keys function. In the following query and debug data, why would an index scan on a single relation contain a path key from a different relation? optimizer/README says, "The PathKeys data structure represents what is known about the sort order of the tuples generated by a particular Path. A path's pathkeys field is a list of PathKey nodes, where the n'th item represents the n'th sort key of the result." Why does the index scan for tenk1 include a path key from onek.unique2? Is it implying an equivalence there?
-Tom Raney

bench=# explain select * from tenk1 JOIN onek ON tenk1.unique2=onek.unique2;

RELOPTINFO (tenk1): rows=1 width=244
    path list:
    SeqScan(tenk1) rows=1 cost=0.00..434.00
    IdxScan(tenk1) rows=1 cost=0.00..583.25
      pathkeys: ((tenk1.unique2, onek.unique2))
    cheapest startup path:
    SeqScan(tenk1) rows=1 cost=0.00..434.00
    cheapest total path:
    SeqScan(tenk1) rows=1 cost=0.00..434.00

RELOPTINFO (onek): rows=1000 width=244
    path list:
    SeqScan(onek) rows=1000 cost=0.00..44.00
    IdxScan(onek) rows=1000 cost=0.00..72.25
      pathkeys: ((tenk1.unique2, onek.unique2))
    cheapest startup path:
    SeqScan(onek) rows=1000 cost=0.00..44.00
    cheapest total path:
    SeqScan(onek) rows=1000 cost=0.00..44.00

RELOPTINFO (tenk1/onek): rows=1000 width=488
    path list:
    MergeJoin(tenk1/onek) rows=1000 cost=0.52..144.24
      clauses: tenk1.unique2 = onek.unique2
      IdxScan(tenk1) rows=1 cost=0.00..583.25
        pathkeys: ((tenk1.unique2, onek.unique2))
      IdxScan(onek) rows=1000 cost=0.00..72.25
        pathkeys: ((tenk1.unique2, onek.unique2))
    NestLoop(tenk1/onek) rows=1000 cost=0.00..1756.96
      clauses: tenk1.unique2 = onek.unique2
      SeqScan(onek) rows=1000 cost=0.00..44.00
      IdxScan(tenk1) rows=1 cost=0.00..1.70
    cheapest startup path:
    NestLoop(tenk1/onek) rows=1000 cost=0.00..1756.96
      clauses: tenk1.unique2 = onek.unique2
      SeqScan(onek) rows=1000 cost=0.00..44.00
      IdxScan(tenk1) rows=1 cost=0.00..1.70
    cheapest total path:
    MergeJoin(tenk1/onek) rows=1000 cost=0.52..144.24
      clauses: tenk1.unique2 = onek.unique2
      IdxScan(tenk1) rows=1 cost=0.00..583.25
        pathkeys: ((tenk1.unique2, onek.unique2))
      IdxScan(onek) rows=1000 cost=0.00..72.25
        pathkeys: ((tenk1.unique2, onek.unique2))

                                QUERY PLAN
--------------------------------------------------------------------------
 Merge Join  (cost=0.52..144.24 rows=1000 width=488)
   Merge Cond: (tenk1.unique2 = onek.unique2)
   ->  Index Scan using tenk1_unique2 on tenk1  (cost=0.00..583.25 rows=1 width=244)
   ->  Index Scan using onek_unique2 on onek  (cost=0.00..72.25 rows=1000 width=244)
(4 rows)

bench=#
Re: [HACKERS] Planner question
Tom Lane wrote:

Tom Raney [EMAIL PROTECTED] writes: Why does the index scan for tenk1 include a path key from onek.unique2? Is it implying an equivalence there? bench=# explain select * from tenk1 JOIN onek ON tenk1.unique2=onek.unique2;

Yes, for an example like that the planner knows that tenk1.unique2 and onek.unique2 will have equal values in any valid join row, so it's okay to suppose that a sort by one is the same as a sort by the other. So the pathkey items actually reference sets of variables {tenk1.unique2, onek.unique2} not just individual variables.

Thanks.

RELOPTINFO (tenk1): rows=1 width=244
    path list:
    SeqScan(tenk1) rows=1 cost=0.00..434.00
    IdxScan(tenk1) rows=1 cost=0.00..583.25
      pathkeys: ((tenk1.unique2, onek.unique2))
    cheapest startup path:
    SeqScan(tenk1) rows=1 cost=0.00..434.00
    cheapest total path:
    SeqScan(tenk1) rows=1 cost=0.00..434.00

Hm, I don't recognize this output format, is it coming from some custom code?

Yes, it is. I thought it was easier to read the OPTIMIZER_DEBUG output with the relation names instead of the relation indexes. I will post a patch against CVS HEAD if you think it will help others.

-Tom
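Tom Lane's explanation above - pathkey items referencing sets of equivalent variables rather than individual columns - can be modeled in a few lines. This is a toy sketch of how join equality clauses merge columns into equivalence sets, not the planner's actual EquivalenceClass machinery:

```python
# Toy model: each join equality clause merges its two columns into one
# equivalence set; a sort on any member of a set satisfies a pathkey
# over that whole set.
def build_equivalence_classes(join_clauses):
    classes = []
    for a, b in join_clauses:
        merged = {a, b}
        keep = []
        for cls in classes:
            if cls & merged:
                merged |= cls      # overlapping sets collapse together
            else:
                keep.append(cls)
        classes = keep + [merged]
    return classes

classes = build_equivalence_classes([("tenk1.unique2", "onek.unique2")])
# The index scan on tenk1_unique2 is ordered by tenk1.unique2, which is
# enough to claim the pathkey for the whole class -- including
# onek.unique2 -- since the two are equal in any valid join row.
```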
[HACKERS] Planner question
Why does the planner consider both input variations of each symmetric merge join? The README says there is not a lot of difference between the two options. When are there any differences? -Tom Raney
Re: [HACKERS] Explain XML patch submitted
Bruce Momjian wrote: Where are we on this patch?

I think I submitted the patch before its time. The project opened a big can of worms with the XML output. I'd like to rework it with some of the comments I received. The work there was some prep-work for the planner visualizer I've been working on. I haven't had time to revisit this initial work. -Tom Raney

--- [EMAIL PROTECTED] wrote:

I just posted a patch addressing the TODO item: Allow EXPLAIN output to be more easily processed by scripts, perhaps XML

This is a modified patch originally submitted by Germán Poó Caamaño last year. I added the DTD and some other tweaks. I did *not* delve much into the ecpg code, other than mildly modifying preproc.y by adding the XML and DTD defines. I'm sure more work is required there. And, I did not include Init Plan and Sub Plan in the XML output, which did not fit into the XML in a graceful way. But, that can also be revisited. Regards, Tom Raney
[HACKERS] Planner question
I've been working on a client application (based on the Red Hat Visual Explain tool) to display all plans the planner considers graphically and it does that. But, the trace functionality in the planner is always on (and thus, taking up cycles and resources) whether or not it is requested by the user. My question is: How would I let the planner know when a planner session has been invoked by the explain command? If I can slip a flag into PlannerInfo or PlannerGlobal, that would be perfect. But, I'm a bit stuck on how to get explain context to that point. I don't want to modify the planner() entry function parameter list, unless absolutely necessary. I currently compile with DEBUG_OPTIMIZER - and that is one option - to conditionally compile this functionality, but it would be great if this could run on a lean production system. -Tom Raney
Re: [HACKERS] [PATCHES] Explain XML patch v2
Quoting daveg [EMAIL PROTECTED]:

On Wed, Jul 02, 2008 at 09:01:18AM -0700, David Fetter wrote: On Wed, Jul 02, 2008 at 05:57:29PM +0200, Peter Eisentraut wrote: It would also be interesting if EXPLAIN could optionally be a function that returns a datum of type XML, to allow further processing. It would be better to have a function which allows people to plug in their own serialization. A JSON or YAML one, for example, would be much lighter weight on both ends. +1 for either of these. -dg

So, this leads me to the idea of assembling the EXPLAIN data internally in an output-neutral data structure. At the very end of processing, one decision statement would decide which output plugin to use for output. Sprinkling XML print statements throughout the code (as currently done in the patch), while functional, is not ideal. And, the escaping of XML content should ideally be done in the serializer anyway. Of course, this realization didn't occur to me until *after* I had spent a bit of time coding up the patch in its current form. Oh well. Thoughts?

Regarding the XML datum, in order to support that, will all users need to compile with libxml? Are there any lighter weight solutions to serialize other than libxml? -Tom Raney
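The output-neutral structure plus pluggable serializer idea might look roughly like this sketch. All names and fields below are invented for illustration (the real work would be C inside the backend); the point is the shape: one neutral structure, one dispatch, escaping owned by each serializer.

```python
import json
import xml.etree.ElementTree as ET

# Neutral plan structure; field names are invented for illustration.
def explain_to_neutral():
    return {"node": "Seq Scan", "relation": "tenk1",
            "cost": {"startup": 0.00, "total": 445.00}}

def serialize_json(plan):
    return json.dumps(plan)

def serialize_xml(plan):
    # Escaping is the serializer's job here -- ElementTree escapes
    # attribute values for us, so no XML escaping leaks into the
    # plan-building code.
    el = ET.Element("plan", name=plan["node"])
    ET.SubElement(el, "table", name=plan["relation"])
    ET.SubElement(el, "cost", {k: f"{v:.2f}" for k, v in plan["cost"].items()})
    return ET.tostring(el, encoding="unicode")

# One decision point picks the output plugin at the very end.
SERIALIZERS = {"json": serialize_json, "xml": serialize_xml}

plan = explain_to_neutral()
xml_out = SERIALIZERS["xml"](plan)
json_out = SERIALIZERS["json"](plan)
```

Adding a YAML or any other format then means adding one entry to the dispatch table, with no changes to the code that assembles the plan data.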
Re: [HACKERS] Plan targetlists in EXPLAIN output
I have been working on a project (for GSOC) to retrieve planner/optimizer details. As part of the project, I need machine-parsable output. So, I thought I would dust off a patch I found from last year that Germán Caamaño submitted. I didn't see any further activity there, so I integrated it into 8.4 and added a DTD. The output below is generated by using the added flag 'XML' to the EXPLAIN command. The DTD probably wouldn't be needed for every output instance and may need its own flag. I am coming up to speed on the planner internals, but it seems like this first EXPLAIN XML concept may have some use. Are there any strong opinions about the XML hierarchy? Is it enough to simply wrap the text output from EXPLAIN with XML tags? -Tom Raney

                     QUERY PLAN
-----------------------------------------------------
 <?xml version="1.0"?>
 <!DOCTYPE explain [
   <!ELEMENT explain (plan+)>
   <!ELEMENT plan (table?, cost, qualifier?)>
   <!ELEMENT table EMPTY>
   <!ELEMENT cost EMPTY>
   <!ELEMENT qualifier EMPTY>
   <!ATTLIST explain version CDATA #REQUIRED>
   <!ATTLIST plan name CDATA #REQUIRED level CDATA #REQUIRED>
   <!ATTLIST cost startup CDATA #REQUIRED total CDATA #REQUIRED
             rows CDATA #REQUIRED width CDATA #REQUIRED>
   <!ATTLIST table name CDATA #REQUIRED>
   <!ATTLIST qualifier type CDATA #REQUIRED value CDATA #REQUIRED>
 ]>
 <explain version="8.4devel">
   <plan name="Seq Scan" level="0">
     <table name="tenk1"/>
     <cost startup="0.00" total="445.00" rows="1" width="244"/>
   </plan>
 </explain>
(32 rows)

Greg Smith wrote: On Thu, 17 Apr 2008, Tom Lane wrote: For debugging the planner work I'm about to do, I'm expecting it will be useful to be able to get EXPLAIN to print the targetlist of each plan node, not just the quals (conditions) as it's historically done.

I've heard that some of the academic users of PostgreSQL were hoping to add features in this area in order to allow better using planner internals for educational purposes. It would be nice if that were available for such purposes without having to recompile.
-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
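One consequence of the flat level="N" encoding in the DTD posted above is that consumers must rebuild the plan tree themselves. A toy sketch of that reconstruction (plan names here are illustrative), which is also an argument for emitting nested plan elements directly instead of wrapping the indented text output:

```python
import xml.etree.ElementTree as ET

# Flat (name, level) pairs as a consumer would read them out of the
# level="N" encoding; the plan names are illustrative.
flat = [("Merge Join", 0), ("Index Scan", 1), ("Index Scan", 1)]

def nest(flat_plans):
    root = ET.Element("explain", version="8.4devel")
    stack = [(root, -1)]          # (element, level) of open ancestors
    for name, level in flat_plans:
        while stack[-1][1] >= level:
            stack.pop()           # close finished siblings and subtrees
        el = ET.SubElement(stack[-1][0], "plan", name=name, level=str(level))
        stack.append((el, level))
    return root

tree = nest(flat)
```

If the server emitted the nesting itself, every consumer would be spared this bookkeeping and the hierarchy would be validated by the parser for free.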
[HACKERS] Planner/optimizer tool
I'm scoping a project to display the choices and plans the planner/optimizer considers while PostgreSQL develops a plan. It would be primarily a teaching tool but may be of use to users curious about the inner workings of the planner. This is in contrast with EXPLAIN, which provides information about the *final* plan selected by the planner. This tool would show details about the access methods available at each step and the associated access costs, for example. There is a TODO item: Allow EXPLAIN output to be more easily processed by scripts, perhaps XML which could be a good first step for this project. Is anyone working on a related project? -Tom Raney
Re: [HACKERS] Hash index todo list item
Kenneth,

I just pushed the revised patch (v2!). The revised approach samples the parent relation to estimate the number of tuples rather than performing a complete scan. In my tests, the estimate appears to be accurate, erring on the larger side, which is fine.

Tom, That is great. I am looking forward to your patch. After the issues that you needed to address, I think that it would be reasonable to add a few more user settings for the hash index. Fill-factor is too coarse a knob. The others that I have been considering are: maxtuples - Not really the maximum, but a target value to use for setting up the initial buckets. This would allow you to set it for data loads and avoid the split-n-copy trauma that you are trying to avoid with your new hash build process.

If I understand you correctly, I believe we already do this with our current build process; there should not be any splits of the index if we estimated the tuple count correctly. However, what gets you is collisions where lots of overflow pages occur when distinct keys map to the same bucket, or if you have lots of duplicate keys. Because your overall tuple count doesn't exceed the fill factor, no splits occur, but lengthy bucket chains lead to lots of IOs. You touch on this below.

multiplicity - Try to capture use cases that would require many overflow pages. In particular, if we discard the normal index page layout we can skip the space overhead of the page pointer and generate a more compact index. Then you could use a few more hash bits to lookup the index entry in the page. How many bits would be determined by this factor. 8-bits would give you 256 sub-pieces that could each hold about 3 entries using the current 4-byte hash, or 2 entries using an 8-byte hash. What do you think?

Yes, this is a good direction. If we can increase the number of buckets and reduce the bucket size (either physically or virtually) to allow more direct access without creating a huge index on disk, that would be ideal.
But, then if you do have collisions, overflows occur more frequently. I spoke with Neil Conway yesterday at the PG conference here in Portland and he piqued my interest in examining his hash code more closely to see what he has already done in this area. -Tom

Cheers, Ken

On Wed, Oct 17, 2007 at 03:31:58PM -0700, Tom Raney wrote:

Kenneth, Great! Yes, we did update the code to use the estimate. I will post the patch with this update. We only saw a very small difference in index build time, but you may when you add many columns to the base relation.

With 1 billion tuples, you should start to see the hash index outperform the btree for some equality probes, I would imagine. With a 90% fill factor, the btree would require 4 levels to index that many tuples. If the top two were in memory, there would be 3 IOs needed. I don't think PG supports index only scans, so it will take the extra IO to probe the base relation. The hash may take up to 2 IOs and maybe even less (or maybe more depending on how many overflow buckets there are). It might be interesting to fiddle with the fill factors of each index - hash pages (buckets) default to 75% full. -Tom

Tom, I am getting ready to stitch in an updated, simplified version of Neil Conway's hash-only hash index patch. Did you have a chance to update your sizing function to use the planner-like estimate and not a full table scan? I would like to be able to use that when my test tables start to have 10^9 entries. If you have not had a chance, I will try and add it myself. Regards, Ken

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
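The sizing logic under discussion - estimate the tuple count, then pick enough initial buckets to stay under the fill factor and avoid splits during the build - can be sketched like this. The constants (tuples per bucket, the 75% fill factor mentioned above) are illustrative; the real code sizes in pages, not tuples:

```python
import math

# Illustrative sizing: given an estimated tuple count, choose enough
# initial buckets that the expected fill stays under the fill factor,
# so no bucket splits (and split-n-copy work) happen during the build.
def initial_buckets(est_tuples, tuples_per_bucket=300, fill_factor=0.75):
    need = est_tuples / (tuples_per_bucket * fill_factor)
    if need <= 1:
        return 1
    # Hash indexes grow by doubling, so round up to a power of two.
    return 1 << math.ceil(math.log2(need))

# Sizing up front for the 12-million-row example from the thread:
buckets = initial_buckets(12_000_000)
```

Note this only prevents splits from sheer volume; as the email says, heavily duplicated or colliding keys still produce overflow chains, because they pile into the same bucket regardless of how many buckets exist.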
Re: [HACKERS] Hash index todo list item
Kenneth Marshall wrote: On Tue, Sep 25, 2007 at 03:35:47PM +0100, Gregory Stark wrote: Kenneth Marshall [EMAIL PROTECTED] writes: On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote: Using our implementation, build times and index sizes are comparable with btree index build times and index sizes. ... That is super! (and timely)

It is pretty super. I have a few comments to raise but don't take it to be too negative, it sounds like this is a big step forward towards making hash indexes valuable.

Firstly in the graphs it seems the index size graph has an exponential x-axis but a linear y-axis. This makes it hard to see what I would expect to be pretty much linear growth. The curves look exponential which would mean linear growth but of course it's hard to tell. Also, the growth in the time chart looks pretty much linear. That seems weird since I would expect there would be a visible extra component since sort times are n-log(n). Perhaps you need to test still larger tables to see that though. In any case it's clear from the results you have there that the change is a positive one and fixes a fundamental problem with the hash index build code.

Something else you should perhaps test is indexing something which is substantially larger than the hash function output. A hash function is going to make the most sense when indexing something like strings for which you want to avoid the long comparison costs. Especially consider testing this on a UTF8 locale with expensive comparisons (like a CJK locale for example).

Note that the bottom line for the problems with hash indexes is that the current implementation doesn't offer any advantages over btree indexes. Hash indexes need to be not only as good a choice as btree indexes but a significantly better choice, at least for some specific circumstances.

Also, if you're going to submit a patch you should check out a copy of the CVS HEAD and work from that.
I don't think there are huge differences in the area of hash indexes though. But in most other areas you would be spending quite a bit of time dealing with details which have changed since.

Finally note that we're in the final throes of the 8.3 feature freeze. Normally any patch submitted now would be held until 8.3 is released and development on 8.4 is started. I could imagine an exception being made for hash indexes since they're so moribund currently but probably not. The flip side of that argument is that there's not much point in making an exception for something which will only be really useful once further work is done in the same area.

Although I am very excited about this patch, I do not see any real value in including it in 8.3. As you mention, we need to have a hash index implementation that outperforms btree in some problem regime and that is currently not the case. I have just recently started the process of gathering ideas and having discussions on various approaches to making hash indexes more performant and we have a number of ideas on which to start our investigation. I do think that this patch will make the testing and evaluation, that will be needed to truly improve the hash index, much much easier. Regards, Ken

We're glad to contribute and be a part of Postgres. The patch has been posted to [EMAIL PROTECTED] Speeding up the creation time of hash indexes on non-trivial relations was our goal. This will allow some interesting performance tests of the hash index on very large relations. It may be that the near constant lookup time of the hash index outperforms the Btree index for some large data sets and for certain types of data and distributions. Sincerely, Tom Raney
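Gregory Stark's point above about indexing values substantially larger than the hash output can be illustrated with a toy probe path: the index stores only a fixed-size hash per key, so probes compare small integers, and only hash matches pay for the full (possibly very long, possibly expensive-locale) string comparison against the heap. This is a sketch, not the PostgreSQL implementation:

```python
import hashlib

# hash32: a stand-in for the index's 4-byte hash of the key.
def hash32(key):
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:4], "big")

# Build: the index maps hash -> row ids; the long keys themselves stay
# in the heap (the "rows" list here).
rows = ["x" * 1000 + str(i) for i in range(100)]   # long string keys
index = {}
for rid, key in enumerate(rows):
    index.setdefault(hash32(key), []).append(rid)

def probe(key):
    # Cheap fixed-size comparison first; only hash matches pay for the
    # full string comparison against the heap tuple, which also weeds
    # out any hash collisions.
    return [rid for rid in index.get(hash32(key), []) if rows[rid] == key]
```

A btree, by contrast, must perform full key comparisons at every level of its descent, which is exactly where expensive collations hurt.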
Re: [HACKERS] Hash index todo list item
We are pleased to announce an upcoming patch to the hash index code which improves build time and index size, based on this item in the TODO list: During index creation, pre-sort the tuples to improve build speed http://archives.postgresql.org/pgsql-hackers/2007-03/msg01199.php

Using our implementation, build times and index sizes are comparable with btree index build times and index sizes. For example, for a particular 12 million row relation, the 8.2.4 release requires over 2.8 hours to build the hash index. With our patch, the index is built in 80 seconds. Here is the link for a graph, showing a comparative analysis of btree and hash index builds and describing the benchmark data. http://web.cecs.pdx.edu/~raneyt/pg/ We are currently cleaning up the patch and will submit it asap. Regards, Shreya Bhargava [EMAIL PROTECTED] Tom Raney [EMAIL PROTECTED]

Kenneth Marshall wrote:

Dear PostgreSQL Hackers: After following the hackers mailing list for quite a while, I am going to start investigating what will need to be done to improve hash index performance. Below are the pieces of this project that I am currently considering:

1. Characterize the current hash index implementation against the BTree index, with a focus on space utilization and lookup performance against a collection of test data. This will give a baseline performance test to evaluate the impact of changes. I initially do not plan to bench the hash creation process since my initial focus will be on lookup performance.

2. Evaluate the performance of different hash index implementations and/or changes to the current implementation. My current plan is to keep the implementation as simple as possible and still provide the desired performance. Several hash index suggestions deal with changing the layout of the keys on a page to improve lookup performance, including reducing the bucket size to a fraction of a page or only storing the hash value on the page, instead of the index value itself.
My goal in this phase is to produce one or more versions with better performance than the current BTree.

3. Look at build time and concurrency issues with the addition of some additional tests to the test bed. (1)

4. Repeat as needed.

This is the rough plan. Does anyone see anything critical that is missing at this point? Please send me any suggestions for test data and various performance test ideas, since I will be working on that first. Regards, Ken Marshall
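The TODO item the announcement above addresses - pre-sorting tuples during index creation - amounts to something like this toy sketch: hash every key, sort entries by destination bucket, and fill buckets strictly left to right instead of inserting in random order (which is what makes unsorted builds so slow on relations larger than cache):

```python
# Toy sketch of a pre-sorted hash index build; this is an illustration
# of the technique, not the patch's actual C implementation.
def presorted_build(keys, nbuckets):
    # Hash each key, tag it with its destination bucket, and sort, so
    # every bucket is filled completely before moving to the next one.
    entries = sorted((hash(k) % nbuckets, hash(k), k) for k in keys)
    buckets = [[] for _ in range(nbuckets)]
    for bucket_no, h, key in entries:
        buckets[bucket_no].append((h, key))
    return buckets

buckets = presorted_build([f"key{i}" for i in range(1000)], nbuckets=16)
```

Sequential bucket fills turn the random I/O of an insertion-order build into a single ordered pass, which is consistent with the 2.8-hours-to-80-seconds improvement reported above.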