Re: [HACKERS] generic options for explain

2009-05-25 Thread Tom Raney

David Fetter wrote:

On Mon, May 25, 2009 at 11:02:53AM -0400, Tom Lane wrote:
  

Robert Haas robertmh...@gmail.com writes:


This is all much more complicated than what I proposed, and I fail
to see what it buys us.  I'd say that you're just reinforcing the
point I made upthread, which is that insisting that XML is the
only way to get more detailed information will just create a
cottage industry of beating that XML output format into
submission.
  

The impression I have is that (to misquote Churchill) XML is the
worst option available, except for all the others.  We need
something that can represent a fairly complex data structure, easily
supports addition or removal of particular fields in the structure
(including fields not foreseen in the original design), is not hard
for programs to parse, and is widely supported --- ie, "not hard"
includes "you don't have to write your own parser", in most
languages.  How many realistic alternatives are there?



JSON for one, and it's *much* lighter in just about every way.

Cheers,
David.
  


For what it's worth, if this revised output form is destined for 
consumption by a machine, it really doesn't matter what protocol is used 
and how 'readable' it is by humans, as long as the protocol can express 
all the present and anticipated variations of the data without breaking 
parsers along the way.


While building the Visual Planner tool, I selected XML output for no 
other reason than it was easy to parse on the receiving end and was 
hierarchical, making it perfect for representing a plan tree - or 
thousands.  I'm sure other alternatives would have been fine as well.  
But, once that decision was made, I never had any reason again to look 
at the XML stream. 

If we're worried about the excess 'weight' of XML, I found this to be a
non-issue in practice.  The output generated by the
Visual-Planner-enabled Postgres server contains MUCH more information
than one would typically see with standard EXPLAIN.  The tool returns
not only the optimal plan, but all discarded plans as well.  A four-way
join results in output of 24k lines of XML.  While it parses nearly
instantly, the biggest delay is in the network, and even that is minimal.


So, why not put ALL interesting data in the EXPLAIN XML feed?  I'm not
suggesting for this discussion that we include discarded plans, but that
we include every piece of data that may be of interest to folks building
connecting tools.  The parsers can pick and choose what they use easily
and, because the feed isn't positional, won't break when additional data
is added.  A GUC parameter could govern the data included in this
variant of EXPLAIN, but even that seems unnecessary.  This approach will
allow the standard EXPLAIN to evolve in whatever way pleases the humans
without interfering with the machines.
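
To make that concrete, here is a minimal sketch of such a consumer,
assuming libxml2 and the element and attribute names from the DTD I
floated in an earlier patch ("plan", "name"); a real tool would of
course track whatever the server finally emits:

/* Minimal EXPLAIN-XML consumer sketch (assumes libxml2; element and
 * attribute names follow my earlier DTD and are illustrative only). */
#include <stdio.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

static void
walk_plan(xmlNodePtr node, int depth)
{
    xmlNodePtr  child;

    for (child = node->children; child != NULL; child = child->next)
    {
        if (child->type != XML_ELEMENT_NODE)
            continue;
        if (xmlStrcmp(child->name, (const xmlChar *) "plan") == 0)
        {
            xmlChar    *name = xmlGetProp(child, (const xmlChar *) "name");

            printf("%*s%s\n", depth * 2, "", name ? (char *) name : "?");
            xmlFree(name);
            walk_plan(child, depth + 1);        /* recurse into sub-plans */
        }
        /* Unknown elements are simply skipped, so newly added data
         * cannot break this parser -- the feed isn't positional. */
    }
}

int
main(void)
{
    xmlDocPtr   doc = xmlReadFile("explain.xml", NULL, 0);

    if (doc != NULL)
    {
        xmlNodePtr  root = xmlDocGetRootElement(doc);

        if (root != NULL)
            walk_plan(root, 0);
        xmlFreeDoc(doc);
    }
    return 0;
}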


Regards,

Tom Raney







Re: [HACKERS] Planner question

2008-09-10 Thread Tom Raney

Tom Lane wrote:

Tom Raney [EMAIL PROTECTED] writes:

Why does the planner consider both input variations of each symmetric merge join?  The 
README says there is not a lot of difference between the two options.  When 
are there any differences?


The righthand side needs to support mark/restore, the left doesn't;
so depending on plan types one way might need a helper Materialize
node that the other way doesn't.  Also, duplicated values are a bit
cheaper to process on the left than the right.

regards, tom lane



Thank you for the explanation.

On a somewhat related issue, I am a bit stumped on the way path keys 
function.


In the following query and debug data, why would an index scan on a 
single relation contain a path key from a different relation?


optimizer/README says, "The PathKeys data structure represents what is
known about the sort order of the tuples generated by a particular
Path.  A path's pathkeys field is a list of PathKey nodes, where the
n'th item represents the n'th sort key of the result."
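
(For reference, the PathKey node itself looks roughly like this --
abridged from src/include/nodes/relation.h, with my own comments:)

typedef struct PathKey
{
    NodeTag     type;

    EquivalenceClass *pk_eclass;    /* the sorted value: a *set* of
                                     * expressions known to be equal,
                                     * not a single column */
    Oid         pk_opfamily;        /* btree opfamily defining the ordering */
    int         pk_strategy;        /* sort direction (ASC or DESC) */
    bool        pk_nulls_first;     /* do NULLs come before normal values? */
} PathKey;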


Why does the index scan for tenk1 include a path key from 
onek.unique2?  Is it implying an equivalence there?


-Tom Raney


bench=# explain select * from tenk1 JOIN onek ON 
tenk1.unique2=onek.unique2;


RELOPTINFO (tenk1): rows=1 width=244
path list:
SeqScan(tenk1) rows=1 cost=0.00..434.00
IdxScan(tenk1) rows=1 cost=0.00..583.25
  pathkeys: ((tenk1.unique2, onek.unique2))  <---

cheapest startup path:
SeqScan(tenk1) rows=1 cost=0.00..434.00

cheapest total path:
SeqScan(tenk1) rows=1 cost=0.00..434.00

RELOPTINFO (onek): rows=1000 width=244
path list:
SeqScan(onek) rows=1000 cost=0.00..44.00
IdxScan(onek) rows=1000 cost=0.00..72.25
  pathkeys: ((tenk1.unique2, onek.unique2))

cheapest startup path:
SeqScan(onek) rows=1000 cost=0.00..44.00

cheapest total path:
SeqScan(onek) rows=1000 cost=0.00..44.00

RELOPTINFO (tenk1/onek): rows=1000 width=488
path list:
MergeJoin(tenk1/onek) rows=1000 cost=0.52..144.24
  clauses: tenk1.unique2 = onek.unique2
IdxScan(tenk1) rows=1 cost=0.00..583.25
  pathkeys: ((tenk1.unique2, onek.unique2))
IdxScan(onek) rows=1000 cost=0.00..72.25
  pathkeys: ((tenk1.unique2, onek.unique2))
NestLoop(tenk1/onek) rows=1000 cost=0.00..1756.96
  clauses: tenk1.unique2 = onek.unique2
SeqScan(onek) rows=1000 cost=0.00..44.00
IdxScan(tenk1) rows=1 cost=0.00..1.70

cheapest startup path:
NestLoop(tenk1/onek) rows=1000 cost=0.00..1756.96
  clauses: tenk1.unique2 = onek.unique2
SeqScan(onek) rows=1000 cost=0.00..44.00
IdxScan(tenk1) rows=1 cost=0.00..1.70

cheapest total path:
MergeJoin(tenk1/onek) rows=1000 cost=0.52..144.24
  clauses: tenk1.unique2 = onek.unique2
IdxScan(tenk1) rows=1 cost=0.00..583.25
  pathkeys: ((tenk1.unique2, onek.unique2))
IdxScan(onek) rows=1000 cost=0.00..72.25
  pathkeys: ((tenk1.unique2, onek.unique2))

                              QUERY PLAN
------------------------------------------------------------------------
 Merge Join  (cost=0.52..144.24 rows=1000 width=488)
   Merge Cond: (tenk1.unique2 = onek.unique2)
   ->  Index Scan using tenk1_unique2 on tenk1  (cost=0.00..583.25
rows=1 width=244)
   ->  Index Scan using onek_unique2 on onek  (cost=0.00..72.25 rows=1000
width=244)
(4 rows)

bench=#



Re: [HACKERS] Planner question

2008-09-10 Thread Tom Raney

Tom Lane wrote:

Tom Raney [EMAIL PROTECTED] writes:
Why does the index scan for tenk1 include a path key from 
onek.unique2?  Is it implying an equivalence there?


bench=# explain select * from tenk1 JOIN onek ON 
tenk1.unique2=onek.unique2;


Yes, for an example like that the planner knows that tenk1.unique2 and
onek.unique2 will have equal values in any valid join row, so it's okay
to suppose that a sort by one is the same as a sort by the other.  So
the pathkey items actually reference sets of variables
{tenk1.unique2, onek.unique2}
not just individual variables.


Thanks.




RELOPTINFO (tenk1): rows=1 width=244
 path list:
 SeqScan(tenk1) rows=1 cost=0.00..434.00
 IdxScan(tenk1) rows=1 cost=0.00..583.25
   pathkeys: ((tenk1.unique2, onek.unique2))  <---



 cheapest startup path:
 SeqScan(tenk1) rows=1 cost=0.00..434.00



 cheapest total path:
 SeqScan(tenk1) rows=1 cost=0.00..434.00


Hm, I don't recognize this output format, is it coming from some custom
code?


Yes, it is.  I thought it was easier to read the OPTIMIZER_DEBUG 
output with the relation names instead of the relation indexes.  I 
will post a patch against CVS HEAD if you think it will help others.
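
The substance of the change is small.  Roughly (an untested sketch,
standing in for the existing print_relids() helper in allpaths.c's
OPTIMIZER_DEBUG code):

/* Untested sketch: print alias names instead of range-table indexes. */
static void
print_relation_names(PlannerInfo *root, Relids relids)
{
    Relids      tmprelids = bms_copy(relids);   /* bms_first_member is
                                                 * destructive, so copy */
    int         x;
    bool        first = true;

    while ((x = bms_first_member(tmprelids)) >= 0)
    {
        RangeTblEntry *rte = rt_fetch(x, root->parse->rtable);

        printf("%s%s", first ? "" : "/", rte->eref->aliasname);
        first = false;
    }
    bms_free(tmprelids);
}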


-Tom



[HACKERS] Planner question

2008-09-05 Thread Tom Raney

Why does the planner consider both input variations of each symmetric merge join?  The 
README says there is not a lot of difference between the two options.  When 
are there any differences?

-Tom Raney



Re: [HACKERS] Explain XML patch submitted

2008-08-15 Thread Tom Raney

Bruce Momjian wrote:

Where are we on this patch?
  


I think I submitted the patch before its time.  The project opened a big 
can of worms with the XML output.  I'd like to rework it with some of 
the comments I received.


The work there was some prep-work for the planner visualizer I've been 
working on.  I haven't had time to revisit this initial work.


-Tom Raney


---

[EMAIL PROTECTED] wrote:
  

I just posted a patch addressing the TODO item:

"Allow EXPLAIN output to be more easily processed by scripts, perhaps XML"

This is a modified patch originally submitted by Germán Poó Caamaño
last year.  I added the DTD and some other tweaks.


I did *not* delve much into the ecpg code, other than mildly modifying
preproc.y by adding the XML and DTD defines.  I'm sure more work is
required there.


And, I did not include "Init Plan" and "Sub Plan" in the XML output,
which did not fit into the XML in a graceful way.  But, that can also
be revisited.


Regards,

Tom Raney






[HACKERS] Planner question

2008-08-12 Thread Tom Raney
I've been working on a client application (based on the Red Hat Visual
Explain tool) to graphically display all the plans the planner considers,
and it does that.  But the trace functionality in the planner is always on
(and thus taking up cycles and resources) whether or not the user has
requested it.

My question is:  How would I let the planner know when a planner session 
has been invoked by the explain command?  If I can slip a flag into 
PlannerInfo or PlannerGlobal, that would be perfect.  But, I'm a bit 
stuck on how to get explain context to that point.  I don't want to 
modify the planner() entry function parameter list, unless absolutely 
necessary. 
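
One possibility I'm weighing is a module-level flag rather than touching
planner()'s signature at all.  A rough, untested sketch -- the flag name
is invented for illustration:

/* In commands/explain.c -- "explain_in_progress" is an invented name. */
bool        explain_in_progress = false;

/* ...wrapped around the existing call that hands the query to the planner: */
explain_in_progress = true;
PG_TRY();
{
    plan = pg_plan_query(query, 0, params);
}
PG_CATCH();
{
    explain_in_progress = false;    /* don't leave the flag set after an error */
    PG_RE_THROW();
}
PG_END_TRY();
explain_in_progress = false;

/* Anywhere in the planner, the trace code can then test the flag cheaply:
 *     if (explain_in_progress)
 *         trace_paths(root);       -- trace_paths() is hypothetical
 */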

I currently compile with OPTIMIZER_DEBUG - and conditionally compiling
this functionality is one option - but it would be great if it
could run on a lean production system.


-Tom Raney






Re: [HACKERS] [PATCHES] Explain XML patch v2

2008-07-04 Thread Tom Raney

Quoting daveg [EMAIL PROTECTED]:


On Wed, Jul 02, 2008 at 09:01:18AM -0700, David Fetter wrote:

On Wed, Jul 02, 2008 at 05:57:29PM +0200, Peter Eisentraut wrote:
 It would also be interesting if EXPLAIN could optionally be a
 function that returns a datum of type XML, to allow further
 processing.

It would be better to have a function which allows people to plug in
their own serialization.  A JSON or YAML one, for example, would be
much lighter weight on both ends.


+1 for either of these.

-dg



So, this leads me to the idea of assembling the EXPLAIN data
internally in an output-neutral data structure.  At the very end of
processing, a single decision point would choose which output plugin to
use.  Sprinkling XML print statements throughout the code (as currently
done in the patch), while functional, is not ideal.  And the escaping of
XML content should ideally be done in the serializer anyway.
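
Something like the following is what I have in mind; every name here is
invented for illustration, nothing of this shape exists in the tree today:

/* Output-neutral EXPLAIN tree, built once during EXPLAIN processing. */
typedef struct ExplainNode
{
    const char *node_type;      /* "Seq Scan", "Merge Join", ... */
    List       *properties;     /* key/value pairs (costs, quals, ...) */
    List       *children;       /* child ExplainNode entries */
} ExplainNode;

/* Each output plugin supplies one of these; escaping (XML entities,
 * JSON string rules) lives entirely inside the plugin. */
typedef struct ExplainSerializer
{
    void        (*begin)(StringInfo buf);
    void        (*emit)(StringInfo buf, ExplainNode *node, int depth);
    void        (*end)(StringInfo buf);
} ExplainSerializer;

extern const ExplainSerializer text_serializer, xml_serializer, json_serializer;

typedef enum ExplainOutputFormat
{
    EXPLAIN_FMT_TEXT, EXPLAIN_FMT_XML, EXPLAIN_FMT_JSON
} ExplainOutputFormat;

/* The single decision statement, at the very end of processing: */
static const ExplainSerializer *
choose_serializer(ExplainOutputFormat fmt)
{
    if (fmt == EXPLAIN_FMT_XML)
        return &xml_serializer;
    if (fmt == EXPLAIN_FMT_JSON)
        return &json_serializer;
    return &text_serializer;
}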


Of course, this realization didn't occur to me until *after* I had  
spent a bit of time coding up the patch in its current form.  Oh well.


Thoughts?

Regarding the XML datum, in order to support that, will all users need  
to compile with libxml?  Are there any lighter weight solutions to  
serialize other than libxml?


-Tom Raney






Re: [HACKERS] Plan targetlists in EXPLAIN output

2008-06-19 Thread Tom Raney
I have been working on a project (for GSOC) to retrieve 
planner/optimizer details.  As part of the project, I need machine 
parsable output.  So, I thought I would dust off a patch I found from 
last year that Germán Caamaño submitted.  I didn't see any further 
activity there so I integrated it into 8.4 and added a DTD.


The output below is generated by using the added flag 'XML' to the 
EXPLAIN command.  The DTD probably wouldn't be needed for every output 
instance and may need its own flag.


I am coming up to speed on the planner internals, but it seems like this 
first EXPLAIN XML concept may have some use.  Are there any strong 
opinions about the XML hierarchy?  Is it enough to simply wrap the text 
output from EXPLAIN with XML tags?


-Tom Raney



QUERY PLAN
---------------------------------------------------------------
<?xml version="1.0"?>

<!DOCTYPE explain
[
<!ELEMENT explain (plan+) >
<!ELEMENT plan (table?, cost, qualifier?) >
<!ELEMENT table EMPTY >
<!ELEMENT cost EMPTY >
<!ELEMENT qualifier EMPTY >
<!ATTLIST explain
   version CDATA  #REQUIRED >
<!ATTLIST plan
   name CDATA #REQUIRED
   level CDATA    #REQUIRED >
<!ATTLIST cost
   startup CDATA  #REQUIRED
   total CDATA    #REQUIRED
   rows CDATA     #REQUIRED
   width CDATA    #REQUIRED >
<!ATTLIST table
   name CDATA #REQUIRED >
<!ATTLIST qualifier
   type CDATA #REQUIRED
   value CDATA #REQUIRED >
]>

<explain version="8.4devel">
<plan name="Seq Scan" level="0">
  <table name="tenk1"/>
  <cost startup="0.00" total="445.00" rows="1" width="244" />
</plan>
</explain>
(32 rows)



Greg Smith wrote:

On Thu, 17 Apr 2008, Tom Lane wrote:


For debugging the planner work I'm about to do, I'm expecting it will be
useful to be able to get EXPLAIN to print the targetlist of each plan
node, not just the quals (conditions) as it's historically done.


I've heard that some of the academic users of PostgreSQL were hoping 
to add features in this area in order to allow better using planner 
internals for educational purposes.  It would be nice if that were 
available for such purposes without having to recompile.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD






[HACKERS] Planner/optimizer tool

2008-03-26 Thread Tom Raney
I'm scoping a project to display the choices and plans the 
planner/optimizer considers while PostgreSQL develops a plan.  It would 
be primarily a teaching tool but may be of use to users curious about 
the inner workings of the planner. 

This is in contrast with EXPLAIN, which provides information about the 
*final* plan selected by the planner.   This tool would show details 
about the access methods available at each step and the associated 
access costs, for example.


There is a TODO item, "Allow EXPLAIN output to be more easily processed
by scripts, perhaps XML", which could be a good first step for this
project.


Is anyone working on a related project?

-Tom Raney





Re: [HACKERS] Hash index todo list item

2007-10-21 Thread Tom Raney
Kenneth, I just pushed the revised patch (v2!).  The revised approach 
samples the parent relation to estimate the number of tuples rather than 
performing a complete scan.  In my tests, the estimate appears to be 
accurate, erring on the larger side, which is fine.
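
Concretely, the estimate amounts to very little code, since the planner
already has a helper for exactly this (a sketch; heap and ffactor come
from the surrounding hash build code):

/* Size the initial bucket array from a planner-style estimate instead of
 * scanning the heap.  estimate_rel_size() (optimizer/util/plancat.c)
 * combines the relation's current size in blocks with its last-known
 * tuple density, so no heap scan is needed. */
BlockNumber relpages;
double      reltuples;
double      dnumbuckets;

estimate_rel_size(heap, NULL, &relpages, &reltuples);

/* ffactor is the target tuples-per-bucket; erring high merely leaves
 * some buckets underfull, while erring low would force splits later. */
dnumbuckets = reltuples / (double) ffactor;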



Tom,

That is great. I am looking forward to your patch. After the
issues that you needed to address, I think that it would be
reasonable to add a few more user settings for the hash index.
Fill-factor is too coarse a knob. The others that I have been
considering are:
  



maxtuples - Not really the maximum, but a target value to use
  for setting up the initial buckets. This would allow you to
  set it for data loads and avoid the split-n-copy trauma
  that you are trying to avoid with your new hash build process.
  
If I understand you correctly, I believe we already do this with our
current build process; there should not be any splits of the index if we
estimated the tuple count correctly.  However, what gets you is
collisions: lots of overflow pages occur when distinct keys map to
the same bucket, or when you have lots of duplicate keys.  Because your
overall tuple count doesn't exceed the fill factor, no splits occur, but
lengthy bucket chains lead to lots of IOs.  You touch on this below.

multiplicity - Try to capture use cases that would require many
  overflow pages. In particular, if we discard the normal index
  page layout we can skip the space overhead of the page pointer
  and generate a more compact index. Then you could use a few
  more hash bits to lookup the index entry in the page. How many
  bits would be determined by this factor. 8-bits would give
  you 256 sub-pieces that could each hold about 3 entries using
  the current 4-byte hash, or 2 entries using an 8-byte hash.

What do you think?
  
Yes, this is a good direction.  If we can increase the number of buckets 
and reduce the bucket size (either physically or virtually) to allow 
more direct access without creating a huge index on disk, that would be 
ideal.  But, then if you do have collisions, overflows occur more 
frequently.  I spoke with Neil Conway yesterday at the PG conference 
here in Portland and he piqued my interest in examining his hash code 
more closely to see what he has already done in this area.
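
To make the compact-index idea concrete, here is the shape I picture,
with invented names throughout:

/* Invented names -- a sketch of the compact, hash-only entry Ken describes. */
typedef struct CompactHashEntry
{
    uint32          hashvalue;  /* full 32-bit hash of the index key */
    ItemPointerData heaptid;    /* 6-byte pointer to the heap tuple */
} CompactHashEntry;             /* ~10 bytes, vs. an 8-byte IndexTuple header
                                 * plus the key value itself today */

/* Top 8 bits of the hash pick one of 256 sub-pieces of the page, so a
 * probe scans only the handful of entries in its sub-piece. */
#define HASH_SUBPIECE(h)    ((h) >> 24)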


-Tom

Cheers,
Ken

On Wed, Oct 17, 2007 at 03:31:58PM -0700, Tom Raney wrote:
  

Kenneth,

Great!

Yes, we did update the code to use the estimate.  I will post the patch 
with this update.  We only saw a very small difference in index build time, 
but you may see one when you add many columns to the base relation.
With 1 billion tuples, you should start to see the hash index outperform 
the btree for some equality probes, I would imagine.  With a 90% fill 
factor, the btree would require 4 levels to index that many tuples.  If the 
top two were in memory, there would be 3 IOs needed.  I don't think PG 
supports index only scans, so it will take the extra IO to probe the base 
relation.  The hash may take up to 2 IOs and maybe even less (or maybe more 
depending on how many overflow buckets there are).  It might be interesting 
to fiddle with the fill factors of each index - hash pages (buckets) 
default to 75% full. 
-Tom


Tom,

I am getting ready to stitch in an updated, simplified version
of Neil Conway's hash-only hash index patch. Did you have a
chance to update your sizing function to use the planner-like
estimate and not a full table scan? I would like to be able
to use that when my test table start to have 10^9 entries.
If you have not had a chance, I will try and add it myself.

Regards,
Ken

  
  





Re: [HACKERS] Hash index todo list item

2007-09-25 Thread Tom Raney

Kenneth Marshall wrote:

On Tue, Sep 25, 2007 at 03:35:47PM +0100, Gregory Stark wrote:
  

Kenneth Marshall [EMAIL PROTECTED] writes:



On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote:

  

Using our implementation, build times and index sizes are
comparable with btree index build times and index sizes.


...
That is super! (and timely)
  

It is pretty super. I have a few comments to raise but don't take it to be too
negative, it sounds like this is a big step forward towards making hash
indexes valuable.

Firstly, in the graphs it seems the index size graph has an exponential x-axis
but a linear y-axis. This makes it hard to see what I would expect to be
pretty much linear growth. The curves look exponential, which here would mean
linear growth, but of course it's hard to tell.

Also, the growth in the time chart looks pretty much linear. That seems weird,
since I would expect a visible extra component given that sort times are
n log(n). Perhaps you need to test still larger tables to see that, though.

In any case it's clear from the results you have there that the change is a
positive one and fixes a fundamental problem with the hash index build code.

Something else you should perhaps test is indexing something which is
substantially larger than hash function output. A hash function is going to
make the most sense when indexing something like strings for which you want to
avoid the long comparison costs. Especially consider testing this on a UTF8
locale with expensive comparisons (like a CJK locale for example).

Note that the bottom line for the problems with hash indexes is that the
current implementation doesn't offer any advantages over btree indexes. Hash
indexes need to be not only as good a choice as btree indexes but a
significantly better choice, at least for some specific
circumstances.

Also, if you're going to submit a patch you should check out a copy of the CVS
HEAD and work from that. I don't think there are huge differences in the area
of hash indexes though. But in most other areas you would be spending quite a
bit of time dealing with details which have changed since.

Finally note that we're in the final throes of the 8.3 feature freeze.
Normally any patch submitted now would be held until 8.3 is released and
development on 8.4 is started. I could imagine an exception being made for
hash indexes since they're so moribund currently but probably not. The flip
side of that argument is that there's not much point in making an exception
for something which will only be really useful once further work is done in
the same area.




Although I am very excited about this patch, I do not see any real value
in including it in 8.3. As you mention, we need to have a hash index
implementation that outperforms btree in some problem regime and that is
currently not the case. I have just recently started the process of
gathering ideas and having discussions on various approaches to making
hash indexes more performant and we have a number of ideas on which to
start our investigation. I do think that this patch will make the testing
and evaluation, that will be needed to truly improve the hash index, much
much easier.

Regards,
Ken

  
We're glad to contribute and be a part of Postgres.  The patch has been 
posted to [EMAIL PROTECTED]


Speeding up the creation time of hash indexes on non-trivial relations 
was our goal.  This will allow some interesting performance tests of the 
hash index on very large relations.  It may be that the near constant 
lookup time of the hash index outperforms the Btree index for some large 
data sets and for certain types of data and distributions.


Sincerely,
Tom Raney






Re: [HACKERS] Hash index todo list item

2007-09-20 Thread Tom Raney

We are pleased to announce an upcoming patch to the hash index code
which improves build time and index size, based on this item in the
TODO list:
"During index creation, pre-sort the tuples to improve build speed"
http://archives.postgresql.org/pgsql-hackers/2007-03/msg01199.php

Using our implementation, build times and index sizes are
comparable with btree index build times and index sizes.
For example, for a particular 12 million row relation, the
8.2.4 release requires over 2.8 hours to build the hash index. 
With our patch, the index is built in 80 seconds.
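
The heart of the approach fits in a few lines.  The toy below is a
standalone illustration only -- the real patch plugs into hashbuild()
and uses PostgreSQL's tuplesort machinery rather than qsort -- but it
shows why the build gets so much cheaper:

#include <stdint.h>
#include <stdlib.h>

typedef struct BuildItem
{
    uint32_t    bucket;     /* tuple's final bucket, computable up front
                             * because the bucket count was pre-sized from
                             * the estimated number of tuples */
    uint64_t    heaptid;    /* stand-in for the 6-byte heap TID */
} BuildItem;

static int
cmp_bucket(const void *a, const void *b)
{
    uint32_t    ba = ((const BuildItem *) a)->bucket;
    uint32_t    bb = ((const BuildItem *) b)->bucket;

    return (ba > bb) - (ba < bb);
}

/*
 * Sort the entries by destination bucket, then emit them in order: each
 * bucket page is filled exactly once, instead of being split and re-copied
 * over and over as the unsorted 8.2 build does.
 */
static void
sorted_hash_build(BuildItem *items, size_t n)
{
    qsort(items, n, sizeof(BuildItem), cmp_bucket);
    /* ... append items[0..n-1] to the index, bucket by bucket ... */
}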

Here is the link for a graph, showing a comparative analysis of
btree and hash index builds and describing the benchmark data.
http://web.cecs.pdx.edu/~raneyt/pg/

We are currently cleaning up the patch and will submit it asap.

Regards,
Shreya Bhargava [EMAIL PROTECTED]
Tom Raney [EMAIL PROTECTED]


Kenneth Marshall wrote:

Dear PostgreSQL Hackers:

After following the hackers mailing list for quite a while,
I am going to start investigating what will need to be done
to improve hash index performance. Below are the pieces of
this project that I am currently considering:

1. Characterize the current hash index implementation against
   the BTree index, with a focus on space utilization and
   lookup performance against a collection of test data. This
   will give a baseline performance test to evaluate the impact
   of changes. I initially do not plan to bench the hash creation
   process since my initial focus will be on lookup performance.

2. Evaluate the performance of different hash index implementations
   and/or changes to the current implementation. My current plan is
   to keep the implementation as simple as possible and still provide
   the desired performance. Several hash index suggestions deal with
   changing the layout of the keys on a page to improve lookup
   performance, including reducing the bucket size to a fraction of
   a page or only storing the hash value on the page, instead of
   the index value itself. My goal in this phase is to produce one
   or more versions with better performance than the current BTree.
   
3. Look at build time and concurrency issues with the addition of
   some additional tests to the test bed. (1)

4. Repeat as needed.

This is the rough plan. Does anyone see anything critical that
is missing at this point? Please send me any suggestions for test
data and various performance test ideas, since I will be working
on that first.

Regards,
Ken Marshall 

