[HACKERS] A humble request

2009-05-22 Thread Gevik Babakhani

Hi,

For those who couldn't attend "How to Get Your PostgreSQL Patch
Accepted", could someone please post a summary? (Tom? Bruce?)

Thank you.

Regards,
Gevik.





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast ALTER TABLE ... ADD COLUMN ... DEFAULT xxx?

2009-05-22 Thread Sam Mason
On Thu, May 21, 2009 at 12:26:08PM -0400, Tom Lane wrote:
> I'd envision it as an extra column in pg_attribute, and it would work
> for any column(s).  There's nothing to be gained by restricting it.

Yes, when I said "first" I meant that the only thing that needs to be
stored is the first default value for the column.  The code currently
assumes that this first default value is NULL and is hence able to
optimize the case where you add a NULLable column.  Adding this first
default value to pg_attribute would allow you to break this assumption
and let the user have non-NULL default values without complete table
rewrites.
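
To illustrate with SQL (a sketch - table "foo" is hypothetical, and the
fast path for non-NULL defaults does not exist yet):

  -- Adding a NULLable column without a default is cheap today: existing
  -- rows are left untouched and reads of the new column yield NULL.
  ALTER TABLE foo ADD COLUMN a integer;

  -- Adding a column with a non-NULL default currently rewrites every
  -- row.  Storing the "first default" in pg_attribute would make this
  -- cheap as well:
  ALTER TABLE foo ADD COLUMN b integer DEFAULT 42;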

I think the discussion covered this, but I don't think I was
particularly clear in my first message and thought I should attempt to
explain what I was saying.  Other comments about only allowing STABLE
expressions and non-toastable values make sense and were the sorts of
things I thought I would miss.

-- 
  Sam  http://samason.me.uk/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A humble request

2009-05-22 Thread David Fetter
On Fri, May 22, 2009 at 10:39:13AM +0200, Gevik Babakhani wrote:
> Hi,
>
> For those who couldn't attend "How to Get Your PostgreSQL Patch
> Accepted", could someone please post a summary? (Tom? Bruce?)
> Thank you.

Video will get posted pretty soon after, and expect tweets, blog
posts, etc. :)

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A humble request

2009-05-22 Thread Gevik Babakhani

please, please, please

David Fetter wrote:

> On Fri, May 22, 2009 at 10:39:13AM +0200, Gevik Babakhani wrote:
>> Hi,
>>
>> For those who couldn't attend "How to Get Your PostgreSQL Patch
>> Accepted", could someone please post a summary? (Tom? Bruce?)
>> Thank you.
>
> Video will get posted pretty soon after, and expect tweets, blog
> posts, etc. :)
>
> Cheers,
> David.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] RFD: Discarded tuple count for SeqScan nodes in EXPLAIN ANALYZE

2009-05-22 Thread Andres Freund

Hi,

When analyzing the plan of a query I often find myself questioning 
whether an additional index may be sensible, or if it is sensible that a 
SeqScan is used if an index is available.


The current EXPLAIN ANALYZE only shows the number of tuples matching the
qualifier of a SeqScan node - when analyzing the above situation it is
at least equally interesting to know how many tuples were read and
discarded.


Therefore I produced a patch which adds a 'discarded=%f' part to the
analyze output.
As this is only an RFD the implementation is a bit hackish at the moment
- the discarded counter is increased directly in execScan instead of via
a helper routine in instrument.c.
The discarded count is also displayed for other node types - for some of
them it might have a sensible semantic meaning as well...
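
For illustration, the output would look something like this (a sketch
with made-up numbers and a made-up table, not real output):

  EXPLAIN ANALYZE SELECT * FROM foo WHERE x = 42;

   Seq Scan on foo  (cost=0.00..1693.00 rows=14 width=8)
                    (actual time=0.031..9.871 rows=14 loops=1, discarded=99986)
     Filter: (x = 42)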


Good idea - Bad idea?

Greetings,

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFD: Discarded tuple count for SeqScan nodes in EXPLAIN ANALYZE - Patch v1

2009-05-22 Thread Andres Freund

...
From 4865950d8bb66ee29cead3ecb17b07812677bfdf Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Fri, 22 May 2009 15:37:13 +0200
Subject: [PATCH] Feature: discarded tuple count display in EXPLAIN ANALYZE

---
 src/backend/commands/explain.c    |    5 +++--
 src/backend/executor/execScan.c   |    9 +++++++++
 src/backend/executor/instrument.c |    2 ++
 src/include/executor/instrument.h |    5 ++++-
 4 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index b334e2b..13644a8 100644
*** a/src/backend/commands/explain.c
--- b/src/backend/commands/explain.c
*** explain_outNode(StringInfo str,
*** 805,815 ****
  	{
  		double		nloops = planstate->instrument->nloops;
  
! 		appendStringInfo(str, " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)",
  						 1000.0 * planstate->instrument->startup / nloops,
  						 1000.0 * planstate->instrument->total / nloops,
  						 planstate->instrument->ntuples / nloops,
! 						 planstate->instrument->nloops);
  	}
  	else if (es->printAnalyze)
  		appendStringInfo(str, " (never executed)");
--- 805,816 ----
  	{
  		double		nloops = planstate->instrument->nloops;
  
! 		appendStringInfo(str, " (actual time=%.3f..%.3f rows=%.0f loops=%.0f, discarded=%.0f)",
  						 1000.0 * planstate->instrument->startup / nloops,
  						 1000.0 * planstate->instrument->total / nloops,
  						 planstate->instrument->ntuples / nloops,
! 						 planstate->instrument->nloops,
! 						 planstate->instrument->ntuples_disc / nloops);
  	}
  	else if (es->printAnalyze)
  		appendStringInfo(str, " (never executed)");
diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index 19fa4e6..9592554 100644
*** a/src/backend/executor/execScan.c
--- b/src/backend/executor/execScan.c
***************
*** 21,26 ****
--- 21,27 ----
  #include "executor/executor.h"
  #include "miscadmin.h"
  #include "utils/memutils.h"
+ #include "executor/instrument.h"
  
  
  static bool tlist_matches_tupdesc(PlanState *ps, List *tlist, Index varno, TupleDesc tupdesc);
*** ExecScan(ScanState *node,
*** 157,162 ****
--- 158,171 ----
  		}
  
  		/*
+ 		 * At this point the tuple has been discarded by the qualifier, so
+ 		 * increase the discarded tuple count.
+ 		 */
+ 		if (node->ps.instrument)
+ 			node->ps.instrument->tuplecount_disc += 1;
+ 
+ 
+ 		/*
  		 * Tuple fails qual, so free per-tuple memory and try again.
  		 */
  		ResetExprContext(econtext);
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index d8d7039..d8ccaae 100644
*** a/src/backend/executor/instrument.c
--- b/src/backend/executor/instrument.c
*** InstrEndLoop(Instrumentation *instr)
*** 86,91 ****
--- 86,92 ----
  	instr->startup += instr->firsttuple;
  	instr->total += totaltime;
  	instr->ntuples += instr->tuplecount;
+ 	instr->ntuples_disc += instr->tuplecount_disc;
  	instr->nloops += 1;
  
  	/* Reset for next cycle (if any) */
*** InstrEndLoop(Instrumentation *instr)
*** 94,97 ****
--- 95,99 ----
  	INSTR_TIME_SET_ZERO(instr->counter);
  	instr->firsttuple = 0;
  	instr->tuplecount = 0;
+ 	instr->tuplecount_disc = 0;
  }
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 9846f6f..3b567ad 100644
*** a/src/include/executor/instrument.h
--- b/src/include/executor/instrument.h
*** typedef struct Instrumentation
*** 24,39 ****
  	instr_time	counter;		/* Accumulated runtime for this node */
  	double		firsttuple;		/* Time for first tuple of this cycle */
  	double		tuplecount;		/* Tuples emitted so far this cycle */
  	/* Accumulated statistics across all completed cycles: */
  	double		startup;		/* Total startup time (in seconds) */
  	double		total;			/* Total total time (in seconds) */
  	double		ntuples;		/* Total tuples produced */
  	double		nloops;			/* # of run cycles for this node */
  } Instrumentation;
  
  extern Instrumentation *InstrAlloc(int n);
  extern void InstrStartNode(Instrumentation *instr);
! extern void InstrStopNode(Instrumentation *instr, double nTuples);
  extern void InstrEndLoop(Instrumentation *instr);
  
  #endif   /* INSTRUMENT_H */
--- 24,42 ----
  	instr_time	counter;		/* Accumulated runtime for this node */
  	double		firsttuple;		/* Time for first tuple of this cycle */
  	double		tuplecount;		/* Tuples emitted so far this cycle */
+ 	double		tuplecount_disc;/* Tuples discarded so far this cycle */
  	/* Accumulated statistics across all completed cycles: */
  	double		startup;		/* Total startup time (in seconds) */
  	double		total;			/* Total total time (in seconds) */
  	double		ntuples;		/* Total tuples produced */
+ 	double		ntuples_disc;	/* Total tuples discarded */
  	double		nloops;			/* # of run cycles for this node */
  } Instrumentation;
  
  extern Instrumentation *InstrAlloc(int n);
  extern void InstrStartNode(Instrumentation *instr);
! extern void InstrStopNode(Instrumentation *instr,
! 		  

Re: [HACKERS] RFD: Discarded tuple count for SeqScan nodes in EXPLAIN ANALYZE

2009-05-22 Thread Heikki Linnakangas

Andres Freund wrote:


> When analyzing the plan of a query I often find myself questioning
> whether an additional index may be sensible, or if it is sensible that
> a SeqScan is used if an index is available.
>
> The current EXPLAIN ANALYZE only shows the number of tuples matching
> the qualifier of a SeqScan node - when analyzing the above situation it
> is at least equally interesting to know how many tuples were read and
> discarded.
>
> Therefore I produced a patch which adds a 'discarded=%f' part to the
> analyze output.
> As this is only an RFD the implementation is a bit hackish at the
> moment - the discarded counter is increased directly in execScan
> instead of via a helper routine in instrument.c.
> The discarded count is also displayed for other node types - for some
> of them it might have a sensible semantic meaning as well...
>
> Good idea - Bad idea?


Isn't the discarded count always equal to (# of rows in table - matched 
tuples)? Seems pretty redundant to me.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFD: Discarded tuple count for SeqScan nodes in EXPLAIN ANALYZE

2009-05-22 Thread Andres Freund

Hi,

On 05/22/2009 03:42 PM, Heikki Linnakangas wrote:

> Andres Freund wrote:
>> When analyzing the plan of a query I often find myself questioning
>> whether an additional index may be sensible, or if it is sensible that
>> a SeqScan is used if an index is available.
>>
>> The current EXPLAIN ANALYZE only shows the number of tuples matching
>> the qualifier of a SeqScan node - when analyzing the above situation
>> it is at least equally interesting to know how many tuples were read
>> and discarded.
>> Good idea - Bad idea?
>
> Isn't the discarded count always equal to (# of rows in table - matched
> tuples)? Seems pretty redundant to me.

Not for EXISTS(), LIMIT and similar.

Also, when looking at more complex plans it's quite a nuisance to go
through all participating tables and do a separate count(*) - especially
if it's not your own plan but some client's plan, etc.
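
To make the LIMIT case concrete (a hedged sketch - the table and the
numbers are made up):

  -- foo has 1,000,000 rows; the scan stops as soon as one match is found.
  EXPLAIN ANALYZE SELECT * FROM foo WHERE x = 42 LIMIT 1;

  -- The seqscan might read, say, 1337 tuples and discard 1336 before
  -- returning the single match.  A discarded=1336 annotation shows that;
  -- "# of rows in table - matched tuples" (999999) would not.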


Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] patch for ja.po

2009-05-22 Thread Tatsuhito Kasahara
Hi.

Here is a small patch which fixes bin/psql/po/ja.po.
Currently, psql reports an incorrect number of rows through ja.po.
(see the following)

=# SELECT relname FROM pg_class limit 3;
       relname
----------------------
 pg_type
 user_mapping_options
 user_mappings
(1 行)  <- means "1 row", even though 3 rows were returned

best regards,
-- 
Tatsuhito Kasahara
kasahara.tatsuh...@oss.ntt.co.jp
diff -cpr Postgresql-CVS-20090522/src/bin/psql/po/ja.po ja_po_fix/src/bin/psql/po/ja.po
*** Postgresql-CVS-20090522/src/bin/psql/po/ja.po	2009-05-22 18:01:22.0 +0900
--- ja_po_fix/src/bin/psql/po/ja.po	2009-05-23 05:08:56.0 +0900
***************
*** 1186,1198 ****
  msgid "invalid fout format (internal error): %d"
  msgstr "出力フォーマットが無効(内部エラー):%d"
  
! #: print.c:2352
  msgid "(1 row)"
! msgstr "(1 行)"
! 
! #: print.c:2354
! msgid "(%lu rows)"
! msgstr "(%lu 行)"
  
  #: startup.c:222
  msgid "%s: could not open log file \"%s\": %s\n"
--- 1186,1196 ----
  msgid "invalid fout format (internal error): %d"
  msgstr "出力フォーマットが無効(内部エラー):%d"
  
! #: print.c:2351
  msgid "(1 row)"
! msgid_plural "(%lu rows)"
! msgstr[0] "(1 行)"
! msgstr[1] "(%lu 行)"
  
  #: startup.c:222
  msgid "%s: could not open log file \"%s\": %s\n"

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] 8.5 plpgsql change for named notation: treat word following AS keyword as label v2

2009-05-22 Thread Josh Berkus

On 5/21/09 6:44 PM, Alvaro Herrera wrote:
> Josh Berkus wrote:
>>> The patch adds the AS keyword to the plpgsql grammar and doesn't
>>> assign an expression parameter to the sql construct if the scalar
>>> follows the AS keyword.
>>
>> Would it be possible to also support "=" as well as AS?  I believe
>> that SQL Server uses "=" exclusively, and supporting that syntax would
>> help people port TSQL-based applications.
>
> Would this be best served by implementing PL/TSQL?


Passing variables takes place *outside* of the PL.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] 8.5 plpgsql change for named notation: treat word following AS keyword as label v2

2009-05-22 Thread Alvaro Herrera
Josh Berkus wrote:
> On 5/21/09 6:44 PM, Alvaro Herrera wrote:
>> Josh Berkus wrote:
>>>> The patch adds the AS keyword to the plpgsql grammar and doesn't
>>>> assign an expression parameter to the sql construct if the scalar
>>>> follows the AS keyword.
>>> Would it be possible to also support "=" as well as AS?  I believe
>>> that SQL Server uses "=" exclusively, and supporting that syntax would
>>> help people port TSQL-based applications.
>>
>> Would this be best served by implementing PL/TSQL?
>
> Passing variables takes place *outside* of the PL.

Hmm, sure, but I was actually thinking on a higher level ... I mean I've always
thought that there's so much effort directed at porting procedures from TSQL to
plpgsql, when it would be better to just write a dedicated PL.  (Of course,
the set of people who can actually write such a PL is a lot smaller than those
who can write plpgsql.)

I mean, if all you had to do was change "=" to AS to make your TSQL procedures
supported, you'd be a lot happier than if that was the only work you could
_save_.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] information_schema.columns changes needed for OLEDB

2009-05-22 Thread Konstantin Izmailov
As we discussed at pgcon2009, there are some changes/fixes necessary in
information_schema.columns to allow applications and services to work
correctly via OLEDB on Windows. Here are some:

1. The data_type field contains type names that are not recognized by MS apps.

Workaround: rename types on the fly, e.g.

integer -> int

character varying -> varchar

character -> char

timestamp without time zone -> datetime

bytea -> image

2. character_maximum_length field

Workaround: change the value for the text and bytea types

[text] 1073741823

[bytea] 2147483647

3. character_octet_length should always be double character_maximum_length
(because the Unicode character size on Windows is 2 bytes).

4. The datetime_precision field is not always correct

Workaround: change the value on the fly, e.g. if the value is not null then

[numeric] keep the value (ok)

[bigint] set value to 19

all others set to 10

5. The numeric_precision_radix field should always be equal to 10

6. datetime_precision field, minor changes

Workaround: change the value on the fly, e.g.

[date] set value to zero

[datetime] set value to 3
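
As a sketch of what "rename types on the fly" could look like in a single
query (a hypothetical client-side mapping, not a proposed catalog change;
the table name is made up):

  SELECT column_name,
         CASE data_type
             WHEN 'integer'                     THEN 'int'
             WHEN 'character varying'           THEN 'varchar'
             WHEN 'character'                   THEN 'char'
             WHEN 'timestamp without time zone' THEN 'datetime'
             WHEN 'bytea'                       THEN 'image'
             ELSE data_type
         END AS oledb_data_type,
         CASE data_type
             WHEN 'text'  THEN 1073741823  -- values MS apps expect
             WHEN 'bytea' THEN 2147483647
             ELSE character_maximum_length
         END AS character_maximum_length
  FROM information_schema.columns
  WHERE table_name = 'some_table';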


Re: [HACKERS] patch for ja.po

2009-05-22 Thread Alvaro Herrera
Tatsuhito Kasahara wrote:
> Hi.
>
> Here is a small patch which fixes bin/psql/po/ja.po.
> Currently, psql reports an incorrect number of rows through ja.po.
> (see the following)

Thanks.  Please submit to pgtranslation through pgfoundry -- see
http://babel.postgresql.org/

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFD: Discarded tuple count for SeqScan nodes in EXPLAIN ANALYZE

2009-05-22 Thread Tom Lane
Andres Freund <and...@anarazel.de> writes:
> On 05/22/2009 03:42 PM, Heikki Linnakangas wrote:
>> Isn't the discarded count always equal to (# of rows in table - matched
>> tuples)? Seems pretty redundant to me.

> Not for EXISTS(), LIMIT and similar.

It doesn't really seem useful enough to justify breaking client-side
code that looks at EXPLAIN output.

This sort of ties into the discussions we have periodically about
allowing EXPLAIN to output XML or some other more-machine-friendly
data format.  The barrier for adding additional output fields would
be a lot lower in such a format.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Common Table Expressions applied; some issues remain

2009-05-22 Thread Greg Stark
(quoting more than usual to provide context because this is such an old thread)

On Sat, Oct 4, 2008 at 11:22 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> I've applied the latest version of the CTE patch.  Congratulations on
> making that happen!
>
> There are still some loose ends that need to be considered, though.
>
> 1. As committed, the patch takes an extremely hard line about WITH
> queries being evaluated independently of the main query and only once
> per main query execution.  This could be seen as a good thing --- it
> provides much more determinism for execution of volatile functions
> within complex queries than was really available in the past.  It could
> also be seen as a bad thing --- in particular, we won't push any
> limiting qualifications from the main query into the WITH queries.
> So for instance
>
>        WITH q AS ( SELECT * FROM foo )
>        SELECT * FROM q WHERE key = 42;
>
> is going to be executed quite inefficiently; it won't use an index on
> foo.key.  I think we don't have much choice about this in the case of
> recursive WITH queries: it would be pretty difficult to determine
> whether pushing a restriction into a recursive WITH would change the
> results incorrectly.  However, for plain non-recursive WITHs it's all
> a matter of definition.  I gather from
> http://www.oracle-developer.net/display.php?id=212
> that Oracle chooses to treat WITH-queries as if they were plain
> sub-selects if they're non-recursive and only referenced once.
> That is, Oracle would rewrite the above into
>
>        SELECT * FROM ( SELECT * FROM foo ) AS q WHERE key = 42;
>
> and then flatten the sub-select and optimize normally.  It would
> not be hard to make Postgres do the same, but then we would lose
> some guarantees about predictable execution of volatile functions.
>
> I'm inclined to think that there is no reason to provide two
> different syntaxes to do the same thing, and so having the WITH
> syntax behave like this is okay.  But it could well result in
> performance surprises for people who are used to Oracle.
>
> Any thoughts on what to do?  One possibility is to flatten only
> if the subquery doesn't contain any volatile functions.


One possibility would be to not flatten the query, but to find these
quals and copy them onto the CTE when planning it. We would still
materialize the result and avoid duplicate execution, but only fetch
the records which we know a caller will need. We could even do that
for multiple callers if we join their quals with an OR -- that still
might allow a bitmap index scan.

I'm not sure how that would work out with the order in which the
various phases of analysis are done on the outer query compared to the
subquery.
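
A sketch of the rewrite being suggested (conceptual only - nothing in
the planner does this today):

  -- As written: the CTE is materialized in full, then filtered.
  WITH q AS (SELECT * FROM foo)
  SELECT * FROM q WHERE key = 42;

  -- After copying the caller's qual onto the CTE at planning time, the
  -- CTE is still materialized once, but can now use an index on foo.key:
  WITH q AS (SELECT * FROM foo WHERE key = 42)
  SELECT * FROM q WHERE key = 42;

  -- With two callers, the quals would be OR-ed together, which might
  -- still allow a bitmap index scan:
  --   WITH q AS (SELECT * FROM foo WHERE key = 42 OR key = 7) ...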

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFD: Discarded tuple count for SeqScan nodes in EXPLAIN ANALYZE

2009-05-22 Thread Greg Stark
On Fri, May 22, 2009 at 4:54 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> It doesn't really seem useful enough to justify breaking client-side
> code that looks at EXPLAIN output.

FWIW, I don't think pgAdmin at least would be confused by this. Those
tool authors aren't enamoured of fragile assumptions and the
maintenance headaches they cause either.

> This sort of ties into the discussions we have periodically about
> allowing EXPLAIN to output XML or some other more-machine-friendly
> data format.  The barrier for adding additional output fields would
> be a lot lower in such a format.

This is still pretty much true, if only because of the sheer
unscalability of the amount of data being presented for users to sift
through. I do want us to add a ton more instrumentation to the explain
plan, and this is only one small addition. If we add the number of hard
and soft I/Os, time spent in user and system space, etc., the result
would be pretty unreadable, and those are at least as important as
things like this.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Revisiting default_statistics_target

2009-05-22 Thread Greg Smith
Yesterday Jignesh Shah presented his extensive benchmark results comparing 
8.4-beta1 with 8.3.7 at PGCon: 
http://blogs.sun.com/jkshah/entry/pgcon_2009_performance_comparison_of


While most cases were dead even or a modest improvement, his dbt-2 results
suggest a 15-20% regression in 8.4.  Changing the default_statistics_target
to 100 was responsible for about 80% of that regression.  The remainder
was from the constraint_exclusion change.  That 80/20 proportion was
mentioned in the talk but not in the slides.  Putting both those back to
the 8.3 defaults swapped the result, with 8.4b1 ahead by 5% instead.
(Note that all of the later benchmarks in his slides continued to use the
default parameters; that change was only tested with that specific
workload.)


The situation where a low stats target hurts the most is the data
warehouse use case.  Josh Berkus tells me that his latest DW testing
suggests the 10-to-100 increase turns out to be insufficient anyway;
400+ is the range you really need it to be in.  I did a quick survey of
some other community members who work in this space and that experience
is not unique.  Josh has some early tools that tackle this problem by
raising the stats target only where it's critical--on indexed columns,
for example.  I'm going to work with him to help get those polished, and
to see if we can replicate some of those cases via a public benchmark.
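
The per-column knob such tools adjust is the standard ALTER TABLE form,
for example (table and column names are made up):

  -- Raise the stats target only where it matters, leaving the global
  -- default_statistics_target low; 400 follows the DW guidance above.
  ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 400;
  ANALYZE orders (customer_id);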


The bump from 10 to 100 was supported by microbenchmarks that suggested it 
would be tolerable.  That doesn't seem to be reality here though, and it's 
questionable whether this change really helps the people who need to fool 
with the value the most.  This sort of feedback is exactly why it made 
sense to try this out during the beta cycle.  But unless someone has some 
compelling evidence to the contrary, it looks like the stats target needs 
to go back to a lower value.  I think the best we can do here is to 
improve the documentation about this parameter and continue to work on 
tuning guides and tools to help people set it correctly.


As for the change to constraint_exclusion, the regression impact there is 
much less severe and the downside of getting it wrong is pretty bad. 
Rather than reverting it, the ideal response to that might be to see if 
it's possible to improve the partition code path.  But as I'm not going 
to volunteer to actually do that, I really don't get a vote here anyway.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Revisiting default_statistics_target

2009-05-22 Thread Greg Sabino Mullane



> While most cases were dead even or a modest improvement, his dbt-2 results
> suggest a 15-20% regression in 8.4.  Changing the default_statistics_target
> to 100 was responsible for about 80% of that regression.
...
> The situation where a low stats target hurts the most is the data
> warehouse use case.

Er...so why should we change our defaults to support data warehousing
users? Certainly the rest of the postgresql.conf settings don't attempt
to do that.

> The bump from 10 to 100 was supported by microbenchmarks that suggested it
> would be tolerable.

No, the 10 to 100 was supported by years of people working in the field who
routinely did that adjustment (and >100) and saw great gains. Also, as the one
who originally started the push to 100, my original goal was to get it over the
magic 99 bump, at which the planner started acting very differently. This
caused a huge performance regression in one of the Postgres releases (I don't
remember which one exactly), which severely impacted one of our large clients.

> That doesn't seem to be reality here though, and it's questionable whether
> this change really helps the people who need to fool with the value the most.

The goal of defaults is not to help people who fool with the value - it's to
get a good default out of the box for people who *don't* fool with all the
values. :)

> But unless someone has some compelling evidence to the contrary, it looks
> like the stats target needs to go back to a lower value.

Please don't. This is a very good change, and I don't see why changing it back
because it might hurt people doing DW is a good thing, when most of our users
are not doing DW.

> As for the change to constraint_exclusion, the regression impact there is
> much less severe and the downside of getting it wrong is pretty bad.

Similarly, the people who are affected by something like this presumably are
not running a default postgresql.conf anyway, so they can toggle it back to
squeeze a little more performance out of their system.

- --
Greg Sabino Mullane g...@turnstep.com
End Point Corporation
PGP Key: 0x14964AC8 200905221239
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Revisiting default_statistics_target

2009-05-22 Thread Joshua D. Drake
On Fri, 2009-05-22 at 16:43 +0000, Greg Sabino Mullane wrote:
>> While most cases were dead even or a modest improvement, his dbt-2 results
>> suggest a 15-20% regression in 8.4.  Changing the default_statistics_target
>> to 100 was responsible for about 80% of that regression.
> ...
>> The situation where a low stats target hurts the most is the data
>> warehouse use case.

Nor is it our primary user base. If we want to do this we need to have
more than one conf as a template.


>> That doesn't seem to be reality here though, and it's questionable whether
>> this change really helps the people who need to fool with the value the
>> most.
>
> The goal of defaults is not to help people who fool with the value - it's to
> get a good default out of the box for people who *don't* fool with all the
> values. :)

Right. If someone is really doing a DW they are already spending time
with the postgresql.conf.


>> But unless someone has some compelling evidence to the contrary, it looks
>> like the stats target needs to go back to a lower value.
>
> Please don't. This is a very good change, and I don't see why changing it
> back because it might hurt people doing DW is a good thing, when most of our
> users are not doing DW.

I haven't seen any evidence to suggest that Jignesh's findings provide
anything but a single data point from one of our smallest user bases.
Reverting a default based on that seems like a mistake.

 
>> As for the change to constraint_exclusion, the regression impact there is
>> much less severe and the downside of getting it wrong is pretty bad.
>
> Similarly, the people who are affected by something like this presumably are
> not running a default postgresql.conf anyway, so they can toggle it back to
> squeeze a little more performance out of their system.
 

Right.

Joshua D. Drake

-- 
PostgreSQL - XMPP: jdr...@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Revisiting default_statistics_target

2009-05-22 Thread Stephen Frost
Greg,

* Greg Sabino Mullane (g...@turnstep.com) wrote:
>> While most cases were dead even or a modest improvement, his dbt-2 results
>> suggest a 15-20% regression in 8.4.  Changing the default_statistics_target
>> to 100 was responsible for about 80% of that regression.
> ...
>> The situation where a low stats target hurts the most is the data
>> warehouse use case.
>
> Er...so why should we change our defaults to support data warehousing
> users? Certainly the rest of the postgresql.conf settings don't attempt
> to do that.

dbt-2 is for OLTP, not for DW.  Greg Smith's comment was actually that
we shouldn't penalize the OLTP crowd (by raising the value) for the
benefit of the DW crowd (who need it higher than 100 anyway).

http://osdldbt.sourceforge.net/

Stephen




Re: [HACKERS] Revisiting default_statistics_target

2009-05-22 Thread Tom Lane
Greg Smith <gsm...@gregsmith.com> writes:
> Yesterday Jignesh Shah presented his extensive benchmark results comparing
> 8.4-beta1 with 8.3.7 at PGCon:
> http://blogs.sun.com/jkshah/entry/pgcon_2009_performance_comparison_of

> While most cases were dead even or a modest improvement, his dbt-2 results
> suggest a 15-20% regression in 8.4.  Changing the default_statistics_target
> to 100 was responsible for about 80% of that regression.  The remainder
> was from the constraint_exclusion change.  That 80/20 proportion was
> mentioned in the talk but not in the slides.  Putting both those back to
> the 8.3 defaults swapped the result, with 8.4b1 ahead by 5% instead.

Yeah, I saw that talk and I'm concerned too, but I think it's premature
to conclude that the problem is precisely that stats_target is now too
high.  I'd like to see Jignesh check through the individual queries in
the test and make sure that none of them had plans that changed for the
worse.  The stats change might have just coincidentally tickled some
other planning issue.

Assuming that we do confirm that the hit comes from extra stats
processing time during planning, I'd still not want to go all the way
back to 10 as default.  Perhaps we'd end up compromising at something
like 50.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Revisiting default_statistics_target

2009-05-22 Thread Joshua D. Drake
On Fri, 2009-05-22 at 13:35 -0400, Stephen Frost wrote:
> Greg,
>
> dbt-2 is for OLTP, not for DW.  Greg Smith's comment was actually that
> we shouldn't penalize the OLTP crowd (by raising the value) for the
> benefit of the DW crowd (who need it higher than 100 anyway).

I appear to have completely missed the top part of Greg Smith's original
email. Sorry about that Greg.

We probably need to test this to get some more data points. Of course
that is why we have the performance lab :)

Joshua D. Drake

-- 
PostgreSQL - XMPP: jdr...@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Revisiting default_statistics_target

2009-05-22 Thread Jignesh K. Shah



Greg Sabino Mullane wrote:
>> While most cases were dead even or a modest improvement, his dbt-2 results
>> suggest a 15-20% regression in 8.4.  Changing the default_statistics_target
>> to 100 was responsible for about 80% of that regression.
> ...
>> The situation where a low stats target hurts the most is the data
>> warehouse use case.
>
> Er...so why should we change our defaults to support data warehousing
> users? Certainly the rest of the postgresql.conf settings don't attempt
> to do that.

The test that was greatly impacted is DBT-2 (an OLTP benchmark) with the
8.4 defaults, but it seems to work fine/better when reverted to the 8.3
defaults. The 8.4 defaults seemed to improve DBT-3 (data warehousing),
though I haven't retested that with the 8.3 defaults in 8.4.  Of course,
I am not a big fan of DBT-2 myself, and I am just providing data points
from what I observed during my testing with various workloads. I
certainly don't claim to understand what is happening (yet).


-Jignesh




--
Jignesh Shah   http://blogs.sun.com/jkshah  
The New Sun Microsystems,Inc   http://sun.com/postgresql


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Revisiting default_statistics_target

2009-05-22 Thread andrew
> Greg Smith <gsm...@gregsmith.com> writes:
>> Yesterday Jignesh Shah presented his extensive benchmark results
>> comparing 8.4-beta1 with 8.3.7 at PGCon:
>> http://blogs.sun.com/jkshah/entry/pgcon_2009_performance_comparison_of
>>
>> While most cases were dead even or a modest improvement, his dbt-2
>> results suggest a 15-20% regression in 8.4.  Changing the
>> default_statistics_target to 100 was responsible for about 80% of that
>> regression.  The remainder was from the constraint_exclusion change.
>> That 80/20 proportion was mentioned in the talk but not in the slides.
>> Putting both those back to the 8.3 defaults swapped the result, with
>> 8.4b1 ahead by 5% instead.
>
> Yeah, I saw that talk and I'm concerned too, but I think it's premature
> to conclude that the problem is precisely that stats_target is now too
> high.  I'd like to see Jignesh check through the individual queries in
> the test and make sure that none of them had plans that changed for the
> worse.  The stats change might have just coincidentally tickled some
> other planning issue.


Wouldn't he just need to rerun the tests with default_stats_target set to
the old value? I presume he has actually done this already in order to
come to the conclusion he did about the cause of the regression.

cheers

andrew



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Revisiting default_statistics_target

2009-05-22 Thread Tom Lane
and...@dunslane.net writes:
> Wouldn't he just need to rerun the tests with default_stats_target set to
> the old value? I presume he has actually done this already in order to
> come to the conclusion he did about the cause of the regression.

Yeah, he did, so we know it's slower that way.  But exactly *why* it's
slower is not proven.  It could be an artifact rather than something
we really ought to react to.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Revisiting default_statistics_target

2009-05-22 Thread Tom Lane
Joshua D. Drake <j...@commandprompt.com> writes:
> We probably need to test this to get some more data points.

Agreed --- DBT2 is just one data point.  We shouldn't assume that it's
definitive.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Revisiting default_statistics_target

2009-05-22 Thread Tom Lane
Greg Sabino Mullane <g...@turnstep.com> writes:
> No, the 10 to 100 was supported by years of people working in the
> field who routinely did that adjustment (and >100) and saw great
> gains. Also, as the one who originally started the push to 100, my
> original goal was to get it over the magic 99 bump, at which the
> planner started acting very differently.

That particular issue is gone anyway.

I'm not in a big hurry to revert this change either, but I think
Jignesh's results are sufficient reason to take a closer look at
the decision.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Revisiting default_statistics_target

2009-05-22 Thread Andrew Dunstan
On Fri, May 22, 2009 2:41 pm, Tom Lane wrote:
> Greg Sabino Mullane <g...@turnstep.com> writes:
>> No, the 10 to 100 was supported by years of people working in the
>> field who routinely did that adjustment (and >100) and saw great
>> gains. Also, as the one who originally started the push to 100, my
>> original goal was to get it over the magic 99 bump, at which the
>> planner started acting very differently.
>
> That particular issue is gone anyway.
>
> I'm not in a big hurry to revert this change either, but I think
> Jignesh's results are sufficient reason to take a closer look at
> the decision.



We also need more data points just about this test. Does the behaviour
hold for other platforms, and what is the relationship between stats
target and timings (is it linear or is there a sudden jump at some level)?

cheers

andrew


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Revisiting default_statistics_target

2009-05-22 Thread Josh Berkus

On 5/22/09 2:36 PM, Tom Lane wrote:
> and...@dunslane.net writes:
>> Wouldn't he just need to rerun the tests with default_stats_target set
>> to the old value? I presume he has actually done this already in order
>> to come to the conclusion he did about the cause of the regression.
>
> Yeah, he did, so we know it's slower that way.  But exactly *why* it's
> slower is not proven.  It could be an artifact rather than something
> we really ought to react to.


It appears (right now) to be an artifact.

The drop in performance happens exclusively with queries which are
called via C stored procedures.  It doesn't show up on other benchmarks
which call similar queries directly.

Jignesh and I will be testing some stuff next week to get a better idea
of what exactly makes the drop happen, but for now this appears to be a
corner case.


--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] 8.5 plpgsql change for named notation: treat word following AS keyword as label v2

2009-05-22 Thread Pavel Stehule
Hello

2009/5/21 Tom Lane <t...@sss.pgh.pa.us>:
> Steve Prentice <prent...@cisco.com> writes:
>> This patch is intended to supplement Pavel's patch for named and mixed
>> notation support in 8.5. This patch makes it so a plpgsql function can
>> call another function with the same parameter names using the named
>> parameters notation.
>
> Well, plpgsql's parsing has always been a kluge, but I think this is
> really taking the kluge level a step too far.  It's only because AS
> is used in so few contexts that this can even pretend to work --- but
> there are still an awful lot of contexts where AS is used, and will
> likely be more in the future.  So I think it's pretty un-future-proof;
> and it certainly won't scale to any other contexts where we might wish
> that plpgsql variables don't get substituted.
>
> It's probably time to bite the bullet and redo the parser as has been
> suggested in the past, ie fix things so that the main parser is used.
> Ideally I'd like to switch the name resolution priority to be more
> Oracle-like, but even if we don't do that it would be a great
> improvement to have actual syntactic knowledge behind the lookups.

I have a fast hack based on a callback function from transformExpr -
it replaces unknown ColumnRef nodes with Param nodes. It works well,
but I have two notes:

a) when we use the main parser for identification, the valid_sql flag
has a different sense. We have to validate all SQL statements - so we
have to change some logic, or we have to develop levels of validation.
This has one big advantage - we are able to do complete validation.
Disadvantage - the check is necessary before every first run.

b) Changing the priority (SQL identifiers > plpgsql identifiers) will
have two impacts:

** buggy applications will behave differently
** some table alterations could change a function's behaviour - so the
minimum is resetting the plpgsql cache.

postgres=# create or replace function test(a int) returns int as $$
declare x int; y int; begin return a+20 as a; end; $$ language
plpgsql;
CREATE FUNCTION
Time: 2,485 ms
postgres=# select test(20);
 test
------
   40
(1 row)

Time: 1,473 ms
postgres=# create or replace function test(a int) returns int as $$
declare x int; y int; begin return a+20 as a from a; end; $$ language
plpgsql;
ERROR:  relation "a" does not exist
LINE 1: SELECT  a+20 as a from a
                               ^   (the pointer is correct - view it in
                                    a non-proportional font)
QUERY:  SELECT  a+20 as a from a
CONTEXT:  SQL statement in PL/PgSQL function "test" near line 1

The attached patch is VERY UGLY, but it should show a possible direction.
I think something similar should be in 8.5.

Notes? Comments?

Regards
Pavel Stehule





> Just for the record, you'd have to put the same kluge into the T_RECORD
> and T_ROW cases if we wanted to do it like this.
>
>                        regards, tom lane


*** ./src/backend/parser/parse_expr.c.orig	2009-05-22 17:12:52.0 +0200
--- ./src/backend/parser/parse_expr.c	2009-05-22 17:44:54.0 +0200
***************
*** 458,466 ****
  	if (refnameRangeTblEntry(pstate, NULL, name1,
  							 cref->location,
  							 &levels_up) != NULL)
  		node = transformWholeRowRef(pstate, NULL, name1,
  									cref->location);
! 	else
  		ereport(ERROR,
  				(errcode(ERRCODE_UNDEFINED_COLUMN),
  				 errmsg("column \"%s\" does not exist",
--- 458,473 ----
  	if (refnameRangeTblEntry(pstate, NULL, name1,
  							 cref->location,
  							 &levels_up) != NULL)
+ 	{
  		node = transformWholeRowRef(pstate, NULL, name1,
  									cref->location);
! 		break;
! 	}
! 	
! 	if (pstate->p_external_var_hook)
! 		node = (pstate->p_external_var_hook)(pstate, name1, cref->location);
! 	
! 	if (node == NULL)
  		ereport(ERROR,
  				(errcode(ERRCODE_UNDEFINED_COLUMN),
  				 errmsg("column \"%s\" does not exist",
*** ./src/include/parser/parse_node.h.orig	2009-05-22 17:33:37.0 +0200
--- ./src/include/parser/parse_node.h	2009-05-22 17:48:12.0 +0200
***************
*** 17,22 ****
--- 17,25 ----
  #include "nodes/parsenodes.h"
  #include "utils/relcache.h"
  
+ /* Hook for PL variable detection and substitution */
+ typedef Node *(*ParseExprExternalVariables_hook_type) (void *pstate, char *name, int location);
+ 
  /*
   * State information used during parse analysis
   *
***************
*** 102,107 ****
--- 105,111 ----
  	bool		p_is_update;
  	Relation	p_target_relation;
  	RangeTblEntry *p_target_rangetblentry;
+ 	ParseExprExternalVariables_hook_type p_external_var_hook;
  } ParseState;
  
  /* Support for parser_errposition_callback function */
*** ./src/pl/plpgsql/src/gram.y.orig	2009-05-22 17:53:12.0 +0200
--- ./src/pl/plpgsql/src/gram.y	2009-05-22 21:14:03.0 +0200
***************
*** 18,23 ****
--- 18,25 ----
  
  #include 

[HACKERS] integer overflow in reloption.h

2009-05-22 Thread Zdenek Kotala
When I compile PostgreSQL now I get the following message:

"../../../src/include/access/reloptions.h", line 45: warning: integer
overflow detected: op "<<"


The problem is on the following lines:

typedef enum relopt_kind
{
...
RELOPT_KIND_MAX = (1 << 31)
}

An enum is of int type, and 1 << 31 == -2147483648 (the shift overflows
a signed 32-bit int). That is why the compiler (Sun Studio) complains.

Would it be possible to change it to 1 << 30 to stop the compiler from
generating this noise?
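
A minimal standalone sketch (hypothetical code, not from the PostgreSQL
tree) showing why the warning fires and why shifting one bit less
avoids it:

  #include <stdio.h>

  /* 1 << 31 shifts into the sign bit of a 32-bit int, so the result
   * overflows the signed type and compilers warn about it. */
  typedef enum demo_kind
  {
      DEMO_KIND_A   = (1 << 0),
      DEMO_KIND_B   = (1 << 1),
      /* DEMO_KIND_MAX = (1 << 31),   draws the overflow warning */
      DEMO_KIND_MAX = (1 << 30)       /* highest bit usable safely */
  } demo_kind;

  int main(void)
  {
      printf("%d\n", DEMO_KIND_MAX);  /* prints 1073741824 */
      return 0;
  }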

Thanks Zdenek





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Revisiting default_statistics_target

2009-05-22 Thread Greg Smith

On Fri, 22 May 2009, Greg Sabino Mullane wrote:


>> The bump from 10 to 100 was supported by microbenchmarks that suggested
>> it would be tolerable.
>
> No, the 10 to 100 was supported by years of people working in the field
> who routinely did that adjustment (and >100) and saw great gains.


No one is suggesting the increase isn't important to people running many 
common workloads.  The question the smaller benchmarks tried to answer is 
whether it was likely to detune anything else as a penalty for improving 
that situation.  The comments you made here can get turned right around at 
you:  if increasing the value in the field is sufficient to help out those 
that need it, why should the project at large accept any significant 
penalty that could apply to everyone just to help that subset?


Would you be happy with 8.4 going out the door if there really turns out 
to be a 15% penalty for other use cases by this change?  That's a PR 
nightmare waiting to happen, and the main reason I wanted to bring this up 
here with some additional details as soon as Jignesh's slides went 
public--so everyone here is aware of what's going on before this bit of 
news gets picked up anywhere else.


Hopefully whatever is happening to dbt2 will turn out to be a quirk not
worth worrying about.  But what if it turns out to be repeatable and
expected to impact people in the field?  I hope you'd recognize that
your use case is no more privileged to trump other people's than the
DW-friendly changes you were just criticizing, which are good for DW
users but not for anyone else.


Anyway, thanks to Stephen for concisely clarifying the position I was 
trying to present here, which is quite different from the one you were 
arguing against.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)

2009-05-22 Thread Zdenek Kotala
The attached patch cleans up the hash index headers to allow compiling
hasham for the 8.3 version. It helps give pg_migrator the capability to
migrate a database with hash indexes without reindexing.

I discussed this patch a year ago with Alvaro when we tried to clean up
the include-bloat problem. It should also reduce the number of includes.

The main point is that the hash functions for datatypes now live in the
related data files in the utils/adt directory; hash_any() and
hash_uint32() are now in utils/hashfunc.c.

It would be nice to have this in 8.4 because it allows testing the index
migration functionality.

Thanks Zdenek

diff -Nrc pgsql_indexcompat.5d4d60e3a557/src/backend/access/hash/hashfunc.c pgsql_indexcompat/src/backend/access/hash/hashfunc.c
*** pgsql_indexcompat.5d4d60e3a557/src/backend/access/hash/hashfunc.c	2009-05-22 15:56:34.409314434 -0400
--- pgsql_indexcompat/src/backend/access/hash/hashfunc.c	1969-12-31 19:00:00.0 -0500
***************
*** 1,528 ****
- /*-
-  *
-  * hashfunc.c
-  *	  Support functions for hash access method.
-  *
-  * Portions Copyright (c) 1996-2009, PostgreSQL Global Development Group
-  * Portions Copyright (c) 1994, Regents of the University of California
-  *
-  *
-  * IDENTIFICATION
-  *	  $PostgreSQL: pgsql/src/backend/access/hash/hashfunc.c,v 1.57 2009/01/01 17:23:35 momjian Exp $
-  *
-  * NOTES
-  *	  These functions are stored in pg_amproc.	For each operator class
-  *	  defined for hash indexes, they compute the hash value of the argument.
-  *
-  *	  Additional hash functions appear in /utils/adt/ files for various
-  *	  specialized datatypes.
-  *
-  *	  It is expected that every bit of a hash function's 32-bit result is
-  *	  as random as every other; failure to ensure this is likely to lead
-  *	  to poor performance of hash joins, for example.  In most cases a hash
-  *	  function should use hash_any() or its variant hash_uint32().
-  *-
-  */
- 
- #include "postgres.h"
- 
- #include "access/hash.h"
- 
- 
- /* Note: this is used for both char and boolean datatypes */
- Datum
- hashchar(PG_FUNCTION_ARGS)
- {
- 	return hash_uint32((int32) PG_GETARG_CHAR(0));
- }
- 
- Datum
- hashint2(PG_FUNCTION_ARGS)
- {
- 	return hash_uint32((int32) PG_GETARG_INT16(0));
- }
- 
- Datum
- hashint4(PG_FUNCTION_ARGS)
- {
- 	return hash_uint32(PG_GETARG_INT32(0));
- }
- 
- Datum
- hashint8(PG_FUNCTION_ARGS)
- {
- 	/*
- 	 * The idea here is to produce a hash value compatible with the values
- 	 * produced by hashint4 and hashint2 for logically equal inputs; this is
- 	 * necessary to support cross-type hash joins across these input types.
- 	 * Since all three types are signed, we can xor the high half of the int8
- 	 * value if the sign is positive, or the complement of the high half when
- 	 * the sign is negative.
- 	 */
- #ifndef INT64_IS_BUSTED
- 	int64		val = PG_GETARG_INT64(0);
- 	uint32		lohalf = (uint32) val;
- 	uint32		hihalf = (uint32) (val >> 32);
- 
- 	lohalf ^= (val >= 0) ? hihalf : ~hihalf;
- 
- 	return hash_uint32(lohalf);
- #else
- 	/* here if we can't count on x >> 32 to work sanely */
- 	return hash_uint32((int32) PG_GETARG_INT64(0));
- #endif
- }
- 
- Datum
- hashoid(PG_FUNCTION_ARGS)
- {
- 	return hash_uint32((uint32) PG_GETARG_OID(0));
- }
- 
- Datum
- hashenum(PG_FUNCTION_ARGS)
- {
- 	return hash_uint32((uint32) PG_GETARG_OID(0));
- }
- 
- Datum
- hashfloat4(PG_FUNCTION_ARGS)
- {
- 	float4		key = PG_GETARG_FLOAT4(0);
- 	float8		key8;
- 
- 	/*
- 	 * On IEEE-float machines, minus zero and zero have different bit patterns
- 	 * but should compare as equal.  We must ensure that they have the same
- 	 * hash value, which is most reliably done this way:
- 	 */
- 	if (key == (float4) 0)
- 		PG_RETURN_UINT32(0);
- 
- 	/*
- 	 * To support cross-type hashing of float8 and float4, we want to return
- 	 * the same hash value hashfloat8 would produce for an equal float8 value.
- 	 * So, widen the value to float8 and hash that.  (We must do this rather
- 	 * than have hashfloat8 try to narrow its value to float4; that could fail
- 	 * on overflow.)
- 	 */
- 	key8 = key;
- 
- 	return hash_any((unsigned char *) &key8, sizeof(key8));
- }
- 
- Datum
- hashfloat8(PG_FUNCTION_ARGS)
- {
- 	float8		key = PG_GETARG_FLOAT8(0);
- 
- 	/*
- 	 * On IEEE-float machines, minus zero and zero have different bit patterns
- 	 * but should compare as equal.  We must ensure that they have the same
- 	 * hash value, which is most reliably done this way:
- 	 */
- 	if (key == (float8) 0)
- 		PG_RETURN_UINT32(0);
- 
- 	return hash_any((unsigned char *) &key, sizeof(key));
- }
- 
- Datum
- hashoidvector(PG_FUNCTION_ARGS)
- {
- 	oidvector  *key = (oidvector *) PG_GETARG_POINTER(0);
- 
- 	return hash_any((unsigned char *) key->values, key->dim1 * sizeof(Oid));
- }
- 
- Datum
- hashint2vector(PG_FUNCTION_ARGS)
- {
- 	int2vector *key = (int2vector *) PG_GETARG_POINTER(0);
- 

Re: [HACKERS] Fast ALTER TABLE ... ADD COLUMN ... DEFAULT xxx?

2009-05-22 Thread Dmitry Koterov
On Thu, May 21, 2009 at 6:45 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Sam Mason <s...@samason.me.uk> writes:
>> On Thu, May 21, 2009 at 12:06:29PM +0400, Dmitry Koterov wrote:
>>> ALTER TABLE ... ADD COLUMN ... NULL;
>>>
>>> (nullable without a default value). This is because of NULL bitmap in
>>> tuples. And it's greatest feature for a developer!
>
>> I don't think this is because of the NULL bitmap.
>
> No, it isn't.  It's because each tuple includes the actual count of
> fields it contains (t_natts or HeapTupleHeaderGetNatts), and the value
> extraction routines are coded to assume that references to fields
> beyond that number should yield NULL.  So the ALTER can just leave
> the existing rows alone --- only when you update a row will it change
> to include the newly added field(s).

No, I meant that in case of the row (1, NULL, NULL, 2, 3, NULL):
- the corresponding NULL bitmap is (100110...)
- the stored non-NULL values are (1, 2, 3) - three values

And in case of the row (5, 6, NULL, 7, 8, 9):
- the corresponding NULL bitmap is (110111...)
- the stored non-NULL values are (5, 6, 7, 8, 9) - five values

So, by t_natts alone, without a NULL bitmap, we could not handle this
kind of row at all. The NULL bitmap plays a very important role in
saving tuple space, and that was exactly my point.


Re: [HACKERS] Revisiting default_statistics_target

2009-05-22 Thread Greg Stark



On 22 May 2009, at 16:17, Greg Smith <gsm...@gregsmith.com> wrote:
> On Fri, 22 May 2009, Greg Sabino Mullane wrote:
>>> The bump from 10 to 100 was supported by microbenchmarks that
>>> suggested it would be tolerable.
>>
>> No, the 10 to 100 was supported by years of people working in the
>> field who routinely did that adjustment (and >100) and saw great gains.
>
> No one is suggesting the increase isn't important to people running
> many common workloads.  The question the smaller benchmarks tried to
> answer is whether it was likely to detune anything else as a penalty
> for improving that situation.


As Tom implied, there are two possible problems, and the response would
be different depending on what's going on.

If the plan changed due to the added stats and the new plan is worse,
then that's a problem, but lowering the stats target would just be
papering over it. Ideally, having better stats should never cause a
worse plan.

If on the other hand it turns out that planning the queries is taking
15% longer, well, then rerunning the microbenchmark using these queries
might give us a better default.

Or maybe we would just find the bottlenecks and fix them ...




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] Compiler warning cleanup

2009-05-22 Thread Zdenek Kotala
The attached patch fixes the following Sun Studio compiler warnings:

tsquery_op.c, line 193: warning: syntax error:  empty declaration
tsquery_op.c, line 194: warning: syntax error:  empty declaration
tsquery_op.c, line 195: warning: syntax error:  empty declaration
tsquery_op.c, line 196: warning: syntax error:  empty declaration
tsquery_op.c, line 197: warning: syntax error:  empty declaration
tsquery_op.c, line 198: warning: syntax error:  empty declaration
tsvector_op.c, line 177: warning: syntax error:  empty declaration
tsvector_op.c, line 178: warning: syntax error:  empty declaration
tsvector_op.c, line 179: warning: syntax error:  empty declaration
tsvector_op.c, line 180: warning: syntax error:  empty declaration
tsvector_op.c, line 181: warning: syntax error:  empty declaration
tsvector_op.c, line 182: warning: syntax error:  empty declaration
tsvector_op.c, line 183: warning: syntax error:  empty declaration
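
For context: the warnings come from the stray semicolon after each macro
invocation - the macro already expands to a complete function definition,
so the trailing ';' is an empty top-level declaration. A minimal
illustration (hypothetical macro, not the PostgreSQL one):

  /* The macro expands to a full function definition ending in '}'. */
  #define MAKE_IDENTITY(name) \
      static int name(int x) { return x; }

  MAKE_IDENTITY(ident_a)    /* fine: nothing follows the '}' */
  MAKE_IDENTITY(ident_b);   /* the ';' is an empty declaration at file
                             * scope - some compilers warn about it */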


Thanks Zdenek
diff -Nrc pgsql.orig.5d4d60e3a557/src/backend/utils/adt/tsquery_op.c pgsql.orig/src/backend/utils/adt/tsquery_op.c
*** pgsql.orig.5d4d60e3a557/src/backend/utils/adt/tsquery_op.c	2009-05-22 17:00:22.081796110 -0400
--- pgsql.orig/src/backend/utils/adt/tsquery_op.c	2009-05-22 17:00:22.086155307 -0400
***************
*** 190,201 ****
  	PG_RETURN_BOOL( CONDITION );\
  }
  
! CMPFUNC(tsquery_lt, res < 0);
! CMPFUNC(tsquery_le, res <= 0);
! CMPFUNC(tsquery_eq, res == 0);
! CMPFUNC(tsquery_ge, res >= 0);
! CMPFUNC(tsquery_gt, res > 0);
! CMPFUNC(tsquery_ne, res != 0);
  
  TSQuerySign
  makeTSQuerySign(TSQuery a)
--- 190,201 ----
  	PG_RETURN_BOOL( CONDITION );\
  }
  
! CMPFUNC(tsquery_lt, res < 0)
! CMPFUNC(tsquery_le, res <= 0)
! CMPFUNC(tsquery_eq, res == 0)
! CMPFUNC(tsquery_ge, res >= 0)
! CMPFUNC(tsquery_gt, res > 0)
! CMPFUNC(tsquery_ne, res != 0)
  
  TSQuerySign
  makeTSQuerySign(TSQuery a)
diff -Nrc pgsql.orig.5d4d60e3a557/src/backend/utils/adt/tsvector_op.c pgsql.orig/src/backend/utils/adt/tsvector_op.c
*** pgsql.orig.5d4d60e3a557/src/backend/utils/adt/tsvector_op.c	2009-05-22 17:00:22.083116270 -0400
--- pgsql.orig/src/backend/utils/adt/tsvector_op.c	2009-05-22 17:00:22.086348815 -0400
***************
*** 174,186 ****
  	PG_RETURN_##ret( res action 0 );	\
  }
  
! TSVECTORCMPFUNC(lt, <, BOOL);
! TSVECTORCMPFUNC(le, <=, BOOL);
! TSVECTORCMPFUNC(eq, ==, BOOL);
! TSVECTORCMPFUNC(ge, >=, BOOL);
! TSVECTORCMPFUNC(gt, >, BOOL);
! TSVECTORCMPFUNC(ne, !=, BOOL);
! TSVECTORCMPFUNC(cmp, +, INT32);
  
  Datum
  tsvector_strip(PG_FUNCTION_ARGS)
--- 174,186 ----
  	PG_RETURN_##ret( res action 0 );	\
  }
  
! TSVECTORCMPFUNC(lt, <, BOOL)
! TSVECTORCMPFUNC(le, <=, BOOL)
! TSVECTORCMPFUNC(eq, ==, BOOL)
! TSVECTORCMPFUNC(ge, >=, BOOL)
! TSVECTORCMPFUNC(gt, >, BOOL)
! TSVECTORCMPFUNC(ne, !=, BOOL)
! TSVECTORCMPFUNC(cmp, +, INT32)
  
  Datum
  tsvector_strip(PG_FUNCTION_ARGS)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] question about pg_proc

2009-05-22 Thread Gevik Babakhani
Is there a historical reason why we have both a proargtypes and a
proallargtypes column?

It seems that proallargtypes is only set when OUT parameters exist.
Could someone clarify this, please?

Regards,
Gevik.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFD: Discarded tuple count for SeqScan nodes in EXPLAIN ANALYZE

2009-05-22 Thread Andres Freund

Hi,

On 05/22/2009 05:54 PM, Tom Lane wrote:

> This sort of ties into the discussions we have periodically about
> allowing EXPLAIN to output XML or some other more-machine-friendly
> data format.  The barrier for adding additional output fields would
> be a lot lower in such a format.

So the best thing would be to work on that front...

Tom (Raney), did you do further work on your XML explain patch? Could
you use some help?



Greetings,

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] question about pg_proc

2009-05-22 Thread Pavel Stehule
2009/5/23 Gevik Babakhani <pg...@xs4all.nl>:
> Is there a historical reason why we have both a proargtypes and a
> proallargtypes column?
>
> It seems that proallargtypes is only set when OUT parameters exist.
> Could someone clarify this, please?

proargtypes is used for fast, simple function lookup in the
transformFunc stage; OUT variables are excluded from that search.

proallargtypes is used for building the OUT variables.

Look at: FuncnameGetCandidates (namespace.c) and
do_compile (pl_comp.c)
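
For illustration, a quick sketch showing the difference (the function is
hypothetical):

  -- A function with one IN and one OUT parameter:
  CREATE FUNCTION add_one(a int, OUT b int) AS
    'SELECT $1 + 1' LANGUAGE sql;

  -- proargtypes lists only the input types; proallargtypes covers all
  -- parameters (and is NULL when no OUT parameters exist):
  SELECT proargtypes, proallargtypes, proargmodes
  FROM pg_proc WHERE proname = 'add_one';
  --  proargtypes | proallargtypes | proargmodes
  -- -------------+----------------+-------------
  --  23          | {23,23}        | {i,o}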

Regards
Pavel Stehule


> Regards,
> Gevik.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers