Re: [HACKERS] sorted writes for checkpoints

2010-10-28 Thread Itagaki Takahiro
On Fri, Oct 29, 2010 at 3:23 PM, Heikki Linnakangas
 wrote:
> Simon's argument in the thread that the todo item points to
> (http://archives.postgresql.org/pgsql-patches/2008-07/msg00123.php) is
> basically that we don't know what the best algorithm is yet and benchmarking
> is a lot of work, so let's just let people do whatever they feel like until
> we settle on the best approach. I think we need to bite the bullet and do
> some benchmarking, and commit one carefully vetted patch to the backend.

When I submitted the patch, I tested it on a disk-based RAID-5 machine:
http://archives.postgresql.org/pgsql-hackers/2007-06/msg00541.php
But there were no additional benchmarking reports at that time. We still
need benchmarking before we re-examine the feature. For example, SSDs and
SSD RAID arrays were not popular back then, but now they are worth considering.

I think patching the core directly is enough for the first round of
testing, and we can decide on the interface according to the
results. If one algorithm wins in all cases, we could just
include it in the core, and then extensibility would not be needed.

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sorted writes for checkpoints

2010-10-28 Thread Heikki Linnakangas

On 29.10.2010 06:00, Jeff Janes wrote:

One of the items on the Wiki ToDo list is sorted writes for
checkpoints.  The consensus seemed to be that this should be done by
adding hook(s) into the main code, and then a contrib module to work
with those hooks.  Is there an existing contrib module that one could
best look to for inspiration on how to go about doing this?  I have
the sorted checkpoint working under a guc, but don't know where to
start on converting it to a contrib module instead.


I don't think it's a good idea to have this as a hook. Bgwriter 
shouldn't need to load external code, and checkpoint robustness shouldn't 
depend on user-written code. IIRC Tom Lane didn't even like pallocing 
the memory for the list of dirty pages at checkpoint time because that 
might cause an out-of-memory error. Calling a function in a contrib 
module is much, much worse.


Simon's argument in the thread that the todo item points to 
(http://archives.postgresql.org/pgsql-patches/2008-07/msg00123.php) is 
basically that we don't know what the best algorithm is yet and 
benchmarking is a lot of work, so let's just let people do whatever they 
feel like until we settle on the best approach. I think we need to bite 
the bullet and do some benchmarking, and commit one carefully vetted 
patch to the backend.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-10-28 Thread Tom Lane
I wrote:
> the right way to make this faster is to refactor things so that we
> don't generate useless equivalence classes in the first place, or
> at least don't keep them around in the planner's lists once we realize
> they're useless.

After a bit of hacking, I propose the attached patch.

> I like Heikki's hack to cut down on searching in make_canonical_pathkey,
> but I think that complicating the data structure searching beyond that
> is just a band-aid.

With the given test case and this patch, we end up with exactly two
canonical pathkeys referencing a single EquivalenceClass.  So as far
as I can tell there's not a lot of point in refining the pathkey
searching.  Now, the EquivalenceClass has got 483 members, which
means that there's still some O(N^2) behavior in
get_eclass_for_sort_expr.  There might be some use in refining the
search for a matching eclass member.  It's not sticking out in
profiling like it did before though.

regards, tom lane

diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index d6402cf911817b1b8c17da91019a1fac19bf051a..5c0786f2fe6dea9a72ad66ba93aa8833ab0e26ba 100644
*** a/src/backend/optimizer/README
--- b/src/backend/optimizer/README
*** sort ordering was important; and so usin
*** 632,640 
  orderings doesn't create any real problem.
  
  
  
- Though Bob Devine  was not involved in the 
- coding of our optimizer, he is available to field questions about
- optimizer topics.
  
  -- bjm & tgl
--- 632,670 
  orderings doesn't create any real problem.
  
  
+ Order of processing for EquivalenceClasses and PathKeys
+ -------------------------------------------------------
+ 
+ As alluded to above, there is a specific sequence of phases in the
+ processing of EquivalenceClasses and PathKeys during planning.  During the
+ initial scanning of the query's quals (deconstruct_jointree followed by
+ reconsider_outer_join_clauses), we construct EquivalenceClasses based on
+ mergejoinable clauses found in the quals.  At the end of this process,
+ we know all we can know about equivalence of different variables, so
+ subsequently there will be no further merging of EquivalenceClasses.
+ At that point it is possible to consider the EquivalenceClasses as
+ "canonical" and build canonical PathKeys that reference them.  Before
+ we reach that point (actually, before entering query_planner at all)
+ we also ensure that we have constructed EquivalenceClasses for all the
+ expressions used in the query's ORDER BY and related clauses.  These
+ classes might or might not get merged together, depending on what we
+ find in the quals.
+ 
+ Because all the EquivalenceClasses are known before we begin path
+ generation, we can use them as a guide to which indexes are of interest:
+ if an index's column is not mentioned in any EquivalenceClass then that
+ index's sort order cannot possibly be helpful for the query.  This allows
+ short-circuiting of much of the processing of create_index_paths() for
+ irrelevant indexes.
+ 
+ There are some cases where planner.c constructs additional
+ EquivalenceClasses and PathKeys after query_planner has completed.
+ In these cases, the extra ECs/PKs are needed to represent sort orders
+ that were not considered during query_planner.  Such situations should be
+ minimized since it is impossible for query_planner to return a plan
+ producing such a sort order, meaning an explicit sort will always be needed.
+ Currently this happens only for queries involving multiple window functions
+ with different orderings, so extra sorts are needed anyway.
  
  
  -- bjm & tgl
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index e44e960b5454d4698ed82e4e857794ffe2a9adf2..c101c272a14b2f1b9d92a54670688df057d84a13 100644
*** a/src/backend/optimizer/path/equivclass.c
--- b/src/backend/optimizer/path/equivclass.c
*** static bool reconsider_full_join_clause(
*** 78,83 
--- 78,87 
   * join.  (This is the reason why we need a failure return.  It's more
   * convenient to check this case here than at the call sites...)
   *
+  * On success return, we have also initialized the clause's left_ec/right_ec
+  * fields to point to the EquivalenceClass built from it.  This saves lookup
+  * effort later.
+  *
   * Note: constructing merged EquivalenceClasses is a standard UNION-FIND
   * problem, for which there exist better data structures than simple lists.
   * If this code ever proves to be a bottleneck then it could be sped up ---
*** process_equivalence(PlannerInfo *root, R
*** 106,111 
--- 110,119 
  			   *em2;
  	ListCell   *lc1;
  
+ 	/* Should not already be marked as having generated an eclass */
+ 	Assert(restrictinfo->left_ec == NULL);
+ 	Assert(restrictinfo->right_ec == NULL);
+ 
  	/* Extract info from given clause */
  	Assert(is_opclause(clause));
  	opno = ((OpExpr *) clause)->opno;
*** process_equivalence(Plann

Re: [HACKERS] add label to enum syntax

2010-10-28 Thread Pavel Golub
Hello, Alvaro.

You wrote:

AH> Excerpts from Pavel Golub's message of jue oct 28 07:50:24 -0300 2010:

>> Forgot link to poll:
>> http://pgolub.wordpress.com/2010/10/28/poll-alter-type-enumtype-add-what-newlabel/

AH> Hah, there are 17 votes as of right now, no option is below 23% and no
AH> option is above 29%.


Yeah, right now 42 votes:

 VALUE 26%
 LABEL 26%
 Just ADD 'newlabel' 24%
 ELEMENT 21%
 MEMBER 2%

-- 
With best wishes,
 Pavel  mailto:pa...@gf.microolap.com




Re: [HACKERS] sorted writes for checkpoints

2010-10-28 Thread Alvaro Herrera
Excerpts from Jeff Janes's message of vie oct 29 00:00:24 -0300 2010:
> One of the items on the Wiki ToDo list is sorted writes for
> checkpoints.  The consensus seemed to be that this should be done by
> adding hook(s) into the main code, and then a contrib module to work
> with those hooks.  Is there an existing contrib module that one could
> best look to for inspiration on how to go about doing this?  I have
> the sorted checkpoint working under a guc, but don't know where to
> start on converting it to a contrib module instead.

Hmm, see contrib/auto_explain?

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



[HACKERS] sorted writes for checkpoints

2010-10-28 Thread Jeff Janes
One of the items on the Wiki ToDo list is sorted writes for
checkpoints.  The consensus seemed to be that this should be done by
adding hook(s) into the main code, and then a contrib module to work
with those hooks.  Is there an existing contrib module that one could
best look to for inspiration on how to go about doing this?  I have
the sorted checkpoint working under a guc, but don't know where to
start on converting it to a contrib module instead.

Cheers,

Jeff



Re: [HACKERS] max_wal_senders must die

2010-10-28 Thread Joshua D. Drake
On Thu, 2010-10-28 at 17:12 -0700, Josh Berkus wrote:
> > Sorry, didn't know... I have 122 responses so far, which I think will be
> > surprising (some of them certainly surprised me). I will keep it open
> > until next week and then post the results.
> 
> Well, for any community site poll, I hope you realize that there's a LOT
> of sampling error.  Here's another one:
> 
> http://www.postgresql.org/community/survey.71
> 

Oh sure. I don't expect this to be some kind of authoritative reference
but it is certainly worth at least reviewing. If nothing else it is fun
to see the responses and consider their meaning based on your own views.

JD

-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt




Re: [HACKERS] max_wal_senders must die

2010-10-28 Thread Josh Berkus

> Sorry, didn't know... I have 122 responses so far, which I think will be
> surprising (some of them certainly surprised me). I will keep it open
> until next week and then post the results.

Well, for any community site poll, I hope you realize that there's a LOT
of sampling error.  Here's another one:

http://www.postgresql.org/community/survey.71

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



[HACKERS] Buildfarm server upgrade

2010-10-28 Thread Andrew Dunstan


The buildfarm server is moving. The web site has already moved, and the 
database will be moving on Tuesday Nov 9th, starting at 9.00 am EST (or, 
if you prefer, at 14:00 UTC). The server will be unavailable during the 
outage, which is expected to take no longer than 6 hours. (We're moving 
from 8.0 to 9.0, so it will be an old-fashioned dump and reload.)


Any buildfarm owners are advised to disable any scheduled jobs during 
that period. I will post notices shortly before shutting the server down 
and as soon as it is back up. It would probably be a good thing not to 
make any commits in the hours leading up to the outage.


cheers

andrew



Re: [HACKERS] max_wal_senders must die

2010-10-28 Thread Joshua D. Drake
On Thu, 2010-10-28 at 16:25 -0700, Josh Berkus wrote:
> > https://www.postgresqlconference.org/content/replication-poll
> > 
> > You don't have to login to take it but of course it helps with validity
> > of results.
> 
> Oh, I'd already put something up on http://www.postgresql.org/community

Sorry, didn't know... I have 122 responses so far, which I think will be
surprising (some of them certainly surprised me). I will keep it open
until next week and then post the results.

JD

-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt




Re: [HACKERS] max_wal_senders must die

2010-10-28 Thread Josh Berkus

> https://www.postgresqlconference.org/content/replication-poll
> 
> You don't have to login to take it but of course it helps with validity
> of results.

Oh, I'd already put something up on http://www.postgresql.org/community

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] Keywords in pg_hba.conf should be field-specific

2010-10-28 Thread Brendan Jurd
On 18 October 2010 01:19, Tom Lane  wrote:
> Brendan Jurd  writes:
>> On 17 October 2010 09:59, Tom Lane  wrote:
>>> Good point.  Maybe the correct fix is to remember whether each token was
>>> quoted or not, so that keyword detection can be done safely after the
>>> initial lexing.  I still think that the current method is impossibly
>>> ugly ...
>
>> I'm happy to revise the patch on that basis.  Any suggestions about
>> how to communicate the 'quotedness' of each token?  We could make each
>> token a struct consisting of the token itself, plus a boolean flag to
>> indicate whether it had been quoted.  Does that work for you?
>
> Seems reasonable.  I had the idea of a parallel list of booleans in the
> back of my mind, but a list of structs is probably easier to understand,
> and to extend further if necessary.
>

Okay, I've taken the red pill and I'm finding out how deep the rabbit
hole goes ...

The logical structure of pg_hba.conf is a set of lines, each line
containing a set of fields, each field containing a set of tokens.
The way the existing implementation handles this is to create a list
of lines containing sublists of fields, containing comma-separated
strings for the set of tokens, with newlines embedded next to tokens
which might be keywords.

The tokeniser breaks apart the comma-separated tokens ... and then
reassembles them into a comma-separated string.  Which the db/role
matching functions then have to break apart *again*.

In order to keep track of whether each individual token was quoted, I
first need to impose some sanity here.  Rather than using a magical
string for each field, I intend to use a List of HbaToken structs
which explicitly note whether quoting was used.

Introducing an extra List level does mean a bit more work copying and
freeing, and it makes the patch really quite intrusive.  I have to
touch a lot of lines in hba.c, but I think the additional clarity is
worth it.  If nobody dissuades me from this approach I hope to post a
patch in a couple of days.

Cheers,
BJ



Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-10-28 Thread Tom Lane
Boszormenyi Zoltan  writes:
> Tom Lane írta:
>> That seems like a ridiculously large number of ECs.  What is the
>> test query again?

> The test case is here:
> http://archives.postgresql.org/message-id/4cbd9ddc.4040...@cybertec.at

After poking through that a bit, I think that the real issue is in this
division of labor:

index_pathkeys = build_index_pathkeys(root, index,
  ForwardScanDirection);
useful_pathkeys = truncate_useless_pathkeys(root, rel,
index_pathkeys);

If you trace what is happening here, the index pathkeys that actually
survive the "usefulness" test all refer to exactly ONE equivalence
class, namely the one arising from the query's "order by timestamp2"
clause.  All the other pathkeys that get created are immediately
discarded as being irrelevant to the query.  The reason that we end up
with so many equivalence classes is that there is nothing causing the
variables of the different child tables to be recognized as all
sort-equivalent.  Maybe that's a bug in itself, but I would argue that
the right way to make this faster is to refactor things so that we
don't generate useless equivalence classes in the first place, or
at least don't keep them around in the planner's lists once we realize
they're useless.

I like Heikki's hack to cut down on searching in make_canonical_pathkey,
but I think that complicating the data structure searching beyond that
is just a band-aid.  Reasonably-sized queries shouldn't contain very
many equivalence classes: they should only come from equality clauses
or sort conditions that appeared in the query text.  Therefore, there
also shouldn't be all that many distinct pathkeys.

regards, tom lane



Re: [HACKERS] contsel and gist

2010-10-28 Thread Tom Lane
Ben  writes:
> On Oct 28, 2010, at 10:50 AM, Tom Lane wrote:
>> However, having said that: the constant value of the stub contsel
>> function is intended to be small enough to encourage use of an
>> indexscan.  Maybe we just need to decrease it a bit more.  Have you
>> investigated what the cutover point is for your queries?

> i'd be happy to investigate this for you, but my guess is my dataset
> is probably not a good example to use for setting the constant more
> generally.  i'm joining an 8e10 table vs a 150K table, so the
> selectivity fraction would probably have to drop by many orders of
> magnitude.

I doubt it.

> that being said, i'll poke around and see if i can find the cutoff point.  is 
> there an easy way to do this that doesn't involve recompiling postgres?

No, those are just hardwired constants.  If you wanted to avoid multiple
recompilations while experimenting, you could set up a custom GUC
variable for the functions to read...

regards, tom lane



Re: [HACKERS] max_wal_senders must die

2010-10-28 Thread Joshua D. Drake
On Thu, 2010-10-28 at 07:05 -0500, Kevin Grittner wrote:
> "Joshua D. Drake"  wrote:
> > On Wed, 2010-10-27 at 19:52 -0400, Robert Haas wrote:
> >> Josh Berkus  wrote:
> >  
> >>> *you don't know* how many .org users plan to implement
> >>> replication, whether it's a minority or majority.
> >>
> >> None of us know. What I do know is that I don't want PostgreSQL to
> >> be slower out of the box.
> > 
> > Poll TIME!
>  
> If you do take a poll, be careful to put in an option or two to deal
> with environments where there is "surgical" implementation of
> replication features.  

And Poll it is:

https://www.postgresqlconference.org/content/replication-poll

You don't have to login to take it but of course it helps with validity
of results.

Sincerely,

Joshua D. Drake

-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt




Re: [HACKERS] psql autocompletion for \z and \dg

2010-10-28 Thread Peter Eisentraut
On ons, 2010-10-20 at 22:56 -0400, Josh Kupershmidt wrote:
> It looks like psql's tab completion for the \z and \dg commands in
> psql are missing. I couldn't see a reason for this, so attached patch
> fixes.
> 
> Also, this patch proposes to change psql's "\?" help text to say that
> \dg and \du are the same, since AFAICT they do exactly the same thing.

Committed with slight variations.  I didn't think documenting \du as
"same as \dg" added much value, so I just made them both "list roles".




Re: [HACKERS] contsel and gist

2010-10-28 Thread Ben
thanks for the prompt reply.

On Oct 28, 2010, at 10:50 AM, Tom Lane wrote:

>> 1 - am i wrong in my assessment?  is the constant contsel, areasel, etc 
>> hurting us?
> 
> The stub selectivity functions definitely suck.

i'm taking this as implying that my intuition here is basically right.

>> 2 - how hard would it be to implement contsel et al for period data types?
> 
> If it were easy, it'd likely have been done already :-(

i am interested in learning more about this, in hopes that it might be possible 
for me to do it some day.  do you have any pointers as far as things to look at 
to learn from?  i imagine this must be a problem for the postgis people too.

i guess the first step is to figure out what kind of statistics / histograms to 
collect for the period datatype.  (i don't see anything in pg_stats.)  has 
there been previous work / thinking on this?

> However, having said that: the constant value of the stub contsel
> function is intended to be small enough to encourage use of an
> indexscan.  Maybe we just need to decrease it a bit more.  Have you
> investigated what the cutover point is for your queries?

i'd be happy to investigate this for you, but my guess is my dataset is 
probably not a good example to use for setting the constant more generally.  
i'm joining an 8e10 table vs a 150K table, so the selectivity fraction would 
probably have to drop by many orders of magnitude.  that being said, i'll poke 
around and see if i can find the cutoff point.  is there an easy way to do this 
that doesn't involve recompiling postgres?

best regards, b


Re: [HACKERS] plperl arginfo

2010-10-28 Thread Andrew Dunstan



On 10/28/2010 02:11 PM, Garick Hamlin wrote:

On Thu, Oct 28, 2010 at 01:03:24PM -0400, Andrew Dunstan wrote:


On 10/28/2010 12:34 PM, Tom Lane wrote:

BTW, maybe we could have the best of both worlds?  Dunno about Perl,
but in some languages it would be possible to instantiate the hash
only if it's actually touched.  Passing the data as a hash definitely
seems to fit with the spirit of things otherwise, so as long as it
didn't cost cycles when not needed, I'd be in favor of that API.

Maybe, but I think that's getting rather beyond my perlguts-fu. I think
we'd need to do that via PERL_MAGIC_tied, but it's new territory for me.
Anyone else want to chime in?

Warning, I don't know the plperl, I am just a perl coder.

I do think all the anonymous arrays are worth worrying about in terms of
performance.

I don't think that tie is necessarily good for performance.  tie() is not
generally fast.  I think you'd likely be better off writing plain accessors
or using a function to add type info.

Use an accessor for type information, like this?
$ref->typeof($key)

...
or perhaps use a special function?

add_type_info(\%args);

...
or if you want attribute-based syntax sugar for the add_type_info() solution...

my %args : pg_record(add_type_info);

Again, these I don't know the plperl code, so I might be missing something
here.



This wouldn't be done at the perl level. It would be done in C code. Run 
"man perlguts" and search for "Understanding the Magic of Tied Hashes 
and Arrays". The overhead in setting it up is likely to be very low 
unless I'm not understanding correctly. There might be some price paid 
when accessing the object, but that's another affair.


OTOH, a pg_get_arg_info() function would probably be a substantially 
simpler if slightly less perlish solution.


cheers

andrew



Re: [HACKERS] plperl arginfo

2010-10-28 Thread Garick Hamlin
On Thu, Oct 28, 2010 at 01:03:24PM -0400, Andrew Dunstan wrote:
> 
> 
> On 10/28/2010 12:34 PM, Tom Lane wrote:
> >
> > BTW, maybe we could have the best of both worlds?  Dunno about Perl,
> > but in some languages it would be possible to instantiate the hash
> > only if it's actually touched.  Passing the data as a hash definitely
> > seems to fit with the spirit of things otherwise, so as long as it
> > didn't cost cycles when not needed, I'd be in favor of that API.
> 
> Maybe, but I think that's getting rather beyond my perlguts-fu. I think 
> we'd need to do that via PERL_MAGIC_tied, but it's new territory for me. 
> Anyone else want to chime in?

Warning, I don't know the plperl, I am just a perl coder.

I do think all the anonymous arrays are worth worrying about in terms of
performance.

I don't think that tie is necessarily good for performance.  tie() is not 
generally fast.  I think you'd likely be better off writing plain accessors 
or using a function to add type info.

Use an accessor for type information, like this?
$ref->typeof($key)

...
or perhaps use a special function?

add_type_info(\%args);

...
or if you want attribute-based syntax sugar for the add_type_info() solution...

my %args : pg_record(add_type_info);

Again, these I don't know the plperl code, so I might be missing something
here.

Garick


> 
> cheers
> 
> andrew
> 
> 
> 



Re: [HACKERS] contsel and gist

2010-10-28 Thread Tom Lane
Ben  writes:
> my guess is that it has to do with the selectivity of the @> operator.  i've 
> looked and noticed that the selectivity functions for @> and other period 
> operators are basically stubs, with constant selectivity.  my questions are :

> 1 - am i wrong in my assessment?  is the constant contsel, areasel, etc 
> hurting us?

The stub selectivity functions definitely suck.

> 2 - how hard would it be to implement contsel et al for period data types?

If it were easy, it'd likely have been done already :-(

However, having said that: the constant value of the stub contsel
function is intended to be small enough to encourage use of an
indexscan.  Maybe we just need to decrease it a bit more.  Have you
investigated what the cutover point is for your queries?

regards, tom lane



Re: [HACKERS] Composite Types and Function Parameters

2010-10-28 Thread David E. Wheeler
On Oct 28, 2010, at 9:31 AM, Andrew Dunstan wrote:

> Of course it's possible, but it's a different feature. As for "just as easy", 
> no, it's much more work. I agree it should be done, though.

I bet we could raise some money to fund its development. How much work are we 
talking about here?

Best,

David




Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-10-28 Thread Boszormenyi Zoltan
Tom Lane írta:
> Boszormenyi Zoltan  writes:
>   
>> This means add_child_rel_equivalences() still takes
>> too much time, the previously posted test case calls this
>> function 482 times, it's called for almost  every 10th entry
>> added to eq_classes. The elog() I put into this function says
>> that at the last call list_length(eq_classes) == 4754.
>> 
>
> That seems like a ridiculously large number of ECs.  What is the
> test query again?
>
>   regards, tom lane
>   

The test case is here:
http://archives.postgresql.org/message-id/4cbd9ddc.4040...@cybertec.at

create_table.sql for the main table plus childtables.sql.gz, the EXPLAIN
query is in the message body.

Basically, it's a model for a lot of data for three months, partitioned by
4 hour intervals for every day. Imagine the call list handled by a
phone company in a large country.

Best regards,
Zoltán Böszörményi

-- 
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/




Re: [HACKERS] plperl arginfo

2010-10-28 Thread Andrew Dunstan



On 10/28/2010 12:34 PM, Tom Lane wrote:


BTW, maybe we could have the best of both worlds?  Dunno about Perl,
but in some languages it would be possible to instantiate the hash
only if it's actually touched.  Passing the data as a hash definitely
seems to fit with the spirit of things otherwise, so as long as it
didn't cost cycles when not needed, I'd be in favor of that API.


Maybe, but I think that's getting rather beyond my perlguts-fu. I think 
we'd need to do that via PERL_MAGIC_tied, but it's new territory for me. 
Anyone else want to chime in?


cheers

andrew





[HACKERS] contsel and gist

2010-10-28 Thread Ben
hello --

i have a largish table (~8 billion rows) which makes use of the temporal period 
datatype and gist indexes.  i find that query plans are somewhat "unstable" in 
that it is often the case that slightly altering a query can result in a change 
from using the index (usually via a bitmap scan) to a sequential scan.  there 
is basically no situation where a sequential scan is the right plan for the 
kinds of queries we are doing -- the table is so large that any sequential scan 
would take hours.  (i will give more details about our setup below.)

my guess is that it has to do with the selectivity of the @> operator.  i've 
looked and noticed that the selectivity functions for @> and other period 
operators are basically stubs, with constant selectivity.  my questions are :

1 - am i wrong in my assessment?  is the constant contsel, areasel, etc hurting 
us?

2 - how hard would it be to implement contsel et al for period data types?  
i've read the gist papers but find the eqsel code a bit hard to understand.  
(would anyone be willing to help?)

more details :

pg 9, linux x86_64 box with 24gb ram and software raid-5.  (not ideal, i 
understand.)  the table is

create table timeseries ( id integer not null, value float not null, timespan 
period not null );
create index timeseries_id on timeseries (id);
create index timeseries_timespan on timeseries using gist (timespan);

in our dataset there are about 2000 different time series, each given a 
different id.  the time series are piecewise constant functions we are 
representing as (id, value, time interval) triples.  the intervals are 
typically no more than a few seconds, at most a few minutes.  for each id, the 
intervals are non-overlapping and cover the total time period.  there are about 
8 billion rows of historical data, and there are about 3 million new rows a 
day, relatively evenly spread across the different ids.  the database gets 
updated once a day with the new rows, and the rows are loaded in time order; 
the historical data is basically ordered in time. other than that single daily 
update, the workload is basically read-only.  the most important query for us 
is to sample the time series at (potentially irregular) grid points (i will 
give some example queries below.)

there are some small side tables (less than 150K rows) for things like 
different grid points to sample at, or auxiliary data which augment the time 
series. 

create table grids ( gridid integer not null, gridpt timestamp with time zone, 
primary key (gridid, gridpt) );
create table adjs1 ( id integer not null, timespan period not null, adj double 
precision not null);
create index adjs1_timespan on adjs1 using gist (timespan);
create index adjs1_id on adjs1 (id);

an example query that works :

postgres=> explain analyze select * from timeseries join grids on timespan @> 
gridpt where gridid = 2 and gridpt between '2006-10-12' and '2006-10-15';
   
QUERY PLAN  
--
Nested Loop  (cost=18082.74..33760441.77 rows=6525038912 width=48) (actual 
time=204.655..2498.152 rows=34275 loops=1)
  ->  Index Scan using grids_pkey on grids  (cost=0.00..76.64 rows=23 width=12) 
(actual time=0.059..0.559 rows=38 loops=1)
Index Cond: ((gridid = 2) AND (gridpt >= '2006-10-12 
00:00:00'::timestamp without time zone) AND (gridpt <= '2006-10-15 
00:00:00'::timestamp without time zone))
  ->  Bitmap Heap Scan on timeseries  (cost=18082.74..1460749.52 rows=567395 
width=36) (actual time=32.561..62.545 rows=902 loops=38)
Recheck Cond: (timeseries.timespan @> grids.gridpt)
->  Bitmap Index Scan on timeseries_idx_timespan  (cost=0.00..17940.89 
rows=567395 width=0) (actual time=32.184..32.184 rows=902 loops=38)
  Index Cond: (timeseries.timespan @> grids.gridpt)
Total runtime: 2553.386 ms
(8 rows)

Time: 2555.498 ms

where there are about 10-100 gridpts between the times selected.  this query 
plan looks good to me, and indeed it runs fairly fast.
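as a sanity check on the stub-selectivity theory: the rows=567395 estimate in 
the plan above is exactly the table's row count times a constant fraction of 
0.001, which is what a stub like contsel returns. a quick illustrative sketch 
(hypothetical helper, not the real planner code):

```python
# Sketch of how a constant "containment selectivity" stub produces the
# estimate seen in the plan above (illustrative, not planner source).
CONTSEL = 0.001  # constant fraction assumed for the @> selectivity stub

def estimated_rows(reltuples: int, selectivity: float = CONTSEL) -> int:
    """Planner-style row estimate: table size times clause selectivity."""
    return round(reltuples * selectivity)

rows = estimated_rows(567_394_688)  # timeseries row count from the plan
# rows == 567395, matching "rows=567395" in the bitmap scans, while the
# actual per-gridpoint match count is ~902 -- a huge overestimate, which
# is what pushes the planner toward sequential scans in nearby queries.
```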

an example query that goes haywire :

postgres=> explain select * from timeseries as TS join grids on TS.timespan @> 
gridpt join adjs1 as AD1 on TS.id=AD1.id and AD1.timespan @> gridpt  where 
gridid=2 and gridpt between '2006-10-19' and '2006-10-22';

 QUERY PLAN
--
Hash Join  (cost=7166.37..2517194919.79 rows=83476436469 width=76)
  Hash Cond: (ts.id = ad1.id)
  Join Filter: (ts.timespan @> grids.gridpt)
  ->  Seq Scan on timeseries ts  (cost=0.00..10402235.88 rows=567394688 
width=36)
  ->  Hash  (cost=

Re: [HACKERS] plperl arginfo

2010-10-28 Thread Stephen J. Butler
On Thu, Oct 28, 2010 at 11:34 AM, Tom Lane  wrote:
> Andrew Dunstan  writes:
>> On 10/28/2010 11:54 AM, Pavel Stehule wrote:
>>> Alternatively, maybe the feature could be exposed in a way where you
>>> don't actually calculate the values unless requested, ie provide some
>>> sort of inquiry function instead of always precomputing a hash.
>>> +1 .. some like get_function_info()
>
>> Yeah, that looks doable.
>
> BTW, maybe we could have the best of both worlds?  Dunno about Perl,
> but in some languages it would be possible to instantiate the hash
> only if it's actually touched.  Passing the data as a hash definitely
> seems to fit with the spirit of things otherwise, so as long as it
> didn't cost cycles when not needed, I'd be in favor of that API.

Perl has the tie interface (perldoc perltie) which lets you tie a hash
to an object instance, which implements subs to handle the various
hash operations.
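The same lazy-materialization idea can be sketched outside Perl. Here is a
minimal Python analogue (hypothetical names; a dict whose entries are computed
only on first access, playing the role of a tied hash -- a real plperl
implementation would go through the tie/PERL_MAGIC_tied machinery instead):

```python
class LazyArgInfo(dict):
    """Dict whose entries are computed on first access, analogous to a
    Perl hash tied to an object that fetches values lazily."""

    def __init__(self, loaders):
        super().__init__()
        self._loaders = loaders  # key -> zero-argument callable

    def __missing__(self, key):
        # Called only when the key has not been materialized yet.
        value = self._loaders[key]()
        self[key] = value  # cache so later lookups cost nothing extra
        return value

calls = []  # records which entries were actually computed
info = LazyArgInfo({
    "names": lambda: calls.append("names") or ["arg1", None, "arg3"],
    "types": lambda: calls.append("types") or ["integer", "text"],
})
assert calls == []                            # nothing computed up front
assert info["names"] == ["arg1", None, "arg3"]
assert calls == ["names"]                     # computed only on touch
```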



Re: [HACKERS] revision of todo: NULL for ROW variables

2010-10-28 Thread Merlin Moncure
On Thu, Oct 28, 2010 at 10:15 AM, Tom Lane  wrote:
> Pavel Stehule  writes:
>> I am checking PLpgSQL ToDo topics, and I am not sure if this topic
>> isn't done. And if not, then I would to get some detail.
>
> I think that thread petered out because we didn't have consensus on
> what the behavior ought to be.  It goes back to whether there is
> supposed to be a difference between NULL and ROW(NULL,NULL,NULL,...)

I think somewhere along the line it was noticed that SQL says you are
supposed to treat (null, null) as null and the behavior of 'is null'
operator was changed to reflect this while other null influenced
behaviors were left intact (for example, coalesce()).

My take on this is that we are stuck with the status quo.  If a change
must be done, the 'is null' change should be reverted to un-standard
behavior.  The SQL standard position on this issue is, IMNSHO, on
mars.

merlin



Re: [HACKERS] plperl arginfo

2010-10-28 Thread Pavel Stehule
2010/10/28 Andrew Dunstan :
>
>
> On 10/28/2010 11:54 AM, Pavel Stehule wrote:
>>
>> Alternatively, maybe the feature could be exposed in a way where you
>> don't actually calculate the values unless requested, ie provide some
>> sort of inquiry function instead of always precomputing a hash.
>> +1 .. some like get_function_info()
>>
>
> Yeah, that looks doable.
>
> I think we can just commit the generic record support now and add this on
> later.

this can be a very interesting feature, because it would let us do some
things in plperl instead of C.

regards

Pavel


>
> cheers
>
> andrew
>



Re: [HACKERS] plperl arginfo

2010-10-28 Thread Tom Lane
Andrew Dunstan  writes:
> On 10/28/2010 11:54 AM, Pavel Stehule wrote:
>> Alternatively, maybe the feature could be exposed in a way where you
>> don't actually calculate the values unless requested, ie provide some
>> sort of inquiry function instead of always precomputing a hash.
>> +1 .. some like get_function_info()

> Yeah, that looks doable.

BTW, maybe we could have the best of both worlds?  Dunno about Perl,
but in some languages it would be possible to instantiate the hash
only if it's actually touched.  Passing the data as a hash definitely
seems to fit with the spirit of things otherwise, so as long as it
didn't cost cycles when not needed, I'd be in favor of that API.

regards, tom lane



Re: [HACKERS] Composite Types and Function Parameters

2010-10-28 Thread David E. Wheeler
On Oct 27, 2010, at 9:08 PM, Andrew Dunstan wrote:

> Well, it turns out that the hashref required exactly one more line to 
> achieve. We already have all the infrastructure in the composite handling 
> code, and all it requires is enabling it for the RECORDOID case.

I don't suppose that it would be just as easy to allow an array passed to 
PL/Perl to be read into the PL/Perl function as an array reference, would it? 
Since it would break backcompat, it would need to be enabled by a plperl.* 
directive, but man, I would kill for that.

Best,

David




Re: [HACKERS] Composite Types and Function Parameters

2010-10-28 Thread Andrew Dunstan



On 10/28/2010 12:23 PM, David E. Wheeler wrote:

On Oct 27, 2010, at 9:08 PM, Andrew Dunstan wrote:


Well, it turns out that the hashref required exactly one more line to achieve. 
We already have all the infrastructure in the composite handling code, and all 
it requires is enabling it for the RECORDOID case.

I don't suppose that it would be just as easy to allow an array passed to 
PL/Perl to be read into the PL/Perl function as an array reference, would it? 
Since it would break backcompat, it would need to be enabled by a plperl.* 
directive, but man, I would kill for that.


Of course it's possible, but it's a different feature. As for "just as 
easy", no, it's much more work. I agree it should be done, though.


cheers

andrew



Re: [HACKERS] plperl arginfo

2010-10-28 Thread Andrew Dunstan



On 10/28/2010 11:54 AM, Pavel Stehule wrote:

Alternatively, maybe the feature could be exposed in a way where you
don't actually calculate the values unless requested, ie provide some
sort of inquiry function instead of always precomputing a hash.
+1 .. some like get_function_info()



Yeah, that looks doable.

I think we can just commit the generic record support now and add this 
on later.


cheers

andrew



Re: [HACKERS] max_wal_senders must die

2010-10-28 Thread Tom Lane
Alvaro Herrera  writes:
> BTW I note that there are no elog(ERROR) calls in that code path at all,
> because syscall errors are ignored, so PANIC is not a concern (as the
> code stands currently, at least).  ISTM it would be good to have a
> comment on SetLatch stating that it's used inside critical sections,
> though.

Yeah, I was thinking the same while reading the code yesterday.  It
already notes that it's used in interrupt handlers, but the critical
section angle is an additional reason not to elog(ERROR).

regards, tom lane



Re: [HACKERS] plperl arginfo

2010-10-28 Thread Pavel Stehule
2010/10/28 Tom Lane :
> Andrew Dunstan  writes:
>>   While we were discussing allowing generic record type arguments to
>> plperl functions, Tom suggested that we should expose the type
>> information about the record members to plperl. I think if we do that we
>> should probably expand it somewhat to all arguments, so that for
>> non-trigger functions, we'd have $_ARG_INFO which could look something
>> like this:
>
>>     {
>>          names => ['arg1', undef, 'arg3' ] , # no name was supplied for arg2
>>          modes => ['in', 'in', 'in' ], # do we support anything other
>>     than IN ?
>>          types => ['integer', 'text', { name => 'somecomposite', fields
>>     => [ 'field1', 'field2' ], types => ['date', 'numeric' ] } ],
>>     }
>
> Hmm, I'm a bit worried about the performance implications of adding this
> information.  It seems like the value in typical cases would be minimal:
> when you are writing the body of "myfunction(foo int, bar text)", it's
> not like you don't know perfectly well the names and argument types of
> the parameters.
>
> I can see the value of providing type info for polymorphic arguments,
> but not sure about expending extra cycles to do it for all.
>
> Alternatively, maybe the feature could be exposed in a way where you
> don't actually calculate the values unless requested, ie provide some
> sort of inquiry function instead of always precomputing a hash.

+1 .. some like get_function_info()

Regards

Pavel Stehule
>
>                        regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



Re: [HACKERS] max_wal_senders must die

2010-10-28 Thread Alvaro Herrera
Excerpts from Tom Lane's message of mié oct 27 19:01:38 -0300 2010:

> I don't know what Simon is thinking, but I think he's nuts.  There is
> obvious extra overhead in COMMIT:
> 
> /*
>  * Wake up all walsenders to send WAL up to the COMMIT record
>  * immediately if replication is enabled
>  */
> if (max_wal_senders > 0)
> WalSndWakeup();
> 
> which AFAICT is injecting multiple kernel calls into what's not only
> a hot-spot but a critical section (ie, any error -> PANIC).

Hmm, I wonder if that could be moved out of the critical section
somehow.  Obviously the point here is to allow wal senders to react
before we write to clog (which is expensive by itself); but it seems
hard to wake up some other process without incurring exactly the same
cost (which is basically SetLatch) ... the only difference is that it
would be a single one instead of one per walsender.

BTW I note that there are no elog(ERROR) calls in that code path at all,
because syscall errors are ignored, so PANIC is not a concern (as the
code stands currently, at least).  ISTM it would be good to have a
comment on SetLatch stating that it's used inside critical sections,
though.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] plperl arginfo

2010-10-28 Thread Pavel Stehule
2010/10/28 Andrew Dunstan :
> While we were discussing allowing generic record type arguments to plperl
> functions, Tom suggested that we should expose the type information about
> the record members to plperl. I think if we do that we should probably
> expand it somewhat to all arguments, so that for non-trigger functions, we'd
> have $_ARG_INFO which could look something like this:
>
> {
>     names => ['arg1', undef, 'arg3' ] , # no name was supplied for arg2
>     modes => ['in', 'in', 'in' ], # do we support anything other than IN ?

variadic

Pavel

>     types => ['integer', 'text', { name => 'somecomposite', fields => [
> 'field1', 'field2' ], types => ['date', 'numeric' ] } ],
> }
>
> Maybe we should also pass in type Oid info, too.
>
> I don't think this would be terribly difficult to do.
>
> thoughts?
>
> cheers
>
> andrew
>
>
>
>
>



Re: [HACKERS] plperl arginfo

2010-10-28 Thread Tom Lane
Andrew Dunstan  writes:
>   While we were discussing allowing generic record type arguments to 
> plperl functions, Tom suggested that we should expose the type 
> information about the record members to plperl. I think if we do that we 
> should probably expand it somewhat to all arguments, so that for 
> non-trigger functions, we'd have $_ARG_INFO which could look something 
> like this:

> {
>  names => ['arg1', undef, 'arg3' ] , # no name was supplied for arg2
>  modes => ['in', 'in', 'in' ], # do we support anything other
> than IN ?
>  types => ['integer', 'text', { name => 'somecomposite', fields
> => [ 'field1', 'field2' ], types => ['date', 'numeric' ] } ],
> }

Hmm, I'm a bit worried about the performance implications of adding this
information.  It seems like the value in typical cases would be minimal:
when you are writing the body of "myfunction(foo int, bar text)", it's
not like you don't know perfectly well the names and argument types of
the parameters.

I can see the value of providing type info for polymorphic arguments,
but not sure about expending extra cycles to do it for all.

Alternatively, maybe the feature could be exposed in a way where you
don't actually calculate the values unless requested, ie provide some
sort of inquiry function instead of always precomputing a hash.

regards, tom lane



[HACKERS] plperl arginfo

2010-10-28 Thread Andrew Dunstan
 While we were discussing allowing generic record type arguments to 
plperl functions, Tom suggested that we should expose the type 
information about the record members to plperl. I think if we do that we 
should probably expand it somewhat to all arguments, so that for 
non-trigger functions, we'd have $_ARG_INFO which could look something 
like this:


   {
names => ['arg1', undef, 'arg3' ] , # no name was supplied for arg2
modes => ['in', 'in', 'in' ], # do we support anything other
   than IN ?
types => ['integer', 'text', { name => 'somecomposite', fields
   => [ 'field1', 'field2' ], types => ['date', 'numeric' ] } ],
   }

Maybe we should also pass in type Oid info, too.

I don't think this would be terribly difficult to do.

thoughts?

cheers

andrew






Re: [HACKERS] add label to enum syntax

2010-10-28 Thread Alvaro Herrera
Excerpts from Pavel Golub's message of jue oct 28 07:50:24 -0300 2010:

> Forgot link to poll:
> http://pgolub.wordpress.com/2010/10/28/poll-alter-type-enumtype-add-what-newlabel/

Hah, there are 17 votes as of right now, no option is below 23% and no
option is above 29%.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



[HACKERS] 9.1alpha2 release notes and plan

2010-10-28 Thread Peter Eisentraut
I have committed an initial version of release notes for 9.1alpha2.
Please look it over.

The release can then be prepared over the weekend and announced on
Monday.




Re: [HACKERS] Simplifying replication

2010-10-28 Thread Robert Haas
On Thu, Oct 28, 2010 at 1:13 AM, Josh Berkus  wrote:
>
>> I sort of agree with you that the current checkpoint_segments
>> parameter is a bit hard to tune, at least if your goal is to control
>> the amount of disk space that will be used by WAL files.  But I'm not
>> sure your proposal is better.  Instead of having a complicated formula
>> for predicting how much disk space would get used by a given value for
>> checkpoint_segments, we'd have a complicated formula for the amount of
>> WAL that would force a checkpoint based on max_wal_size.
>
> Yes, but the complicated formula would then be *in our code* instead of
> being inflicted on the user, as it now is.

I don't think so - I think it will just be inflicted on the user in a
different way.  We'd still have to document what the formula is,
because people will want to understand how often a checkpoint is going
to get forced.

So here's an example of how this could happen.  Someone sets
max_wal_size = 480MB.  Then, they hear about the
checkpoint_completion_target parameter, and say, ooh, goody, let me
boost that.  So they raise it from 0.5 to 0.9.  Now, all of a sudden,
they're getting more frequent checkpoints.  Performance may get worse
rather than better.  To figure out what value for max_wal_size forces
a checkpoint after the same amount of WAL that forced a checkpoint
before, they need to work backwards from max_wal_size to
checkpoint_segments, and then work forward again to figure out the new
value for the max_wal_size parameter.

Here's the math.  max_wal_size = 480MB = 30 segments.  With
checkpoint_completion_target = 0.5, that means that
checkpoint_segments is (30 - 1) / (2 + 0.5) = 11 (rounded down).  With
checkpoint_completion_target = 0.9, that means they'll need to set
max_wal_size to (2 + 0.9) * 11 + 1 = 33 (rounded up) * 16MB = 528MB.
Whew!
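In code form, the round trip above looks like this (a sketch under the
assumption stated by the formula here: a checkpoint is forced after roughly
(2 + checkpoint_completion_target) * checkpoint_segments + 1 segments of
16MB each; function names are hypothetical):

```python
import math

SEGMENT_MB = 16  # WAL segment size assumed throughout the example

def segments_from_max_wal_size(max_wal_size_mb: int, target: float) -> int:
    """Invert max_wal_size -> effective checkpoint_segments, rounding down."""
    total = max_wal_size_mb / SEGMENT_MB
    return math.floor((total - 1) / (2 + target))

def max_wal_size_from_segments(segments: int, target: float) -> int:
    """Forward direction: round the segment count up, return MB."""
    return math.ceil((2 + target) * segments + 1) * SEGMENT_MB

# The worked example: 480MB at target 0.5 behaves like
# checkpoint_segments = 11; keeping that distance at target 0.9
# requires raising max_wal_size to 528MB.
segs = segments_from_max_wal_size(480, 0.5)
new_size = max_wal_size_from_segments(segs, 0.9)
```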

My theory is that most tuning of checkpoint_segments is based on a
worry about recovery time or performance, not disk consumption.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] Re: [PERFORM] Re: Postgres insert performance and storage requirement compared to Oracle

2010-10-28 Thread Robert Haas
[moving to -hackers, from -performance]

On Wed, Oct 27, 2010 at 11:32 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> Gee, I wonder if it would possible for PG to automatically do an
>> asynchronous commit of any transaction which touches only temp tables.
>
> Hmm ... do we need a commit at all in such a case?  If our XID has only
> gone into temp tables, I think we need to write to clog, but we don't
> really need a WAL entry, synced or otherwise.

I think we might need a commit record anyway to keep Hot Standby's
KnownAssignedXids tracking from getting confused.  It might be
possible to suppress it when wal_level is less than hot_standby, but
I'm not sure it's worth it.

You definitely need to write to CLOG, because otherwise a subsequent
transaction from within the same backend might fail to see those
tuples.

Also, I think that the right test is probably "Have we done anything
that needs to be WAL-logged?".  We can get that conveniently by
checking whether XactLastRecEnd.xrecoff is nonzero.  One option would be to
just change this:

if (XactSyncCommit || forceSyncCommit || nrels > 0)

...to say ((XactSyncCommit && XactLastRecEnd.xrecoff != 0) ||
forceSyncCommit || nrels > 0).

But I'm wondering if we can instead rejigger things so that this test
moves out of the !markXidCommitted branch of the if statement and
drops down below the whole if statement.

/*
 * If we didn't create XLOG entries, we're done here;
otherwise we
 * should flush those entries the same as a commit
record.  (An
 * example of a possible record that wouldn't cause an XID to be
 * assigned is a sequence advance record due to
nextval() --- we want
 * to flush that to disk before reporting commit.)
 */
if (XactLastRecEnd.xrecoff == 0)
goto cleanup;

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] revision of todo: NULL for ROW variables

2010-10-28 Thread Tom Lane
Pavel Stehule  writes:
> I am checking PLpgSQL ToDo topics, and I am not sure if this topic
> isn't done. And if not, then I would to get some detail.

I think that thread petered out because we didn't have consensus on
what the behavior ought to be.  It goes back to whether there is
supposed to be a difference between NULL and ROW(NULL,NULL,NULL,...)

regards, tom lane



Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-10-28 Thread Tom Lane
Heikki Linnakangas  writes:
> Actually, I wonder if we could just have a separate canon_pathkeys list 
> for each EquivalenceClass, instead of one big list in PlannerInfo. I'm 
> not too familiar with equivalence classes and all that,

Hm.  I don't like getting rid of the main canon_pathkeys list like that.
The whole point of a canonical pathkey is that there is only one, so
it seems like we need a central list.  But it might be sane for each
EC to have an additional, side list of PKs made from it.

regards, tom lane
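The uniqueness invariant described above (at most one PathKey per
(eclass, opfamily, strategy, nulls_first) combination, so pointer comparison
suffices) is classic interning. A minimal Python sketch with hypothetical
names, mirroring the shape of make_canonical_pathkey in the patch in this
thread:

```python
class PathKey:
    """Toy stand-in for the planner's PathKey node."""
    def __init__(self, eclass, opfamily, strategy, nulls_first):
        self.eclass = eclass
        self.opfamily = opfamily
        self.strategy = strategy
        self.nulls_first = nulls_first

canon_pathkeys = []  # central list: the single source of canonical objects

def make_canonical_pathkey(eclass, opfamily, strategy, nulls_first):
    """Return the one canonical PathKey for this combination, creating it
    on first request.  Because every caller goes through here, equality
    of canonical pathkeys reduces to identity (`is`) comparison."""
    for pk in canon_pathkeys:
        if (pk.eclass is eclass and pk.opfamily == opfamily
                and pk.strategy == strategy
                and pk.nulls_first == nulls_first):
            return pk  # reuse the existing canonical object
    pk = PathKey(eclass, opfamily, strategy, nulls_first)
    canon_pathkeys.append(pk)
    return pk

ec = object()  # stands in for an EquivalenceClass
a = make_canonical_pathkey(ec, "btree/int_ops", 1, False)
b = make_canonical_pathkey(ec, "btree/int_ops", 1, False)
assert a is b  # canonical: pointer equality suffices
```

Keeping a per-EC side list (as Heikki suggests) just shortens the linear
search; the interning invariant itself is unchanged.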



Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-10-28 Thread Tom Lane
Boszormenyi Zoltan  writes:
> This means add_child_rel_equivalences() is still takes
> too much time, the previously posted test case calls this
> function 482 times, it's called for almost  every 10th entry
> added to eq_classes. The elog() I put into this function says
> that at the last call list_length(eq_classes) == 4754.

That seems like a ridiculously large number of ECs.  What is the
test query again?

regards, tom lane



[HACKERS] revision of todo: NULL for ROW variables

2010-10-28 Thread Pavel Stehule
Hello

I am checking PLpgSQL ToDo topics, and I am not sure if this topic
isn't done. And if not, then I would to get some detail.

Now it is possible to test a row variable for NULL, and it is possible
to assign NULL to a row variable. What more can we do?

a) There is a small difference in the returned value when we use an
empty row variable vs. an empty record variable

CREATE OR REPLACE FUNCTION f2(int) RETURNS t2 AS $$
DECLARE rv t2; re record;
BEGIN
  CASE $1 WHEN 0 THEN RETURN rv;
 WHEN 1 THEN RETURN re;
 ELSE RETURN null;
  END CASE;
END; $$ LANGUAGE plpgsql;

postgres=# SELECT f2(0);
 f2

 ()
(1 row)

Time: 0.759 ms
postgres=# SELECT f2(1);
   f2

 [null]
(1 row)

Time: 0.570 ms
postgres=# SELECT f2(2);
   f2

 [null]
(1 row)

() is equal to NULL for the IS NULL test, but it isn't the same - see:

Time: 0.586 ms
postgres=# SELECT f2(0) is null;
 ?column?
──
 t
(1 row)

Time: 0.548 ms
postgres=# SELECT f2(1) is null;
 ?column?
──
 t
(1 row)

Time: 0.535 ms
postgres=# SELECT f2(2) is null;
 ?column?
──
 t
(1 row)

postgres=# SELECT 'Hello' || f2(0);
 ?column?
──
 Hello()
(1 row)

Time: 51.546 ms
postgres=# SELECT 'Hello' || f2(1);
 ?column?
──
 [null]
(1 row)

so this is one known issue.

Currently rowvar := NULL is equivalent to resetting all fields inside
the row. I think this is correct from the perspective of the "IS [NOT]
NULL" operator, but maybe it isn't practical. So we could distinguish
between assigning NULL to some field and assigning NULL to the row
variable itself. This flag would be used only for the returned value.
Something like

DECLARE r rowtype;
BEGIN
  IF a = 1 THEN
RETURN r; -- result is NULL
  ELSIF a = 2 THEN
r.x := NULL;
RETURN r; -- result is ()
  ELSIF a = 3 THEN
r.x := NULL;
r := NULL;
RETURN r; -- result is NULL;

Comments? Is this change something like what we want?

Next question: I found a paradox. When a value IS NULL, then any
operation with this value should be NULL. But it isn't true for
composite values!

postgres=# CREATE TYPE t AS (a int, b int);
CREATE TYPE
Time: 66.605 ms
postgres=# SELECT 'Hello' || (NULL, NULL)::t;
 ?column?
──
 Hello(,)
(1 row)

postgres=# SELECT  (NULL, NULL)::t is null;
 ?column?
──
 t
(1 row)

Does anybody know whether this behavior is correct per ANSI SQL?

Regards

Pavel Stehule



Re: [HACKERS] max_wal_senders must die

2010-10-28 Thread Kevin Grittner
"Joshua D. Drake"  wrote:
> On Wed, 2010-10-27 at 19:52 -0400, Robert Haas wrote:
>> Josh Berkus  wrote:
>  
>>> *you don't know* how many .org users plan to implement
>>> replication, whether it's a minority or majority.
>>
>> None of us know. What I do know is that I don't want PostgreSQL to
>> be slower out of the box.
> 
> Poll TIME!
 
If you do take a poll, be careful to put in an option or two to deal
with environments where there is "surgical" implementation of
replication features.  We'll almost certainly be using SR with a
custom WAL receiver as part of our solution for our biggest and most
distributed data set (circuit court data), but an "out of the box"
drop in usage there is not in the cards anytime soon; whereas we're
already using HS/SR "out of the box" for a small RoR web app's data.
 
By the way, the other three DBAs here implemented the HS/SR while I
was out for a couple days while my dad was in the hospital (so they
didn't want to even bother me with a phone call).  They went straight
from the docs, without the benefit of having tracked any PostgreSQL
lists.  They told me that it was working great once they figured it
out, but it was confusing; it took them a lot of time and a few false
starts to get it working.  I've been trying to get details to support
an improvement in documentation, but if those guys had problems I
agree we need to do *something* to make this simpler -- they're
bright professionals who manage hundreds of PostgreSQL databases on a
full time basis.
 
-Kevin



Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-10-28 Thread Heikki Linnakangas

On 28.10.2010 13:54, Boszormenyi Zoltan wrote:

A little better version, no need for the heavy hash_any, hash_uint32
on the lower 32 bits on pk_eclass is enough. The profiling runtime
is now 0.42 seconds vs the previous 0.41 seconds for the tree version.


Actually, I wonder if we could just have a separate canon_pathkeys list 
for each EquivalenceClass, instead of one big list in PlannerInfo. I'm 
not too familiar with equivalence classes and all that, but the attached 
patch at least passes the regressions. I haven't done any performance 
testing, but I would expect this to be even faster than the hashtable or 
tree implementations, and a lot simpler.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index ee2aeb0..7a12c47 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1595,7 +1595,6 @@ _outPlannerInfo(StringInfo str, PlannerInfo *node)
 	WRITE_NODE_FIELD(init_plans);
 	WRITE_NODE_FIELD(cte_plan_ids);
 	WRITE_NODE_FIELD(eq_classes);
-	WRITE_NODE_FIELD(canon_pathkeys);
 	WRITE_NODE_FIELD(left_join_clauses);
 	WRITE_NODE_FIELD(right_join_clauses);
 	WRITE_NODE_FIELD(full_join_clauses);
@@ -1692,6 +1691,7 @@ _outEquivalenceClass(StringInfo str, EquivalenceClass *node)
 	WRITE_BOOL_FIELD(ec_below_outer_join);
 	WRITE_BOOL_FIELD(ec_broken);
 	WRITE_UINT_FIELD(ec_sortref);
+	WRITE_NODE_FIELD(ec_canon_pathkeys);
 }
 
 static void
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 643d57a..d5e5c42 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -93,9 +93,10 @@ make_canonical_pathkey(PlannerInfo *root,
 	while (eclass->ec_merged)
 		eclass = eclass->ec_merged;
 
-	foreach(lc, root->canon_pathkeys)
+	foreach(lc, eclass->ec_canon_pathkeys)
 	{
 		pk = (PathKey *) lfirst(lc);
+		Assert(eclass == pk->pk_eclass);
 		if (eclass == pk->pk_eclass &&
 			opfamily == pk->pk_opfamily &&
 			strategy == pk->pk_strategy &&
@@ -110,7 +111,7 @@ make_canonical_pathkey(PlannerInfo *root,
 	oldcontext = MemoryContextSwitchTo(root->planner_cxt);
 
 	pk = makePathKey(eclass, opfamily, strategy, nulls_first);
-	root->canon_pathkeys = lappend(root->canon_pathkeys, pk);
+	eclass->ec_canon_pathkeys = lappend(eclass->ec_canon_pathkeys, pk);
 
 	MemoryContextSwitchTo(oldcontext);
 
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index fd4c6f5..5f8f817 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -117,7 +117,10 @@ query_planner(PlannerInfo *root, List *tlist,
 		 * We still are required to canonicalize any pathkeys, in case it's
 		 * something like "SELECT 2+2 ORDER BY 1".
 		 */
+/* XXX: Same as below */
+#if 0
 		root->canon_pathkeys = NIL;
+#endif
 		root->query_pathkeys = canonicalize_pathkeys(root,
 	 root->query_pathkeys);
 		root->group_pathkeys = canonicalize_pathkeys(root,
@@ -145,7 +148,13 @@ query_planner(PlannerInfo *root, List *tlist,
 	root->join_rel_hash = NULL;
 	root->join_rel_level = NULL;
 	root->join_cur_level = 0;
-	root->canon_pathkeys = NIL;
+/* XXX
+ * Do we need to reset something here? This is just initializing otherwise
+ * uninitialized fields, right?
+ */
+#if 0
+		root->canon_pathkeys = NIL;
+#endif
 	root->left_join_clauses = NIL;
 	root->right_join_clauses = NIL;
 	root->full_join_clauses = NIL;
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6e3d0f3..c959708 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -160,8 +160,6 @@ typedef struct PlannerInfo
 
 	List	   *eq_classes;		/* list of active EquivalenceClasses */
 
-	List	   *canon_pathkeys; /* list of "canonical" PathKeys */
-
 	List	   *left_join_clauses;		/* list of RestrictInfos for
 		 * mergejoinable outer join clauses
 		 * w/nonnullable var on left */
@@ -527,6 +525,7 @@ typedef struct EquivalenceClass
 	bool		ec_below_outer_join;	/* equivalence applies below an OJ */
 	bool		ec_broken;		/* failed to generate needed clauses? */
 	Index		ec_sortref;		/* originating sortclause label, or 0 */
+	List	   *ec_canon_pathkeys;
 	struct EquivalenceClass *ec_merged; /* set if merged into another EC */
 } EquivalenceClass;
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-10-28 Thread Boszormenyi Zoltan
Boszormenyi Zoltan wrote:
> Boszormenyi Zoltan wrote:
>> Boszormenyi Zoltan wrote:
>>> Heikki Linnakangas wrote:
>>>> On 26.10.2010 18:34, Boszormenyi Zoltan wrote:
>>>>> thank you very much for pointing me to dynahash, here is the
>>>>> next version that finally seems to work.
>>>>>
>>>>> Two patches are attached, the first is the absolute minimum for
>>>>> making it work, this still has the Tree type for canon_pathkeys
>>>>> and eq_classes got the same treatment as join_rel_list/join_rel_hash
>>>>> has in the current sources: if the list grows larger than 32, a hash
>>>>> table is created. It seems to be enough to do it for
>>>>> get_eclass_for_sort_expr() only; the other users of eq_classes
>>>>> aren't bothered by this change.
>>>> That's better, but can't you use dynahash for canon_pathkeys as well?
>>> Here's a purely dynahash solution. It's somewhat slower than
>>> the tree version, 0.45 vs 0.41 seconds in the cached case for the
>>> previously posted test case.
>> And now in context diff, sorry for my affection towards unified diffs. :-)
> A little better version: no need for the heavy hash_any; hash_uint32
> on the lower 32 bits of pk_eclass is enough. The profiling runtime
> is now 0.42 seconds vs the previous 0.41 seconds for the tree version.
>
> Best regards,
> Zoltán Böszörményi

Btw, the top entries in the current gprof output are:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 19.05      0.08     0.08      482     0.17     0.29  add_child_rel_equivalences
 11.90      0.13     0.05  1133447     0.00     0.00  bms_is_subset
  9.52      0.17     0.04   331162     0.00     0.00  hash_search_with_hash_value
  7.14      0.20     0.03   548971     0.00     0.00  AllocSetAlloc
  4.76      0.22     0.02     2858     0.01     0.01  get_tabstat_entry
  4.76      0.24     0.02     1136     0.02     0.02  tzload

This means add_child_rel_equivalences() still takes
too much time; the previously posted test case calls this
function 482 times, i.e. it's called for almost every 10th entry
added to eq_classes. The elog() I put into this function says
that at the last call, list_length(eq_classes) == 4754.

Best regards,
Zoltán Böszörményi

-- 
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/




Re: [HACKERS] add label to enum syntax

2010-10-28 Thread Pavel Golub
Hello, Andrew.

You wrote:


AD> It occurred to me in the dead of the night that instead of:

AD> ALTER TYPE enumtype ADD 'newlabel'


AD> it might be better to have:

AD> ALTER TYPE enumtype ADD LABEL 'newlabel'


AD> That way, if we later wanted to support some other sort of ADD operation
AD> on types, we would be able to do so more easily. LABEL is already a
AD> keyword, so it should be a pretty minimally invasive change, and if we
AD> want to do it, now is the time.

AD> Thoughts?

AD> cheers

AD> andrew


Forgot link to poll:
http://pgolub.wordpress.com/2010/10/28/poll-alter-type-enumtype-add-what-newlabel/

-- 
With best wishes,
 Pavel  mailto:pa...@gf.microolap.com




Re: [HACKERS] add label to enum syntax

2010-10-28 Thread Pavel Golub
Hello, Andrew.

You wrote:


AD> It occurred to me in the dead of the night that instead of:

AD> ALTER TYPE enumtype ADD 'newlabel'


AD> it might be better to have:

AD> ALTER TYPE enumtype ADD LABEL 'newlabel'


AD> That way, if we later wanted to support some other sort of ADD operation
AD> on types, we would be able to do so more easily. LABEL is already a
AD> keyword, so it should be a pretty minimally invasive change, and if we
AD> want to do it, now is the time.

AD> Thoughts?

AD> cheers

AD> andrew


I'm with you.

BTW, I wrote a post with a poll about this.

-- 
With best wishes,
 Pavel  mailto:pa...@gf.microolap.com




Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-10-28 Thread Boszormenyi Zoltan
Boszormenyi Zoltan wrote:
> Boszormenyi Zoltan wrote:
>> Heikki Linnakangas wrote:
>>> On 26.10.2010 18:34, Boszormenyi Zoltan wrote:
>>>> thank you very much for pointing me to dynahash, here is the
>>>> next version that finally seems to work.
>>>>
>>>> Two patches are attached, the first is the absolute minimum for
>>>> making it work, this still has the Tree type for canon_pathkeys
>>>> and eq_classes got the same treatment as join_rel_list/join_rel_hash
>>>> has in the current sources: if the list grows larger than 32, a hash
>>>> table is created. It seems to be enough to do it for
>>>> get_eclass_for_sort_expr() only; the other users of eq_classes
>>>> aren't bothered by this change.
>>> That's better, but can't you use dynahash for canon_pathkeys as well?
>> Here's a purely dynahash solution. It's somewhat slower than
>> the tree version, 0.45 vs 0.41 seconds in the cached case for the
>> previously posted test case.
> And now in context diff, sorry for my affection towards unified diffs. :-)

A little better version: no need for the heavy hash_any; hash_uint32
on the lower 32 bits of pk_eclass is enough. The profiling runtime
is now 0.42 seconds vs the previous 0.41 seconds for the tree version.

Best regards,
Zoltán Böszörményi

-- 
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/

diff -dcrpN postgresql.orig/src/backend/optimizer/path/equivclass.c postgresql.1/src/backend/optimizer/path/equivclass.c
*** postgresql.orig/src/backend/optimizer/path/equivclass.c	2010-10-15 10:31:47.0 +0200
--- postgresql.1/src/backend/optimizer/path/equivclass.c	2010-10-28 12:37:47.0 +0200
***
*** 24,29 
--- 24,30 
  #include "optimizer/planmain.h"
  #include "optimizer/prep.h"
  #include "optimizer/var.h"
+ #include "utils/hsearch.h"
  #include "utils/lsyscache.h"
  
  
*** add_eq_member(EquivalenceClass *ec, Expr
*** 360,434 
  
  
  /*
!  * get_eclass_for_sort_expr
!  *	  Given an expression and opfamily info, find an existing equivalence
!  *	  class it is a member of; if none, build a new single-member
!  *	  EquivalenceClass for it.
!  *
!  * sortref is the SortGroupRef of the originating SortGroupClause, if any,
!  * or zero if not.	(It should never be zero if the expression is volatile!)
!  *
!  * This can be used safely both before and after EquivalenceClass merging;
!  * since it never causes merging it does not invalidate any existing ECs
!  * or PathKeys.
!  *
!  * Note: opfamilies must be chosen consistently with the way
!  * process_equivalence() would do; that is, generated from a mergejoinable
!  * equality operator.  Else we might fail to detect valid equivalences,
!  * generating poor (but not incorrect) plans.
   */
! EquivalenceClass *
! get_eclass_for_sort_expr(PlannerInfo *root,
! 		 Expr *expr,
! 		 Oid expr_datatype,
! 		 List *opfamilies,
! 		 Index sortref)
  {
! 	EquivalenceClass *newec;
! 	EquivalenceMember *newem;
  	ListCell   *lc1;
! 	MemoryContext oldcontext;
  
  	/*
! 	 * Scan through the existing EquivalenceClasses for a match
  	 */
! 	foreach(lc1, root->eq_classes)
  	{
! 		EquivalenceClass *cur_ec = (EquivalenceClass *) lfirst(lc1);
! 		ListCell   *lc2;
  
  		/*
! 		 * Never match to a volatile EC, except when we are looking at another
! 		 * reference to the same volatile SortGroupClause.
  		 */
! 		if (cur_ec->ec_has_volatile &&
! 			(sortref == 0 || sortref != cur_ec->ec_sortref))
! 			continue;
! 
! 		if (!equal(opfamilies, cur_ec->ec_opfamilies))
  			continue;
  
! 		foreach(lc2, cur_ec->ec_members)
  		{
! 			EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc2);
! 
! 			/*
! 			 * If below an outer join, don't match constants: they're not as
! 			 * constant as they look.
! 			 */
! 			if (cur_ec->ec_below_outer_join &&
! 				cur_em->em_is_const)
! 				continue;
  
! 			if (expr_datatype == cur_em->em_datatype &&
! 				equal(expr, cur_em->em_expr))
! 				return cur_ec;	/* Match! */
  		}
  	}
  
  	/*
- 	 * No match, so build a new single-member EC
- 	 *
  	 * Here, we must be sure that we construct the EC in the right context. We
  	 * can assume, however, that the passed expr is long-lived.
  	 */
--- 361,463 
  
  
  /*
!  * eq_classes_match - matching function for eq_classes_hash in PlannerInfo
   */
! static int
! eq_classes_match(const void *key1, const void *key2, Size keysize)
  {
! 	EquivalenceClass *ec1 = (EquivalenceClass *) key1; /* this is in the hashtable */
! 	EquivalenceClass *ec2 = (EquivalenceClass *) key2; /* this is the new matched entry */
  	ListCell   *lc1;
! 	ListCell   *lc2;
  
  	/*
! 	 * Never match to a volatile EC, except when we are looking at another
! 	 * reference to the same volatile S

[HACKERS] Error message with COMMIT/ROLLBACK within PL/pgSQL

2010-10-28 Thread Heikki Linnakangas

This is not very user-friendly:

postgres=# do $$
begin
  COMMIT;
end;
$$;
ERROR:  SPI_execute_plan_with_paramlist failed executing query "COMMIT": 
SPI_ERROR_TRANSACTION
CONTEXT:  PL/pgSQL function "inline_code_block" line 2 at SQL statement

Clearly we don't support that, but it seems like it deserves a better
error message.


Curiously, we *do* give a better error message if you try that with EXECUTE:

postgres=# do $$
begin
  EXECUTE 'COMMIT';
end;
$$;
ERROR:  cannot begin/end transactions in PL/pgSQL
HINT:  Use a BEGIN block with an EXCEPTION clause instead.
CONTEXT:  PL/pgSQL function "inline_code_block" line 2 at EXECUTE statement

Barring objections, I'll add the same error message to the non-execute 
codepath.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 9929e04..a21ea53 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -2889,6 +2889,17 @@ exec_stmt_execsql(PLpgSQL_execstate *estate,
 			exec_set_found(estate, false);
 			break;
 
+			/* Some SPI errors deserve specific error messages */
+		case SPI_ERROR_COPY:
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot COPY to/from client in PL/pgSQL")));
+		case SPI_ERROR_TRANSACTION:
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot begin/end transactions in PL/pgSQL"),
+					 errhint("Use a BEGIN block with an EXCEPTION clause instead.")));
+
 		default:
 			elog(ERROR, "SPI_execute_plan_with_paramlist failed executing query \"%s\": %s",
 				 expr->query, SPI_result_code_string(rc));



Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-10-28 Thread Boszormenyi Zoltan
Boszormenyi Zoltan wrote:
> Heikki Linnakangas wrote:
>> On 26.10.2010 18:34, Boszormenyi Zoltan wrote:
>>> thank you very much for pointing me to dynahash, here is the
>>> next version that finally seems to work.
>>>
>>> Two patches are attached, the first is the absolute minimum for
>>> making it work, this still has the Tree type for canon_pathkeys
>>> and eq_classes got the same treatment as join_rel_list/join_rel_hash
>>> has in the current sources: if the list grows larger than 32, a hash
>>> table is created. It seems to be enough to do it for
>>> get_eclass_for_sort_expr() only; the other users of eq_classes
>>> aren't bothered by this change.
>> That's better, but can't you use dynahash for canon_pathkeys as well?
> Here's a purely dynahash solution. It's somewhat slower than
> the tree version, 0.45 vs 0.41 seconds in the cached case for the
> previously posted test case.

And now in context diff, sorry for my affection towards unified diffs. :-)

> Best regards,
> Zoltán Böszörményi


-- 
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/

diff -dcrpN postgresql.orig/src/backend/optimizer/path/equivclass.c postgresql.1/src/backend/optimizer/path/equivclass.c
*** postgresql.orig/src/backend/optimizer/path/equivclass.c	2010-10-15 10:31:47.0 +0200
--- postgresql.1/src/backend/optimizer/path/equivclass.c	2010-10-26 17:01:57.0 +0200
***
*** 24,29 
--- 24,30 
  #include "optimizer/planmain.h"
  #include "optimizer/prep.h"
  #include "optimizer/var.h"
+ #include "utils/hsearch.h"
  #include "utils/lsyscache.h"
  
  
*** add_eq_member(EquivalenceClass *ec, Expr
*** 360,434 
  
  
  /*
!  * get_eclass_for_sort_expr
!  *	  Given an expression and opfamily info, find an existing equivalence
!  *	  class it is a member of; if none, build a new single-member
!  *	  EquivalenceClass for it.
!  *
!  * sortref is the SortGroupRef of the originating SortGroupClause, if any,
!  * or zero if not.	(It should never be zero if the expression is volatile!)
!  *
!  * This can be used safely both before and after EquivalenceClass merging;
!  * since it never causes merging it does not invalidate any existing ECs
!  * or PathKeys.
!  *
!  * Note: opfamilies must be chosen consistently with the way
!  * process_equivalence() would do; that is, generated from a mergejoinable
!  * equality operator.  Else we might fail to detect valid equivalences,
!  * generating poor (but not incorrect) plans.
   */
! EquivalenceClass *
! get_eclass_for_sort_expr(PlannerInfo *root,
! 		 Expr *expr,
! 		 Oid expr_datatype,
! 		 List *opfamilies,
! 		 Index sortref)
  {
! 	EquivalenceClass *newec;
! 	EquivalenceMember *newem;
  	ListCell   *lc1;
! 	MemoryContext oldcontext;
  
  	/*
! 	 * Scan through the existing EquivalenceClasses for a match
  	 */
! 	foreach(lc1, root->eq_classes)
  	{
! 		EquivalenceClass *cur_ec = (EquivalenceClass *) lfirst(lc1);
! 		ListCell   *lc2;
  
  		/*
! 		 * Never match to a volatile EC, except when we are looking at another
! 		 * reference to the same volatile SortGroupClause.
  		 */
! 		if (cur_ec->ec_has_volatile &&
! 			(sortref == 0 || sortref != cur_ec->ec_sortref))
! 			continue;
! 
! 		if (!equal(opfamilies, cur_ec->ec_opfamilies))
  			continue;
  
! 		foreach(lc2, cur_ec->ec_members)
  		{
! 			EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc2);
! 
! 			/*
! 			 * If below an outer join, don't match constants: they're not as
! 			 * constant as they look.
! 			 */
! 			if (cur_ec->ec_below_outer_join &&
! 				cur_em->em_is_const)
! 				continue;
  
! 			if (expr_datatype == cur_em->em_datatype &&
! 				equal(expr, cur_em->em_expr))
! 				return cur_ec;	/* Match! */
  		}
  	}
  
  	/*
- 	 * No match, so build a new single-member EC
- 	 *
  	 * Here, we must be sure that we construct the EC in the right context. We
  	 * can assume, however, that the passed expr is long-lived.
  	 */
--- 361,463 
  
  
  /*
!  * eq_classes_match - matching function for eq_classes_hash in PlannerInfo
   */
! static int
! eq_classes_match(const void *key1, const void *key2, Size keysize)
  {
! 	EquivalenceClass *ec1 = (EquivalenceClass *) key1; /* this is in the hashtable */
! 	EquivalenceClass *ec2 = (EquivalenceClass *) key2; /* this is the new matched entry */
  	ListCell   *lc1;
! 	ListCell   *lc2;
  
  	/*
! 	 * Never match to a volatile EC, except when we are looking at another
! 	 * reference to the same volatile SortGroupClause.
  	 */
! 	if (ec1->ec_has_volatile &&
! 		(ec2->ec_sortref == 0 || ec2->ec_sortref != ec1->ec_sortref))
! 		return 1;
! 
! 	if (!equal(ec1->ec_opfamilies, ec2->ec_opfamilies))
! 		return 1;
! 
! 	foreach(lc1, ec1->ec_members)
  	{
! 		EquivalenceMember *em1 = (EquivalenceMember *) lfirst(lc1

Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-10-28 Thread Boszormenyi Zoltan
Heikki Linnakangas wrote:
> On 26.10.2010 18:34, Boszormenyi Zoltan wrote:
>> thank you very much for pointing me to dynahash, here is the
>> next version that finally seems to work.
>>
>> Two patches are attached, the first is the absolute minimum for
>> making it work, this still has the Tree type for canon_pathkeys
>> and eq_classes got the same treatment as join_rel_list/join_rel_hash
>> has in the current sources: if the list grows larger than 32, a hash
>> table is created. It seems to be enough to do it for
>> get_eclass_for_sort_expr() only; the other users of eq_classes
>> aren't bothered by this change.
>
> That's better, but can't you use dynahash for canon_pathkeys as well?

Here's a purely dynahash solution. It's somewhat slower than
the tree version, 0.45 vs 0.41 seconds in the cached case for the
previously posted test case.

Best regards,
Zoltán Böszörményi

-- 
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/

diff -durpN postgresql.orig/src/backend/optimizer/path/equivclass.c postgresql.1/src/backend/optimizer/path/equivclass.c
--- postgresql.orig/src/backend/optimizer/path/equivclass.c	2010-10-15 10:31:47.0 +0200
+++ postgresql.1/src/backend/optimizer/path/equivclass.c	2010-10-26 17:01:57.0 +0200
@@ -24,6 +24,7 @@
 #include "optimizer/planmain.h"
 #include "optimizer/prep.h"
 #include "optimizer/var.h"
+#include "utils/hsearch.h"
 #include "utils/lsyscache.h"
 
 
@@ -360,75 +361,103 @@ add_eq_member(EquivalenceClass *ec, Expr
 
 
 /*
- * get_eclass_for_sort_expr
- *	  Given an expression and opfamily info, find an existing equivalence
- *	  class it is a member of; if none, build a new single-member
- *	  EquivalenceClass for it.
- *
- * sortref is the SortGroupRef of the originating SortGroupClause, if any,
- * or zero if not.	(It should never be zero if the expression is volatile!)
- *
- * This can be used safely both before and after EquivalenceClass merging;
- * since it never causes merging it does not invalidate any existing ECs
- * or PathKeys.
- *
- * Note: opfamilies must be chosen consistently with the way
- * process_equivalence() would do; that is, generated from a mergejoinable
- * equality operator.  Else we might fail to detect valid equivalences,
- * generating poor (but not incorrect) plans.
+ * eq_classes_match - matching function for eq_classes_hash in PlannerInfo
  */
-EquivalenceClass *
-get_eclass_for_sort_expr(PlannerInfo *root,
-		 Expr *expr,
-		 Oid expr_datatype,
-		 List *opfamilies,
-		 Index sortref)
+static int
+eq_classes_match(const void *key1, const void *key2, Size keysize)
 {
-	EquivalenceClass *newec;
-	EquivalenceMember *newem;
+	EquivalenceClass *ec1 = (EquivalenceClass *) key1; /* this is in the hashtable */
+	EquivalenceClass *ec2 = (EquivalenceClass *) key2; /* this is the new matched entry */
 	ListCell   *lc1;
-	MemoryContext oldcontext;
+	ListCell   *lc2;
 
 	/*
-	 * Scan through the existing EquivalenceClasses for a match
+	 * Never match to a volatile EC, except when we are looking at another
+	 * reference to the same volatile SortGroupClause.
 	 */
-	foreach(lc1, root->eq_classes)
+	if (ec1->ec_has_volatile &&
+		(ec2->ec_sortref == 0 || ec2->ec_sortref != ec1->ec_sortref))
+		return 1;
+
+	if (!equal(ec1->ec_opfamilies, ec2->ec_opfamilies))
+		return 1;
+
+	foreach(lc1, ec1->ec_members)
 	{
-		EquivalenceClass *cur_ec = (EquivalenceClass *) lfirst(lc1);
-		ListCell   *lc2;
+		EquivalenceMember *em1 = (EquivalenceMember *) lfirst(lc1);
 
 		/*
-		 * Never match to a volatile EC, except when we are looking at another
-		 * reference to the same volatile SortGroupClause.
+		 * If below an outer join, don't match constants: they're not as
+		 * constant as they look.
 		 */
-		if (cur_ec->ec_has_volatile &&
-			(sortref == 0 || sortref != cur_ec->ec_sortref))
-			continue;
-
-		if (!equal(opfamilies, cur_ec->ec_opfamilies))
+		if (ec1->ec_below_outer_join &&
+			em1->em_is_const)
 			continue;
 
-		foreach(lc2, cur_ec->ec_members)
+		foreach(lc2, ec2->ec_members)
 		{
-			EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc2);
-
-			/*
-			 * If below an outer join, don't match constants: they're not as
-			 * constant as they look.
-			 */
-			if (cur_ec->ec_below_outer_join &&
-				cur_em->em_is_const)
-				continue;
+			EquivalenceMember *em2 = (EquivalenceMember *) lfirst(lc2);
 
-			if (expr_datatype == cur_em->em_datatype &&
-				equal(expr, cur_em->em_expr))
-				return cur_ec;	/* Match! */
+			if (em1->em_datatype == em2->em_datatype &&
+				equal(em1->em_expr, em2->em_expr))
+				return 0;
 		}
 	}
 
+	return 1;
+}
+
+
+/*
+ * build_eq_classes_hash
+ *	Build the initial contents of PlannerInfo.eq_classes_hash
+ *	for faster search in PlannerInfo.eq_classes. This is used
+ *	to  make   get_eclass_for_sort_expr()  faster  for  la