Re: [HACKERS] Command Triggers, patch v11

2012-03-19 Thread Dimitri Fontaine
Thom Brown t...@linux.com writes:
 Will you also be committing the trigger function variable changes
 shortly?  I don't wish to test anything prior to this as that will
 involve a complete re-test from my side anyway.

It's on its way, I had to spend time elsewhere, sorry about that. With
some luck I can post an intermediate patch later this evening limited to
PLpgSQL support for your testing.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



[HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-19 Thread Greg Stark
On Mon, Mar 19, 2012 at 1:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  For an example in our own
 problem space look at mysql, whose regression tests take well over an
 hour to run on a fast box.  So they must be damn near bug-free right?
 Uh, not so much, and I think the fact that developers can't easily run
 their test suite is not unrelated to that.

The other problem with this approach is that it's hard to keep a huge
test suite 100% clean. Changes inevitably introduce behaviour changes
that cause some of the tests to fail. If the test suite is huge then
it's a lot of work to be continually fixing these tests and you're
always behind. If it's always the case that some tests in this huge
suite are failing then it's extra work whenever you make a change to
dig through the results and determine whether any of the failures are
caused by your changes and represent a real problem. Even if you do
the work it's easy to get it wrong and miss a real failure.

My suggestion would be to go ahead and check in the Python or Perl
script but not make that the pg_regress tests that are run by make
check. Cherry-pick just a good set of tests that cover most of the
tricky bits and check that in to run on make test. I think there's
even precedent for that in one of the other modules that has a make
longcheck or make slowcheck or something like that.

-- 
greg



Re: [HACKERS] Command Triggers, patch v11

2012-03-19 Thread Andres Freund
On Sunday, March 18, 2012 07:29:30 PM Tom Lane wrote:
 BTW, I've been looking through how to do what I suggested earlier to get
 rid of the coziness and code duplication between CreateTableAs and the
 prepare.c code; namely, let CreateTableAs create a DestReceiver and then
 call ExecuteQuery with that receiver.  It appears that we still need at
 least two bits of added complexity with that approach:
 
 1. ExecuteQuery still has to know that's a CREATE TABLE AS operation so
 that it can enforce that the prepared query is a SELECT.  (BTW, maybe
 this should be weakened to something that returns tuples, in view of
 RETURNING?)
I don't really see the use case, but given the amount of work it probably
takes, it seems reasonable to allow that.

 2. Since ExecuteQuery goes through the Portal machinery, there's no way
 for it to pass in eflags to control the table OIDs setup.  This is
 easily solved by adding an eflags parameter to PortalStart, which
 doesn't seem too awful to me, since the typical caller would just pass
 zero.  (ExecuteQuery would also have to know about testing
 interpretOidsOption to compute the eflags to use, unless we add an
 eflags parameter to it, which might be the cleaner option.)
If we go down this route I think adding an eflag is the better choice. Thinking 
of it - my patch already added that ;)

 In short I'm thinking: add an eflags parameter to PortalStart, and add
 both an eflags parameter and a bool mustReturnTuples parameter to
 ExecuteQuery.  This definitely seems cleaner than the current
 duplication of code.
Hm. I am not *that* convinced anymore. It wasn't that much duplication in the 
end...
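
For concreteness, the widened entry point might look something like this (a
sketch of the proposal, not committed code; the caller fragment and its field
access are purely illustrative):

    /* pquery.h -- sketch: PortalStart grows an eflags argument;
     * typical callers just pass 0 */
    void PortalStart(Portal portal, ParamListInfo params,
                     int eflags, Snapshot snapshot);

    /* a caller like ExecuteQuery might then compute, per Tom's point 2: */
    int eflags = interpretOidsOption(intoClause->options) ?
        EXEC_FLAG_WITH_OIDS : EXEC_FLAG_WITHOUT_OIDS;
    PortalStart(portal, paramLI, eflags, GetActiveSnapshot());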

Andres



Re: [HACKERS] pg_upgrade and pg_config dependency

2012-03-19 Thread Bruce Momjian
On Fri, Mar 16, 2012 at 08:11:17PM -0300, Alvaro Herrera wrote:
 
 Excerpts from Bruce Momjian's message of Fri Mar 16 20:06:28 -0300 2012:
  Álvaro told me he got a Spanish-language report that pg_upgrade
  failed because it required pg_config, and pg_config is only supplied
  with the devel packages.
  
  I initially thought that it was a packaging problem, but I later
  realized that pg_config reports mostly developer settings, and that using
  pg_config would not pick up any change to libdir made via dynamic_library_path
  in postgresql.conf, and that I should just merge the pg_upgrade_support
  detection code into the existing shared library detection LOAD code I
  already had.
  
  This avoids the pg_config dependency, works better for libdir, and
  reduces the amount of code.
 
 Bruce also found a reference to this old bug report:
 http://archives.postgresql.org/pgsql-bugs/2011-12/msg00254.php
 This includes a link to a Debian bug report by Peter.
 
  Patch attached.  Should this be backpatched to PG 9.1?  I think so.
 
 +1

Applied, and backpatched to 9.1.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] sortsupport for text

2012-03-19 Thread Robert Haas
On Sat, Mar 17, 2012 at 6:58 PM, Greg Stark st...@mit.edu wrote:
 On Fri, Mar 2, 2012 at 8:45 PM, Robert Haas robertmh...@gmail.com wrote:
 12789    28.2686  libc-2.13.so             strcoll_l
 6802     15.0350  postgres                 text_cmp

 I'm still curious how it would compare to calling strxfrm and sorting the
 resulting binary blobs. I don't think the sortsupport stuff actually
 makes this any easier though. Since using it requires storing the
 binary blob somewhere I think the support would have to be baked into
 tuplesort (or hacked into the sortkey as an expr that was evaluated
 earlier somehow).

Well, the real problem here is that the strxfrm'd representations
aren't just bigger - they are huge.  On my MacBook Pro, if the input
representation is n characters, the strxfrm'd representation is 9n+3
characters.  If we're sorting very wide tuples of which the sort
key is only a small part, maybe that would be acceptable, but if the
sort key makes up the bulk of the tuple then caching the strxfrm()
representation works out to slashing work_mem tenfold.  That might be
just fine if the sort is going to fit in work_mem either way, but if
it turns a quicksort into a heap sort then I feel pretty confident
that it's going to be a loser.  Keep in mind that even if the
strxfrm'd representation were no larger at all, it would still amount
to an additional copy of the data, so you'd still potentially be
eating up lots of work_mem that way.  The fact that it's an order of
magnitude larger is just making a bad problem worse.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] sortsupport for text

2012-03-19 Thread Robert Haas
On Sun, Mar 18, 2012 at 11:08 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 However, it occurred to me that we could pretty easily jury-rig
 something that would give us an idea about the actual benefit available
 here.  To wit: make a C function that wraps strxfrm, basically
 strxfrm(text) returns bytea.  Then compare the performance of
 ORDER BY text_col to ORDER BY strxfrm(text_col).

 (You would need to have either both or neither of text and bytea
 using the sortsupport code paths for this to be a fair comparison.)

Since the index will be ~9x bigger at least on this machine, I think I
know the answer, but I suppose it doesn't hurt to test it.  It's not
that much work.
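
For reference, the jury-rigged wrapper might look roughly like this (an
untested sketch: the name text_strxfrm is invented, the server's LC_COLLATE
is assumed to be set appropriately, and encoding subtleties are ignored):

    #include "postgres.h"
    #include "fmgr.h"
    #include "utils/builtins.h"
    #include <string.h>

    PG_MODULE_MAGIC;

    PG_FUNCTION_INFO_V1(text_strxfrm);

    /* SQL: strxfrm(text) RETURNS bytea */
    Datum
    text_strxfrm(PG_FUNCTION_ARGS)
    {
        char   *str = text_to_cstring(PG_GETARG_TEXT_PP(0));
        Size    len;
        bytea  *result;

        len = strxfrm(NULL, str, 0);            /* probe required size */
        result = (bytea *) palloc(VARHDRSZ + len + 1);
        SET_VARSIZE(result, VARHDRSZ + len + 1);
        strxfrm(VARDATA(result), str, len + 1); /* fill, incl. the NUL */

        pfree(str);
        PG_RETURN_BYTEA_P(result);
    }

Since bytea compares bytewise, ORDER BY strxfrm(text_col) would then exercise
exactly the memcmp-style comparison the transformed blobs are designed for.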

 One other thing I've always wondered about in this connection is the
 general performance of sorting toasted datums.  Is it better to detoast
 them in every comparison, or pre-detoast to save comparison cycles at
 the cost of having to push much more data around?  I didn't see any
 discussion of this point in Robert's benchmarks, but I don't think we
 should go very far towards enabling sortsupport for text until we
 understand the issue and know whether we need to add more infrastructure
 for it.  If you cross your eyes a little bit, this is very much like
 the strxfrm question...

It would be surprising to me if there is a one-size-fits-all answer to
this question.  For example, if the tuple got toasted because it's got
lots of columns and we had to take desperate measures to get it to fit
into an 8kB block at all, chances are that detoasting will work out
well.  We'll use a bit more memory, but hopefully that'll be repaid by
much faster comparisons.  OTOH, if you have a data set with many
relatively short strings and a few very long ones, detoasting up-front
could turn a quicksort into a heapsort.  Since only a small fraction
of the comparisons would have involved one of the problematic long
strings anyway, it's unlikely to be worth the expense of keeping those
strings around in detoasted form for the entire sort (unless maybe
reconstructing them even a few times is problematic because we're
under heavy cache pressure and we get lots of disk seeks as a result).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Command Triggers, patch v11

2012-03-19 Thread Robert Haas
On Sun, Mar 18, 2012 at 2:29 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 1. ExecuteQuery still has to know that's a CREATE TABLE AS operation so
 that it can enforce that the prepared query is a SELECT.  (BTW, maybe
 this should be weakened to something that returns tuples, in view of
 RETURNING?)

+1 for something that returns tuples.  CREATE TABLE ... AS
DELETE FROM ... WHERE ... RETURNING ... seems like a cool thing to
support.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Command Triggers, patch v11

2012-03-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sun, Mar 18, 2012 at 2:29 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 1. ExecuteQuery still has to know that's a CREATE TABLE AS operation so
 that it can enforce that the prepared query is a SELECT.  (BTW, maybe
 this should be weakened to something that returns tuples, in view of
 RETURNING?)

 +1 for something that returns tuples.  CREATE TABLE ... AS
 DELETE FROM ... WHERE ... RETURNING ... seems like a cool thing to
 support.

For the moment I've backed off that idea.  The main definitional
question we'd have to resolve is whether we want to allow WITH NO DATA,
and if so what does that mean (should the DELETE execute, or not?).
I am also not certain that the RETURNING code paths would cope with
a WITH OIDS specification, and there are some other things that would
need fixing.  It might be cool to do it sometime, but it's not going to
happen in this patch.

regards, tom lane



Re: [HACKERS] incompatible pointer types with newer zlib

2012-03-19 Thread Robert Haas
On Sat, Mar 17, 2012 at 3:58 AM, Peter Eisentraut pete...@gmx.net wrote:
 On Thu, 2012-03-01 at 19:19 +0200, Peter Eisentraut wrote:
 I think the best fix would be to rearrange _PrintFileData() so that it
 doesn't use FH at all.  Instead, we could define a separate
 ArchiveHandle field IF that works more like OF, and then change
 ahwrite() to use that.

 Here is a patch that might fix this.  I haven't been able to test this
 properly, so this is just from tracing the code.  It looks like
 _PrintFileData() doesn't need to use FH at all, so it could use a local
 file handle variable instead.  Could someone verify this please?

It looks like this code can be used via the undocumented -Ff option to
pg_dump.  But considering this code was added in 2000 as demonstration
code and has apparently never been documented, and considering also
that we now have the directory archive format which is presumably
quite a similar idea but documented and intended for production use,
maybe we should just rip out pg_backup_files/archFiles altogether.
pg_dump is crufty enough without supporting undocumented and obsolete
options for multiple decades.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Command Triggers, patch v11

2012-03-19 Thread Robert Haas
On Mon, Mar 19, 2012 at 12:45 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sun, Mar 18, 2012 at 2:29 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 1. ExecuteQuery still has to know that's a CREATE TABLE AS operation so
 that it can enforce that the prepared query is a SELECT.  (BTW, maybe
 this should be weakened to something that returns tuples, in view of
 RETURNING?)

 +1 for something that returns tuples.  CREATE TABLE ... AS
 DELETE FROM ... WHERE ... RETURNING ... seems like a cool thing to
 support.

 For the moment I've backed off that idea.  The main definitional
 question we'd have to resolve is whether we want to allow WITH NO DATA,
 and if so what does that mean (should the DELETE execute, or not?).
 I am also not certain that the RETURNING code paths would cope with
 a WITH OIDS specification, and there are some other things that would
 need fixing.  It might be cool to do it sometime, but it's not going to
 happen in this patch.

Fair enough.  It would be nice to have, but it definitely does not
seem worth spending a lot of time on right now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Command Triggers, patch v11

2012-03-19 Thread Peter Eisentraut
On Sun, 2012-03-18 at 21:16 -0400, Tom Lane wrote:
 If we were going to change the output at all, I would vote for CREATE
 TABLE <n> so as to preserve the rowcount functionality.  Keep in
 mind though that this would force client-side changes, for instance in
 libpq's PQcmdTuples().  Fixing that one routine isn't so painful, but
 what of other client-side libraries, not to mention applications?

Doesn't seem worth it to me.  At least, SELECT <n> makes some sense:
<n> rows were selected.  CREATE TABLE <n> means what?  <n> tables
were created?

What might make sense is to delegate this additional information to
separate fields in a future protocol revision.




Re: [HACKERS] Command Triggers, patch v11

2012-03-19 Thread Robert Haas
On Mon, Mar 19, 2012 at 12:53 PM, Peter Eisentraut pete...@gmx.net wrote:
 On Sun, 2012-03-18 at 21:16 -0400, Tom Lane wrote:
 If we were going to change the output at all, I would vote for CREATE
 TABLE <n> so as to preserve the rowcount functionality.  Keep in
 mind though that this would force client-side changes, for instance in
 libpq's PQcmdTuples().  Fixing that one routine isn't so painful, but
 what of other client-side libraries, not to mention applications?

 Doesn't seem worth it to me.  At least, SELECT <n> makes some sense:
 <n> rows were selected.  CREATE TABLE <n> means what?  <n> tables
 were created?

 What might make sense is to delegate this additional information to
 separate fields in a future protocol revision.

I think that we would not have bothered to add the row count to the
command tag output for SELECT unless it were useful.  It seems to be
*more* useful for CTAS than for SELECT; after all, SELECT also returns
the actual rows.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH] Support for foreign keys with arrays

2012-03-19 Thread Marco Nenciarini
Hi Noah,

thank you again for your thorough review, which is much appreciated.

 I think the patch has reached the stage where a committer can review
 it without wasting much time on things that might change radically.
 So, I'm marking it Ready for Committer.  Please still submit an update
 correcting the above; I'm sure your committer will appreciate it.

Attached is v5, which should address all the remaining issues.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it 

On Fri, Mar 16, 2012 at 11:33:12PM -0400, Noah Misch wrote:
 I recommend removing the new block of code in RI_FKey_eachcascade_del() and
 letting array_remove() throw the error.  If you find a way to throw a nicer
 error without an extra scan, by all means submit that to a future CF as an
 improvement.  I don't think it's important enough to justify cycles at this
 late hour of the current CF.

It makes sense; we have removed the block of code and updated the error
message following your suggestion. Now the error is thrown by array_remove()
itself.
We'll keep an eye on this for further improvements in the future.


   pg_constraint.confeach needs documentation.
  
  Most of pg_constraint columns, including all the boolean ones, are
  documented only in the description column of
  
  http://www.postgresql.org/docs/9.1/static/catalog-pg-constraint.html#AEN85760
  
  it seems that confiseach should not be an exception, since it just
  indicates whether the constraint is of a certain kind or not.
 
 Your patch adds two columns to pg_constraint, confiseach and confeach, but it
 only mentions confiseach in documentation.  Just add a similar minimal mention
 of confeach.


Sorry, our mistake; a mention for confeach has now been added, and both
entries have been reordered to reflect the column position in
pg_constraint.

 That is to say, they start with a capital letter and end with a period.  Your
 old text was fine apart from the lack of a period.  (Your new text also lacks
 a period.)

Fixed, it should be fine now (another misunderstanding on our side, apologies).

 If the cost doesn't exceed O(F log P), where F is the size of the FK table and
 P is the size of the PK table, I'm not worried.  If it can be O(F^2), we would
 have a problem to be documented, if not fixed.

We have rewritten the old query in a simpler way; now its cost is O(F log P).
Here F must represent the size of the flattened table, that is, the total
number of values that must be checked, which seems a reasonable assumption
in any case.

 Your change to array_replace() made me look at array_remove() again and
 realize that it needs the same treatment.  This command yields a segfault:
   SELECT array_remove('{a,null}'::text[], null);


Fixed.

 
 This latest version introduces two calls to get_element_type(), both of which
 should be get_base_element_type().

Fixed.

 Patch Avoid FK validations for no-rewrite ALTER TABLE ALTER TYPE, committed
 between v3b and v4 of this patch, added code to ATAddForeignKeyConstraint()
 requiring an update in this patch.  Run this in the regression DB:
   [local] regression=# alter table fktableforeachfk alter ftest1 type int[];
   ERROR:  could not find cast from 1007 to 23

Thanks for pointing it out. We have added a regression test and then fixed the 
issue.

 
 RI_PLAN_EACHCASCADE_DEL_DOUPDATE should be RI_PLAN_EACHCASCADE_DEL_DODELETE.

Fixed.



EACH-foreign-key.v5.patch.bz2
Description: application/bzip



Re: [HACKERS] Command Triggers, patch v11

2012-03-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Mar 19, 2012 at 12:53 PM, Peter Eisentraut pete...@gmx.net wrote:
 Doesn't seem worth it to me.  At least, SELECT <n> makes some sense:
 <n> rows were selected.  CREATE TABLE <n> means what?  <n> tables
 were created?
 
 What might make sense is to delegate this additional information to
 separate fields in a future protocol revision.

 I think that we would not have bothered to add the row count to the
 command tag output for SELECT unless it were useful.  It seems to be
 *more* useful for CTAS than for SELECT; after all, SELECT also returns
 the actual rows.

I think we're all in agreement that we need to keep the rowcount
functionality.  What seems to me to be in some doubt is whether to
continue to present the tag SELECT <n> or to change it to something
like CREATE TABLE <n>.  For the moment I've got the patch doing the
former.  It would not be terribly hard to change it, but I'm not going
to break backward compatibility unless there's a pretty clear consensus
to do so.


BTW, I just came across another marginal-loss-of-functionality issue:
in previous versions you could PREPARE a SELECT INTO, but now you get

regression=# prepare foo as select * into bar from int8_tbl;
ERROR:  utility statements cannot be prepared

Is anybody excited about that?  If it is something we have to keep,
it seems like pretty nearly a deal-breaker for this patch, because
storing a CreateTableAsStmt containing an already-prepared plan would
complicate matters unreasonably.  You can still get approximately
the same result with

prepare foo as select * from int8_tbl;
create table bar as execute foo;

which if anything is more useful since you didn't nail down the target
table name in the PREPARE, but nonetheless it's different.

regards, tom lane



Re: [HACKERS] Regarding column reordering project for GSoc 2012

2012-03-19 Thread Alvaro Herrera

Excerpts from Atri Sharma's message of Sun Mar 18 01:05:23 -0300 2012:

 I am understanding the scenario now. I will take a little of your time to
 modify my original idea:
 
 The middle layer still exists, but NOT on the individual client side;
 rather, it is on the server side. That is, we maintain the middle layer on the
 server, and it is the same for all users. We can mutate the ordering, and
 changes would be reflected in all the clients, since they are all accessing
 the same middle layer, present on the server.

I will take a little of your time to ask whether you read the older
threads on the topic -- the one that Andrew Dunstan linked to, and the
older ones, to which that email links?  This whole area has been
extensively discussed before.

In any case, I repeat what I said before: this is way too complex a
topic for it to make a good GSoC project (unless you're already
intimately familiar with the parser and executor code.)  You need to
pick something smaller, more localized.  

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Regarding column reordering project for GSoc 2012

2012-03-19 Thread Atri Sharma
On Mon, Mar 19, 2012 at 11:51 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 Excerpts from Atri Sharma's message of Sun Mar 18 01:05:23 -0300 2012:

 I am understanding the scenario now. I will take a little of your time to
 modify my original idea:

 The middle layer still exists, but NOT on the individual client side;
 rather, it is on the server side. That is, we maintain the middle layer on the
 server, and it is the same for all users. We can mutate the ordering, and
 changes would be reflected in all the clients, since they are all accessing
 the same middle layer, present on the server.

 I will take a little of your time to ask whether you read the older
 threads on the topic -- the one that Andrew Dunstan linked to, and the
 older ones, to which that email links?  This whole area has been
 extensively discussed before.

 In any case, I repeat what I said before: this is way too complex a
 topic for it to make a good GSoC project (unless you're already
 intimately familiar with the parser and executor code.)  You need to
 pick something smaller, more localized.

 --
 Álvaro Herrera alvhe...@commandprompt.com
 The PostgreSQL Company - Command Prompt, Inc.
 PostgreSQL Replication, Consulting, Custom Development, 24x7 support


--

Hi Alvaro,

Thanks for your reply. Yes, I have read the earlier posts.

I am familiar with database systems. I have worked on them before and
have designed some very basic databases.

Please guide me to some area that I can research for a GSoC project.

Thanks,

Atri


-- 
Regards,

Atri
l'apprenant



Re: [HACKERS] Command Triggers, patch v11

2012-03-19 Thread Tom Lane
I wrote:
 One thing I soon found is that it lacks support for EXPLAIN SELECT INTO.

While I'm not particularly excited about fixing PREPARE ... SELECT INTO
or CREATE RULE ... SELECT INTO, I've come to the conclusion that the
EXPLAIN case is a must-fix.  Because not only is EXPLAIN SELECT INTO
broken, but so is EXPLAIN CREATE TABLE AS, and the latter functionality
is actually documented.  So presumably somebody went out of their way
to make this work, at some point.

Since I've got the table-creation code moved into intorel_startup,
this doesn't look to be that painful, but it will require an API change
for ExplainOnePlan, which is slightly annoying because that's probably
in use by third-party plugins.  We could either break them obviously
(by adding an explicit parameter) or break them subtly (by adding an
ExplainState field they might forget to initialize).  The former seems
preferable.
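
For reference, the break-them-obviously option might end up looking something
like this (a sketch; choosing IntoClause for the new parameter is
illustrative):

    /* explain.h -- new parameter; plain EXPLAIN callers pass NULL, and
     * third-party callers fail to compile instead of misbehaving */
    void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
                        ExplainState *es, const char *queryString,
                        ParamListInfo params);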

regards, tom lane



Re: [HACKERS] incompatible pointer types with newer zlib

2012-03-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 maybe we should just rip out pg_backup_files/archFiles altogether.
 pg_dump is crufty enough without supporting undocumented and obsolete
 options for multiple decades.

+1

regards, tom lane



Re: [HACKERS] incompatible pointer types with newer zlib

2012-03-19 Thread Andrew Dunstan



On 03/19/2012 02:53 PM, Tom Lane wrote:

Robert Haas robertmh...@gmail.com writes:

maybe we should just rip out pg_backup_files/archFiles altogether.
pg_dump is crufty enough without supporting undocumented and obsolete
options for multiple decades.

+1

Yeah, go for it.

cheers

andrew



Re: [HACKERS] Regarding column reordering project for GSoc 2012

2012-03-19 Thread Atri Sharma
On Tue, Mar 20, 2012 at 12:08 AM, Atri Sharma atri.j...@gmail.com wrote:
 On Mon, Mar 19, 2012 at 11:51 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:

 Excerpts from Atri Sharma's message of Sun Mar 18 01:05:23 -0300 2012:

 I am understanding the scenario now. I will take a little of your time to
 modify my original idea:

 The middle layer still exists, but NOT on the individual client side;
 rather, it is on the server side. That is, we maintain the middle layer on the
 server, and it is the same for all users. We can mutate the ordering, and
 changes would be reflected in all the clients, since they are all accessing
 the same middle layer, present on the server.

 I will take a little of your time to ask whether you read the older
 threads on the topic -- the one that Andrew Dunstan linked to, and the
 older ones, to which that email links?  This whole area has been
 extensively discussed before.

 In any case, I repeat what I said before: this is way too complex a
 topic for it to make a good GSoC project (unless you're already
 intimately familiar with the parser and executor code.)  You need to
 pick something smaller, more localized.

 --
 Álvaro Herrera alvhe...@commandprompt.com
 The PostgreSQL Company - Command Prompt, Inc.
 PostgreSQL Replication, Consulting, Custom Development, 24x7 support


 --

 Hi Alvaro,

 Thanks for your reply. Yes, I have read the earlier posts.

 I am familiar with database systems. I have worked on them before and
 have designed some very basic databases.

 Please guide me to some area that I can research for a GSoC project.

 Thanks,

 Atri


 --
 Regards,

 Atri
 l'apprenant


--
Hi Alvaro,

I was just going through pgFoundry and I think I will be able to work
on the PL/Java project. Please let me know if I can proceed further on
it for GSoC and also if there is currently any work that needs to be
done on PL/Java.
-- 
Regards,

Atri
l'apprenant



Re: [HACKERS] Memory usage during sorting

2012-03-19 Thread Robert Haas
On Sun, Mar 18, 2012 at 11:25 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 So has somebody found a hole in the n log n lower bound on the cost of
 comparison-based sorting?  I thought that had been proven pretty
 rigorously.

There's not much danger of anyone finding a way around that bound
since the proof is quite trivial, but keep in mind that it's a
worst-case bound.  For example, if you have reason to believe that the
input is likely to already be sorted, you can check for that case
using just n-1 comparisons.  If it turns out you're right, you can
sort the data in linear time.  Heck, you can also check for
reverse-sorted input at the same time, if you like, and handle that
special case too.  Of course, when the data is unordered, such
special-case checks hurt rather than help, even though it's still O(n
lg n) overall; those pesky constant factors do matter quite a bit.
Still, sometimes it's worth it: our quicksort algorithm checks for
sorted input on each recursive invocation, since quicksort degrades
rather badly in that scenario; but our heapsort doesn't contain a
similar check, probably because heapsort typically works fine in that
scenario.
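
(For concreteness, the already-sorted pre-check is tiny; a sketch over a
plain int array, whereas the real code would work on SortTuples with the
full comparator:)

    #include <stdbool.h>
    #include <stddef.h>

    /* true iff a[0..n-1] is nondecreasing; exactly n-1 comparisons */
    static bool
    already_sorted(const int *a, size_t n)
    {
        size_t  i;

        for (i = 1; i < n; i++)
            if (a[i - 1] > a[i])
                return false;
        return true;
    }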

One thing that seems inefficient to me about our current algorithm is
the use of the run number as a leading column in the sort key.
There's no real reason why the tuples destined for the next run need
to be maintained in heap order; we could just store them unordered and
heapify the whole lot of them when it's time to start the next run.
That ought to save comparison cycles for the current run, since the
heap will be less deep, and there are other possible savings as well -
in particular, an unordered array of tuples can be heapify'd in linear
time, whereas our current algorithm is O(n lg n).  However, when I
actually implemented this, I found that doing it this way was a loser.
 In fact, excluding the next-run tuples from the heap for the current
run was a BIG loser even before you add in the time required to
re-heapify.  This confused the daylights out of me for a while,
because it's hard to understand how insert and siftup can be slower on
a small heap than a large one.
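
(As an aside, the linear-time heapify mentioned above is the classic
bottom-up construction; a sketch for a min-heap of ints:)

    #include <stddef.h>

    static void
    sift_down(int *h, size_t n, size_t i)
    {
        for (;;)
        {
            size_t  s = i;          /* smallest of i and its children */
            size_t  l = 2 * i + 1;
            size_t  r = 2 * i + 2;
            int     tmp;

            if (l < n && h[l] < h[s])
                s = l;
            if (r < n && h[r] < h[s])
                s = r;
            if (s == i)
                break;
            tmp = h[i];
            h[i] = h[s];
            h[s] = tmp;
            i = s;
        }
    }

    /* Floyd's construction: sift down each internal node, last one
     * first.  O(n) total, vs. O(n lg n) for n successive inserts. */
    static void
    heapify(int *h, size_t n)
    {
        size_t  i;

        for (i = n / 2; i-- > 0; )
            sift_down(h, n, i);
    }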

My working theory is that, even though we must necessarily do more
comparisons when manipulating a larger heap, many of those comparisons
are resolved by comparing run numbers, and that's a lot cheaper than
invoking the real comparator.  For example, if we insert into a heap
whose rightmost child is in the next run, and the new tuple goes into
the current run, and the current size of the heap is such that the
initial position of the new element is a descendant of the right node,
it will very quickly crash through all of the next-run tuples and only
need one REAL comparison - against the root.  Either the new element
ends up as the root, or it ends up as the direct child of the root;
now we remove the root and, perhaps, replace it with its rightmost
child, meaning that the next element we read in can do the exact same
thing all over again.  If we exclude the next-run elements from the
heap, then the average depth of the heap when inserting a new element
is smaller, but all the elements are in the same run, and we have to
invoke the real comparator every time.  In other words, those next-run
tuples act as dummies which effectively create a heap of uneven depth,
and the average depth encountered when inserting tuples turns out to
be less than what we get by pulling out the dummies and making the
depth uniform.
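
(The run-number-first comparison being described can be sketched like so;
the struct and names are simplified stand-ins for tuplesort's SortTuple and
its heap-comparison macro:)

    typedef struct
    {
        int         tupindex;   /* run this tuple is destined for */
        const void *tuple;      /* stand-in for the real tuple */
    } HeapEntry;

    static int
    heap_compare(const HeapEntry *a, const HeapEntry *b,
                 int (*real_cmp)(const void *, const void *))
    {
        /* cheap integer test: earlier runs sort first, and this often
         * resolves the comparison without calling real_cmp at all */
        if (a->tupindex != b->tupindex)
            return (a->tupindex < b->tupindex) ? -1 : 1;
        return real_cmp(a->tuple, b->tuple);    /* the expensive path */
    }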

This makes me kind of nervous, because I don't understand why things
should work out so well as all that.  If throwing some dummy elements
into the heap makes things work better, then maybe we should throw in
some more.  Or maybe it would work better to take some but not all of
them out.  There certainly doesn't seem to be any indication in the
code that this is an anticipated effect, and it seems an awfully
providential accident.

It may (or may not) be related to the effect that Greg Stark mentions
in a nearby post: increasing work_mem makes the heap bigger, so heap
maintenance gets more expensive, sometimes without any corresponding
benefit.  For example, if the input happens to already be sorted, then
inserting into the heap is cheap, because each new element is already
greater than its parent, and life is good.  But removing from the heap
is painfully expensive, because to remove an element from the heap we
stick the last element in the heap into the hole left by the
departing element and then sift it down.  However, that requires
2*heap_depth comparisons, all of which involve really invoking the
comparison function since everything's going to end up going into a
single run, and that gets more and more expensive as work_mem
increases.  So a heap sort of already-sorted data seems to get slower
and slower the more memory you give it to work with.  For example, a
sort of 5 million random strings of length 30, all in shared_buffers,
takes 22.03 seconds on my MacBook Pro with work_mem = 

Re: [HACKERS] Regarding column reordering project for GSoc 2012

2012-03-19 Thread Alvaro Herrera

Excerpts from Atri Sharma's message of Mon Mar 19 16:20:09 -0300 2012:

 I was just going through pgFoundry and I think I will be able to work
 on the PL/Java project. Please let me know if I can proceed further on
 it for GSoC and also if there is currently any work that needs to be
 done on PL/Java.

Hm, I'm not sure on PL/Java.  As far as I know, it is a dead project.
But maybe I'm wrong and somebody can mentor you to do something useful
with it -- there are so many advancements in PLs that I'm sure there
must be something that can be done with PL/Java.

Did you have a look at the TODO list? http://wiki.postgresql.org/wiki/Todo

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] patch: autocomplete for functions

2012-03-19 Thread Peter Eisentraut
On Fri, 2012-03-16 at 13:47 -0400, Tom Lane wrote:
 I'm a bit concerned about whether that's actually going to be useful.
 A quick check shows that in the regression database, the proposed patch
 produces 3246 possible completions, which suggests that by the time you
 get down to a unique match you're going to have typed most of the name
 anyway.

Well, the regression test database is not really an example of real-life
object naming, I think.  I tried this out on a couple of real databases
and found it quite handy.
 
 BTW, you should at least exclude dropped columns, I think.
 
Yes.




Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-19 Thread Peter Eisentraut
On Mon, 2012-03-19 at 02:35 +0000, Peter Geoghegan wrote:
 I see your point of view. I suppose I can privately hold onto the test
 suite, since it might prove useful again.

I would still like to have those tests checked in, but not run by
default, in case someone wants to hack on this particular feature again.




Re: [HACKERS] Regarding column reordering project for GSoc 2012

2012-03-19 Thread Heikki Linnakangas

On 19.03.2012 21:29, Alvaro Herrera wrote:

 Excerpts from Atri Sharma's message of Mon Mar 19 16:20:09 -0300 2012:

 I was just going through pgFoundry and I think I will be able to work
 on the PL/Java project. Please let me know if I can proceed further on
 it for GSoC and also if there is currently any work that needs to be
 done on PL/Java.

 Hm, I'm not sure on PL/Java.  As far as I know, it is a dead project.

I haven't followed PL/java either, but it doesn't look quite dead to me. 
The last release was in September 2011, and there are active discussions 
on the mailing lists. If it's not completely up-to-date with latest JDK 
and PostgreSQL releases, perhaps bringing it up-to-date and then adding 
some missing functionality would be a useful GSoC project.


I would suggest that you ask on the pl/java mailing list if there is 
something suitable for a GSoC project there, and if one of the pl/java 
developers would be willing to mentor.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] patch: autocomplete for functions

2012-03-19 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On Fri, 2012-03-16 at 13:47 -0400, Tom Lane wrote:
 I'm a bit concerned about whether that's actually going to be useful.
 A quick check shows that in the regression database, the proposed patch
 produces 3246 possible completions, which suggests that by the time you
 get down to a unique match you're going to have typed most of the name
 anyway.

 Well, the regression test database is not really an example of real-life
 object naming, I think.

Perhaps not, but a solid 2000 of those names are from the system-created
entries in pg_proc, and the regression DB doesn't have an especially
large number of tables either.  I doubt that real DBs are likely to have
materially fewer completions.

This connects somewhat to the discussions we've had in the past about
trying to get not-intended-for-public-use functions out of the standard
search path.  Not that you want to put a full visibility check into the
tab-completion query, but if it could simply exclude a pg_private
namespace, that probably wouldn't be too expensive.

regards, tom lane



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-19 Thread Peter Eisentraut
On Mon, 2012-03-19 at 08:59 +0000, Greg Stark wrote:
 The other problem with this approach is that it's hard to keep a huge
 test suite 100% clean. Changes inevitably introduce behaviour changes
 that cause some of the tests to fail.

I think we are used to that because of the way pg_regress works.  When
you have a better test infrastructure that tests actual functionality
rather than output formatting, this shouldn't be the case (nearly as
much).

If someone wanted to bite the bullet and do the work, I think we could
move to a Perl/TAP-based test suite (not pgTAP, but Perl and some fairly
standard Test::* modules) and reduce that useless reformatting work and
test more interesting things.  Just a thought ...




Re: [HACKERS] patch: autocomplete for functions

2012-03-19 Thread Alvaro Herrera

Excerpts from Tom Lane's message of Mon Mar 19 16:53:49 -0300 2012:

 This connects somewhat to the discussions we've had in the past about
 trying to get not-intended-for-public-use functions out of the standard
 search path.  Not that you want to put a full visibility check into the
 tab-completion query,

I'm rather of the contrary opinion -- surely if we're going to complete
function names, we should only complete those that are in schemas in the
path; similarly for column names.  (BTW I didn't check but does this
completion work if I schema-qualify a column name?)

 but if it could simply exclude a pg_private
 namespace, that probably wouldn't be too expensive.

+1

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-19 Thread Peter Geoghegan
On 19 March 2012 19:55, Peter Eisentraut pete...@gmx.net wrote:
 If someone wanted to bite the bullet and do the work, I think we could
 move to a Perl/TAP-based test suite (not pgTAP, but Perl and some fairly
 standard Test::* modules) and reduce that useless reformatting work and
 test more interesting things.  Just a thought ...

I think that that is a good idea. However, I am not a Perl hacker,
though this is the second time that that has left me at a disadvantage
when working on Postgres, so I think it's probably time to learn a
certain amount.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] double free in current HEAD's pg_dump

2012-03-19 Thread Alvaro Herrera

Excerpts from Joachim Wieland's message of Sun Mar 18 02:24:21 -0300 2012:
 There's a double free in the current HEAD's pg_dump. Fix attached.

Thanks, applied.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Storage Manager crash at mdwrite()

2012-03-19 Thread Tareq Aljabban
On Fri, Mar 16, 2012 at 8:34 PM, Greg Stark st...@mit.edu wrote:

 On Fri, Mar 16, 2012 at 11:29 PM, Jeff Davis pg...@j-davis.com wrote:
  There is a lot of difference between those two. In particular, it looks
  like the problem you are seeing is coming from the background writer,
  which is not running during initdb.

 The difference that comes to mind is that the postmaster forks. If the
 library opens any connections prior to forking and then uses them
 after forking that would work at first but it would get confused
 quickly once more than one backend tries to use the same connection.
 The data being sent would all be mixed together and they would see
 responses to requests other processes sent.


The open-connections issue was the first thing I thought of; the HDFS
documentation claims the C interface is thread-safe.
However, since I noticed that different processes (having different process
IDs) are calling mdwrite(), it might be a possibility.
I tried reconnecting to HDFS on every mdwrite() request, but it didn't work
out.
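
(One standard shape for the advice quoted just below -- open connections
only after the fork -- is a lazy per-process (re)connect; a sketch, where
the use of libhdfs's hdfsConnect against the default endpoint is an
assumption:)

    #include <unistd.h>
    #include "hdfs.h"               /* libhdfs */

    static hdfsFS hdfs_fs = NULL;
    static pid_t  hdfs_pid = 0;

    static hdfsFS
    get_hdfs(void)
    {
        /* reconnect if we never connected, or if our handle was
         * inherited from the parent across fork() */
        if (hdfs_fs == NULL || hdfs_pid != getpid())
        {
            hdfs_fs = hdfsConnect("default", 0);
            hdfs_pid = getpid();
        }
        return hdfs_fs;
    }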



 You need to ensure that any network connections are opened up *after*
 the new processes are forked.

 There are other things going on that could cause problems. initdb
 probably doesn't deal with many errors so it might not be causing any
 longjmps that happen when Postgres handles errors. I suspect it
 doesn't create any temporary files, etc.

 There's also a long history of threads not playing well with Postgres.
 I think that's overblown and I believe they should work fine. Most of
 it was caused by a bug in an old version of libc. But you do have to
 ensure that the other threads don't call any Postgres functions
 because the Postgres code is very much not thread-aware.

 --
 greg



I'm a novice in PG, and if I get this one running I'll have achieved
what I wanted to do without further digging in PG. So I was wondering
whether there is a configuration option (or something similar) that will
eliminate (or reduce) the role of the background writer process. I believe
it may be one of the WAL options, but I'm not sure. Any suggestions?


Re: [HACKERS] sortsupport for text

2012-03-19 Thread Martijn van Oosterhout
On Mon, Mar 19, 2012 at 12:19:53PM -0400, Robert Haas wrote:
 On Sat, Mar 17, 2012 at 6:58 PM, Greg Stark st...@mit.edu wrote:
  On Fri, Mar 2, 2012 at 8:45 PM, Robert Haas robertmh...@gmail.com wrote:
  12789    28.2686  libc-2.13.so             strcoll_l
  6802     15.0350  postgres                 text_cmp
 
  I'm still curious how it would compare to calling strxfrm and sorting the
  resulting binary blobs. I don't think the sortsupport stuff actually
  makes this any easier though. Since using it requires storing the
  binary blob somewhere I think the support would have to be baked into
  tuplesort (or hacked into the sortkey as an expr that was evaluated
  earlier somehow).
 
 Well, the real problem here is that the strxfrm'd representations
 aren't just bigger - they are huge.  On my MacBook Pro, if the input
 representation is n characters, the strxfrm'd representation is 9n+3
 characters.

Ouch. I was holding out hope that you could get a meaningful
improvement if we could use the first X bytes of the strxfrm output so
you only need to do a strcoll on strings that actually nearly match.
But with an information density of 9 bytes for one character it
doesn't seem worthwhile.

That and this gem in the strxfrm manpage:

RETURN VALUE
   The  strxfrm()  function returns the number of bytes required to
   store the transformed string in dest excluding the terminating
   '\0' character.  If the value returned is n or more, the
   contents of dest are indeterminate.

Which means that you have to take the entire transformed string, you
can't just ask for the first bit. I think that kind of leaves the whole
idea dead in the water.

Just for interest I looked at the ICU API for this and they have the
same restriction.  There is another function which you can use to
return partial sort keys (ucol_nextSortKeyPart) but it produces
uncompressed sortkeys, which it seems is what Mac OS X is doing, which
seems useless for our purposes.  Either this is a hard problem or we're
nowhere near a target use case.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer




Re: [HACKERS] Regarding column reordering project for GSoc 2012

2012-03-19 Thread Merlin Moncure
On Mon, Mar 19, 2012 at 2:49 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 19.03.2012 21:29, Alvaro Herrera wrote:


 Excerpts from Atri Sharma's message of Mon Mar 19 16:20:09 -0300 2012:

 I was just going through pgFoundry and I think I will be able to work
 on the PL/Java project. Please let me know if I can proceed further on
 it for GSoC and also if there is currently any work that needs to be
 done on PL/Java.


 Hm, I'm not sure on PL/Java.  As far as I know, it is a dead project.


 I haven't followed PL/java either, but it doesn't look quite dead to me. The
 last release was in September 2011, and there are active discussions on the
 mailing lists. If it's not completely up-to-date with latest JDK and
 PostgreSQL releases, perhaps bringing it up-to-date and then adding some
 missing functionality would be a useful GSoC project.

 I would suggest that you ask on the pl/java mailing list if there is
 something suitable for a GSoC project there, and if one of the pl/java
 developers would be willing to mentor.

pl/java works pretty well and is somewhat widely used although it
might need some more active maintainers.  just jumping into the
project and nailing some old bugs and getting the juices flowing would
be a tremendous help, as well as the less glamorous work of
documentation and regular status updates.

one pl/java based project that IMO is just screaming to be done is a
pl/java based FDW (wrapping JDBC) that would approximately reproduce
dblink...maybe with some extra features geared towards ETL type jobs
like a row based callback for transformations in java.

merlin



Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-19 Thread Daniel Farina
On Sun, Mar 18, 2012 at 7:35 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 19 March 2012 01:50, Tom Lane t...@sss.pgh.pa.us wrote:
 I am *not* a fan of regression tests that try to microscopically test
 every feature in the system.

 I see your point of view. I suppose I can privately hold onto the test
 suite, since it might prove useful again.

 I will work on a pg_regress based approach with a reasonably-sized
 random subset of about 20 of my existing tests, to provide some basic
 smoke testing.

This may sound rather tortured, but in the main regression suite there
is a .c file that links some stuff into the backend that is then
accessed via CREATE FUNCTION to do some special fiddly bits.  Could a
creative hook be used here to avoid the repetition you are avoiding
via Python? (e.g. constant resetting of pg_stat_statements or
whatnot).  It might sound too much like changing the system under
test, but I think it would still retain most of the value.

I also do like the pg_regress workflow in general, although clearly it
cannot do absolutely everything.  Running and interpreting the results
of your tests was not hard, but it was definitely *different* which
could be a headache if one-off testing frameworks proliferate.

-- 
fdr



Re: [HACKERS] Memory usage during sorting

2012-03-19 Thread Greg Stark
On Mon, Mar 19, 2012 at 7:23 PM, Robert Haas robertmh...@gmail.com wrote:
 There's no real reason why the tuples destined for the next run need
 to be maintained in heap order; we could just store them unordered and
 heapify the whole lot of them when it's time to start the next run.

This sounded familiar

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=cf627ab41ab9f6038a29ddd04dd0ff0ccdca714e

-- 
greg



Re: [HACKERS] patch for parallel pg_dump

2012-03-19 Thread Alvaro Herrera

Excerpts from Joachim Wieland's message of Mon Mar 19 00:31:47 -0300 2012:
 On Wed, Mar 14, 2012 at 2:02 PM, Robert Haas robertmh...@gmail.com wrote:
  I think we should somehow unify both functions, the code is not very
  consistent in this respect, it also calls exit_horribly() when it has
  AH available. See for example pg_backup_tar.c
 
  I think we should get rid of die_horribly(), and instead have arrange
  to always clean up AH via an on_exit_nicely hook.
 
 Attached is a patch that gets rid of die_horribly().
 
 For the parallel case it maintains an array with as many elements as
 we have worker processes. When the workers start, they enter their Pid
 (or ThreadId) and their ArchiveHandle (AH). The exit handler function
 in a process can then find its own ArchiveHandle by comparing its own
 Pid with all the elements in the array.

Sounds good to me in general ... my only gripe is this: I wonder if it
would be better to have a central routine that knows about both
archive_close_connection and archive_close_connection_parallel -- and
the argument to the callback is a struct that contains both a pointer to
the struct with the connection to be closed, a ParallelState (either of
which can be null), and a flag stating which of the ParallelState/AH is
in use.  That way, you avoid having to reset the callbacks when you
switch from AH to parallel; instead you just clear out the AH
connection, set the ParallelState, and flip the switch.  The general
archive_close_connection checks the flag to know which routine to call.

I mean, what you have probably works fine now, but it doesn't seem very
extensible.
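
Concretely, that single-callback shape might be sketched as follows (the
names here are invented for illustration):

    typedef struct
    {
        bool            in_parallel;    /* which member is live */
        ArchiveHandle  *AH;             /* serial case, may be NULL */
        ParallelState  *pstate;         /* parallel case, may be NULL */
    } ShutdownInfo;

    static ShutdownInfo shutdown_info;

    /* the one callback ever registered with on_exit_nicely */
    static void
    archive_close_connection(int code, void *arg)
    {
        ShutdownInfo *si = (ShutdownInfo *) arg;

        if (si->in_parallel)
            archive_close_connection_parallel(si->pstate);
        else if (si->AH)
            DisconnectDatabase(&si->AH->public);
    }

Switching from serial to parallel would then just clear AH, set pstate, and
flip in_parallel, with no re-registration of callbacks.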

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema

2012-03-19 Thread Josh Berkus
On 3/18/12 8:11 PM, HuangQi wrote:
 The implementation seems to be done quite fully. There is even a patch
 file. Why is the implementation not added into the release of Postgres? As
 so much has already been done, what could I do in this case for the GSoC?

That would be good for you to research.  archives.postgresql.org will
help you find the discussions around that.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Incorrect assumptions with low LIMITs

2012-03-19 Thread Jeff Davis
On Sat, 2012-03-17 at 12:48 +, Simon Riggs wrote:
 The problems are as I described them
 
 (1) no account made for sparsity, and other factors leading to an
 overestimate of rows (N)
 
 (2) inappropriate assumption of the effect of LIMIT m, which causes a
 costly SeqScan to appear better than an IndexScan for low m/N, when in
 fact that is seldom the case.
 
 Overestimating N in (1) inverts the problem, so that an overestimate
 isn't the safe thing at all.

I think the actual problem has more to do with risk. The planner doesn't
know how uniform the distribution of the table is, which introduces risk
for the table scan.

I would tend to agree that for low selectivity fraction and a very low
limit (e.g. 1-3 in your example) and a large table, it doesn't seem like
a good risk to use a table scan. I don't know how that should be modeled
or implemented though.
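
To make the assumption concrete (a back-of-the-envelope sketch, not
the planner's exact code): with N rows in the table, M of them
matching, and LIMIT m, the planner prices the limited scan by linear
interpolation, roughly

    \[
    E[\text{rows read}] \approx N \cdot \frac{m}{M}, \qquad
    \mathrm{cost}_{\mathrm{limit}} \approx c_{\mathrm{startup}}
        + \frac{m}{M}\,\bigl(c_{\mathrm{total}} - c_{\mathrm{startup}}\bigr)
    \]

If the m matches are in fact clustered at the far end of the heap, the
true cost approaches c_total, so the estimate can be optimistic by a
factor of nearly M/m -- worst exactly when m is small.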

Regards,
Jeff Davis




Re: [HACKERS] sortsupport for text

2012-03-19 Thread Greg Stark
On Mon, Mar 19, 2012 at 9:23 PM, Martijn van Oosterhout
klep...@svana.org wrote:
 Ouch. I was holding out hope that you could get a meaningful
 improvement if we could use the first X bytes of the strxfrm output so
 you only need to do a strcoll on strings that actually nearly match.
 But with an information density of 9 bytes for one character it
 doesn't seem worthwhile.

When I was playing with glibc it was 4n. I think what they do is have
n bytes for the high order bits, then n bytes for low order bits like
capitalization or whitespace differences. I suspect they used to use
16 bits for each and have gone to some larger size.


 That and this gem in the strxfrm manpage:

 RETURN VALUE
       The  strxfrm()  function returns the number of bytes required to
       store the transformed string in dest excluding the terminating
       '\0' character.  If the value returned is n or more, the
       contents of dest are indeterminate.

 Which means that you have to take the entire transformed string, you
 can't just ask for the first bit. I think that kind of leaves the whole
 idea dead in the water.

I believe the intended API is that you allocate a buffer with your
guess of the right size, call strxfrm and if it returns a larger
number you realloc your buffer and call it again.
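
A minimal sketch of that pattern (strxfrm_dup is an invented helper;
the 4x initial guess is just a heuristic, echoing the glibc
observation above):

    #include <stdlib.h>
    #include <string.h>

    /*
     * Return a malloc'd buffer holding the full strxfrm() image of src,
     * or NULL on out-of-memory.
     */
    static char *
    strxfrm_dup(const char *src)
    {
        size_t      guess = strlen(src) * 4 + 4;
        char       *buf = malloc(guess);
        size_t      needed;

        if (buf == NULL)
            return NULL;

        needed = strxfrm(buf, src, guess);
        if (needed >= guess)
        {
            /* contents are indeterminate, per the man page: redo it */
            char       *tmp = realloc(buf, needed + 1);

            if (tmp == NULL)
            {
                free(buf);
                return NULL;
            }
            buf = tmp;
            (void) strxfrm(buf, src, needed + 1);
        }
        return buf;
    }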


-- 
greg



Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema

2012-03-19 Thread Daniel Farina
On Mon, Mar 19, 2012 at 6:17 PM, Josh Berkus j...@agliodbs.com wrote:
 On 3/18/12 8:11 PM, HuangQi wrote:
 The implementation seems to be done quite fully. There is even a patch
 file. Why is the implementation not added into the release of Postgres? As
 so much has already been done, what could I do in this case for the Gsoc?

 That would be good for you to research.  archives.postgresql.org will
 help you find the discussions around that.

I actually tried to find out, personally...not sure if I was searching
wrongly, but searching for TABLESAMPLE did not yield a cornucopia of
useful conversations at the right time in history (~2007), even when
the search is given a broad date-horizon (all), so I, too, am
uninformed as to the specific objections.

http://www.postgresql.org/search/?m=1&q=TABLESAMPLE&l=&d=-1&s=d

-- 
fdr



Re: [HACKERS] Command Triggers, patch v11

2012-03-19 Thread Tom Lane
I've applied the CTAS patch after rather heavy editorialization.  Don't
know what consequences that will have for Dimitri's patch.

regards, tom lane



Re: [HACKERS] Memory usage during sorting

2012-03-19 Thread Tom Lane
Greg Stark st...@mit.edu writes:
 On Mon, Mar 19, 2012 at 7:23 PM, Robert Haas robertmh...@gmail.com wrote:
 There's no real reason why the tuples destined for the next run need
 to be maintained in heap order; we could just store them unordered and
 heapify the whole lot of them when it's time to start the next run.

 This sounded familiar
 http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=cf627ab41ab9f6038a29ddd04dd0ff0ccdca714e

Yeah, see also the pgsql-hackers thread starting here:
http://archives.postgresql.org/pgsql-hackers/1999-10/msg00384.php

That was a long time ago, of course, but I have some vague recollection
that keeping next-run tuples in the current heap achieves a net savings
in the total number of comparisons needed to heapify both runs.
Robert's point about integer comparisons being faster than data
comparisons may or may not be relevant.  Certainly they are faster, but
there are never very many run numbers in the heap at once (possibly no
more than 2, I forget; and in any case often only 1).  So I'd expect
most tuple comparisons to end up having to do a data comparison anyway.
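
For illustration, the two-level comparison might look like this
(RunTuple and compare_tuple_data are invented stand-ins for
tuplesort's internals):

    typedef struct RunTuple
    {
        int         run;        /* run this tuple is destined for */
        /* ... the tuple's sort key and data live here ... */
    } RunTuple;

    extern int  compare_tuple_data(const RunTuple *a, const RunTuple *b);

    /*
     * Heap order is (run, key).  The integer test is cheap, but with
     * only one or two distinct run numbers alive at once it usually
     * ties, so the expensive data comparison still runs for most pairs.
     */
    static int
    heap_compare(const RunTuple *a, const RunTuple *b)
    {
        if (a->run != b->run)
            return (a->run < b->run) ? -1 : 1;
        return compare_tuple_data(a, b);
    }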

regards, tom lane



Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema

2012-03-19 Thread Qi Huang

  On Mon, Mar 19, 2012 at 6:17 PM, Josh Berkus j...@agliodbs.com wrote:
  On 3/18/12 8:11 PM, HuangQi wrote:
  The implementation seems to be done quite fully. There is even a patch
  file. Why is the implementation not added into the release of Postgres? As
  so much has already been done, what could I do in this case for the Gsoc?
 
  That would be good for you to research.  archives.postgresql.org will
  help you find the discussions around that.
 
 I actually tried to find out, personally...not sure if I was searching
 wrongly, but searching for TABLESAMPLE did not yield a cornucopia of
 useful conversations at the right time in history (~2007), even when
 the search is given a broad date-horizon (all), so I, too, am
 uninformed as to the specific objections.
 
 http://www.postgresql.org/search/?m=1&q=TABLESAMPLE&l=&d=-1&s=d
 


I sent a mail to Neil Conway asking him about this. Hope he can give a
good answer. While waiting for the response, how about the skip scan?
Daniel mentioned there are still some unknowns. I found this mail
thread suggesting adding the skip scan to the TODO list:
http://archives.postgresql.org/pgsql-bugs/2010-03/msg00144.php
and also this thread talking about it:
http://archives.postgresql.org/pgsql-hackers/2010-03/msg00328.php
Not sure whether this is feasible for Gsoc.

Best Regards
Huang Qi Victor

Re: [HACKERS] patch for parallel pg_dump

2012-03-19 Thread Erik Rijkers
On Tue, March 20, 2012 04:04, Joachim Wieland wrote:
 On Mon, Mar 19, 2012 at 9:14 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
 Sounds good to me in general ... my only gripe is this: I wonder if it
 would be better to have a central routine that knows about both
 archive_close_connection and archive_close_connection_parallel -- and
 the argument to the callback is a struct that contains both a pointer to
 the struct with the connection to be closed [...]

 I had a similar idea before but then concluded that it would require
 having this struct globally available so that everybody (pg_dump.c /
 pg_restore.c / pg_backup_archiver.c) can access it to set the
 appropriate state.

 I gave it a second thought and now just defined a function that these
 consumers can call; that way the variable can stay at file scope at
 least.

 Also we don't need this switch; we can set the ParallelState in the
 struct before any child forks off and reset it to NULL after the last
 child has terminated.

 New patch attached, thanks for your comments.
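
(For concreteness, a sketch of that accessor arrangement -- the
function and variable names are invented here, not taken from the
patch:)

    typedef struct ParallelState ParallelState;     /* opaque here */

    static ParallelState *shutdown_pstate;          /* file scope only */

    /* What pg_dump.c / pg_restore.c call instead of touching the variable. */
    void
    set_archive_shutdown_pstate(ParallelState *pstate)
    {
        /* set before the first child forks; NULL after the last exits */
        shutdown_pstate = pstate;
    }

The exit callback can then simply branch on shutdown_pstate != NULL,
which is the "we don't need this switch" point above.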


[pg_dump_die_horribly.2.diff ]


In my hands, the patch complains:


In file included from gram.y:13255:0:
scan.c: In function ‘yy_try_NUL_trans’:
scan.c:16243:23: warning: unused variable ‘yyg’ [-Wunused-variable]
pg_backup_archiver.c:3320:1: error: static declaration of 
‘archive_close_connection’ follows
non-static declaration
pg_backup.h:170:13: note: previous declaration of ‘archive_close_connection’ 
was here
make[3]: *** [pg_backup_archiver.o] Error 1
make[2]: *** [all-pg_dump-recurse] Error 2
make[2]: *** Waiting for unfinished jobs
make[1]: *** [all-bin-recurse] Error 2
make: *** [all-src-recurse] Error 2
-- make returned 2 - abort



Erik Rijkers





Re: [HACKERS] Cross-backend signals and administration (Was: Re: pg_terminate_backend for same-role)

2012-03-19 Thread Robert Haas
On Sat, Mar 17, 2012 at 8:28 PM, Daniel Farina dan...@heroku.com wrote:
 This thread evolved out of an attempt to implement
 pg_terminate_backend for non-superusers.  I thought -- probably
 erroneously -- that the major objection to that was the known
 possibility of a PID-cycling race condition, whereby a signal could be
 misdirected, in the case of terminate_backend, SIGTERM.  So now this
 fork of the thread is about fixing these unlikely races, and then
 passing administration requests (such as query cancel or die ) as
 out-of-band information via SIGUSR1, just like how LISTEN/NOTIFY and
 conflict signals are passed.

 To prevent ambiguity, I am using a new special number -- a 'SessionId'
 -- that is guaranteed unique across all backends ever created during the
 uptime of a database.  This number is currently implemented in a way
 that is guessable (so it cannot be accepted from external sources),
 but I actually think it may be even more useful for a number of other
 uses if given a non-guessable form (like cancellation keys).  In this
 respect it would fulfill pretty much the same purposes as the notion
 of a session seen on the web.

It's after midnight here so maybe I'm being slow, but I don't
understand what problem the SessionId solves.  ISTM that you could
solve the problem like this:

1. Acquire ProcArrayLock in exclusive mode, to keep the set of PGPROCs
from changing.
2. Search for the target PGPROC by PID; when you find it, set a bit in
the PGPROC indicating that you want it to cancel/die/whatever.
3. Remember the PID.
4. Release ProcArrayLock.
5. Send SIGUSR1.

If the PGPROC gets recycled between 4 and 5, the target backend will
find the relevant bits no longer set, and will do nothing.
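
A sketch of those five steps in backend terms (pendingAdminRequests is
an invented PGPROC field standing in for whatever flag word would be
added; LWLockAcquire, BackendPidGetProc and friends are the existing
primitives):

    #include "postgres.h"

    #include <signal.h>

    #include "storage/lwlock.h"
    #include "storage/proc.h"
    #include "storage/procarray.h"

    static bool
    signal_backend_safely(int pid, int reason_bit)
    {
        PGPROC     *proc;

        LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);     /* step 1 */
        proc = BackendPidGetProc(pid);                  /* step 2 */
        if (proc == NULL)
        {
            LWLockRelease(ProcArrayLock);
            return false;       /* target backend already gone */
        }
        proc->pendingAdminRequests |= reason_bit;       /* step 2, cont. */
        LWLockRelease(ProcArrayLock);                   /* steps 3 and 4 */
        return kill(pid, SIGUSR1) == 0;                 /* step 5 */
    }

Step 3's "remember the PID" is just the pid argument here.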

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] patch for parallel pg_dump

2012-03-19 Thread Erik Rijkers

 [pg_dump_die_horribly.2.diff ]


 In my hands, the patch complains:


 In file included from gram.y:13255:0:
 scan.c: In function ‘yy_try_NUL_trans’:
 scan.c:16243:23: warning: unused variable ‘yyg’ [-Wunused-variable]
 pg_backup_archiver.c:3320:1: error: static declaration of 
 ‘archive_close_connection’ follows
 non-static declaration
 pg_backup.h:170:13: note: previous declaration of ‘archive_close_connection’ 
 was here
 make[3]: *** [pg_backup_archiver.o] Error 1
 make[2]: *** [all-pg_dump-recurse] Error 2
 make[2]: *** Waiting for unfinished jobs
 make[1]: *** [all-bin-recurse] Error 2
 make: *** [all-src-recurse] Error 2
 -- make returned 2 - abort


I should add:  Centos 5.8, gcc 4.6.3


Erik Rijkers






[HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-19 Thread Noah Misch
On Mon, Mar 19, 2012 at 09:49:32PM +0200, Peter Eisentraut wrote:
 On mån, 2012-03-19 at 02:35 +, Peter Geoghegan wrote:
  I see your point of view. I suppose I can privately hold onto the test
  suite, since it might prove useful again.
 
 I would still like to have those tests checked in, but not run by
 default, in case someone wants to hack on this particular feature again.

Agreed.  Also, patch review becomes materially smoother when the author
includes comprehensive tests.  When a usage I wish to verify already appears
in the submitted tests, that saves time.  I respect the desire to keep regular
make check lean, but not if it means comprehensive tests get written to be
buried in the mailing list archives or never submitted at all.



Re: [HACKERS] Incorrect assumptions with low LIMITs

2012-03-19 Thread Daniel Farina
On Mon, Mar 19, 2012 at 6:19 PM, Jeff Davis pg...@j-davis.com wrote:
 On Sat, 2012-03-17 at 12:48 +, Simon Riggs wrote:
 The problems are as I described them

 (1) no account made for sparsity, and other factors leading to an
 overestimate of rows (N)

 (2) inappropriate assumption of the effect of LIMIT m, which causes a
 costly SeqScan to appear better than an IndexScan for low m/N, when in
 fact that is seldom the case.

 Overestimating N in (1) inverts the problem, so that an overestimate
 isn't the safe thing at all.

 I think the actual problem has more to do with risk. The planner doesn't
 know how uniform the distribution of the table is, which introduces risk
 for the table scan.

 I would tend to agree that for low selectivity fraction and a very low
 limit (e.g. 1-3 in your example) and a large table, it doesn't seem like
 a good risk to use a table scan. I don't know how that should be modeled
 or implemented though.

FWIW, we have been bitten by the uniformity assumption a number of
times, but we kind of know what to look for.  It also has an
interesting interpretation as applied to garbage (insomuch as queries
all carry a conceptual WHERE xmin...xmax... predicate).  Sometimes a
table is mostly garbage for a particular value, and the index scan
constantly has to seek the next tuple in its search to find
non-garbage.

I don't know how many times the uniformity assumption has held up fine
(guess: many), but cases where an index cond has that little Filter:
footnote have been a ticking time bomb for a number of people I've
interacted with.  It can be subtle because the amount of scanned data
may not be enormous, so upon re-running the query it is not nearly as
pathological as the original run, given cache effects.  The only way
to make it very visible is to look at EXPLAIN with the BUFFERS option
and note how many pages are being discarded by the filter -- that is,
how many pages of the underlying relation and indexes are being
processed only to be discarded.

Were I writing an OLTP-query performance linting tool, queries with
partially-matched index conditions might even be on my list of things
to warn about.  That is unfortunate for cases where the unmatched part
of the index condition may qualify only, say, five records in all
situations, but AFAIK the database has no construct to know and
enforce this.

-- 
fdr



Re: [HACKERS] [PATCH] Support for foreign keys with arrays

2012-03-19 Thread Noah Misch
On Mon, Mar 19, 2012 at 06:41:39PM +0100, Marco Nenciarini wrote:
 Attached is v5, which should address all the remaining issues.

Looks clean to me.

 On Fri, Mar 16, 2012 at 11:33:12PM -0400, Noah Misch wrote:
  If the cost doesn't exceed O(F log P), where F is the size of the FK table 
  and
  P is the size of the PK table, I'm not worried.  If it can be O(F^2), we 
  would
  have a problem to be documented, if not fixed.
 
 We have rewritten the old query in a simpler way; now its cost is O(F log P).
 Here F must represent the size of the flattened table, that is, the total
 number of values that must be checked, which seems a reasonable assumption
 in any case.

Great; I find that approach easier to reason about.
