Re: [HACKERS] Command Triggers, patch v11

2012-03-19 Thread Dimitri Fontaine
Thom Brown  writes:
> Will you also be committing the trigger function variable changes
> shortly?  I don't wish to test anything prior to this as that will
> involve a complete re-test from my side anyway.

It's on its way, I had to spend time elsewhere, sorry about that. With
some luck I can post a intermediate patch later this evening limited to
PLpgSQL support for your testing.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-19 Thread Greg Stark
On Mon, Mar 19, 2012 at 1:50 AM, Tom Lane  wrote:
>  For an example in our own
> problem space look at mysql, whose regression tests take well over an
> hour to run on a fast box.  So they must be damn near bug-free right?
> Uh, not so much, and I think the fact that developers can't easily run
> their test suite is not unrelated to that.

The other problem with this approach is that it's hard to keep a huge
test suite 100% clean. Changes inevitably introduce behaviour changes
that cause some of the tests to fail. If the test suite is huge then
it's a lot of work to be continually fixing these tests and you're
always behind. If it's always the case that some tests in this huge
suite are failing then it's extra work whenever you make a change to
dig through the results and determine whether any of the failures are
caused by your changes and represent a real problem. Even if you do
the work it's easy to get it wrong and miss a real failure.

My suggestion would be to go ahead and check in the python or perl
script but not make that the pg_regress tests that are run by mak
check. Cherry pick just a good set of tests that test most of the
tricky bits and check that in to run on make test. I tihnk there's
even precedent for that in one of the other modules that has a make
longcheck or make slowcheck or something like that.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command Triggers, patch v11

2012-03-19 Thread Andres Freund
On Sunday, March 18, 2012 07:29:30 PM Tom Lane wrote:
> BTW, I've been looking through how to do what I suggested earlier to get
> rid of the coziness and code duplication between CreateTableAs and the
> prepare.c code; namely, let CreateTableAs create a DestReceiver and then
> call ExecuteQuery with that receiver.  It appears that we still need at
> least two bits of added complexity with that approach:
> 
> 1. ExecuteQuery still has to know that's a CREATE TABLE AS operation so
> that it can enforce that the prepared query is a SELECT.  (BTW, maybe
> this should be weakened to "something that returns tuples", in view of
> RETURNING?)
I don't really see the use case but given the amount of work it probably takes 
it seems reasonable to allow that.

> 2. Since ExecuteQuery goes through the Portal machinery, there's no way
> for it to pass in eflags to control the table OIDs setup.  This is
> easily solved by adding an eflags parameter to PortalStart, which
> doesn't seem too awful to me, since the typical caller would just pass
> zero.  (ExecuteQuery would also have to know about testing
> interpretOidsOption to compute the eflags to use, unless we add an
> eflags parameter to it, which might be the cleaner option.)
If we go down this route I think adding an eflag is the better choice. Thinking 
of it - my patch already added that ;)

> In short I'm thinking: add an eflags parameter to PortalStart, and add
> both an eflags parameter and a "bool mustReturnTuples" parameter to
> ExecuteQuery.  This definitely seems cleaner than the current
> duplication of code.
Hm. I am not *that* convinced anymore. It wasn't that much duplication in the 
end...

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade and pg_config dependency

2012-03-19 Thread Bruce Momjian
On Fri, Mar 16, 2012 at 08:11:17PM -0300, Alvaro Herrera wrote:
> 
> Excerpts from Bruce Momjian's message of vie mar 16 20:06:28 -0300 2012:
> > Àlvaro told me he got a Spanish-language report that pg_upgrade
> > failed because it required pg_config, and pg_config is only supplied
> > with the devel packages.
> > 
> > I initially thought that it was a packaging problem, but I later
> > realized the pg_config is mostly developer settings, and that using
> > pg_config was not getting any change to libdir by dynamic_library_path
> > in postgresql.conf, and that I should just merge the pg_upgrade_support
> > detection code into the existing shared library detection "LOAD" code I
> > already had.
> > 
> > This avoids the pg_config dependency, works better for libdir, and
> > reduces the amount of code.
> 
> Bruce also found a reference to this old bug report:
> http://archives.postgresql.org/pgsql-bugs/2011-12/msg00254.php
> This includes a link to a Debian bug report by Peter.
> 
> > Patch attached.  Should this be backpatched to PG 9.1;  I think so.
> 
> +1

Applied, and backpatched to 9.1.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sortsupport for text

2012-03-19 Thread Robert Haas
On Sat, Mar 17, 2012 at 6:58 PM, Greg Stark  wrote:
> On Fri, Mar 2, 2012 at 8:45 PM, Robert Haas  wrote:
>> 12789    28.2686  libc-2.13.so             strcoll_l
>> 6802     15.0350  postgres                 text_cmp
>
> I'm still curious how it would compare to call strxfrm and sort the
> resulting binary blobs. I don't think the sortsupport stuff actually
> makes this any easier though. Since using it requires storing the
> binary blob somewhere I think the support would have to be baked into
> tuplesort (or hacked into the sortkey as an expr that was evaluated
> earlier somehow).

Well, the real problem here is that the strxfrm'd representations
aren't just bigger - they are huge.  On MacBook Pro, if the input
representation is n characters, the strxfrm'd representation is 9x+3
characters.  If the we're sorting very wide tuples of which the sort
key is only a small part, maybe that would be acceptable, but if the
sort key makes up the bulk of the tuple than caching the strxfrm()
representation works out to slashing work_mem tenfold.  That might be
just fine if the sort is going to fit in work_mem either way, but if
it turns a quicksort into a heap sort then I feel pretty confident
that it's going to be a loser.  Keep in mind that even if the
strxfrm'd representation were no larger at all, it would still amount
to an additional copy of the data, so you'd still potentially be
eating up lots of work_mem that way.  The fact that it's an order of
magnitude larger is just making a bad problem worse.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sortsupport for text

2012-03-19 Thread Robert Haas
On Sun, Mar 18, 2012 at 11:08 AM, Tom Lane  wrote:
> However, it occurred to me that we could pretty easily jury-rig
> something that would give us an idea about the actual benefit available
> here.  To wit: make a C function that wraps strxfrm, basically
> strxfrm(text) returns bytea.  Then compare the performance of
> ORDER BY text_col to ORDER BY strxfrm(text_col).
>
> (You would need to have either both or neither of text and bytea
> using the sortsupport code paths for this to be a fair comparison.)

Since the index will be ~9x bigger at least on this machine, I think I
know the answer, but I suppose it doesn't hurt to test it.  It's not
that much work.

> One other thing I've always wondered about in this connection is the
> general performance of sorting toasted datums.  Is it better to detoast
> them in every comparison, or pre-detoast to save comparison cycles at
> the cost of having to push much more data around?  I didn't see any
> discussion of this point in Robert's benchmarks, but I don't think we
> should go very far towards enabling sortsupport for text until we
> understand the issue and know whether we need to add more infrastructure
> for it.  If you cross your eyes a little bit, this is very much like
> the strxfrm question...

It would be surprising to me if there is a one-size-fits-all answer to
this question.  For example, if the tuple got toasted because it's got
lots of columns and we had to take desperate measures to get it to fit
into an 8kB block at all, chances are that detoasting will work out
well.  We'll use a bit more memory, but hopefully that'll be repaid by
much faster comparisons.  OTOH, if you have a data set with many
relatively short strings and a few very long ones, detoasting up-front
could turn a quicksort into a heapsort.  Since only a small fraction
of the comparisons would have involved one of the problematic long
strings anyway, it's unlikely to be worth the expense of keeping those
strings around in detoasted form for the entire sort (unless maybe
reconstructing them even a few times is problematic because we're
under heavy cache pressure and we get lots of disk seeks as a result).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command Triggers, patch v11

2012-03-19 Thread Robert Haas
On Sun, Mar 18, 2012 at 2:29 PM, Tom Lane  wrote:
> 1. ExecuteQuery still has to know that's a CREATE TABLE AS operation so
> that it can enforce that the prepared query is a SELECT.  (BTW, maybe
> this should be weakened to "something that returns tuples", in view of
> RETURNING?)

+1 for "something that returns with tuples".   CREATE TABLE ... AS
DELETE FROM ... WHERE ... RETURNING ... seems like a cool thing to
support.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command Triggers, patch v11

2012-03-19 Thread Tom Lane
Robert Haas  writes:
> On Sun, Mar 18, 2012 at 2:29 PM, Tom Lane  wrote:
>> 1. ExecuteQuery still has to know that's a CREATE TABLE AS operation so
>> that it can enforce that the prepared query is a SELECT.  (BTW, maybe
>> this should be weakened to "something that returns tuples", in view of
>> RETURNING?)

> +1 for "something that returns with tuples".   CREATE TABLE ... AS
> DELETE FROM ... WHERE ... RETURNING ... seems like a cool thing to
> support.

For the moment I've backed off that idea.  The main definitional
question we'd have to resolve is whether we want to allow WITH NO DATA,
and if so what does that mean (should the DELETE execute, or not?).
I am also not certain that the RETURNING code paths would cope with
a WITH OIDS specification, and there are some other things that would
need fixed.  It might be cool to do it sometime, but it's not going to
happen in this patch.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incompatible pointer types with newer zlib

2012-03-19 Thread Robert Haas
On Sat, Mar 17, 2012 at 3:58 AM, Peter Eisentraut  wrote:
> On tor, 2012-03-01 at 19:19 +0200, Peter Eisentraut wrote:
>> I think the best fix would be to rearrange _PrintFileData() so that it
>> doesn't use FH at all.  Instead, we could define a separate
>> ArchiveHandle field IF that works more like OF, and then change
>> ahwrite() to use that.
>
> Here is a patch that might fix this.  I haven't been able to test this
> properly, so this is just from tracing the code.  It looks like
> _PrintFileData() doesn't need to use FH at all, so it could use a local
> file handle variable instead.  Could someone verify this please?

It looks like this code can be used via the undocumented -Ff option to
pg_dump.  But considering this code was added in 2000 as demonstration
code and has apparently never been documented, and considering also
that we now have the "directory" archive format which is presumably
quite a similar idea but documented and intended for production use,
maybe we should just rip out pg_backup_files/archFiles altogether.
pg_dump is crufty enough without supporting undocumented and obsolete
options for multiple decades.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command Triggers, patch v11

2012-03-19 Thread Robert Haas
On Mon, Mar 19, 2012 at 12:45 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Sun, Mar 18, 2012 at 2:29 PM, Tom Lane  wrote:
>>> 1. ExecuteQuery still has to know that's a CREATE TABLE AS operation so
>>> that it can enforce that the prepared query is a SELECT.  (BTW, maybe
>>> this should be weakened to "something that returns tuples", in view of
>>> RETURNING?)
>
>> +1 for "something that returns with tuples".   CREATE TABLE ... AS
>> DELETE FROM ... WHERE ... RETURNING ... seems like a cool thing to
>> support.
>
> For the moment I've backed off that idea.  The main definitional
> question we'd have to resolve is whether we want to allow WITH NO DATA,
> and if so what does that mean (should the DELETE execute, or not?).
> I am also not certain that the RETURNING code paths would cope with
> a WITH OIDS specification, and there are some other things that would
> need fixed.  It might be cool to do it sometime, but it's not going to
> happen in this patch.

Fair enough.  It would be nice to have, but it definitely does not
seem worth spending a lot of time on right now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command Triggers, patch v11

2012-03-19 Thread Peter Eisentraut
On sön, 2012-03-18 at 21:16 -0400, Tom Lane wrote:
> If we were going to change the output at all, I would vote for "CREATE
> TABLE " so as to preserve the rowcount functionality.  Keep in
> mind though that this would force client-side changes, for instance in
> libpq's PQcmdTuples().  Fixing that one routine isn't so painful, but
> what of other client-side libraries, not to mention applications?

Doesn't seem worth it to me.  At least, "SELECT " makes some sense:
 rows were selected.  "CREATE TABLE " means what?   tables
were created?

What might make sense is to delegate this additional information to
separate fields in a future protocol revision.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command Triggers, patch v11

2012-03-19 Thread Robert Haas
On Mon, Mar 19, 2012 at 12:53 PM, Peter Eisentraut  wrote:
> On sön, 2012-03-18 at 21:16 -0400, Tom Lane wrote:
>> If we were going to change the output at all, I would vote for "CREATE
>> TABLE " so as to preserve the rowcount functionality.  Keep in
>> mind though that this would force client-side changes, for instance in
>> libpq's PQcmdTuples().  Fixing that one routine isn't so painful, but
>> what of other client-side libraries, not to mention applications?
>
> Doesn't seem worth it to me.  At least, "SELECT " makes some sense:
>  rows were selected.  "CREATE TABLE " means what?   tables
> were created?
>
> What might make sense is to delegate this additional information to
> separate fields in a future protocol revision.

I think that we would not have bothered to add the row count to the
command tag output for SELECT unless it were useful.  It seems to be
*more* useful for CTAS than for SELECT; after all, SELECT also returns
the actual rows.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Support for foreign keys with arrays

2012-03-19 Thread Marco Nenciarini
Hi Noah,

thank you again for your thorough review, which is much appreciated.

> I think the patch has reached the stage where a committer can review
> it without wasting much time on things that might change radically.
> So, I'm marking it Ready for Committer.  Please still submit an update
> correcting the above; I'm sure your committer will appreciate it.

Attached is v5, which should address all the remaining issues.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it 

On Fri, Mar 16, 2012 at 11:33:12PM -0400, Noah Misch wrote:
> I recommend removing the new block of code in RI_FKey_eachcascade_del() and
> letting array_remove() throw the error.  If you find a way to throw a nicer
> error without an extra scan, by all means submit that to a future CF as an
> improvement.  I don't think it's important enough to justify cycles at this
> late hour of the current CF.

It makes sense; we have removed the block of code and updated the error
message following your suggestion. Now the error is thrown by array_remove()
itself.
We'll keep an eye on this for further improvements in the future.


> > > pg_constraint.confeach needs documentation.
> > 
> > Most of pg_constraint columns, including all the boolean ones, are
> > documented only in the "description" column of
> > 
> > http://www.postgresql.org/docs/9.1/static/catalog-pg-constraint.html#AEN85760
> > 
> > it seems that confiseach should not be an exception, since it just
> > indicates whether the constraint is of a certain kind or not.
> 
> Your patch adds two columns to pg_constraint, confiseach and confeach, but it
> only mentions confiseach in documentation.  Just add a similar minimal mention
> of confeach.


Sorry, our mistake; a mention for confeach has now been added, and both
entries have been reordered to reflect the column position in
pg_constraint).

> That is to say, they start with a capital letter and end with a period.  Your
> old text was fine apart from the lack of a period.  (Your new text also lacks
> a period.)

Fixed, it should be fine now (another misunderstanding on our side, apologies).

> If the cost doesn't exceed O(F log P), where F is the size of the FK table and
> P is the size of the PK table, I'm not worried.  If it can be O(F^2), we would
> have a problem to be documented, if not fixed.

We have rewritten the old query in a simpler way; now its cost is O(F log P).
Here F must represent the size of the "flattened" table, that is, the total
number of values that must be checked, which seems a reasonable assumption
in any case.

> Your change to array_replace() made me look at array_remove() again and
> realize that it needs the same treatment.  This command yields a segfault:
>   SELECT array_remove('{a,null}'::text[], null);


Fixed.

> 
> This latest version introduces two calls to get_element_type(), both of which
> should be get_base_element_type().

Fixed.

> Patch "Avoid FK validations for no-rewrite ALTER TABLE ALTER TYPE", committed
> between v3b and v4 of this patch, added code to ATAddForeignKeyConstraint()
> requiring an update in this patch.  Run this in the regression DB:
>   [local] regression=# alter table fktableforeachfk alter ftest1 type int[];
>   ERROR:  could not find cast from 1007 to 23

Thanks for pointing it out. We have added a regression test and then fixed the 
issue.

> 
> RI_PLAN_EACHCASCADE_DEL_DOUPDATE should be RI_PLAN_EACHCASCADE_DEL_DODELETE.

Fixed.



EACH-foreign-key.v5.patch.bz2
Description: application/bzip

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command Triggers, patch v11

2012-03-19 Thread Tom Lane
Robert Haas  writes:
> On Mon, Mar 19, 2012 at 12:53 PM, Peter Eisentraut  wrote:
>> Doesn't seem worth it to me.  At least, "SELECT " makes some sense:
>>  rows were selected.  "CREATE TABLE " means what?   tables
>> were created?
>> 
>> What might make sense is to delegate this additional information to
>> separate fields in a future protocol revision.

> I think that we would not have bothered to add the row count to the
> command tag output for SELECT unless it were useful.  It seems to be
> *more* useful for CTAS than for SELECT; after all, SELECT also returns
> the actual rows.

I think we're all in agreement that we need to keep the rowcount
functionality.  What seems to me to be in some doubt is whether to
continue to present the tag "SELECT " or to change it to something
like "CREATE TABLE ".  For the moment I've got the patch doing the
former.  It would not be terribly hard to change it, but I'm not going
to break backward compatibility unless there's a pretty clear consensus
to do so.


BTW, I just came across another marginal-loss-of-functionality issue:
in previous versions you could PREPARE a SELECT INTO, but now you get

regression=# prepare foo as select * into bar from int8_tbl;
ERROR:  utility statements cannot be prepared

Is anybody excited about that?  If it is something we have to keep,
it seems like pretty nearly a deal-breaker for this patch, because
storing a CreateTableAsStmt containing an already-prepared plan would
complicate matters unreasonably.  You can still get approximately
the same result with

prepare foo as select * from int8_tbl;
create table bar as execute foo;

which if anything is more useful since you didn't nail down the target
table name in the PREPARE, but nonetheless it's different.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Regarding column reordering project for GSoc 2012

2012-03-19 Thread Alvaro Herrera

Excerpts from Atri Sharma's message of dom mar 18 01:05:23 -0300 2012:

> I am understanding the scenario now. I will take a little of your time to
> modify my original idea:
> 
> The "middle layer" still exists, but NOT on the individual client side ,
> rather , on the server side. That is, we maintain the middle layer on the
> server, and it is same for all the users. We can mutate the ordering, and
> changes would be reflected in all the clients, since they are all accessing
> the same middle layer, present on the server.

I will take a little of your time to ask whether you read the older
threads on the topic -- the one that Andrew Dunstan linked to, and the
older ones, to which that email links to?  This whole area has been
extensively discussed before.

In any case, I repeat what I said before: this is way too complex a
topic for it to make a good GSoC project (unless you're already
intimately familiar with the parser and executor code.)  You need to
pick something smaller, more localized.  

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Regarding column reordering project for GSoc 2012

2012-03-19 Thread Atri Sharma
On Mon, Mar 19, 2012 at 11:51 PM, Alvaro Herrera
 wrote:
>
> Excerpts from Atri Sharma's message of dom mar 18 01:05:23 -0300 2012:
>
>> I am understanding the scenario now. I will take a little of your time to
>> modify my original idea:
>>
>> The "middle layer" still exists, but NOT on the individual client side ,
>> rather , on the server side. That is, we maintain the middle layer on the
>> server, and it is same for all the users. We can mutate the ordering, and
>> changes would be reflected in all the clients, since they are all accessing
>> the same middle layer, present on the server.
>
> I will take a little of your time to ask whether you read the older
> threads on the topic -- the one that Andrew Dunstan linked to, and the
> older ones, to which that email links to?  This whole area has been
> extensively discussed before.
>
> In any case, I repeat what I said before: this is way too complex a
> topic for it to make a good GSoC project (unless you're already
> intimately familiar with the parser and executor code.)  You need to
> pick something smaller, more localized.
>
> --
> Álvaro Herrera 
> The PostgreSQL Company - Command Prompt, Inc.
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support


--

Hi Alvaro,

Thanks for your reply.Yes,I have read the earlier posts.

I am familiar with database systems.I have worked on them before and I
have done designing of some very basic databases.

Please guide me to some area where I can research for some gsoc project.

Thanks,

Atri


-- 
Regards,

Atri
l'apprenant

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command Triggers, patch v11

2012-03-19 Thread Tom Lane
I wrote:
> One thing I soon found is that it lacks support for EXPLAIN SELECT INTO.

While I'm not particularly excited about fixing PREPARE ... SELECT INTO
or CREATE RULE ... SELECT INTO, I've come to the conclusion that the
EXPLAIN case is a must-fix.  Because not only is EXPLAIN SELECT INTO
broken, but so is EXPLAIN CREATE TABLE AS, and the latter functionality
is actually documented.  So presumably somebody went out of their way
to make this work, at some point.

Since I've got the table-creation code moved into intorel_startup,
this doesn't look to be that painful, but it will require an API change
for ExplainOnePlan, which is slightly annoying because that's probably
in use by third-party plugins.  We could either break them obviously
(by adding an explicit parameter) or break them subtly (by adding an
ExplainState field they might forget to initialize).  The former seems
preferable.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incompatible pointer types with newer zlib

2012-03-19 Thread Tom Lane
Robert Haas  writes:
> maybe we should just rip out pg_backup_files/archFiles altogether.
> pg_dump is crufty enough without supporting undocumented and obsolete
> options for multiple decades.

+1

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incompatible pointer types with newer zlib

2012-03-19 Thread Andrew Dunstan



On 03/19/2012 02:53 PM, Tom Lane wrote:

Robert Haas  writes:

maybe we should just rip out pg_backup_files/archFiles altogether.
pg_dump is crufty enough without supporting undocumented and obsolete
options for multiple decades.

+1





Yeah, go for it.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Regarding column reordering project for GSoc 2012

2012-03-19 Thread Atri Sharma
On Tue, Mar 20, 2012 at 12:08 AM, Atri Sharma  wrote:
> On Mon, Mar 19, 2012 at 11:51 PM, Alvaro Herrera
>  wrote:
>>
>> Excerpts from Atri Sharma's message of dom mar 18 01:05:23 -0300 2012:
>>
>>> I am understanding the scenario now. I will take a little of your time to
>>> modify my original idea:
>>>
>>> The "middle layer" still exists, but NOT on the individual client side ,
>>> rather , on the server side. That is, we maintain the middle layer on the
>>> server, and it is same for all the users. We can mutate the ordering, and
>>> changes would be reflected in all the clients, since they are all accessing
>>> the same middle layer, present on the server.
>>
>> I will take a little of your time to ask whether you read the older
>> threads on the topic -- the one that Andrew Dunstan linked to, and the
>> older ones, to which that email links to?  This whole area has been
>> extensively discussed before.
>>
>> In any case, I repeat what I said before: this is way too complex a
>> topic for it to make a good GSoC project (unless you're already
>> intimately familiar with the parser and executor code.)  You need to
>> pick something smaller, more localized.
>>
>> --
>> Álvaro Herrera 
>> The PostgreSQL Company - Command Prompt, Inc.
>> PostgreSQL Replication, Consulting, Custom Development, 24x7 support
>
>
> --
>
> Hi Alvaro,
>
> Thanks for your reply.Yes,I have read the earlier posts.
>
> I am familiar with database systems.I have worked on them before and I
> have done designing of some very basic databases.
>
> Please guide me to some area where I can research for some gsoc project.
>
> Thanks,
>
> Atri
>
>
> --
> Regards,
>
> Atri
> l'apprenant


--
Hi Alvaro,

I was just going through PGfoundry and I think I will be able to work
on the PL/Java project.Please let me know If I can proceed further on
ot for GSOC and also,if currently there is any work that needs to be
done on PL/java.
-- 
Regards,

Atri
l'apprenant

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Memory usage during sorting

2012-03-19 Thread Robert Haas
On Sun, Mar 18, 2012 at 11:25 AM, Tom Lane  wrote:
> So has somebody found a hole in the n log n lower bound on the cost of
> comparison-based sorting?  I thought that had been proven pretty
> rigorously.

There's not much danger of anyone finding a way around that bound
since the proof is quite trivial, but keep in mind that it's a
worst-case bound.  For example, if you have reason to believe that the
input is likely to already be sorted, you can check for that case in
using just n-1 comparisons.  If it turns out you're right, you can
"sort" the data in linear time.  Heck, you can also check for
reverse-sorted input at the same time, if you like, and handle that
special case too.  Of course, when the data is unordered, such
special-case checks hurt rather than help, even though it's still O(n
lg n) overall; those pesky constant factors do matter quite a bit.
Still, sometimes it's worth it: our quicksort algorithm checks for
sorted input on each recursive invocation, since quicksort degrades
rather badly in that scenario; but our heapsort doesn't contain a
similar check, probably because heapsort typically works fine in that
scenario.

One thing that seems inefficient to me about our current algorithm is
the use of the run number as a leading column in the sort key.
There's no real reason why the tuples destined for the next run need
to be maintained in heap order; we could just store them unordered and
heapify the whole lot of them when it's time to start the next run.
That ought to save comparison cycles for the current run, since the
heap will be less deep, and there are other possible savings as well -
in particular, an unordered array of tuples can be heapify'd in linear
time, whereas our current algorithm is O(n lg n).  However, when I
actually implemented this, I found that doing it this way was a loser.
 In fact, excluding the next-run tuples from the heap for the current
run was a BIG loser even before you add in the time required to
re-heapify.  This confused the daylights out of me for a while,
because it's hard to understand how insert and siftup can be slower on
a small heap than a large one.

My working theory is that, even though we must necessarily do more
comparisons when manipulating a larger heap, many of those comparisons
are resolved by comparing run numbers, and that's a lot cheaper than
invoking the real comparator.  For example, if we insert into a heap
whose rightmost child is in the next run, and the new tuple goes into
the current run, and the current size of the heap is such that the
initial position of the new element is a descendent of the right node,
it will very quickly crash through all of the next-run tuples and only
need one REAL comparison - against the root.  Either the new element
ends up as the root, or it ends up as the direct child of the root;
now we remove the root and, perhaps, replace it with its rightmost
child, meaning that the next element we read in can do the exact same
thing all over again.  If we exclude the next-run elements from the
heap, then the average depth of the heap when inserting a new element
is smaller, but all the elements are in the same-run, and we have to
invoke the real comparator every time.  In other words, those next-run
tuples act as dummies which effectively create a heap of uneven depth,
and the average depth encountered when inserting tuples turns out to
be less than what we get by pulling out the dummies and making the
depth uniform.

This makes me kind of nervous, because I don't understand why things
should work out so well as all that.  If throwing some dummy elements
into the heap makes things work better, then maybe we should throw in
some more.  Or maybe it would work better to take some but not all of
them out.  There certainly doesn't seem to be any indication in the
code that this is an anticipated effect, and it seems an awfully
providential accident.

It may (or may not) be related to the effect that Greg Stark mentions
in a nearby post: increasing work_mem makes the heap bigger, so heap
maintenance gets more expensive, sometimes without any corresponding
benefit.  For example, if the input happens to already be sorted, then
inserting into the heap is cheap, because each new element is already
greater than its parent, and life is good.  But removing from the heap
is painfully expensive, because to remove element from the heap we
stick the last element in the heap into the whole left by the
departing element and then sift it down.  However, that requires
2*heap_depth comparisons, all of which involve really invoking the
comparison function since everything's going to end up going into a
single run, and that gets more and more expensive as work_mem
increases.  So a heap sort of already-sorted data seems to get slower
and slower the more memory you give it to work with.  For example, a
sort of 5 million random strings of length 30, all in shared_buffers,
takes 22.03 seconds on my MacBook Pro with work_mem = 128MB, but just

Re: [HACKERS] Regarding column reordering project for GSoc 2012

2012-03-19 Thread Alvaro Herrera

Excerpts from Atri Sharma's message of lun mar 19 16:20:09 -0300 2012:

> I was just going through PGfoundry and I think I will be able to work
> on the PL/Java project.Please let me know If I can proceed further on
> ot for GSOC and also,if currently there is any work that needs to be
> done on PL/java.

Hm, I'm not sure on PL/Java.  As far as I know, it is a dead project.
But maybe I'm wrong and somebody can mentor you to do something useful
with it -- there are so many advancements in PLs that I'm sure there
must be something that can be done with PL/Java.

Did you have a look at the TODO list? http://wiki.postgresql.org/wiki/Todo

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: autocomplete for functions

2012-03-19 Thread Peter Eisentraut
On fre, 2012-03-16 at 13:47 -0400, Tom Lane wrote:
> I'm a bit concerned about whether that's actually going to be useful.
> A quick check shows that in the regression database, the proposed patch
> produces 3246 possible completions, which suggests that by the time you
> get down to a unique match you're going to have typed most of the name
> anyway.

Well, the regression test database is not really an example of real-life
object naming, I think.  I tried this out on a couple of real databases
and found it quite handy.
> 
> BTW, you should at least exclude dropped columns, I think.
> 
Yes.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-19 Thread Peter Eisentraut
On mån, 2012-03-19 at 02:35 +, Peter Geoghegan wrote:
> I see your point of view. I suppose I can privately hold onto the test
> suite, since it might prove useful again.

I would still like to have those tests checked in, but not run by
default, in case someone wants to hack on this particular feature again.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Regarding column reordering project for GSoc 2012

2012-03-19 Thread Heikki Linnakangas

On 19.03.2012 21:29, Alvaro Herrera wrote:


Excerpts from Atri Sharma's message of lun mar 19 16:20:09 -0300 2012:


I was just going through PGfoundry and I think I will be able to work
on the PL/Java project.Please let me know If I can proceed further on
ot for GSOC and also,if currently there is any work that needs to be
done on PL/java.


Hm, I'm not sure on PL/Java.  As far as I know, it is a dead project.


I haven't followed PL/java either, but it doesn't look quite dead to me. 
The last release was in September 2011, and there are active discussions 
on the mailing lists. If it's not completely up-to-date with latest JDK 
and PostgreSQL releases, perhaps bringing it up-to-date and then adding 
some missing functionality would be a useful GSoC project.


I would suggest that you ask on the pl/java mailing list if there is 
something suitable for a GSoC project there, and if one of the pl/java 
developers would be willing to mentor.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: autocomplete for functions

2012-03-19 Thread Tom Lane
Peter Eisentraut  writes:
> On fre, 2012-03-16 at 13:47 -0400, Tom Lane wrote:
>> I'm a bit concerned about whether that's actually going to be useful.
>> A quick check shows that in the regression database, the proposed patch
>> produces 3246 possible completions, which suggests that by the time you
>> get down to a unique match you're going to have typed most of the name
>> anyway.

> Well, the regression test database is not really an example of real-life
> object naming, I think.

Perhaps not, but a solid 2000 of those names are from the system-created
entries in pg_proc, and the regression DB doesn't have an especially
large number of tables either.  I doubt that real DBs are likely to have
materially fewer completions.

This connects somewhat to the discussions we've had in the past about
trying to get not-intended-for-public-use functions out of the standard
search path.  Not that you want to put a full visibility check into the
tab-completion query, but if it could simply exclude a "pg_private"
namespace, that probably wouldn't be too expensive.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-19 Thread Peter Eisentraut
On mån, 2012-03-19 at 08:59 +, Greg Stark wrote:
> The other problem with this approach is that it's hard to keep a huge
> test suite 100% clean. Changes inevitably introduce behaviour changes
> that cause some of the tests to fail.

I think we are used to that because of the way pg_regress works.  When
you have a better test infrastructure that tests actual functionality
rather than output formatting, this shouldn't be the case (nearly as
much).

If someone wanted to bite the bullet and do the work, I think we could
move to a Perl/TAP-based test suite (not pgTAP, but Perl and some fairly
standard Test::* modules) and reduce that useless reformatting work and
test more interesting things.  Just a thought ...


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: autocomplete for functions

2012-03-19 Thread Alvaro Herrera

Excerpts from Tom Lane's message of lun mar 19 16:53:49 -0300 2012:

> This connects somewhat to the discussions we've had in the past about
> trying to get not-intended-for-public-use functions out of the standard
> search path.  Not that you want to put a full visibility check into the
> tab-completion query,

I'm rather of the contrary opinion -- surely if we're going to complete
function names, we should only complete those that are in schemas in the
path; similarly for column names.  (BTW I didn't check but does this
completion work if I schema-qualify a column name?)

> but if it could simply exclude a "pg_private"
> namespace, that probably wouldn't be too expensive.

+1

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-19 Thread Peter Geoghegan
On 19 March 2012 19:55, Peter Eisentraut  wrote:
> If someone wanted to bite the bullet and do the work, I think we could
> move to a Perl/TAP-based test suite (not pgTAP, but Perl and some fairly
> standard Test::* modules) and reduce that useless reformatting work and
> test more interesting things.  Just a thought ...

I think that that is a good idea. However, I am not a Perl hacker,
though this is the second time that that has left me at a disadvantage
when working on Postgres, so I think it's probably time to learn a
certain amount.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] double free in current HEAD's pg_dump

2012-03-19 Thread Alvaro Herrera

Excerpts from Joachim Wieland's message of dom mar 18 02:24:21 -0300 2012:
> There's a double free in the current HEAD's pg_dump. Fix attached.

Thanks, applied.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Storage Manager crash at mdwrite()

2012-03-19 Thread Tareq Aljabban
On Fri, Mar 16, 2012 at 8:34 PM, Greg Stark  wrote:

> On Fri, Mar 16, 2012 at 11:29 PM, Jeff Davis  wrote:
> > There is a lot of difference between those two. In particular, it looks
> > like the problem you are seeing is coming from the background writer,
> > which is not running during initdb.
>
> The difference that comes to mind is that the postmaster forks. If the
> library opens any connections prior to forking and then uses them
> after forking that would work at first but it would get confused
> quickly once more than one backend tries to use the same connection.
> The data being sent would all be mixed together and they would see
> responses to requests other processes sent.
>

The opened connections thing was the first thing I thought of.. the HDFS
documentation claims the C interface is thread-safe..
However, since I noticed that different processes (having different process
IDs) are calling the mdwrite, then it might be a possibility.
I tried reconnecting to HDFS on every mdwrite request, but it didn't work
out.



> You need to ensure that any network connections are opened up *after*
> the new processes are forked.
>
> There are other things going on that could cause problems. initdb
> probably doesn't deal with many errors so it might not be casing any
> longjmps that happen when Postgres handles errors. I suspect it
> doesn't create any temporary files, etc.
>
> There's also a long history of threads not playing well with Postgres.
> I think that's overblown and I believe they should work fine. Most of
> it was caused by a bug in an old version of libc. But you do have to
> ensure that the other threads don't call any Postgres functions
> because the Postgres code is very much not thread-aware.
>
> --
> greg
>


I'm novice in PG, and if I got this one running then I'll have achieved
what I wanted to do without further digging in PG. So I was thinking if
there was a configuration option (or something similar) that will eliminate
(or reduce) the role the background writer process.. I believe it can be
one of the WAL options but I'm not sure.. Any suggestions?


Re: [HACKERS] sortsupport for text

2012-03-19 Thread Martijn van Oosterhout
On Mon, Mar 19, 2012 at 12:19:53PM -0400, Robert Haas wrote:
> On Sat, Mar 17, 2012 at 6:58 PM, Greg Stark  wrote:
> > On Fri, Mar 2, 2012 at 8:45 PM, Robert Haas  wrote:
> >> 12789    28.2686  libc-2.13.so             strcoll_l
> >> 6802     15.0350  postgres                 text_cmp
> >
> > I'm still curious how it would compare to call strxfrm and sort the
> > resulting binary blobs. I don't think the sortsupport stuff actually
> > makes this any easier though. Since using it requires storing the
> > binary blob somewhere I think the support would have to be baked into
> > tuplesort (or hacked into the sortkey as an expr that was evaluated
> > earlier somehow).
> 
> Well, the real problem here is that the strxfrm'd representations
> aren't just bigger - they are huge.  On MacBook Pro, if the input
> representation is n characters, the strxfrm'd representation is 9x+3
> characters.  

Ouch. I was holding out hope that you could get a meaningful
improvement if we could use the first X bytes of the strxfrm output so
you only need to do a strcoll on strings that actually nearly match.
But with an information density of 9 bytes for one 1 character it
doesn't seem worthwhile.

That and this gem in the strxfrm manpage:

RETURN VALUE
   The  strxfrm()  function returns the number of bytes required to
   store the transformed string in dest excluding the terminating
   '\0' character.  If the value returned is n or more, the
   contents of dest are indeterminate.

Which means that you have to take the entire transformed string, you
can't just ask for the first bit. I think that kind of leaves the whole
idea dead in the water.

Just for interest I looked at the ICU API for this and they have the
same restriction.  There is another function which you can use to
return partial sort keys (ucol_nextSortKeyPart) but it produces
"uncompressed sortkeys", which it seems is what Mac OSX is doing, which
seems useless for our purposes.  Either this is a hard problem or we're
nowhere near a target use case.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer


signature.asc
Description: Digital signature


Re: [HACKERS] Regarding column reordering project for GSoc 2012

2012-03-19 Thread Merlin Moncure
On Mon, Mar 19, 2012 at 2:49 PM, Heikki Linnakangas
 wrote:
> On 19.03.2012 21:29, Alvaro Herrera wrote:
>>
>>
>> Excerpts from Atri Sharma's message of lun mar 19 16:20:09 -0300 2012:
>>
>>> I was just going through PGfoundry and I think I will be able to work
>>> on the PL/Java project.Please let me know If I can proceed further on
>>> ot for GSOC and also,if currently there is any work that needs to be
>>> done on PL/java.
>>
>>
>> Hm, I'm not sure on PL/Java.  As far as I know, it is a dead project.
>
>
> I haven't followed PL/java either, but it doesn't look quite dead to me. The
> last release was in September 2011, and there are active discussions on the
> mailing lists. If it's not completely up-to-date with latest JDK and
> PostgreSQL releases, perhaps bringing it up-to-date and then adding some
> missing functionality would be a useful GSoC project.
>
> I would suggest that you ask on the pl/java mailing list if there is
> something suitable for a GSoC project there, and if one of the pl/java
> developers would be willing to mentor.

pl/java works pretty well and is somewhat widely used although it
might need some more active maintainers.  just jumping into the
project and nailing some old bugs and getting the juices flowing would
be a tremendous help, as well as the less glamorous work of
documentation and regular status updates.

one pl/java based project that IMO is just screaming to be done is a
pl/java based FDW (wrapping JDBC) that would approximately reproduce
dblink...maybe with some extra features geared towards ETL type jobs
like a row based callback for transformations in java.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-19 Thread Daniel Farina
On Sun, Mar 18, 2012 at 7:35 PM, Peter Geoghegan  wrote:
> On 19 March 2012 01:50, Tom Lane  wrote:
>> I am *not* a fan of regression tests that try to microscopically test
>> every feature in the system.
>
> I see your point of view. I suppose I can privately hold onto the test
> suite, since it might prove useful again.
>
> I will work on a pg_regress based approach with a reasonably-sized
> random subset of about 20 of my existing tests, to provide some basic
> smoke testing.

This may sound rather tortured, but in the main regression suite there
is a .c file that links some stuff into the backend that is then
accessed via CREATE FUNCTION to do some special fiddly bits.  Could a
creative hook be used here to avoid the repetition you are avoiding
via Python? (e.g. constant resetting of pg_stat_statements or
whatnot).  It might sound too much like changing the system under
test, but I think it would still retain most of the value.

I also do like the pg_regress workflow in general, although clearly it
cannot do absolutely everything.  Running and interpreting the results
of your tests was not hard, but it was definitely *different* which
could be a headache if one-off testing frameworks proliferate.

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Memory usage during sorting

2012-03-19 Thread Greg Stark
On Mon, Mar 19, 2012 at 7:23 PM, Robert Haas  wrote:
> There's no real reason why the tuples destined for the next run need
> to be maintained in heap order; we could just store them unordered and
> heapify the whole lot of them when it's time to start the next run.

This sounded familiar

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=cf627ab41ab9f6038a29ddd04dd0ff0ccdca714e

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch for parallel pg_dump

2012-03-19 Thread Alvaro Herrera

Excerpts from Joachim Wieland's message of lun mar 19 00:31:47 -0300 2012:
> On Wed, Mar 14, 2012 at 2:02 PM, Robert Haas  wrote:
> >> I think we should somehow unify both functions, the code is not very
> >> consistent in this respect, it also calls exit_horribly() when it has
> >> AH available. See for example pg_backup_tar.c
> >
> > I think we should get rid of die_horribly(), and instead have arrange
> > to always clean up AH via an on_exit_nicely hook.
> 
> Attached is a patch that gets rid of die_horribly().
> 
> For the parallel case it maintains an array with as many elements as
> we have worker processes. When the workers start, they enter their Pid
> (or ThreadId) and their ArchiveHandle (AH). The exit handler function
> in a process can then find its own ArchiveHandle by comparing the own
> Pid with all the elements in the array.

Sounds good to me in general ... my only gripe is this: I wonder if it
would be better to have a central routine that knows about both
archive_close_connection and archive_close_connection_parallel -- and
the argument to the callback is a struct that contains both a pointer to
the struct with the connection to be closed, a ParallelState (either of
which can be null), and a flag stating which of the ParallelState/AH is
in use.  That way, you avoid having to reset the callbacks when you
switch from AH to parallel; instead you just clear out the AH
connection, set the ParallelState, and flip the switch.  The general
archive_close_connection checks the flag to know which routine to call.

I mean, what you have probably works fine now, but it doesn't seem very
extensible.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema

2012-03-19 Thread Josh Berkus
On 3/18/12 8:11 PM, HuangQi wrote:
> The implementation seems to be done quite fully. There is even a patch
> file. Why is the implementation not added into the release of Postgres? As
> so much has already being done, what could I do in this case for the Gsoc?

That would be good for you to research.  archives.postgresql.org will
help you find the discussions around that.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Incorrect assumptions with low LIMITs

2012-03-19 Thread Jeff Davis
On Sat, 2012-03-17 at 12:48 +, Simon Riggs wrote:
> The problems are as I described them
> 
> (1) no account made for sparsity, and other factors leading to an
> overestimate of rows (N)
> 
> (2) inappropriate assumption of the effect of LIMIT m, which causes a
> costly SeqScan to appear better than an IndexScan for low m/N, when in
> fact that is seldom the case.
> 
> Overestimating N in (1) inverts the problem, so that an overestimate
> isn't the safe thing at all.

I think the actual problem has more to do with risk. The planner doesn't
know how uniform the distribution of the table is, which introduces risk
for the table scan.

I would tend to agree that for low selectivity fraction and a very low
limit (e.g. 1-3 in your example) and a large table, it doesn't seem like
a good risk to use a table scan. I don't know how that should be modeled
or implemented though.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sortsupport for text

2012-03-19 Thread Greg Stark
On Mon, Mar 19, 2012 at 9:23 PM, Martijn van Oosterhout
 wrote:
> Ouch. I was holding out hope that you could get a meaningful
> improvement if we could use the first X bytes of the strxfrm output so
> you only need to do a strcoll on strings that actually nearly match.
> But with an information density of 9 bytes for one 1 character it
> doesn't seem worthwhile.

When I was playing with glibc it was 4n. I think what they do is have
n bytes for the high order bits, then n bytes for low order bits like
capitalization or whitespace differences. I suspect they used to use
16 bits for each and have gone to some larger size.


> That and this gem in the strxfrm manpage:
>
> RETURN VALUE
>       The  strxfrm()  function returns the number of bytes required to
>       store the transformed string in dest excluding the terminating
>       '\0' character.  If the value returned is n or more, the
>       contents of dest are indeterminate.
>
> Which means that you have to take the entire transformed string, you
> can't just ask for the first bit. I think that kind of leaves the whole
> idea dead in the water.

I believe the intended API is that you allocate a buffer with your
guess of the right size, call strxfrm and if it returns a larger
number you realloc your buffer and call it again.


-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema

2012-03-19 Thread Daniel Farina
On Mon, Mar 19, 2012 at 6:17 PM, Josh Berkus  wrote:
> On 3/18/12 8:11 PM, HuangQi wrote:
>> The implementation seems to be done quite fully. There is even a patch
>> file. Why is the implementation not added into the release of Postgres? As
>> so much has already being done, what could I do in this case for the Gsoc?
>
> That would be good for you to research.  archives.postgresql.org will
> help you find the discussions around that.

I actually tried to find out, personally...not sure if I was searching
wrongly, but searching for TABLESAMPLE did not yield a cornucopia of
useful conversations at the right time in history (~2007), even when
the search is given a broad date-horizon (all), so I, too, an
uninformed as to the specific objections.

http://www.postgresql.org/search/?m=1&q=TABLESAMPLE&l=&d=-1&s=d

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command Triggers, patch v11

2012-03-19 Thread Tom Lane
I've applied the CTAS patch after rather heavy editorialization.  Don't
know what consequences that will have for Dimitri's patch.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Memory usage during sorting

2012-03-19 Thread Tom Lane
Greg Stark  writes:
> On Mon, Mar 19, 2012 at 7:23 PM, Robert Haas  wrote:
>> There's no real reason why the tuples destined for the next run need
>> to be maintained in heap order; we could just store them unordered and
>> heapify the whole lot of them when it's time to start the next run.

> This sounded familiar
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=cf627ab41ab9f6038a29ddd04dd0ff0ccdca714e

Yeah, see also the pgsql-hackers thread starting here:
http://archives.postgresql.org/pgsql-hackers/1999-10/msg00384.php

That was a long time ago, of course, but I have some vague recollection
that keeping next-run tuples in the current heap achieves a net savings
in the total number of comparisons needed to heapify both runs.
Robert's point about integer comparisons being faster than data
comparisons may or may not be relevant.  Certainly they are faster, but
there are never very many run numbers in the heap at once (possibly no
more than 2, I forget; and in any case often only 1).  So I'd expect
most tuple comparisons to end up having to do a data comparison anyway.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema

2012-03-19 Thread Qi Huang

 > On Mon, Mar 19, 2012 at 6:17 PM, Josh Berkus  wrote:
> > On 3/18/12 8:11 PM, HuangQi wrote:
> >> The implementation seems to be done quite fully. There is even a patch
> >> file. Why is the implementation not added into the release of Postgres? As
> >> so much has already being done, what could I do in this case for the Gsoc?
> >
> > That would be good for you to research.  archives.postgresql.org will
> > help you find the discussions around that.
> 
> I actually tried to find out, personally...not sure if I was searching
> wrongly, but searching for TABLESAMPLE did not yield a cornucopia of
> useful conversations at the right time in history (~2007), even when
> the search is given a broad date-horizon (all), so I, too, an
> uninformed as to the specific objections.
> 
> http://www.postgresql.org/search/?m=1&q=TABLESAMPLE&l=&d=-1&s=d
> 


I sent a mail to Nail Conway asking him about this. Hope he could give a good 
answer. While waiting for the response, how about the skip scan? Daniel 
mentioned there is still some unknown.I searched this mail thread suggesting 
the skip scan to TODO list. 
http://archives.postgresql.org/pgsql-bugs/2010-03/msg00144.phpAlso this thread 
talking about http://archives.postgresql.org/pgsql-hackers/2010-03/msg00328.php
Not sure whether this is feasible for Gsoc

Best RegardsHuang Qi Victor   

Re: [HACKERS] patch for parallel pg_dump

2012-03-19 Thread Erik Rijkers
On Tue, March 20, 2012 04:04, Joachim Wieland wrote:
> On Mon, Mar 19, 2012 at 9:14 PM, Alvaro Herrera
>  wrote:
>> Sounds good to me in general ... my only gripe is this: I wonder if it
>> would be better to have a central routine that knows about both
>> archive_close_connection and archive_close_connection_parallel -- and
>> the argument to the callback is a struct that contains both a pointer to
>> the struct with the connection to be closed [...]
>
> I had a similar idea before but then concluded that for it you need to
> have this struct globally available so that everybody (pg_dump.c /
> pg_restore.c / pg_backup_archive.c) can access it to set the
> appropriate state.
>
> I gave it a second thought and now just defined a function that these
> consumers can call, that way the variable can stay at file scope at
> least.
>
> Also we don't need this switch, we can set the ParallelState in the
> struct before any child forks off and reset it to NULL after the last
> child has terminated.
>
> New patch attached, thanks for your comments.
>

[pg_dump_die_horribly.2.diff ]


In my hands, the patch complains:


In file included from gram.y:13255:0:
scan.c: In function ‘yy_try_NUL_trans’:
scan.c:16243:23: warning: unused variable ‘yyg’ [-Wunused-variable]
pg_backup_archiver.c:3320:1: error: static declaration of 
‘archive_close_connection’ follows
non-static declaration
pg_backup.h:170:13: note: previous declaration of ‘archive_close_connection’ 
was here
make[3]: *** [pg_backup_archiver.o] Error 1
make[2]: *** [all-pg_dump-recurse] Error 2
make[2]: *** Waiting for unfinished jobs
make[1]: *** [all-bin-recurse] Error 2
make: *** [all-src-recurse] Error 2
-- make returned 2 - abort



Erik Rijkers



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cross-backend signals and administration (Was: Re: pg_terminate_backend for same-role)

2012-03-19 Thread Robert Haas
On Sat, Mar 17, 2012 at 8:28 PM, Daniel Farina  wrote:
> This thread evolved out of an attempt to implement
> pg_terminate_backend for non-superusers.  I thought -- probably
> erroneously -- that the major objection to that was the known
> possibility of a PID-cycling race condition, whereby a signal could be
> misdirected, in the case of terminate_backend, SIGTERM.  So now this
> fork of the thread is about fixing these unlikely races, and then
> passing administration requests (such as "query cancel" or "die" ) as
> out-of-band information via SIGUSR1, just like how LISTEN/NOTIFY and
> conflict signals are passed.
>
> To prevent ambiguity, I am using a new special number -- a 'SessionId'
> -- that is guaranteed unique to all backends ever created during the
> uptime of a database.  This number is currently implemented in a way
> that is guessable (so it cannot be accepted from external sources),
> but I actually think it may be even more useful for a number of other
> uses if given a non-guessable form (like cancellation keys).  In this
> respect it would fulfill pretty much the same purposes as the notion
> of a "session" seen on the web.

It's after midnight here so maybe I'm being slow, but I don't
understand what problem the SessionId solves.  ISTM that you could
solve the problem like this:

1. Acquire ProcArrayLock in exclusive mode, to keep the set of PGPROCs
from changing.
2. Search for the target PGPROC by PID; when you find it, set a bit in
the PGPROC indicating that you want it to cancel/die/whatever.
3. Remember the PID.
4. Release ProcArrayLock.
5. Send SIGUSR1.

If the PGPROC gets recycled between 4 and 5, the target backend will
find the relevant bits no longer set, and will do nothing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch for parallel pg_dump

2012-03-19 Thread Erik Rijkers
>
> [pg_dump_die_horribly.2.diff ]
>
>
> In my hands, the patch complains:
>
>
> In file included from gram.y:13255:0:
> scan.c: In function ‘yy_try_NUL_trans’:
> scan.c:16243:23: warning: unused variable ‘yyg’ [-Wunused-variable]
> pg_backup_archiver.c:3320:1: error: static declaration of 
> ‘archive_close_connection’ follows
> non-static declaration
> pg_backup.h:170:13: note: previous declaration of ‘archive_close_connection’ 
> was here
> make[3]: *** [pg_backup_archiver.o] Error 1
> make[2]: *** [all-pg_dump-recurse] Error 2
> make[2]: *** Waiting for unfinished jobs
> make[1]: *** [all-bin-recurse] Error 2
> make: *** [all-src-recurse] Error 2
> -- make returned 2 - abort
>

I should add:  Centos 5.8, gcc 4.6.3


Erik Rijkers




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-19 Thread Noah Misch
On Mon, Mar 19, 2012 at 09:49:32PM +0200, Peter Eisentraut wrote:
> On m??n, 2012-03-19 at 02:35 +, Peter Geoghegan wrote:
> > I see your point of view. I suppose I can privately hold onto the test
> > suite, since it might prove useful again.
> 
> I would still like to have those tests checked in, but not run by
> default, in case someone wants to hack on this particular feature again.

Agreed.  Also, patch review becomes materially smoother when the author
includes comprehensive tests.  When a usage I wish to verify already appears
in the submitted tests, that saves time.  I respect the desire to keep regular
"make check" lean, but not if it means comprehensive tests get written to be
buried in the mailing list archives or never submitted at all.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Incorrect assumptions with low LIMITs

2012-03-19 Thread Daniel Farina
On Mon, Mar 19, 2012 at 6:19 PM, Jeff Davis  wrote:
> On Sat, 2012-03-17 at 12:48 +, Simon Riggs wrote:
>> The problems are as I described them
>>
>> (1) no account made for sparsity, and other factors leading to an
>> overestimate of rows (N)
>>
>> (2) inappropriate assumption of the effect of LIMIT m, which causes a
>> costly SeqScan to appear better than an IndexScan for low m/N, when in
>> fact that is seldom the case.
>>
>> Overestimating N in (1) inverts the problem, so that an overestimate
>> isn't the safe thing at all.
>
> I think the actual problem has more to do with risk. The planner doesn't
> know how uniform the distribution of the table is, which introduces risk
> for the table scan.
>
> I would tend to agree that for low selectivity fraction and a very low
> limit (e.g. 1-3 in your example) and a large table, it doesn't seem like
> a good risk to use a table scan. I don't know how that should be modeled
> or implemented though.

FWIW, we have been bitten by the uniformity assumption a number of
times, but we kind of know what to look for.  It also has an
interesting interpretation as applied to garbage (insomuch as queries
all carry a conceptual "WHERE xmin...xmax... predicate).  Sometimes a
table is mostly garbage for a particular value, and the index scan
constantly has to seek the next tuple in its search to find
non-garbage.

I don't know how many times the uniformity assumption has held up fine
(guess: many), but cases of where an index cond
 has that little "Filter:" footnote has been a ticking timebomb for a
number of people I've interacted with.  It can be subtle because the
amount of scanned data may not be enormous, so upon re-running the
query it is not nearly as pathological as the original run given cache
effects, the only way to make it very visible is to look at the
EXPLAIN with BUFFERS option and note how many pages are being
discarded by the filter/how many pages of the underlying relation and
indexes are being processed only to be discarded.

Were I writing a OLTP-query performance linting tool, queries with
partially-matched index conditions might even be on my list of things
to warn about.  That is unfortunate for cases where the unmatched part
of the index condition may only qualify, say, five records in all
situations, but the database has no construct to know and enforce this
AFAIK.

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Support for foreign keys with arrays

2012-03-19 Thread Noah Misch
On Mon, Mar 19, 2012 at 06:41:39PM +0100, Marco Nenciarini wrote:
> Attached is v5, which should address all the remaining issues.

Looks clean to me.

> On Fri, Mar 16, 2012 at 11:33:12PM -0400, Noah Misch wrote:
> > If the cost doesn't exceed O(F log P), where F is the size of the FK table 
> > and
> > P is the size of the PK table, I'm not worried.  If it can be O(F^2), we 
> > would
> > have a problem to be documented, if not fixed.
> 
> We have rewritten the old query in a simpler way; now its cost is O(F log P).
> Here F must represent the size of the "flattened" table, that is, the total
> number of values that must be checked, which seems a reasonable assumption
> in any case.

Great; I find that approach easier to reason about.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers