Re: [HACKERS] Setting visibility map in VACUUM's second phase

2012-12-06 Thread Amit Kapila
On Friday, December 07, 2012 12:06 AM Robert Haas wrote:
> On Thu, Dec 6, 2012 at 1:01 PM, Tom Lane  wrote:
> > I think taking a second whack at setting the visibility bit is a fine
> > idea, but let's drop all the rest of this premature optimization.
> 
> +1.
> 
> If there's any optimization needed here, we should try to do it by
> remembering relevant details from the first vacuum pass in
> backend-private memory, rather than by changing the on-disk format.
> 
> One other thought: I'm wondering if we shouldn't try to push the work
> of setting the all-visible bit into heap_page_prune().  That would
> allow HOT pruning to set the bit.  For example, consider an
> all-visible page.  A tuple is HOT-updated and the page becomes
> not-all-visible.  Now the page is pruned, removing the old tuple and
> changing the line pointer to a redirect.  Presto, page is all-visible
> again.

  I think it can reduce some of VACUUM's load as well; the only concern is
  that it should not make page pruning too heavy.
  By the way, isn't this idea similar to the patch at the link below?
  http://archives.postgresql.org/pgsql-hackers/2010-02/msg02344.php
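
  For reference, the HOT-prune scenario Robert describes can be sketched in
  SQL roughly like this (table t is hypothetical, and whether the UPDATE is
  actually HOT depends on free space on the page and on no indexed column
  changing):

      CREATE TABLE t (id integer PRIMARY KEY, v text);
      INSERT INTO t SELECT g, 'x' FROM generate_series(1, 100) g;
      VACUUM t;                            -- page becomes all-visible
      UPDATE t SET v = 'y' WHERE id = 1;   -- HOT update clears the bit
      -- Once the old version is dead to every snapshot, the next
      -- heap_page_prune() on this page could, under the proposal, mark the
      -- page all-visible again instead of waiting for VACUUM.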

With Regards,
Amit Kapila.





Re: [HACKERS] why can't plpgsql return a row-expression?

2012-12-06 Thread Asif Rehman
Thanks.

Regards,
--Asif


On Fri, Dec 7, 2012 at 9:11 AM, Tom Lane  wrote:

> Asif Rehman  writes:
> > I have attached the stripped-down version. I will leave the type
> coercions
> > support for a separate patch.
>
> Applied with assorted corrections.
>
> regards, tom lane
>


Re: [HACKERS] why can't plpgsql return a row-expression?

2012-12-06 Thread Tom Lane
Asif Rehman  writes:
> I have attached the stripped-down version. I will leave the type coercions
> support for a separate patch.

Applied with assorted corrections.

regards, tom lane




Re: [HACKERS] Setting visibility map in VACUUM's second phase

2012-12-06 Thread Pavan Deolasee
On Fri, Dec 7, 2012 at 12:05 AM, Robert Haas  wrote:
> On Thu, Dec 6, 2012 at 1:01 PM, Tom Lane  wrote:
>> I think taking a second whack at setting the visibility bit is a fine
>> idea, but let's drop all the rest of this premature optimization.
>
> +1.
>
> If there's any optimization needed here, we should try to do it by
> remembering relevant details from the first vacuum pass in
> backend-private memory, rather than by changing the on-disk format.
>

Yeah, I talked about that approach on the other thread. I thought we
could store the page LSN in backend-private memory for all such pages
and compare it with the current page LSN to know whether the page saw
an intermediate action that demands a recheck for all-visibility. But
I agree with setting these aggressive optimizations aside for now and
revisiting them if necessary.

> One other thought: I'm wondering if we shouldn't try to push the work
> of setting the all-visible bit into heap_page_prune().  That would
> allow HOT pruning to set the bit.  For example, consider an
> all-visible page.  A tuple is HOT-updated and the page becomes
> not-all-visible.  Now the page is pruned, removing the old tuple and
> changing the line pointer to a redirect.  Presto, page is all-visible
> again.
>

+1

I had submitted a patch for that way back in 2008 or 2009, but blame
me for not pursuing it to the end.
http://archives.postgresql.org/message-id/2e78013d0812042257x175e5a45w5edeaff14f724...@mail.gmail.com

Alex Hunsaker had reviewed that patch and confirmed a significant
improvement in the vacuum time. I think the patch needed some rework,
but I dropped the ball or got busy with other things while waiting on
others. If you think it's useful to have, I will do the necessary work
and submit it for the next commitfest.

Excerpts from Alex's comments:

"The only major difference was with this patch vacuum time (after the
first select after some hot updates) was significantly reduced for my
test case (366ms vs 16494ms).

There was no noticeable (within noise) select or update slow down.

I was able to trigger WARNING:  PD_ALL_VISIBLE flag once while running
pgbench but have not been able to re-create it... (should I keep
trying?)"

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee




Re: [HACKERS] pg_upgrade problem with invalid indexes

2012-12-06 Thread Alvaro Herrera
Andrew Dunstan wrote:
> 
> On 12/06/2012 09:23 PM, Bruce Momjian wrote:

> >As soon as pg_dump stopped dumping the CREATE INDEX, pg_upgrade would
> >stop creating it in the new cluster, and not transfer the index
> >files.
> 
> So we'll lose the index definition and leave some files behind? This
> sounds a bit messy to say the least.

I find it hard to get excited about this being a real problem.  If the
index has been kept invalid, how come the definition is so valuable?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [WIP] pg_ping utility

2012-12-06 Thread Phil Sorber
On Thu, Dec 6, 2012 at 8:54 PM, Michael Paquier
 wrote:
>
>
> On Thu, Dec 6, 2012 at 12:29 AM, Phil Sorber  wrote:
>>
>> On Wed, Dec 5, 2012 at 8:53 AM, Alvaro Herrera 
>> wrote:
>> > No, I think it is the reference docs on the returned value that must be
>> > fixed.  That is, instead of saying that the return value correspond to
>> > the enum values, you should be saying that it will return
>> > 0 if it's okay, 1 in another case and 2 in yet
>> > another case.  And then next to the PQping() enum, add a comment that
>> > the values must not be messed around with because pg_isready exposes
>> > them to users and shell scripts.
>>
>> +1 I'm on board with this.
>
> OK. Let's do that and then mark this patch as ready for committer.
> Thanks,

Those changes have been made.

>
> --
> Michael Paquier
> http://michael.otacoo.com

Something I was just thinking about while testing this again: I
mentioned before the issue of someone meaning to pass -v and passing
-V instead, and it being a potential source of problems. What about
making verbose the default, removing -v, and adding -q to make it
quiet? That would also match other tools' behavior. I want to get this
wrapped up and I am fine with it as is, but I just wanted to ask what
others thought.

Thanks.


pg_isready_bin_v7.diff
Description: Binary data


pg_isready_docs_v7.diff
Description: Binary data



Re: [HACKERS] Setting visibility map in VACUUM's second phase

2012-12-06 Thread Pavan Deolasee
On Thu, Dec 6, 2012 at 11:31 PM, Tom Lane  wrote:

>
> I think taking a second whack at setting the visibility bit is a fine
> idea, but let's drop all the rest of this premature optimization.
>

Fair enough. I thought about doing it that way but was worried that an
additional page scan would raise eyebrows. It does not affect the
common case, because we would have done that scan anyway in the
subsequent vacuum; but in the worst case, where most of the pages do
not remain all-visible by the time we come back to the second phase of
vacuum, this additional line-pointer scan will add some overhead. With
a couple of pluses for the approach, I won't mind doing it this way,
though.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee




Re: [HACKERS] Functional dependency in GROUP BY through JOINs

2012-12-06 Thread David Rowley
> From: Simon Riggs [mailto:si...@2ndquadrant.com]
> Sent: 07 December 2012 05:44
> To: David Rowley
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Functional dependency in GROUP BY through JOINs
> 
> On 5 December 2012 23:37, David Rowley  wrote:
> 
> > Though this plan might not be quite as optimal as it could be as it
> > performs the grouping after the join.
> 
> PostgreSQL always calculates aggregation as the last step.

I didn't know that was always the case, but I guess it makes sense.
This is probably a bigger project than I imagined it would be, then.

> 
> It's a well known optimisation to push-down GROUP BY clauses to the lowest
> level, but we don't do that, yet.
> 
> You're right that it can make a massive difference to many queries.
> 

I agree.

Maybe it'd be something worthwhile for the future, then. Perhaps, if others
agree, it should go on the TODO list?

Regards

David Rowley

> --
>  Simon Riggs   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services





Re: [HACKERS] Functional dependency in GROUP BY through JOINs

2012-12-06 Thread David Rowley
> From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> Sent: 07 December 2012 06:22
> To: Simon Riggs
> Cc: David Rowley; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Functional dependency in GROUP BY through JOINs
> 
> Simon Riggs  writes:
> > On 5 December 2012 23:37, David Rowley 
> wrote:
> >> Though this plan might not be quite as optimal as it could be as it
> >> performs the grouping after the join.
> 
> > PostgreSQL always calculates aggregation as the last step.
> 
> > It's a well known optimisation to push-down GROUP BY clauses to the
> > lowest level, but we don't do that, yet.
> 
> > You're right that it can make a massive difference to many queries.
> 
> In the case being presented here, it's not apparent to me that there's any
> advantage to be had at all.  You still need to aggregate over the rows
joining
> to each uniquely-keyed row.  So how exactly are you going to "push down
> the GROUP BY", and where does the savings come from?
> 

I guess the saving, at least in this case, comes from the join only
joining 10 rows on either side, and also because the grouping would be
cheaper, since it is done on an INT rather than a CHAR(). But I take it
you mean the planner would have to know this is cheaper and compare
both versions of the plan.

I should have shown the plan of the other nested query originally, so here
it is this time.

Version = 9.2.1


test=# EXPLAIN ANALYZE
test-# SELECT p.product_code,SUM(s.quantity)
test-# FROM products p
test-# INNER JOIN bigsalestable s ON p.productid = s.productid
test-# GROUP BY p.product_code;
                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=18926.22..18926.32 rows=10 width=150) (actual time=553.403..553.405 rows=10 loops=1)
   ->  Hash Join  (cost=1.23..18676.22 rows=5 width=150) (actual time=0.041..324.970 rows=100 loops=1)
         Hash Cond: (s.productid = p.productid)
         ->  Seq Scan on bigsalestable s  (cost=0.00..14425.00 rows=100 width=8) (actual time=0.012..70.627 rows=100 loops=1)
         ->  Hash  (cost=1.10..1.10 rows=10 width=150) (actual time=0.013..0.013 rows=10 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 1kB
               ->  Seq Scan on products p  (cost=0.00..1.10 rows=10 width=150) (actual time=0.004..0.007 rows=10 loops=1)
 Total runtime: 553.483 ms
(8 rows)

test=# EXPLAIN ANALYZE
test-# SELECT p.product_code,s.quantity
test-# FROM products AS p
test-# INNER JOIN (SELECT productid,SUM(quantity) AS quantity FROM
bigsalestable GROUP BY productid) AS s ON p.productid = s.productid;
                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=19426.22..19431.07 rows=10 width=154) (actual time=295.548..295.557 rows=10 loops=1)
   Hash Cond: (bigsalestable.productid = p.productid)
   ->  HashAggregate  (cost=19425.00..19427.00 rows=200 width=8) (actual time=295.514..295.518 rows=10 loops=1)
         ->  Seq Scan on bigsalestable  (cost=0.00..14425.00 rows=100 width=8) (actual time=0.004..59.330 rows=100 loops=1)
   ->  Hash  (cost=1.10..1.10 rows=10 width=150) (actual time=0.017..0.017 rows=10 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Seq Scan on products p  (cost=0.00..1.10 rows=10 width=150) (actual time=0.010..0.012 rows=10 loops=1)
 Total runtime: 295.612 ms
(8 rows)

Regards

David Rowley


>   regards, tom lane





Re: [HACKERS] pg_upgrade problem with invalid indexes

2012-12-06 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
> I think we would have pg_dump --binary-upgrade issue an UPDATE to
> the system catalogs to mark it as invalid.

That'd work for me too- I'm not particular on if it's done as a direct
catalog update or some undocumented feature of CREATE INDEX.

Thanks,

Stephen




Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)

2012-12-06 Thread Stephen Frost
* Jeff Davis (pg...@j-davis.com) wrote:
> That is documented in the committed patch -- it's a trade, basically
> saying that you lose isolation but avoid extra writes. It seems
> reasonable that the user gets this behavior if specifically requested.

Strictly speaking, it could actually be two different users...

> In the simple approach that only affects loads in the same transaction
> as the create, this is not an issue. The only issue there is for other
> reads in the same transaction. I think you already know that, but I am
> repeating for clarity.

Actually, no, I'm not convinced of that.  If a separate transaction
starts before the create/insert, and is still open when the create/insert
transaction commits, wouldn't that separate transaction see rows in the
newly created table?

That's more-or-less the specific issue I'm bringing up as a potential
concern.  If it's just a FrozenXID stored in the heap and there isn't
anything telling the 2nd transaction to not consider this table that
exists through SnapshotNow, how are we going to know to not look at the
tuples?  Or do we accept that it's "ok" for those to be visible?

How I wish we could fix the table visibility and not have to worry about
this issue at all, which would also remove the need for some special
keyword to be used to tell us it's 'ok' to use this optimization...

Thanks,

Stephen




Re: [HACKERS] pg_upgrade problem with invalid indexes

2012-12-06 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > Or preserve it as-is.
> 
> To do that, we would have to add an option to CREATE INDEX to create it
> in an invalid state.  Which is stupid...

Only in a binary-upgrade mode.

Thanks,

Stephen




Re: [HACKERS] pg_upgrade problem with invalid indexes

2012-12-06 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
> On Thu, Dec  6, 2012 at 09:45:11PM -0500, Stephen Frost wrote:
> > Or preserve it as-is.  I don't really like the 'make them fix it'
> > option, as a user could run into that in the middle of a planned upgrade
> > that had been tested and never had that come up.
> 
> They would get the warning during pg_upgrade --check, of course.

Sure, if they happened to have a concurrent index creation going when
they ran the check...  But what if they didn't, and it only happened
during the actual pg_upgrade?  I'm still not thrilled with this idea of
making the user abort in the middle to address something that, really,
isn't a big deal to just preserve and deal with later...

Thanks,

Stephen




Re: [HACKERS] Functional dependency in GROUP BY through JOINs

2012-12-06 Thread David Rowley

> From: Kevin Grittner [mailto:kgri...@mail.com]
> > I think I'm right in thinking that if a unique index exists to match
> > the group by clause, and the join condition is equality (probably
> > using the same operator class as the unique btree index?), then the
> > grouping could be pushed up to before the join.
> 
> Off-hand, it seems equivalent to me; I don't know how much work it would
> be.
> 
> Out of curiosity, does the first query's plan change if you run this instead?:
> 
> SELECT s.product_code,SUM(s.quantity)
> FROM products p
> INNER JOIN bigsalestable s ON p.productid = s.productid GROUP BY
> s.product_code;
> 

I should have made it clearer that bigsalestable has no product_code column;
there's only productid. The lookup to the products table was to obtain the
more user-friendly product_code.

The actual table defs are in the attachment of the original email.
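
Since that attachment isn't reproduced here, roughly what the schema looks
like (a sketch only; the real definitions, column types and the unique
index may differ):

    CREATE TABLE products (
        productid    integer PRIMARY KEY,
        product_code char(20) NOT NULL UNIQUE   -- the user-friendly code
    );

    CREATE TABLE bigsalestable (
        productid integer REFERENCES products (productid),
        quantity  integer
    );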

Regards

David Rowley

> -Kevin





Re: [HACKERS] pg_upgrade problem with invalid indexes

2012-12-06 Thread Bruce Momjian
On Thu, Dec  6, 2012 at 10:06:13PM -0500, Tom Lane wrote:
> Stephen Frost  writes:
> > * Andrew Dunstan (and...@dunslane.net) wrote:
> >> Making the user fix it seems much more sensible to me. Otherwise I
> >> suspect we'll find users who get strangely surprised when they can
> >> no longer find any trace of an expected index in their upgraded
> >> database.
> 
> > Or preserve it as-is.
> 
> To do that, we would have to add an option to CREATE INDEX to create it
> in an invalid state.  Which is stupid...

I think we would have pg_dump --binary-upgrade issue an UPDATE to
the system catalogs to mark it as invalid.
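
Concretely, something along these lines (a sketch only; the index name is
hypothetical and the real patch may well do it differently):

    -- emitted by pg_dump --binary-upgrade right after the CREATE INDEX,
    -- to carry the invalid state over to the new cluster
    UPDATE pg_catalog.pg_index
    SET indisvalid = false
    WHERE indexrelid = 'public.idx'::pg_catalog.regclass;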

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)

2012-12-06 Thread Jeff Davis
On Thu, 2012-12-06 at 20:49 -0500, Stephen Frost wrote:
> I'm less concerned about the hint bits and more concerned about the
> implications of the FrozenXID being used, which would make the rows
> visible to other transactions even if they began before the rows were
> loaded.

That is documented in the committed patch -- it's a trade, basically
saying that you lose isolation but avoid extra writes. It seems
reasonable that the user gets this behavior if specifically requested.

> My concern is more about what happens to transactions which are opened
> before this transaction commits and that they might be able to see data
> in the table.

In the simple approach that only affects loads in the same transaction
as the create, this is not an issue. The only issue there is for other
reads in the same transaction. I think you already know that, but I am
repeating for clarity.

> As I mentioned before, I'm not *convinced* that this is really a big
> deal, or even a problem for this patch, but it's something to *consider*
> and think about because, as much as we like to say that DDL is
> transaction-safe, it's *not* completely MVCC and things like creating
> tables and committing them will show up as visible even in transactions
> that shouldn't be able to see them.  Due to that, we need to think about
> what happens with data in those tables, etc.

Agreed. I don't think anyone is suggesting that we change the semantics
of existing commands, unless it's to improve the serializability of DDL.

Regards,
Jeff Davis





Re: [HACKERS] pg_upgrade problem with invalid indexes

2012-12-06 Thread Bruce Momjian
On Thu, Dec  6, 2012 at 09:45:11PM -0500, Stephen Frost wrote:
> * Andrew Dunstan (and...@dunslane.net) wrote:
> > So we'll lose the index definition and leave some files behind? This
> > sounds a bit messy to say the least.
> 
> Agreed.
> 
> > Making the user fix it seems much more sensible to me. Otherwise I
> > suspect we'll find users who get strangely surprised when they can
> > no longer find any trace of an expected index in their upgraded
> > database.
> 
> Or preserve it as-is.  I don't really like the 'make them fix it'
> option, as a user could run into that in the middle of a planned upgrade
> that had been tested and never had that come up.

They would get the warning during pg_upgrade --check, of course.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] pg_upgrade problem with invalid indexes

2012-12-06 Thread Bruce Momjian
On Thu, Dec  6, 2012 at 09:41:00PM -0500, Andrew Dunstan wrote:
> 
> On 12/06/2012 09:23 PM, Bruce Momjian wrote:
> >On Thu, Dec  6, 2012 at 09:10:21PM -0500, Tom Lane wrote:
> >>Bruce Momjian  writes:
> >>>On Thu, Dec  6, 2012 at 07:53:57PM -0500, Tom Lane wrote:
> Because CREATE INDEX CONCURRENTLY can't drop the index if it's already
> failed.  It's not because we want to do that, it's an implementation
> restriction of the horrid kluge that is CREATE/DROP INDEX CONCURRENTLY.
> >>>Well, what is the logic that pg_dump dumps it then, even in
> >>>non-binary-upgrade mode?
> >>Actually, I was thinking about proposing exactly that.  Ideally the
> >>system should totally ignore an invalid index (we just fixed some bugs
> >>in that line already).  So it would be perfectly consistent for pg_dump
> >>to ignore it too, with or without --binary-upgrade.
> >>
> >>One possible spanner in the works for pg_upgrade is that this would mean
> >>there can be relation files in the database directories that it should
> >>ignore (not transfer over).  Dunno if that takes any logic changes.
> >As soon as pg_dump stopped dumping the CREATE INDEX, pg_upgrade would
> >stop creating it in the new cluster, and not transfer the index
> >files.
> >
> 
> 
> So we'll lose the index definition and leave some files behind? This
> sounds a bit messy to say the least.

You will lose the index definition, but the new cluster will not have
the invalid index files from the old cluster.

> Making the user fix it seems much more sensible to me. Otherwise I
> suspect we'll find users who get strangely surprised when they can
> no longer find any trace of an expected index in their upgraded
> database.

Well, the indexes weren't being used.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] pg_upgrade problem with invalid indexes

2012-12-06 Thread Tom Lane
Stephen Frost  writes:
> * Andrew Dunstan (and...@dunslane.net) wrote:
>> Making the user fix it seems much more sensible to me. Otherwise I
>> suspect we'll find users who get strangely surprised when they can
>> no longer find any trace of an expected index in their upgraded
>> database.

> Or preserve it as-is.

To do that, we would have to add an option to CREATE INDEX to create it
in an invalid state.  Which is stupid...

regards, tom lane




Re: [HACKERS] pg_upgrade problem with invalid indexes

2012-12-06 Thread Stephen Frost
* Andrew Dunstan (and...@dunslane.net) wrote:
> So we'll lose the index definition and leave some files behind? This
> sounds a bit messy to say the least.

Agreed.

> Making the user fix it seems much more sensible to me. Otherwise I
> suspect we'll find users who get strangely surprised when they can
> no longer find any trace of an expected index in their upgraded
> database.

Or preserve it as-is.  I don't really like the 'make them fix it'
option, as a user could run into that in the middle of a planned upgrade
that had been tested and never had that come up.

Thanks,

Stephen




Re: [HACKERS] pg_upgrade problem with invalid indexes

2012-12-06 Thread Andrew Dunstan


On 12/06/2012 09:23 PM, Bruce Momjian wrote:
> On Thu, Dec  6, 2012 at 09:10:21PM -0500, Tom Lane wrote:
> > Bruce Momjian  writes:
> > > On Thu, Dec  6, 2012 at 07:53:57PM -0500, Tom Lane wrote:
> > > > Because CREATE INDEX CONCURRENTLY can't drop the index if it's already
> > > > failed.  It's not because we want to do that, it's an implementation
> > > > restriction of the horrid kluge that is CREATE/DROP INDEX CONCURRENTLY.
> > > Well, what is the logic that pg_dump dumps it then, even in
> > > non-binary-upgrade mode?
> > Actually, I was thinking about proposing exactly that.  Ideally the
> > system should totally ignore an invalid index (we just fixed some bugs
> > in that line already).  So it would be perfectly consistent for pg_dump
> > to ignore it too, with or without --binary-upgrade.
> >
> > One possible spanner in the works for pg_upgrade is that this would mean
> > there can be relation files in the database directories that it should
> > ignore (not transfer over).  Dunno if that takes any logic changes.
> As soon as pg_dump stopped dumping the CREATE INDEX, pg_upgrade would
> stop creating it in the new cluster, and not transfer the index
> files.

So we'll lose the index definition and leave some files behind? This
sounds a bit messy to say the least.

Making the user fix it seems much more sensible to me. Otherwise I
suspect we'll find users who get strangely surprised when they can no
longer find any trace of an expected index in their upgraded database.


cheers

andrew





Re: [HACKERS] How to check whether the row was modified by this transaction before?

2012-12-06 Thread Vlad Arkhipov

On 12/07/2012 02:53 AM, Tom Lane wrote:
> Vlad Arkhipov  writes:
> > In a BEFORE UPDATE trigger I need to know whether the row was previously
> > modified by this transaction. Is it safe to use xmin and txid_current()
> > for this purpose (xmin is 32-bit txid type but txid_current() returns
> > 64-bit bigint).
> > IF OLD.xmin = txid_current() THEN
>
> Comparing to txid_current() mod 2^32 would probably work, but note this
> will not think that subtransactions or parent transactions are "this
> transaction", so any use of savepoints or plpgsql exception blocks is
> likely to cause headaches.  Why do you think you need to know this?
>
> regards, tom lane

The use case is quite simple. I'm trying to rewrite our internal system
versioning extension (SQL feature T180) in a more abstract way. Any
temporal versioned table uses its associated history table to store
updated and deleted data rows. For this purpose the extension adds AFTER
UPDATE/DELETE triggers to the table that insert the OLD row into the
history table for updated and deleted rows. But if there are multiple
changes to a row in the same transaction, the trigger should generate a
history row only for the first change.
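
For reference, the core test Tom suggests can be written in plain SQL
against a hypothetical table tab (column id is made up too); note his
caveat that subtransactions and plpgsql exception blocks get their own
XIDs, so this only detects versions written with the top-level XID:

    -- Did the current top-level transaction create this row version?
    -- xmin is a 32-bit xid, txid_current() a 64-bit epoch-qualified value,
    -- hence the modulo-2^32 (4294967296) comparison.
    SELECT xmin::text::bigint = txid_current() % 4294967296 AS modified_by_me
    FROM tab
    WHERE id = 1;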


On 12/07/2012 06:26 AM, Tom Lane wrote:
> It strikes me that the notion of "this row was previously modified by
> the current transaction" is squishier than it might look, and we'd do
> well to clarify it before we consider exporting anything.  I think there
> are three ways you might define such a function:
>
> 1. xmin is exactly equal to current (sub)transaction's XID.
>
> 2. xmin is this (sub)transaction's XID, or the XID of any subcommitted
> subtransaction of it.
>
> 3. xmin is this (sub)transaction's XID, or the XID of any subcommitted
> subtransaction, or the XID of any open parent transaction or
> subcommitted subtransaction thereof.

If I understand you correctly, what I'm looking for is described by the
third case, and I may use TransactionIdIsCurrentTransactionId() for this
purpose?
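
To make the difference between those cases concrete, a small sketch
(table t and its contents are hypothetical):

    BEGIN;
    UPDATE t SET v = v + 1 WHERE id = 1;  -- new version's xmin = top-level XID
    SAVEPOINT s1;
    UPDATE t SET v = v + 1 WHERE id = 1;  -- this version's xmin = subxact XID
    RELEASE SAVEPOINT s1;                 -- the subxact is now "subcommitted"
    -- Definition #1 (xmin equals the current XID) no longer matches the row,
    -- while definitions #2 and #3 still treat it as modified by this
    -- transaction.  txid_current() reports the top-level XID even inside a
    -- subtransaction, so the mod-2^32 comparison misses such versions.
    COMMIT;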





Re: [HACKERS] pg_upgrade problem with invalid indexes

2012-12-06 Thread Bruce Momjian
On Thu, Dec  6, 2012 at 09:10:21PM -0500, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Thu, Dec  6, 2012 at 07:53:57PM -0500, Tom Lane wrote:
> >> Because CREATE INDEX CONCURRENTLY can't drop the index if it's already
> >> failed.  It's not because we want to do that, it's an implementation
> >> restriction of the horrid kluge that is CREATE/DROP INDEX CONCURRENTLY.
> 
> > Well, what is the logic that pg_dump dumps it then, even in
> > non-binary-upgrade mode?
> 
> Actually, I was thinking about proposing exactly that.  Ideally the
> system should totally ignore an invalid index (we just fixed some bugs
> in that line already).  So it would be perfectly consistent for pg_dump
> to ignore it too, with or without --binary-upgrade.
> 
> One possible spanner in the works for pg_upgrade is that this would mean
> there can be relation files in the database directories that it should
> ignore (not transfer over).  Dunno if that takes any logic changes.

As soon as pg_dump stopped dumping the CREATE INDEX, pg_upgrade would
stop creating it in the new cluster, and not transfer the index
files.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] pg_upgrade problem with invalid indexes

2012-12-06 Thread Tom Lane
Bruce Momjian  writes:
> On Thu, Dec  6, 2012 at 07:53:57PM -0500, Tom Lane wrote:
>> Because CREATE INDEX CONCURRENTLY can't drop the index if it's already
>> failed.  It's not because we want to do that, it's an implementation
>> restriction of the horrid kluge that is CREATE/DROP INDEX CONCURRENTLY.

> Well, what is the logic that pg_dump dumps it then, even in
> non-binary-upgrade mode?

Actually, I was thinking about proposing exactly that.  Ideally the
system should totally ignore an invalid index (we just fixed some bugs
in that line already).  So it would be perfectly consistent for pg_dump
to ignore it too, with or without --binary-upgrade.

One possible spanner in the works for pg_upgrade is that this would mean
there can be relation files in the database directories that it should
ignore (not transfer over).  Dunno if that takes any logic changes.

regards, tom lane




Re: [HACKERS] [WIP] pg_ping utility

2012-12-06 Thread Michael Paquier
On Thu, Dec 6, 2012 at 12:29 AM, Phil Sorber  wrote:

> On Wed, Dec 5, 2012 at 8:53 AM, Alvaro Herrera 
> wrote:
> > No, I think it is the reference docs on the returned value that must be
> > fixed.  That is, instead of saying that the return value correspond to
> > the enum values, you should be saying that it will return
> > 0 if it's okay, 1 in another case and 2 in yet
> > another case.  And then next to the PQping() enum, add a comment that
> > the values must not be messed around with because pg_isready exposes
> > them to users and shell scripts.
>
> +1 I'm on board with this.
>
OK. Let's do that and then mark this patch as ready for committer.
Thanks,
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)

2012-12-06 Thread Stephen Frost
Jeff,

* Jeff Davis (pg...@j-davis.com) wrote:
> On Thu, 2012-12-06 at 20:12 -0500, Stephen Frost wrote:
> > The command is 'FREEZE', which sounded to me like the transaction ID
> > would be set to FrozenXID, meaning that we wouldn't be able to tell if
> > the inserting transaction was before or after ours...
> 
> Freezing does lose information, but I thought that this sub-thread was
> about the HEAP_XMIN_COMMITTED optimization that was in the first version
> of the commit but removed in a later commit. Setting HEAP_XMIN_COMMITTED
> does not lose information.

I'm less concerned about the hint bits and more concerned about the
implications of the FrozenXID being used, which would make the rows
visible to other transactions even if they began before the rows were
loaded.

> That should really be: "aside from the visibility issues before it does
> commit".

My concern is more about what happens to transactions which are opened
before this transaction commits and that they might be able to see data
in the table.

As I mentioned before, I'm not *convinced* that this is really a big
deal, or even a problem for this patch, but it's something to *consider*
and think about because, as much as we like to say that DDL is
transaction-safe, it's *not* completely MVCC and things like creating
tables and committing them will show up as visible even in transactions
that shouldn't be able to see them.  Due to that, we need to think about
what happens with data in those tables, etc.

> Anyway, the HEAP_XMIN_COMMITTED loading optimizations require more
> discussion, but I think they are worth pursuing. The simpler form is
> when the table is created and loaded in the same transaction, but there
> may be some more sophisticated approaches, as well.

Sure.

Thanks,

Stephen




Re: [HACKERS] pg_upgrade problem with invalid indexes

2012-12-06 Thread Jeff Davis
On Thu, 2012-12-06 at 20:05 -0500, Andrew Dunstan wrote:
> I think I prefer the first suggestion. If they are trying to upgrade 
> when there's an invalid index presumably they aren't aware of the 
> invalidity (or they would have done something about it). It would be 
> better to fail and make them fix or remove the index, ISTM.

I'm a little concerned about introducing extra causes of failure into
upgrade when we don't have to. They could have gone on with that invalid
index forever, and I don't see it as the job of upgrade to alert someone
to that problem.

That being said, it's a reasonable position, and I am fine with either
approach.

Regards,
Jeff Davis





Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)

2012-12-06 Thread Jeff Davis
On Thu, 2012-12-06 at 20:12 -0500, Stephen Frost wrote:
> The command is 'FREEZE', which sounded to me like the transaction ID
> would be set to FrozenXID, meaning that we wouldn't be able to tell if
> the inserting transaction was before or after ours...

Freezing does lose information, but I thought that this sub-thread was
about the HEAP_XMIN_COMMITTED optimization that was in the first version
of the commit but removed in a later commit. Setting HEAP_XMIN_COMMITTED
does not lose information.

> Your analysis of the hint bits themselves sounds reasonable but it seems
> independent of the issue regarding setting the actual transaction ID.

Upon re-reading, my last paragraph was worded a little too loosely.

"The interesting thing about HEAP_XMIN_COMMITTED is that it can be set
preemptively if we know that the transaction will actually commit (aside
from the visibility issues within the transaction)."

That should really be: "aside from the visibility issues before it does
commit".

Anyway, the HEAP_XMIN_COMMITTED loading optimizations require more
discussion, but I think they are worth pursuing. The simpler form is
when the table is created and loaded in the same transaction, but there
may be some more sophisticated approaches, as well.

Regards,
Jeff Davis





Re: [HACKERS] pg_upgrade problem with invalid indexes

2012-12-06 Thread Bruce Momjian
On Thu, Dec  6, 2012 at 07:53:57PM -0500, Tom Lane wrote:
> Bruce Momjian  writes:
> > Yes, I thought of not dumping it.  The problem is that we don't delete
> > the index when it fails, so I assumed we didn't want to lose the index
> > creation information.  I need to understand why we did that.
> 
> Because CREATE INDEX CONCURRENTLY can't drop the index if it's already
> failed.  It's not because we want to do that, it's an implementation
> restriction of the horrid kluge that is CREATE/DROP INDEX CONCURRENTLY.

Well, what is the logic that pg_dump dumps it then, even in
non-binary-upgrade mode?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)

2012-12-06 Thread Stephen Frost
* Jeff Davis (pg...@j-davis.com) wrote:
> However, the rows will *not* be visible, because the serializable
> snapshot doesn't contain the inserting transaction.

That's what we've got now and what would be expected, however...

> Think about the current behavior: right after the commit, another select
> could come along and set all those hint bits anyway. Even if the hint
> bits aren't set, it will do a CLOG lookup and find that the transaction
> committed.

The command is 'FREEZE', which sounded to me like the transaction ID
would be set to FrozenXID, meaning that we wouldn't be able to tell if
the inserting transaction was before or after ours...

Your analysis of the hint bits themselves sounds reasonable but it seems
independent of the issue regarding setting the actual transaction ID.

Thanks,

Stephen




Re: [HACKERS] pg_upgrade problem with invalid indexes

2012-12-06 Thread Andrew Dunstan


On 12/06/2012 07:58 PM, Jeff Davis wrote:
> On Thu, 2012-12-06 at 16:31 -0800, Josh Berkus wrote:
> > > There are a few possible fixes.  The first would be to have pg_upgrade
> > > throw an error on any invalid index in the old cluster.  Another option
> > > would be to preserve the invalid state in pg_dump --binary-upgrade.
> >
> > Or to not dump invalid indexes at all in --binary-upgrade mode.
>
> +1

I think I prefer the first suggestion. If they are trying to upgrade 
when there's an invalid index presumably they aren't aware of the 
invalidity (or they would have done something about it). It would be 
better to fail and make them fix or remove the index, ISTM.


cheers

andrew





Re: [HACKERS] pg_upgrade problem with invalid indexes

2012-12-06 Thread Jeff Davis
On Thu, 2012-12-06 at 16:31 -0800, Josh Berkus wrote:
> > There are a few possible fixes.  The first would be to have pg_upgrade
> > throw an error on any invalid index in the old cluster.  Another option
> > would be to preserve the invalid state in pg_dump --binary-upgrade.
> 
> Or to not dump invalid indexes at all in --binary-upgrade mode.

+1

Jeff Davis





Re: [HACKERS] pg_upgrade problem with invalid indexes

2012-12-06 Thread Tom Lane
Bruce Momjian  writes:
> Yes, I thought of not dumping it.  The problem is that we don't delete
> the index when it fails, so I assumed we didn't want to lose the index
> creation information.  I need to understand why we did that.

Because CREATE INDEX CONCURRENTLY can't drop the index if it's already
failed.  It's not because we want to do that, it's an implementation
restriction of the horrid kluge that is CREATE/DROP INDEX CONCURRENTLY.

regards, tom lane




Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)

2012-12-06 Thread Jeff Davis
On Thu, 2012-12-06 at 14:18 -0500, Stephen Frost wrote:
> begin;
> 

You need to do a SELECT here to actually get a snapshot.

>   session b
>   ---------
> begin;
> create table q (a integer);
> insert into q values (1);
> commit;
> 
> select * from q;
> 
> 
> You'll get an empty table.  That's not great, but it's life -- once
> something is in pg_class, all sessions can see it because the table
> lookups are done using SnapshotNow and aren't truly transactional, but
> at least you can't see any rows in the table because the individual rows
> are marked with the transaction ID which created them and we can't see
> them in our transaction that started before the table was created.
> 
> It sounds like, with this patch/change, this behavior would change.

No, it would not change. Session A would see that the table exists and
see that the rows' inserting transaction (in Session B) committed. That
is correct because the inserting transaction *did* commit, and it's the
same as we have now.

However, the rows will *not* be visible, because the serializable
snapshot doesn't contain the inserting transaction.

Think about the current behavior: right after the commit, another select
could come along and set all those hint bits anyway. Even if the hint
bits aren't set, it will do a CLOG lookup and find that the transaction
committed.

The change being proposed is just to set those hint bits preemptively,
because the fate of the INSERT is identical to the fate of the CREATE
(they are in the same transaction). There will be no observable problem
outside of that CREATE+INSERT transaction. The only catch is what to do
about visibility of the tuples when still inside the transaction (which
is not a problem for transactions doing a simple load).

The interesting thing about HEAP_XMIN_COMMITTED is that it can be set
preemptively if we know that the transaction will actually commit (aside
from the visibility issues within the transaction). Even if the
transaction doesn't commit, it would still be possible to clean out the
dead tuples with a VACUUM, because no information has really been lost
in the process. So there may yet be some kind of safe protocol to set
these even during a load into an existing table...
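
For context, the load pattern under discussion is the one where the table
is created (or truncated) and loaded in the same transaction, roughly like
this sketch (table name and file path are made up):

    BEGIN;
    CREATE TABLE big_load (id integer, payload text);
    COPY big_load FROM '/tmp/big_load.csv' WITH (FORMAT csv, FREEZE);
    COMMIT;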

Regards,
Jeff Davis






Re: [HACKERS] pg_upgrade problem with invalid indexes

2012-12-06 Thread Bruce Momjian
On Thu, Dec  6, 2012 at 09:35:19PM -0300, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> 
> > There are a few possible fixes.  The first would be to have pg_upgrade
> > throw an error on any invalid index in the old cluster.  Another option
> > would be to preserve the invalid state in pg_dump --binary-upgrade.
> 
> Yet another option would be for pg_dump --binary-upgrade to ignore
> invalid indexes altogether (and probably "not ready" indexes, too, not
> sure).

Yes, I thought of not dumping it.  The problem is that we don't delete
the index when it fails, so I assumed we didn't want to lose the index
creation information.  I need to understand why we did that.  Why do we
have pg_dump dump the index then?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] pg_upgrade problem with invalid indexes

2012-12-06 Thread Alvaro Herrera
Bruce Momjian wrote:

> There are a few possible fixes.  The first would be to have pg_upgrade
> throw an error on any invalid index in the old cluster.  Another option
> would be to preserve the invalid state in pg_dump --binary-upgrade.

Yet another option would be for pg_dump --binary-upgrade to ignore
invalid indexes altogether (and probably "not ready" indexes, too, not
sure).

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_upgrade problem with invalid indexes

2012-12-06 Thread Josh Berkus

> There are a few possible fixes.  The first would be to have pg_upgrade
> throw an error on any invalid index in the old cluster.  Another option
> would be to preserve the invalid state in pg_dump --binary-upgrade.

Or to not dump invalid indexes at all in --binary-upgrade mode.

> I also need help in how to communicate this to users since our next
> minor release will be in the future.

Blog post?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




[HACKERS] pg_upgrade problem with invalid indexes

2012-12-06 Thread Bruce Momjian
I got a report today on the IRC channel about a pg_upgrade problem with
upgrading clusters with indexes that exist but are invalid.

For example, if you use CREATE INDEX CONCURRENTLY, then shut down the
server while it is running, the index will be left as INVALID;  from our
CREATE INDEX docs:

   If a problem arises while scanning the table, such as a uniqueness
   violation in a unique index, the CREATE INDEX command will fail but
   leave behind an 'invalid' index. This index will be ignored
   for querying purposes because it might be incomplete; however
   it will still consume update overhead. The psql \d command will
   report such an index as INVALID:

   postgres=# \d tab
          Table "public.tab"
    Column |  Type   | Modifiers
   --------+---------+-----------
    col    | integer |
   Indexes:
       "idx" btree (col) INVALID

   The recommended recovery method in such cases is to drop the
   index and try again to perform CREATE INDEX CONCURRENTLY. (Another
   possibility is to rebuild the index with REINDEX. However, since
   REINDEX does not support concurrent builds, this option is unlikely
   to seem attractive.)
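
   For instance, using the names from the example above:

       DROP INDEX idx;
       CREATE INDEX CONCURRENTLY idx ON tab (col);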

The problem is that this invalid state is not dumped by pg_dump, meaning
pg_upgrade will restore the index as valid.

There are a few possible fixes.  The first would be to have pg_upgrade
throw an error on any invalid index in the old cluster.  Another option
would be to preserve the invalid state in pg_dump --binary-upgrade.

I also need help in how to communicate this to users, since our next
minor release is still some time away.
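
In the meantime, users can check an old cluster for the problem themselves;
something like this (a sketch, run in each database) lists invalid indexes:

    SELECT n.nspname AS schema, c.relname AS index_name
    FROM pg_catalog.pg_index i
    JOIN pg_catalog.pg_class c ON c.oid = i.indexrelid
    JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
    WHERE NOT i.indisvalid;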

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Serious problem: media recovery fails after system or PostgreSQL crash

2012-12-06 Thread Daniel Farina
On Thu, Dec 6, 2012 at 9:33 AM, Tom Lane  wrote:
> "MauMau"  writes:
>> I'm using PostgreSQL 9.1.6 on Linux.  I encountered a serious problem that
>> media recovery failed showing the following message:
>> FATAL:  archive file "000100800028" has wrong size: 7340032
>> instead of 16777216
>
> Well, that's unfortunate, but it's not clear that automatic recovery is
> possible.  The only way out of it would be if an undamaged copy of the
> segment was in pg_xlog/ ... but if I recall the logic correctly, we'd
> not even be trying to fetch from the archive if we had a local copy.

I'm inclined to agree with this: I've had a case much like the
original poster's (too-short WAL segments because of media issues),
except in my case the archiver had archived a bogus copy of the data
also (short length and all), so our attempt to recover from archives
on a brand new system was met with obscure failure[0].  And, rather
interestingly, the WAL disk was able to "write" bogusly without error
for many minutes, which made for a fairly exotic recovery based on
pg_resetxlog: I grabbed quite a few minutes of WAL of various sub-16MB
sizes to spot check the situation.

It never occurred to me there was a way to really fix this that still
involves the archiver reading from a file system: what can one do when
one no longer trusts read() to get what was write()d?

[0]: Postgres wasn't very good about reporting the failure: in the case
where bogus files have been restored from archives, it seems to just
bounce through the timelines it knows about searching for a WAL segment it
likes, without any real error message like "got corrupt WAL from archive".
That could probably be fixed.

--
fdr




Re: [HACKERS] -DCLOBBER_CACHE_ALWAYS shows COPY FREEZE regression problem

2012-12-06 Thread Simon Riggs
On 7 December 2012 00:06, Andres Freund  wrote:

> Apparently the magic to preserve those values across cache resets isn't
> strong enough for this. Seems bad, because that seems to mean a sinval
> overflow leads to this and related optimizations being lost?

Which seems to back up the case for it being a hint, one that the system
might not be able to honor.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] -DCLOBBER_CACHE_ALWAYS shows COPY FREEZE regression problem

2012-12-06 Thread Tom Lane
Andrew Dunstan  writes:
> On a new buildfarm member friarbird, configured with
> -DCLOBBER_CACHE_ALWAYS:

>BEGIN;
>TRUNCATE vistest;
>COPY vistest FROM stdin CSV FREEZE;
> + NOTICE:  FREEZE option specified but pre-conditions not met

The notice is coming out because the relcache is dropping the value of 
rd_newRelfilenodeSubid as a result of cache flushes.  The COPY FREEZE
code is aware of this risk, commenting

 * As mentioned in comments in utils/rel.h, the in-same-transaction test
 * is not completely reliable, since in rare cases rd_createSubid or
 * rd_newRelfilenodeSubid can be cleared before the end of the transaction.
 * However this is OK since at worst we will fail to make the optimization.

but I'd say this seriously throws into question whether it should be
emitting a notice.  That seems to tie into the discussion a little bit
ago about whether the FREEZE option is a command or a hint.  Throwing a
notice seems like a pretty schizophrenic choice: it's not an error but
you're in the user's face about it anyway.  In any case, if the option
is unreliable (and with this implementation it certainly is), we can
*not* treat its failure as an error, so it's not a command.

TBH I think that COPY FREEZE should not have been committed yet;
it doesn't seem to be fully baked.

regards, tom lane




Re: [HACKERS] -DCLOBBER_CACHE_ALWAYS shows COPY FREEZE regression problem

2012-12-06 Thread Andres Freund
On 2012-12-06 23:59:06 +, Simon Riggs wrote:
> On 6 December 2012 23:28, Andrew Dunstan  wrote:
> > On a new buildfarm member friarbird, configured with
> > -DCLOBBER_CACHE_ALWAYS:
> >
> >   BEGIN;
> >   TRUNCATE vistest;
> >   COPY vistest FROM stdin CSV FREEZE;
> >+ NOTICE:  FREEZE option specified but pre-conditions not met
>
> I don't understand why that build option would produce different
> output for simple logic.
>
> I'll look again in the morning.

It clears the relcache entry as soon as AcceptInvalidationMessages() is
done, which seems to lose rd_createSubid and rd_newRelfilenodeSubid.

Apparently the magic to preserve those values across cache resets isn't
strong enough for this. Seems bad, because that seems to mean a sinval
overflow leads to this and related optimizations being lost?

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] -DCLOBBER_CACHE_ALWAYS shows COPY FREEZE regression problem

2012-12-06 Thread Simon Riggs
On 6 December 2012 23:28, Andrew Dunstan  wrote:
> On a new buildfarm member friarbird, configured with
> -DCLOBBER_CACHE_ALWAYS:
>
>   BEGIN;
>   TRUNCATE vistest;
>   COPY vistest FROM stdin CSV FREEZE;
>+ NOTICE:  FREEZE option specified but pre-conditions not met

I don't understand why that build option would produce different
output for simple logic.

I'll look again in the morning.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Serious problem: media recovery fails after system or PostgreSQL crash

2012-12-06 Thread Tomas Vondra
Hi,

On 6.12.2012 23:45, MauMau wrote:
> From: "Tom Lane" 
>> Well, that's unfortunate, but it's not clear that automatic recovery is
>> possible.  The only way out of it would be if an undamaged copy of the
>> segment was in pg_xlog/ ... but if I recall the logic correctly, we'd
>> not even be trying to fetch from the archive if we had a local copy.
> 
> No, PG will try to fetch the WAL file from pg_xlog when it cannot get it
> from the archive.  XLogFileReadAnyTLI() does that.  Also, the PG manual
> contains the following description:
> 
> http://www.postgresql.org/docs/9.1/static/continuous-archiving.html#BACKUP-ARCHIVING-WAL
> 
> 
> WAL segments that cannot be found in the archive will be sought in
> pg_xlog/; this allows use of recent un-archived segments. However,
> segments that are available from the archive will be used in preference
> to files in pg_xlog/.

So why don't you use an archive command that does not create such
incomplete files? I mean something like this:

archive_command = 'cp %p /arch/%f.tmp && mv /arch/%f.tmp /arch/%f'

Until the file is renamed, it's considered 'incomplete'.


Tomas




[HACKERS] -DCLOBBER_CACHE_ALWAYS shows COPY FREEZE regression problem

2012-12-06 Thread Andrew Dunstan
On a new buildfarm member friarbird, configured with -DCLOBBER_CACHE_ALWAYS:


  BEGIN;
  TRUNCATE vistest;
  COPY vistest FROM stdin CSV FREEZE;
   + NOTICE:  FREEZE option specified but pre-conditions not met


and similar.

cheers

andrew





Re: [HACKERS] Re: How to check whether the row was modified by this transaction before?

2012-12-06 Thread Andres Freund
On 2012-12-06 16:26:50 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2012-12-06 13:59:32 -0500, Tom Lane wrote:
> >> Andres Freund  writes:
> >>> On 2012-12-06 15:08:51 -0300, Alvaro Herrera wrote:
> >>> Vlad Arkhipov  writes:
>  In a BEFORE UPDATE trigger I need to know whether the row was previously
>  modified by this transaction. Is it safe to use xmin and txid_current()
>  for this purpose (xmin is 32-bit txid type but txid_current() returns
>  64-bit bigint).
>
> >>> I wonder if we shouldn't have a function txid_is_current(xid);
>
> > I think architectures with an invalidation-queue to external caches (be
> > it web-proxies or something lower-level) are quite popular. And with the
> > new NOTIFY or pgq relatively simple. Ad to those its sensible not to
> > post a single primary key more than once.
>
> It strikes me that the notion of "this row was previously modified by
> the current transaction" is squishier than it might look, and we'd do
> well to clarify it before we consider exporting anything.

You're right.

I am also wondering if we can assume that for all interesting purposes
enough context is available to determine if the row was just inserted or
updated or whether we need to make a test for HEAP_UPDATED available in
some form.

> I think there
> are three ways you might define such a function:
>
> 1. xmin is exactly equal to current (sub)transaction's XID.
>
> 2. xmin is this (sub)transaction's XID, or the XID of any subcommitted
> subtransaction of it.
>
> 3. xmin is this (sub)transaction's XID, or the XID of any subcommitted
> subtransaction, or the XID of any open parent transaction or
> subcommitted subtransaction thereof.
>
> If I've got my head screwed on straight, test #2 gives you the semantics
> that "the previous row update cannot commit unless the action you are
> about to take (with the current XID) commits".  Test #3 gives you the
> semantics "the action you are about to take cannot commit unless the
> previous row update does".  And test #1 doesn't have much to recommend
> it except simplicity; while it might appear to have the semantics "the
> previous row update will commit if and only if the action you are about
> to take commits", it's actually narrower than that, because the same
> could be said for already-subcommitted subtransactions.
>
> In a cache invalidation context you probably want test #2, but
> TransactionIdIsCurrentTransactionId() presently performs test #3.

I agree that 1) isn't all that interesting but after that it really gets
a bit harder...

2) would be useful for things like avoiding queuing any invalidations
if the table is newly truncated, but that seems like a fragile thing to
do from userspace. A table rewrite seems to be something fundamentally
different from a truncation here because that would need more detail
than txid_is_current(xid) would give you.
Maybe something like 'pg_relation_is_new()' would be interesting, but
that seems like a different thing, and I am not sure the use-case for
that is clear enough.
From a user-level perspective 3) seems to be enough if all you want is to
test whether the relation is new, but again, you couldn't test for that
sensibly because there's no access to HEAP_UPDATED.

For which cases do you think 2) is interesting wrt. cache invalidations?


I think 3) might be more interesting for the (for me) common case where
you just want to avoid queuing duplicate invalidations. Because all
you're interested in in that case is: "Can I be sure that, if I commit,
another invalidation has already been queued for this tuple?" And 3)
seems to give you that.

> (We discussed this point in connection with commit 7b90469b, and I've
> been intending since then to take a closer look at all the callers of
> TransactionIdIsCurrentTransactionId to see if these semantics are in
> fact what they need.  We might have some bugs associated with confusion
> on this.)

It's an easy error to make, I think, so I wouldn't be too surprised if
somebody made it before me. On the other hand, it seems to me that to be
dangerous you need a part of the system that's not obeying transactional
semantics, like the indexes which still contain enum oids that aren't in
the catalogs anymore. So I hope there aren't too many places where
people could have made that mistake.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serious problem: media recovery fails after system or PostgreSQL crash

2012-12-06 Thread MauMau

From: "Tom Lane" 

Well, that's unfortunate, but it's not clear that automatic recovery is
possible.  The only way out of it would be if an undamaged copy of the
segment was in pg_xlog/ ... but if I recall the logic correctly, we'd
not even be trying to fetch from the archive if we had a local copy.


No, PG will try to fetch the WAL file from pg_xlog when it cannot get it 
from archive.  XLogFileReadAnyTLI() does that.  Also, PG manual contains the 
following description:


http://www.postgresql.org/docs/9.1/static/continuous-archiving.html#BACKUP-ARCHIVING-WAL

WAL segments that cannot be found in the archive will be sought in pg_xlog/; 
this allows use of recent un-archived segments. However, segments that are 
available from the archive will be used in preference to files in pg_xlog/.


So, continuing recovery by changing the emode to LOG would work.  What do 
you think about this fix?




I think having PG automatically destroy archive files is bordering on
insane.



I agree.  Before that, PG cannot know the archive location, so PG cannot 
delete the partially filled archive files.


Regards
MauMau



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: optimized DROP of multiple tables within a transaction

2012-12-06 Thread Tomas Vondra
On 6.12.2012 05:47, Shigeru Hanada wrote:
> On Mon, Nov 12, 2012 at 4:36 AM, Tomas Vondra  wrote:
>> Hi,
>>
>> I've prepared a slightly updated patch, based on the previous review.
>> See it attached.
> 
> All changes in the v3 patch seem good; however, I found some places which require
> cosmetic changes.  Please see the attached v3.1 patch for those changes.

OK, thanks!

>>> Oops, you're right. I omitted 1-3 and 1-4 in actual measurement, but
>>> IMO loading data is necessary to fill the shared buffer up, because
>>> # of buffers which are deleted during COMMIT is major factor of this
>>> patch. And, yes, I verified that all shared buffers are used after
>>> all loading have been finished.
>>
>> I don't think it's an important factor, as the original code does this:
>>
>>   for (i = 0; i < NBuffers; i++)
>>   {
>> volatile BufferDesc *bufHdr = &BufferDescriptors[i];
>> ...
>>   }
>>
>> in the DropRelFileNodeAllBuffers. That loops through all shared buffers
>> no matter what, so IMHO the performance in this case depends on the
>> total size of the shared buffers. But maybe I'm missing something important.
> 
> I worry about the effect of the "continue" in the loop on performance.  If most of
> the shared buffers are not in use at the moment of the DROP, the DROP doesn't
> incur the overhead of LockBufHdr and the subsequent few lines.
> Do you expect such a situation in your use cases?

I still fail to see the issue here - can you give an example of when
this would be a problem?

The code was like this before the patch; I only replaced the simple
comparison with a binary search ... Or do you think that this check means
"if the buffer is empty"?

/* buffer does not belong to any of the relations */
if (rnode == NULL)
continue;

Because it does not - notice that the 'rnode' is a result of the binary
search, not a direct check of the buffer.

So the following few lines (lock and recheck) will be skipped either for
unused buffers and buffers belonging to relations that are not being
dropped.

Maybe we could do a simple 'is the buffer empty' check before the
bsearch call, but I don't expect that to happen very often in the real
world (the cache tends to be used).

>> I've done a simple benchmark on my laptop with 2GB shared buffers, it's
>> attached in the drop-test.py (it's a bit messy, but it works).
> [snip]
>> With those parameters, I got these numbers on the laptop:
>>
>>   creating 10000 tables
>> all tables created in 3.298694 seconds
>>   dropping 10000 tables one by one ...
>> all tables dropped in 32.692478 seconds
>>   creating 10000 tables
>> all tables created in 3.458178 seconds
>>   dropping 10000 tables in batches by 100...
>> all tables dropped in 3.28268 seconds
>>
>> So it's 33 seconds vs. 3.3 seconds, i.e. 10x speedup. On AWS we usually
>> get larger differences, as we use larger shared buffers and the memory
>> is significantly slower there.
> 
> Do you have performance numbers of same test on not-patched PG?
> This patch aims to improve performance of bulk DROP, so 4th number in
> the result above should be compared to 4th number of not-patched PG?

I've re-run the tests with the current patch on my home workstation, and
the results are these (again 10k tables, dropped either one-by-one or in
batches of 100).

1) unpatched

dropping one-by-one:        15.5 seconds
dropping in batches of 100: 12.3 seconds

2) patched (v3.1)

dropping one-by-one:        32.8 seconds
dropping in batches of 100:  3.0 seconds

The problem here is that when dropping the tables one-by-one, the
bsearch overhead is tremendous and significantly increases the runtime.
I've done a simple check (if dropping a single table, use the original
simple comparison) and I got this:

3) patched (v3.2)

dropping one-by-one:        16.0 seconds
dropping in batches of 100:  3.3 seconds

i.e. the best of both. But it seems like unnecessary complexity to me
- if you need to drop a lot of tables you'll probably do that in a
transaction anyway.
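
To spell out what I mean by dropping "in batches" - just a sketch, with
made-up table names:

BEGIN;
DO $$
DECLARE
    i int;
BEGIN
    FOR i IN 1 .. 100 LOOP
        -- test_table_1 .. test_table_100 are made-up names; the point is
        -- that all the drops are processed at a single commit, which is
        -- the case the patch optimizes
        EXECUTE format('DROP TABLE IF EXISTS test_table_%s', i);
    END LOOP;
END
$$;
COMMIT;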

Tomas
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 2446282..9e92230 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -312,6 +312,10 @@ smgrDoPendingDeletes(bool isCommit)
PendingRelDelete *pending;
PendingRelDelete *prev;
PendingRelDelete *next;
+   SMgrRelation *srels = (SMgrRelation *) palloc(sizeof(SMgrRelation));
+   int nrels = 0,
+   i = 0,
+   maxrels = 1;
 
prev = NULL;
for (pending = pendingDeletes; pending != NULL; pending = next)
@@ -335,14 +339,33 @@ smgrDoPendingDeletes(bool isCommit)
SMgrRelation srel;
 
srel = smgropen(pending->relnode, 
pending->backend);
-   smgrdounlink(srel, false);
-   smgrclose(srel);
+   
+ 

Re: [HACKERS] Serious problem: media recovery fails after system or PostgreSQL crash

2012-12-06 Thread MauMau

From: "Kevin Grittner" 

If you are attempting a PITR-style recovery and you want to include
WAL entries from the partially-copied file, pad a copy of it with
NUL bytes to the expected length.


I'm afraid this is unacceptably difficult, or almost impossible, for many PG
users.  How would you do the following?


1. Identify the file type (WAL segment, backup history file, timeline
history file) and its expected size in the archive_command script.
archive_command has to handle these three types of files.  Embedding file
name logic (e.g. WAL is 000100020003) in archive_command is a
bad idea, because the file naming might change in a future PG release.


2. Append NUL bytes to the file in the archive_command shell script or batch
file.  In particular, I have no idea how to do that on Windows, and I have
some PG systems running on Windows.  This would compromise the ease of use
of PostgreSQL.


So I believe PG should handle the problem, not the archive_command.

Regards
MauMau






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why can't plpgsql return a row-expression?

2012-12-06 Thread Tom Lane
Asif Rehman  writes:
> I have attached the stripped-down version. I will leave the type coercions
> support for a separate patch.

OK, I'll take a look at this one.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Background worker processes

2012-12-06 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Background worker processes

I had tested this on EXEC_BACKEND some time ago, and it worked fine, but
I had neglected it since then, and now I find it fails with a pretty
strange message on startup.  Also, Andres and I have been talking about
other possible problems in that scenario (mostly that the order in which
shared libraries are loaded might not be deterministic, and this would
cause the whole approach to fall down).

I'm considering reverting this; but since it won't cause a visible
problem unless and until a worker is loaded, leaving it in place might
enable someone else to peek at it while I come up with some idea to
handle the broken EXEC_BACKEND case ...

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: How to check whether the row was modified by this transaction before?

2012-12-06 Thread Tom Lane
Andres Freund  writes:
> On 2012-12-06 13:59:32 -0500, Tom Lane wrote:
>> Andres Freund  writes:
>>> On 2012-12-06 15:08:51 -0300, Alvaro Herrera wrote:
>>> Vlad Arkhipov  writes:
 In a BEFORE UPDATE trigger I need to know whether the row was previously
 modified by this transaction. Is it safe to use xmin and txid_current()
 for this purpose (xmin is 32-bit txid type but txid_current() returns
 64-bit bigint).

>>> I wonder if we shouldn't have a function txid_is_current(xid);

> I think architectures with an invalidation-queue to external caches (be
> it web-proxies or something lower-level) are quite popular. And with the
> new NOTIFY or pgq relatively simple. And for those it's sensible not to
> post a single primary key more than once.

It strikes me that the notion of "this row was previously modified by
the current transaction" is squishier than it might look, and we'd do
well to clarify it before we consider exporting anything.  I think there
are three ways you might define such a function:

1. xmin is exactly equal to current (sub)transaction's XID.

2. xmin is this (sub)transaction's XID, or the XID of any subcommitted
subtransaction of it.

3. xmin is this (sub)transaction's XID, or the XID of any subcommitted
subtransaction, or the XID of any open parent transaction or
subcommitted subtransaction thereof.

If I've got my head screwed on straight, test #2 gives you the semantics
that "the previous row update cannot commit unless the action you are
about to take (with the current XID) commits".  Test #3 gives you the
semantics "the action you are about to take cannot commit unless the
previous row update does".  And test #1 doesn't have much to recommend
it except simplicity; while it might appear to have the semantics "the
previous row update will commit if and only if the action you are about
to take commits", it's actually narrower than that, because the same
could be said for already-subcommitted subtransactions.

In a cache invalidation context you probably want test #2, but
TransactionIdIsCurrentTransactionId() presently performs test #3.
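
To make that concrete, here is a sketch with a made-up table t:

    BEGIN;
    SAVEPOINT s1;
    UPDATE t SET val = val + 1 WHERE id = 1;  -- row's xmin is s1's XID
    RELEASE SAVEPOINT s1;                     -- s1 is now subcommitted
    -- at this point the updated row passes tests #2 and #3, but not #1
    SAVEPOINT s2;
    -- inside s2 the row passes only test #3: its xmin belongs to a
    -- subcommitted subtransaction of the open parent, not of s2 itself
    COMMIT;

(Table, column and values are invented; only the savepoint structure
matters.)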

(We discussed this point in connection with commit 7b90469b, and I've
been intending since then to take a closer look at all the callers of
TransactionIdIsCurrentTransactionId to see if these semantics are in
fact what they need.  We might have some bugs associated with confusion
on this.)

I'm not sure which of these semantics we might wish to expose to users.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why can't plpgsql return a row-expression?

2012-12-06 Thread Asif Rehman
Hi,

I have attached the stripped-down version. I will leave the type coercions
support for a separate patch.

Regards,
--Asif



On Fri, Dec 7, 2012 at 1:14 AM, Tom Lane  wrote:

> Robert Haas  writes:
> > Are you going to commit a stripped-down version of the patch?
>
> I set it back to "waiting on author" --- don't know if he wants to
> produce a stripped-down version with no type coercions, or try to use
> cast-based coercions.
>
> regards, tom lane
>


return_rowtype.v4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... NOREWRITE option

2012-12-06 Thread Dimitri Fontaine
Andres Freund  writes:
> On 2012-12-06 18:42:22 +, Simon Riggs wrote:
>> "in-between state" means what? And what danger do you see?
>
> For example during table rewrites we have a temporary pg_class entry
> that's a full copy of the table, with a separate oid, relfilenode and
> everything. That gets dropped rather unceremoniously, without the usual
> safety checks. If the user did anything referencing that table we would
> possibly have a corrupt catalog or even a segfault on our hands.
>
> For normal triggers the code takes quite some care to avoid such
> dangers.

I think we need to solve that problem when we implement new firing
points for event triggers. The 'table_rewrite' event needs to fire at a
time when the code can cope with it. That's the main difficulty in
adding events to that system: asserting that their code locations are safe.

> Event triggers get called *during* the ALTER TABLE. So if we're not
> careful they see something that's not easy to handle.

They need to fire before catalogs are modified, or after, not in
between, I agree with that. I don't see other ways of implementing that
than carefully placing the call to user code in the backend's code.

> I am for example not sure what would happen if we had a "rewrite" event
> trigger which inserts a log entry into a logtable. Not a stupid idea,
> right?
> Now imagine we had a deferred unique key on that logtable and the
> logtable is the one that gets rewritten...

The log insert needs to happen either before or after the rewrite, in
terms of catalog state. I don't see any magic bullet here.

-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] in-catalog Extension Scripts and Control parameters (templates?)

2012-12-06 Thread Dimitri Fontaine
Tom Lane  writes:
> Only for these new-style thingies.  I am not suggesting breaking the
> existing file-based implementation, only offering a parallel
> catalog-based implementation too.  We'd have to think about what to do
> for name collisions --- probably having the catalog entry take
> precedence is okay, but is there an argument for something else?

Yeah, well, I would have preferred to have two ways to fill in the
templates and then only work from the templates. That would solve the name
collision problem, and I guess would allow sharing more code.

The other thing is that I want to support extensions that use both
models: say the prototype has been written in pl/pythonu but the next
version is now switching to C/.so…

>> [ need separate catalogs for install scripts and update scripts ]
> Check.

Ok.

>>pg_extension_control(extension, version, default_version,
>> default_full_version, module_pathname,
>> relocatable, superuser, schema, requires)
>
> Given that the feature is going to be restricted to pure-SQL extensions,
> I'm pretty sure we can do without module_pathname, and maybe some other
> things.

I already removed "directory" from that list because once in the catalogs
you don't care where the files might have been found. MODULE_PATHNAME is
about how to read the script that we would store in the catalogs; I'm not
sure we can bypass that.

> Yeah, possibly, but I don't have a better idea yet.  I don't like
> either PARAMETERS or SCRIPT --- for one thing, those don't convey the
> idea that this is an object in its own right rather than an attribute of
> an extension.

A template is something that needs to be instantiated with specific
parameters, and can get used any number of times with different sets of
parameters to build a new object each time. It's nothing like what we're
talking about, or I don't understand it at all.

My understanding is that we store the extension "sources" in our
catalogs so as to be able to execute them later. The only option we have
that looks like a template parameter would be the SCHEMA, the others are
about picking the right sources/script.

> Actually, given the text search precedent, I'm not sure why you're so
> against TEMPLATE.

See above, my understanding of your proposal is not matching the
definition I know of that term.

>>That would mean that ALTER EXTENSION could create objects in other
>>catalogs for an extension that does not exist itself yet, but is now
>>known to be available (select * from pg_available_extensions()).
>
> Man, that is just horrid.  It brings back exactly the confusion we're
> trying to eliminate by using the "template" terminology.  We don't want
> it to look like manipulating a template has anything to do with altering
> an extension of the same name (which might or might not even be
> installed).

I still can't help thinking in terms of populating the "templates" one
way or the other and then using the "templates" to create or update the
extension itself.

We could maybe have a command akin to "yum update" or "apt-get update"
that would refresh the TEMPLATEs from disk (handling name conflicts,
file name parsing and control files parsing), and some options to the
EXTENSION commands to force a refresh before working?

So either

  REFRESH EXTENSION TEMPLATES;
  ALTER EXTENSION hstore UPDATE TO '1.2';

or

  ALTER EXTENSION hstore UPDATE TO '1.2' WITH TEMPLATE REFRESH;

So my horrid proposal above would mean that the REFRESH option defaults
to true, and is also available to CREATE EXTENSION. I'm not sure how
much less horrid that makes it, but I sure hope it helps to better
explain / convey my vision of the thing.

>> The $2.56 question being what would be the pg_dump policy of the
>> "extension templates" objects?
>
> The ones that are catalog objects, not file objects, should be dumped
> I think.

Agreed.

> Wrong.  There is no reason whatsoever to load file-based stuff into
> catalogs.  That just adds complication and overhead to cases that work
> already, and will break update cases (what happens when a package update
> changes the files?).

What happens if the extension that was created from a template is now
maintained on-disk (a switch from pl/perlu to C)? What if the extension
that was on-disk because you couldn't use a template in 9.1 and 9.2 now
wants to be managed by the template system?

What if the PGXN guys think templates are a perfect solution to integrate
into their client tool but the debian and yum packagers prefer to ship
disk-based extensions? And you want to switch from one packaging system
to the other?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why can't plpgsql return a row-expression?

2012-12-06 Thread Tom Lane
Robert Haas  writes:
> Are you going to commit a stripped-down version of the patch?

I set it back to "waiting on author" --- don't know if he wants to
produce a stripped-down version with no type coercions, or try to use
cast-based coercions.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... NOREWRITE option

2012-12-06 Thread Petr Jelinek
> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
> ow...@postgresql.org] On Behalf Of Andres Freund
> Sent: 06 December 2012 20:44
> To: Petr Jelinek
> Cc: 'Simon Riggs'; 'Robert Haas'; 'Dimitri Fontaine'; 'Josh Berkus';
pgsql-
> hack...@postgresql.org
> Subject: Re: [HACKERS] ALTER TABLE ... NOREWRITE option
> > > Event triggers get called *during* the ALTER TABLE. So if we're not
> > > careful they see something that's not easy to handle.
> > >
> >
> > I thought the point of this was to call the trigger *before* anything
> > happens.
> 
> Just because the rewrite hasn't started yet, doesn't mean nothing else has
> been changed.
> 
> Note, I am not saying this is impossible or anything; the original point
> called into question was that we need to be especially careful with
> choosing callsites and that it's not trivial to do right.
> 

OK, my assumption is that the event would be fired before ALTER actually did
anything; firing triggers while DDL is actually already being executed seems
like a bad idea.

Regards
Petr Jelinek



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... NOREWRITE option

2012-12-06 Thread Andres Freund
On 2012-12-06 20:27:33 +0100, Petr Jelinek wrote:
> > -Original Message-
> > From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
> > ow...@postgresql.org] On Behalf Of Andres Freund
> > Sent: 06 December 2012 20:04
> > To: Simon Riggs
> > Cc: Robert Haas; Dimitri Fontaine; Josh Berkus;
> pgsql-hackers@postgresql.org
> > Subject: Re: [HACKERS] ALTER TABLE ... NOREWRITE option
> > > I don't see any difference between an event trigger and these
> > statements...
> > >
> > > BEGIN;
> > > ALTER TABLE x ...;
> > > SELECT somefunction();
> > > ALTER TABLE y ...;
> > > COMMIT;
> >
> > Event triggers get called *during* the ALTER TABLE. So if we're not careful
> > they see something that's not easy to handle.
> >
>
> I thought the point of this was to call the trigger *before* anything
> happens.

Just because the rewrite hasn't started yet, doesn't mean nothing else
has been changed.

Note, I am not saying this is impossible or anything; the original point
called into question was that we need to be especially careful with
choosing callsites and that it's not trivial to do right.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... NOREWRITE option

2012-12-06 Thread Petr Jelinek
> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
> ow...@postgresql.org] On Behalf Of Andres Freund
> Sent: 06 December 2012 20:04
> To: Simon Riggs
> Cc: Robert Haas; Dimitri Fontaine; Josh Berkus;
pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] ALTER TABLE ... NOREWRITE option
> > I don't see any difference between an event trigger and these
> statements...
> >
> > BEGIN;
> > ALTER TABLE x ...;
> > SELECT somefunction();
> > ALTER TABLE y ...;
> > COMMIT;
> 
> Event triggers get called *during* the ALTER TABLE. So if we're not careful
> they see something that's not easy to handle.
> 

I thought the point of this was to call the trigger *before* anything
happens.

Regards
Petr Jelinek




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)

2012-12-06 Thread Stephen Frost
Jeff,

* Jeff Davis (pg...@j-davis.com) wrote:
> That isn't a problem, because the other session won't see the tuple in
> pg_class until the creating transaction commits, at which point the rows
> have committed, too (because this would only kick in when the rows are
> loaded in the same transaction as the CREATE).

See, that's what I originally thought but was corrected on it at one
point (by Tom, iirc).

I just did a simple test against 9.2.1 using serializable mode.  In this
mode, if you do something like this:

session a                            session b
---------                            ---------
begin;
                                     begin;
                                     create table q (a integer);
                                     insert into q values (1);
                                     commit;
select * from q;


You'll get an empty table.  That's not great, but it's life: once
something is in pg_class, all sessions can see it, because the table
lookups are done using SnapshotNow and aren't truly transactional, but
at least you can't see any rows in the table, because the individual rows
are marked with the transaction ID which created them and we can't see
them in our transaction that started before the table was created.

It sounds like, with this patch/change, this behavior would change.
Now, I'm not necessarily against this, but it's clearly something
different from what we had before and might be an issue for some.  If,
in the general case, we're actually 'ok' with this, I'd ask why we don't
simply do it by default?  If we're *not* 'ok' with it, perhaps we
shouldn't be doing it at all...

If we fix the bigger issue (which, as I understand it from various
discussions with Tom and Robert, is very difficult), then this whole
problem goes away entirely.  I always figured/expected that to be how
we'd fix this, not just punt on the issue and tell the user "sure, you
can do this, hope you know what you're doing...".

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)

2012-12-06 Thread Jeff Davis
On Thu, 2012-12-06 at 18:16 +, Simon Riggs wrote:
> > I tend to agree with Andres on this one.  This feels a bit like
> > accepting a command but then not actually following-through on it
> > if it turns out we can't actually do it.  If it's truely an optimization
> > (and I suspect my other email/question might provide insight into that),
> > then it should be something we can 'just do' without needing to be asked
> > to do it, along the same lines of not WAL'ing when the appropriate
> > conditions are met (table created in this transaction, etc, etc).
> 
> That depends on whether it's a command or a do-if-possible hint. It's
> documented as the latter.
> 
> Similar to the way VACUUM tries to truncate a relation, but gives up
> if it can't. And, for an even closer example, VACUUM FREEZE itself,
> which doesn't guarantee that all rows are frozen at the end of it...

Also, if the set of conditions changes in the future, we would have a
problem if that caused new errors to appear.
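
For instance, a load script written like this today would be fine, but
could start erroring out after a release that tightened the
pre-conditions, if FREEZE raised an error rather than a notice (table
name and data are made up):

BEGIN;
-- freezetest is just an example table; because it is created in this
-- same transaction, the FREEZE pre-conditions are currently met
CREATE TABLE freezetest (x text);
COPY freezetest FROM stdin CSV FREEZE;
a
b
\.
COMMIT;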

I think a WARNING might make more sense than a NOTICE, but I don't have
a strong opinion about that.

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: How to check whether the row was modified by this transaction before?

2012-12-06 Thread Andres Freund
On 2012-12-06 13:59:32 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2012-12-06 15:08:51 -0300, Alvaro Herrera wrote:
> >> Vlad Arkhipov  writes:
> >>> In a BEFORE UPDATE trigger I need to know whether the row was previously
> >>> modified by this transaction. Is it safe to use xmin and txid_current()
> >>> for this purpose (xmin is 32-bit txid type but txid_current() returns
> >>> 64-bit bigint).
>
> > I wonder if we shouldn't have a function txid_is_current(xid);
>
> Yeah, I was wondering that too, and wanted to know if the OP had a
> use-case that was mainstream enough to justify adding such a function.

I think architectures with an invalidation-queue to external caches (be
it web-proxies or something lower-level) are quite popular. And with the
new NOTIFY or pgq relatively simple. And for those it's sensible not to
post a single primary key more than once.
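
To make the use-case concrete, today that means a trigger along these
lines (the table, its id column and the notification channel are
invented, and the epoch caveat discussed elsewhere applies to the modulo
comparison):

CREATE FUNCTION queue_invalidation() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- OLD.xmin is a 32-bit xid while txid_current() is epoch-extended,
    -- hence the modulo 2^32; this is what txid_is_current() would hide
    IF OLD.xmin::text::bigint = txid_current() % 4294967296 THEN
        RETURN NEW;   -- this transaction already queued an invalidation
    END IF;
    -- 'cache_inval' and the id column are made-up examples
    PERFORM pg_notify('cache_inval', TG_TABLE_NAME || ':' || NEW.id::text);
    RETURN NEW;
END;
$$;

CREATE TRIGGER queue_invalidation_trg
    BEFORE UPDATE ON cached_table
    FOR EACH ROW EXECUTE PROCEDURE queue_invalidation();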

Magnus has given talks specifically about that at various conferences, if
that counts for anything ;)

Mainstreamy enough?

Andres
--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why can't plpgsql return a row-expression?

2012-12-06 Thread Robert Haas
On Thu, Dec 6, 2012 at 1:36 PM, Tom Lane  wrote:
> I'm against putting I/O coercion semantics into tupconvert, period.  Ever.
> If plpgsql wants that behavior rather than something more consistent
> with the rest of the system, it needs to implement it for itself.

I'm sure that can be done.  I don't think anyone is objecting to that,
just trying to get useful behavior out of the system.

Are you going to commit a stripped-down version of the patch?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... NOREWRITE option

2012-12-06 Thread Andres Freund
On 2012-12-06 18:42:22 +, Simon Riggs wrote:
> On 6 December 2012 18:31, Andres Freund  wrote:
> > On 2012-12-06 18:21:09 +, Simon Riggs wrote:
> >> On 6 December 2012 00:46, Robert Haas  wrote:
> >> > On Wed, Dec 5, 2012 at 6:45 PM, Simon Riggs  
> >> > wrote:
> >> >> Yes, but it is also the trigger writers problem.
> >> >
> >> > Maybe to some degree.  I don't think that a server crash or something
> >> > like a block-read error is ever tolerable though, no matter how silly
> >> > the user is with their event trigger logic.  If we go down that road
> >> > it will be impossible to know whether errors that are currently
> >> > reliable indicators of software or hardware problems are in fact
> >> > caused by event triggers.   Of course, if an event trigger causes the
> >> > system to error out in some softer way, that's perfectly fine...
> >>
> >> How are event triggers more dangerous than normal triggers/functions?
> >
> > Normal triggers aren't run when the catalog is in an in-between state
> > because they aren't run while catalog modifications are taking place.
>
> "in-between state" means what? And what danger do you see?

For example during table rewrites we have a temporary pg_class entry
that's a full copy of the table, with a separate oid, relfilenode and
everything. That gets dropped rather unceremoniously, without the usual
safety checks. If the user did anything referencing that table we would
possibly have a corrupt catalog or even a segfault on our hands.

For normal triggers the code takes quite some care to avoid such
dangers.

>  If it's just "someone might write bad code", that horse has already
> bolted - functions, triggers, executor hooks, operators, indexes etc

Not sure what you mean by that. Those don't get called in situations
where they don't have a reliable working environment.

> I don't see any difference between an event trigger and these statements...
>
> BEGIN;
> ALTER TABLE x ...;
> SELECT somefunction();
> ALTER TABLE y ...;
> COMMIT;

Event triggers get called *during* the ALTER TABLE. So if we're not
careful they see something that's not easy to handle.

I am for example not sure what would happen if we had a "rewrite" event
trigger which inserts a log entry into a logtable. Not a stupid idea,
right?
Now imagine we had a deferred unique key on that logtable and the
logtable is the one that gets rewritten...
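
Spelled out - all names invented, and a rewrite firing point doesn't
exist yet, which is the point - that would look something like:

CREATE TABLE rewrite_log (
    tag text,
    at  timestamptz DEFAULT now(),
    UNIQUE (tag, at) DEFERRABLE INITIALLY DEFERRED
);

CREATE FUNCTION log_rewrite() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- rewrite_log and log_rewrite are made-up names
    INSERT INTO rewrite_log(tag) VALUES (tg_tag);
END;
$$;

-- now picture that firing from inside
--   ALTER TABLE rewrite_log ALTER COLUMN at TYPE timestamp;
-- the insert would go into the very heap that is being swapped out,
-- with the deferred uniqueness check still pending at commit.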

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to check whether the row was modified by this transaction before?

2012-12-06 Thread Tom Lane
Andres Freund  writes:
> On 2012-12-06 15:08:51 -0300, Alvaro Herrera wrote:
>> Vlad Arkhipov  writes:
>>> In a BEFORE UPDATE trigger I need to know whether the row was previously
>>> modified by this transaction. Is it safe to use xmin and txid_current()
>>> for this purpose (xmin is 32-bit txid type but txid_current() returns
>>> 64-bit bigint).

> I wonder if we shouldn't have a function txid_is_current(xid);

Yeah, I was wondering that too, and wanted to know if the OP had a
use-case that was mainstream enough to justify adding such a function.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Setting visibility map in VACUUM's second phase

2012-12-06 Thread Tom Lane
Robert Haas  writes:
> One other thought: I'm wondering if we shouldn't try to push the work
> of setting the all-visible bit into heap_page_prune().

Hm, maybe ...

>  But it seems to me that a page can't be all-visible unless there are
> no dead line pointers and no HOT chains of length != 1, and
> heap_prune_chain() does manage to call HeapTupleSatisfiesVacuum() for
> every tuple, so the raw information seems like it is available without
> any additional CLOG lookups.

HeapTupleSatisfiesVacuum is interested in whether a dead tuple is dead
to everybody, but I don't think it figures out whether a live tuple is
live to everybody.  On the assumption that most tuples are live, adding
the latter calculation might represent significant expense.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for pg_upgrade status display

2012-12-06 Thread Bruce Momjian
On Thu, Dec  6, 2012 at 02:53:44PM -0300, Alvaro Herrera wrote:
> Robert Haas escribió:
> > On Wed, Dec 5, 2012 at 10:04 PM, Bruce Momjian  wrote:
> > > Pg_upgrade displays file names during copy and database names during
> > > dump/restore.  Andrew Dunstan identified three bugs:
> > >
> > > *  long file names were being truncated to 60 _leading_ characters, which
> > > often do not change for long file names
> > >
> > > *  file names were truncated to 60 characters in log files
> > >
> > > *  carriage returns were being output to log files
> > >
> > > The attached patch fixes these --- it prints 60 _trailing_ characters to
> > > the status display, and full path names without carriage returns to log
> > > files.
> > 
> > This might be a dumb question, but why limit it to 60 characters at
> > all instead of, say, MAXPGPATH?
> 
> I think this should be keyed off the terminal width, actually, no?  The
> whole point of this is to overwrite the same line over and over, right?

That seems like overkill for a status message.  It is just there so
users know pg_upgrade isn't stuck, which was the complaint before the
message was used.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix for pg_upgrade status display

2012-12-06 Thread Bruce Momjian

On Thu, Dec  6, 2012 at 12:43:53PM -0500, Robert Haas wrote:
> On Wed, Dec 5, 2012 at 10:04 PM, Bruce Momjian  wrote:
> > Pg_upgrade displays file names during copy and database names during
> > dump/restore.  Andrew Dunstan identified three bugs:
> >
> > *  long file names were being truncated to 60 _leading_ characters, which
> > often do not change for long file names
> >
> > *  file names were truncated to 60 characters in log files
> >
> > *  carriage returns were being output to log files
> >
> > The attached patch fixes these --- it prints 60 _trailing_ characters to
> > the status display, and full path names without carriage returns to log
> > files.
> 
> This might be a dumb question, but why limit it to 60 characters at
> all instead of, say, MAXPGPATH?

It is limited to 60 only for screen display, so the user knows what is
being processed.  If the text wraps across several lines, the \r trick
to overwrite the string will not work.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... NOREWRITE option

2012-12-06 Thread Simon Riggs
On 6 December 2012 18:31, Andres Freund  wrote:
> On 2012-12-06 18:21:09 +, Simon Riggs wrote:
>> On 6 December 2012 00:46, Robert Haas  wrote:
>> > On Wed, Dec 5, 2012 at 6:45 PM, Simon Riggs  wrote:
>> >> Yes, but it is also the trigger writers problem.
>> >
>> > Maybe to some degree.  I don't think that a server crash or something
>> > like a block-read error is ever tolerable though, no matter how silly
>> > the user is with their event trigger logic.  If we go down that road
>> > it will be impossible to know whether errors that are currently
>> > reliable indicators of software or hardware problems are in fact
>> > caused by event triggers.   Of course, if an event trigger causes the
>> > system to error out in some softer way, that's perfectly fine...
>>
>> How are event triggers more dangerous than normal triggers/functions?
>
> Normal triggers aren't run when the catalog is in an in-between state
> because they aren't run while catalog modifications are taking place.

"in-between state" means what? And what danger do you see?
If it's just "someone might write bad code", that horse has already
bolted - functions, triggers, executor hooks, operators, indexes etc

I don't see any difference between an event trigger and these statements...

BEGIN;
ALTER TABLE x ...;
SELECT somefunction();
ALTER TABLE y ...;
COMMIT;

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: How to check whether the row was modified by this transaction before?

2012-12-06 Thread Andres Freund
On 2012-12-06 15:08:51 -0300, Alvaro Herrera wrote:
> Tom Lane wrote:
> > Vlad Arkhipov  writes:
> > > In a BEFORE UPDATE trigger I need to know whether the row was previously
> > > modified by this transaction. Is it safe to use xmin and txid_current()
> > > for this purpose (xmin is 32-bit txid type but txid_current() returns
> > > 64-bit bigint).

I wonder if we shouldn't have a function txid_is_current(xid); I could
have used that previously to avoid queuing multiple external
cache-invalidations if something gets repeatedly updated in the same
transaction. And I seem to remember some people here asking this
question before on the lists.

> >
> > >IF OLD.xmin = txid_current() THEN
> >
> > Comparing to txid_current() mod 2^32 would probably work,
>
> I think we should be setting the initial epoch to something other than
> zero.  That way, some quick testing would have revealed this problem
> immediately.

+1, currently the difference of xid vs bigint is hard to spot.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why can't plpgsql return a row-expression?

2012-12-06 Thread Tom Lane
Robert Haas  writes:
> On Thu, Dec 6, 2012 at 12:31 AM, Pavel Stehule  
> wrote:
>> but we can limit the implicit coercion in tupconvert via a new parameter -
>> because we want this behaviour only when coming from plpgsql.
>> Then when this parameter - maybe "allowIOCoercion" - is false,
>> tupconvert will behave the same as before.

> It would be nice to make that work, but it could be left for a
> separate patch, I suppose.  I'm OK with Tom's proposal to go ahead and
> commit the core mechanic first, if doing more than that is
> controversial.

I'm against putting I/O coercion semantics into tupconvert, period.  Ever.
If plpgsql wants that behavior rather than something more consistent
with the rest of the system, it needs to implement it for itself.

If the per-column conversions were cast-based, it wouldn't be such a
flagrantly bad idea to implement it in tupconvert.  I'm still not
convinced that that's the best place for it though.  tupconvert is about
rearranging columns, not about changing their contents.

It might be more appropriate to invent an expression evaluation
structure that could handle such nontrivial composite-type coercions.
I'm imagining that somehow we disassemble a composite value (produced by
some initial expression node), pass its fields through coercion nodes as
required, and then reassemble them in a toplevel RowExpr.
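
For reference, the user-level shape of the problem is a function like
the following - the type and names are invented - where the row
expression's column types don't exactly match the declared result type:

CREATE TYPE two_cols AS (a int, b numeric);

CREATE FUNCTION make_two_cols() RETURNS two_cols
LANGUAGE plpgsql AS $$
BEGIN
    -- ROW(1, 2) is (int, int); whether the second column becomes
    -- numeric via a cast, via I/O conversion, or is rejected is
    -- exactly the question at hand
    RETURN ROW(1, 2);
END;
$$;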

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Setting visibility map in VACUUM's second phase

2012-12-06 Thread Robert Haas
On Thu, Dec 6, 2012 at 1:01 PM, Tom Lane  wrote:
> I think taking a second whack at setting the visibility bit is a fine
> idea, but let's drop all the rest of this premature optimization.

+1.

If there's any optimization needed here, we should try to do it by
remembering relevant details from the first vacuum pass in
backend-private memory, rather than by changing the on-disk format.

One other thought: I'm wondering if we shouldn't try to push the work
of setting the all-visible bit into heap_page_prune().  That would
allow HOT pruning to set the bit.  For example, consider an
all-visible page.  A tuple is HOT-updated and the page becomes
not-all-visible.  Now the page is pruned, removing the old tuple and
changing the line pointer to a redirect.  Presto, page is all-visible
again.

Also, it seems to me that heap_page_prune() is already figuring out
most of the stuff we need to know in order to decide whether to set
PD_ALL_VISIBLE.  The logic looks quite different from what is
happening in the vacuum code, because the vacuum code iterates over
all of the line pointers while the pruning code is walking HOT chains.
 But it seems to me that a page can't be all-visible unless there are
no dead line pointers and no HOT chains of length != 1, and
heap_prune_chain() does manage to call HeapTupleSatisfiesVacuum() for
every tuple, so the raw information seems like it is available without
any additional CLOG lookups.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-12-06 Thread Jan Wieck
Kevin and Robert are well aware of most of the below. I just want to put 
this out here so other people, who haven't followed the discussion too 
closely, may chime in.


Some details on the problem:

First of all, there is a minimum number of 1000 pages that the vacuum 
scan must detect as possibly being all empty at the end of a relation. 
Without at least 8MB of possible free space at the end, the code never 
calls lazy_truncate_heap(). This means we don't have to worry about tiny 
relations at all. Any relation that stays under 8MB of turnover between 
autovacuum VACUUM runs can never get into this situation.


Relations that have higher turnover than that, but at random places or 
with a high percentage of rather static rows, don't fall into the 
problem category either. They may never accumulate that much "contiguous 
free space at the end". The turnover will be reusing free space all over 
the place. So again, lazy_truncate_heap() won't be called ever.


Relations that eventually build up more than 8MB of free space at the 
end aren't automatically a problem. The autovacuum VACUUM scan just 
scanned those pages at the end, which means that the safety scan for 
truncate, done under exclusive lock, is checking exactly those pages at 
the end and most likely they are still in memory. The truncate safety 
scan will be fast due to a 99+% buffer cache hit rate.


The only actual problem case (I have found so far) are rolling window 
tables of significant size, that can bloat multiple times their normal 
size every now and then. This is indeed a rare corner case and I have no 
idea how many users may (unknowingly) be suffering from it.


This rare corner case triggers lazy_truncate_heap() with a significant 
amount of free space to truncate. The table bloats, then all the bloat 
is deleted and the periodic 100% turnover will guarantee that all "live" 
tuples will shortly after circulate in lower block numbers again, with 
gigabytes of empty space at the end.


This by itself still isn't a problem. The existing code may do the job 
just fine "unless" there is "frequent" access to that very table. Only 
with this special combination of circumstances do we actually have a problem.


Only now, with a significant amount of free space at the end and 
frequent access to the table, the truncate safety scan takes long enough 
and has to actually read pages from disk to interfere with client 
transactions.


At this point, the truncate safety scan may have to be interrupted to 
let the frequent other traffic go through. This is what we accomplish 
with the autovacuum_truncate_lock_check interval, where we voluntarily 
release the lock whenever someone else needs it. I agree with Kevin that 
a 20ms check interval is reasonable because the code to check this is 
even less expensive than releasing the exclusive lock we're holding.


At the same time, completely giving up and relying on the autovacuum 
launcher to restart another worker isn't as free as it looks 
either. The next autovacuum worker will have to do the VACUUM scan 
first, before getting to the truncate phase. We cannot just skip blindly 
to the truncate code. If the truncate is repeatedly aborted, the table 
would deteriorate and accumulate dead tuples again. The removal of dead 
tuples and their index tuples has priority.


As said earlier in the discussion, the VACUUM scan will skip pages that 
are marked as completely visible. So the scan won't physically read the 
majority of the empty pages at the end of the table over and over. But 
it will at least scan all pages that have been modified since the last 
VACUUM run.


To me this means that we want to be more generous to the truncate code 
about acquiring the exclusive lock. In my tests, I've seen that a 
rolling window table with a "live" set of just 10 MB or so, but empty 
space of 3 GB, can still have a 2 minute VACUUM scan time. Throwing that 
work away because we can't acquire the exclusive lock within 2 seconds 
is a waste of effort.


Something between 2 and 60 seconds sounds more reasonable to me.


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... NOREWRITE option

2012-12-06 Thread Andres Freund
On 2012-12-06 18:21:09 +, Simon Riggs wrote:
> On 6 December 2012 00:46, Robert Haas  wrote:
> > On Wed, Dec 5, 2012 at 6:45 PM, Simon Riggs  wrote:
> >> Yes, but it is also the trigger writers problem.
> >
> > Maybe to some degree.  I don't think that a server crash or something
> > like a block-read error is ever tolerable though, no matter how silly
> > the user is with their event trigger logic.  If we go down that road
> > it will be impossible to know whether errors that are currently
> > reliable indicators of software or hardware problems are in fact
> > caused by event triggers.   Of course, if an event trigger causes the
> > system to error out in some softer way, that's perfectly fine...
>
> How are event triggers more dangerous than normal triggers/functions?

Normal triggers aren't run when the catalog is in an in-between state
because they aren't run while catalog modifications are taking place.

Consider a trigger running before CREATE INDEX CONCURRENTLY (which
relies on being the first thing to do database access in a transaction)
that does database access.

Or a trigger running during a table rewrite that inserts into the
intermediary table (pg_rewrite_xxx or whatever they are named). That
could possibly lead to a crash because the pg_class entry of that table
is suddenly gone.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to check whether the row was modified by this transaction before?

2012-12-06 Thread Tom Lane
Alvaro Herrera  writes:
> I think we should be setting the initial epoch to something other than
> zero.  That way, some quick testing would have revealed this problem
> immediately.

Yeah, having initdb start the epoch at 1 doesn't seem unreasonable.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... NOREWRITE option

2012-12-06 Thread Simon Riggs
On 6 December 2012 00:46, Robert Haas  wrote:
> On Wed, Dec 5, 2012 at 6:45 PM, Simon Riggs  wrote:
>> Yes, but it is also the trigger writers problem.
>
> Maybe to some degree.  I don't think that a server crash or something
> like a block-read error is ever tolerable though, no matter how silly
> the user is with their event trigger logic.  If we go down that road
> it will be impossible to know whether errors that are currently
> reliable indicators of software or hardware problems are in fact
> caused by event triggers.   Of course, if an event trigger causes the
> system to error out in some softer way, that's perfectly fine...

How are event triggers more dangerous than normal triggers/functions?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] in-catalog Extension Scripts and Control parameters (templates?)

2012-12-06 Thread Tom Lane
Dimitri Fontaine  writes:
> Tom Lane  writes:
>> I think a separate kind of "extension template" object would make a lot
>> more sense.

> I'm on board now. We still have some questions to answer, and here's a
> worked out design proposal for implementing my understanding of your
> "extension's template" idea:

>  - Extension Scripts are now stored in the catalogs, right?

Only for these new-style thingies.  I am not suggesting breaking the
existing file-based implementation, only offering a parallel
catalog-based implementation too.  We'd have to think about what to do
for name collisions --- probably having the catalog entry take
precedence is okay, but is there an argument for something else?

> [ need separate catalogs for install scripts and update scripts ]

Check.

>pg_extension_control(extension, version, default_version,
> default_full_version, module_pathname,
> relocatable, superuser, schema, requires)

Given that the feature is going to be restricted to pure-SQL extensions,
I'm pretty sure we can do without module_pathname, and maybe some other
things.

>  - The naming "TEMPLATE" appears to me to be too much of a generic
>naming for our usage here, so I'm not sure about it yet.

Yeah, possibly, but I don't have a better idea yet.  I don't like
either PARAMETERS or SCRIPT --- for one thing, those don't convey the
idea that this is an object in its own right rather than an attribute of
an extension.

>Oh actually TEMPLATE is already a keyword thanks to text search,

Actually, given the text search precedent, I'm not sure why you're so
against TEMPLATE.

>That would mean that ALTER EXTENSION could create objects in other
>catalogs for an extension that does not exist itself yet, but is now
>known to be available (select * from pg_available_extensions()).

Man, that is just horrid.  It brings back exactly the confusion we're
trying to eliminate by using the "template" terminology.  We don't want
it to look like manipulating a template has anything to do with altering
an extension of the same name (which might or might not even be
installed).

> The $2.56 question being what would be the pg_dump policy of the
> "extension templates" objects?

The ones that are catalog objects, not file objects, should be dumped
I think.

> Now, my understanding is that CREATE EXTENSION would check for templates
> being already available in the catalogs and failing to find them would
> have to do the extra steps of creating them from disk files as a
> preparatory step, right? (backward compatibility requirement)

Wrong.  There is no reason whatsoever to load file-based stuff into
catalogs.  That just adds complication and overhead to cases that work
already, and will break update cases (what happens when a package update
changes the files?).

> I don't think we could easily match a .so with an extension's template
> so I won't be proposing that, but we could quite easily match them now
> with extensions, because we're going to have to LOAD the module while
> creating_extension = true.

One more time: this mode has nothing to do with extensions that involve
a .so.  It's for extensions that can be represented purely as scripts,
and thus are self-contained in the proposed catalog entries.

regards, tom lane




Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)

2012-12-06 Thread Simon Riggs
On 6 December 2012 17:02, Stephen Frost  wrote:
> * Simon Riggs (si...@2ndquadrant.com) wrote:
>> It's not a bug. Requesting a useful, but not critical optimisation is
>> just a hint. The preconditions are not easy to understand, so I see no
>> reason to punish people that misunderstand, or cause programs to fail
>> in ways that need detailed understanding to make them work again.
>
> I tend to agree with Andres on this one.  This feels a bit like
> accepting a command but then not actually following-through on it
> if it turns out we can't actually do it.  If it's truly an optimization
> (and I suspect my other email/question might provide insight into that),
> then it should be something we can 'just do' without needing to be asked
> to do it, along the same lines of not WAL'ing when the appropriate
> conditions are met (table created in this transaction, etc, etc).

That depends whether its a command or a do-if-possible hint. Its
documented as the latter.

Similar to the way VACUUM tries to truncate a relation, but gives up
if it can't. And on an even closer example, VACUUM FREEZE itself,
which doesn't guarantee that all rows are frozen at the end of it...
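
As a concrete example of the precondition (the table must be created or
truncated in the same transaction that loads it) -- names here are placeholders:

    BEGIN;
    CREATE TABLE sales_2012 (LIKE sales INCLUDING ALL);
    COPY sales_2012 FROM '/tmp/sales_2012.csv' WITH (FORMAT csv, FREEZE);
    COMMIT;

If sales_2012 had instead been created in some earlier transaction, the COPY
still succeeds; the rows simply are not frozen, which is the do-if-possible
behaviour described above.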

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] How to check whether the row was modified by this transaction before?

2012-12-06 Thread Alvaro Herrera
Tom Lane wrote:
> Vlad Arkhipov  writes:
> > In a BEFORE UPDATE trigger I need to know whether the row was previously 
> > modified by this transaction. Is it safe to use xmin and txid_current() 
> > for this purpose (xmin is 32-bit txid type but txid_current() returns 
> > 64-bit bigint).
> 
> >IF OLD.xmin = txid_current() THEN
> 
> Comparing to txid_current() mod 2^32 would probably work,

I think we should be setting the initial epoch to something other than
zero.  That way, some quick testing would have revealed this problem
immediately.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Setting visibility map in VACUUM's second phase

2012-12-06 Thread Tom Lane
Pavan Deolasee  writes:
> So the idea that the patch implements is this. When we scan pages in
> the first phase of vacuum, if we find a page that has all-visible
> tuples but also has one or more dead tuples that we know the second
> phase of vacuum will remove, we mark such page with a special flag
> called PD_ALL_VISIBLE_OR_DEAD (I'm not married to the name, so feel
> free to suggest a better name). What this flag tells the second phase
> of vacuum is to mark the page all-visible and set the VM bit unless
> some intermediate action on the page again inserts a not-all-visible
> tuple. If such an action happens, the PD_ALL_VISIBLE_OR_DEAD flag will
> be cleared by that backend and the second phase of vacuum will not set
> all-visible flag and VM bit.

This seems overly complicated, as well as a poor use of a precious page
header flag bit.  Why not just recheck all-visibility status after
performing the deletions?  Keep in mind that even if there were some
not-all-visible tuples when we were on the page the first time, they
might have become all-visible by the time we return.  So this is going
out of its way to produce a less-than-optimal result.

> The patch itself is quite small and works as intended. One thing that
> demanded special handling is the fact that visibilitymap_set()
> requires a cutoff xid to tell the Hot Standby to resolve conflicts
> appropriately. We could have scanned the page again during the second
> phase to compute the cutoff xid, but I thought we can overload the
> pd_prune_xid field in the page header to store this information which
> is already computed in the first phase.

And this is *far* too cute.  The use of that field is complicated enough
already without having its meaning vary depending on randomly-designed
flag bits.  And it's by no means obvious that using a by-now-old value
when we return to the page is a good idea anyway (or even correct).

I think taking a second whack at setting the visibility bit is a fine
idea, but let's drop all the rest of this premature optimization.

regards, tom lane




Re: [HACKERS] Functional dependency in GROUP BY through JOINs

2012-12-06 Thread Kevin Grittner
Tom Lane wrote:

> In the case being presented here, it's not apparent to me that
> there's any advantage to be had at all.

The OP reported a different plan which was twice as fast, although
showing EXPLAIN ANALYZE results for both would be nice confirmation
of that.

> You still need to aggregate over the rows joining to each
> uniquely-keyed row.

Yes.

> So how exactly are you going to "push down the GROUP BY", and
> where does the savings come from?

There are 100 million rows in bigsalestable that fall into 450
groups. The question is whether you look up related data in the
products table once per row or once per group. Apparently those
extra 999550 lookups take enough time to matter.

-Kevin




Re: [HACKERS] Functional dependency in GROUP BY through JOINs

2012-12-06 Thread Simon Riggs
On 6 December 2012 17:21, Tom Lane  wrote:
> Simon Riggs  writes:
>> On 5 December 2012 23:37, David Rowley  wrote:
>>> Though this plan might not be quite as optimal as it could be as it performs
>>> the grouping after the join.
>
>> PostgreSQL always calculates aggregation as the last step.
>
>> It's a well known optimisation to push-down GROUP BY clauses to the
>> lowest level, but we don't do that, yet.
>
>> You're right that it can make a massive difference to many queries.
>
> In the case being presented here, it's not apparent to me that there's
> any advantage to be had at all.  You still need to aggregate over the
> rows joining to each uniquely-keyed row.  So how exactly are you going
> to "push down the GROUP BY", and where does the savings come from?

David presents SQL that shows how that is possible.

In terms of operators, after push down we aggregate 1 million rows and
then join 450. Which seems cheaper than join 1 million rows and
aggregate 1 million. So we're passing nearly 1 million fewer rows into
the join.
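
That is, the pushed-down plan corresponds to the rewrite from David's mail,
aggregating bigsalestable before joining to products:

    SELECT p.product_code, s.quantity
    FROM products AS p
    INNER JOIN (SELECT productid, SUM(quantity) AS quantity
                FROM bigsalestable GROUP BY productid) AS s
      ON p.productid = s.productid;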

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)

2012-12-06 Thread Jeff Davis
On Thu, 2012-12-06 at 11:55 -0500, Stephen Frost wrote:
> * Robert Haas (robertmh...@gmail.com) wrote:
> When I last recall this discussion (likely in some bar in Europe), the
> problem was also that an independent session would be able to:
> 
> a) see that the table exists (due to SnapshotNow being used for lookups)
> b) see rows in the table
> 
> Part 'a' is something which I think would be good to fix (but hard),
> and it sounds like this patch might make part 'b' happen, which I think
> makes the part 'a' problem worse.  I'm guessing this isn't a problem for
> the TRUNCATE case because the second session would still be looking at
> the pre-TRUNCATE files on disk, right?  Is that reference to exactly
> which files on disk to look at being used to address the CREATE problem
> too..?

That isn't a problem, because the other session won't see the tuple in
pg_class until the creating transaction commits, at which point the rows
have committed, too (because this would only kick in when the rows are
loaded in the same transaction as the CREATE).

So, yes, it's like TRUNCATE in that regard.

Regards,
Jeff Davis





Re: [HACKERS] Fix for pg_upgrade status display

2012-12-06 Thread Alvaro Herrera
Robert Haas escribió:
> On Wed, Dec 5, 2012 at 10:04 PM, Bruce Momjian  wrote:
> > Pg_upgrade displays file names during copy and database names during
> > dump/restore.  Andrew Dunstan identified three bugs:
> >
> > *  long file names were being truncated to 60 _leading_ characters, which
> > often do not change for long file names
> >
> > *  file names were truncated to 60 characters in log files
> >
> > *  carriage returns were being output to log files
> >
> > The attached patch fixes these --- it prints 60 _trailing_ characters to
> > the status display, and full path names without carriage returns to log
> > files.
> 
> This might be a dumb question, but why limit it to 60 characters at
> all instead of, say, MAXPGPATH?

I think this should be keyed off the terminal width, actually, no?  The
whole point of this is to overwrite the same line over and over, right?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] How to check whether the row was modified by this transaction before?

2012-12-06 Thread Tom Lane
Vlad Arkhipov  writes:
> In a BEFORE UPDATE trigger I need to know whether the row was previously 
> modified by this transaction. Is it safe to use xmin and txid_current() 
> for this purpose (xmin is 32-bit txid type but txid_current() returns 
> 64-bit bigint).

>IF OLD.xmin = txid_current() THEN

Comparing to txid_current() mod 2^32 would probably work, but note this
will not think that subtransactions or parent transactions are "this
transaction", so any use of savepoints or plpgsql exception blocks is
likely to cause headaches.  Why do you think you need to know this?
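
For illustration, a rough (untested) sketch of that comparison -- the xid
column has no direct cast to bigint, so it goes through text:

    CREATE OR REPLACE FUNCTION test_trigger()
    RETURNS trigger AS $$
    BEGIN
      -- txid_current() = epoch * 2^32 + xid, so compare against its low 32 bits.
      -- Caveat: rows written in a subtransaction (savepoint, plpgsql EXCEPTION
      -- block) carry the subtransaction's xid and will not match here.
      IF OLD.xmin::text::bigint = txid_current() % 4294967296 THEN
        NULL;  -- row was last written by this top-level transaction
      ELSE
        NULL;  -- row was last written by some other transaction
      END IF;
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;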

regards, tom lane




Re: [HACKERS] How to check whether the row was modified by this transaction before?

2012-12-06 Thread Robert Haas
On Thu, Dec 6, 2012 at 3:58 AM, Vlad Arkhipov  wrote:
> In a BEFORE UPDATE trigger I need to know whether the row was previously
> modified by this transaction. Is it safe to use xmin and txid_current() for
> this purpose (xmin is 32-bit txid type but txid_current() returns 64-bit
> bigint).
>
> CREATE FUNCTION test_trigger()
> RETURNS TRIGGER AS $$
> BEGIN
>   IF OLD.xmin = txid_current() THEN
> -- Do something.
>   ELSE
> -- Do something else.
>   END IF;
> END;
> $$ LANGUAGE plpgsql;

txid_current() will return a different value from xmin after the XID
space has wrapped around at least once; also, you might need to
consider subtransactions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] why can't plpgsql return a row-expression?

2012-12-06 Thread Robert Haas
On Thu, Dec 6, 2012 at 12:31 AM, Pavel Stehule  wrote:
> 2012/12/5 Tom Lane :
>> Asif Rehman  writes:
>>> Here is the updated patch. I overlooked the loop, checking to free the
>>> conversions map. Here are the results now.
>>
>> I looked at this patch briefly.  It seems to me to be completely
>> schizophrenic about the coercion rules.  This bit:
>>
>> -   if (atttypid != att->atttypid ||
>> -   (atttypmod != att->atttypmod && atttypmod >= 
>> 0))
>> +   if ((atttypmod != att->atttypmod && atttypmod >= 0) 
>> ||
>> +   !can_coerce_type(1, &atttypid, 
>> &(att->atttypid), COERCE_IMPLICIT_CAST))
>>
>> says that we'll allow column types to differ if there is an implicit
>> cast from the source to the target (or at least I think that's what's
>> intended, although it's got the source and target backwards).  Fine, but
>> then why don't we use the cast machinery to do the conversion?  This
>> is taking plpgsql's cowboy let's-coerce-via-IO-no-matter-whether-that's-
>> right-or-not approach and sticking it into a core part of the system.
>> There's no guarantee at all that applying typoutput then typinput
>> will match the conversion semantics you get from an actual cast, and
>> in many practical cases such as int4 to int8 it'll be drastically less
>> efficient anyway.  It would make more sense to do something similar to
>> coerce_record_to_complex(), that is modify the expression tree to
>> coerce the columns using the regular cast machinery.
>>
>> Also, the typmod part of the test seems completely broken.  For one
>> thing, comparing typmods isn't sane if the types themselves aren't
>> the same.  And it's quite unclear to me why we'd want to have an
>> anything-goes policy for type coercion, or even an implicit-casts-only
>> policy, but then insist that the typmods match exactly.  This coding
>> will allow varchar(42) to text, but not varchar(42) to varchar(43)
>> ... where's the sense in that?
>>
>> The patch also seems to go a great deal further than what was asked for
>> originally, or indeed is mentioned in the documentation patch, namely
>> fixing the restriction on RETURN to allow composite-typed expressions.
>> Specifically it's changing the code that accepts composite input
>> arguments.  Do we actually want that?  If so, shouldn't it be
>> documented?
>>
>> I'm inclined to suggest that we drop all the coercion stuff and just
>> do what Robert actually asked for originally, which was the mere
>> ability to return a composite value as long as it matched the function's
>> result type.  I'm not convinced that we want automatic implicit type
>> coercions here.  In any case I'm very much against sticking such a thing
>> into general-purpose support code like tupconvert.c.  That will create a
>> strong likelihood that plpgsql's poorly-designed coercion semantics will
>> leak into other aspects of the system.
>
> I think that without some change to the coercion behaviour this patch is not
> too useful, because a very simple test fails:
>
> create type foo as (a int, b text);
>
> create or replace function foo_func()
> returns foo as $$
> begin
>   ...
>   return (10, 'hello');
>
> end;
>
> but we can limit the implicit coercion in tupconvert via a new parameter -
> because we only want to forward the plpgsql behaviour from this direction.
> Then, when this parameter - maybe "allowIOCoercion" - is false,
> tupconvert will behave the same as before.

It would be nice to make that work, but it could be left for a
separate patch, I suppose.  I'm OK with Tom's proposal to go ahead and
commit the core mechanic first, if doing more than that is
controversial.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-12-06 Thread Robert Haas
On Wed, Dec 5, 2012 at 10:16 PM, Jan Wieck  wrote:
> On 12/5/2012 2:00 PM, Robert Haas wrote:
>>
>> Many it'd be sensible to relate the retry time to the time spend
>> vacuuming the table.  Say, if the amount of time spent retrying
>> exceeds 10% of the time spend vacuuming the table, with a minimum of
>> 1s and a maximum of 1min, give up.  That way, big tables will get a
>> little more leeway than small tables, which is probably appropriate.
>
> That sort of "dynamic" approach would indeed be interesting. But I fear that
> it is going to be complex at best. The amount of time spent in scanning
> heavily depends on the visibility map. The initial vacuum scan of a table
> can take hours or more, but it does update the visibility map even if the
> vacuum itself is aborted later. The next vacuum may scan that table in
> almost no time at all, because it skips all blocks that are marked "all
> visible".

Well, if that's true, then there's little reason to worry about giving
up quickly, because the next autovacuum a minute later won't consume
many resources.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Fix for pg_upgrade status display

2012-12-06 Thread Robert Haas
On Wed, Dec 5, 2012 at 10:04 PM, Bruce Momjian  wrote:
> Pg_upgrade displays file names during copy and database names during
> dump/restore.  Andrew Dunstan identified three bugs:
>
> *  long file names were being truncated to 60 _leading_ characters, which
> often do not change for long file names
>
> *  file names were truncated to 60 characters in log files
>
> *  carriage returns were being output to log files
>
> The attached patch fixes these --- it prints 60 _trailing_ characters to
> the status display, and full path names without carriage returns to log
> files.

This might be a dumb question, but why limit it to 60 characters at
all instead of, say, MAXPGPATH?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Serious problem: media recovery fails after system or PostgreSQL crash

2012-12-06 Thread Tom Lane
"MauMau"  writes:
> I'm using PostgreSQL 9.1.6 on Linux.  I encountered a serious problem that 
> media recovery failed showing the following message:
> FATAL:  archive file "000100800028" has wrong size: 7340032 
> instead of 16777216

Well, that's unfortunate, but it's not clear that automatic recovery is
possible.  The only way out of it would be if an undamaged copy of the
segment was in pg_xlog/ ... but if I recall the logic correctly, we'd
not even be trying to fetch from the archive if we had a local copy.

> Therefore, I think postgres must continue recovery by fetching files from 
> pg_xlog/ when it encounters a partially filled archive files.  In addition, 
> it may be necessary to remove the partially filled archived files, because 
> they might prevent media recovery in the future (is this true?).  I mean we 
> need the following fix.  What do you think?

I think having PG automatically destroy archive files is bordering on
insane.  It might be reasonable for the archiving process to do
something like this, if it has a full-size copy of the file available
to replace the damaged copy with.  But otherwise you're just throwing
away what's probably the only copy of useful data.

> I've heard that the next minor release is scheduled during this weekend.  I 
> really wish this problem will be fixed in that release.  If you wish, I'll 
> post the patch tomorrow or the next day.  Could you include the fix in the 
> weekend release?

Even if this were a good and uncontroversial idea, I'm afraid you're
several days too late.  The release is today, not next week.

regards, tom lane




Re: [HACKERS] Functional dependency in GROUP BY through JOINs

2012-12-06 Thread Tom Lane
Simon Riggs  writes:
> On 5 December 2012 23:37, David Rowley  wrote:
>> Though this plan might not be quite as optimal as it could be as it performs
>> the grouping after the join.

> PostgreSQL always calculates aggregation as the last step.

> It's a well known optimisation to push-down GROUP BY clauses to the
> lowest level, but we don't do that, yet.

> You're right that it can make a massive difference to many queries.

In the case being presented here, it's not apparent to me that there's
any advantage to be had at all.  You still need to aggregate over the
rows joining to each uniquely-keyed row.  So how exactly are you going
to "push down the GROUP BY", and where does the savings come from?

regards, tom lane




Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)

2012-12-06 Thread Stephen Frost
* Simon Riggs (si...@2ndquadrant.com) wrote:
> It's not a bug. Requesting a useful, but not critical optimisation is
> just a hint. The preconditions are not easy to understand, so I see no
> reason to punish people that misunderstand, or cause programs to fail
> in ways that need detailed understanding to make them work again.

I tend to agree with Andres on this one.  This feels a bit like
accepting a command but then not actually following-through on it
if it turns out we can't actually do it.  If it's truly an optimization
(and I suspect my other email/question might provide insight into that),
then it should be something we can 'just do' without needing to be asked
to do it, along the same lines of not WAL'ing when the appropriate
conditions are met (table created in this transaction, etc, etc).

Thanks,

Stephen




Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)

2012-12-06 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> I haven't looked at the committed patch - which seemed a bit

Disclaimer- neither have I, but..

When I last recall this discussion (likely in some bar in Europe), the
problem was also that an independent session would be able to:

a) see that the table exists (due to SnapshotNow being used for lookups)
b) see rows in the table

Part 'a' is something which I think would be good to fix (but hard),
and it sounds like this patch might make part 'b' happen, which I think
makes the part 'a' problem worse.  I'm guessing this isn't a problem for
the TRUNCATE case because the second session would still be looking at
the pre-TRUNCATE files on disk, right?  Is that reference to exactly
which files on disk to look at being used to address the CREATE problem
too..?

That said, I love the idea of having a way to drop tuples in w/o having
to go back and rewrite them three times..

Thanks,

Stephen




Re: [HACKERS] Serious problem: media recovery fails after system or PostgreSQL crash

2012-12-06 Thread Kevin Grittner
MauMau wrote:

> [Problem]
> I'm using PostgreSQL 9.1.6 on Linux. I encountered a serious
> problem that media recovery failed showing the following message:
> 
> FATAL: archive file "000100800028" has wrong size:
> 7340032 instead of 16777216
> 
> I'm using normal cp command to archive WAL files. That is:
> 
>  archive_command = '/path/to/my_script.sh "%p" "/backup/archive_log/%f"'
> 
> <>
> --
> #!/bin/sh
> some processing...
> cp "$1" "$2"
> other processing...
> --
> 
> 
> The media recovery was triggered by power failure. The disk drive
> that stored $PGDATA failed after a power failure. So I replaced
> the failed disk, and performed media recovery by creating
> recovery.conf and running pg_ctl start. However, pg_ctl failed
> with the above error message.

If you are attempting a PITR-style recovery and you want to include
WAL entries from the partially-copied file, pad a copy of it with
NUL bytes to the expected length.
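
For example, working on a copy of the damaged segment (never the archived
original), and assuming GNU coreutils' truncate is available (dd can do the
same job):

    cp /backup/archive_log/000100800028 /tmp/000100800028
    truncate -s 16777216 /tmp/000100800028   # pad to 16MB with NUL bytes
    # then have restore_command fetch the padded copy for the PITR attempt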

-Kevin




Re: [HACKERS] Functional dependency in GROUP BY through JOINs

2012-12-06 Thread Simon Riggs
On 5 December 2012 23:37, David Rowley  wrote:

> Though this plan might not be quite as optimal as it could be as it performs
> the grouping after the join.

PostgreSQL always calculates aggregation as the last step.

It's a well known optimisation to push-down GROUP BY clauses to the
lowest level, but we don't do that, yet.

You're right that it can make a massive difference to many queries.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH 03/14] Add simple xlogdump tool

2012-12-06 Thread Andres Freund
Hi,

I tried to address most (all?) your comments in the version from
http://archives.postgresql.org/message-id/20121204175212.GB12055%40awork2.anarazel.de
.

On 2012-11-15 11:31:55 -0500, Peter Eisentraut wrote:
> > +xlogdump: $(OBJS) $(shell find ../../backend ../../timezone -name 
> > objfiles.txt|xargs cat|tr -s " " "\012"|grep -v /main.o|sed 
> > 's/^/..\/..\/..\//')
> > +   $(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
>
> This looks pretty evil, and there is no documentation about what it is
> supposed to do.
>
> Windows build support needs some thought.

Ok, since Alvaro made it possible the build now only has rules like:

xlogreader.c: % : $(top_srcdir)/src/backend/access/transam/%
  rm -f $@ && $(LN_S) $< .

clogdesc.c: % : $(top_srcdir)/src/backend/access/rmgrdesc/%
  rm -f $@ && $(LN_S) $< .
and
OBJS = \
 clogdesc.o dbasedesc.o gindesc.o gistdesc.o hashdesc.o heapdesc.o \
 mxactdesc.o nbtdesc.o relmapdesc.o seqdesc.o smgrdesc.o spgdesc.o \
 standbydesc.o tblspcdesc.o xactdesc.o xlogdesc.o assert.o \
 $(WIN32RES) \
 pg_xlogdump.o pqexpbuf_strinfo.o compat.o tables.o xlogreader.o \

pg_xlogdump: $(OBJS) | submake-libpq submake-libpgport
  $(CC) $(CFLAGS) $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) $(libpq_pgport) -o $@$(X)

That's easier to integrate into the Windows build?

> > +static void
> > +usage(void)
> > +{
> > +   printf(_("%s reads/writes postgres transaction logs for 
> > debugging.\n\n"),
> > +  progname);
> > +   printf(_("Usage:\n"));
> > +   printf(_("  %s [OPTION]...\n"), progname);
> > +   printf(_("\nOptions:\n"));
> > +   printf(_("  -v, --version  output version information, then 
> > exit\n"));
> > +   printf(_("  -h, --help show this help, then exit\n"));
> > +   printf(_("  -s, --startfrom where recptr onwards to 
> > read\n"));
> > +   printf(_("  -e, --end  up to which recptr to read\n"));
> > +   printf(_("  -t, --timeline which timeline do we want to 
> > read\n"));
> > +   printf(_("  -i, --inpath   from where do we want to read? 
> > cwd/pg_xlog is the default\n"));
> > +   printf(_("  -o, --output   where to write [start, end]\n"));
> > +   printf(_("  -f, --file wal file to parse\n"));
> > +}
>
> Options list should be in alphabetic order (or some other less random
> order).  Most of these descriptions are not very intelligible (at least
> without additional documentation).

I tried to improve the help, its now:

pg_xlogdump: reads/writes postgres transaction logs for debugging.

Usage:
  pg_xlogdump [OPTION]...

Options:
  -b, --bkp-details  output detailed information about backup blocks
  -e, --end RECPTR   read wal up to RECPTR
  -f, --file FILE    wal file to parse, cannot be specified together with -p
  -h, --help         show this help, then exit
  -p, --path PATH    from where do we want to read? cwd/pg_xlog is the default
  -s, --start RECPTR read wal in directory indicated by -p starting at RECPTR
  -t, --timeline TLI which timeline do we want to read, defaults to 1
  -v, --version  output version information, then exit

I wonder whether it would make sense to split the help into different
sections? It seems likely we will gain some more options...

> no nls.mk

Do I need to do anything for that besides:

# src/bin/pg_xlogdump/nls.mk
CATALOG_NAME = pg_xlogdump
AVAIL_LANGUAGES  =
GETTEXT_FILES= pg_xlogdump.c

?

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Enabling Checksums

2012-12-06 Thread Kevin Grittner
Robert Haas wrote:
> Jeff Davis  wrote:
>> Or, I could write up a test framework in ruby or python, using
>> the appropriate pg driver, and some not-so-portable shell
>> commands to start and stop the server. Then, I can publish that
>> on this list, and that would at least make it easier to test
>> semi-manually and give greater confidence in pre-commit
>> revisions.
> 
> That latter approach is similar to what happened with SSI's
> isolation tester. It started out in Python, and then Heikki
> rewrote it in C.
> If Python/Ruby code is massively simpler to write than the C
> code, that might be a good way to start out. It'll be an aid to
> reviewers even if neither it nor any descendent gets committed.
> 
> Frankly, I think some automated testing harness (written in C or
> Perl) that could do fault-injection tests as part of the
> buildfarm would be amazingly awesome. I'm drooling just thinking
> about it. But I guess that's getting ahead of myself.

There may be room for both.

My experience was that the dtester tool from Markus made it
relatively easy for me to hack up new tests which gave detailed
information about which permutations were behaving as desired,
which were known not to be covered, and which had regressions.
That speed of adding new tests and detail about improvements or
regressions allowed faster development than would have been
possible with the isolation tester that Heikki wrote in C.

On the other hand, dtester requires python (in fact, I think it
requires Python version 2.x where x is 5 or greater), a requirement
which I don't think we want to add for builds. It wasn't very
compatible with the normal make check environment, either in how it
was run or in its output. And it was much slower than the isolation
test framework -- like by about an order of magnitude.

So for a completed product on which you want to test for
regressions, the isolation tester is much better. For a development
effort on the scale of SSI, I would want to have dtester or
something very like it available.

Neither one quite handles tests for all the types of concurrency
conditions that one might want. I had some idea how to add some
additional useful cases to dtester, and it didn't look outrageously
hard. I haven't really looked at how to do that in the isolation
tester, so I don't know how hard it would be there.

-Kevin




Re: [HACKERS] strange isolation test buildfarm failure on guaibasaurus

2012-12-06 Thread Tom Lane
Christian Ullrich  writes:
> * Tom Lane wrote:
>> We really need to scare up another buildfarm member to run with
>> -DCLOBBER_CACHE_ALWAYS, now that jaguar has stopped doing so.

> I would be happy to do that on jaguarundi, in exchange for dialing down 
> the build frequency from hourly to something a bit less ambitious. That 
> machine is mostly idle anyway.

Sure, once or twice a day would be fine.  (I think jaguar was only doing
it once a day.)

regards, tom lane




Re: [HACKERS] Functional dependency in GROUP BY through JOINs

2012-12-06 Thread Kevin Grittner
David Rowley wrote:

> If we wanted to see the sales per product we could write
> something like this:

> SELECT p.product_code,SUM(s.quantity)
> FROM products p
> INNER JOIN bigsalestable s ON p.productid = s.productid
> GROUP BY p.product_code;

> Though this plan might not be quite as optimal as it could be as
> it performs the grouping after the join.

> Of course the query could have been written in the first place
> as:

> SELECT p.product_code,s.quantity
> FROM products AS p
> INNER JOIN (SELECT productid,SUM(quantity) AS quantity
>             FROM bigsalestable GROUP BY productid) AS s
>   ON p.productid = s.productid;

> And that would have given us a more efficient plan.

> Of course, for these actual plans to be equivalent there would
> naturally have to be a unique index on product_code in the
> products table.
> 
> I think I'm right in thinking that if a unique index exists to
> match the group by clause, and the join condition is equality
> (probably using the same operator class as the unique btree
> index?), then the grouping could be pushed up to before the join.

Off-hand, it seems equivalent to me; I don't know how much work it
would be.

Out of curiosity, does the first query's plan change if you run
this instead?:

SELECT s.product_code,SUM(s.quantity)
FROM products p
INNER JOIN bigsalestable s ON p.productid = s.productid
GROUP BY s.product_code;

-Kevin




Re: [HACKERS] WIP patch for hint bit i/o mitigation

2012-12-06 Thread Merlin Moncure
On Thu, Dec 6, 2012 at 3:59 AM, Amit Kapila  wrote:
> On Thursday, November 22, 2012 3:00 AM Greg Smith wrote:
>> On 11/16/12 9:03 AM, Merlin Moncure wrote:
>> > Atri ran some quick n dirty tests to see if there
>> > were any regressions.  He benched a large scan followed by vacuum.  So
>> > far, results are inconclusive so better testing methodologies will
>> > definitely be greatly appreciated.  One of the challenges with working
>> > in this part of the code is that it's quite difficult to make changes
>> > without impacting at least some workloads.
>>
>> I'm adding something to pgbench-tools to test for some types of vacuum
>> regressions that I've seen before.  And the checksum benchmarking I've
>> already signed up for overlaps with this one quite a bit.  I'd suggest
>> reviewers here focus on code quality, correctness, and targeted
>> optimization testing.  I'm working heavily on a larger benchmarking
>> framework that both this and checksums will fit into right now.
>
> We are planning below performance tests for hint-bit I/O mitigation patch:
>
> Test case -1: Select performance in sequential scan and vacuum operation
> with I/O statistics
> Bulk operations happen in multiple transactions.
> 1. Stop the auto-vacuum.
> 2. Create table
> 3. Insert 1 records in one transaction for 1000 times.
> 4A. Use pgbench to select all the records using sequential scan for 5
> min  - 8 threads.
> 4B. Record the IO statistics.
> 5. After completion of test-case check VACUUM performance.
>
> Test case -2:
> Select performance in index scan and vacuum operation with I/O
> statistics
> Same as testcase - 1 change the 4A as follows
> 4A. Use pgbench with -F option to select random records using
> index scan for 5 min  - 8 threads.
>
> Test case -3:
> Select performance in sequential scan and vacuum operation with I/O
> statistics
> When bulk operations happen in one transaction.
> 1. Stop the auto-vacuum.
> 2. Create table
> 3. Insert 10,000,000 times.
> 4A. Use pgbench to select all the records using sequential scan for 5
> min  - 8 threads.
> 4B. Record the IO statistics.
> 5. After completion of test-case check VACUUM performance.
>
> Test case -4:
> Select performance in index scan and vacuum operation with I/O
> statistics
> Same as testcase - 3 change the 4A as follows
> 4A. Use pgbench to select random records using index scan for 5
> min  - 8 threads.
>
> Test case -5:
> Check original pgbench performance & vacuum operation
> 1. For  select only and tcp_b  performance suites with scale factor of
> 75 & 150, threads 8 &16
>
> Test case -6: (Vacuum I/O may increase if vacuum needs to make the page dirty
> only for setting the hint bits.)
> 1. Session - 1 : Open a some long transaction
>
> 2. Session - 2 : Insert some records & commit
> Do the select - all the tuples.
> Checkpoint;
> Vacuum the table
> Checkpoint;
> 3. Record the IO statistics & time taken for Vacuum & 2nd
> Checkpoint.
>
> Test case -7: (This is also to check Vacuum I/O)
> 1. Have replication setup.
> 2. Insert some records & commit
> 3. Vacuum the table
> 4. Upgrade the standby.
> 5. Select the all the tuples on new master
> 6. Vacuum the tbl on new master.
> 6B. Record the IO statistics & time taken for  Vacuum on new
> master.

Thanks for that -- that's fairly comprehensive I'd say.  I'm quite
interested in that benchmarking framework as well.  Do you need help
setting up the scripts?
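
For instance, step 4A of test case 1 presumably boils down to something like
this -- script, table and database names are placeholders:

    echo "SELECT count(*) FROM test_tbl;" > select_all.sql
    pgbench -n -c 8 -j 8 -T 300 -f select_all.sql testdb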

merlin



