Re: [HACKERS] branching for 9.2devel

2011-05-01 Thread Pavan Deolasee
On Tue, Apr 26, 2011 at 2:25 AM, Andrew Dunstan  wrote:

>
>
> On 04/25/2011 04:28 PM, Tom Lane wrote:
>
>> Andrew Dunstan  writes:
>>
>>> On 04/25/2011 03:30 PM, Tom Lane wrote:
>>>
 *Ouch*.  Really?  It's hard to believe that anyone would consider it
 remotely usable for more than toy-sized projects, if you have to list
 all the typedef names on the command line.

>>> Looks like BSD does the same. It's just that we hide it in pgindent:
>>>
>> Oh wow, I never noticed that.  That's going to be a severe problem for
>> the "run it anywhere" goal.  The typedefs list is already close to 32K,
>> and is not going anywhere but up.  There are already platforms on which
>> a shell command line that long will fail, and I think once we break past
>> 32K we might find it failing on even pretty popular ones.
>>
>>
>>
>
>
> Well, my solution would be to replace pgindent with a perl script (among
> other advantages, it would then run everywhere we build, including Windows),
> and filter the typedefs list so that, for each file, we pass only the
> typedef names that actually appear in that file, instead of passing the
> whole list to every file.
>
>
Can we not set up an automatic mechanism where a submitter sends a patch to
some email address, the patch gets applied to the current HEAD, pgindent is
run, and the new patch is sent back to the submitter, who can then submit it
to the hackers for review? If the patch does not apply cleanly, that result
can also be emailed back to the submitter.

Thanks,
Pavan


-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Select For Update and Left Outer Join

2011-05-01 Thread Tom Lane
Patrick Earl  writes:
> On Sun, May 1, 2011 at 9:00 PM, Tom Lane  wrote:
>> Addition of new rows certainly isn't supposed to be prevented by a
>> SELECT FOR UPDATE, but that's not the problem here.  What I *do* expect
>> a SELECT FOR UPDATE to promise is that the rows it did return can't
>> change or be deleted by someone else for the life of my transaction.

> While I don't have access to the SQL standard to check if the
> semantics are documented, the semantics don't seem clear cut.

The specific syntax of SELECT FOR UPDATE isn't in the standard at all
--- the standard considers FOR UPDATE to be a clause you can attach to
DECLARE CURSOR, not a bare SELECT, and also the items that can be
mentioned in FOR UPDATE OF are individual column names not table names.
But ignoring that little detail, as best I can tell the standard only
allows FOR UPDATE to be applied to columns for which the cursor output
is guaranteed to be one-to-one with elements of the underlying table,
ie you could say UPDATE tab SET col =  WHERE CURRENT OF 
and expect that a single well-defined table cell would get updated.
This is certainly not the case for columns in the nullable side of an
outer join, where there might be no existing cell that could be updated.
The question of whether the cell is locked against concurrent updates
isn't something that the spec addresses, so far as I've ever seen; but
it is perfectly clear that there should be something there that could be
updated.  So your proposal doesn't work from that standpoint either.
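For illustration, the cursor-oriented shape the standard describes looks
roughly like this (standard-flavored sketch; the table and column names are
invented, and note that PostgreSQL's FOR UPDATE OF takes table names rather
than the standard's column names, so this exact form is not PostgreSQL
syntax):

    -- SQL-standard style: FOR UPDATE attaches to a cursor, and the items
    -- after OF are column names, not table names.
    BEGIN;
    DECLARE cur CURSOR FOR
        SELECT id, name FROM pet WHERE owner_id = 1
        FOR UPDATE OF name;     -- per the standard: a column, not a table

    FETCH NEXT FROM cur;
    -- Each fetched row maps one-to-one onto a row of "pet", so this is
    -- well-defined: exactly one table cell gets updated.
    UPDATE pet SET name = 'Whiskers' WHERE CURRENT OF cur;
    COMMIT;

The point above is that a null-extended column has no underlying cell that
WHERE CURRENT OF could ever write to, so the standard's model simply has no
place for it.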

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Select For Update and Left Outer Join

2011-05-01 Thread Patrick Earl
On Sun, May 1, 2011 at 9:00 PM, Tom Lane  wrote:
> Addition of new rows certainly isn't supposed to be prevented by a
> SELECT FOR UPDATE, but that's not the problem here.  What I *do* expect
> a SELECT FOR UPDATE to promise is that the rows it did return can't
> change or be deleted by someone else for the life of my transaction.
> This is not possible to promise for null-extended rows unless you
> somehow lock out addition of a matching row on the inside of the join.
> Without that, a row that looked like  when you
> selected it might suddenly start looking like 
> due to someone else's modification.  And after that, since you still
> haven't got a lock on the cat row, the cat fields could keep on
> changing.
>
> I'm prepared to believe that there are some applications where that
> can't happen due to other interlocking, or doesn't matter to the
> application, but SELECT FOR UPDATE really can't assume that.  I think
> what you're proposing is to fundamentally break the semantics of SELECT
> FOR UPDATE for the sake of convenience.

While I don't have access to the SQL standard to check if the
semantics are documented, the semantics don't seem clear cut.  The
question is whether the thing that you've promised won't change is the
result row of the query or the source rows that built the result.  I
would like to know how other databases handle this, but I do know that
it works on MSSQL.  Perhaps it's using something like the source row
locking semantics I mentioned.

> You didn't explain exactly why your application doesn't care about this,
> but I wonder whether it's because you know that a lock on the parent
> "pet" row is sufficient due to application coding rules.  If so, you
> could just tell SELECT FOR UPDATE to only lock the "pet" rows, and be
> happy:
>
>        select * from pet left join cat ... for update of pet;

If I select a collection that contains both dogs and cats, run a
polymorphic operation that affects dogs and cats differently, then
save my results back, I would need locks on all rows, not just the pet
information.  When "parts" of a "row" are stored in different tables,
the results from both tables need to be locked.

I see a couple of possible workarounds, neither of which is particularly
appealing:
* Run many queries, being careful to ensure the appropriate rows
from each table are locked.
* Lock all related tables.

Perhaps the syntax could be extended to indicate to the database that
it's safe to lock on just the selected rows.

   select * from pet left join cat ... for update of pet, cat;

On a conceptual level, I still tend to think of select for update as
"locking the data returned by the query."  If no data is returned by
the query, I don't get a lock on that non-data.  Is this an area that
is covered by the standard?

Thanks for your thoughts on this Tom.  Your time is much appreciated.

Patrick Earl



Re: [HACKERS] Select For Update and Left Outer Join

2011-05-01 Thread Tom Lane
Patrick Earl  writes:
> On Sun, May 1, 2011 at 4:05 PM, Tom Lane  wrote:
>> Quite.  What would it mean to lock the absence of a row?

> I would argue that SELECT FOR UPDATE never locks on the absence of a
> row.  For example, if I do:
>   SELECT * FROM Table WHERE Column = 10
> The existing rows are locked, but somebody could come along and add
> another unlocked row with Column = 10.

Addition of new rows certainly isn't supposed to be prevented by a
SELECT FOR UPDATE, but that's not the problem here.  What I *do* expect
a SELECT FOR UPDATE to promise is that the rows it did return can't
change or be deleted by someone else for the life of my transaction.
This is not possible to promise for null-extended rows unless you
somehow lock out addition of a matching row on the inside of the join.
Without that, a row that looked like  when you
selected it might suddenly start looking like 
due to someone else's modification.  And after that, since you still
haven't got a lock on the cat row, the cat fields could keep on
changing.

I'm prepared to believe that there are some applications where that
can't happen due to other interlocking, or doesn't matter to the
application, but SELECT FOR UPDATE really can't assume that.  I think
what you're proposing is to fundamentally break the semantics of SELECT
FOR UPDATE for the sake of convenience.

You didn't explain exactly why your application doesn't care about this,
but I wonder whether it's because you know that a lock on the parent
"pet" row is sufficient due to application coding rules.  If so, you
could just tell SELECT FOR UPDATE to only lock the "pet" rows, and be
happy:

select * from pet left join cat ... for update of pet;
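Putting the suggestion together as a runnable sketch (the schema and the ON
condition are reconstructed from the thread's class-table-inheritance
example, not taken from any actual application):

    CREATE TABLE pet (id int PRIMARY KEY, name text);
    CREATE TABLE cat (id int PRIMARY KEY REFERENCES pet, lives int);

    -- Rejected by PostgreSQL: cat is on the nullable side of the join.
    -- SELECT * FROM pet LEFT JOIN cat ON cat.id = pet.id FOR UPDATE;

    -- Locking only the non-nullable side is accepted, and suffices if
    -- application coding rules funnel all cat updates through pet:
    SELECT *
      FROM pet
      LEFT JOIN cat ON cat.id = pet.id
     FOR UPDATE OF pet;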

regards, tom lane



Re: [HACKERS] increasing collapse_limits?

2011-05-01 Thread Greg Sabino Mullane

-BEGIN PGP SIGNED MESSAGE-
Hash: RIPEMD160


Joshua Berkus wrote:

> I'm not comfortable with increasing the default, yet.  While folks on 
> dedicated good hardware can handle a collapse of 10-12 joins, a lot 
> of people are running PostgreSQL on VMs these days whose real CPU 
> power is no better than a Pentium IV.

Really? First, I don't think that's true; the average CPU power 
is much higher than that. Second, this sounds like the ol' 
"tune it for a toaster" trap, where we never make improvements 
to the defaults because someone, somewhere, might *gasp* use 
Postgres on an underpowered server.

> Also, if you're doing OLTP queries on small tables, spending 20ms 
> planning a query is unreasonably slow in a way it is not for a 
> DW query.

Again, seriously? Do you have numbers to back that up?

I could see not going to 16 right away, but who would honestly have a 
problem with going to 10? I agree with Tom, let's bump this up a 
little bit and see what happens. My guess is that we won't see a 
single post in which we advise people to drop it down from 10 to 8. 
Personally, I'd like to see them go to 12, as that's the best sweet 
spot I've seen in the field, but I'll take 10 first. :)

Tom Lane asked re setting to 10:
> Don't know how much difference that would make in the real world though.

I've seen a handful of cases that have benefitted from 10, but many 
more* that benefitted from 12 (*okay, a larger handful anyway, it's not 
like I have to adjust it too often).

- -- 
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201105012153
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-BEGIN PGP SIGNATURE-

iEYEAREDAAYFAk2+DqsACgkQvJuQZxSWSshRfQCgzX5JlnCmKTndA7WcF/mt0Kpk
b30AoLKrVKMm0rbZNNhgVjt/Xne4NDpj
=0deF
-END PGP SIGNATURE-





Re: [HACKERS] branching for 9.2devel

2011-05-01 Thread Robert Treat
On Sun, May 1, 2011 at 1:14 PM, Kevin Grittner
 wrote:
> Robert Treat  wrote:
>> Tom Lane  wrote:
>
>>>        CF #1: June 1-30
>>>        CF #2: August 1-31
>>>        CF #3: October 1-31
>>>        CF #4 (one week shortened CF): December 1-7
>>>        CF #5: January 1-31
>>>
>>> I think the main thing we have to think about before choosing is
>>> whether we believe that we can shorten the CFs at all.  Josh's
>>> proposal had 3-week CFs after the first one, which makes it a lot
>>> easier to have a fest in November or December, but only if you
>>> really can end it on time.
>>
>> If we made the "deadline" for patch acceptance into 9.2 CF#4, then
>> shortening that to a two week cycle whose main goal was simply to
>> sanity check patches for 9.2 would probably work. Most would
>> probably still need further work, which we would expect to get
>> handled in the final, full CF#5, but we wouldn't let anything new
>> come into CF#5. This way when we get the 100 patch pile up in
>> CF#4, there's no expectation that those patches will be committed,
>> just that they can be sanity checked for the 9.2 release.
>
> Which makes it not really a CommitFest, but rather ... a SanityFest?
>
> To make sure I understand you, you're suggesting no WIP patch review
> in the last two CFs?  (Of course nothing stops someone from looking
> at someone else's WIP between fests.)  Would a patch submitted to
> #4, the sanity of which was questioned, be eligible for another try
> in #5?
>

I think you can have WIP patches for both CF#4 and CF#5. What we're
hoping to get from CF#4 is a better scope on the number of patches we
might have to get 9.2 out the door. WRT patches whose sanity is
questioned, I'd presume that the questioning would come with a list of
specific complaints, so if you address those between CF#4 and CF#5, I
don't see why you can't try again.

Robert Treat
play: xzilla.net
work: omniti.com



Re: [HACKERS] Select For Update and Left Outer Join

2011-05-01 Thread Patrick Earl
On Sun, May 1, 2011 at 4:05 PM, Tom Lane  wrote:
> Patrick Earl  writes:
>> The query to get all the pets is as follows:
>
>> select * from Pet
>> left join Dog on Dog.Id = Pet.Id
>> left join Cat on Cat.Id = Pet.Id
>
>> Now suppose you want to lock to ensure that your Cat is not updated
>> concurrently.  You add FOR UPDATE, but then PostgreSQL gets upset and
>> complains that locking on the nullable side of an outer join is not
>> allowed.
>
> Quite.  What would it mean to lock the absence of a row?
>
>> From our data model, we know that for every single Pet, there can
>> never be a Dog or Cat that spontaneously appears, so locking in this
>> case is totally safe.
>
> You might know that, but you didn't explain how you know that or how
> the database could be expected to know it.
>
>                        regards, tom lane
>

I would argue that SELECT FOR UPDATE never locks on the absence of a
row.  For example, if I do:
  SELECT * FROM Table WHERE Column = 10
The existing rows are locked, but somebody could come along and add
another unlocked row with Column = 10.
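A two-session sketch of that point (hypothetical table `t`; the second
session's INSERT is not blocked, while its UPDATE is):

    -- Session 1
    BEGIN;
    SELECT * FROM t WHERE col = 10 FOR UPDATE;  -- locks existing matches

    -- Session 2 (concurrently)
    INSERT INTO t (col) VALUES (10);      -- succeeds immediately: FOR UPDATE
                                          -- locks rows that exist, not the
                                          -- predicate "col = 10"
    UPDATE t SET col = 11 WHERE col = 10; -- blocks on session 1's row locks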

Whether I'm specifying a secondary set of records (via a criteria
involving a join) or a primary set of records (via a regular where
criteria), FOR UPDATE always allows new rows to appear.

The fact that new rows are not locked is common and expected.  Whether
they appear in the result set due to the join or due to the where
clause seems unimportant to me.

Patrick Earl



Re: [HACKERS] strange view performance

2011-05-01 Thread Tom Lane
Pavel Stehule  writes:
> I am investigating a strange situation where using a view is slower than
> using the same tables directly.

> The view is defined as

> CREATE VIEW v1 AS
>   SELECT *
> FROM A
>LEFT JOIN B
>LEFT JOIN C
>LEFT JOIN D

> and query is
>SELECT *
>   FROM T
>  LEFT JOIN v

> this query is slower than:

> SELECT *
>FROM T
>   LEFT JOIN A
>   LEFT JOIN B
>   LEFT JOIN C
>   LEFT JOIN D

> Is there some reason for this behavior?

Well, they don't necessarily mean the same thing --- these are only
logically equivalent if the left joins all commute, which would depend
on the ON conditions.

> [ EXPLAIN outputs ]

But I also notice that you are using collapse/geqo limits of 12 for
queries that involve 13 base relations, so that'd mean that syntactic
differences could lead to plan differences too.
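For reference, those knobs can be adjusted per session to test whether they
explain the plan difference (a sketch; the query is a placeholder for the
problem query, and the "set to 12" comment is only what the thread reports):

    -- With 13 base relations and limits of 12, the planner cannot flatten
    -- the whole join tree, so the syntactic form constrains join order.
    SHOW join_collapse_limit;     -- 8 by default; reportedly set to 12 here
    SET join_collapse_limit = 16; -- allow full reordering of the 13-way join
    SET from_collapse_limit = 16;
    SET geqo_threshold = 16;      -- keep GEQO from kicking in during the test
    EXPLAIN ANALYZE
    SELECT * FROM t LEFT JOIN v ON ...;  -- re-run the problem query

Raising the limits above the number of base relations makes the two spellings
plan identically (at the cost of more planning time), which isolates whether
the slowdown is really a join-order artifact.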

regards, tom lane



Re: [HACKERS] Select For Update and Left Outer Join

2011-05-01 Thread Tom Lane
Patrick Earl  writes:
> The query to get all the pets is as follows:

> select * from Pet
> left join Dog on Dog.Id = Pet.Id
> left join Cat on Cat.Id = Pet.Id

> Now suppose you want to lock to ensure that your Cat is not updated
> concurrently.  You add FOR UPDATE, but then PostgreSQL gets upset and
> complains that locking on the nullable side of an outer join is not
> allowed.

Quite.  What would it mean to lock the absence of a row?

> From our data model, we know that for every single Pet, there can
> never be a Dog or Cat that spontaneously appears, so locking in this
> case is totally safe.

You might know that, but you didn't explain how you know that or how
the database could be expected to know it.

regards, tom lane



Re: [HACKERS] SIREAD lock versus ACCESS EXCLUSIVE lock

2011-05-01 Thread Kevin Grittner
"Kevin Grittner"  wrote:
 
> I haven't dug into ALTER INDEX enough to know whether it can ever
> cause an index to be rebuilt.  If so, we need to treat it like
> DROP INDEX and REINDEX -- which should change all predicate locks
> of any granularity on the index into relation locks on the
> associated table.
>  
> CLUSTER or an ALTER TABLE which causes a rewrite should change all
> predicate locks on the table and all indexes into relation locks
> on the associated table.  (Obviously, an existing relation lock on
> the table doesn't require any action.)
>  
> TRUNCATE TABLE and DROP TABLE should generate a rw-conflict *in*
> to the enclosing transaction (if it is serializable) from all
> transactions holding predicate locks on the table or its indexes.
> Note that this could cause a transactions which is running one of
> these statements to roll back with a serialization error. This
> seems correct to me, since these operations essentially delete all
> rows.  If you don't want the potential rollback, these operations
> should be run at another isolation level.  The difference between
> these two statements is that I think that TRUNCATE TABLE should
> also move the existing predicate locks to relation locks on the
> table while DROP TABLE (for obvious reasons) should just delete
> the predicate locks.
>  
> DROP DATABASE should quietly clean up any predicate locks from
> committed transactions which haven't yet hit their cleanup point
> because of overlapping transactions in other databases.
 
I missed VACUUM FULL when pulling together the above, but I haven't
found any other omissions.  (Still happy to hear about any that
anyone can spot.)
 
I notice that most of these need to shift transfer locks to relation
locks on the heap, either for a single index or for the heap and all
indexes.  I wrote a function to do this and called it from one place
to be able to test it.  Consider this a WIP patch on which I would
appreciate review while I work on finding the other places to call
it and the miscellaneous other fixes needed.
 
Note that I had to expose one previously-static function from
index.c to find the heap OID from the index OID.  Also, I ran
pgindent against predicate.c, as I generally like to do when I
modify much code, and it found four comment blocks in predicate.c
touched since the recent global pgindent run which it re-wrapped.
I can manually exclude those from the final patch if people would
prefer that; but if people can ignore those whitespace tweaks, it
might not be all bad to get standard formatting onto them at this
point.
 
-Kevin

*** a/src/backend/catalog/index.c
--- b/src/backend/catalog/index.c
***
*** 54,59 
--- 54,60 
  #include "parser/parser.h"
  #include "storage/bufmgr.h"
  #include "storage/lmgr.h"
+ #include "storage/predicate.h"
  #include "storage/procarray.h"
  #include "storage/smgr.h"
  #include "utils/builtins.h"
***
*** 114,120  static void validate_index_heapscan(Relation heapRelation,
IndexInfo *indexInfo,
Snapshot snapshot,
v_i_state *state);
- static Oid IndexGetRelation(Oid indexId);
  static void SetReindexProcessing(Oid heapOid, Oid indexOid);
  static void ResetReindexProcessing(void);
  static void SetReindexPending(List *indexes);
--- 115,120 
***
*** 2721,2727  validate_index_heapscan(Relation heapRelation,
   * IndexGetRelation: given an index's relation OID, get the OID of the
   * relation it is an index on.  Uses the system cache.
   */
! static Oid
  IndexGetRelation(Oid indexId)
  {
HeapTuple   tuple;
--- 2721,2727 
   * IndexGetRelation: given an index's relation OID, get the OID of the
   * relation it is an index on.  Uses the system cache.
   */
! Oid
  IndexGetRelation(Oid indexId)
  {
HeapTuple   tuple;
***
*** 2782,2787  reindex_index(Oid indexId, bool skip_constraint_checks)
--- 2782,2793 
 */
CheckTableNotInUse(iRel, "REINDEX INDEX");
  
+   /*
+* All predicate locks on the index are about to be made invalid.
+* Promote them to relation locks on the heap.
+*/
+   TransferPredicateLocksToHeapRelation(iRel);
+ 
PG_TRY();
{
/* Suppress use of the target index while rebuilding it */
*** a/src/backend/storage/lmgr/predicate.c
--- b/src/backend/storage/lmgr/predicate.c
***
*** 185,190 
--- 185,191 
  #include "access/twophase.h"
  #include "access/twophase_rmgr.h"
  #include "access/xact.h"
+ #include "catalog/index.h"
  #include "miscadmin.h"
  #include "storage/bufmgr.h"
  #include "storage/predicate.h"
***
*** 975,982  InitPredicateLocks(void)
boolfound;
  
/*
!* Compute size of predicate lock target hashtable.
!* Note these calculations

Re: [HACKERS] Proposed patch: Smooth replication during VACUUM FULL

2011-05-01 Thread Simon Riggs
On Sun, May 1, 2011 at 6:51 PM, Tom Lane  wrote:
> Simon Riggs  writes:
>> On Sat, Apr 30, 2011 at 11:48 PM, Tom Lane  wrote:
>>> A bigger objection to this patch is that it seems quite incomplete.
>>> I'm not sure there's much point in adding delays to the first loop of
>>> copy_heap_data() without also providing for delays inside the sorting
>>> code and the eventual index rebuilds; which will make the patch
>>> significantly more complicated and invasive.
>
>> The patch puts the old behaviour of vacuum delay back into VACUUM FULL
>> and seems worth backpatching to 9.0 and 9.1 to me, since it is so
>> simple.
>
> No, it does perhaps 1% of what's needed to make the new implementation
> react to vacuum_cost_delay in a useful way.  I see no point in applying
> this as-is, let alone back-patching it.

Gabriele's test results show the opposite. It is clearly very effective.

Please look again at the test results.

I'm surprised that the reaction isn't "good catch" and the patch
quickly applied.


>> Previously there wasn't any delay in the sort or indexing either, so
>> it's a big ask to put that in now and it would also make backpatching
>> harder.
>
> You're missing the point: there wasn't any sort or reindex in the old
> implementation of VACUUM FULL.  The CLUSTER-based implementation makes
> use of very large chunks of code that were simply never used before
> by VACUUM.
>
>>> Another question is whether this is the right place to be looking
>>> at all.  If Gabriele's setup can't keep up with replication when a
>>> VAC FULL is running, then it can't keep up when under load, period.
>>> This seems like a pretty band-aid-ish response to that sort of problem.
>
>> This isn't about whether the system can cope with the load, it's about
>> whether replication lag is affected by the load.
>
> And I think you're missing the point here too.  Even if we cluttered
> the system to the extent of making all steps of VACUUM FULL honor
> vacuum_cost_delay, it wouldn't fix Gabriele's problem, because any other
> I/O-intensive query would produce the same effect.  We could further
> clutter everything else that someone defines as a "maintenance query",
> and it *still* wouldn't fix the problem.  It would be much more
> profitable to attack the performance of replication directly.

I don't think the performance of replication is at issue. This is
about resource control.

The point is that replication is quite likely to be a high priority,
especially with sync rep. Speeding up replication is not the same
thing as rate limiting other users to enforce a lower priority, which
is what vacuum_delay does.

> I don't
> really feel a need to put cost_delay stuff into anything that's not used
> by autovacuum.

It seems reasonable to me to hope that one day we might be able to
specify that certain tasks have a lower rate of resource usage than
others.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] increasing collapse_limits?

2011-05-01 Thread Robert Haas
On Apr 30, 2011, at 10:21 PM, Tom Lane  wrote:
> It also occurs to me to wonder if we could adjust the limit on-the-fly
> based on noticing whether or not the query is prone to worst-case
> behavior, ie how dense is the join connection graph.

I've had this thought - or a similar one - before also. I am not sure how to 
make it work mechanically but I think it would be tremendous if we could make 
it work. For most people, my previous naive suggestion (remove the limit 
entirely) would actually work fine, BUT if you hit the problem cases then even 
a small increase is too much. So I don't really think increasing the limit will 
eliminate the need for manual fiddling - what we really need to do is come up 
with a more accurate measure of complexity than "number of tables".

...Robert


Re: [HACKERS] branching for 9.2devel

2011-05-01 Thread Robert Haas
On May 1, 2011, at 9:34 PM, "Kevin Grittner"  
wrote:
> Joshua Berkus  wrote:
> 
>> Generally the last week only has 1-3 patches open
> 
> The last CF I managed the end of the third week looked like this:
> 
> http://archives.postgresql.org/pgsql-hackers/2010-08/msg00334.php
> 
> That is, we had 15 patches still pending out of 72 submitted:
> 
> 9 ready for committer
> 1 waiting on author
> 5 needing review
> 
> If you want to view it as a *commit* fest, that is really 15 to go. 
> If you're viewing it as a *review* fest, those six broke down:
> 
> 3 were patches submitted by committers (1 of which was WIP)
> 1 other was WIP
> 1 was down to tweaking docs
> 1 got a review the next day, showing it wasn't ready
> 
> So we either need to markedly increase the pace of CFs (which is
> hard without more reviewers unless we provide "brisker" review and
> kick things back a lot faster) or we need to stop thinking that the
> goal is to get them *committed* during the CommitFest; but I thought
> that was kinda the point.

+1.

...Robert

Re: [HACKERS] branching for 9.2devel

2011-05-01 Thread Kevin Grittner
Joshua Berkus  wrote:
 
> Generally the last week only has 1-3 patches open
 
The last CF I managed the end of the third week looked like this:
 
http://archives.postgresql.org/pgsql-hackers/2010-08/msg00334.php
 
That is, we had 15 patches still pending out of 72 submitted:
 
9 ready for committer
1 waiting on author
5 needing review
 
If you want to view it as a *commit* fest, that is really 15 to go. 
If you're viewing it as a *review* fest, those six broke down:
 
3 were patches submitted by committers (1 of which was WIP)
1 other was WIP
1 was down to tweaking docs
1 got a review the next day, showing it wasn't ready
 
So we either need to markedly increase the pace of CFs (which is
hard without more reviewers unless we provide "brisker" review and
kick things back a lot faster) or we need to stop thinking that the
goal is to get them *committed* during the CommitFest; but I thought
that was kinda the point.
 
-Kevin



Re: [HACKERS] Changing the continuation-line prompt in psql?

2011-05-01 Thread Joshua Berkus
Dimitri,

> > I'll bet someone a fancy drink at a conference that this thread goes
> > to at least 100 posts.
> 
> Of course, if we all are to argue about this bet… :)

Darn!  You've uncovered my sinister plan.  Foiled again!

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
San Francisco



Re: [HACKERS] increasing collapse_limits?

2011-05-01 Thread Joshua Berkus

Pavel,

> Actually, we had to solve an issue with a slow SELECT. The problem was a
> low value of JOIN_COLLAPSE_LIMIT. Can we increase the default of this
> value? I checked a complex query, and the planner needed about 200ms
> with JOIN_COLLAPSE_LIMIT = 16, so something around 12 could work well.

I'm not comfortable with increasing the default, yet.  While folks on dedicated 
good hardware can handle a collapse of 10-12 joins, a lot of people are running 
PostgreSQL on VMs these days whose real CPU power is no better than a Pentium 
IV.  Also, if you're doing OLTP queries on small tables, spending 20ms planning 
a query is unreasonably slow in a way it is not for a DW query.

It does make a reasonable piece of advice for those tuning for DW, though.  
I'll add it to my list.

Speaking of which, what happened to replacing GEQO with Simulated Annealing?  
Where did that project go?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
San Francisco



Re: [HACKERS] branching for 9.2devel

2011-05-01 Thread Joshua Berkus


> I think the main thing we have to think about before choosing is
> whether
> we believe that we can shorten the CFs at all. Josh's proposal had
> 3-week CFs after the first one, which makes it a lot easier to have a
> fest in November or December, but only if you really can end it on
> time.

I think that 3 weeks is doable.  Generally, by the last week of every CF 
except the last one, we're largely waiting on either (a) authors who are 
slow to respond, (b) patches which are really hard to review, or (c) arguing 
out spec stuff on -hackers.  Generally the last week only has 1-3 patches open, 
and any of these things could be grounds for booting to the next CF anyway or 
working on the patches outside the CF.  For really hard patches (like Synch 
Rep) those things don't fit into the CF cycle anyway.

I'm not convinced that shorter than 3 weeks is doable, at least not without 
changing to a model of binary accept-or-reject.  Communication is too 
slow and reviewers' availability is too random.

> In addition to the fun of working around the holiday season, perhaps
> we should also consider how much work we're likely to get out of
> people
> in the summer. Is it going to be useful to schedule a fest in either
> July or August? Will one month be better than the other?

Doesn't make a difference, both are equally bad.  However, if we're short on 
European reviewers, at least we'll be able to punt European patches immediately 
because the authors won't be answering their e-mail.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
San Francisco



Re: [HACKERS] Proposed patch: Smooth replication during VACUUM FULL

2011-05-01 Thread Tom Lane
Simon Riggs  writes:
> On Sat, Apr 30, 2011 at 11:48 PM, Tom Lane  wrote:
>> A bigger objection to this patch is that it seems quite incomplete.
>> I'm not sure there's much point in adding delays to the first loop of
>> copy_heap_data() without also providing for delays inside the sorting
>> code and the eventual index rebuilds; which will make the patch
>> significantly more complicated and invasive.

> The patch puts the old behaviour of vacuum delay back into VACUUM FULL
> and seems worth backpatching to 9.0 and 9.1 to me, since it is so
> simple.

No, it does perhaps 1% of what's needed to make the new implementation
react to vacuum_cost_delay in a useful way.  I see no point in applying
this as-is, let alone back-patching it.

> Previously there wasn't any delay in the sort or indexing either, so
> it's a big ask to put that in now and it would also make backpatching
> harder.

You're missing the point: there wasn't any sort or reindex in the old
implementation of VACUUM FULL.  The CLUSTER-based implementation makes
use of very large chunks of code that were simply never used before
by VACUUM.

>> Another question is whether this is the right place to be looking
>> at all.  If Gabriele's setup can't keep up with replication when a
>> VAC FULL is running, then it can't keep up when under load, period.
>> This seems like a pretty band-aid-ish response to that sort of problem.

This isn't about whether the system can cope with the load; it's about
whether replication lag is affected by the load.

And I think you're missing the point here too.  Even if we cluttered
the system to the extent of making all steps of VACUUM FULL honor
vacuum_cost_delay, it wouldn't fix Gabriele's problem, because any other
I/O-intensive query would produce the same effect.  We could further
clutter everything else that someone defines as a "maintenance query",
and it *still* wouldn't fix the problem.  It would be much more
profitable to attack the performance of replication directly.  I don't
really feel a need to put cost_delay stuff into anything that's not used
by autovacuum.

regards, tom lane



Re: [HACKERS] Proposed patch: Smooth replication during VACUUM FULL

2011-05-01 Thread Simon Riggs
On Sat, Apr 30, 2011 at 11:48 PM, Tom Lane  wrote:
> Jaime Casanova  writes:
>> On Sat, Apr 30, 2011 at 1:19 PM, Gabriele Bartolini
>>> I have noticed that during VACUUM FULL on reasonably big tables, replication
>>> lag climbs. In order to smooth down the replication lag, I propose the
>>> attached patch which enables vacuum delay for VACUUM FULL.
>
>> AFAICS, the problem is that those operations involve the rebuild of
>> tables, so we can't simply stop in the middle and wait because we will
>> need to hold a strong lock more time... also the patch seems to be
>> only doing something for CLUSTER and not for VACUUM FULL.
>> or am i missing something?
>
> No, actually it would have no effect on CLUSTER because VacuumCostActive
> wouldn't be set.

Ah, good point, so the patch is accurate even though very short.


> I think this is basically fixing an oversight in the
> patch that changed VACUUM FULL into a variant of CLUSTER.  We used to
> use vacuum_delay_point() in the main loops in old-style VACUUM FULL,
> but forgot to consider doing so in the CLUSTER-ish implementation.
> The argument about holding locks longer doesn't seem relevant to me:
> enabling delays during VACUUM FULL would've had that effect in the old
> implementation, too, but nobody ever complained about that, and besides
> the feature isn't enabled by default.
>
> A bigger objection to this patch is that it seems quite incomplete.
> I'm not sure there's much point in adding delays to the first loop of
> copy_heap_data() without also providing for delays inside the sorting
> code and the eventual index rebuilds; which will make the patch
> significantly more complicated and invasive.

The patch puts the old behaviour of vacuum delay back into VACUUM FULL
and seems worth backpatching to 9.0 and 9.1 to me, since it is so
simple.

Previously there wasn't any delay in the sort or indexing either, so
it's a big ask to put that in now and it would also make backpatching
harder.

I think we should backpatch this now, but work on an additional delay
in sort and index for 9.2 to "complete" this thought. ISTM a good idea
for us to add similar delay code to all "maintenance" commands, should
the user wish to use it (9.2+).

> Another question is whether this is the right place to be looking
> at all.  If Gabriele's setup can't keep up with replication when a
> VAC FULL is running, then it can't keep up when under load, period.
> This seems like a pretty band-aid-ish response to that sort of problem.

This isn't about whether the system can cope with the load; it's about
whether replication lag is affected by the load.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



[HACKERS] Select For Update and Left Outer Join

2011-05-01 Thread Patrick Earl
[I previously posted this to pgsql-sql, but received no response as of
yet... it's more of a development team oriented message in any case.]

In ORMs like NHibernate, there are a few strategies for mapping
inheritance to SQL.  One of these is "Joined Subclass," which allows
for the elimination of duplicate data and clean separation of class
contents.

With a class hierarchy such as this:

Pet
Dog : Pet
Cat : Pet

The query to get all the pets is as follows:

select * from Pet
left join Dog on Dog.Id = Pet.Id
left join Cat on Cat.Id = Pet.Id

Now suppose you want to lock to ensure that your Cat is not updated
concurrently.  You add FOR UPDATE, but then PostgreSQL gets upset and
complains that locking on the nullable side of an outer join is not
allowed.
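
A minimal reproduction, assuming the simplest possible schema (the table and
column names here are illustrative, not the poster's actual schema):

```sql
-- Hypothetical schema: dog and cat share pet's primary key.
CREATE TABLE pet (id int PRIMARY KEY);
CREATE TABLE dog (id int PRIMARY KEY REFERENCES pet);
CREATE TABLE cat (id int PRIMARY KEY REFERENCES pet);

-- Rejected: dog and cat are on the nullable side of the outer join.
SELECT * FROM pet
  LEFT JOIN dog ON dog.id = pet.id
  LEFT JOIN cat ON cat.id = pet.id
FOR UPDATE;

-- A partial workaround is to lock only the non-nullable side:
SELECT * FROM pet
  LEFT JOIN dog ON dog.id = pet.id
  LEFT JOIN cat ON cat.id = pet.id
FOR UPDATE OF pet;
```

Locking only pet may be sufficient under the data model described above, since
a dog or cat row cannot appear for an already-locked pet; it does not, however,
stop a concurrent UPDATE of the dog or cat row itself.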

From our data model, we know that for every single Pet, there can
never be a Dog or Cat that spontaneously appears, so locking in this
case is totally safe.  Unfortunately, PostgreSQL doesn't seem to
provide any mechanism to lock just the rows involved in this query.

Any advice?  I'd be happy if such a thing was implemented in the
engine, as it's supported by other databases without trouble.

As another note, I'm one of the NHibernate developers and I'm working
to get all the NHibernate tests working with PostgreSQL.  The two
significant cases I've had to avoid testing are the "FOR UPDATE"
mentioned above and null characters in UTF strings.  Storing a UTF
"char" which defaults to zero doesn't work on PostgreSQL because it's
apparently still using zero-terminated string functions. :(
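
For reference, the null-character limitation is easy to reproduce (the exact
error text below is from memory and may vary by server version and encoding):

```sql
-- Attempting to store a string containing a NUL byte fails:
SELECT E'a\0b';
-- ERROR:  invalid byte sequence for encoding "UTF8": 0x00
```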

Aside from those two things, it looks like PostgreSQL is going to be
passing all the tests soon, so that's good news. :)

Patrick Earl



Re: [HACKERS] branching for 9.2devel

2011-05-01 Thread Robert Treat
On Sat, Apr 30, 2011 at 5:33 PM, Tom Lane  wrote:
> "Kevin Grittner"  writes:
>> Joshua Berkus  wrote:
>>> I just searched backwards on this thread and I can't find it.
>
>> I think he's talking about the bottom of this post:
>> http://archives.postgresql.org/message-id/BANLkTimnjZNemdpqgK=8Mj=pzq33pz0...@mail.gmail.com
>
> ... which was:
>
>        CF #1: June 1-30
>        CF #2: August 1-31
>        CF #3: October 1-31
>        CF #4 (one week shortened CF): December 1-7
>        CF #5: January 1-31
>
> I think the main thing we have to think about before choosing is whether
> we believe that we can shorten the CFs at all.  Josh's proposal had
> 3-week CFs after the first one, which makes it a lot easier to have a
> fest in November or December, but only if you really can end it on time.
>

If we made the "deadline" for patch acceptance into 9.2 CF#4, then
shortening that to a two week cycle whose main goal was simply to
sanity check patches for 9.2 would probably work. Most would probably
still need further work, which we would expect to get handled in the
final, full CF#5, but we wouldn't let anything new come into CF#5.
This way when we get the 100 patch pile up in CF#4, there's no
expectation that those patches will be committed, just that they can
be sanity checked for the 9.2 release.


Robert Treat
play: xzilla.net
work: omniti.com



Re: [HACKERS] branching for 9.2devel

2011-05-01 Thread Kevin Grittner
Robert Treat  wrote:
> Tom Lane  wrote:
 
>>CF #1: June 1-30
>>CF #2: August 1-31
>>CF #3: October 1-31
>>CF #4 (one week shortened CF): December 1-7
>>CF #5: January 1-31
>>
>> I think the main thing we have to think about before choosing is
>> whether we believe that we can shorten the CFs at all.  Josh's
>> proposal had 3-week CFs after the first one, which makes it a lot
>> easier to have a fest in November or December, but only if you
>> really can end it on time.
> 
> If we made the "deadline" for patch acceptance into 9.2 CF#4, then
> shortening that to a two week cycle whose main goal was simply to
> sanity check patches for 9.2 would probably work. Most would
> probably still need further work, which we would expect to get
> handled in the final, full CF#5, but we wouldn't let anything new
> come into CF#5. This way when we get the 100 patch pile up in
> CF#4, there's no expectation that those patches will be committed,
> just that they can be sanity checked for the 9.2 release.
 
Which makes it not really a CommitFest, but rather ... a SanityFest?
 
To make sure I understand you, you're suggesting no WIP patch review
in the last two CFs?  (Of course nothing stops someone from looking
at someone else's WIP between fests.)  Would a patch submitted to
#4, the sanity of which was questioned, be eligible for another try
in #5?
 
I'm just trying to picture how this idea might work.
 
-Kevin



Re: [HACKERS] a bit strange btree index tuples

2011-05-01 Thread Tomas Vondra
On 1 May 2011 05:30, Tom Lane wrote:
> Tomas Vondra  writes:
>> testdb=# select bt_page_items('test_index', 3);
>>   bt_page_items
>> --
>>  (1,"(1,1)",8,f,f,"")
>>  (2,"(2,1)",12,f,f,"ca 01 00 00")
>>  (3,"(4,1)",12,f,f,"93 03 00 00")
>> (3 rows)
> 
>> I don't understand the first row and I've been unable to find out if
>> it's something special for the btree indexes or what.
> 
> You should read src/backend/access/nbtree/README, which would explain to
> you why it is that leftmost tuples on interior btree pages don't contain
> key values.

Thanks, I somehow missed that doc yesterday (ok, it was 3 AM so it's
understandable). I guess the last paragraph in "Notes About Data
Representation" explains this behaviour.

Tomas



Re: [HACKERS] make world fails

2011-05-01 Thread Peter Eisentraut
On Thu, 2011-04-28 at 00:03 -0400, Tom Lane wrote:
> Peter Eisentraut  writes:
> > On ons, 2011-04-27 at 17:54 -0300, Alvaro Herrera wrote:
> >> I take it that if I have a manpages/docbook.xsl in that path, it uses
> >> that instead of trying to fetch it from sourceforge.
> 
> > Exactly.
> 
> > If you don't want to depend on net access, you can do something like
> > make whatever XSLTPROCFLAGS=--nonet
> 
> Is there a way to say "fetch all the documents I need for this build
> into my local cache"?  Then you could do that when your network was up,
> and not have to worry about failures in future.  The set of URIs we
> reference doesn't change much.

No, not without some external program to do the caching.





[HACKERS] strange view performance

2011-05-01 Thread Pavel Stehule
Hello

I am investigating a strange situation where using a view is slower than
using the same tables directly.

The view is defined as

CREATE VIEW v1 AS
  SELECT *
FROM A
   LEFT JOIN B
   LEFT JOIN C
   LEFT JOIN D

and the query is

   SELECT *
     FROM T
     LEFT JOIN v1

this query is slower than:

SELECT *
   FROM T
  LEFT JOIN A
  LEFT JOIN B
  LEFT JOIN C
  LEFT JOIN D

Is there some reason for this behaviour?
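
One setting worth checking (an assumption, not a diagnosis): when a view is
inlined, the combined query can exceed from_collapse_limit, in which case the
planner keeps the view's join order separate instead of searching the full
join space, even when join_collapse_limit has been raised:

```sql
-- Hypothetical check: raise both collapse limits above the total number
-- of relations (T plus the four tables inside v1) and compare the plans.
SHOW from_collapse_limit;        -- default is 8
SET from_collapse_limit = 12;
SET join_collapse_limit = 12;
EXPLAIN ANALYZE SELECT * FROM T LEFT JOIN v1 ON ...;
```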

set enable_hashjoin to off;
set work_mem to '10MB';
set JOIN_COLLAPSE_LIMIT to 12;
set geqo_threshold to 12;
explain analyze select * from v_vypis_parcel where par_id = 1396907206

/**
"Nested Loop Left Join  (cost=0.00..50.73 rows=1 width=399) (actual time=0.655..0.914 rows=1 loops=1)"
"  Join Filter: (katastr_uzemi.kod = parcely.katuze_kod)"
"  ->  Nested Loop Left Join  (cost=0.00..43.79 rows=1 width=349) (actual time=0.627..0.655 rows=1 loops=1)"
"        Join Filter: (casti_obci.obce_kod = obce.kod)"
"        ->  Nested Loop Left Join  (cost=0.00..39.29 rows=1 width=304) (actual time=0.461..0.487 rows=1 loops=1)"
"              Join Filter: (casti_obci.kod = budovy.caobce_kod)"
"              ->  Nested Loop Left Join  (cost=0.00..31.83 rows=1 width=254) (actual time=0.183..0.208 rows=1 loops=1)"
"                    Join Filter: (parcely.zdpaze_kod = zdroje_parcel_ze.kod)"
"                    ->  Nested Loop Left Join  (cost=0.00..30.77 rows=1 width=191) (actual time=0.175..0.199 rows=1 loops=1)"
"                          Join Filter: (zp_vyuziti_poz.kod = parcely.zpvypa_kod)"
"                          ->  Nested Loop Left Join  (cost=0.00..29.14 rows=1 width=135) (actual time=0.130..0.153 rows=1 loops=1)"
"                                ->  Nested Loop Left Join  (cost=0.00..28.76 rows=1 width=142) (actual time=0.119..0.139 rows=1 loops=1)"
"                                      Join Filter: (t_budov.kod = budovy.typbud_kod)"
"                                      ->  Nested Loop Left Join  (cost=0.00..27.62 rows=1 width=139) (actual time=0.111..0.124 rows=1 loops=1)"
"                                            Join Filter: (t_bud_ii.kod = casti_budov.typbud_kod)"
"                                            ->  Nested Loop Left Join  (cost=0.00..26.49 rows=1 width=136) (actual time=0.096..0.107 rows=1 loops=1)"
"                                                  Join Filter: (d_pozemku.kod = parcely.drupoz_kod)"
"                                                  ->  Nested Loop Left Join  (cost=0.00..25.24 rows=1 width=131) (actual time=0.071..0.079 rows=1 loops=1)"
"                                                        ->  Nested Loop Left Join  (cost=0.00..16.95 rows=1 width=127) (actual time=0.057..0.061 rows=1 loops=1)"
"                                                              ->  Nested Loop Left Join  (cost=0.00..16.61 rows=1 width=113) (actual time=0.049..0.053 rows=1 loops=1)"
"                                                                    ->  Index Scan using par_pk on parcely  (cost=0.00..8.31 rows=1 width=84) (actual time=0.028..0.029 rows=1 loops=1)"
"                                                                          Index Cond: (id = 1396907206::numeric)"
"                                                                    ->  Index Scan using bud_pk on budovy  (cost=0.00..8.28 rows=1 width=40) (actual time=0.014..0.015 rows=1 loops=1)"
"                                                                          Index Cond: (budovy.id = parcely.bud_id)"
"                                                              ->  Index Scan using i_casti_budov_budid on casti_budov  (cost=0.00..0.30 rows=3 width=25) (actual time=0.005..0.005 rows=0 loops=1)"
"                                                                    Index Cond: (casti_budov.bud_id = budovy.id)"
"                                                        ->  Index Scan using tel_pk on telesa  (cost=0.00..8.28 rows=1 width=15) (actual time=0.011..0.012 rows=1 loops=1)"
"                                                              Index Cond: (parcely.tel_id = telesa.id)"
"                                                  ->  Seq Scan on d_pozemku  (cost=0.00..1.11 rows=11 width=19) (actual time=0.004..0.012 rows=11 loops=1)"
"                                            ->  Seq Scan on t_budov t_bud_ii  (cost=0.00..1.06 rows=6 width=17) (actual time=0.002..0.005 rows=6 loops=1)"
"                                      ->  Seq Scan on t_budov  (cost=0.00..1.06 rows=6 width=17) (actual time=0.001..0.005 rows=6 loops=1)"
"                                ->  Index Scan using tel_pk on telesa tel_bud  (cost=0.00..0.37 rows=1 width=15) (actual time=0.009..0.010 rows=1 loops=1)"
"                                      Index Cond: (budovy.tel_id = tel_bud.id)"
"                          ->  Seq Scan on zp_vyuziti_poz  (cost=0.00..1.28 rows=28 width=70) (actual time=0.002..0.020 rows=28 loops=1)"
"                    ->  Seq Scan on zdroje_parcel_ze  (cost=0.00..1.03 rows=3 width=70) (actual time=0.002..0.004 rows=3 loops=1)"
"              ->  Seq Scan on casti_obci  (cost=0.00..4.98 rows=198 width=58) (actual time=0.002..0.128 rows=198 loops=1)"
"        ->  Seq Scan on obce  (cost=0.00..3.11 rows=111 width=53) (actual time=