Re: [HACKERS] should I post the patch as committed?

2010-04-22 Thread Pavan Deolasee
On Tue, Apr 20, 2010 at 10:41 PM, Alvaro Herrera alvhe...@commandprompt.com wrote:




 I think committing a patch from a non-regular is a special case and
 attaching the modified patch is reasonable in that case.

 My 8.8 Richter ...



Or maybe just mention the commit id for easy look up in the git log.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-22 Thread Simon Riggs
On Thu, 2010-04-22 at 08:57 +0300, Heikki Linnakangas wrote:
  
  I think the assert is a good idea.  If there's no real problem here,
  the assert won't trip.  It's just a safety precaution.
 
 Right. And assertions also act as documentation, they are a precise and
 compact way to document invariants we assume to hold. A comment
 explaining why the cyclic nature of XIDs is not a problem would be nice
 too, in addition or instead of the assertions.

Agreed. I was going to reply just that earlier but have been distracted
on other things.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Hot Standby b-tree delete records review

2010-04-22 Thread Heikki Linnakangas
btree_redo:

    case XLOG_BTREE_DELETE:

        /*
         * Btree delete records can conflict with standby queries. You
         * might think that vacuum records would conflict as well, but
         * we've handled that already. XLOG_HEAP2_CLEANUP_INFO records
         * provide the highest xid cleaned by the vacuum of the heap
         * and so we can resolve any conflicts just once when that
         * arrives. After that we know that no conflicts exist from
         * individual btree vacuum records on that index.
         */
        {
            TransactionId latestRemovedXid = btree_xlog_delete_get_latestRemovedXid(record);
            xl_btree_delete *xlrec = (xl_btree_delete *) XLogRecGetData(record);

            /*
             * XXX Currently we put everybody on death row, because
             * currently _bt_delitems() supplies InvalidTransactionId.
             * This can be fairly painful, so providing a better value
             * here is worth some thought and possibly some effort to
             * improve.
             */
            ResolveRecoveryConflictWithSnapshot(latestRemovedXid, xlrec->node);
        }
        break;

The XXX comment is out-of-date, the latestRemovedXid value is calculated
by btree_xlog_delete_get_latestRemovedXid() nowadays.

If we're re-replaying the WAL record, for example after restarting the
standby server, btree_xlog_delete_get_latestRemovedXid() won't find the
deleted records and will return InvalidTransactionId. That's OK, because
until we reach again the point in WAL where we were before the restart,
we don't accept read-only connections so there's no-one to kill anyway,
but you do get a useless "Invalid latestRemovedXid reported, using
latestCompletedXid instead" message in the log (that shouldn't be
capitalized, BTW).

It would be nice to check if there's any potentially conflicting
read-only queries before calling
btree_xlog_delete_get_latestRemovedXid(), which can be quite expensive.

If the "Invalid latestRemovedXid reported, using latestCompletedXid
instead" message is going to happen commonly, I think it should be
downgraded to DEBUG1. If it's an unexpected scenario, it should be
upgraded to WARNING.

In btree_xlog_delete_get_latestRemovedXid:
   Assert(num_unused == 0);

Can't that happen as well in a re-replay scenario, if a heap item was
vacuumed away later on?

    /*
     * Note that if all heap tuples were LP_DEAD then we will be
     * returning InvalidTransactionId here. This seems very unlikely
     * in practice.
     */

If none of the removed heap tuples were present anymore, we currently
return InvalidTransactionId, which kills/waits out all read-only
queries. But if none of the tuples were present anymore, the read-only
queries wouldn't have seen them anyway, so ISTM that we should treat an
InvalidTransactionId return value as "we don't need to kill anyone".

Why does btree_xlog_delete_get_latestRemovedXid() keep the
num_unused/num_dead/num_redirect counts? It doesn't actually do anything
with them.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Assertion failure twophase.c (3) (testing HS/SR)

2010-04-22 Thread Heikki Linnakangas
Can you still reproduce this or has some of the changes since then fixed
it? We never quite figured out the cause..

Erik Rijkers wrote:
 On Thu, March 4, 2010 17:00, Erik Rijkers wrote:
 in a 9.0devel, primary+standby, cvs from 2010.03.04 01:30

 With three patches:

   new_smart_shutdown_20100201.patch
   extend_format_of_recovery_info_funcs_v4.20100303.patch
   fix-KnownAssignedXidsRemoveMany-1.patch

   pg_dump -d $db8.4.2 | psql -d $db9.0devel-primary

 FailedAssertion, File: "twophase.c", Line: 1201.

 
 For the record, this still happens (FailedAssertion, File: "twophase.c", Line: 1201)
 (created 2010.03.13 23:49 cvs).
 
 Unfortunately, it does not happen always, or predictably.
 
 patches:
  new_smart_shutdown_20100201.patch
  extend_format_of_recovery_info_funcs_v4.20100303.patch
  (both here: 
 http://archives.postgresql.org/pgsql-hackers/2010-03/msg00446.php )
 
   (fix-KnownAssignedXidsRemoveMany-1.patch has been committed, I think?)
 
 
 I use commandlines like this to copy schemas across from 8.4.2 to 9.0devel:
 pg_dump -c -h /tmp -p 5432 -n myschema --no-owner --no-privileges mydb \
   | psql -1qtA -h /tmp -p 7575 -d replicas
 
 (the copied schemas were together 175 GB)
 
 As I seem to be the only one who finds this, I started looking what could be 
 unique in this
 install: and it would be postbio, which we use for its gist-indexing on ranges
 (http://pgfoundry.org/projects/postbio/).  We use postbio's int_interval type 
 as a column type. 
 But keep in mind that sometimes the whole dump+restore+replication completes 
 OK.
 
 
 Other installed modules are:
   contrib/btree_gist
   contrib/seg
   contrib/adminpack
 
 log_line_prefix = '%t %p %d %u start=%s ' # slave
 
 pgsql.sr_hotslave/logfile:
 
 2010-03-13 23:54:59 CET 15765   start=2010-03-13 23:54:59 CET LOG:  database system was interrupted; last known up at 2010-03-13 23:54:31 CET
 cp: cannot stat `/var/data1/pg_stuff/dump/hotslave/replication_archive/00010001': No such file or directory
 2010-03-13 23:55:00 CET 15765   start=2010-03-13 23:54:59 CET LOG:  entering standby mode
 2010-03-13 23:55:00 CET 15765   start=2010-03-13 23:54:59 CET LOG:  redo starts at 0/120
 2010-03-13 23:55:00 CET 15765   start=2010-03-13 23:54:59 CET LOG:  consistent recovery state reached at 0/200
 2010-03-13 23:55:00 CET 15763   start=2010-03-13 23:54:59 CET LOG:  database system is ready to accept read only connections
 TRAP: FailedAssertion(!(((xid) != ((TransactionId) 0))), File: "twophase.c", Line: 1201)
 2010-03-14 05:28:59 CET 15763   start=2010-03-13 23:54:59 CET LOG:  startup process (PID 15765) was terminated by signal 6: Aborted
 2010-03-14 05:28:59 CET 15763   start=2010-03-13 23:54:59 CET LOG:  terminating any other active server processes
 
 
 Maybe I'll try now to setup a similar instance without postbio, to see if the 
 crash still occurs.
 
 hth,
 
 Erik Rijkers
 
 
 


-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Hot Standby b-tree delete records review

2010-04-22 Thread Simon Riggs
On Thu, 2010-04-22 at 10:24 +0300, Heikki Linnakangas wrote:
 btree_redo:

     case XLOG_BTREE_DELETE:

         /*
          * Btree delete records can conflict with standby queries. You
          * might think that vacuum records would conflict as well, but
          * we've handled that already. XLOG_HEAP2_CLEANUP_INFO records
          * provide the highest xid cleaned by the vacuum of the heap
          * and so we can resolve any conflicts just once when that
          * arrives. After that we know that no conflicts exist from
          * individual btree vacuum records on that index.
          */
         {
             TransactionId latestRemovedXid = btree_xlog_delete_get_latestRemovedXid(record);
             xl_btree_delete *xlrec = (xl_btree_delete *) XLogRecGetData(record);

             /*
              * XXX Currently we put everybody on death row, because
              * currently _bt_delitems() supplies InvalidTransactionId.
              * This can be fairly painful, so providing a better value
              * here is worth some thought and possibly some effort to
              * improve.
              */
             ResolveRecoveryConflictWithSnapshot(latestRemovedXid, xlrec->node);
         }
         break;
 
 The XXX comment is out-of-date, the latestRemovedXid value is calculated
 by btree_xlog_delete_get_latestRemovedXid() nowadays.

Removed, thanks.

 If we're re-replaying the WAL record, for example after restarting the
 standby server, btree_xlog_delete_get_latestRemovedXid() won't find the
 deleted records and will return InvalidTransactionId. That's OK, because
 until we reach again the point in WAL where we were before the restart,
 we don't accept read-only connections so there's no-one to kill anyway,
 but you do get a useless "Invalid latestRemovedXid reported, using
 latestCompletedXid instead" message in the log (that shouldn't be
 capitalized, BTW).

 It would be nice to check if there's any potentially conflicting
 read-only queries before calling
 btree_xlog_delete_get_latestRemovedXid(), which can be quite expensive.

Good idea. You're welcome to add such tuning yourself, if you like.

 If the "Invalid latestRemovedXid reported, using latestCompletedXid
 instead" message is going to happen commonly, I think it should be
 downgraded to DEBUG1. If it's an unexpected scenario, it should be
 upgraded to WARNING.

Set to DEBUG because the above optimisation makes it return invalid much
more frequently, which we don't want reported.

 In btree_xlog_delete_get_latestRemovedXid:
  Assert(num_unused == 0);
 
 Can't that happen as well in a re-replay scenario, if a heap item was
 vacuumed away later on?

OK, will remove. The re-replay gets me every time.

  /*
   * Note that if all heap tuples were LP_DEAD then we will be
   * returning InvalidTransactionId here. This seems very unlikely
   * in practice.
   */
 
 If none of the removed heap tuples were present anymore, we currently
 return InvalidTransactionId, which kills/waits out all read-only
 queries. But if none of the tuples were present anymore, the read-only
 queries wouldn't have seen them anyway, so ISTM that we should treat
 InvalidTransactionId return value as we don't need to kill anyone.

That's not the point. The tuples were not themselves the sole focus,
they indicated the latestRemovedXid of the backend on the primary that
had performed the deletion. So even if those tuples are no longer
present there may be others with similar xids that would conflict, so we
cannot ignore. Comment updated.

 Why does btree_xlog_delete_get_latestRemovedXid() keep the
 num_unused/num_dead/num_redirect counts, it doesn't actually do anything
 with them.

Probably a debug tool. Removed.

Changes committed.

Thanks for the review.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Hot Standby b-tree delete records review

2010-04-22 Thread Heikki Linnakangas
Simon Riggs wrote:
 On Thu, 2010-04-22 at 10:24 +0300, Heikki Linnakangas wrote:
 btree_redo:
 /*
  * Note that if all heap tuples were LP_DEAD then we will be
  * returning InvalidTransactionId here. This seems very unlikely
  * in practice.
  */
 If none of the removed heap tuples were present anymore, we currently
 return InvalidTransactionId, which kills/waits out all read-only
 queries. But if none of the tuples were present anymore, the read-only
 queries wouldn't have seen them anyway, so ISTM that we should treat
 InvalidTransactionId return value as we don't need to kill anyone.
 
 That's not the point. The tuples were not themselves the sole focus,

Yes, they were. We're replaying a b-tree deletion record, which removes
pointers to some heap tuples, making them unreachable to any read-only
queries. If any of them still need to be visible to read-only queries,
we have a conflict. But if all of the heap tuples are gone already,
removing the index pointers to them can't change the situation for any
query. If any of them should've been visible to a query, the damage was
done already by whoever pruned the heap tuples leaving just the
tombstone LP_DEAD item pointers (in the heap) behind.

Or do we use the latestRemovedXid value for something else as well?

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Hot Standby b-tree delete records review

2010-04-22 Thread Simon Riggs
On Thu, 2010-04-22 at 11:28 +0300, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  On Thu, 2010-04-22 at 10:24 +0300, Heikki Linnakangas wrote:
  btree_redo:
/*
 * Note that if all heap tuples were LP_DEAD then we will be
 * returning InvalidTransactionId here. This seems very unlikely
 * in practice.
 */
  If none of the removed heap tuples were present anymore, we currently
  return InvalidTransactionId, which kills/waits out all read-only
  queries. But if none of the tuples were present anymore, the read-only
  queries wouldn't have seen them anyway, so ISTM that we should treat
  InvalidTransactionId return value as we don't need to kill anyone.
  
  That's not the point. The tuples were not themselves the sole focus,
 
 Yes, they were. We're replaying a b-tree deletion record, which removes
 pointers to some heap tuples, making them unreachable to any read-only
 queries. If any of them still need to be visible to read-only queries,
 we have a conflict. But if all of the heap tuples are gone already,
 removing the index pointers to them can't change the situation for any
 query. If any of them should've been visible to a query, the damage was
 done already by whoever pruned the heap tuples leaving just the
 tombstone LP_DEAD item pointers (in the heap) behind.

You're missing my point. Those tuples are indicators of what may lie
elsewhere in the database, completely unreferenced by this WAL record.
Just because these referenced tuples are gone doesn't imply that all
tuple versions written by the as-yet-unknown xids are also gone. We
can't infer anything about the whole database just from one small group
of records.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Hot Standby b-tree delete records review

2010-04-22 Thread Heikki Linnakangas
Simon Riggs wrote:
 On Thu, 2010-04-22 at 11:28 +0300, Heikki Linnakangas wrote:
 Simon Riggs wrote:
 On Thu, 2010-04-22 at 10:24 +0300, Heikki Linnakangas wrote:
 btree_redo:
   /*
* Note that if all heap tuples were LP_DEAD then we will be
* returning InvalidTransactionId here. This seems very unlikely
* in practice.
*/
 If none of the removed heap tuples were present anymore, we currently
 return InvalidTransactionId, which kills/waits out all read-only
 queries. But if none of the tuples were present anymore, the read-only
 queries wouldn't have seen them anyway, so ISTM that we should treat
 InvalidTransactionId return value as we don't need to kill anyone.
 That's not the point. The tuples were not themselves the sole focus,
 Yes, they were. We're replaying a b-tree deletion record, which removes
 pointers to some heap tuples, making them unreachable to any read-only
 queries. If any of them still need to be visible to read-only queries,
 we have a conflict. But if all of the heap tuples are gone already,
 removing the index pointers to them can't change the situation for any
 query. If any of them should've been visible to a query, the damage was
 done already by whoever pruned the heap tuples leaving just the
 tombstone LP_DEAD item pointers (in the heap) behind.
 
 You're missing my point. Those tuples are indicators of what may lie
 elsewhere in the database, completely unreferenced by this WAL record.
 Just because these referenced tuples are gone doesn't imply that all
 tuple versions written by the as yet-unknown-xids are also gone. We
 can't infer anything about the whole database just from one small group
 of records.

Have you got an example of that?

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Hot Standby b-tree delete records review

2010-04-22 Thread Simon Riggs
On Thu, 2010-04-22 at 11:56 +0300, Heikki Linnakangas wrote:

  If none of the removed heap tuples were present anymore, we currently
  return InvalidTransactionId, which kills/waits out all read-only
  queries. But if none of the tuples were present anymore, the read-only
  queries wouldn't have seen them anyway, so ISTM that we should treat
  InvalidTransactionId return value as we don't need to kill anyone.
  That's not the point. The tuples were not themselves the sole focus,
  Yes, they were. We're replaying a b-tree deletion record, which removes
  pointers to some heap tuples, making them unreachable to any read-only
  queries. If any of them still need to be visible to read-only queries,
  we have a conflict. But if all of the heap tuples are gone already,
  removing the index pointers to them can't change the situation for any
  query. If any of them should've been visible to a query, the damage was
  done already by whoever pruned the heap tuples leaving just the
  tombstone LP_DEAD item pointers (in the heap) behind.
  
  You're missing my point. Those tuples are indicators of what may lie
  elsewhere in the database, completely unreferenced by this WAL record.
  Just because these referenced tuples are gone doesn't imply that all
  tuple versions written by the as yet-unknown-xids are also gone. We
  can't infer anything about the whole database just from one small group
  of records.
 
 Have you got an example of that?

I don't need one, I have suggested the safe route. In order to infer
anything, and thereby further optimise things, we would need proof that
no cases can exist, which I don't have. Perhaps we can add "yet", not
sure about that either.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Hot Standby b-tree delete records review

2010-04-22 Thread Heikki Linnakangas
Simon Riggs wrote:
 On Thu, 2010-04-22 at 11:56 +0300, Heikki Linnakangas wrote:
 
 If none of the removed heap tuples were present anymore, we currently
 return InvalidTransactionId, which kills/waits out all read-only
 queries. But if none of the tuples were present anymore, the read-only
 queries wouldn't have seen them anyway, so ISTM that we should treat
 InvalidTransactionId return value as we don't need to kill anyone.
 That's not the point. The tuples were not themselves the sole focus,
 Yes, they were. We're replaying a b-tree deletion record, which removes
 pointers to some heap tuples, making them unreachable to any read-only
 queries. If any of them still need to be visible to read-only queries,
 we have a conflict. But if all of the heap tuples are gone already,
 removing the index pointers to them can't change the situation for any
 query. If any of them should've been visible to a query, the damage was
 done already by whoever pruned the heap tuples leaving just the
 tombstone LP_DEAD item pointers (in the heap) behind.
 You're missing my point. Those tuples are indicators of what may lie
 elsewhere in the database, completely unreferenced by this WAL record.
 Just because these referenced tuples are gone doesn't imply that all
 tuple versions written by the as yet-unknown-xids are also gone. We
 can't infer anything about the whole database just from one small group
 of records.
 Have you got an example of that?
 
 I don't need one, I have suggested the safe route. In order to infer
 anything, and thereby further optimise things, we would need proof that
 no cases can exist, which I don't have. Perhaps we can add yet, not
 sure about that either.

It's good to be safe rather than sorry, but I'd still like to know,
because I'm quite surprised by that, and it got me worried that I don't
understand how hot standby works as well as I thought I did. I thought
the point of stopping replay/killing queries at a b-tree deletion record
is precisely that it makes some heap tuples invisible to running
read-only queries. If it doesn't make any tuples invisible, why do any
queries need to be killed? And why was it OK for them to be running just
before replaying the b-tree deletion record?

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Hot Standby b-tree delete records review

2010-04-22 Thread Simon Riggs
On Thu, 2010-04-22 at 12:18 +0300, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  On Thu, 2010-04-22 at 11:56 +0300, Heikki Linnakangas wrote:
  
  If none of the removed heap tuples were present anymore, we currently
  return InvalidTransactionId, which kills/waits out all read-only
  queries. But if none of the tuples were present anymore, the read-only
  queries wouldn't have seen them anyway, so ISTM that we should treat
  InvalidTransactionId return value as we don't need to kill anyone.
  That's not the point. The tuples were not themselves the sole focus,
  Yes, they were. We're replaying a b-tree deletion record, which removes
  pointers to some heap tuples, making them unreachable to any read-only
  queries. If any of them still need to be visible to read-only queries,
  we have a conflict. But if all of the heap tuples are gone already,
  removing the index pointers to them can'ẗ change the situation for any
  query. If any of them should've been visible to a query, the damage was
  done already by whoever pruned the heap tuples leaving just the
  tombstone LP_DEAD item pointers (in the heap) behind.
  You're missing my point. Those tuples are indicators of what may lie
  elsewhere in the database, completely unreferenced by this WAL record.
  Just because these referenced tuples are gone doesn't imply that all
  tuple versions written by the as yet-unknown-xids are also gone. We
  can't infer anything about the whole database just from one small group
  of records.
  Have you got an example of that?
  
  I don't need one, I have suggested the safe route. In order to infer
  anything, and thereby further optimise things, we would need proof that
  no cases can exist, which I don't have. Perhaps we can add yet, not
  sure about that either.
 
 It's good to be safe rather than sorry, but I'd still like to know
 because I'm quite surprised by that, and got me worried that I don't
 understand how hot standby works as well as I thought I did. I thought
 the point of stopping replay/killing queries at a b-tree deletion record
 is precisely that it makes some heap tuples invisible to running
 read-only queries. If it doesn't make any tuples invisible, why do any
 queries need to be killed? And why was it OK for them to be running just
 before replaying the b-tree deletion record?

I'm sorry but I'm too busy to talk further on this today. Since we are
discussing a further optimisation rather than a bug, I hope it is OK to
come back to this again later.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] don't allow walsender to consume superuser_reserved_connection slots, or during shutdown

2010-04-22 Thread Robert Haas
On Wed, Apr 21, 2010 at 10:01 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Here's the fine patch.  The actual code changes are simple and seem to
 work as expected, but I struggled a bit with the phrasing of the
 messages.  Feel free to suggest improvements.

 Stick with the original wording?  I don't really see a need to change it.

I don't think that's a good idea.  If we just say that the remaining
connection slots are for superusers, someone will inevitably ask us
why their superuser replication can't connect.  I think it's important
to phrase things as accurately as possible.

...Robert



Re: [HACKERS] Thread safety and libxml2

2010-04-22 Thread Robert Haas
On Mon, Apr 19, 2010 at 10:52 AM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Feb 18, 2010 at 8:41 PM, Bruce Momjian br...@momjian.us wrote:
 Peter Eisentraut wrote:
 On ons, 2009-12-30 at 12:55 -0500, Greg Smith wrote:
  Basically, configure failed on their OpenBSD system because thread
  safety is on but the libxml2 wasn't compiled with threaded support:
  http://xmlsoft.org/threads.html
 
  Disabling either feature (no --with-libxml or --disable-thread-safety)
  gives a working build.

 This could perhaps be fixed by excluding libxml when running the thread
 test.  The thread result is only used in the client libraries and libxml
 is only used in the backend, so those two shouldn't meet each other in
 practice.

 The attached patch removes -lxml2 from the link line of the thread
 test program.  Comments?  Can anyone test this fixes the OpenBSD
 problem?

 Can someone take the time to test whether this patch fixes the
 problem?  This is on the list of open items for PG 9.0, but
 considering that there's been a proposed patch available for almost
 two months and no responses to this thread, it may be time to conclude
 that nobody cares very much - in which case we can either remove this
 item or relocate it to the TODO list.

Since no one has responded to this, I'm moving this to the section of
the open items list called long-term issues: These items are not
9.0-specific. They should be fixed eventually, but not for now.  I am
inclined to think this isn't worth adding to the main TODO list.  If
someone complains about it again, we can ask them to test the patch.
If not, I don't see much point in investing any more time in it.

...Robert



recovery_connections cannot start (was Re: [HACKERS] master in standby mode croaks)

2010-04-22 Thread Robert Haas
On Sat, Apr 17, 2010 at 6:52 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sat, Apr 17, 2010 at 6:41 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Sat, 2010-04-17 at 17:44 -0400, Robert Haas wrote:

  I will change the error message.

 I gave a good deal of thought to trying to figure out a cleaner
 solution to this problem than just changing the error message and
 failed.  So let's change the error message.  Of course I'm not quite
 sure what we should change it TO, given that the situation is the
 result of an interaction between three different GUCs and we have no
 way to distinguish which one(s) are the problem.

 "You need all three" covers it.

 Actually you need standby_connections and either archive_mode=on or
 max_wal_senders > 0, I think.

One way we could fix this is to use 2 bits rather than 1 for
XLogStandbyInfoMode.  One bit could indicate that either
archive_mode=on or max_wal_senders > 0, and the second bit could
indicate that recovery_connections=on.  If the second bit is unset, we
could emit the existing complaint:

recovery connections cannot start because the recovery_connections
parameter is disabled on the WAL source server

If the other bit is unset, then we could instead complain:

recovery connections cannot start because archive_mode=off and
max_wal_senders=0 on the WAL source server

If we don't want to use two bits there, it's hard to really describe
all the possibilities in a reasonable number of characters.  The only
thing I can think of is to print a message and a hint:

recovery_connections cannot start due to incorrect settings on the WAL
source server
HINT: make sure recovery_connections=on and either archive_mode=on or
max_wal_senders > 0

I haven't checked whether the hint would be displayed in the log on
the standby, but presumably we could make that be the case if it's not
already.

I think the first way is better because it gives the user more
specific information about what they need to fix.  Thinking about how
each case might happen, since the default for recovery_connections is
'on', it seems that recovery_connections=off will likely only be an
issue if the user has explicitly turned it off.  The other case, where
archive_mode=off and max_wal_senders=0, will likely only occur if
someone takes a snapshot of the master without first setting up
archiving or SR.  Both of these will probably happen relatively
rarely, but since we're burning a whole byte for XLogStandbyInfoMode
(plus 3 more bytes of padding?), it seems like we might as well snag
one more bit for clarity.

Thoughts?

...Robert



Re: [HACKERS] BETA

2010-04-22 Thread Robert Haas
On Wed, Apr 21, 2010 at 9:41 AM, Marc G. Fournier scra...@hub.org wrote:
 On Wed, 21 Apr 2010, Robert Haas wrote:
 Well, never mind that then.  How about a beta next week?

 I'm good for that ...

Anyone else want to weigh in for or against this?

...Robert



Re: [HACKERS] BETA

2010-04-22 Thread Marc G. Fournier

On Thu, 22 Apr 2010, Robert Haas wrote:


On Wed, Apr 21, 2010 at 9:41 AM, Marc G. Fournier scra...@hub.org wrote:

On Wed, 21 Apr 2010, Robert Haas wrote:

Well, never mind that then.  How about a beta next week?


I'm good for that ...


Anyone else want to weigh in for or against this?


We're discussing scheduling on -core right now, triggered by your email, 
and will put out a notice shortly ... although we did just do a back 
branch release, we have a second one that has to be done, so we're trying 
to balance schedules around doing both, but not simultaneously ...



Marc G. FournierHub.Org Hosting Solutions S.A.
scra...@hub.org http://www.hub.org

Yahoo: yscrappy   Skype: hub.org   ICQ: 7615664   MSN: scra...@hub.org



Re: [HACKERS] BETA

2010-04-22 Thread Robert Haas
On Thu, Apr 22, 2010 at 12:18 PM, Marc G. Fournier scra...@hub.org wrote:
 We're discussing scheduling on -core right now, triggered by your email, and
 will put out a notice shortly ... although we did just do a back branch
 release, we have a second one that has to be done, so we're trying to
 balance schedules around doing both, but not simultaneously ...

OK, thanks!

...Robert



Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-22 Thread Erik Rijkers
On Sun, April 18, 2010 13:01, Simon Riggs wrote:

 OK, I'll put a spinlock around access to the head of the array.

 v2 patch attached


knownassigned_sortedarray.v2.diff applied to cvs HEAD (2010.04.21 22:36)

I have done a few smaller tests (scale 500, clients 1, 20):

init:
  pgbench -h /tmp -p 6565 -U rijkers -i -s 500 replicas


4x primary, clients 1:
scale: 500 clients:  1  tps = 11496.372655  pgbench -p 6565 -n -S -c 1 -T 900 -j 1
scale: 500 clients:  1  tps = 11580.141685  pgbench -p 6565 -n -S -c 1 -T 900 -j 1
scale: 500 clients:  1  tps = 11478.294747  pgbench -p 6565 -n -S -c 1 -T 900 -j 1
scale: 500 clients:  1  tps = 11741.432016  pgbench -p 6565 -n -S -c 1 -T 900 -j 1

4x standby, clients 1:
scale: 500 clients:  1  tps =   727.217672  pgbench -p 6566 -n -S -c 1 -T 900 -j 1
scale: 500 clients:  1  tps =   785.431011  pgbench -p 6566 -n -S -c 1 -T 900 -j 1
scale: 500 clients:  1  tps =   825.291817  pgbench -p 6566 -n -S -c 1 -T 900 -j 1
scale: 500 clients:  1  tps =   868.107638  pgbench -p 6566 -n -S -c 1 -T 900 -j 1


4x primary, clients 20:
scale: 500 clients: 20  tps = 34963.054102  pgbench -p 6565 -n -S -c 20 -T 900 -j 1
scale: 500 clients: 20  tps = 34818.985407  pgbench -p 6565 -n -S -c 20 -T 900 -j 1
scale: 500 clients: 20  tps = 34964.545013  pgbench -p 6565 -n -S -c 20 -T 900 -j 1
scale: 500 clients: 20  tps = 34959.210687  pgbench -p 6565 -n -S -c 20 -T 900 -j 1

4x standby, clients 20:
scale: 500 clients: 20  tps =  1099.808192  pgbench -p 6566 -n -S -c 20 -T 900 -j 1
scale: 500 clients: 20  tps =   905.926703  pgbench -p 6566 -n -S -c 20 -T 900 -j 1
scale: 500 clients: 20  tps =   943.531989  pgbench -p 6566 -n -S -c 20 -T 900 -j 1
scale: 500 clients: 20  tps =  1082.215913  pgbench -p 6566 -n -S -c 20 -T 900 -j 1


This is the same behaviour (i.e. an extremely slow standby) that I saw earlier
(and which caused the original post, btw).  In that earlier instance, the
extreme slowness disappeared later, after many hours, maybe even days (without
bouncing either primary or standby).

I have no idea what could cause this; is no one else seeing this?

(if I have time I'll repeat on other hardware in the weekend)

any comment is welcome...


Erik Rijkers






Re: [HACKERS] libpq connectoin redirect

2010-04-22 Thread feng tian

While these can be handled at a higher level, for example by setting 
up LDAP or, as Heikki suggested, tricking DNS, the problem is that 
I don't have control over how users connect to the server.  They 
may not use LDAP.  A solution like pgbouncer has advantages: users
just get one IP/port and everything else happens automatically.

Thanks,


 Subject: Re: [HACKERS] libpq connectoin redirect
 From: li...@jwp.name
 Date: Wed, 21 Apr 2010 15:52:39 -0700
 CC: pgsql-hackers@postgresql.org
 To: ft...@hotmail.com
 
 On Apr 20, 2010, at 10:03 PM, feng tian wrote:
  Another way to do this, is to send the client an redirect message.  When 
  client connect to 127.0.0.10, instead of accepting the connection, it can 
  reply to client telling it to reconnect to one of the server on 
  127.0.0.11-14.  
 
 ISTM that this would be better handled at a higher-level. That is, given a 
 server (127.0.0.10) that holds 127.0.0.11-14. Connect to that server and 
 query for the correct target host.
  
_
Hotmail is redefining busy with tools for the New Busy. Get more from your 
inbox.
http://www.windowslive.com/campaign/thenewbusy?ocid=PID28326::T:WLMTAGL:ON:WL:en-US:WM_HMP:042010_2

Re: [HACKERS] libpq connectoin redirect

2010-04-22 Thread feng tian

Hi, John,

The change will be on the libpq client side.  I am not saying this is a general
solution for distributed transactions or scale-out.  However, in many cases it
is very useful.  For example, in my case I have about 100 departments, each with
its own database.  The balancer machine can just redirect to the right box
according to database/user.  The 4 boxes I have may not even get a domain name
or a static IP.  In another scenario, if I have some kind of replication set up,
I can send transaction processing to the master and analytic reporting queries
to the slaves.

Thanks,
Feng

feng tian wrote:
Hi,

I want to load balance a postgres server on 4 physical 
machines, say 
127.0.0.11-14.  I can set up a pgbouncer on 127.0.0.10 and 
connection 
pooling to my four boxes.  However, the traffic from/to clients
 will 
go through an extra hop.  Another way to do this, is to send 
the 
client an redirect message.  When client connect to 
127.0.0.10, 
instead of accepting the connection, it can reply to client 
telling it 
to reconnect to one of the server on 127.0.0.11-14. 

I am planning to write/submit a patch to do that.  I wonder if 
there 
is similar effort in extending libpq protocol, or, if you have 
better 
ideas on how to achieve this.

how do you plan on maintaining consistency, transactional 
integrity and 
atomicity of updates across these 4 machines?
  
_
The New Busy is not the too busy. Combine all your e-mail accounts with Hotmail.
http://www.windowslive.com/campaign/thenewbusy?tile=multiaccountocid=PID28326::T:WLMTAGL:ON:WL:en-US:WM_HMP:042010_4
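The routing decision described above — mapping a database/user to one of the backends at 127.0.0.11-14 before redirecting — is at its core a simple lookup on the balancer. A minimal sketch of that lookup, with a made-up routing table and helper name (`route_for_db` and the department databases are illustrations, not part of libpq or any proposed patch):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical routing table: department database -> backend host.
 * In the proposal this mapping would live on the balancer at
 * 127.0.0.10, which would answer a connection attempt with a
 * redirect instead of accepting it. */
struct route { const char *dbname; const char *host; };

static const struct route routes[] = {
    { "sales",     "127.0.0.11" },
    { "finance",   "127.0.0.12" },
    { "hr",        "127.0.0.13" },
    { "analytics", "127.0.0.14" },
};

/* Pick the host a client should reconnect to; unknown databases
 * stay on the balancer itself. */
static const char *route_for_db(const char *dbname)
{
    for (size_t i = 0; i < sizeof(routes) / sizeof(routes[0]); i++)
        if (strcmp(routes[i].dbname, dbname) == 0)
            return routes[i].host;
    return "127.0.0.10";
}
```

The interesting part of the proposal is not the lookup but teaching the libpq client to retry against the returned host; the sketch only shows what the balancer would compute.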

Re: [HACKERS] Thoughts on pg_hba.conf rejection

2010-04-22 Thread Alvaro Herrera
Tom Lane escribió:
 Robert Haas robertmh...@gmail.com writes:
  On Tue, Apr 20, 2010 at 7:13 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  (You might want to look back at the archived discussions about how to
  avoid storing entries for temp tables in these catalogs; that poses
  many of the same issues.)
 
  Do you happen to know what a good search term might be?  I tried
  searching for things like pg_class temp tables and pg_class
  temporary tables and didn't come up with much.
 
 I found this thread:
 http://archives.postgresql.org/pgsql-hackers/2008-07/msg00593.php
 I claimed in that message that there were previous discussions but
 I did not come across them right away.

I vaguely remember that there was a discussion about pg_attribute and
the extra rows for system rows for all tables, which diverged into a
discussion about temp tables and those other extra rows.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-22 Thread Greg Smith

Erik Rijkers wrote:

This is the same behaviour (i.e. extreme slow standby) that I saw earlier (and 
which caused the
original post, btw).  In that earlier instance, the extreme slowness 
disappeared later, after many
hours maybe even days (without bouncing either primary or standby).
  


Any possibility the standby is built with assertions turned on?  That's 
often the cause of this type of difference between pgbench results on 
two systems, and it's easy to introduce when everyone is building from 
source.  You should try this on both systems:


psql -c "show debug_assertions"


just to rule that out.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-22 Thread Mark Kirkwood

Greg Smith wrote:

Erik Rijkers wrote:
This is the same behaviour (i.e. extreme slow standby) that I saw 
earlier (and which caused the
original post, btw).  In that earlier instance, the extreme slowness 
disappeared later, after many

hours maybe even days (without bouncing either primary or standby).
  


Any possibility the standby is built with assertions turned out?  
That's often the cause of this type of difference between pgbench 
results on two systems, which easy to introduce when everyone is 
building from source.  You should try this on both systems:


psql -c show debug_assertions




Or even:

pg_config --configure

on both systems might be worth checking.


regards

Mark




Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-22 Thread Erik Rijkers
On Thu, April 22, 2010 23:54, Mark Kirkwood wrote:
 Greg Smith wrote:
 Erik Rijkers wrote:
 This is the same behaviour (i.e. extreme slow standby) that I saw
 earlier (and which caused the
 original post, btw).  In that earlier instance, the extreme slowness
 disappeared later, after many
 hours maybe even days (without bouncing either primary or standby).


 Any possibility the standby is built with assertions turned out?
 That's often the cause of this type of difference between pgbench
 results on two systems, which easy to introduce when everyone is
 building from source.  You should try this on both systems:

 psql -c show debug_assertions



 Or even:

 pg_config --configure

 on both systems might be worth checking.

(these instances are on a single server, btw)

primary:

$ pg_config
BINDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/bin
DOCDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/share/doc
HTMLDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/share/doc
INCLUDEDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/include
PKGINCLUDEDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/include
INCLUDEDIR-SERVER = 
/var/data1/pg_stuff/pg_installations/pgsql.sr_primary/include/server
LIBDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/lib
PKGLIBDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/lib
LOCALEDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/share/locale
MANDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/share/man
SHAREDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/share
SYSCONFDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/etc
PGXS = 
/var/data1/pg_stuff/pg_installations/pgsql.sr_primary/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/var/data1/pg_stuff/pg_installations/pgsql.sr_primary' 
'--with-pgport=6565'
'--enable-depend' '--with-openssl' '--with-perl' '--with-libxml' 
'--with-libxslt'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2
CFLAGS = -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement
-Wendif-labels -fno-strict-aliasing -fwrapv
CFLAGS_SL = -fpic
LDFLAGS = -Wl,-rpath,'/var/data1/pg_stuff/pg_installations/pgsql.sr_primary/lib'
LDFLAGS_SL =
LIBS = -lpgport -lxslt -lxml2 -lssl -lcrypto -lz -lreadline -ltermcap -lcrypt 
-ldl -lm
VERSION = PostgreSQL 9.0devel-sr_primary
[data:port:db   PGPORT=6565   
PGDATA=/var/data1/pg_stuff/pg_installations/pgsql.sr_primary/data  
PGDATABASE=replicas]
2010.04.22 20:55:28 rijk...@denkraam:~/src/perl/85devel [0]
$ time ./run_test_suite.sh
[data:port:db   PGPORT=6565   
PGDATA=/var/data1/pg_stuff/pg_installations/pgsql.sr_primary/data  
PGDATABASE=replicas]
2010.04.22 21:00:26 rijk...@denkraam:~/src/perl/85devel [1]

standby:

$ pg_config
BINDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/bin
DOCDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/share/doc
HTMLDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/share/doc
INCLUDEDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/include
PKGINCLUDEDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/include
INCLUDEDIR-SERVER = 
/var/data1/pg_stuff/pg_installations/pgsql.sr_primary/include/server
LIBDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/lib
PKGLIBDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/lib
LOCALEDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/share/locale
MANDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/share/man
SHAREDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/share
SYSCONFDIR = /var/data1/pg_stuff/pg_installations/pgsql.sr_primary/etc
PGXS = 
/var/data1/pg_stuff/pg_installations/pgsql.sr_primary/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/var/data1/pg_stuff/pg_installations/pgsql.sr_primary' 
'--with-pgport=6565'
'--enable-depend' '--with-openssl' '--with-perl' '--with-libxml' 
'--with-libxslt'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2
CFLAGS = -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement
-Wendif-labels -fno-strict-aliasing -fwrapv
CFLAGS_SL = -fpic
LDFLAGS = -Wl,-rpath,'/var/data1/pg_stuff/pg_installations/pgsql.sr_primary/lib'
LDFLAGS_SL =
LIBS = -lpgport -lxslt -lxml2 -lssl -lcrypto -lz -lreadline -ltermcap -lcrypt 
-ldl -lm
VERSION = PostgreSQL 9.0devel-sr_primary



$ grep -Ev '(^[[:space:]]*#)|(^$)' pgsql.sr_*ry/data/postgresql.conf
pgsql.sr_primary/data/postgresql.conf:data_directory =
'/var/data1/pg_stuff/pg_installations/pgsql.sr_primary/data'
pgsql.sr_primary/data/postgresql.conf:port = 6565
pgsql.sr_primary/data/postgresql.conf:max_connections = 100
pgsql.sr_primary/data/postgresql.conf:shared_buffers = 256MB
pgsql.sr_primary/data/postgresql.conf:checkpoint_segments = 50
pgsql.sr_primary/data/postgresql.conf:archive_mode = 'on'
pgsql.sr_primary/data/postgresql.conf:archive_command= 'cp %p
/var/data1/pg_stuff/dump/replication_archive/%f'

Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-22 Thread Simon Riggs
On Thu, 2010-04-22 at 20:39 +0200, Erik Rijkers wrote:
 On Sun, April 18, 2010 13:01, Simon Riggs wrote:

 any comment is welcome...

Please can you re-run with -l and post me the file of times

Please also rebuild using --enable-profile so we can see what's
happening.

Can you also try the enclosed patch which implements prefetching during
replay of btree delete records. (Need to set effective_io_concurrency)

Thanks for your further help.

-- 
 Simon Riggs   www.2ndQuadrant.com
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index f4c7bf4..9918688 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -578,6 +578,8 @@ btree_xlog_delete_get_latestRemovedXid(XLogRecord *record)
 	OffsetNumber 	hoffnum;
 	TransactionId	latestRemovedXid = InvalidTransactionId;
 	TransactionId	htupxid = InvalidTransactionId;
+	TransactionId	oldestxmin = GetCurrentOldestXmin(true, true);
+	TransactionId	latestCompletedXid;
 	int i;
 
 	/*
@@ -586,8 +588,12 @@ btree_xlog_delete_get_latestRemovedXid(XLogRecord *record)
 	 * That returns InvalidTransactionId, and so will conflict with
 	 * users, but since we just worked out that's zero people, its OK.
 	 */
-	if (CountDBBackends(InvalidOid) == 0)
-		return latestRemovedXid;
+	if (!TransactionIdIsValid(oldestxmin))
+		return oldestxmin;
+
+	LWLockAcquire(ProcArrayLock, LW_SHARED);
+	latestCompletedXid = ShmemVariableCache->latestCompletedXid;
+	LWLockRelease(ProcArrayLock);
 
 	/*
 	 * Get index page
@@ -603,6 +609,27 @@ btree_xlog_delete_get_latestRemovedXid(XLogRecord *record)
 	 */
 	unused = (OffsetNumber *) ((char *) xlrec + SizeOfBtreeDelete);
 
+	/*
+	 * Prefetch the heap buffers.
+	 */
+	for (i = 0; i < xlrec->nitems; i++)
+	{
+		/*
+		 * Identify the index tuple about to be deleted
+		 */
+		iitemid = PageGetItemId(ipage, unused[i]);
+		itup = (IndexTuple) PageGetItem(ipage, iitemid);
+
+		/*
+		 * Locate the heap page that the index tuple points at
+		 */
+		hblkno = ItemPointerGetBlockNumber(&(itup->t_tid));
+		XLogPrefetchBuffer(xlrec->hnode, MAIN_FORKNUM, hblkno);
+	}
+
+	/*
+	 * Read through the heap tids
+	 */
 	for (i = 0; i < xlrec->nitems; i++)
 	{
 		/*
@@ -659,6 +686,16 @@ btree_xlog_delete_get_latestRemovedXid(XLogRecord *record)
 latestRemovedXid = htupxid;
 
 			htupxid = HeapTupleHeaderGetXmax(htuphdr);
+
+			/*
+			 * Stop searching when we've found a recent xid
+			 */
+			if (TransactionIdFollowsOrEquals(htupxid,latestCompletedXid))
+			{
+UnlockReleaseBuffer(hbuffer);
+break;
+			}
+
 			if (TransactionIdFollows(htupxid, latestRemovedXid))
 latestRemovedXid = htupxid;
 		}
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 9ee2036..3ea3a40 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -342,6 +342,16 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 	return buffer;
 }
 
+void
+XLogPrefetchBuffer(RelFileNode rnode, ForkNumber forknum,
+	BlockNumber blkno)
+{
+	Relation reln = CreateFakeRelcacheEntry(rnode);
+
+	reln->rd_istemp = false;
+
+	PrefetchBuffer(reln, forknum, blkno);
+}
 
 /*
  * Struct actually returned by XLogFakeRelcacheEntry, though the declared
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 5a214c8..bb23c16 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -933,6 +933,21 @@ TransactionIdIsActive(TransactionId xid)
 TransactionId
 GetOldestXmin(bool allDbs, bool ignoreVacuum)
 {
+	TransactionId result = GetCurrentOldestXmin(allDbs, ignoreVacuum);
+
+	/*
+	 * Compute the cutoff XID, being careful not to generate a permanent XID
+	 */
+	result -= vacuum_defer_cleanup_age;
+	if (!TransactionIdIsNormal(result))
+		result = FirstNormalTransactionId;
+
+	return result;
+}
+
+TransactionId
+GetCurrentOldestXmin(bool allDbs, bool ignoreVacuum)
+{
 	ProcArrayStruct *arrayP = procArray;
 	TransactionId result;
 	int			index;
@@ -985,13 +1000,6 @@ GetOldestXmin(bool allDbs, bool ignoreVacuum)
 
 	LWLockRelease(ProcArrayLock);
 
-	/*
-	 * Compute the cutoff XID, being careful not to generate a permanent XID
-	 */
-	result -= vacuum_defer_cleanup_age;
-	if (!TransactionIdIsNormal(result))
-		result = FirstNormalTransactionId;
-
 	return result;
 }
 
diff --git a/src/include/access/xlogutils.h b/src/include/access/xlogutils.h
index 8477f88..caa8aa3 100644
--- a/src/include/access/xlogutils.h
+++ b/src/include/access/xlogutils.h
@@ -28,6 +28,9 @@ extern void XLogTruncateRelation(RelFileNode rnode, ForkNumber forkNum,
 extern Buffer XLogReadBuffer(RelFileNode rnode, BlockNumber blkno, bool init);
 extern Buffer XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 	   BlockNumber blkno, ReadBufferMode mode);
+extern void XLogPrefetchBuffer(RelFileNode rnode, ForkNumber forknum,
+		BlockNumber blkno);
+
 
 extern Relation 
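The prefetch pass in the patch above follows a common two-loop shape: issue readahead hints for every heap block first, then do the blocking reads. Outside the buffer manager the same access pattern can be sketched with posix_fadvise — this is only an illustration of the idea behind XLogPrefetchBuffer/effective_io_concurrency, not PostgreSQL's actual PrefetchBuffer implementation:

```c
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

#define BLCKSZ 8192

/* Read the given blocks of fd, hinting the kernel about all of them
 * first so the reads can overlap, mirroring the two-loop shape of
 * btree_xlog_delete_get_latestRemovedXid() in the patch above. */
static ssize_t read_blocks(int fd, const long *blkno, int nblocks, char *out)
{
    ssize_t total = 0;

    /* Pass 1: readahead hints (the analogue of XLogPrefetchBuffer). */
    for (int i = 0; i < nblocks; i++)
        (void) posix_fadvise(fd, (off_t) blkno[i] * BLCKSZ, BLCKSZ,
                             POSIX_FADV_WILLNEED);

    /* Pass 2: the blocking reads, which can now hit readahead. */
    for (int i = 0; i < nblocks; i++)
    {
        ssize_t n = pread(fd, out + (size_t) i * BLCKSZ,
                          BLCKSZ, (off_t) blkno[i] * BLCKSZ);
        if (n < 0)
            return -1;
        total += n;
    }
    return total;
}
```

The payoff is that random heap reads triggered during WAL replay can be in flight concurrently instead of serializing, which is exactly what the single-threaded startup process needs.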

Re: recovery_connections cannot start (was Re: [HACKERS] master in standby mode croaks)

2010-04-22 Thread Fujii Masao
On Fri, Apr 23, 2010 at 1:04 AM, Robert Haas robertmh...@gmail.com wrote:
 One way we could fix this is use 2 bits rather than 1 for
 XLogStandbyInfoMode.  One bit could indicate that either
 archive_mode=on or max_wal_senders>0, and the second bit could
 indicate that recovery_connections=on.  If the second bit is unset, we
 could emit the existing complaint:

 recovery connections cannot start because the recovery_connections
 parameter is disabled on the WAL source server

 If the other bit is unset, then we could instead complain:

 recovery connections cannot start because archive_mode=off and
 max_wal_senders=0 on the WAL source server

 If we don't want to use two bits there, it's hard to really describe
 all the possibilities in a reasonable number of characters.  The only
 thing I can think of is to print a message and a hint:

 recovery_connections cannot start due to incorrect settings on the WAL
 source server
 HINT: make sure recovery_connections=on and either archive_mode=on or
 max_wal_senders>0

 I haven't checked whether the hint would be displayed in the log on
 the standby, but presumably we could make that be the case if it's not
 already.

 I think the first way is better because it gives the user more
 specific information about what they need to fix.  Thinking about how
 each case might happen, since the default for recovery_connections is
 'on', it seems that recovery_connections=off will likely only be an
 issue if the user has explicitly turned it off.  The other case, where
 archive_mode=off and max_wal_senders=0, will likely only occur if
 someone takes a snapshot of the master without first setting up
 archiving or SR.  Both of these will probably happen relatively
 rarely, but since we're burning a whole byte for XLogStandbyInfoMode
 (plus 3 more bytes of padding?), it seems like we might as well snag
 one more bit for clarity.

 Thoughts?

I like the second choice since it's simpler and enough for me.
But I have no objection to the first.

When we encounter the error, we would need to not only change
those parameter values but also take a fresh base backup and
restart the standby using it.  A description of this required
procedure needs to be in the documentation or error message, I think.
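The two-bit scheme discussed above boils down to ordinary flag bits tested independently. A rough sketch — the flag names are invented for illustration (the real field is XLogStandbyInfoMode, and the real code would live in the standby's startup checks):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the two bits Robert proposes carrying in
 * the standby-info flag; these identifiers are illustrative only. */
#define STANDBY_WAL_SOURCE_OK   0x01    /* archive_mode=on or max_wal_senders > 0 */
#define STANDBY_RECOVERY_CONNS  0x02    /* recovery_connections=on */

/* Map the flag bits to the more specific error messages, returning
 * NULL when recovery connections can start. */
static const char *standby_start_error(unsigned info)
{
    if (!(info & STANDBY_RECOVERY_CONNS))
        return "recovery connections cannot start because the "
               "recovery_connections parameter is disabled on the "
               "WAL source server";
    if (!(info & STANDBY_WAL_SOURCE_OK))
        return "recovery connections cannot start because "
               "archive_mode=off and max_wal_senders=0 on the WAL "
               "source server";
    return NULL;
}
```

The design point is simply that two independent bits let the standby report which of the two preconditions actually failed, instead of one combined message.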

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] testing HS/SR - 1 vs 2 performance

2010-04-22 Thread Mark Kirkwood

Erik Rijkers wrote:


This is the same behaviour (i.e. extreme slow standby) that I saw earlier (and 
which caused the
original post, btw).  In that earlier instance, the extreme slowness 
disappeared later, after many
hours maybe even days (without bouncing either primary or standby).

I have no idea what could cause this; is no one else is seeing this ?

(if I have time I'll repeat on other hardware in the weekend)

any comment is welcome...


  


I wonder if what you are seeing is perhaps due to the tables on the 
primary being almost completely cached (from the initial create) and 
those on the standby being at best partially so. That would explain why 
the standby performance catches up after a while (when its tables are 
equivalently cached).


One way to test this is to 'pre-cache' the standby by selecting every 
row from its tables before running the pgbench test.
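For the standard pgbench schema, that pre-caching pass might look like this (assuming a full sequential scan is enough to pull each relation through the standby's cache):

```sql
-- Run on the standby (port 6566) before starting pgbench:
SELECT count(*) FROM pgbench_accounts;
SELECT count(*) FROM pgbench_branches;
SELECT count(*) FROM pgbench_tellers;
SELECT count(*) FROM pgbench_history;
```

If caching is the explanation, the standby's tps should climb toward the primary's on the run that follows.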


regards

Mark




[HACKERS] why do we have rd_istemp?

2010-04-22 Thread Robert Haas
Given Relation rel, it looks to me like rel->rd_rel->relistemp will
always give the same answer as rel->rd_istemp.  So why have both?

...Robert



Re: [HACKERS] Assertion failure twophase.c (3) (testing HS/SR)

2010-04-22 Thread Erik Rijkers
On Thu, April 22, 2010 09:53, Heikki Linnakangas wrote:
 Can you still reproduce this or has some of the changes since then fixed
 it? We never quite figured out the cause..

I don't know for sure:
 Unfortunately, it does not happen always, or predictably.

The only thing that I established after that email was sent
is that the error can also occur without the postbio package
being installed (this has happened once).

It's a very easy test; I will probably run it a few more times.


 Erik Rijkers wrote:
 On Thu, March 4, 2010 17:00, Erik Rijkers wrote:
 in a 9.0devel, primary+standby, cvs from 2010.03.04 01:30

 With three patches:

   new_smart_shutdown_20100201.patch
   extend_format_of_recovery_info_funcs_v4.20100303.patch
   fix-KnownAssignedXidsRemoveMany-1.patch

   pg_dump -d $db8.4.2 | psql -d $db9.0devel-primary

 FailedAssertion, File: twophase.c, Line: 1201.


 For the record, this still happens (FailedAssertion, File: twophase.c, 
 Line: 1201.)
 (created 2010.03.13 23:49 cvs).

 Unfortunately, it does not happen always, or predictably.

 patches:
  new_smart_shutdown_20100201.patch
  extend_format_of_recovery_info_funcs_v4.20100303.patch
  (both here: 
 http://archives.postgresql.org/pgsql-hackers/2010-03/msg00446.php )

   (fix-KnownAssignedXidsRemoveMany-1.patch has been committed, I think?)


 I use commandlines like this to copy schemas across from 8.4.2 to 9.0devel:
 pg_dump -c -h /tmp -p 5432 -n myschema --no-owner --no-privileges mydb \
   | psql -1qtA -h /tmp -p 7575 -d replicas

 (the copied schemas were together 175 GB)

 As I seem to be the only one who finds this, I started looking what could be 
 unique in this
 install: and it would be postbio, which we use for its gist-indexing on 
 ranges
 (http://pgfoundry.org/projects/postbio/).  We use postbio's int_interval 
 type as a column type.
 But keep in mind that sometimes the whole dump+restore+replication completes 
 OK.


 Other installed modules are:
   contrib/btree_gist
   contrib/seg
   contrib/adminpack

 log_line_prefix = '%t %p %d %u start=%s ' # slave

 pgsql.sr_hotslave/logfile:

 2010-03-13 23:54:59 CET 15765   start=2010-03-13 23:54:59 CET LOG:  database 
 system was
 interrupted; last known up at 2010-03-13 23:54:31 CET
 cp: cannot stat
 `/var/data1/pg_stuff/dump/hotslave/replication_archive/00010001':
 No such file or directory
 2010-03-13 23:55:00 CET 15765   start=2010-03-13 23:54:59 CET LOG:  entering 
 standby mode
 2010-03-13 23:55:00 CET 15765   start=2010-03-13 23:54:59 CET LOG:  redo 
 starts at 0/120
 2010-03-13 23:55:00 CET 15765   start=2010-03-13 23:54:59 CET LOG:  
 consistent recovery state
 reached at 0/200
 2010-03-13 23:55:00 CET 15763   start=2010-03-13 23:54:59 CET LOG:  database 
 system is ready to
 accept read only connections
 TRAP: FailedAssertion(!(((xid) != ((TransactionId) 0))), File: 
 twophase.c, Line: 1201)
 2010-03-14 05:28:59 CET 15763   start=2010-03-13 23:54:59 CET LOG:  startup 
 process (PID 15765)
 was terminated by signal 6: Aborted
 2010-03-14 05:28:59 CET 15763   start=2010-03-13 23:54:59 CET LOG:  
 terminating any other active
 server processes


 Maybe I'll try now to setup a similar instance without postbio, to see if 
 the crash still
 occurs.

 hth,

 Erik Rijkers





 --
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com






Re: [HACKERS] shared_buffers documentation

2010-04-22 Thread Robert Haas
On Wed, Apr 21, 2010 at 2:54 AM, Greg Smith g...@2ndquadrant.com wrote:
 Jim Nasby wrote:

 I've also seen large shared buffer settings perform poorly outside of IO
 issues, presumably due to some kind of internal lock contention. I tried
 running 8.3 with 24G for a while, but dropped it back down to our default of
 8G after noticing some performance problems. Unfortunately I don't remember
 the exact details, let alone having a repeatable test case

 We got a report from Jignesh at Sun once that he had a benchmark workload
 where there was a clear performance wall at around 10GB of shared_buffers.
  At http://blogs.sun.com/jkshah/entry/postgresql_east_2008_talk_best he
 says:
 Shared Bufferpool getting better in 8.2, worth to increase it to 3GB (for
 32-bit PostgreSQL) but still
 not great to increase it more than 10GB (for 64-bit PostgreSQL)

 So your running into the same wall around the same amount just fuels the
 existing idea that there's an underlying scalability issue in there.  Nobody
 with the right hardware has put it under the light of a profiler yet as far
 as I know.

It might be interesting to see whether increasing
NUM_BUFFER_PARTITIONS, LOG2_NUM_LOCK_PARTITIONS, and
NUM_LOCK_PARTITIONS alleviates this problem at all.
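The partitioned locks mentioned above split one contended lock into N, each guarding a slice of the buffer mapping table chosen by hashing the buffer tag. A simplified sketch of the selection logic (not the actual bufmgr code — the backend hashes a full BufferTag with hash_any(); the hash here is a stand-in):

```c
#include <assert.h>
#include <stdint.h>

/* Power of two, like the backend's NUM_BUFFER_PARTITIONS, so the
 * mask below selects a partition without a modulo. */
#define NUM_BUFFER_PARTITIONS 16

/* Stand-in for a buffer tag: relation OID plus block number. */
typedef struct { uint32_t relid; uint32_t blockno; } buftag;

/* Cheap mixing hash; the backend uses hash_any() instead. */
static uint32_t buftag_hash(buftag tag)
{
    uint32_t h = tag.relid * 2654435761u;       /* Knuth multiplicative */
    h ^= tag.blockno + 0x9e3779b9u + (h << 6) + (h >> 2);
    return h;
}

/* Each partition guards its share of the buffer mapping table with
 * its own lock, so backends touching buffers that hash to different
 * partitions never contend on the same lock. */
static unsigned buf_partition(buftag tag)
{
    return buftag_hash(tag) & (NUM_BUFFER_PARTITIONS - 1);
}
```

Raising NUM_BUFFER_PARTITIONS just widens the mask, spreading the same traffic over more locks — which is why it is a plausible knob to try against a shared_buffers scaling wall.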

...Robert



Re: [HACKERS] why do we have rd_istemp?

2010-04-22 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Given Relation rel, it looks to me like rel->rd_rel->relistemp will
  always give the same answer as rel->rd_istemp.  So why have both?

Might be historical --- relistemp is pretty new.

regards, tom lane
