Re: [HACKERS] [PATCHES] Fix mdsync never-ending loop problem

2007-04-12 Thread Heikki Linnakangas

Tom Lane wrote:
> Heikki Linnakangas [EMAIL PROTECTED] writes:
>> My first thought is that the cycle_ctr just adds extra complexity. The 
>> canceled-flag really is the key in Takahiro-san's patch, so we don't 
>> need the cycle_ctr anymore.
>
> We don't have to have it in the sense of the code not working without it,
> but it probably pays for itself by eliminating useless fsyncs.  The
> overhead for it in my proposed implementation is darn near zero in the
> non-error case.  Also, Takahiro-san mentioned at one point that he was
> concerned to avoid useless fsyncs because of some property of the LDC
> patch --- I wasn't too clear on what, but maybe he can explain.


Ok. Perhaps we should not use the canceled-flag, and instead just remove the 
entry from pendingOpsTable, as we used to, whenever mdsync_in_progress 
isn't set. Otherwise we might accumulate a lot of canceled entries in the 
hash table if the checkpoint interval is long and relations are created and 
dropped as part of normal operation.
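Here is a rough illustration of the alternative suggested above. The idea is to only mark an entry canceled while a sync scan might be positioned on it, and otherwise to delete it at once. This is a minimal sketch under stated assumptions. The fixed-size array, `PendingEntry`, `remember_fsync_request()`, `forget_fsync_request()` and `count_entries()` are hypothetical stand-ins, not md.c's real pendingOpsTable or functions. The sketch also clears `canceled` on every new request, as the fix in this message does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for pendingOpsTable: a small
 * fixed-size array keyed by a single relfilenode number. */
#define MAX_PENDING 16

typedef struct
{
    bool     used;      /* slot currently holds an entry */
    unsigned relnode;   /* key (simplified to one number) */
    bool     canceled;  /* true = request canceled, not yet removed */
} PendingEntry;

static PendingEntry pending[MAX_PENDING];
static bool mdsync_in_progress = false;   /* set while a sync scan runs */

static PendingEntry *
find_entry(unsigned relnode)
{
    for (size_t i = 0; i < MAX_PENDING; i++)
        if (pending[i].used && pending[i].relnode == relnode)
            return &pending[i];
    return NULL;
}

/* Enter (or re-enter) an fsync request; always clears 'canceled'. */
static bool
remember_fsync_request(unsigned relnode)
{
    PendingEntry *entry = find_entry(relnode);

    for (size_t i = 0; entry == NULL && i < MAX_PENDING; i++)
    {
        if (!pending[i].used)
        {
            entry = &pending[i];
            entry->used = true;
            entry->relnode = relnode;
        }
    }
    if (entry == NULL)
        return false;               /* table full */
    entry->canceled = false;
    return true;
}

/* FORGET: delete outright unless a sync scan might be positioned on it. */
static void
forget_fsync_request(unsigned relnode)
{
    PendingEntry *entry = find_entry(relnode);

    if (entry == NULL)
        return;
    if (mdsync_in_progress)
        entry->canceled = true;     /* scan in progress: only mark it */
    else
        entry->used = false;        /* no scan running: remove it now */
}

/* Number of occupied slots, canceled or not. */
static int
count_entries(void)
{
    int n = 0;

    for (size_t i = 0; i < MAX_PENDING; i++)
        if (pending[i].used)
            n++;
    return n;
}
```

The trade-off is that canceled entries no longer pile up between checkpoints. In exchange, the FORGET path has to know whether a scan is active.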


I think there's one little bug in the patch:

1. AbsorbFsyncRequests is called. A FORGET message is received, and an 
entry in the hash table is marked as canceled
2. Another relation with the same relfilenode is created. This can 
happen after OID wrap-around
3. RememberFsyncRequest is called for the new relation. The old entry is 
still in the hash table, marked with the canceled-flag, so it's not touched.


The fsync request for the new relation is masked by the old canceled 
entry. The trivial fix is to always clear the flag in step 3:


--- md.c	2007-04-11 08:18:08.0 +0100
+++ md.c.new	2007-04-12 09:21:00.0 +0100
@@ -1161,9 +1161,9 @@
                                                   &found);
         if (!found)             /* new entry, so initialize it */
         {
-            entry->canceled = false;
             entry->cycle_ctr = mdsync_cycle_ctr;
         }
+        entry->canceled = false;
         /*
          * NB: it's intentional that we don't change cycle_ctr if the entry
          * already exists.  The fsync request must be treated as old, even
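The masking scenario in steps 1-3 can be modeled in a few lines. This is a hypothetical sketch, not PostgreSQL code. `Entry`, `remember_buggy()`, `remember_fixed()`, `forget()` and `would_fsync()` model a single pendingOpsTable slot. The only difference between the two remember functions is Heikki's change, which moves `canceled = false` out of the `if (!found)` branch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one pendingOpsTable slot. */
typedef struct
{
    bool     found;      /* entry exists in the table */
    bool     canceled;   /* FORGET absorbed, not yet removed */
    unsigned cycle_ctr;  /* cycle in which the request was made */
} Entry;

static unsigned mdsync_cycle_ctr = 0;

/* Before the fix: only a brand-new entry gets canceled = false. */
static void
remember_buggy(Entry *e)
{
    if (!e->found)
    {
        e->found = true;
        e->canceled = false;
        e->cycle_ctr = mdsync_cycle_ctr;
    }
}

/* After the fix: any new request un-cancels the entry. */
static void
remember_fixed(Entry *e)
{
    if (!e->found)
    {
        e->found = true;
        e->cycle_ctr = mdsync_cycle_ctr;
    }
    e->canceled = false;
}

/* Step 1: a FORGET message marks the existing entry canceled. */
static void
forget(Entry *e)
{
    if (e->found)
        e->canceled = true;
}

/* Would the next mdsync() fsync this entry? */
static bool
would_fsync(const Entry *e)
{
    return e->found && !e->canceled;
}
```

With the buggy version, the sequence remember, forget, remember (same relfilenode after OID wraparound) leaves the request masked. With the fixed version, it is fsync'd.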



--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [PATCHES] Fix mdsync never-ending loop problem

2007-04-12 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
> I believe Itagaki-san's motivation for tackling this in the LDC patch 
> was the fact that it can fsync the same file many times, and in the 
> worst case go to an endless loop, and adding delays inside the loop 
> makes it much more likely. After that is fixed, I doubt any of the 
> optimizations of trying to avoid extra fsyncs make any difference in 
> real applications, and we should just keep it simple, especially if we 
> back-patch it.

I looked at the dynahash code and noticed that new entries are attached
to the *end* of their hashtable chain.  While this maybe should be
changed to link them at the front, the implication at the moment is that
without a cycle counter it would still be possible to loop indefinitely
because we'd continue to revisit the same file(s) after removing their
hashtable entries.  I think you'd need a constant stream of requests for
more than one file falling into the same hash chain, but it certainly
seems like a potential risk.  I'd prefer a solution that adheres to the
dynahash API's statement that it's unspecified whether newly-added
entries will be visited by hash_seq_search, and will in fact not loop
even if they always are visited.
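The looping risk and the cycle-counter remedy can be seen in a toy model. Assume a single hash chain held as an array, where new entries are appended at the end as described for dynahash. Processing entry `i` counts as removing it, and a busy backend immediately re-requests the same file, which lands at the end of the chain. All names here (`Req`, `remember()`, `sync_pass()`, `reset()`) are hypothetical. The `limit` parameter only exists to keep the counter-less run finite.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one hash chain, new entries appended at the end. */
#define CHAIN_MAX 64

typedef uint16_t CycleCtr;

typedef struct
{
    int      relnode;
    CycleCtr cycle_ctr;   /* cycle in which the request was entered */
} Req;

static Req      chain[CHAIN_MAX];
static int      chain_len = 0;
static CycleCtr sync_cycle_ctr = 0;

static void
reset(void)
{
    chain_len = 0;
    sync_cycle_ctr = 0;
}

static void
remember(int relnode)
{
    if (chain_len < CHAIN_MAX)
    {
        chain[chain_len].relnode = relnode;
        chain[chain_len].cycle_ctr = sync_cycle_ctr;
        chain_len++;
    }
}

/* One sync pass; returns the number of fsyncs done (capped at 'limit'). */
static int
sync_pass(bool use_cycle_ctr, int limit)
{
    int done = 0;

    sync_cycle_ctr++;               /* entries made from now on are "new" */
    for (int i = 0; i < chain_len && done < limit; i++)
    {
        if (use_cycle_ctr && chain[i].cycle_ctr == sync_cycle_ctr)
            continue;               /* entered during this pass: next time */
        done++;                     /* "fsync" and remove chain[i] */
        remember(chain[i].relnode); /* same file re-requested at once */
    }
    return done;
}
```

Starting from two requests, the pass without a counter keeps finding re-added entries until the safety limit stops it. The pass with the counter does exactly two fsyncs.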

> That said, I'm getting tired of this piece of code :).

Me too.

regards, tom lane



Re: [HACKERS] [PATCHES] Fix mdsync never-ending loop problem

2007-04-11 Thread Heikki Linnakangas

Tom Lane wrote:
> I wrote:
>> Actually, on second look I think the key idea here is Takahiro-san's
>> introduction of a cancellation flag in the hashtable entries, to
>> replace the cases where AbsorbFsyncRequests can try to delete entries.
>> What that means is mdsync() doesn't need an outer retry loop at all:
>
> I fooled around with this idea and came up with the attached patch.
> It seems to do what's intended but could do with more eyeballs and
> testing before committing.  Comments please?


I'm traveling today, but I'll take a closer look at it tomorrow morning. 
My first thought is that the cycle_ctr just adds extra complexity. The 
canceled-flag really is the key in Takahiro-san's patch, so we don't 
need the cycle_ctr anymore.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] [PATCHES] Fix mdsync never-ending loop problem

2007-04-11 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
> My first thought is that the cycle_ctr just adds extra complexity. The 
> canceled-flag really is the key in Takahiro-san's patch, so we don't 
> need the cycle_ctr anymore.

We don't have to have it in the sense of the code not working without it,
but it probably pays for itself by eliminating useless fsyncs.  The
overhead for it in my proposed implementation is darn near zero in the
non-error case.  Also, Takahiro-san mentioned at one point that he was
concerned to avoid useless fsyncs because of some property of the LDC
patch --- I wasn't too clear on what, but maybe he can explain.

regards, tom lane



Re: [HACKERS] [PATCHES] Fix mdsync never-ending loop problem

2007-04-10 Thread Tom Lane
I wrote:
> This patch looks fairly sane to me; I have a few small gripes about
> coding style but that can be fixed while applying.  Heikki, you were
> concerned about the cycle-ID idea; do you have any objection to this
> patch?

Actually, on second look I think the key idea here is Takahiro-san's
introduction of a cancellation flag in the hashtable entries, to
replace the cases where AbsorbFsyncRequests can try to delete entries.

What that means is mdsync() doesn't need an outer retry loop at all:
the periodic AbsorbFsyncRequests calls are not a hazard, and retry of
FileSync failures can be handled as an inner loop on the single failing
table entry.  (We can make the failure counter a local variable, too,
instead of needing space in every hashtable entry.)

And with that change, it's no longer possible for an incoming stream
of fsync requests to keep mdsync from terminating.  It might fsync
more than it really needs to, but it won't repeat itself, and it must
reach the end of the hashtable eventually.  So we don't actually need
the cycle counter at all.
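Here is a simplified model of the restructured loop: one pass over the table, no outer retry, and an inner retry on the single failing entry with a local failure counter. It is a sketch under stated assumptions. `Entry`, `model_fsync()`, `model_absorb()` and `sync_all()` are hypothetical. The "absorb" step only sets canceled flags and never deletes anything. Only `ENOENT` is treated as "file possibly deleted" (the real code's comment also mentions `EACCES` on Windows). Returning -1 stands in for `ereport(ERROR)`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NENTRIES 4

typedef struct
{
    bool used;       /* slot holds a pending request */
    bool canceled;   /* FORGET absorbed: skip it, then remove */
    bool unlinked;   /* model: file deleted, FORGET already queued */
    bool io_error;   /* model: fsync fails with a real I/O error */
    int  fsyncs;     /* successful fsyncs performed */
} Entry;

static Entry table[NENTRIES];

/* Model of FileSync(): 0 on success, -1 with errno set on failure. */
static int
model_fsync(Entry *e)
{
    if (e->unlinked)
    {
        errno = ENOENT;
        return -1;
    }
    if (e->io_error)
    {
        errno = EIO;
        return -1;
    }
    e->fsyncs++;
    return 0;
}

/* Model of AbsorbFsyncRequests(): only sets canceled flags and never
 * deletes entries, so calling it mid-scan cannot break the scan. */
static void
model_absorb(void)
{
    for (int i = 0; i < NENTRIES; i++)
        if (table[i].used && table[i].unlinked)
            table[i].canceled = true;
}

/* Single pass; 0 on success, -1 where real code would ereport(ERROR). */
static int
sync_all(void)
{
    for (int i = 0; i < NENTRIES; i++)
    {
        Entry *e = &table[i];
        int failures = 0;        /* local, fresh for every entry */

        if (!e->used)
            continue;
        /* inner retry loop on this one entry only */
        while (!e->canceled && model_fsync(e) != 0)
        {
            if (errno != ENOENT || ++failures > 1)
                return -1;
            model_absorb();      /* pick up a FORGET for a dropped file */
        }
        e->used = false;         /* synced or canceled: remove the entry */
    }
    return 0;
}
```

Because the failure counter is declared inside the loop, it cannot leak from one entry to the next. A stale counter of that kind is the problem described elsewhere in this thread for the LDC patch's retry_counter.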

It might be worth having the cycle counter anyway just to avoid doing
useless fsync work.  I'm not sure about this.  If we have a cycle
counter of say 32 bits, then it's theoretically possible for an fsync
to fail 2^32 consecutive times and then be skipped on the next try,
allowing a checkpoint to succeed that should not have.  We can fix that
with a few more lines of logic to detect a wrapped-around value, but is
it worth the trouble?
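The wraparound hazard is easier to see with a small counter. The draft patch in this thread declares `CycleCtr` as `uint16`, and a 32-bit counter behaves the same way, only after more failures. The sketch below is hypothetical (`sync_pass()`, `passes_until_skip()`, `entry_ctr`). It models one request whose fsync always fails, so the previous pass never completes. It shows one possible "few more lines of logic": when the previous pass did not finish, re-stamp stale entries with the current counter before advancing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t CycleCtr;   /* as in the draft patch; width is arbitrary */

static CycleCtr sync_cycle_ctr = 0;
static bool     sync_in_progress = false;   /* previous pass unfinished? */
static CycleCtr entry_ctr = 0;              /* one request that keeps failing */

/* One pass; returns true if the stale entry is wrongly skipped as "new". */
static bool
sync_pass(bool restamp_after_failure)
{
    if (restamp_after_failure && sync_in_progress)
        entry_ctr = sync_cycle_ctr;         /* refresh stale cycle_ctr */

    sync_cycle_ctr++;                       /* wraps modulo 2^16 */
    sync_in_progress = true;

    /* The entry's fsync always fails in this model, so we never reach
     * the end of the loop and sync_in_progress stays true. */
    return entry_ctr == sync_cycle_ctr;
}

/* Failed passes until the entry is first skipped; 0 = never within limit. */
static long
passes_until_skip(bool restamp, long limit)
{
    sync_cycle_ctr = 0;
    sync_in_progress = false;
    entry_ctr = 0;
    for (long n = 1; n <= limit; n++)
        if (sync_pass(restamp))
            return n;
    return 0;
}
```

Without re-stamping, the entry survives 65535 failed passes and is skipped on pass 65536, which would let that checkpoint "succeed". With re-stamping, an old entry is always exactly one cycle behind and is never mistaken for a new one.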

regards, tom lane



Re: [HACKERS] [PATCHES] Fix mdsync never-ending loop problem

2007-04-10 Thread Tom Lane
I wrote:
> Actually, on second look I think the key idea here is Takahiro-san's
> introduction of a cancellation flag in the hashtable entries, to
> replace the cases where AbsorbFsyncRequests can try to delete entries.
> What that means is mdsync() doesn't need an outer retry loop at all:

I fooled around with this idea and came up with the attached patch.
It seems to do what's intended but could do with more eyeballs and
testing before committing.  Comments please?

(Note: I ignored my own advice not to reindent.  Sorry ...)

regards, tom lane

*** src/backend/storage/smgr/md.c.orig  Wed Jan 17 11:25:01 2007
--- src/backend/storage/smgr/md.c   Tue Apr 10 16:38:27 2007
***************
*** 122,135 
BlockNumber segno;  /* which segment */
  } PendingOperationTag;
  
  typedef struct
  {
PendingOperationTag tag;/* hash table key (must be first!) */
!   int failures;   /* number of failed attempts to fsync */
  } PendingOperationEntry;
  
  static HTAB *pendingOpsTable = NULL;
  
  
  typedef enum  /* behavior for mdopen & _mdfd_getseg */
  {
--- 122,140 
BlockNumber segno;  /* which segment */
  } PendingOperationTag;
  
+ typedef uint16 CycleCtr;  /* can be any convenient integer size */
+ 
  typedef struct
  {
PendingOperationTag tag;/* hash table key (must be first!) */
!   bool        canceled;   /* T = request canceled, not yet removed */
!   CycleCtr    cycle_ctr;  /* mdsync_cycle_ctr when request was made */
  } PendingOperationEntry;
  
  static HTAB *pendingOpsTable = NULL;
  
+ static CycleCtr mdsync_cycle_ctr = 0;
+ 
  
  typedef enum  /* behavior for mdopen & _mdfd_getseg */
  {
***************
*** 856,926 
  
  /*
   *mdsync() -- Sync previous writes to stable storage.
-  *
-  * This is only called during checkpoints, and checkpoints should only
-  * occur in processes that have created a pendingOpsTable.
   */
  void
  mdsync(void)
  {
!   bool        need_retry;
  
if (!pendingOpsTable)
	elog(ERROR, "cannot sync without a pendingOpsTable");
  
/*
!* The fsync table could contain requests to fsync relations that have
!* been deleted (unlinked) by the time we get to them.  Rather than
!* just hoping an ENOENT (or EACCES on Windows) error can be ignored,
!* what we will do is retry the whole process after absorbing fsync
!* request messages again.  Since mdunlink() queues a revoke message
!* before actually unlinking, the fsync request is guaranteed to be gone
!* the second time if it really was this case.  DROP DATABASE likewise
!* has to tell us to forget fsync requests before it starts deletions.
 */
!   do {
!   HASH_SEQ_STATUS hstat;
!   PendingOperationEntry *entry;
!   int absorb_counter;
  
!   need_retry = false;
  
/*
!* If we are in the bgwriter, the sync had better include all fsync
!* requests that were queued by backends before the checkpoint REDO
!* point was determined. We go that a little better by accepting all
!* requests queued up to the point where we start fsync'ing.
 */
!   AbsorbFsyncRequests();
  
!   absorb_counter = FSYNCS_PER_ABSORB;
!   hash_seq_init(&hstat, pendingOpsTable);
!   while ((entry = (PendingOperationEntry *) hash_seq_search(&hstat)) != NULL)
{
/*
!* If fsync is off then we don't have to bother opening the file
!* at all.  (We delay checking until this point so that changing
!* fsync on the fly behaves sensibly.)
 */
!   if (enableFsync)
{
SMgrRelation reln;
MdfdVec*seg;
  
/*
-* If in bgwriter, we want to absorb pending requests every so
-* often to prevent overflow of the fsync request queue.  This
-* could result in deleting the current entry out from under
-* our hashtable scan, so the procedure is to fall out of the
-* scan and start over from the top of the function.
-*/
-   if (--absorb_counter <= 0)
-   {
-   need_retry = true;
-   break;
-

Re: [HACKERS] [PATCHES] Fix mdsync never-ending loop problem

2007-04-10 Thread Tom Lane
I wrote:
> I fooled around with this idea and came up with the attached patch.
> It seems to do what's intended but could do with more eyeballs and
> testing before committing.  Comments please?

Earlier I said that I didn't want to back-patch this change, but on
looking at the CVS history I'm reconsidering.  The performance problem
originates from the decision some time ago to do an AbsorbFsyncRequests
every so often during the mdsync loop; without that, and assuming no
actual failures, there isn't any absorption of new requests before
mdsync can complete.  Originally that code only existed in 8.2.x, but
very recently we back-patched it into 8.1.x as part of fixing the
file-deletion-on-Windows problem.  This means that 8.1.x users could
see a performance degradation upon updating to 8.1.8 from prior
subreleases, which wouldn't make them happy.

So I'm now thinking we ought to back-patch into 8.2.x and 8.1.x,
but of course that makes it even more urgent that we test the patch
thoroughly.

regards, tom lane



[PATCHES] Fix mdsync never-ending loop problem

2007-04-05 Thread Heikki Linnakangas
Here's a fix for the problem that on a busy system, mdsync never 
finishes. See the original problem description on hackers:

http://archives.postgresql.org/pgsql-hackers/2007-04/msg00259.php

The solution is taken from ITAGAKI Takahiro's Load Distributed 
Checkpoint patch. At the beginning of mdsync, the pendingOpsTable is 
copied to a linked list, and that list is then processed until it's empty.
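Here is a minimal sketch of the "to-do list" idea. The pending requests are snapshotted into a private list first, and the list is then consumed. New requests that arrive meanwhile go into the table, not the list, so the pass has a fixed amount of work. Everything here is hypothetical (`table_keys`, `TodoItem`, `table_add()`, `snapshot()`, `process()`). This toy copies only a key per item, whereas the posted patch copied whole table entries.

```c
#include <assert.h>
#include <stdlib.h>

#define TABLE_MAX 32

static int table_keys[TABLE_MAX];   /* model of the live request table */
static int table_len = 0;

typedef struct TodoItem
{
    int              key;    /* copied request key */
    struct TodoItem *next;
} TodoItem;

static void
table_add(int key)
{
    if (table_len < TABLE_MAX)
        table_keys[table_len++] = key;
}

/* Copy the current requests into a private to-do list. */
static TodoItem *
snapshot(void)
{
    TodoItem *head = NULL;

    for (int i = table_len - 1; i >= 0; i--)
    {
        TodoItem *item = malloc(sizeof(TodoItem));

        if (item == NULL)
            abort();
        item->key = table_keys[i];
        item->next = head;
        head = item;
    }
    return head;
}

/* Consume the list until it is empty.  Each simulated fsync lets a new
 * request arrive in the table; the list itself never grows. */
static int
process(TodoItem *head)
{
    int done = 0;

    while (head != NULL)
    {
        TodoItem *next = head->next;

        done++;                        /* pretend to fsync head->key */
        table_add(head->key + 100);    /* concurrent request arrives */
        free(head);                    /* free every node as it is consumed */
        head = next;
    }
    return done;
}
```

If each node instead pointed at a separately allocated copy of the entry, both allocations would need freeing. That is the leak Tom points out later in the thread, since list_delete_cell only deletes the ListCell.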


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
Index: src/backend/storage/smgr/md.c
===
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/storage/smgr/md.c,v
retrieving revision 1.127
diff -c -r1.127 md.c
*** src/backend/storage/smgr/md.c	17 Jan 2007 16:25:01 -	1.127
--- src/backend/storage/smgr/md.c	5 Apr 2007 10:43:56 -
***************
*** 863,989 
  void
  mdsync(void)
  {
! 	bool		need_retry;
  
  	if (!pendingOpsTable)
  		elog(ERROR, "cannot sync without a pendingOpsTable");
  
  	/*
! 	 * The fsync table could contain requests to fsync relations that have
! 	 * been deleted (unlinked) by the time we get to them.  Rather than
! 	 * just hoping an ENOENT (or EACCES on Windows) error can be ignored,
! 	 * what we will do is retry the whole process after absorbing fsync
! 	 * request messages again.  Since mdunlink() queues a revoke message
! 	 * before actually unlinking, the fsync request is guaranteed to be gone
! 	 * the second time if it really was this case.  DROP DATABASE likewise
! 	 * has to tell us to forget fsync requests before it starts deletions.
  	 */
! 	do {
! 		HASH_SEQ_STATUS hstat;
! 		PendingOperationEntry *entry;
! 		int			absorb_counter;
  
! 		need_retry = false;
  
  		/*
! 		 * If we are in the bgwriter, the sync had better include all fsync
! 		 * requests that were queued by backends before the checkpoint REDO
! 		 * point was determined. We go that a little better by accepting all
! 		 * requests queued up to the point where we start fsync'ing.
  		 */
  		AbsorbFsyncRequests();
  
! 		absorb_counter = FSYNCS_PER_ABSORB;
! 		hash_seq_init(&hstat, pendingOpsTable);
! 		while ((entry = (PendingOperationEntry *) hash_seq_search(&hstat)) != NULL)
  		{
! 			/*
! 			 * If fsync is off then we don't have to bother opening the file
! 			 * at all.  (We delay checking until this point so that changing
! 			 * fsync on the fly behaves sensibly.)
! 			 */
! 			if (enableFsync)
! 			{
! SMgrRelation reln;
! MdfdVec*seg;
  
! /*
!  * If in bgwriter, we want to absorb pending requests every so
!  * often to prevent overflow of the fsync request queue.  This
!  * could result in deleting the current entry out from under
!  * our hashtable scan, so the procedure is to fall out of the
!  * scan and start over from the top of the function.
!  */
! if (--absorb_counter <= 0)
! {
! 	need_retry = true;
! 	break;
! }
  
! /*
!  * Find or create an smgr hash entry for this relation. This
!  * may seem a bit unclean -- md calling smgr?  But it's really
!  * the best solution.  It ensures that the open file reference
!  * isn't permanently leaked if we get an error here. (You may
!  * say "but an unreferenced SMgrRelation is still a leak!" Not
!  * really, because the only case in which a checkpoint is done
!  * by a process that isn't about to shut down is in the
!  * bgwriter, and it will periodically do smgrcloseall(). This
!  * fact justifies our not closing the reln in the success path
!  * either, which is a good thing since in non-bgwriter cases
!  * we couldn't safely do that.)  Furthermore, in many cases
!  * the relation will have been dirtied through this same smgr
!  * relation, and so we can save a file open/close cycle.
!  */
! reln = smgropen(entry->tag.rnode);
! 
! /*
!  * It is possible that the relation has been dropped or
!  * truncated since the fsync request was entered.  Therefore,
!  * allow ENOENT, but only if we didn't fail once already on
!  * this file.  This applies both during _mdfd_getseg() and
!  * during FileSync, since fd.c might have closed the file
!  * behind our back.
!  */
! seg = _mdfd_getseg(reln,
!    entry->tag.segno * ((BlockNumber) RELSEG_SIZE),
!    false, EXTENSION_RETURN_NULL);
! if (seg == NULL ||
! 	FileSync(seg->mdfd_vfd) < 0)
! {
! 	/*
! 	 * XXX is there any point in allowing more than one try?
! 	 * Don't see one at the moment, but easy to change the
! 	 * test here if so.
! 	 */
! 	if (!FILE_POSSIBLY_DELETED(errno) ||
! 		++(entry->failures) > 1)
! 		ereport(ERROR,
! (errcode_for_file_access(),
!  errmsg("could not fsync segment %u of relation %u/%u/%u: %m",
! 		entry->tag.segno,
! 		entry->tag.rnode.spcNode,
! 		entry->tag.rnode.dbNode,
! 		entry->tag.rnode.relNode)));
! 	else
! 		

Re: [PATCHES] Fix mdsync never-ending loop problem

2007-04-05 Thread Alvaro Herrera
While skimming over this I was a bit baffled by the use of
(InvalidBlockNumber - 1) as the value for FORGET_DATABASE_FSYNC.  It took me
a while to realize that this code is abusing the BlockNumber typedef to
pass around *segment* numbers, so the useful range is much smaller and
thus the use of that value is not a problem in practice.

I wonder if it wouldn't be better to clean this up by creating a
separate typedef for segment numbers, with its own special values?
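Here is a sketch of what such a cleanup might look like. `SegmentNumber`, `InvalidSegmentNumber`, `MODEL_RELSEG_SIZE`, `segment_of_block()` and `is_real_segment()` are hypothetical names. The two FORGET_* sentinels mirror md.c's, assuming FORGET_RELATION_FSYNC is the invalid value itself and FORGET_DATABASE_FSYNC is one less, as noted above. `MODEL_RELSEG_SIZE` assumes the default layout of 8 kB blocks in 1 GB segments. The sketch also shows why the current abuse is harmless: even the largest block number maps to a small segment number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical dedicated type for relation segment numbers. */
typedef uint32_t SegmentNumber;

#define InvalidSegmentNumber   ((SegmentNumber) 0xFFFFFFFF)
#define FORGET_RELATION_FSYNC  (InvalidSegmentNumber)
#define FORGET_DATABASE_FSYNC  (InvalidSegmentNumber - 1)

/* Assumption: 8 kB blocks and 1 GB segments, i.e. 131072 blocks each. */
#define MODEL_RELSEG_SIZE 131072u

static SegmentNumber
segment_of_block(uint32_t blkno)
{
    return (SegmentNumber) (blkno / MODEL_RELSEG_SIZE);
}

/* True for an ordinary segment number, false for the FORGET sentinels. */
static bool
is_real_segment(SegmentNumber segno)
{
    return segno < FORGET_DATABASE_FSYNC;
}
```

A separate type makes it a compile-time-visible choice which range and which special values apply, instead of relying on segment numbers happening to stay far below `InvalidBlockNumber - 1`.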

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [PATCHES] Fix mdsync never-ending loop problem

2007-04-05 Thread Heikki Linnakangas

Heikki Linnakangas wrote:
> Here's a fix for the problem that on a busy system, mdsync never 
> finishes. See the original problem description on hackers:
>
> http://archives.postgresql.org/pgsql-hackers/2007-04/msg00259.php
>
> The solution is taken from ITAGAKI Takahiro's Load Distributed 
> Checkpoint patch. At the beginning of mdsync, the pendingOpsTable is 
> copied to a linked list, and that list is then processed until it's empty.

Here's an updated patch; the one I sent earlier is broken. I ignored the 
return value of list_delete_cell.


We could just review and apply ITAGAKI's patch as it is instead of this 
snippet of it, but because that can take some time I'd like to see this 
applied before that.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
Index: src/backend/storage/smgr/md.c
===
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/storage/smgr/md.c,v
retrieving revision 1.127
diff -c -r1.127 md.c
*** src/backend/storage/smgr/md.c	17 Jan 2007 16:25:01 -	1.127
--- src/backend/storage/smgr/md.c	5 Apr 2007 16:09:31 -
***************
*** 863,989 
  void
  mdsync(void)
  {
! 	bool		need_retry;
  
  	if (!pendingOpsTable)
  		elog(ERROR, "cannot sync without a pendingOpsTable");
  
  	/*
! 	 * The fsync table could contain requests to fsync relations that have
! 	 * been deleted (unlinked) by the time we get to them.  Rather than
! 	 * just hoping an ENOENT (or EACCES on Windows) error can be ignored,
! 	 * what we will do is retry the whole process after absorbing fsync
! 	 * request messages again.  Since mdunlink() queues a revoke message
! 	 * before actually unlinking, the fsync request is guaranteed to be gone
! 	 * the second time if it really was this case.  DROP DATABASE likewise
! 	 * has to tell us to forget fsync requests before it starts deletions.
  	 */
! 	do {
! 		HASH_SEQ_STATUS hstat;
! 		PendingOperationEntry *entry;
! 		int			absorb_counter;
  
! 		need_retry = false;
  
  		/*
! 		 * If we are in the bgwriter, the sync had better include all fsync
! 		 * requests that were queued by backends before the checkpoint REDO
! 		 * point was determined. We go that a little better by accepting all
! 		 * requests queued up to the point where we start fsync'ing.
  		 */
  		AbsorbFsyncRequests();
  
! 		absorb_counter = FSYNCS_PER_ABSORB;
! 		hash_seq_init(&hstat, pendingOpsTable);
! 		while ((entry = (PendingOperationEntry *) hash_seq_search(&hstat)) != NULL)
  		{
! 			/*
! 			 * If fsync is off then we don't have to bother opening the file
! 			 * at all.  (We delay checking until this point so that changing
! 			 * fsync on the fly behaves sensibly.)
! 			 */
! 			if (enableFsync)
! 			{
! SMgrRelation reln;
! MdfdVec*seg;
  
! /*
!  * If in bgwriter, we want to absorb pending requests every so
!  * often to prevent overflow of the fsync request queue.  This
!  * could result in deleting the current entry out from under
!  * our hashtable scan, so the procedure is to fall out of the
!  * scan and start over from the top of the function.
!  */
! if (--absorb_counter <= 0)
! {
! 	need_retry = true;
! 	break;
! }
  
! /*
!  * Find or create an smgr hash entry for this relation. This
!  * may seem a bit unclean -- md calling smgr?  But it's really
!  * the best solution.  It ensures that the open file reference
!  * isn't permanently leaked if we get an error here. (You may
!  * say "but an unreferenced SMgrRelation is still a leak!" Not
!  * really, because the only case in which a checkpoint is done
!  * by a process that isn't about to shut down is in the
!  * bgwriter, and it will periodically do smgrcloseall(). This
!  * fact justifies our not closing the reln in the success path
!  * either, which is a good thing since in non-bgwriter cases
!  * we couldn't safely do that.)  Furthermore, in many cases
!  * the relation will have been dirtied through this same smgr
!  * relation, and so we can save a file open/close cycle.
!  */
! reln = smgropen(entry->tag.rnode);
! 
! /*
!  * It is possible that the relation has been dropped or
!  * truncated since the fsync request was entered.  Therefore,
!  * allow ENOENT, but only if we didn't fail once already on
!  * this file.  This applies both during _mdfd_getseg() and
!  * during FileSync, since fd.c might have closed the file
!  * behind our back.
!  */
! seg = _mdfd_getseg(reln,
!    entry->tag.segno * ((BlockNumber) RELSEG_SIZE),
!    false, EXTENSION_RETURN_NULL);
! if (seg == NULL ||
! 	FileSync(seg->mdfd_vfd) < 0)
! {
! 	/*
! 	 * XXX is there any point in allowing more than one try?
! 	 * Don't see one at the moment, but easy to change the
! 	 * test here if so.
! 	 */
! 	if (!FILE_POSSIBLY_DELETED(errno) ||
! 		++(entry->failures) >

Re: [PATCHES] Fix mdsync never-ending loop problem

2007-04-05 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
> Here's a fix for the problem that on a busy system, mdsync never 
> finishes. See the original problem description on hackers:

This leaks memory, no?  (list_delete_cell only deletes the ListCell.)
But I dislike copying the table entries anyway, see comment on -hackers.

BTW, it's very hard to see what a patch like this is actually changing.
It might be better to submit a version that doesn't reindent the chunks
of code you aren't changing, so as to reduce the visual size of the
diff.  A note to the committer to reindent the whole function is
sufficient (or if he forgets, pg_indent will fix it eventually).

regards, tom lane



Re: [PATCHES] Fix mdsync never-ending loop problem

2007-04-05 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
> I wonder if it wouldn't be better to clean this up by creating a
> separate typedef for segment numbers, with its own special values?

Probably.  I remember having thought about it when I put in the
FORGET_DATABASE_FSYNC hack.  I think I didn't do it because I needed
to backpatch and so I wanted a minimal-size patch.  Feel free to do it
in HEAD ...

regards, tom lane



Re: [PATCHES] Fix mdsync never-ending loop problem

2007-04-05 Thread Heikki Linnakangas

Tom Lane wrote:
> Heikki Linnakangas [EMAIL PROTECTED] writes:
>> Here's a fix for the problem that on a busy system, mdsync never 
>> finishes. See the original problem description on hackers:
>
> This leaks memory, no?  (list_delete_cell only deletes the ListCell.)

Oh, I just spotted another problem with it and posted an updated patch, 
but I missed that.

> But I dislike copying the table entries anyway, see comment on -hackers.

Frankly the cycle id idea sounds more ugly and fragile to me. You'll 
need to do multiple scans of the hash table that way, starting from the top 
every time you call AbsorbFsyncRequests (like we do now). But whatever...

> BTW, it's very hard to see what a patch like this is actually changing.
> It might be better to submit a version that doesn't reindent the chunks
> of code you aren't changing, so as to reduce the visual size of the
> diff.  A note to the committer to reindent the whole function is
> sufficient (or if he forgets, pg_indent will fix it eventually).

Ok, will do that. Or would you like to just take over from here?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] Fix mdsync never-ending loop problem

2007-04-05 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
> Tom Lane wrote:
>> But I dislike copying the table entries anyway, see comment on -hackers.
>
> Frankly the cycle id idea sounds more ugly and fragile to me. You'll 
> need to do multiple scans of the hash table that way, starting from the top 
> every time you call AbsorbFsyncRequests (like we do now).

How so?  You just ignore entries whose cycleid is too large.  You'd have
to be careful about wraparound in the comparisons, but that's not hard
to deal with.  Also, AFAICS you still have the retry problem (and an
even bigger memory leak problem) with this coding --- the to-do list
doesn't eliminate the issue of correct handling of a failure.
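"Careful about wraparound in the comparisons" usually means comparing counters modulo their range rather than with a plain `<`. Here is a minimal sketch, assuming a 16-bit cycle id and that live ids are always less than half the range (32768) apart. `CycleId`, `cycle_after()` and `should_skip()` are hypothetical names. The subtraction is cast to an unsigned type, so the wraparound is well defined in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t CycleId;   /* hypothetical width */

/* True if a was issued strictly after b, treating ids as circular.
 * Valid only while the two ids are fewer than 32768 steps apart. */
static bool
cycle_after(CycleId a, CycleId b)
{
    uint16_t diff = (uint16_t) (a - b);   /* distance modulo 2^16 */

    return diff != 0 && diff < 0x8000;
}

/* Ignore requests entered during (or after) the current pass. */
static bool
should_skip(CycleId entry_cycle, CycleId current_cycle)
{
    return !cycle_after(current_cycle, entry_cycle);
}
```

For example, id 1 counts as "after" id 65535. A request stamped 65535 is therefore still processed when the current pass is 0.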

> Ok, will do that. Or would you like to just take over from here?

No, I'm up to my ears in varlena.  You're the one in a position to test
this, anyway.

regards, tom lane



Re: [PATCHES] Fix mdsync never-ending loop problem

2007-04-05 Thread Heikki Linnakangas

Tom Lane wrote:
> Heikki Linnakangas [EMAIL PROTECTED] writes:
>> Tom Lane wrote:
>>> But I dislike copying the table entries anyway, see comment on -hackers.
>>
>> Frankly the cycle id idea sounds more ugly and fragile to me. You'll 
>> need to do multiple scans of the hash table that way, starting from the top 
>> every time you call AbsorbFsyncRequests (like we do now).
>
> How so?  You just ignore entries whose cycleid is too large.  You'd have
> to be careful about wraparound in the comparisons, but that's not hard
> to deal with.  Also, AFAICS you still have the retry problem (and an
> even bigger memory leak problem) with this coding --- the to-do list
> doesn't eliminate the issue of correct handling of a failure.

You have to start the hash_seq_search from scratch after each call to 
AbsorbFsyncRequests because it can remove entries, including the one the 
scan is stopped on.

I think the failure handling is correct in the to-do list approach: 
when an entry is read from the list, it's checked that the entry hasn't 
been removed from the hash table. Actually there was a bug in the 
original LDC patch in the failure handling: it replaced the per-entry 
failures-counter with a local retry_counter variable, but it wasn't 
cleared after a successful write, which would lead to bogus ERRORs when 
multiple relations are dropped during the mdsync. I kept the original 
per-entry counter, though the local variable approach could be made to work.

The memory leak obviously needs to be fixed, but that's just a matter of 
adding a pfree.

>> Ok, will do that. Or would you like to just take over from here?
>
> No, I'm up to my ears in varlena.  You're the one in a position to test
> this, anyway.

Ok.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] Fix mdsync never-ending loop problem

2007-04-05 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
> I think the failure handling is correct in the to-do list approach: 
> when an entry is read from the list, it's checked that the entry hasn't 
> been removed from the hash table. Actually there was a bug in the 
> original LDC patch in the failure handling: it replaced the per-entry 
> failures-counter with a local retry_counter variable, but it wasn't 
> cleared after a successful write, which would lead to bogus ERRORs when 
> multiple relations are dropped during the mdsync. I kept the original 
> per-entry counter, though the local variable approach could be made to work.

Yeah.  One of the things that bothered me about the patch was that it
would be easy to mess up by updating state in the copied entry instead
of the real info in the hashtable.  It would be clearer what's
happening if the to-do list contains only the lookup keys and not the
whole struct.
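The keys-only variant might look like this: the to-do list carries just the lookup keys, and every step looks up the live entry again. Entries removed since the snapshot are then skipped naturally, and any state change, such as a failure count, lands in the real entry where the next pass will see it. All names are hypothetical (`Entry`, `live`, `lookup()`, `process_todo()`). `fail_key` simulates one request whose fsync fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NSLOTS 8

typedef struct
{
    bool used;
    int  key;
    int  failures;    /* kept in the live entry, never in a copy */
} Entry;

static Entry live[NSLOTS];   /* model of the real hashtable */

static Entry *
lookup(int key)
{
    for (size_t i = 0; i < NSLOTS; i++)
        if (live[i].used && live[i].key == key)
            return &live[i];
    return NULL;
}

/* Process a to-do list of keys captured earlier; returns entries synced. */
static int
process_todo(const int *todo, size_t n, int fail_key)
{
    int synced = 0;

    for (size_t i = 0; i < n; i++)
    {
        Entry *e = lookup(todo[i]);

        if (e == NULL)
            continue;              /* removed since the snapshot: skip */
        if (e->key == fail_key)
        {
            e->failures++;         /* recorded where the next pass sees it */
            continue;
        }
        e->used = false;           /* synced: remove from the live table */
        synced++;
    }
    return synced;
}
```

Since no `Entry` is ever copied, there is nothing stale to update by mistake and nothing extra to free.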

regards, tom lane



Re: [PATCHES] Fix mdsync never-ending loop problem

2007-04-05 Thread Simon Riggs
On Thu, 2007-04-05 at 17:14 +0100, Heikki Linnakangas wrote:

> We could just review and apply ITAGAKI's patch as it is instead of
> this snippet of it, but because that can take some time I'd like to
> see this applied before that. 

I think we are just beginning to understand the quality of Itagaki's
thinking.

We should give him a chance to interact on this and if there are parts
of his patch that we want, then it should be him that does it. I'm not
sure that carving the good bits off each other's patches is likely to
help teamwork in the long term. At the very least he deserves much credit
for his farsighted work.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [PATCHES] Fix mdsync never-ending loop problem

2007-04-05 Thread Heikki Linnakangas

Simon Riggs wrote:
> On Thu, 2007-04-05 at 17:14 +0100, Heikki Linnakangas wrote:
>> We could just review and apply ITAGAKI's patch as it is instead of
>> this snippet of it, but because that can take some time I'd like to
>> see this applied before that. 
>
> I think we are just beginning to understand the quality of Itagaki's
> thinking.
>
> We should give him a chance to interact on this and if there are parts
> of his patch that we want, then it should be him that does it. 

Itagaki, would you like to take a stab at this?

> I'm not
> sure that carving the good bits off each other's patches is likely to
> help teamwork in the long term. At the very least he deserves much credit
> for his farsighted work.

Oh sure! Thank you for your efforts, Itagaki!

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
