Re: Plugging fd leaks (was Re: [HACKERS] Switching timeline over streaming replication)

2012-11-27 Thread Heikki Linnakangas

On 26.11.2012 14:53, Amit Kapila wrote:

On Friday, November 23, 2012 7:03 PM Heikki Linnakangas

This is what I came up with. It adds a new function, OpenFile, that
returns a raw file descriptor like BasicOpenFile, but the file
descriptor is associated with the current subtransaction and
automatically closed at end-of-xact, in the same way that AllocateFile
and AllocateDir work. In other words, OpenFile is to open() what
AllocateFile is to fopen(). BasicOpenFile is unchanged, it returns a raw
fd and it's solely the caller's responsibility to close it, but many of
the places that used to call BasicOpenFile now use the safer OpenFile
function instead.

This patch plugs three existing fd (or virtual fd) leaks:

1. copy_file() - fixed by by using OpenFile instead of BasicOpenFile 2.
XLogFileLinit() - fixed by adding close() calls to the error cases.
Can't use OpenFile here because the fd is supposed to persist over
transaction boundaries.
3. lo_import/lo_export - fixed by using OpenFile instead of
PathNameOpenFile.


I have gone through the patch and find it okay except for one minor
suggestion
1. Can we put below log in OpenFile as well
+DO_DB(elog(LOG, CloseFile: Allocated %d, numAllocatedDescs));


Thanks. Added that and committed.

I didn't dare to backpatch this, even though it could be fairly easily 
backpatched. The leaks exist in older versions too, but since they're 
extremely rare (zero complaints from the field and it's been like that 
forever), I didn't want to take the risk. Maybe later, after this has 
had more testing in master.



One thing I'm not too fond of is the naming. I'm calling the new
functions OpenFile and CloseFile. There's some danger of confusion
there, as the function to close a virtual file opened with
PathNameOpenFile is called FileClose. OpenFile is really the same kind
of operation as AllocateFile and AllocateDir, but returns an unbuffered
fd. So it would be nice if it was called AllocateSomething, too. But
AllocateFile is already taken. And I don't much like the Allocate*
naming for these anyway, you really would expect the name to contain
open.


OpenFileInTrans
OpenTransactionAwareFile

In anycase OpenFile is also okay.


I ended up calling the functions OpenTransientFile and 
CloseTransientFile. Windows has a library function called OpenFile, so 
that was a pretty bad choice after all.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Plugging fd leaks (was Re: [HACKERS] Switching timeline over streaming replication)

2012-11-26 Thread Amit Kapila
On Friday, November 23, 2012 7:03 PM Heikki Linnakangas
 On 15.11.2012 17:16, Heikki Linnakangas wrote:
  On 15.11.2012 16:55, Tom Lane wrote:
  Heikki Linnakangashlinnakan...@vmware.com writes:
  This is a fairly general issue, actually. Looking around, I can see
  at least two similar cases in existing code, with BasicOpenFile,
  where we will leak file descriptors on error:
 
  Um, don't we automatically clean those up during transaction abort?
 
  Not the ones allocated with PathNameOpenFile or BasicOpenFile. Files
  allocated with AllocateFile() and OpenTemporaryFile() are cleaned up
  at abort.
 
  If we don't, we ought to think about that, not about cluttering
  calling code with certain-to-be-inadequate cleanup in error cases.
 
  Agreed. Cleaning up at end-of-xact won't help walsender or other
  non-backend processes, though, because they don't do transactions. But
  a top-level ResourceOwner that's reset in the sigsetjmp() cleanup
  routine would work.
 
 This is what I came up with. It adds a new function, OpenFile, that
 returns a raw file descriptor like BasicOpenFile, but the file
 descriptor is associated with the current subtransaction and
 automatically closed at end-of-xact, in the same way that AllocateFile
 and AllocateDir work. In other words, OpenFile is to open() what
 AllocateFile is to fopen(). BasicOpenFile is unchanged, it returns a raw
 fd and it's solely the caller's responsibility to close it, but many of
 the places that used to call BasicOpenFile now use the safer OpenFile
 function instead.
 
 This patch plugs three existing fd (or virtual fd) leaks:
 
 1. copy_file() - fixed by by using OpenFile instead of BasicOpenFile 2.
 XLogFileLinit() - fixed by adding close() calls to the error cases.
 Can't use OpenFile here because the fd is supposed to persist over
 transaction boundaries.
 3. lo_import/lo_export - fixed by using OpenFile instead of
 PathNameOpenFile.

I have gone through the patch and find it okay except for one minor
suggestion
1. Can we put below log in OpenFile as well 
+DO_DB(elog(LOG, CloseFile: Allocated %d, numAllocatedDescs));
 
 
 One thing I'm not too fond of is the naming. I'm calling the new
 functions OpenFile and CloseFile. There's some danger of confusion
 there, as the function to close a virtual file opened with
 PathNameOpenFile is called FileClose. OpenFile is really the same kind
 of operation as AllocateFile and AllocateDir, but returns an unbuffered
 fd. So it would be nice if it was called AllocateSomething, too. But
 AllocateFile is already taken. And I don't much like the Allocate*
 naming for these anyway, you really would expect the name to contain
 open.

OpenFileInTrans
OpenTransactionAwareFile

In anycase OpenFile is also okay.


I have one usecase in feature (SQL Command to edit postgresql.conf) very
similar to OpenFile/CloseFile, but I want that when CloseFile is called from
abort, it should remove(unlink) the file as well and during open it has to
retry few times if open is not success.
I have following options: 
1. Extend OpenFile/CloseFile or PathNameOpenFile
2. Write new functions similar to OpenFile/CloseFile, something like
OpenConfLockFile/CloseConfLockFile
3. Use OpenFile/CloseFile  and handle my specific case with PG_TRY ..
PG_CATCH

Any suggestions?

With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Plugging fd leaks (was Re: [HACKERS] Switching timeline over streaming replication)

2012-11-26 Thread Heikki Linnakangas

On 26.11.2012 14:53, Amit Kapila wrote:

I have one usecase in feature (SQL Command to edit postgresql.conf) very
similar to OpenFile/CloseFile, but I want that when CloseFile is called from
abort, it should remove(unlink) the file as well and during open it has to
retry few times if open is not success.
I have following options:
1. Extend OpenFile/CloseFile or PathNameOpenFile
2. Write new functions similar to OpenFile/CloseFile, something like
OpenConfLockFile/CloseConfLockFile
3. Use OpenFile/CloseFile  and handle my specific case with PG_TRY ..
PG_CATCH

Any suggestions?


Hmm, if it's just for locking purposes, how about using a lwlock or a 
heavy-weight lock instead?


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Plugging fd leaks (was Re: [HACKERS] Switching timeline over streaming replication)

2012-11-26 Thread Amit Kapila
On Monday, November 26, 2012 7:01 PM Heikki Linnakangas wrote:
 On 26.11.2012 14:53, Amit Kapila wrote:
  I have one usecase in feature (SQL Command to edit postgresql.conf)
 very
  similar to OpenFile/CloseFile, but I want that when CloseFile is
 called from
  abort, it should remove(unlink) the file as well and during open it
 has to
  retry few times if open is not success.
  I have following options:
  1. Extend OpenFile/CloseFile or PathNameOpenFile
  2. Write new functions similar to OpenFile/CloseFile, something like
  OpenConfLockFile/CloseConfLockFile
  3. Use OpenFile/CloseFile  and handle my specific case with PG_TRY ..
  PG_CATCH
 
  Any suggestions?
 
 Hmm, if it's just for locking purposes, how about using a lwlock or a
 heavy-weight lock instead?

Its not only for lock, the main idea is that we create temp file and write
modified configuration in that temp file.
In end if it's success, then we rename temp file to .conf file but if it
error out then at abort we need to delete temp file.

So in short, main point is to close/rename the file in case of success (at
end of command) and remove in case of abort.

With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Plugging fd leaks (was Re: [HACKERS] Switching timeline over streaming replication)

2012-11-26 Thread Tom Lane
Amit Kapila amit.kap...@huawei.com writes:
 On Monday, November 26, 2012 7:01 PM Heikki Linnakangas wrote:
 Hmm, if it's just for locking purposes, how about using a lwlock or a
 heavy-weight lock instead?

 Its not only for lock, the main idea is that we create temp file and write
 modified configuration in that temp file.
 In end if it's success, then we rename temp file to .conf file but if it
 error out then at abort we need to delete temp file.

 So in short, main point is to close/rename the file in case of success (at
 end of command) and remove in case of abort.

I'd go with the TRY/CATCH solution.  It would be worth extending the
fd.c infrastructure if there were multiple users of the feature, but
there are not, nor do I see likely new candidates on the horizon.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Plugging fd leaks (was Re: [HACKERS] Switching timeline over streaming replication)

2012-11-23 Thread Heikki Linnakangas

On 15.11.2012 17:16, Heikki Linnakangas wrote:

On 15.11.2012 16:55, Tom Lane wrote:

Heikki Linnakangashlinnakan...@vmware.com writes:

This is a fairly general issue, actually. Looking around, I can see at
least two similar cases in existing code, with BasicOpenFile, where we
will leak file descriptors on error:


Um, don't we automatically clean those up during transaction abort?


Not the ones allocated with PathNameOpenFile or BasicOpenFile. Files
allocated with AllocateFile() and OpenTemporaryFile() are cleaned up at
abort.


If we don't, we ought to think about that, not about cluttering calling
code with certain-to-be-inadequate cleanup in error cases.


Agreed. Cleaning up at end-of-xact won't help walsender or other
non-backend processes, though, because they don't do transactions. But a
top-level ResourceOwner that's reset in the sigsetjmp() cleanup routine
would work.


This is what I came up with. It adds a new function, OpenFile, that 
returns a raw file descriptor like BasicOpenFile, but the file 
descriptor is associated with the current subtransaction and 
automatically closed at end-of-xact, in the same way that AllocateFile 
and AllocateDir work. In other words, OpenFile is to open() what 
AllocateFile is to fopen(). BasicOpenFile is unchanged, it returns a raw 
fd and it's solely the caller's responsibility to close it, but many of 
the places that used to call BasicOpenFile now use the safer OpenFile 
function instead.


This patch plugs three existing fd (or virtual fd) leaks:

1. copy_file() - fixed by by using OpenFile instead of BasicOpenFile
2. XLogFileLinit() - fixed by adding close() calls to the error cases. 
Can't use OpenFile here because the fd is supposed to persist over 
transaction boundaries.
3. lo_import/lo_export - fixed by using OpenFile instead of 
PathNameOpenFile.


In addition, this replaces many BasicOpenFile() calls with OpenFile() 
that were not leaking, because the code meticulously closed the file on 
error. That wasn't strictly necessary, but IMHO it's good for robustness.


One thing I'm not too fond of is the naming. I'm calling the new 
functions OpenFile and CloseFile. There's some danger of confusion 
there, as the function to close a virtual file opened with 
PathNameOpenFile is called FileClose. OpenFile is really the same kind 
of operation as AllocateFile and AllocateDir, but returns an unbuffered 
fd. So it would be nice if it was called AllocateSomething, too. But 
AllocateFile is already taken. And I don't much like the Allocate* 
naming for these anyway, you really would expect the name to contain open.


Do we want to backpatch this? We've had zero complaints, but this seems 
fairly safe to backpatch, and at least the leak in copy_file() can be 
quite annoying. If you run out of disk space in CREATE DATABASE, the 
target file is kept open even though it's deleted, so the space isn't 
reclaimed until you disconnect.


- Heikki
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index dd69c23..cd60dd8 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -531,7 +531,7 @@ SlruInternalWritePage(SlruCtl ctl, int slotno, SlruFlush fdata)
 		int			i;
 
 		for (i = 0; i  fdata-num_files; i++)
-			close(fdata-fd[i]);
+			CloseFile(fdata-fd[i]);
 	}
 
 	/* Re-acquire control lock and update page state */
@@ -593,7 +593,7 @@ SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno)
 	 * SlruPhysicalWritePage).	Hence, if we are InRecovery, allow the case
 	 * where the file doesn't exist, and return zeroes instead.
 	 */
-	fd = BasicOpenFile(path, O_RDWR | PG_BINARY, S_IRUSR | S_IWUSR);
+	fd = OpenFile(path, O_RDWR | PG_BINARY, S_IRUSR | S_IWUSR);
 	if (fd  0)
 	{
 		if (errno != ENOENT || !InRecovery)
@@ -614,7 +614,7 @@ SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno)
 	{
 		slru_errcause = SLRU_SEEK_FAILED;
 		slru_errno = errno;
-		close(fd);
+		CloseFile(fd);
 		return false;
 	}
 
@@ -623,11 +623,11 @@ SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno)
 	{
 		slru_errcause = SLRU_READ_FAILED;
 		slru_errno = errno;
-		close(fd);
+		CloseFile(fd);
 		return false;
 	}
 
-	if (close(fd))
+	if (CloseFile(fd))
 	{
 		slru_errcause = SLRU_CLOSE_FAILED;
 		slru_errno = errno;
@@ -740,7 +740,7 @@ SlruPhysicalWritePage(SlruCtl ctl, int pageno, int slotno, SlruFlush fdata)
 		 * don't use O_EXCL or O_TRUNC or anything like that.
 		 */
 		SlruFileName(ctl, path, segno);
-		fd = BasicOpenFile(path, O_RDWR | O_CREAT | PG_BINARY,
+		fd = OpenFile(path, O_RDWR | O_CREAT | PG_BINARY,
 		   S_IRUSR | S_IWUSR);
 		if (fd  0)
 		{
@@ -773,7 +773,7 @@ SlruPhysicalWritePage(SlruCtl ctl, int pageno, int slotno, SlruFlush fdata)
 		slru_errcause = SLRU_SEEK_FAILED;
 		slru_errno = errno;
 		if (!fdata)
-			close(fd);
+			CloseFile(fd);
 		return false;
 	}
 
@@ -786,7 +786,7 @@ SlruPhysicalWritePage(SlruCtl ctl, int pageno, int slotno, SlruFlush fdata)
 		slru_errcause =