Re: Tighten error control for OpenTransientFile/CloseTransientFile

2019-10-14 Thread Michael Paquier
On Fri, Oct 04, 2019 at 01:39:38PM -0700, Andres Freund wrote:
> but that reasoning seems bogus to me. For one, on just about any
> platform close always closes the fd, even when returning an error
> (unless you pass in a bad fd, in which case it obviously doesn't). So
> the reasoning that this fixes unnoticed fd leaks doesn't really seem to
> make sense. For another, even if it did, throwing an ERROR seems to
> achieve very little: We continue with a leaked fd *AND* we cause the
> operation to error out.

I have read again a couple of times the commit log, and this mentions
to let users know that a fd is leaking, not that it fixes things.
Still we get to know about it, while previously it was not possible.
In some cases we may see errors in close() after a previous write(2).
Of course this does not apply to all the code paths patched here, but
it seems to me that's a good habit to spread, no?

> I can see reasoning for:
> - LOG, so it can be noticed, but operations continue to work
> - FATAL, to fix the leak
> - PANIC, so we recover from the problem, in case of the close indicating
>   a durability issue

LOG or WARNING would not be visible enough and would likely be skipped
by users.  Not sure that this justifies a FATAL either, and PANIC
would cause more harm than necessary, so for most of them ERROR
sounded like a good compromise, still the elevel choice is not
innocent depending on the code paths patched, because the elevel used
is consistent with the error handling of the surroundings.

> And if your goal was just to achieve consistency, I also don't
> understand, because you left plenty close()'s unchecked? E.g. you added
> an error check in get_controlfile(), but not one in
> ReadControlFile(). alterSystemSetConfigFile() writes, but you didn't add
> one.

Because I have not considered these when looking at transient files.
That may be worth an extra lookup.
--
Michael


signature.asc
Description: PGP signature


Re: Tighten error control for OpenTransientFile/CloseTransientFile

2019-10-04 Thread Andres Freund
Hi,

On 2019-03-07 10:56:25 +0900, Michael Paquier wrote:
> diff --git a/src/backend/access/heap/rewriteheap.c 
> b/src/backend/access/heap/rewriteheap.c
> index f5cf9ffc9c..bce4274362 100644
> --- a/src/backend/access/heap/rewriteheap.c
> +++ b/src/backend/access/heap/rewriteheap.c
> @@ -1202,7 +1202,10 @@ heap_xlog_logical_rewrite(XLogReaderState *r)
>errmsg("could not fsync file \"%s\": %m", 
> path)));
>   pgstat_report_wait_end();
>
> - CloseTransientFile(fd);
> + if (CloseTransientFile(fd))
> + ereport(ERROR,
> + (errcode_for_file_access(),
> +  errmsg("could not close file \"%s\": %m", 
> path)));
>  }
...
> diff --git a/src/backend/access/transam/twophase.c 
> b/src/backend/access/transam/twophase.c
> index 64679dd2de..21986e48fe 100644
> --- a/src/backend/access/transam/twophase.c
> +++ b/src/backend/access/transam/twophase.c
> @@ -1297,7 +1297,11 @@ ReadTwoPhaseFile(TransactionId xid, bool missing_ok)
>   }
>
>   pgstat_report_wait_end();
> - CloseTransientFile(fd);
> +
> + if (CloseTransientFile(fd))
> + ereport(ERROR,
> + (errcode_for_file_access(),
> +  errmsg("could not close file \"%s\": %m", 
> path)));
>
>   hdr = (TwoPhaseFileHeader *) buf;
>   if (hdr->magic != TWOPHASE_MAGIC)
> diff --git a/src/backend/access/transam/xlog.c 
> b/src/backend/access/transam/xlog.c
> index 0fdd82a287..c7047738b6 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -3469,7 +3469,10 @@ XLogFileCopy(XLogSegNo destsegno, TimeLineID srcTLI, 
> XLogSegNo srcsegno,
>   (errcode_for_file_access(),
>errmsg("could not close file \"%s\": %m", 
> tmppath)));
>
> - CloseTransientFile(srcfd);
> + if (CloseTransientFile(srcfd))
> + ereport(ERROR,
> + (errcode_for_file_access(),
> +  errmsg("could not close file \"%s\": %m", 
> path)));
>
>   /*
>* Now move the segment into place with its final name.
...

This seems like an odd set of changes to me. What is this supposed to
buy us?  The commit message says:
2) When opening transient files, it is up to the caller to close the
file descriptors opened.  In error code paths, CloseTransientFile() gets
called to clean up things before issuing an error.  However in normal
exit paths, a lot of callers of CloseTransientFile() never actually
reported errors, which could leave a file descriptor open without
knowing about it.  This is an issue I complained about a couple of
times, but never had the courage to write and submit a patch, so here we
go.

but that reasoning seems bogus to me. For one, on just about any
platform close always closes the fd, even when returning an error
(unless you pass in a bad fd, in which case it obviously doesn't). So
the reasoning that this fixes unnoticed fd leaks doesn't really seem to
make sense. For another, even if it did, throwing an ERROR seems to
achieve very little: We continue with a leaked fd *AND* we cause the
operation to error out.

I can see reasoning for:
- LOG, so it can be noticed, but operations continue to work
- FATAL, to fix the leak
- PANIC, so we recover from the problem, in case of the close indicating
  a durability issue

  commit 9ccdd7f66e3324d2b6d3dec282cfa9ff084083f1
  Author: Thomas Munro 
  Date:   2018-11-19 13:31:10 +1300

  PANIC on fsync() failure.

but ERROR seems to have very little going for it.

The durability argument doesn't seem to apply for the cases where we
previously fsynced the file, a significant fraction of the locations you
touched.

And if your goal was just to achieve consistency, I also don't
understand, because you left plenty close()'s unchecked? E.g. you added
an error check in get_controlfile(), but not one in
ReadControlFile(). alterSystemSetConfigFile() writes, but you didn't add
one.

Greetings,

Andres Freund




Re: Tighten error control for OpenTransientFile/CloseTransientFile

2019-03-08 Thread Michael Paquier
On Fri, Mar 08, 2019 at 10:23:24AM +0900, Michael Paquier wrote:
> Argh...  Thanks for the lookup, Alvaro.

And committed, after an extra pass to beautify the whole experience.
--
Michael


signature.asc
Description: PGP signature


Re: Tighten error control for OpenTransientFile/CloseTransientFile

2019-03-07 Thread Michael Paquier
On Thu, Mar 07, 2019 at 10:00:05PM -0300, Alvaro Herrera wrote:
> I think this one needs a terminating \n.

Argh...  Thanks for the lookup, Alvaro.
--
Michael


signature.asc
Description: PGP signature


Re: Tighten error control for OpenTransientFile/CloseTransientFile

2019-03-07 Thread Alvaro Herrera
On 2019-Mar-07, Michael Paquier wrote:

>  #else
> - close(fd);
> + if (close(fd))
> + {
> + fprintf(stderr, _("%s: could not close file \"%s\": %s"),
> + progname, ControlFilePath, strerror(errno));
> + exit(EXIT_FAILURE);
> + }
>  #endif

I think this one needs a terminating \n.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Tighten error control for OpenTransientFile/CloseTransientFile

2019-03-07 Thread Georgios Kokolatos
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

The second version of this patch seems to be in order and ready for committer.

Thank you for taking the time to code!

The new status of this patch is: Ready for Committer


Re: Tighten error control for OpenTransientFile/CloseTransientFile

2019-03-06 Thread Michael Paquier
On Wed, Mar 06, 2019 at 02:54:52PM +, Georgios Kokolatos wrote:
> Overall the patch looks good and according to the previous
> discussion fulfils its purpose. 
> 
> It might be worthwhile to also check for errors on close in
> SaveSlotToPath().

Thanks for the feedback, added.  I have spent some time
double-checking this stuff, and noticed that the new errors in
StartupReplicationOrigin() and CheckPointReplicationOrigin() should be
switched from ERROR to PANIC to be consistent.  One message in
dsm_impl_mmap() was not consistent either.

Are there any objections if I commit this patch?
--
Michael
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 9905593661..7b39283c89 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -1991,7 +1991,10 @@ qtext_load_file(Size *buffer_size)
 		return NULL;
 	}
 
-	CloseTransientFile(fd);
+	if (CloseTransientFile(fd))
+		ereport(LOG,
+(errcode_for_file_access(),
+ errmsg("could not close file \"%s\": %m", PGSS_TEXT_FILE)));
 
 	*buffer_size = stat.st_size;
 	return buf;
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index f5cf9ffc9c..bce4274362 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -1202,7 +1202,10 @@ heap_xlog_logical_rewrite(XLogReaderState *r)
  errmsg("could not fsync file \"%s\": %m", path)));
 	pgstat_report_wait_end();
 
-	CloseTransientFile(fd);
+	if (CloseTransientFile(fd))
+		ereport(ERROR,
+(errcode_for_file_access(),
+ errmsg("could not close file \"%s\": %m", path)));
 }
 
 /* ---
@@ -1300,7 +1303,11 @@ CheckPointLogicalRewriteHeap(void)
 		(errcode_for_file_access(),
 		 errmsg("could not fsync file \"%s\": %m", path)));
 			pgstat_report_wait_end();
-			CloseTransientFile(fd);
+
+			if (CloseTransientFile(fd))
+ereport(ERROR,
+		(errcode_for_file_access(),
+		 errmsg("could not close file \"%s\": %m", path)));
 		}
 	}
 	FreeDir(mappings_dir);
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 3623352b9c..974d42fc86 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -599,7 +599,7 @@ SimpleLruDoesPhysicalPageExist(SlruCtl ctl, int pageno)
 
 	SlruFileName(ctl, path, segno);
 
-	fd = OpenTransientFile(path, O_RDWR | PG_BINARY);
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
 	if (fd < 0)
 	{
 		/* expected: file doesn't exist */
@@ -621,7 +621,13 @@ SimpleLruDoesPhysicalPageExist(SlruCtl ctl, int pageno)
 
 	result = endpos >= (off_t) (offset + BLCKSZ);
 
-	CloseTransientFile(fd);
+	if (CloseTransientFile(fd))
+	{
+		slru_errcause = SLRU_CLOSE_FAILED;
+		slru_errno = errno;
+		return false;
+	}
+
 	return result;
 }
 
@@ -654,7 +660,7 @@ SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno)
 	 * SlruPhysicalWritePage).  Hence, if we are InRecovery, allow the case
 	 * where the file doesn't exist, and return zeroes instead.
 	 */
-	fd = OpenTransientFile(path, O_RDWR | PG_BINARY);
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
 	if (fd < 0)
 	{
 		if (errno != ENOENT || !InRecovery)
diff --git a/src/backend/access/transam/timeline.c b/src/backend/access/transam/timeline.c
index c96c8b60ba..cbd9b2cee1 100644
--- a/src/backend/access/transam/timeline.c
+++ b/src/backend/access/transam/timeline.c
@@ -370,7 +370,11 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,
 			}
 			pgstat_report_wait_end();
 		}
-		CloseTransientFile(srcfd);
+
+		if (CloseTransientFile(srcfd))
+			ereport(ERROR,
+	(errcode_for_file_access(),
+	 errmsg("could not close file \"%s\": %m", path)));
 	}
 
 	/*
@@ -416,7 +420,6 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,
 (errcode_for_file_access(),
  errmsg("could not close file \"%s\": %m", tmppath)));
 
-
 	/*
 	 * Now move the completed history file into place with its final name.
 	 */
@@ -495,7 +498,6 @@ writeTimeLineHistoryFile(TimeLineID tli, char *content, int size)
 (errcode_for_file_access(),
  errmsg("could not close file \"%s\": %m", tmppath)));
 
-
 	/*
 	 * Now move the completed history file into place with its final name.
 	 */
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 64679dd2de..21986e48fe 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1297,7 +1297,11 @@ ReadTwoPhaseFile(TransactionId xid, bool missing_ok)
 	}
 
 	pgstat_report_wait_end();
-	CloseTransientFile(fd);
+
+	if (CloseTransientFile(fd))
+		ereport(ERROR,
+(errcode_for_file_access(),
+ errmsg("could not close file \"%s\": %m", path)));
 
 	hdr = (TwoPhaseFileHeader *) buf;
 	if (hdr->magic != TWOPHASE_MAGIC)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0fdd82a287..c7047738b6 100644
--- 

Re: Tighten error control for OpenTransientFile/CloseTransientFile

2019-03-06 Thread Robert Haas
On Fri, Mar 1, 2019 at 5:06 PM Joe Conway  wrote:
> Seems like it would be better to modify the arguments to
> CloseTransientFile() to include the filename being closed, errorlevel,
> and fail_on_error or something similar. Then all the repeated ereport
> stanzas could be eliminated.

Hmm.  I'm not sure that really saves much in terms of notation, and
it's less flexible.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Tighten error control for OpenTransientFile/CloseTransientFile

2019-03-06 Thread Georgios Kokolatos
Overall the patch looks good and according to the previous discussion fulfils 
its purpose.

It might be worthwhile to also check for errors on close in SaveSlotToPath().

pgstat_report_wait_end();

CloseTransientFile(fd);

/* rename to permanent file, fsync file and directory */
if (rename(tmppath, path) != 0)

Re: Tighten error control for OpenTransientFile/CloseTransientFile

2019-03-01 Thread Michael Paquier
On Fri, Mar 01, 2019 at 05:05:54PM -0500, Joe Conway wrote:
> Seems like it would be better to modify the arguments to
> CloseTransientFile() to include the filename being closed, errorlevel,
> and fail_on_error or something similar. Then all the repeated ereport
> stanzas could be eliminated.

Sure.  Now some code paths close file descriptors without having at
hand the file name, which would mean that we'd need to pass NULL as
argument in this case.  That's not really elegant in my opinion.  And
having a consistent mapping with the system's close() is not really
bad to me either..
--
Michael


signature.asc
Description: PGP signature


Re: Tighten error control for OpenTransientFile/CloseTransientFile

2019-03-01 Thread Joe Conway
On 2/28/19 9:33 PM, Michael Paquier wrote:
> Hi all,
> 
> Joe's message here has reminded me that we have lacked a lot of error
> handling around CloseTransientFile():
> https://www.postgresql.org/message-id/c49b69ec-e2f7-ff33-4f17-0eaa4f2ce...@joeconway.com
> 
> This has been mentioned by Alvaro a couple of months ago (cannot find
> the thread about that at quick glance), and I just forgot about it at
> that time.  Anyway, attached is a patch to do some cleanup for all
> that:
> - Switch OpenTransientFile to read-only where sufficient.
> - Add more error handling for CloseTransientFile
> A major take of this patch is to make sure that the new error messages
> generated have an elevel consistent with their neighbors.
> 
> Just on time for this last CF.  Thoughts?

Seems like it would be better to modify the arguments to
CloseTransientFile() to include the filename being closed, errorlevel,
and fail_on_error or something similar. Then all the repeated ereport
stanzas could be eliminated.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Tighten error control for OpenTransientFile/CloseTransientFile

2019-02-28 Thread Michael Paquier
Hi all,

Joe's message here has reminded me that we have lacked a lot of error
handling around CloseTransientFile():
https://www.postgresql.org/message-id/c49b69ec-e2f7-ff33-4f17-0eaa4f2ce...@joeconway.com

This has been mentioned by Alvaro a couple of months ago (cannot find
the thread about that at quick glance), and I just forgot about it at
that time.  Anyway, attached is a patch to do some cleanup for all
that:
- Switch OpenTransientFile to read-only where sufficient.
- Add more error handling for CloseTransientFile
A major take of this patch is to make sure that the new error messages
generated have an elevel consistent with their neighbors.

Just on time for this last CF.  Thoughts?
--
Michael
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 9905593661..7b39283c89 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -1991,7 +1991,10 @@ qtext_load_file(Size *buffer_size)
 		return NULL;
 	}
 
-	CloseTransientFile(fd);
+	if (CloseTransientFile(fd))
+		ereport(LOG,
+(errcode_for_file_access(),
+ errmsg("could not close file \"%s\": %m", PGSS_TEXT_FILE)));
 
 	*buffer_size = stat.st_size;
 	return buf;
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index f5cf9ffc9c..bce4274362 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -1202,7 +1202,10 @@ heap_xlog_logical_rewrite(XLogReaderState *r)
  errmsg("could not fsync file \"%s\": %m", path)));
 	pgstat_report_wait_end();
 
-	CloseTransientFile(fd);
+	if (CloseTransientFile(fd))
+		ereport(ERROR,
+(errcode_for_file_access(),
+ errmsg("could not close file \"%s\": %m", path)));
 }
 
 /* ---
@@ -1300,7 +1303,11 @@ CheckPointLogicalRewriteHeap(void)
 		(errcode_for_file_access(),
 		 errmsg("could not fsync file \"%s\": %m", path)));
 			pgstat_report_wait_end();
-			CloseTransientFile(fd);
+
+			if (CloseTransientFile(fd))
+ereport(ERROR,
+		(errcode_for_file_access(),
+		 errmsg("could not close file \"%s\": %m", path)));
 		}
 	}
 	FreeDir(mappings_dir);
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 3623352b9c..974d42fc86 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -599,7 +599,7 @@ SimpleLruDoesPhysicalPageExist(SlruCtl ctl, int pageno)
 
 	SlruFileName(ctl, path, segno);
 
-	fd = OpenTransientFile(path, O_RDWR | PG_BINARY);
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
 	if (fd < 0)
 	{
 		/* expected: file doesn't exist */
@@ -621,7 +621,13 @@ SimpleLruDoesPhysicalPageExist(SlruCtl ctl, int pageno)
 
 	result = endpos >= (off_t) (offset + BLCKSZ);
 
-	CloseTransientFile(fd);
+	if (CloseTransientFile(fd))
+	{
+		slru_errcause = SLRU_CLOSE_FAILED;
+		slru_errno = errno;
+		return false;
+	}
+
 	return result;
 }
 
@@ -654,7 +660,7 @@ SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno)
 	 * SlruPhysicalWritePage).  Hence, if we are InRecovery, allow the case
 	 * where the file doesn't exist, and return zeroes instead.
 	 */
-	fd = OpenTransientFile(path, O_RDWR | PG_BINARY);
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
 	if (fd < 0)
 	{
 		if (errno != ENOENT || !InRecovery)
diff --git a/src/backend/access/transam/timeline.c b/src/backend/access/transam/timeline.c
index c96c8b60ba..cbd9b2cee1 100644
--- a/src/backend/access/transam/timeline.c
+++ b/src/backend/access/transam/timeline.c
@@ -370,7 +370,11 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,
 			}
 			pgstat_report_wait_end();
 		}
-		CloseTransientFile(srcfd);
+
+		if (CloseTransientFile(srcfd))
+			ereport(ERROR,
+	(errcode_for_file_access(),
+	 errmsg("could not close file \"%s\": %m", path)));
 	}
 
 	/*
@@ -416,7 +420,6 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,
 (errcode_for_file_access(),
  errmsg("could not close file \"%s\": %m", tmppath)));
 
-
 	/*
 	 * Now move the completed history file into place with its final name.
 	 */
@@ -495,7 +498,6 @@ writeTimeLineHistoryFile(TimeLineID tli, char *content, int size)
 (errcode_for_file_access(),
  errmsg("could not close file \"%s\": %m", tmppath)));
 
-
 	/*
 	 * Now move the completed history file into place with its final name.
 	 */
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 64679dd2de..21986e48fe 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1297,7 +1297,11 @@ ReadTwoPhaseFile(TransactionId xid, bool missing_ok)
 	}
 
 	pgstat_report_wait_end();
-	CloseTransientFile(fd);
+
+	if (CloseTransientFile(fd))
+		ereport(ERROR,
+(errcode_for_file_access(),
+ errmsg("could not close file \"%s\": %m", path)));
 
 	hdr = (TwoPhaseFileHeader *) buf;
 	if (hdr->magic != TWOPHASE_MAGIC)
diff --git