Re: Incorrect errno used with %m for backend code
On Sun, Jun 24, 2018 at 09:23:50PM +0900, Michael Paquier wrote:
> On Sun, Jun 24, 2018 at 09:22:01AM +0530, Ashutosh Sharma wrote:
>> Okay, I too had a quick look into the source code to see if there are
>> still some places where we could have missed to set an errno to ENOSPC
>> in case of write system call failure but, couldn't find any such place
>> in the code. The v2 version of patch looks good to me.
>
> Thanks for the review.  I'll try to wrap that tomorrow with proper
> patches for back-branches as things diverge a bit here and there.

Pushed down to 9.3.  All branches conflicted, and particularly in
pg_basebackup some write() calls have been moved around.
--
Michael
Re: Incorrect errno used with %m for backend code
On Sat, Jun 23, 2018 at 6:43 PM, Michael Paquier wrote:
> On Fri, Jun 22, 2018 at 03:45:33PM +0530, Ashutosh Sharma wrote:
>> Okay, thanks for the confirmation. Few of them are also there in
>> origin.c and snapbuild.c files.
>
> Thanks Ashutosh.  I have been reviewing the whole tree and I found more
> places where this is missing, like rewriteheap.c, reorderbuffer.c or
> pg_basebackup, which gives the attached.

Okay, I too had a quick look into the source code to see if there are
still some places where we could have missed setting errno to ENOSPC
in case of a write system call failure, but couldn't find any such
place in the code.  The v2 version of the patch looks good to me.

So, to conclude, the v2 patch fixes two things:
1) It makes ereport print the correct error number (the error number
   that matches the error message).
2) It sets errno to ENOSPC (assuming that the problem is no disk
   space) if the write system call fails without setting errno.

--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com
Re: Incorrect errno used with %m for backend code
On Fri, Jun 22, 2018 at 11:15:53AM -0400, Alvaro Herrera wrote:
> I wondered for a bit if it would be a better idea to have
> CloseTransientFile restore the existing errno if there is any and close
> works fine -- when I noticed that that function itself says that caller
> should check errno for close() errors.  Most callers seem to do it
> correctly, but a few fail to check the return value.

Yeah, the API in its current shape is simpler to understand.  Once you
get used to the errno stanza, of course...

> A bunch of other places use CloseTransientFile while closing shop after
> some other syscall failure, which is what your patch fixes.  I didn't
> review those; checking for close failure seems pointless.

Agreed.

> In some cases, we fsync the file and check that return value, then close
> the file and do *not* check CloseTransientFile's return value --
> examples are CheckPointLogicalRewriteHeap, heap_xlog_logical_rewrite,
> SnapBuildSerialize, SaveSlotToPath, durable_rename.  I don't know if
> it's reasonable for close() to fail after successfully fsyncing a file;
> maybe this is not a problem.  I would patch those nonetheless.

And writeTimeLineHistory.

> be_lo_export() is certainly missing a check: it writes and closes,
> without intervening fsync.

One problem at a time if possible :)  I think that those adjustments
should be a separate patch.
--
Michael
Re: Incorrect errno used with %m for backend code
On Fri, Jun 22, 2018 at 03:45:33PM +0530, Ashutosh Sharma wrote:
> Okay, thanks for the confirmation. Few of them are also there in
> origin.c and snapbuild.c files.

Thanks Ashutosh.  I have been reviewing the whole tree and I found more
places where this is missing, like rewriteheap.c, reorderbuffer.c or
pg_basebackup, which gives the attached.
--
Michael

diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index 8d3c861a33..ed7ba181c7 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -1168,9 +1168,14 @@ heap_xlog_logical_rewrite(XLogReaderState *r)
 	/* write out tail end of mapping file (again) */
 	pgstat_report_wait_start(WAIT_EVENT_LOGICAL_REWRITE_MAPPING_WRITE);
 	if (write(fd, data, len) != len)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not write to file \"%s\": %m", path)));
+	}
 	pgstat_report_wait_end();
 
 	/*
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 65194db70e..a9ef1b3d73 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1241,12 +1241,17 @@ ReadTwoPhaseFile(TransactionId xid, bool give_warnings)
 	 */
 	if (fstat(fd, &stat))
 	{
+		int			save_errno = errno;
+
 		CloseTransientFile(fd);
 		if (give_warnings)
+		{
+			errno = save_errno;
 			ereport(WARNING,
 					(errcode_for_file_access(),
 					 errmsg("could not stat two-phase state file \"%s\": %m",
 							path)));
+		}
 		return NULL;
 	}
 
@@ -1274,13 +1279,18 @@ ReadTwoPhaseFile(TransactionId xid, bool give_warnings)
 	pgstat_report_wait_start(WAIT_EVENT_TWOPHASE_FILE_READ);
 	if (read(fd, buf, stat.st_size) != stat.st_size)
 	{
+		int			save_errno = errno;
+
 		pgstat_report_wait_end();
 		CloseTransientFile(fd);
 		if (give_warnings)
+		{
+			errno = save_errno;
 			ereport(WARNING,
 					(errcode_for_file_access(),
 					 errmsg("could not read two-phase state file \"%s\": %m",
 							path)));
+		}
 		pfree(buf);
 		return NULL;
 	}
@@ -1663,16 +1673,26 @@ RecreateTwoPhaseFile(TransactionId xid, void *content, int len)
 	pgstat_report_wait_start(WAIT_EVENT_TWOPHASE_FILE_WRITE);
 	if (write(fd, content, len) != len)
 	{
+		int			save_errno = errno;
+
 		pgstat_report_wait_end();
 		CloseTransientFile(fd);
+
+		/* if write didn't set errno, assume problem is no disk space */
+		errno = save_errno ? save_errno : ENOSPC;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not write two-phase state file: %m")));
 	}
 	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
 	{
+		int			save_errno = errno;
+
 		pgstat_report_wait_end();
 		CloseTransientFile(fd);
+
+		/* if write didn't set errno, assume problem is no disk space */
+		errno = save_errno ? save_errno : ENOSPC;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not write two-phase state file: %m")));
@@ -1686,7 +1706,10 @@ RecreateTwoPhaseFile(TransactionId xid, void *content, int len)
 	pgstat_report_wait_start(WAIT_EVENT_TWOPHASE_FILE_SYNC);
 	if (pg_fsync(fd) != 0)
 	{
+		int			save_errno = errno;
+
 		CloseTransientFile(fd);
+		errno = save_errno;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not fsync two-phase state file: %m")));
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index adbd6a2126..1a419aa49b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3268,7 +3268,10 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
 	pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_SYNC);
 	if (pg_fsync(fd) != 0)
 	{
+		int			save_errno = errno;
+
 		close(fd);
+		errno = save_errno;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not fsync file \"%s\": %m", tmppath)));
@@ -11675,8 +11678,10 @@ retry:
 	if (lseek(readFile, (off_t) readOff, SEEK_SET) < 0)
 	{
 		char		fname[MAXFNAMELEN];
+		int			save_errno = errno;
 
 		XLogFileName(fname, curFileTLI, readSegNo, wal_segment_size);
+		errno = save_errno;
 		ereport(emode_for_corrupt_record(emode, targetPagePtr + reqLen),
 				(errcode_for_file_access(),
 				 errmsg("could not seek in log segment %s to offset %u: %m",
@@ -11688,9 +11693,11 @@ retry:
 	if (read(readFile, readBuf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
 	{
 		char		fname[MAXFNAMELEN];
+		int			save_errno = errno;
 
 		pgstat_report_wait_end();
 		XLogFileName(fname, curFileTLI, readSegNo, wal_segment_size);
+		errno = save_errno;
 		ereport(emode_for_corrupt_record(emode, targetPagePtr + reqLen),
 				(errcode_for_file_access(),
 				 errmsg("could not read from log segment %s, offset %u: %m",
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 52fe55e2af..4ecdc9220f 100644
--- a/src/backend/access/transam/xlogutils.c
+++
Re: Incorrect errno used with %m for backend code
Hi,

On 2018-06-22 11:15:53 -0400, Alvaro Herrera wrote:
> I don't understand the logic in RestoreSlotFromDisk that fsync the file
> prior to reading it.  What are we protecting against?

We could previously have crashed before fsyncing.  But we can only rely
on slot contents being durable if we fsync them.  Therefore we fsync
before reading, after a crash.

> Anyway, while I think it would be nice to have CloseTransientFile
> restore the original errno if there was one and close goes well, it
> probably unduly complicates its API contract.

Yea, agreed.

Greetings,

Andres Freund
Re: Incorrect errno used with %m for backend code
On 2018-Jun-22, Michael Paquier wrote:
> A couple of places use CloseTransientFile without bothering much that
> this can overwrite errno.  I was tempted in keeping errno saved and kept
> if set to a non-zero value, but refrained from doing so as some callers
> may rely on the existing behavior, and the attached shows better the
> intention.

I wondered for a bit if it would be a better idea to have
CloseTransientFile restore the existing errno if there is any and close
works fine -- when I noticed that that function itself says that the
caller should check errno for close() errors.  Most callers seem to do
it correctly, but a few fail to check the return value.

Quite some places open files O_RDONLY, so the lack of error checking
after closing seems OK.  (Unless there's some funny interaction with
the fsync fiasco, of course.)

A bunch of other places use CloseTransientFile while closing shop after
some other syscall failure, which is what your patch fixes.  I didn't
review those; checking for close failure there seems pointless.

In some cases, we fsync the file and check that return value, then close
the file and do *not* check CloseTransientFile's return value --
examples are CheckPointLogicalRewriteHeap, heap_xlog_logical_rewrite,
SnapBuildSerialize, SaveSlotToPath, durable_rename.  I don't know if
it's reasonable for close() to fail after successfully fsyncing a file;
maybe this is not a problem.  I would patch those nonetheless.

be_lo_export() is certainly missing a check: it writes and closes,
without an intervening fsync.

I don't understand the logic in RestoreSlotFromDisk that fsyncs the
file prior to reading it.  What are we protecting against?

Anyway, while I think it would be nice to have CloseTransientFile
restore the original errno if there was one and close goes well, it
probably unduly complicates its API contract.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Incorrect errno used with %m for backend code
On Fri, Jun 22, 2018 at 2:44 PM, Michael Paquier wrote:
> On Fri, Jun 22, 2018 at 01:00:45PM +0530, Ashutosh Sharma wrote:
>> It seems like in case of few system calls for e.g. write system call,
>> errno is not set even if the number of bytes written is smaller than
>> the bytes requested and for such cases we explicitly set an errno to
>> ENOSPC.  Something like this,
>>
>> /*
>>  * if write didn't set errno, assume problem is no disk space
>>  */
>> errno = save_errno ? save_errno : ENOSPC;
>>
>> Shouldn't we do similar handling in your patch as well or please let
>> me know if i am missing something here.
>
> Yeah, I can see at least three of them in twophase.c.  Let's fix those
> as well at the same time.

Okay, thanks for the confirmation.  A few of them are also there in
origin.c and snapbuild.c files.
--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com
Re: Incorrect errno used with %m for backend code
On Fri, Jun 22, 2018 at 01:00:45PM +0530, Ashutosh Sharma wrote:
> It seems like in case of few system calls for e.g. write system call,
> errno is not set even if the number of bytes written is smaller than
> the bytes requested and for such cases we explicitly set an errno to
> ENOSPC.  Something like this,
>
> /*
>  * if write didn't set errno, assume problem is no disk space
>  */
> errno = save_errno ? save_errno : ENOSPC;
>
> Shouldn't we do similar handling in your patch as well or please let
> me know if i am missing something here.

Yeah, I can see at least three of them in twophase.c.  Let's fix those
as well at the same time.
--
Michael
Re: Incorrect errno used with %m for backend code
Hi,

It seems like in the case of a few system calls, e.g. the write system
call, errno is not set even if the number of bytes written is smaller
than the bytes requested, and for such cases we explicitly set errno to
ENOSPC.  Something like this:

/*
 * if write didn't set errno, assume problem is no disk space
 */
errno = save_errno ? save_errno : ENOSPC;

Shouldn't we do similar handling in your patch as well?  Or please let
me know if I am missing something here.

On Fri, Jun 22, 2018 at 11:45 AM, Michael Paquier wrote:
> Hi all,
>
> I have been reviewing the use of errno in the backend code when %m is
> used in the logs, and found more places than when I looked at improving
> the error messages for read() calls which bloat the errno value because
> of intermediate system calls.  As the problem is separate and should be
> back-patched, I have preferred beginning a new thread.
>
> A couple of places use CloseTransientFile without bothering much that
> this can overwrite errno.  I was tempted in keeping errno saved and kept
> if set to a non-zero value, but refrained from doing so as some callers
> may rely on the existing behavior, and the attached shows better the
> intention.
>
> Attached is a patch with everything I have spotted.  Any opinions or
> thoughts in getting this back-patched?
>
> Thanks,
> --
> Michael
Incorrect errno used with %m for backend code
Hi all,

I have been reviewing the use of errno in the backend code when %m is
used in the logs, and found more places than when I looked at improving
the error messages for read() calls which bloat the errno value because
of intermediate system calls.  As the problem is separate and should be
back-patched, I have preferred beginning a new thread.

A couple of places use CloseTransientFile without bothering much that
this can overwrite errno.  I was tempted in keeping errno saved and kept
if set to a non-zero value, but refrained from doing so as some callers
may rely on the existing behavior, and the attached shows better the
intention.

Attached is a patch with everything I have spotted.  Any opinions or
thoughts in getting this back-patched?

Thanks,
--
Michael

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 65194db70e..e5571a67d6 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1241,12 +1241,17 @@ ReadTwoPhaseFile(TransactionId xid, bool give_warnings)
 	 */
 	if (fstat(fd, &stat))
 	{
+		int			save_errno = errno;
+
 		CloseTransientFile(fd);
 		if (give_warnings)
+		{
+			errno = save_errno;
 			ereport(WARNING,
 					(errcode_for_file_access(),
 					 errmsg("could not stat two-phase state file \"%s\": %m",
 							path)));
+		}
 		return NULL;
 	}
 
@@ -1274,13 +1279,18 @@ ReadTwoPhaseFile(TransactionId xid, bool give_warnings)
 	pgstat_report_wait_start(WAIT_EVENT_TWOPHASE_FILE_READ);
 	if (read(fd, buf, stat.st_size) != stat.st_size)
 	{
+		int			save_errno = errno;
+
 		pgstat_report_wait_end();
 		CloseTransientFile(fd);
 		if (give_warnings)
+		{
+			errno = save_errno;
 			ereport(WARNING,
 					(errcode_for_file_access(),
 					 errmsg("could not read two-phase state file \"%s\": %m",
 							path)));
+		}
 		pfree(buf);
 		return NULL;
 	}
@@ -1663,16 +1673,22 @@ RecreateTwoPhaseFile(TransactionId xid, void *content, int len)
 	pgstat_report_wait_start(WAIT_EVENT_TWOPHASE_FILE_WRITE);
 	if (write(fd, content, len) != len)
 	{
+		int			save_errno = errno;
+
 		pgstat_report_wait_end();
 		CloseTransientFile(fd);
+		errno = save_errno;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not write two-phase state file: %m")));
 	}
 	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
 	{
+		int			save_errno = errno;
+
 		pgstat_report_wait_end();
 		CloseTransientFile(fd);
+		errno = save_errno;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not write two-phase state file: %m")));
@@ -1686,7 +1702,10 @@ RecreateTwoPhaseFile(TransactionId xid, void *content, int len)
 	pgstat_report_wait_start(WAIT_EVENT_TWOPHASE_FILE_SYNC);
 	if (pg_fsync(fd) != 0)
 	{
+		int			save_errno = errno;
+
 		CloseTransientFile(fd);
+		errno = save_errno;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not fsync two-phase state file: %m")));
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index adbd6a2126..1a419aa49b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3268,7 +3268,10 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
 	pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_SYNC);
 	if (pg_fsync(fd) != 0)
 	{
+		int			save_errno = errno;
+
 		close(fd);
+		errno = save_errno;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not fsync file \"%s\": %m", tmppath)));
@@ -11675,8 +11678,10 @@ retry:
 	if (lseek(readFile, (off_t) readOff, SEEK_SET) < 0)
 	{
 		char		fname[MAXFNAMELEN];
+		int			save_errno = errno;
 
 		XLogFileName(fname, curFileTLI, readSegNo, wal_segment_size);
+		errno = save_errno;
 		ereport(emode_for_corrupt_record(emode, targetPagePtr + reqLen),
 				(errcode_for_file_access(),
 				 errmsg("could not seek in log segment %s to offset %u: %m",
@@ -11688,9 +11693,11 @@ retry:
 	if (read(readFile, readBuf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
 	{
 		char		fname[MAXFNAMELEN];
+		int			save_errno = errno;
 
 		pgstat_report_wait_end();
 		XLogFileName(fname, curFileTLI, readSegNo, wal_segment_size);
+		errno = save_errno;
 		ereport(emode_for_corrupt_record(emode, targetPagePtr + reqLen),
 				(errcode_for_file_access(),
 				 errmsg("could not read from log segment %s, offset %u: %m",
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 52fe55e2af..4ecdc9220f 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -718,9 +718,10 @@ XLogRead(char *buf, int segsize, TimeLineID tli, XLogRecPtr startptr,
 	if (lseek(sendFile, (off_t) startoff, SEEK_SET) < 0)
 	{
 		char		path[MAXPGPATH];
+		int			save_errno = errno;
 
 		XLogFilePath(path, tli, sendSegNo, segsize);
-
+		errno = save_errno;
 		ereport(ERROR,
 				(errcode_for_file_access(),
 				 errmsg("could not seek in log segment %s to offset %u: %m",
@@ -741,9 +742,10