Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-16 Thread Magnus Hagander
On Tue, Jan 16, 2007 at 10:20:04AM +0900, Takayuki Tsunakawa wrote:
 From: Magnus Hagander [EMAIL PROTECTED]
  But yeah, that's probably a good idea. A quick look at the code says
 we
  should at least ask people who have this problem to give it a run
 with
  logging at DEBUG5 which should then log exactly what the errorcode
 was.
  Or are you seeing more places that need such logging first?
 
 I'm sorry we can't get get the Win32 error code to be displayed.  I
 got the following messages:
 
 2007-01-16 09:24:48 DEBUG:  checkpoint starting
 2007-01-16 09:24:48 ERROR:  could not open relation 1663/10819/18296:
 Permission denied
 2007-01-16 09:24:48 ERROR:  checkpoint request failed
 2007-01-16 09:24:48 HINT:  Consult recent messages in the server log
 for details.
 2007-01-16 09:24:48 STATEMENT:  checkpoint;
 
 
 The reason is that src/port/open.c does not use _dosmaperr().  It
 converts the Win32 error code to errno directly.  EACCES is converted
 from ERROR_ACCESS_DENIED only.  Mmm, we may have to compromise as
 Tom-san says.

Good point. In this case, we *know* it's access denied then, and not
some other error code.

And actually, when I look at the API docs, our case now seems to be
documented. Or am I misreading our situation. I have:

If you call CreateFile on a file that is pending deletion as a result
of a previous call to DeleteFile, the function fails. The operating
system delays file deletion until all handles to the file are closed.
GetLastError returns ERROR_ACCESS_DENIED.


//Magnus

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-16 Thread Tom Lane
Magnus Hagander [EMAIL PROTECTED] writes:
 And actually, when I look at the API docs, our case now seems to be
 documented. Or am I misreading our situation. I have:

 If you call CreateFile on a file that is pending deletion as a result
 of a previous call to DeleteFile, the function fails. The operating
 system delays file deletion until all handles to the file are closed.
 GetLastError returns ERROR_ACCESS_DENIED.

We are not calling CreateFile ... we're just trying to open the thing.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-16 Thread Magnus Hagander
On Tue, Jan 16, 2007 at 11:11:59AM -0500, Tom Lane wrote:
 Magnus Hagander [EMAIL PROTECTED] writes:
  And actually, when I look at the API docs, our case now seems to be
  documented. Or am I misreading our situation. I have:
 
  If you call CreateFile on a file that is pending deletion as a result
  of a previous call to DeleteFile, the function fails. The operating
  system delays file deletion until all handles to the file are closed.
  GetLastError returns ERROR_ACCESS_DENIED.
 
 We are not calling CreateFile ... we're just trying to open the thing.

Yes, we are calling CreateFile. It's used to open files as well. Check
the code in src/port/open.c.

It's not very intuitive, but createfile is used both to create new files
and to open existing ones (create handles to them).

//Magnus

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-16 Thread Andrew Dunstan

Tom Lane wrote:

Magnus Hagander [EMAIL PROTECTED] writes:
  

And actually, when I look at the API docs, our case now seems to be
documented. Or am I misreading our situation. I have:



  

If you call CreateFile on a file that is pending deletion as a result
of a previous call to DeleteFile, the function fails. The operating
system delays file deletion until all handles to the file are closed.
GetLastError returns ERROR_ACCESS_DENIED.



We are not calling CreateFile ... we're just trying to open the thing.




see src/port/open.c - pgwin32_open() calls CreateFile().

cheers

andrew

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-15 Thread Magnus Hagander
Tom Lane wrote:
 Magnus Hagander [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 pg_control is certainly not ever deleted or renamed, and in fact I
 believe there's an LWLock enforcing that only one PG process at a time
 is even touching it.  So we need another theory to explain this one :-(
 
 Right. What we need is a list of which processes have handles open to
 the file, which can be dumped using Process Explorer (there are other
 sysinternals tools to do it as well, but PE is probably the easiest)-
 
 Hmm, are you just assuming that the underlying error is
 ERROR_SHARING_VIOLATION?  One of the things that's bothered me all along
 is that there are a dozen different Windows error codes that we map to
 EACCES ... perhaps it's time to think about disambiguating that a bit
 better?

I was. Using PE is just one way to prove that was it :-)

But yeah, that's probably a good idea. A quick look at the code says we
should at least ask people who have this problem to give it a run with
logging at DEBUG5 which should then log exactly what the errorcode was.
Or are you seeing more places that need such logging first?


//Magnus

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-15 Thread Tom Lane
Magnus Hagander [EMAIL PROTECTED] writes:
 But yeah, that's probably a good idea. A quick look at the code says we
 should at least ask people who have this problem to give it a run with
 logging at DEBUG5 which should then log exactly what the errorcode was.
 Or are you seeing more places that need such logging first?

DEBUG5 is going to be a bit voluminous, but let's try that if we can.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-15 Thread Magnus Hagander
Tom Lane wrote:
 Magnus Hagander [EMAIL PROTECTED] writes:
 But yeah, that's probably a good idea. A quick look at the code says we
 should at least ask people who have this problem to give it a run with
 logging at DEBUG5 which should then log exactly what the errorcode was.
 Or are you seeing more places that need such logging first?
 
 DEBUG5 is going to be a bit voluminous, but let's try that if we can.

Perhaps we should switch down the DEBUG level of it, at least until we
know what happens?

//Magnus

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-15 Thread Tom Lane
Magnus Hagander [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 DEBUG5 is going to be a bit voluminous, but let's try that if we can.

 Perhaps we should switch down the DEBUG level of it, at least until we
 know what happens?

That would have to wait on another update release, or at least someone
being willing to build a nonstandard executable for Windows, so let's
first see if people are willing to do the DEBUG5 bit.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-15 Thread Takayuki Tsunakawa
From: Magnus Hagander [EMAIL PROTECTED]
 But yeah, that's probably a good idea. A quick look at the code says
we
 should at least ask people who have this problem to give it a run
with
 logging at DEBUG5 which should then log exactly what the errorcode
was.
 Or are you seeing more places that need such logging first?

I'm sorry we can't get get the Win32 error code to be displayed.  I
got the following messages:

2007-01-16 09:24:48 DEBUG:  checkpoint starting
2007-01-16 09:24:48 ERROR:  could not open relation 1663/10819/18296:
Permission denied
2007-01-16 09:24:48 ERROR:  checkpoint request failed
2007-01-16 09:24:48 HINT:  Consult recent messages in the server log
for details.
2007-01-16 09:24:48 STATEMENT:  checkpoint;


The reason is that src/port/open.c does not use _dosmaperr().  It
converts the Win32 error code to errno directly.  EACCES is converted
from ERROR_ACCESS_DENIED only.  Mmm, we may have to compromise as
Tom-san says.

BTW, why does the bgwriter try to open and write the pages of already
dropped relations?  When dropping relations, DropRelFileNodeBuffers is
called to discard dirty buffers.  If the relation being dropeed has
already been registered in the list of files to be fsynced, isn't it
possible to remove the file from the list before unlinking the file,
asking bgwriter in a similar way as ForwardFsyncRequest()?

#
The timestamp at the head of each message is noisy since the event
viewer has the time info, isn't it?  Besides, several PostgreSQL
messages appeared as one entry of event log, separated by a LF instead
of CR LF.  On Windows, CR LF should separate lines.





---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-15 Thread Tom Lane
Takayuki Tsunakawa [EMAIL PROTECTED] writes:
 BTW, why does the bgwriter try to open and write the pages of already
 dropped relations?

It does not; the problem is with stale fsync requests.

 If the relation being dropeed has
 already been registered in the list of files to be fsynced, isn't it
 possible to remove the file from the list before unlinking the file,
 asking bgwriter in a similar way as ForwardFsyncRequest()?

I suggested that here
http://archives.postgresql.org/pgsql-hackers/2007-01/msg00642.php
but have received no feedback about it ...

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-12 Thread Zeugswetter Andreas ADI SD

  I find it very unlikely that you would during normal operations
end up
  in a situation where you would first have permissions to create
files in
  a directory, and then lose them.
  What could be is that you have a directory where you never had
  permissions to create the file in the first place.
 
  Any chance to differentiate between these?
 
 The cases we're concerned about involve access to an existing file,
not
 attempts to create a new one, so I'm not clear what your point is.

I am wondering if we can delete the file by opening it with
FILE_FLAG_DELETE_ON_CLOSE, and immediately close it again. 
The semantics should be clear if we let the OS delete the file after the

last handle on it is closed ? 
Until all handles are closed another process can still open it with 
FILE_SHARE_DELETE (according to docs), but not without the flag.
This seems to be what we want.
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/
fs/createfile.asp

If this fails (see the loop in dirmod.c) we could try to move it to
the recycle bin with SHFileOperation with FO_DELETE.

It seems the win unlink is not implemented correctly and we need to
replace it.
I don't feel easy with the ignore EACCES idea. 

Should I try to supply a patch along this line ?

Andreas

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-12 Thread Magnus Hagander
On Fri, Jan 12, 2007 at 10:49:53AM +0100, Zeugswetter Andreas ADI SD wrote:
 
   I find it very unlikely that you would during normal operations
 end up
   in a situation where you would first have permissions to create
 files in
   a directory, and then lose them.
   What could be is that you have a directory where you never had
   permissions to create the file in the first place.
  
   Any chance to differentiate between these?
  
  The cases we're concerned about involve access to an existing file,
 not
  attempts to create a new one, so I'm not clear what your point is.
 
 I am wondering if we can delete the file by opening it with
 FILE_FLAG_DELETE_ON_CLOSE, and immediately close it again. 
 The semantics should be clear if we let the OS delete the file after the
 
 last handle on it is closed ? 
 Until all handles are closed another process can still open it with 
 FILE_SHARE_DELETE (according to docs), but not without the flag.
 This seems to be what we want.
 http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/
 fs/createfile.asp
 
 If this fails (see the loop in dirmod.c) we could try to move it to
 the recycle bin with SHFileOperation with FO_DELETE.
 
 It seems the win unlink is not implemented correctly and we need to
 replace it.
 I don't feel easy with the ignore EACCES idea. 
 
 Should I try to supply a patch along this line ?

Doesn't sound unreasonable, so yes, let's give it a try at least.

//Magnus

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-12 Thread Magnus Hagander
On Thu, Jan 11, 2007 at 10:39:47PM -0500, Tom Lane wrote:
 Jim C. Nasby [EMAIL PROTECTED] writes:
  On Thu, Jan 11, 2007 at 03:14:37PM -0500, Tom Lane wrote:
  ... And anyway there should never
  *be* a real permissions problem; if there is then the user's been poking
  under the hood sufficient to void the warranty anyway ;-)
 
  Or some other helpful process such as a virus scanner has been poking
  under the hood for you... :(
 
 One point worth making is that I'm not really convinced anymore that
 we have proof that antivirus code has been creating any such problems.

We do. I have positive proof of this being caused by AV software.

I don't know that it has been the problem in *all cases*, certainly, but
I've had kernel stacktraces pointing into AV filter drivers more than
once.


 We have several anecdotal cases where someone reported erratic
 permission denied problems on Windows, and we suggested getting rid
 of any AV code, and it seemed to fix their problem --- but how long did
 they test?  This problem is inherently very timing-sensitive, and so the
 fact that you don't see it for a little while is hardly proof that it's
 gone.  See the report that started this thread for examples of apparent
 correlations that are really quite spurious, like whether the test case
 is being driven locally or not.  It could easy be that every report
 we've heard really traces to the not-yet-deleted-file problem.

No, not all of them. But certainly a fair share of them can have been.

 So basically what we'd have is that if you manually remove permissions
 on a database file or directory you'd be risking data loss; but heck,
 if you manually move, rename, delete such a file you're risking
 (guaranteeing) data loss.

That was the point I was trying tom ake erarlier :-)

//Magnus

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-12 Thread Tom Lane
Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes:
 It seems the win unlink is not implemented correctly and we need to
 replace it.

Easier said than done ...

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-12 Thread Tom Lane
Magnus Hagander [EMAIL PROTECTED] writes:
 On Thu, Jan 11, 2007 at 10:39:47PM -0500, Tom Lane wrote:
 One point worth making is that I'm not really convinced anymore that
 we have proof that antivirus code has been creating any such problems.

 We do. I have positive proof of this being caused by AV software.
 I don't know that it has been the problem in *all cases*, certainly, but
 I've had kernel stacktraces pointing into AV filter drivers more than
 once.

No, I didn't claim that Windows AV software is bug-free ;-).  What I
said was that I'm not certain it's related to the permission denied
reports, as opposed to other problems.  Or are your stack traces
specifically for permission denied failures?

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-12 Thread Zeugswetter Andreas ADI SD
I wrote:
   I find it very unlikely that you would during normal operations
end up
   in a situation where you would first have permissions to create
files in
   a directory, and then lose them.
   What could be is that you have a directory where you never had
   permissions to create the file in the first place.
  
   Any chance to differentiate between these?
  
  The cases we're concerned about involve access to an existing file,
not
  attempts to create a new one, so I'm not clear what your point is.
 
 I am wondering if we can delete the file by opening it with
 FILE_FLAG_DELETE_ON_CLOSE, and immediately close it again. 
 The semantics should be clear if we let the OS delete the 
 file after the last handle on it is closed ?
 
 Until all handles are closed another process can still open it with 
 FILE_SHARE_DELETE (according to docs), but not without the flag.

Say the docs, but win2000 gives EACCES :-( 

 This seems to be what we want.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/
fs/createfile.asp

Seems we don't get what we want :-(

 If this fails (see the loop in dirmod.c) we could try to move it to
 the recycle bin with SHFileOperation with FO_DELETE.

This does not seem to work eighter.

Andreas

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-12 Thread Magnus Hagander
On Fri, Jan 12, 2007 at 09:47:55AM -0500, Tom Lane wrote:
 Magnus Hagander [EMAIL PROTECTED] writes:
  On Thu, Jan 11, 2007 at 10:39:47PM -0500, Tom Lane wrote:
  One point worth making is that I'm not really convinced anymore that
  we have proof that antivirus code has been creating any such problems.
 
  We do. I have positive proof of this being caused by AV software.
  I don't know that it has been the problem in *all cases*, certainly, but
  I've had kernel stacktraces pointing into AV filter drivers more than
  once.
 
 No, I didn't claim that Windows AV software is bug-free ;-).  What I
 said was that I'm not certain it's related to the permission denied
 reports, as opposed to other problems.  Or are your stack traces
 specifically for permission denied failures?

I have at least two cases specifically for the permission denied
failures in postgres.

//Magnus

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-12 Thread Tom Lane
Magnus Hagander [EMAIL PROTECTED] writes:
 On Fri, Jan 12, 2007 at 09:47:55AM -0500, Tom Lane wrote:
 No, I didn't claim that Windows AV software is bug-free ;-).  What I
 said was that I'm not certain it's related to the permission denied
 reports, as opposed to other problems.  Or are your stack traces
 specifically for permission denied failures?

 I have at least two cases specifically for the permission denied
 failures in postgres.

Actually, it could still be the same problem, with the AV software only
involved to the extent that it's trying to scan files for viruses.
That could result in the AV code holding a table file open for a little
bit (or not such a little bit, if it's a big table) after it's nominally
been deleted, and that's exactly the situation we see checkpoints
failing in.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-12 Thread Magnus Hagander
Tom Lane wrote:
 Magnus Hagander [EMAIL PROTECTED] writes:
 On Fri, Jan 12, 2007 at 09:47:55AM -0500, Tom Lane wrote:
 No, I didn't claim that Windows AV software is bug-free ;-).  What I
 said was that I'm not certain it's related to the permission denied
 reports, as opposed to other problems.  Or are your stack traces
 specifically for permission denied failures?
 
 I have at least two cases specifically for the permission denied
 failures in postgres.
 
 Actually, it could still be the same problem, with the AV software only
 involved to the extent that it's trying to scan files for viruses.
 That could result in the AV code holding a table file open for a little
 bit (or not such a little bit, if it's a big table) after it's nominally
 been deleted, and that's exactly the situation we see checkpoints
 failing in.

Partially the same, but I've seen AV software keeping it open for
hours... Basically until reboot.

//Magnus

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-12 Thread Tom Lane
Magnus Hagander [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Actually, it could still be the same problem, with the AV software only
 involved to the extent that it's trying to scan files for viruses.

 Partially the same, but I've seen AV software keeping it open for
 hours... Basically until reboot.

Well, the bug report that just went by proves there's another problem:

: select version();
: 
: PostgreSQL 8.2.1 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC)
: 3.4.2 (mingw-special)
: 
: pg_log:
: 
: 2007-01-12 17:23:16 PANIC:  could not open control file
: global/pg_control: Permission denied

pg_control is certainly not ever deleted or renamed, and in fact I
believe there's an LWLock enforcing that only one PG process at a time
is even touching it.  So we need another theory to explain this one :-(
... anyone have a better one than Windows is a piece of junk?

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-12 Thread Magnus Hagander
Tom Lane wrote:
 Magnus Hagander [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Actually, it could still be the same problem, with the AV software only
 involved to the extent that it's trying to scan files for viruses.
 
 Partially the same, but I've seen AV software keeping it open for
 hours... Basically until reboot.
 
 Well, the bug report that just went by proves there's another problem:
 
 : select version();
 : 
 : PostgreSQL 8.2.1 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC)
 : 3.4.2 (mingw-special)
 : 
 : pg_log:
 : 
 : 2007-01-12 17:23:16 PANIC:  could not open control file
 : global/pg_control: Permission denied
 
 pg_control is certainly not ever deleted or renamed, and in fact I
 believe there's an LWLock enforcing that only one PG process at a time
 is even touching it.  So we need another theory to explain this one :-(
 ... anyone have a better one than Windows is a piece of junk?

Right. What we need is a list of which processes have handles open to
the file, which can be dumped using Process Explorer (there are other
sysinternals tools to do it as well, but PE is probably the easiest)-

//Magnus

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-12 Thread Tom Lane
Magnus Hagander [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 pg_control is certainly not ever deleted or renamed, and in fact I
 believe there's an LWLock enforcing that only one PG process at a time
 is even touching it.  So we need another theory to explain this one :-(

 Right. What we need is a list of which processes have handles open to
 the file, which can be dumped using Process Explorer (there are other
 sysinternals tools to do it as well, but PE is probably the easiest)-

Hmm, are you just assuming that the underlying error is
ERROR_SHARING_VIOLATION?  One of the things that's bothered me all along
is that there are a dozen different Windows error codes that we map to
EACCES ... perhaps it's time to think about disambiguating that a bit
better?

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-12 Thread Jim C. Nasby
On Thu, Jan 11, 2007 at 06:04:56PM -0500, Andrew Dunstan wrote:
 Please don't. At least not on the PostgreSQL web site nor in the docs. 
 And no, I don't run my production servers on Windows either.
 
 For good or ill, we made a decision years ago to do a proper Windows 
 port. I think that it's actually worked out reasonably well. All 
 operating systems have warts. Not long ago I tended to advise people not 
 to run mission critical Postgresql on Linux unless they were *very* 
 careful, due to the over-commit issue.
 
Yes, and IIRC we documented the overcommit stuff as well.

This isn't about OS holy wars, it's about providing information so that
people can make an informed decision about what OS to run their database
on.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-11 Thread Tom Lane
Patrick Earl [EMAIL PROTECTED] writes:
 In any case, the unit tests remove all contents and schema within the
 database before starting, and they remove the tables they create as
 they proceed.  Certainly there are many things have been recently
 deleted.

Yeah, I think then there's no question that the bgwriter is trying to
fsync something that's been deleted but isn't yet closed by every
process.  We have things set up so that that's not a really serious
problem anymore --- eventually it will be closed and then the next
checkpoint will succeed.  But CREATE DATABASE insists on checkpointing
and so it's vulnerable to even a transient failure.

I've been resisting changing the checkpoint code to treat EACCES as a
non-error situation on Windows, but maybe we have no choice.  How do
people feel about this idea: #ifdef WIN32 and the open or fsync fails
with EACCES, then

1. Emit a LOG (or maybe DEBUG) message noting the problem.
2. Leave the fsync request entry in the hashtable for next time.
3. Allow the current checkpoint to complete normally anyway.

If the file has actually been deleted, then eventually it will be closed
and the next checkpoint will be able to remove the hash entry.  If
there's something else wrong, we'll keep bleating and maybe the DBA will
notice eventually.

The downside of this is that a real EACCES problem wouldn't get noted at
any level higher than LOG, and so you could theoretically lose data
without much warning.  But I'm not seeing anything else we could do
about it --- AFAIK we have not heard of a way we can distinguish this
case from a real permissions problem.  And anyway there should never
*be* a real permissions problem; if there is then the user's been poking
under the hood sufficient to void the warranty anyway ;-)

Comments?

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-11 Thread Tom Lane
Magnus Hagander [EMAIL PROTECTED] writes:
 I find it very unlikely that you would during normal operations end up
 in a situation where you would first have permissions to create files in
 a directory, and then lose them.
 What could be is that you have a directory where you never had
 permissions to create the file in the first place.

 Any chance to differentiate between these?

The cases we're concerned about involve access to an existing file, not
attempts to create a new one, so I'm not clear what your point is.

I would certainly *love* to differentiate between these failures and
ordinary permissions failures, but so far as I've heard we can't.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-11 Thread Magnus Hagander
Tom Lane wrote:
 Magnus Hagander [EMAIL PROTECTED] writes:
 I find it very unlikely that you would during normal operations end up
 in a situation where you would first have permissions to create files in
 a directory, and then lose them.
 What could be is that you have a directory where you never had
 permissions to create the file in the first place.
 
 Any chance to differentiate between these?
 
 The cases we're concerned about involve access to an existing file, not
 attempts to create a new one, so I'm not clear what your point is.

Well, then I don't see it as being a big problem, which was the
question, I think. If pgsql had permissions to create the file, it would
never lose it unless the dba changed something - and if the dba changed
something, then he should check his logs afterwards to make sure he
didn't break anything.

My point is that if we know that *we* could create the file, than the
probability of it being an *actual* permissions problem is very low
during normal operations. So it's most likely the delete issue, and
thus doing what you propose does seem like a fairly safe bet.


 I would certainly *love* to differentiate between these failures and
 ordinary permissions failures, but so far as I've heard we can't.

Right, that's the base problem.

//Magnus

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-11 Thread Jim C. Nasby
On Thu, Jan 11, 2007 at 04:32:42PM -0500, Tom Lane wrote:
 Jim C. Nasby [EMAIL PROTECTED] writes:
  Given that this could result in data loss, if this was to be done I'd
  very much want to see a way to disable it in a production environment.
 
 Production environments are the same ones that won't be happy with
 random checkpoint failures, either.

Maybe I'm not understanding what happens in a checkpoint failure, but
I'd rather have my pg_xlog fill up (hopefully with a lot af WARNINGS
thrown before-hand) and face a long recovery than lose data...

 If we can't find a way to positively identify the deleted-file failures
 then I think we've got to do something like this.
 
 (You know, of course, that my opinion is that no sane person would run a
 production database on Windows in the first place.  So the data-loss
 risk to me seems less of a problem than the unexpected-failures problem.
 It's not like there aren't a ton of other data-loss scenarios in that OS
 that we can't do anything about...)

Yeah, and I share your opinion. Unfortunately, a lot of others do not.
:(

It would be useful if we had a page somewhere that explained in detail
what these data-loss issues were and why they're out of our control. At
least then people would (or could...) understand why production +
Windows == BadIdea.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-11 Thread Richard Troy

On Thu, 11 Jan 2007, Tom Lane wrote:

...snip...

 (You know, of course, that my opinion is that no sane person would run a
 production database on Windows in the first place.  So the data-loss
 risk to me seems less of a problem than the unexpected-failures problem.
 It's not like there aren't a ton of other data-loss scenarios in that OS
 that we can't do anything about...)

   regards, tom lane


PLEASE OH PLEASE document every f-ing one of them! (And I don't mean
document Windows issues as comments in the source code. Best would be in
the official documentation/on a web page.) On occasion, I could *really*
use such a list! (If such already exists, please point me at it!)

Thing is, Tom, not everybody has the same level of information you have on
the subject...

Regards,
Richard

-- 
Richard Troy, Chief Scientist
Science Tools Corporation
510-924-1363 or 202-747-1263
[EMAIL PROTECTED], http://ScienceTools.com/


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-11 Thread Andrew Dunstan

Richard Troy wrote:

On Thu, 11 Jan 2007, Tom Lane wrote:

...snip...
  

(You know, of course, that my opinion is that no sane person would run a
production database on Windows in the first place.  So the data-loss
risk to me seems less of a problem than the unexpected-failures problem.
It's not like there aren't a ton of other data-loss scenarios in that OS
that we can't do anything about...)





PLEASE OH PLEASE document every f-ing one of them! (And I don't mean
document Windows issues as comments in the source code. Best would be in
the official documentation/on a web page.) On occasion, I could *really*
use such a list! (If such already exists, please point me at it!)

Thing is, Tom, not everybody has the same level of information you have on
the subject...


  



Please don't. At least not on the PostgreSQL web site nor in the docs. 
And no, I don't run my production servers on Windows either.


For good or ill, we made a decision years ago to do a proper Windows 
port. I think that it's actually worked out reasonably well. All 
operating systems have warts. Not long ago I tended to advise people not 
to run mission critical Postgresql on Linux unless they were *very* 
careful, due to the over-commit issue.


In fact, I don't trust any OS. I use dumps and backups and replication 
to protect myself from them all.


In the present instance, the data loss risk is largely theoretical, as I 
understand it, as we don't expect a genuine EACCESS error.


cheers

andrew

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-11 Thread Joshua D. Drake
 
 

 
 
 Please don't. At least not on the PostgreSQL web site nor in the docs. 
 And no, I don't run my production servers on Windows either.

It does seem like it might be a good idea to have FAQs based on each OS,
yes? There are various things that effect each OS differently. The most
obvious to me being shared memory and wal_sync_method.

If could be a good idea to have.

Joshua D. Drake

-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-11 Thread jam
On Thu, Jan 11, 2007 at 03:12:07PM -0800, Joshua D. Drake wrote:
 It does seem like it might be a good idea to have FAQs based on each OS,
 yes? There are various things that effect each OS differently. The most
 obvious to me being shared memory and wal_sync_method.
 
 If could be a good idea to have.
 
 Joshua D. Drake
 

+1

regards,
J

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-11 Thread Alvaro Herrera
Joshua D. Drake wrote:

  Please don't. At least not on the PostgreSQL web site nor in the docs. 
  And no, I don't run my production servers on Windows either.
 
 It does seem like it might be a good idea to have FAQs based on each OS,
 yes? There are various things that effect each OS differently. The most
 obvious to me being shared memory and wal_sync_method.

But we have per-platform FAQs.  If there is information missing, the
reason is that nobody has submitted an appropriate patch, nothing more.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-11 Thread jam
On Thu, Jan 11, 2007 at 09:42:38PM -0300, Alvaro Herrera wrote:
 
 But we have per-platform FAQs.  If there is information missing, the
 reason is that nobody has submitted an appropriate patch, nothing more.
 

where are these FAQs, and why were they not easily found when the original
poster sent his email? is there some SEO we need to do on the websites to
make things more obvious?

regards,
J



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-11 Thread Joshua D. Drake
On Thu, 2007-01-11 at 21:42 -0300, Alvaro Herrera wrote:
 Joshua D. Drake wrote:
 
   Please don't. At least not on the PostgreSQL web site nor in the docs. 
   And no, I don't run my production servers on Windows either.
  
  It does seem like it might be a good idea to have FAQs based on each OS,
  yes? There are various things that effect each OS differently. The most
  obvious to me being shared memory and wal_sync_method.
 
 But we have per-platform FAQs.  If there is information missing, the
 reason is that nobody has submitted an appropriate patch, nothing more.

Yes you are correct, now that I look. It is not obviously apparent
though and they do appear to be quite out of date.

Joshua D. Drake




 
-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate




---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [GENERAL] Checkpoint request failed on version 8.2.1.

2007-01-11 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes:
 On Thu, Jan 11, 2007 at 03:14:37PM -0500, Tom Lane wrote:
 ... And anyway there should never
 *be* a real permissions problem; if there is then the user's been poking
 under the hood sufficient to void the warranty anyway ;-)

 Or some other helpful process such as a virus scanner has been poking
 under the hood for you... :(

One point worth making is that I'm not really convinced anymore that
we have proof that antivirus code has been creating any such problems.
We have several anecdotal cases where someone reported erratic
permission denied problems on Windows, and we suggested getting rid
of any AV code, and it seemed to fix their problem --- but how long did
they test?  This problem is inherently very timing-sensitive, and so the
fact that you don't see it for a little while is hardly proof that it's
gone.  See the report that started this thread for examples of apparent
correlations that are really quite spurious, like whether the test case
is being driven locally or not.  It could easy be that every report
we've heard really traces to the not-yet-deleted-file problem.

So basically what we'd have is that if you manually remove permissions
on a database file or directory you'd be risking data loss; but heck,
if you manually move, rename, delete such a file you're risking
(guaranteeing) data loss.  Any sane user is going to figure keep your
fingers away from the moving parts; or if he can't figure that out,
he's got no one but himself to blame.

It's not ideal, granted, but we're dealing with a much-less-than-ideal
OS, so we gotta make some compromises.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster