Re: [HACKERS] [ADMIN] ERROR: could not read block

2005-12-01 Thread Kevin Grittner
Due to its size, in the Windows environment we can't dump this database in any format except plain text, so the zlib issues don't apply here. -Kevin Qingqing Zhou [EMAIL PROTECTED] By the way, they found that they were getting this on a pg_dump, too. We will test both failure cases. If

Re: [HACKERS] [ADMIN] ERROR: could not read block

2005-12-01 Thread Kevin Grittner
[Apologies for the delayed response; fighting through a backlog.] I checked with our DBAs, and they are willing to test it. By the way, they found that they were getting this on a pg_dump, too. We will test both failure cases. If the test goes OK, we would be happy to leave it in production

Re: [HACKERS] [ADMIN] ERROR: could not read block

2005-12-01 Thread Tom Lane
Qingqing Zhou [EMAIL PROTECTED] writes: I come up with a patch to fix server-side problem. Applied. For windows, I set a one second waiting time - The code actually does one millisecond; I left it that way since it seems a reasonable value. regards, tom lane
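
[Editor's sketch] The retry idea discussed in this thread (treat the transient Windows failure as EINTR, pause briefly, try the read again, and give up eventually) might look roughly like the following. This is not the applied patch. The names read_fn, pause_fn, read_with_retry, flaky_read and count_pause, and the MAX_RETRIES cap of 100, are all hypothetical; the thread only establishes that the wait is one millisecond and that retrying cannot go on forever. The pause is passed in as a callback so the sketch stays portable; a real caller would supply something that sleeps for about 1 ms.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical reader: returns bytes read, or -1 with errno set. */
typedef long (*read_fn)(void *buf, size_t len, void *ctx);

/* Hypothetical pause hook: a real one would sleep roughly 1 ms. */
typedef void (*pause_fn)(void);

/* Assumption: the thread does not give a retry limit; 100 is illustrative. */
#define MAX_RETRIES 100

/*
 * Retry a read while it fails with EINTR (standing in here for the
 * mapped ERROR_NO_SYSTEM_RESOURCES case). Any other error, or running
 * out of retries, is returned to the caller with errno left intact.
 */
long read_with_retry(read_fn do_read, void *buf, size_t len,
                     void *ctx, pause_fn pause)
{
    int attempt;

    for (attempt = 0;; attempt++)
    {
        long n = do_read(buf, len, ctx);

        if (n >= 0)
            return n;               /* success, including 0 bytes at EOF */
        if (errno != EINTR || attempt >= MAX_RETRIES)
            return -1;              /* different error, or gave up */
        pause();                    /* let other I/O drain, then retry */
    }
}

/* Demo reader that fails a fixed number of times before succeeding. */
struct flaky { int failures_left; };

long flaky_read(void *buf, size_t len, void *ctx)
{
    struct flaky *f = ctx;

    (void) buf;
    if (f->failures_left > 0)
    {
        f->failures_left--;
        errno = EINTR;
        return -1;
    }
    return (long) len;
}

/* Demo pause hook that only counts how often it was called. */
static int pause_count = 0;

void count_pause(void)
{
    pause_count++;
}
```

With a flaky_read set to fail three times, read_with_retry pauses three times and then returns the full length; with a reader that never recovers, it returns -1 after the cap is reached.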

Re: [HACKERS] [ADMIN] ERROR: could not read block

2005-11-30 Thread Qingqing Zhou
On Wed, 30 Nov 2005, Kevin Grittner wrote: I checked with our DBAs, and they are willing to test it. Thanks, that's very nice! By the way, they found that they were getting this on a pg_dump, too. We will test both failure cases. If the test goes OK, we would be happy to leave it in

Re: [HACKERS] [ADMIN] ERROR: could not read block

2005-11-30 Thread Qingqing Zhou
I came up with a patch to fix the server-side problem. The basic idea is to convert ERROR_NO_SYSTEM_RESOURCES to EINTR and add code to retry until the read succeeds or a different error is encountered. I tweak the FileRead() logic on returnCode <= 0 a little bit by separating it into < 0 and == 0 parts. This is

Re: [HACKERS] [ADMIN] ERROR: could not read block

2005-11-30 Thread Tom Lane
Qingqing Zhou [EMAIL PROTECTED] writes: ! default: ! _dosmaperr(error); ! Assert(errno != EINTR); What's the point of that ... didn't it already happen inside read()?

Re: [HACKERS] [ADMIN] ERROR: could not read block

2005-11-30 Thread Qingqing Zhou
On Thu, 1 Dec 2005, Tom Lane wrote: Qingqing Zhou [EMAIL PROTECTED] writes: ! default: ! _dosmaperr(error); ! Assert(errno != EINTR); What's the point of that ... didn't it already happen

Re: [HACKERS] [ADMIN] ERROR: could not read block

2005-11-21 Thread Qingqing Zhou
Tom Lane [EMAIL PROTECTED] wrote Would a simple retry loop actually help? It's not clear to me how persistent such a failure would be. [with reply to all followup threads] Yeah, this is the key and we definitely have no 100% guarantee that several retries will solve the problem - just as

Re: [HACKERS] [ADMIN] ERROR: could not read block

2005-11-21 Thread Jim C. Nasby
On Thu, Nov 17, 2005 at 07:56:21PM +0100, Magnus Hagander wrote: The way I read it, a delay should help. It's basically running out of kernel buffers, and if we just delay, somebody else (another process, or an IRQ handler, or whatever) should get finished with their I/O, free up the buffer, and

Re: [HACKERS] [ADMIN] ERROR: could not read block

2005-11-21 Thread Qingqing Zhou
Magnus Hagander [EMAIL PROTECTED] wrote The way I read it, a delay should help. It's basically running out of kernel buffers, and if we just delay, somebody else (another process, or an IRQ handler, or whatever) should get finished with their I/O, free up the buffer, and let us have it. Looking

Re: [HACKERS] [ADMIN] ERROR: could not read block

2005-11-17 Thread Magnus Hagander
[copying this one over to hackers] Our DBAs reviewed the Microsoft documentation you referenced, modified the registry, and rebooted the OS. We've been beating up on the database without seeing the error so far. We'll keep at it for a while. Very interesting. As this seems to be a

Re: [HACKERS] [ADMIN] ERROR: could not read block

2005-11-17 Thread Qingqing Zhou
Magnus Hagander [EMAIL PROTECTED] wrote Seems like we could just retry when we get this failure. The question is whether we need to sleep briefly before we do. Also, we can't just retry forever, there has to be some kind of end to it... (If you read the SQL kb, it can be read as

Re: [HACKERS] [ADMIN] ERROR: could not read block

2005-11-17 Thread Kevin Grittner
1) We run a couple Java applications on the same box to provide middle tier access. When the box is heavily loaded, I think I've seen about 80% PostgreSQL, 20% Java load. 2) I checked that no antivirus software was running, and had the techs pare down the services running on that box to the

Re: [HACKERS] [ADMIN] ERROR: could not read block

2005-11-17 Thread Tom Lane
Kevin Grittner [EMAIL PROTECTED] writes: None of this seems material, however. It's pretty clear that the problem was exhaustion of the Windows page pool. ... If we don't want to tell Windows users to make highly technical changes to the Windows registry in order to use PostgreSQL, it does

Re: [HACKERS] [ADMIN] ERROR: could not read block

2005-11-17 Thread Kevin Grittner
I'm not an expert on that, but it seems reasonable to me that the page pool would free space as the I/O system caught up with the load. Also, I'm going on what was said by Qingqing and in one of the pages he referenced: http://support.microsoft.com/default.aspx?scid=kb;en-us;274310 -Kevin

Re: [HACKERS] [ADMIN] ERROR: could not read block

2005-11-17 Thread Magnus Hagander
Tom Lane [EMAIL PROTECTED] Kevin Grittner [EMAIL PROTECTED] writes: None of this seems material, however. It's pretty clear that the problem was exhaustion of the Windows page pool. ... If we don't want to tell Windows users to make highly technical changes to the Windows

Re: [HACKERS] [ADMIN] ERROR: could not read block

2005-11-17 Thread Magnus Hagander
None of this seems material, however. It's pretty clear that the problem was exhaustion of the Windows page pool. Our Windows experts have reconfigured the machine (which had been tuned for Sybase ASE). Their changes have boosted the page pool from 20,000 entries to 180,000 entries.

Re: [HACKERS] [ADMIN] ERROR: could not read block

2005-11-17 Thread Kevin Grittner
There weren't a large number of connections -- it seemed to be that the one big update query, by itself, would do this. It seemed to get through a lot of rows before failing. This table is normally insert only -- so it would likely be getting most or all of the space for inserting the updated

Re: [HACKERS] [ADMIN] ERROR: could not read block

2005-11-17 Thread Kevin Grittner
A couple clarifications: There were only a few network sockets open. I'm told that the eventlog was reviewed for any events which might be related to the failures before it was cleared. They found none, so that makes it fairly certain there was no 2020 event. -Kevin Kevin Grittner [EMAIL