Thanks Tom.
I executed a REINDEX DATABASE and received the error:
.
.
.
NOTICE: table "pg_enum" was reindexed
NOTICE: table "pg_namespace" was reindexed
NOTICE: table "pg_conversion" was reindexed
NOTICE: table "pg_depend" was reindexed
NOTICE: table "users" was reindexed
NOTICE: table "resu
Hello everybody,
I had a hard drive failure last week. After lots of effort I've been able to
backup a 700GB database, with only one file with corruption.
When I do some big queries, it throws me errors on this faulty file:
could not read block 390041 of relation 1663/350994/351212: read only 0 o
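To find which file on disk holds the damaged block, read the triple in an error like "relation 1663/350994/351212" as tablespace OID / database OID / relfilenode. The block number then maps to a segment file and an offset within it.

The sketch below makes three assumptions:
- a default build, with 8 kB pages and relations split into 1 GB segment files of 131072 blocks;
- tablespace 1663 is the default tablespace, stored under base/ in the data directory;
- locate_block and segment_path are hypothetical helpers, not PostgreSQL functions.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumed build defaults: 8 kB pages and 1 GB segment files. */
#define SKETCH_BLCKSZ      8192u    /* bytes per block */
#define SKETCH_RELSEG_SIZE 131072u  /* blocks per segment file (1 GB) */

typedef struct {
    unsigned int segno;              /* segment file number; 0 = no suffix */
    unsigned int block_in_seg;       /* block number within that file */
    unsigned long long byte_offset;  /* where the block starts in that file */
} BlockLocation;

/* Split a relation-wide block number into segment file and offset. */
static BlockLocation locate_block(unsigned int blkno)
{
    BlockLocation loc;
    loc.segno = blkno / SKETCH_RELSEG_SIZE;
    loc.block_in_seg = blkno % SKETCH_RELSEG_SIZE;
    loc.byte_offset = (unsigned long long) loc.block_in_seg * SKETCH_BLCKSZ;
    return loc;
}

/* Path relative to the data directory for a relation in the default
 * tablespace: base/<database oid>/<relfilenode>[.<segno>]. */
static int segment_path(char *buf, size_t len, unsigned int dboid,
                        unsigned int relfilenode, unsigned int segno)
{
    if (segno == 0)
        return snprintf(buf, len, "base/%u/%u", dboid, relfilenode);
    return snprintf(buf, len, "base/%u/%u.%u", dboid, relfilenode, segno);
}
```

Under those assumptions, block 390041 falls in segment 2, i.e. base/350994/351212.2. It is block 127897 of that file, at byte offset 1047732224.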
Diego Fernández Slezak writes:
> Hello everybody,
> I had a hard drive failure last week. After lots of effort I've been able to
> backup a 700GB database, with only one file with corruption.
> When I do some big queries, it throws me errors on this faulty file:
> could not rea
On Thu, Nov 17, 2005 at 07:56:21PM +0100, Magnus Hagander wrote:
> The way I read it, a delay should help. It's basically running out of
> kernel buffers, and we just delay, somebody else (another process, or an
> IRQ handler, or whatever) should get finished with their I/O, free up
> the buffer, a
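Magnus's suggestion amounts to retrying the read after a short pause when Windows reports that it has run out of kernel resources. This is a minimal sketch of that idea, not what any PostgreSQL patch actually did.

Assumptions:
- The read is a hypothetical callback (read_attempt_fn), and the sleep is injected so the code compiles anywhere; on Windows the sleep could be Sleep().
- Only error 1450 (ERROR_NO_SYSTEM_RESOURCES, the code reported elsewhere in this thread) is treated as transient.
- The backoff values and retry limit are arbitrary.

```c
/* Windows error code reported in this thread, mirrored as a plain number
 * so the sketch compiles on any platform. */
#define SKETCH_ERROR_NO_SYSTEM_RESOURCES 1450UL

/* Hypothetical: one read attempt returning bytes read, or -1 with *err
 * set to the OS-level error code. */
typedef long (*read_attempt_fn)(void *ctx, char *buf, long len,
                                unsigned long *err);
/* Hypothetical: pause for `ms` milliseconds (Sleep() on Windows). */
typedef void (*sleep_ms_fn)(unsigned int ms);

/* Only "out of kernel resources" is treated as worth waiting out. */
static int is_transient(unsigned long err)
{
    return err == SKETCH_ERROR_NO_SYSTEM_RESOURCES;
}

/* Delay before retry `attempt` (1-based): 10, 20, 40 ... ms, capped at 1 s. */
static unsigned int retry_delay_ms(int attempt)
{
    unsigned int ms = 10u;
    for (int i = 1; i < attempt && ms < 1000u; i++)
        ms *= 2u;
    return ms > 1000u ? 1000u : ms;
}

/* Retry while the read fails with a transient error, sleeping between
 * attempts so other I/O can finish and release kernel buffers. Gives up
 * after max_retries retries and returns the last result. */
static long read_with_retry(read_attempt_fn attempt_read, void *ctx,
                            char *buf, long len, int max_retries,
                            sleep_ms_fn sleep_ms, unsigned long *err)
{
    for (int attempt = 0;; attempt++) {
        long n = attempt_read(ctx, buf, len, err);
        if (n >= 0 || !is_transient(*err) || attempt >= max_retries)
            return n;
        sleep_ms(retry_delay_ms(attempt + 1));
    }
}

/* Demo stand-ins: a reader that fails with 1450 a set number of times,
 * and a sleep that only counts how often it was called. */
static int demo_sleeps = 0;

static void demo_sleep(unsigned int ms)
{
    (void) ms;
    demo_sleeps++;
}

static long demo_read(void *ctx, char *buf, long len, unsigned long *err)
{
    int *failures_left = ctx;
    (void) buf;
    if (*failures_left > 0) {
        (*failures_left)--;
        *err = SKETCH_ERROR_NO_SYSTEM_RESOURCES;
        return -1;
    }
    *err = 0;
    return len;
}
```

With demo_read set to fail three times, read_with_retry sleeps three times and then returns the full length. If the failures outlast max_retries, it returns -1 and leaves 1450 in *err, so the caller can still report the real cause.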
A couple clarifications:
There were only a few network sockets open.
I'm told that the eventlog was reviewed for any events which
might be related to the failures before it was cleared. They
found none, so that makes it fairly certain there was no 2020
event.
-Kevin
>>> "Kevin Grittner" <[EMA
There weren't a large number of connections -- it seemed to be
that the one big update query, by itself, would do this. It seemed
to get through a lot of rows before failing. This table is normally
"insert only" -- so it would likely be getting most or all of the space
for inserting the updated r
> None of this seems material, however. It's pretty clear that
> the problem was exhaustion of the Windows page pool. Our
> Windows experts have reconfigured the machine (which had been
> tuned for Sybase ASE). Their changes have boosted the page
> pool from 20,000 entries to 180,000 entries
> >>> Tom Lane <[EMAIL PROTECTED]> >>>
> "Kevin Grittner" <[EMAIL PROTECTED]> writes:
> > None of this seems material, however. It's pretty clear that the
> > problem was exhaustion of the Windows page pool.
> > ...
> > If we don't want to tell Windows users to make highly technical
> > changes
I'm not an expert on that, but it seems reasonable to me that the
page pool would free space as the I/O system caught up with
the load. Also, I'm going on what was said by Qingqing and
in one of the pages he referenced:
http://support.microsoft.com/default.aspx?scid=kb;en-us;274310
-Kevin
>>>
"Kevin Grittner" <[EMAIL PROTECTED]> writes:
> None of this seems material, however. It's pretty clear that the
> problem was exhaustion of the Windows page pool.
> ...
> If we don't want to tell Windows users to make highly technical
> changes to the Windows registry in order to use PostgreSQL,
>
1) We run a couple Java applications on the same box to provide
middle tier access. When the box is heavily loaded, I think I've
seen about 80% PostgreSQL, 20% Java load.
2) I checked that no antivirus software was running, and had the
techs pare down the services running on that box to the absol
[copying this one over to hackers]
> Our DBAs reviewed the Microsoft documentation you referenced,
> modified the registry, and rebooted the OS. We've been
> beating up on the database without seeing the error so far.
> We'll keep at it for a while.
Very interesting. As this seems to be a re
Our DBAs reviewed the Microsoft documentation you referenced,
modified the registry, and rebooted the OS. We've been beating
up on the database without seeing the error so far. We'll keep at
it for a while.
-Kevin
>>> Qingqing Zhou <[EMAIL PROTECTED]> >>>
On Wed, 16 Nov 2005, Kevin Grittner
Qingqing Zhou <[EMAIL PROTECTED]> writes:
> On Wed, 16 Nov 2005, Kevin Grittner wrote:
>> [2005-11-16 11:59:29.015 ] 4904 LOG:
>> read failed on relation 1663/16385/1494810: -1 bytes, 1450
> 1450 ERROR_NO_SYSTEM_RESOURCES
> Insufficient system resources exist to complete the requested service
Hm
On Wed, 16 Nov 2005, Kevin Grittner wrote:
> Ran with this change. Didn't take long to hit it.
>
> [2005-11-16 11:59:29.015 ] 4904 LOG:
> read failed on relation 1663/16385/1494810: -1 bytes, 1450
> [2005-11-16 11:59:29.015 ] 4904 ERROR:
> could not read block 25447 of relation 1663/16385/149
Ran with this change. Didn't take long to hit it.
Let me know if there's anything else I can do.
[2005-11-16 11:59:29.015 ] 4904 LOG:
read failed on relation 1663/16385/1494810: -1 bytes, 1450
[2005-11-16 11:59:29.015 ] 4904 ERROR:
could not read block 25447 of relation 1663/16385/1494810: I
"Kevin Grittner" <[EMAIL PROTECTED]> writes:
> On Linux:
> md.c:445: warning: implicit declaration of function `GetLastError'
Of course. This is a Windows-only hack.
> On Windows:
> md.c:445: warning: int format, DWORD arg (arg 6)
> md.c:457: warning: int format, DWORD arg (arg 7)
I think this
This code generates warnings on both Linux and Windows. My C
is too rusty to feel confident of what to do.
On Linux:
md.c:445: warning: implicit declaration of function `GetLastError'
On Windows:
md.c:445: warning: int format, DWORD arg (arg 6)
md.c:457: warning: int format, DWORD arg (arg 7)
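Both warnings point at the same two fixes:
- Call GetLastError() only when building for Windows.
- Cast the DWORD it returns to unsigned long so it can be printed with %lu.

The sketch below applies both, assuming the compiler-defined _WIN32 macro and falling back to errno on other platforms. last_os_error and format_read_failure are illustrative names. The message text copies the log line quoted in this thread.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Fetch the OS-level error code for the last failed call. GetLastError()
 * is a Win32 API, so calling it outside a Windows-only block is what
 * produces "implicit declaration" on Linux. */
static unsigned long last_os_error(void)
{
#ifdef _WIN32
    return (unsigned long) GetLastError();   /* DWORD -> unsigned long */
#else
    return (unsigned long) errno;
#endif
}

/* Format a diagnostic shaped like the thread's log line, e.g.
 * "read failed on relation 1663/16385/1494810: -1 bytes, 1450".
 * %lu matches the explicit unsigned long cast, which avoids the
 * "int format, DWORD arg" warning. */
static int format_read_failure(char *buf, size_t len, const char *relpath,
                               long nbytes, unsigned long oserr)
{
    return snprintf(buf, len, "read failed on relation %s: %ld bytes, %lu",
                    relpath, nbytes, oserr);
}
```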
"Kevin Grittner" <[EMAIL PROTECTED]> writes:
> Is there anything you would like me to include in my build for my
> test runs, or any steps you would like me to take during the tests?
You might want to insert some debugging elog's into mdread() in md.c,
rather than in its caller smgrread. I'm conc
Is there anything you would like me to include in my build for my
test runs, or any steps you would like me to take during the tests?
-Kevin
>>> Tom Lane <[EMAIL PROTECTED]> >>>
As I said before, we
really really need to find out what the Windows-level error code is
--- "Invalid argument" isn'
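Here is one plausible reason the server said only "Invalid argument". A Windows error code has to be translated to an errno value before strerror() can describe it. If the translation table has no entry for a code such as 1450, it must fall back to something.

This sketch assumes a fallback to EINVAL. The table is a hypothetical, deliberately short one, not PostgreSQL's actual mapping, although the Windows codes it lists use their documented values.

```c
#include <errno.h>
#include <stddef.h>

/* A few Windows error codes, mirrored as plain numbers so this compiles
 * anywhere. 1450 is the code reported in this thread. */
#define SKETCH_ERROR_FILE_NOT_FOUND      2UL
#define SKETCH_ERROR_ACCESS_DENIED       5UL
#define SKETCH_ERROR_NOT_ENOUGH_MEMORY   8UL
#define SKETCH_ERROR_NO_SYSTEM_RESOURCES 1450UL

struct win_errno_pair {
    unsigned long winerr;
    int errnum;
};

/* Hypothetical, deliberately incomplete table: 1450 is absent. */
static const struct win_errno_pair sketch_map[] = {
    { SKETCH_ERROR_FILE_NOT_FOUND,    ENOENT },
    { SKETCH_ERROR_ACCESS_DENIED,     EACCES },
    { SKETCH_ERROR_NOT_ENOUGH_MEMORY, ENOMEM },
};

/* Translate a Windows error code to errno. Anything not in the table
 * falls back to EINVAL, which strerror() reports as "Invalid argument". */
static int map_windows_error(unsigned long winerr)
{
    for (size_t i = 0; i < sizeof sketch_map / sizeof sketch_map[0]; i++)
        if (sketch_map[i].winerr == winerr)
            return sketch_map[i].errnum;
    return EINVAL;
}
```

Under that assumption, every unmapped failure looks the same once it reaches errno. This is why logging the raw Windows code was the only way to tell them apart.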
I will patch, build, and run similar updates to try to hit the problem.
Hopefully I can have something to post later today.
-Kevin
>>> Qingqing Zhou <[EMAIL PROTECTED]> >>>
On Tue, 15 Nov 2005, Kevin Grittner wrote:
>
> Is there anything that anyone wants me to do at this point, to try
> to p
"Kevin Grittner" <[EMAIL PROTECTED]> writes:
> ERROR: could not read block 1482762 of relation 1663/16385/16483:
> Invalid argument
> So the block number is increasing each time. I'm inclined to think
> that this is the result of the scan passing over rows added by itself.
It's just about impos
On Tue, 15 Nov 2005, Kevin Grittner wrote:
> I got the error log working on Windows (with redirect_stderr). I had
> to stop and restart postgres to do so. I ran the query (for the fourth
> time), and it completed successfully.
Strange - the physical read for the 2nd, 3rd, 4th time should be t
If I have followed the chain correctly, I saw that you were trying to
run an update statement on a large number of records in a large table
right? I have changed my strategy in the past for this type of problem.
I don't know if it would have fixed this problem or not, but I have seen
with Postg
I got the error log working on Windows (with redirect_stderr). I had
to stop and restart postgres to do so. I ran the query (for the fourth
time), and it completed successfully.
I'm not inclined to believe that changing the redirect_stderr setting
would change this behavior, so I guess that eit
Correction:
dtr=> select count(*) from "DbTranRepository"
dtr-> WHERE (
dtr(> ("userId" <> UPPER("userId")) AND
dtr(> ("timestampValue" BETWEEN '2005-10-28' AND '2005-11-15'));
count
611255
(1 row)
I'm becoming more convinced that this happens as the UPDATE
runs into rows inser
The table has about 23.3 million rows, of which about 200,000 will
be affected by this update. Run time is about an hour. During the
first run, the table was the target of about 45,000 inserts. This rerun
was done as the only task. A third run (also by itself) gave this:
ERROR: could not read
>
> I reran the query. Same error, same relation, different block.
>
> dtr=> UPDATE
> dtr-> "DbTranRepository"
> dtr-> SET "userId" = UPPER("userId")
> dtr-> WHERE (
> dtr(> ("userId" <> UPPER("userId")) AND
> dtr(> ("timestampValue" BETWEEN '2005-10-28' AND '2005-11-15'));
> ERROR
On 11/15/05, Kevin Grittner <[EMAIL PROTECTED]> wrote:
Could my issue be the same problem as this thread?:
http://archives.postgresql.org/pgsql-bugs/2005-11/msg00114.php
The references to "Invalid Argument" caught my eye. That thread
did start from a very different point, though.
-Kevin
It's possible
Could my issue be the same problem as this thread?:
http://archives.postgresql.org/pgsql-bugs/2005-11/msg00114.php
The references to "Invalid Argument" caught my eye. That thread
did start from a very different point, though.
-Kevin
>>> "Kevin Grittner" <[EMAIL PROTECTED]> >>>
It appears tha
It appears that the log file is not being written -- I'll start a
separate thread on that issue.
I reran the query. Same error, same relation, different block.
dtr=> UPDATE
dtr-> "DbTranRepository"
dtr-> SET "userId" = UPPER("userId")
dtr-> WHERE (
dtr(> ("userId" <> UPPER("userId")) AND
On Mon, Nov 14, 2005 at 06:19:16PM -0600, Kevin Grittner wrote:
> the moment aren't sure. The current machines are "transitional",
> and it may not be too late to set the permanent servers up with ECC
> memory. Is it something I should fight for?
Yes. Always.
A
--
Andrew Sullivan | [EMAIL P
Scott Marlowe <[EMAIL PROTECTED]> writes:
> On Mon, 2005-11-14 at 17:20, Kevin Grittner wrote:
>> ERROR: could not read block 649847 of relation 1663/16385/16483:
>> Invalid argument
> When a block is unreadable, this means that the OS is experiencing a
> read error from the hard drive.
I'd beli
Both machines are IBM xSeries 346 model 884042U with 6 drives
in a RAID 5 array through an IBM battery backed controller. We
had a couple of these lying around after replacing them with better,
but they have been pretty stable workhorses for us.
I'm checking on whether the RAM is ECC -- the techs
On 11/14/05, Scott Marlowe <[EMAIL PROTECTED]> wrote:
If you were running on top of a RAID 1+0 or RAID 5 array, such an error
would likely never have happened, since it would have been detected by
the controller, and either the bad block would be mapped out or the
drive would be kicked out of the arra
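Scott's point about RAID 5 rests on parity. Each stripe stores the XOR of its data strips, so any single strip that a drive cannot return can be recomputed from the others.

This is a minimal sketch of that arithmetic only, assuming equal-length strips and a single failure. Real controllers also rotate the parity strip across drives, cache writes, and apply their own error handling, none of which is modeled here.

```c
#include <stddef.h>

/* Parity strip = byte-wise XOR of all data strips in the stripe. */
static void raid5_parity(const unsigned char *const *strips, size_t nstrips,
                         size_t len, unsigned char *parity)
{
    for (size_t b = 0; b < len; b++) {
        unsigned char p = 0;
        for (size_t s = 0; s < nstrips; s++)
            p ^= strips[s][b];
        parity[b] = p;
    }
}

/* Rebuild one missing data strip: XOR the parity with every surviving
 * strip; the contributions of the survivors cancel out. */
static void raid5_rebuild(const unsigned char *const *strips, size_t nstrips,
                          size_t missing, const unsigned char *parity,
                          size_t len, unsigned char *out)
{
    for (size_t b = 0; b < len; b++) {
        unsigned char v = parity[b];
        for (size_t s = 0; s < nstrips; s++)
            if (s != missing)
                v ^= strips[s][b];
        out[b] = v;
    }
}
```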
On Mon, 2005-11-14 at 17:20, Kevin Grittner wrote:
> A programmer ran a query to fix some data against two "identical"
> databases -- one on Linux and one on Windows. They are both 8.1.0,
> running on dual hyperthreaded Xeons, with data on RAID5. The Linux
> update went fine. The Windows attempt
A programmer ran a query to fix some data against two "identical"
databases -- one on Linux and one on Windows. They are both 8.1.0,
running on dual hyperthreaded Xeons, with data on RAID5. The Linux
update went fine. The Windows attempt gave this:
dtr=> UPDATE
dtr-> "DbTranRepository"
dtr