Hi Achilleas, Thanks for your quick response. We do have backups and a couple of spare servers running in parallel so we're safe in that sense. Thanks for your advice anyway.
To run *memtest86 *we have to go to the datacenter and that will take us a few days. Is there anything we can do remotely? Could it be anything different than memory HW problem? Btw, we are not using that server at the moment :) Thanks On Mon, Nov 15, 2010 at 12:03 PM, Achilleas Mantzios < ach...@matrix.gatewaynet.com> wrote: > Stop using the system immediately, since many things inserted to the DB > might simply be garbage. > Inspect your memory with memtest86. > > I would even suggest moving to a new HW if available, and start working > into two parallel directions: > > a) try to bring your DB into a sane state > b) try to fix your HW server, which apparently has problems. > > Files are not "disappearing" like that, in normal situations. > You cannot trust your current HW to run postgresql. Its a guarantee method > to lose your data, if you > go on by this start/stop/restore/pray circle. > > IMPORTANT: Handle your old (pre-memory upgrade) backups like expensive > chinese vases. > Start from those good and trustworthy backups to do any database "replay" > (or similar) work which > would lead to a viable situation. > > Στις Monday 15 November 2010 11:55:38 ο/η Pablo Delgado Díaz-Pache έγραψε: > > Hi all, > > > > We've been using postgres for 9 years without a problem until now! Two > > problems in a very short time! > > The first one is described in > > > http://postgresql.1045698.n5.nabble.com/Autovacuum-seems-to-block-database-WARNING-worker-took-too-long-to-start-td3264261.html > > This is another one (not related I think) ... > > > > Postgres server is usually working fine. All of a sudden we start getting > > these errors ... > > > > * **2010-11-09 11:49:15.320 > > CET|2|database1|10.19.0.51(18895)|20929|SELECT|4cd926fd.51c1|2010-11-09 > > 11:48:29 CET|10/417796|1390150|postgres| LOG: duration: 1518.422 ms > > execute <unnamed>: SELECT id_token_fk,xxxxxxxxx ORDER BY avadate* > > * **2010-11-09 11:52:25.364 > > CET|1|database1|10.19.0.51(23286)|21566|PARSE|4cd927cf.543e|2010-11-09 > > 11:51:59 CET|7/430041|0|postgres| ERROR: could not open relation > > base/273198960/273198979: No such file or directory* > > * **2010-11-09 11:52:25.364 > > CET|2|database1|10.19.0.51(23286)|21566|PARSE|4cd927cf.543e|2010-11-09 > > 11:51:59 CET|7/430041|0|postgres| STATEMENT: SELECT > id_token_fkxxxxxxxxxxx > > ORDER BY avadate* > > * **2010-11-09 11:52:29.981 > > CET|3|database1|10.19.0.51(23286)|21566|PARSE|4cd927cf.543e|2010-11-09 > > 11:51:59 CET|7/430049|0|postgres| ERROR: could not open relation > > base/273198960/273199235: No such file or directory* > > * **2010-11-09 11:52:30.988 > > CET|6|database1|10.19.0.51(23286)|21566|PARSE|4cd927cf.543e|2010-11-09 > > 11:51:59 CET|7/430050|0|postgres| STATEMENT: SELECT max(avadate) > xxxxxxxx > > 32036)* > > * **2010-11-09 11:53:36.346 > > CET|16|database2|10.19.0.42(44916)|22107|SELECT|4cd9280e.565b|2010-11-09 > > 11:53:02 CET|94/516004|0|postgres| STATEMENT: SELECT * FROM "photos" > > xxxxxxxxxxxxxx LIMIT 1* > > * **2010-11-09 11:53:37.956 > > CET|17|database2|10.19.0.42(44916)|22107|SELECT|4cd9280e.565b|2010-11-09 > > 11:53:02 CET|94/516025|0|postgres| ERROR: could not open relation > > base/271253899/271254075: No such file or directory* > > * **................* > > * **................* > > * **2010-11-09 11:53:55.560 CET|111|||26090||4cc6e970.65ea|2010-10-26 > > 16:45:04 CEST||0|| ERROR: could not open relation > base/273198960/273199235: > > No such file or directory* > > * **2010-11-09 11:53:55.560 CET|112|||26090||4cc6e970.65ea|2010-10-26 > > 16:45:04 CEST||0|| CONTEXT: writing block 8866 of relation > > base/273198960/273199235* > > * **2010-11-09 11:53:55.560 CET|113|||26090||4cc6e970.65ea|2010-10-26 > > 16:45:04 CEST||0|| WARNING: could not write block 8866 of > > base/273198960/273199235* > > * **2010-11-09 11:53:55.560 CET|114|||26090||4cc6e970.65ea|2010-10-26 > > 16:45:04 CEST||0|| DETAIL: Multiple failures --- write error might be > > permanent.* > > * **2010-11-09 11:53:56.590 CET|115|||26090||4cc6e970.65ea|2010-10-26 > > 16:45:04 CEST||0|| ERROR: could not open relation > base/273198960/273199235: > > No such file or directory* > > * **2010-11-09 11:53:56.590 CET|116|||26090||4cc6e970.65ea|2010-10-26 > > 16:45:04 CEST||0|| CONTEXT: writing block 8866 of relation > > base/273198960/273199235* > > > > See that there are 2 different databases involved. (database1 and > > database2). > > > > Looking for distinct errors (among the many we have in the log) I find > there > > are only 4 involved ... > > > > * **base/271253899/271254075* > > * **base/273198960/273198979* > > * **base/273198960/273199235* > > * **base/273198960/273199253* > > > > and those files are not in the postgres base directory. > > > > To fix it we have no option but to restart postgres (which restarts fine > > with a /etc/init.d/postgresql stop & start) > > However, once we restarted postgres some data was corrupted. Tables that > > used to have 4,5 million rows had only 60 rows. As a consequence we had > to > > restore from file system backup. > > Once we did that, it worked fine for a few days until it happened again. > > We're worried it can happen again! > > > > Could this error be a hardware problem? > > We recently increased the memory from 8GB to 28GB, although it was > working > > fine for more than 3 weeks. > > We also recently upgraded from postgres 8.3.6 to 8.4.5, althought it also > > worked fine for a few months. > > Upgrading to postgres 9 is easy for us. however, not sure that would > help. > > > > Some info of our server: > > > > OS: Centos 5.5 > > Kernel: 2.6.18-194.1.el5 > > Postgres version: 8.4.5 (installation out-of-the-box using yum) > > Server memory: 28GB > > > > > > Any help would be appreciated > > > > Pablo > > > > > > -- > Achilleas Mantzios > > -- > Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-admin >