Hi Guys,
We are currently having some MCF / Postgres issues and I was hoping to get some
help.
Configuration:
MCF 2.1
Postgres 8.4
We have followed the Postgres configuration best practices, e.g. turned off
autovacuum.
We are carrying out DB maintenance via cron:
- Lazy vacuum once a day
- Full vacuum once a week
- DB reindex once a week
Jobs:
- 1 continuous job that manages 60,000+ docs
- 3 small stand-alone jobs
Issue:
We are intermittently getting the error below; in one case the whole DB became
corrupted.
MCF Error:
2015-06-27 07:12:04,352 THREAD="Seeding thread" SEVERITY=INFO
logger=org.apache.manifoldcf.crawler.jobs.JobManager.noteJobSeeded(JobManager.java:6508)
MESSAGE="Job 1435140141926 has been successfully reseeded"
2015-06-27 07:38:04,433 THREAD="Seeding thread" SEVERITY=INFO
logger=org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForSeeding(JobManager.java:7051)
MESSAGE="Marked job 1435140141926 for seeding"
2015-06-27 07:42:32,876 THREAD="Seeding thread" SEVERITY=INFO
logger=org.apache.manifoldcf.crawler.jobs.JobManager.noteJobSeeded(JobManager.java:6508)
MESSAGE="Job 1435140141926 has been successfully reseeded"
2015-06-27 07:45:30,176 THREAD="Assessment thread" SEVERITY=ERROR
logger=org.apache.manifoldcf.crawler.system.AssessmentThread.run(AssessmentThread.java:77)
MESSAGE="Assessment thread aborting and restarting due to database connection
reset: Database exception: SQLException doing query (08006): An I/O error
occured while sending to the backend."
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception:
SQLException doing query (08006): An I/O error occured while sending to the
backend.
at
org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.finishUp(Database.java:702)
at
org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:728)
at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.startATransaction(DBInterfacePostgreSQL.java:1259)
at
org.apache.manifoldcf.core.database.Database.internalTransactionBegin(Database.java:262)
at
org.apache.manifoldcf.core.database.Database.synchronizeTransactions(Database.java:242)
at
org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1432)
at
org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
at
org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:191)
at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:835)
at
org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:221)
at
org.apache.manifoldcf.crawler.jobs.Jobs.assessMarkedJobs(Jobs.java:2052)
at
org.apache.manifoldcf.crawler.jobs.JobManager.assessMarkedJobs(JobManager.java:802)
at
org.apache.manifoldcf.crawler.system.AssessmentThread.run(AssessmentThread.java:65)
Caused by: org.postgresql.util.PSQLException: An I/O error occured while
sending to the backend.
at
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:283)
at
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:500)
at
org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:374)
at
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:366)
at
org.apache.manifoldcf.core.database.Database.execute(Database.java:863)
at
org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:683)
Caused by: java.io.EOFException
at org.postgresql.core.PGStream.ReceiveChar(PGStream.java:276)
at
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1660)
at
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
... 5 more
2015-06-27 07:45:33,453 THREAD="Set priority thread" SEVERITY=ERROR
logger=org.apache.manifoldcf.crawler.system.SetPriorityThread.run(SetPriorityThread.java:161)
MESSAGE="Set priority thread aborting and restarting due to database
connection reset: Database exception: SQLException doing query (08006): An I/O
error occured while sending to the backend."
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception:
SQLException doing query (08006): An I/O error occured while sending to the
backend.
at
org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.finishUp(Database.java:702)
at
org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:728)
at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.startATransaction(DBInterfacePostgreSQL.java:1259)
at
org.apache.manifoldcf.core.database.Database.internalTransactionBegin(Database.java:262)
at
org.apache.manifoldcf.core.database.Database.synchronizeTransactions(Database.java:242)
at
org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1432)
at
org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
at
org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:191)
at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:835)
at
org.apache.manifoldcf.crawler.jobs.JobManager.getNextNotYetProcessedReprioritizationDocuments(JobManager.java:2249)
at
org.apache.manifoldcf.crawler.system.SetPriorityThread.run(SetPriorityThread.java:138)
Caused by: org.postgresql.util.PSQLException: An I/O error occured while
sending to the backend.
at
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:283)
at
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:500)
at
org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:374)
at
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:366)
at
org.apache.manifoldcf.core.database.Database.execute(Database.java:863)
at
org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:683)
Caused by: java.io.EOFException
at org.postgresql.core.PGStream.ReceiveChar(PGStream.java:276)
at
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1660)
at
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
... 5 more
2015-06-28 15:44:21,728 THREAD="tomcat-http--46" SEVERITY=INFO
logger=org.apache.manifoldcf.ui.beans.AdminProfile
The job hung.
Corresponding Postgres log:
ERROR: could not write block 11569 of relation base/27977/28241: Input/output
error
CONTEXT: writing block 11569 of relation base/27977/28241
ERROR: could not write block 3224 of relation base/27977/28241: Input/output
error
CONTEXT: writing block 3224 of relation base/27977/28241
CONTEXT: writing block 3256 of relation base/27977/28241
ERROR: could not close file "base/27977/28161": Input/output error
PANIC: could not fdatasync log file 3, segment 157: Input/output error
STATEMENT: COMMIT
LOG: server process (PID 25059) was terminated by signal 6: Aborted
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited abnormally
and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat
your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited abnormally
and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat
your command.
LOG: all server processes terminated; reinitializing
LOG: database system was interrupted; last known up at 2015-06-27 07:30:42 EDT
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 3/9AAADB60
LOG: record with zero length at 3/9DAD22E0
LOG: redo done at 3/9DAD22B0
LOG: last completed transaction was at log time 2015-06-27 07:45:18.768223-04
LOG: database system is ready to accept connections
FATAL: database "template0" is not currently accepting connections
ERROR: could not fsync segment 0 of relation base/27977/28019: Input/output
error
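To summarise which files the I/O errors touch, something along these lines could
tally them per relation file. This is only a minimal sketch: the function name
count_io_errors and the regular expression are assumptions made for illustration,
not part of MCF or Postgres, and the sample text is copied from the log above.

```python
import re
from collections import Counter

# Matches relation file paths (e.g. base/27977/28241) on lines that report
# "Input/output error". The message may be wrapped, so \s+ also spans a newline.
IO_ERROR = re.compile(r'(?:relation |file ")(base/\d+/\d+)"?: Input/output\s+error')


def count_io_errors(log_text: str) -> Counter:
    """Count Input/output errors per relation file path (hypothetical helper)."""
    return Counter(IO_ERROR.findall(log_text))


# Sample lines copied from the Postgres log above, including the wrapped ones.
sample = """ERROR: could not write block 11569 of relation base/27977/28241: Input/output
error
CONTEXT: writing block 11569 of relation base/27977/28241
ERROR: could not write block 3224 of relation base/27977/28241: Input/output
error
ERROR: could not close file "base/27977/28161": Input/output error
ERROR: could not fsync segment 0 of relation base/27977/28019: Input/output
error"""

for path, hits in count_io_errors(sample).most_common():
    print(path, hits)
```

The CONTEXT lines are deliberately not counted, since they repeat the relation
named in the preceding ERROR line.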
Has anybody had similar issues? Or does anything stand out in our
configuration?
Thanks,
Colin