Hello,
We've had ManifoldCF 2.0.1 working well with Solr for months on Windows 2012
using the single-process model. We recently noticed that new documents are
not getting ingested, even after restarting the job, the server, etc.
What I see first in the logs is a bunch of 500 errors coming out of Solr as a
result of ManifoldCF trying to index .tif files found in the directory
structure being indexed. After that (not sure whether it's related), I see a
bunch of these errors:
FATAL 2016-03-15 16:01:48,801 (Thread-1387745) -
C:\apache-manifoldcf-2.0.1\example\.\./dbname.data getFromFile failed 33337202
org.hsqldb.HsqlException: java.lang.NegativeArraySizeException
at org.hsqldb.error.Error.error(Unknown Source)
at org.hsqldb.persist.DataFileCache.getFromFile(Unknown Source)
at org.hsqldb.persist.DataFileCache.get(Unknown Source)
at org.hsqldb.persist.RowStoreAVLDisk.get(Unknown Source)
at org.hsqldb.index.NodeAVLDisk.findNode(Unknown Source)
at org.hsqldb.index.NodeAVLDisk.getRight(Unknown Source)
at org.hsqldb.index.IndexAVL.next(Unknown Source)
at org.hsqldb.index.IndexAVL.next(Unknown Source)
at org.hsqldb.index.IndexAVL$IndexRowIterator.getNextRow(Unknown Source)
at org.hsqldb.RangeVariable$RangeIteratorMain.findNext(Unknown Source)
at org.hsqldb.RangeVariable$RangeIteratorMain.next(Unknown Source)
at org.hsqldb.QuerySpecification.buildResult(Unknown Source)
at org.hsqldb.QuerySpecification.getSingleResult(Unknown Source)
at org.hsqldb.QuerySpecification.getResult(Unknown Source)
at org.hsqldb.StatementQuery.getResult(Unknown Source)
at org.hsqldb.StatementDMQL.execute(Unknown Source)
at org.hsqldb.Session.executeCompiledStatement(Unknown Source)
at org.hsqldb.Session.execute(Unknown Source)
at org.hsqldb.jdbc.JDBCPreparedStatement.fetchResult(Unknown Source)
at org.hsqldb.jdbc.JDBCPreparedStatement.executeQuery(Unknown Source)
at org.apache.manifoldcf.core.database.Database.execute(Database.java:889)
at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:683)
Caused by: java.lang.NegativeArraySizeException
at org.hsqldb.lib.StringConverter.readUTF(Unknown Source)
at org.hsqldb.rowio.RowInputBinary.readString(Unknown Source)
at org.hsqldb.rowio.RowInputBinary.readChar(Unknown Source)
at org.hsqldb.rowio.RowInputBase.readData(Unknown Source)
at org.hsqldb.rowio.RowInputBinary.readData(Unknown Source)
at org.hsqldb.rowio.RowInputBase.readData(Unknown Source)
at org.hsqldb.rowio.RowInputBinary.readData(Unknown Source)
at org.hsqldb.rowio.RowInputBinaryDecode.readData(Unknown Source)
at org.hsqldb.RowAVLDisk.<init>(Unknown Source)
at org.hsqldb.persist.RowStoreAVLDisk.get(Unknown Source)
... 21 more
ERROR 2016-03-15 16:01:48,911 (Stuffer thread) - Stuffer thread aborting and
restarting due to database connection reset: Database exception: SQLException
doing query (S1000): java.lang.NegativeArraySizeException
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception:
SQLException doing query (S1000): java.lang.NegativeArraySizeException
at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.finishUp(Database.java:702)
at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:728)
at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:771)
at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1444)
at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:191)
at org.apache.manifoldcf.core.database.DBInterfaceHSQLDB.performQuery(DBInterfaceHSQLDB.java:916)
at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:221)
at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.getPipelineDocumentIngestDataChunk(IncrementalIngester.java:1783)
at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.getPipelineDocumentIngestDataMultiple(IncrementalIngester.java:1748)
at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.getPipelineDocumentIngestDataMultiple(IncrementalIngester.java:1703)
at org.apache.manifoldcf.crawler.system.StufferThread.run(StufferThread.java:254)
Caused by: java.sql.SQLException: java.lang.NegativeArraySizeException
at org.hsqldb.jdbc.JDBCUtil.sqlException(Unknown Source)
at org.hsqldb.jdbc.JDBCUtil.sqlException(Unknown Source)
at org.hsqldb.jdbc.JDBCPreparedStatement.fetchResult(Unknown Source)
at org.hsqldb.jdbc.JDBCPreparedStatement.executeQuery(Unknown Source)
at org.apache.manifoldcf.core.database.Database.execute(Database.java:889)
at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:683)
Caused by: org.hsqldb.HsqlException: java.lang.NegativeArraySizeException
After these errors occur, the job just seems to hang: it doesn't process any
further documents or log anything more in manifoldcf.log. So I can see the
error is coming out of the HyperSQL database, but I don't know why. There is
sufficient disk space. The database file is now 33 GB (larger than I'd expect
for our ~110,000 documents), but I haven't seen any evidence that we're hitting
a file size limit. I'm not sure where to go from here to narrow down the
problem.
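One way to check whether the Solr 500 errors really come before the HyperSQL
failures would be to pull the timestamp of the first log entry mentioning each
out of manifoldcf.log. The following is only a rough sketch, not something
ManifoldCF provides: it assumes every entry begins with the
"LEVEL date time (thread) - message" header seen in the excerpts above, the
list of level names is a guess, and the function name first_match is made up
for illustration.

```python
import re
from datetime import datetime

# Assumed header shape, taken from the excerpts above, e.g.
#   FATAL 2016-03-15 16:01:48,801 (Thread-1387745) - ...
# The set of level names is an assumption.
HEADER = re.compile(
    r"^(FATAL|ERROR|WARN|INFO|DEBUG)\s+"
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s"
)


def first_match(lines, needle):
    """Return the timestamp of the first log entry containing `needle`.

    Stack-trace lines carry no timestamp of their own, so a match on one of
    them is attributed to the most recent header line seen before it.
    Returns None if `needle` never appears under a recognised header.
    """
    current = None
    for line in lines:
        m = HEADER.match(line)
        if m:
            current = datetime.strptime(m.group(2), "%Y-%m-%d %H:%M:%S,%f")
        if needle in line and current is not None:
            return current
    return None
```

With the log read into a list of lines, comparing
first_match(lines, "NegativeArraySizeException") against first_match(lines, ...)
for whatever text the Solr 500 entries actually contain would show which
started first.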
As always, any and all help is much appreciated.
Thanks,
-Ian