Thanks, Karl.
 
I am using a single Windows shares repository connection to a folder on our 
file server, which currently contains 143,997 files and 54,424 
folders (59.2 GB of data in total), of which ManifoldCF seems to identify just over 
108,000 as indexable.  The job specifies the following:
 

1. Include  indexable  file(s)  matching  * 
2. Include  directory(s)  matching  * 

No custom connectors.   I kept this simple because I'm a simple guy.  :-)    As 
such, it's entirely possible that I did something stupid when I set it up, but 
I'm not seeing anything else obvious that seems worth pointing out.   
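
For anyone wanting to cross-check totals like the file, folder, and size figures quoted above, here is a minimal sketch. It assumes the share is reachable as an ordinary local or UNC path from the machine running the script. The function name `summarize_tree` is hypothetical and is not part of ManifoldCF.

```python
import os
import sys


def summarize_tree(root):
    """Return (file_count, folder_count, total_bytes) for everything under root."""
    files = folders = total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        folders += len(dirnames)
        for name in filenames:
            files += 1
            try:
                total_bytes += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # Unreadable or vanished file: still counted, size skipped.
                pass
    return files, folders, total_bytes


if __name__ == "__main__" and len(sys.argv) > 1:
    # The path is supplied by the caller; no particular share name is assumed.
    n_files, n_folders, n_bytes = summarize_tree(sys.argv[1])
    print(f"{n_files} files, {n_folders} folders, {n_bytes / 1e9:.1f} GB")
```

This only counts what the filesystem reports. It does not reproduce ManifoldCF's own decision about which files are indexable.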
 
-Ian

>>> Karl Wright <[email protected]> 3/16/2016 12:03 PM >>>
Hi Ian,

The database size seems way too big for this crawl size. I've not seen this 
problem before but I suspect that whatever is causing the bloat is also causing 
HSQLDB to fail.

Can you give me further details about what repository connections you are 
using? It is possible that there's a heretofore unknown pathological case you 
are running into during the crawl. Are there any custom connectors involved?

If we rule out a bug of some kind, then the next thing to do would be to go to 
a real database, e.g. PostgreSQL.

Karl


On Wed, Mar 16, 2016 at 11:04 AM, Ian Zapczynski 
<[email protected]> wrote:


Hello,
We've had ManifoldCF 2.0.1 working well with Solr for months on Windows 2012 
using the single process model. We recently noticed that new documents are 
not getting ingested, even after restarting the job, the server, etc. What I 
see in the logs is first a bunch of 500 errors coming out of Solr as a result 
of ManifoldCF trying to index .tif files that are found in the directory 
structure being indexed. After that (not sure if related or not), I see a bunch 
of these errors:
FATAL 2016-03-15 16:01:48,801 (Thread-1387745) - C:\apache-manifoldcf-2.0.1\example\.\./dbname.data getFromFile failed 33337202
org.hsqldb.HsqlException: java.lang.NegativeArraySizeException
at org.hsqldb.error.Error.error(Unknown Source)
at org.hsqldb.persist.DataFileCache.getFromFile(Unknown Source)
at org.hsqldb.persist.DataFileCache.get(Unknown Source)
at org.hsqldb.persist.RowStoreAVLDisk.get(Unknown Source)
at org.hsqldb.index.NodeAVLDisk.findNode(Unknown Source)
at org.hsqldb.index.NodeAVLDisk.getRight(Unknown Source)
at org.hsqldb.index.IndexAVL.next(Unknown Source)
at org.hsqldb.index.IndexAVL.next(Unknown Source)
at org.hsqldb.index.IndexAVL$IndexRowIterator.getNextRow(Unknown Source)
at org.hsqldb.RangeVariable$RangeIteratorMain.findNext(Unknown Source)
at org.hsqldb.RangeVariable$RangeIteratorMain.next(Unknown Source)
at org.hsqldb.QuerySpecification.buildResult(Unknown Source)
at org.hsqldb.QuerySpecification.getSingleResult(Unknown Source)
at org.hsqldb.QuerySpecification.getResult(Unknown Source)
at org.hsqldb.StatementQuery.getResult(Unknown Source)
at org.hsqldb.StatementDMQL.execute(Unknown Source)
at org.hsqldb.Session.executeCompiledStatement(Unknown Source)
at org.hsqldb.Session.execute(Unknown Source)
at org.hsqldb.jdbc.JDBCPreparedStatement.fetchResult(Unknown Source)
at org.hsqldb.jdbc.JDBCPreparedStatement.executeQuery(Unknown Source)
at org.apache.manifoldcf.core.database.Database.execute(Database.java:889)
at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:683)
Caused by: java.lang.NegativeArraySizeException
at org.hsqldb.lib.StringConverter.readUTF(Unknown Source)
at org.hsqldb.rowio.RowInputBinary.readString(Unknown Source)
at org.hsqldb.rowio.RowInputBinary.readChar(Unknown Source)
at org.hsqldb.rowio.RowInputBase.readData(Unknown Source)
at org.hsqldb.rowio.RowInputBinary.readData(Unknown Source)
at org.hsqldb.rowio.RowInputBase.readData(Unknown Source)
at org.hsqldb.rowio.RowInputBinary.readData(Unknown Source)
at org.hsqldb.rowio.RowInputBinaryDecode.readData(Unknown Source)
at org.hsqldb.RowAVLDisk.<init>(Unknown Source)
at org.hsqldb.persist.RowStoreAVLDisk.get(Unknown Source)
... 21 more
ERROR 2016-03-15 16:01:48,911 (Stuffer thread) - Stuffer thread aborting and restarting due to database connection reset: Database exception: SQLException doing query (S1000): java.lang.NegativeArraySizeException
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception: SQLException doing query (S1000): java.lang.NegativeArraySizeException
at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.finishUp(Database.java:702)
at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:728)
at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:771)
at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1444)
at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:191)
at org.apache.manifoldcf.core.database.DBInterfaceHSQLDB.performQuery(DBInterfaceHSQLDB.java:916)
at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:221)
at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.getPipelineDocumentIngestDataChunk(IncrementalIngester.java:1783)
at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.getPipelineDocumentIngestDataMultiple(IncrementalIngester.java:1748)
at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.getPipelineDocumentIngestDataMultiple(IncrementalIngester.java:1703)
at org.apache.manifoldcf.crawler.system.StufferThread.run(StufferThread.java:254)
Caused by: java.sql.SQLException: java.lang.NegativeArraySizeException
at org.hsqldb.jdbc.JDBCUtil.sqlException(Unknown Source)
at org.hsqldb.jdbc.JDBCUtil.sqlException(Unknown Source)
at org.hsqldb.jdbc.JDBCPreparedStatement.fetchResult(Unknown Source)
at org.hsqldb.jdbc.JDBCPreparedStatement.executeQuery(Unknown Source)
at org.apache.manifoldcf.core.database.Database.execute(Database.java:889)
at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:683)
Caused by: org.hsqldb.HsqlException: java.lang.NegativeArraySizeException
After these errors occur, the job just seems to hang and not process any 
further documents or log anything more in the manifoldcf.log. So I see the 
error is coming out of the HyperSQL database, but I don't know why. There is 
sufficient disk space. The database file is now 33 GB (larger than I'd expect 
for our ~110,000 documents), but I haven't seen any evidence that we're hitting 
a limit on file size. I'm afraid I'm not sure where to go from here to further 
nail down the problem.
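
One way to start narrowing this down is to tally which log levels and exception classes appear in manifoldcf.log, and how often. Below is a minimal sketch, assuming entries begin with a level and timestamp in the same shape as the excerpt above. The function name `summarize_log` and both regular expressions are illustrative assumptions, not ManifoldCF tooling.

```python
import re
from collections import Counter

# Start of a log entry, e.g. "FATAL 2016-03-15 16:01:48,801 (Thread-1387745) - ..."
ENTRY = re.compile(
    r"^(FATAL|ERROR|WARN)\s+\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d+\s+\("
)
# Fully qualified Java class name ending in Exception or Error.
EXC = re.compile(r"\b((?:[a-z_][\w$]*\.)+[A-Z][\w$]*(?:Exception|Error))\b")


def summarize_log(lines):
    """Count log entries per level and exception class names per occurrence."""
    levels = Counter()
    exceptions = Counter()
    for line in lines:
        stripped = line.strip()
        # Skip stack frames ("at ...") so only message lines are counted.
        if stripped == "at" or stripped.startswith("at "):
            continue
        match = ENTRY.match(stripped)
        if match:
            levels[match.group(1)] += 1
        for name in EXC.findall(stripped):
            exceptions[name] += 1
    return levels, exceptions
```

Passing an open file object (for example `open("manifoldcf.log", errors="replace")`) as `lines` works, since the function only iterates over it. If the 500 errors from Solr and the HSQLDB failures are related, the counts and their order in the log may make that easier to see.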
As always, any and all help is much appreciated.
Thanks,


-Ian

