I believe I've found the problem with removeDocument(), and will commit a fix shortly.
To clarify your question about primary document disposition:

If you have no document and never expect there to be one again (because, for instance, it was deleted), then removeDocument() is the right thing to call. If instead the document still exists but is no longer indexable for whatever reason, it's better to call noDocument(), because you can supply a version string, and MCF will then know not to ask you to process the document again unless that string changes.

Karl

On Tue, Nov 25, 2014 at 4:46 AM, Karl Wright <[email protected]> wrote:

> Hi Markus,
>
> I've created a ticket for the exception. CONNECTORS-1114.
>
> As for removal of a primary document that is not mentioned, do you mean
> that within processDocuments(), if you don't call any disposition method
> for a primary document, then that document is left around? If so, that
> behavior is intended -- it was necessary for backwards compatibility. The
> document should, of course, be cleaned up at the end of the job, as long as
> you are not doing a minimal crawl.
>
> If you are seeing some other kind of behavior, please try to describe it
> more completely so that I have a better idea what you mean.
>
> Thanks,
> Karl
>
>
> On Tue, Nov 25, 2014 at 3:25 AM, Markus Schuch <[email protected]> wrote:
>
>> Hi Karl,
>>
>> the patch for CONNECTORS-1111 fixes the cleanup issue.
>>
>> Another question about primary documents and their components:
>>
>> I have ingested a primary document with some components.
>> During the next processing the primary document should no longer be
>> indexed, but its sub-components should still be indexed.
>>
>> My understanding is that components which are not mentioned are
>> automatically removed.
>> Since the primary document is the "null" component, I expected the
>> framework to remove the primary document component too if it is not
>> mentioned.
>>
>> But this is not the case. Is this another bug, or do I have to remove the
>> primary document manually somehow?
>>
>> There is an activity method removeDocument(identifier) which seems
>> related.
>> But I do not fully understand the usage scenario described in the
>> method's javadoc.
>>
>> I tried the method. The result was the following database exception:
>> (Patches for CONNECTORS-1110 and CONNECTORS-1111 are applied)
>>
>> 2014-11-25 08:30:07,868 ERROR [Worker thread '1'] org.apache.manifoldcf.crawlerthreads: Worker thread aborting and restarting due to database connection reset: Database exception: SQLException doing query (HY0000): You need to set exactly 3 parameters on the prepared statement
>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception: SQLException doing query (HY0000): You need to set exactly 3 parameters on the prepared statement
>>     at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.finishUp(Database.java:702)
>>     at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:728)
>>     at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:762)
>>     at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1435)
>>     at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
>>     at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:191)
>>     at org.apache.manifoldcf.core.database.DBInterfaceMySQL.performQuery(DBInterfaceMySQL.java:875)
>>     at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:221)
>>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.findRowIdsForDocIds(IncrementalIngester.java:1518)
>>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentRemoveMultiple(IncrementalIngester.java:1377)
>>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentRemove(IncrementalIngester.java:803)
>>     at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.removeDocument(WorkerThread.java:1674)
>>     at com.example.mcf.TestConnector.processDocuments(TestConnector.java:278)
>>     at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:670)
>>     at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:649)
>>     at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:402)
>>     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:380)
>> Caused by: java.sql.SQLException: You need to set exactly 3 parameters on the prepared statement
>>     at org.mariadb.jdbc.internal.SQLExceptionMapper.get(SQLExceptionMapper.java:149)
>>     at org.mariadb.jdbc.internal.SQLExceptionMapper.throwException(SQLExceptionMapper.java:106)
>>     at org.mariadb.jdbc.MySQLStatement.executeQueryEpilog(MySQLStatement.java:264)
>>     at org.mariadb.jdbc.MySQLStatement.execute(MySQLStatement.java:288)
>>     at org.mariadb.jdbc.MySQLStatement.executeQuery(MySQLStatement.java:302)
>>     at org.mariadb.jdbc.MySQLPreparedStatement.executeQuery(MySQLPreparedStatement.java:112)
>>     at org.apache.manifoldcf.core.database.Database.execute(Database.java:880)
>>     at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:683)
>> Caused by: org.mariadb.jdbc.internal.common.QueryException: You need to set exactly 3 parameters on the prepared statement
>>     at org.mariadb.jdbc.internal.common.query.MySQLParameterizedQuery.validate(MySQLParameterizedQuery.java:117)
>>     at org.mariadb.jdbc.internal.mysql.MySQLProtocol.executeQuery(MySQLProtocol.java:976)
>>     at org.mariadb.jdbc.MySQLStatement.execute(MySQLStatement.java:281)
>>
>> Regards,
>> Markus
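
For reference, the choice between removeDocument() and noDocument() described at the top of this message could be sketched roughly as follows. This is only an illustration, not ManifoldCF code: the Activities interface is a hypothetical stand-in for the activity object passed to processDocuments(), and its parameter lists are assumptions based on the wording in this thread (an identifier for removeDocument(), an identifier plus a version string for noDocument()). The State enum, the dispose() helper, and RecordingActivities are likewise made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

class DispositionSketch {

  /** Hypothetical stand-in for the two activity methods discussed in the thread. */
  interface Activities {
    void removeDocument(String documentIdentifier);
    void noDocument(String documentIdentifier, String version);
  }

  /** Hypothetical states a connector might find a primary document in. */
  enum State { DELETED, NOT_INDEXABLE }

  /**
   * Picks a disposition for a primary document:
   * - DELETED: the document is gone for good, so remove it.
   * - NOT_INDEXABLE: the document still exists, so record a version string;
   *   per the thread, it is then not reprocessed until that string changes.
   */
  static void dispose(Activities activities, String id, State state, String version) {
    switch (state) {
      case DELETED:
        activities.removeDocument(id);
        break;
      case NOT_INDEXABLE:
        activities.noDocument(id, version);
        break;
    }
  }

  /** Test double that records which method was called, in order. */
  static class RecordingActivities implements Activities {
    final List<String> calls = new ArrayList<>();

    @Override
    public void removeDocument(String documentIdentifier) {
      calls.add("removeDocument(" + documentIdentifier + ")");
    }

    @Override
    public void noDocument(String documentIdentifier, String version) {
      calls.add("noDocument(" + documentIdentifier + ", " + version + ")");
    }
  }

  public static void main(String[] args) {
    RecordingActivities recorder = new RecordingActivities();
    dispose(recorder, "doc-1", State.DELETED, null);
    dispose(recorder, "doc-2", State.NOT_INDEXABLE, "v7");
    // Print the recorded calls, one per line.
    for (String call : recorder.calls) {
      System.out.println(call);
    }
  }
}
```

In a real connector the same branch would sit inside processDocuments() and call the framework's activity object directly; the recording class exists only so the sketch can run without ManifoldCF on the classpath.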
