Ok, code committed. Since you are obviously the first person to really try out the component feature, I wondered if you would be willing to submit a test connector that uses it, as a patch. I apologize for not fully testing this feature in the 1.7 release -- the person who requested the feature obviously didn't actually use it, and although I'd intended to write a test, that did not get done.
Thanks,
Karl

On Tue, Nov 25, 2014 at 5:03 AM, Karl Wright <[email protected]> wrote:

> I believe I've found the problem with removeDocument(), and will commit a fix shortly.
>
> To clarify your question about primary document disposition:
>
> For the case where you have no document, and you never expect there to be a document again (because, for instance, it was deleted), then removeDocument() is the right thing to call. If the case is different, namely that the document exists but is no longer indexable for whatever reason, it's better to call noDocument() instead, because you can supply a version string, and then MCF will know not to ask you to process it again unless that string changes.
>
> Karl
>
> On Tue, Nov 25, 2014 at 4:46 AM, Karl Wright <[email protected]> wrote:
>
>> Hi Markus,
>>
>> I've created a ticket for the exception: CONNECTORS-1114.
>>
>> As for removal of a primary document that is not mentioned, do you mean that within processDocuments(), if you don't call any disposition method for a primary document, then that document is left around? If so, that behavior is intended -- it was necessary for backwards compatibility. The document should, of course, be cleaned up at the end of the job, as long as you are not doing a minimal crawl.
>>
>> If you are seeing some other kind of behavior, please try to describe it more completely so that I have a better idea what you mean.
>>
>> Thanks,
>> Karl
>>
>> On Tue, Nov 25, 2014 at 3:25 AM, Markus Schuch <[email protected]> wrote:
>>
>>> Hi Karl,
>>>
>>> The patch for CONNECTORS-1111 fixes the cleanup issue.
>>>
>>> Another question about primary documents and their components:
>>>
>>> I have ingested a primary document with some components. During the next processing, the primary document should no longer be indexed, but its sub-components should still be indexed.
>>>
>>> My understanding is that components that are not mentioned are automatically removed. Since the primary document is the "null" component, I expected the framework would also remove the primary document component if it is not mentioned.
>>>
>>> But this is not the case. Is this another bug, or do I have to remove the primary document manually somehow?
>>>
>>> There is an activity method removeDocument(identifier) which seems related. But I do not fully understand the usage scenario described in the method's javadoc.
>>>
>>> I tried the method. The result was the following database exception (patches for CONNECTORS-1110 and CONNECTORS-1111 are applied):
>>>
>>> 2014-11-25 08:30:07,868 ERROR [Worker thread '1'] org.apache.manifoldcf.crawlerthreads: Worker thread aborting and restarting due to database connection reset: Database exception: SQLException doing query (HY0000): You need to set exactly 3 parameters on the prepared statement
>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception: SQLException doing query (HY0000): You need to set exactly 3 parameters on the prepared statement
>>>     at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.finishUp(Database.java:702)
>>>     at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:728)
>>>     at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:762)
>>>     at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1435)
>>>     at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
>>>     at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:191)
>>>     at org.apache.manifoldcf.core.database.DBInterfaceMySQL.performQuery(DBInterfaceMySQL.java:875)
>>>     at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:221)
>>>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.findRowIdsForDocIds(IncrementalIngester.java:1518)
>>>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentRemoveMultiple(IncrementalIngester.java:1377)
>>>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentRemove(IncrementalIngester.java:803)
>>>     at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.removeDocument(WorkerThread.java:1674)
>>>     at com.example.mcf.TestConnector.processDocuments(TestConnector.java:278)
>>>     at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:670)
>>>     at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:649)
>>>     at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:402)
>>>     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:380)
>>> Caused by: java.sql.SQLException: You need to set exactly 3 parameters on the prepared statement
>>>     at org.mariadb.jdbc.internal.SQLExceptionMapper.get(SQLExceptionMapper.java:149)
>>>     at org.mariadb.jdbc.internal.SQLExceptionMapper.throwException(SQLExceptionMapper.java:106)
>>>     at org.mariadb.jdbc.MySQLStatement.executeQueryEpilog(MySQLStatement.java:264)
>>>     at org.mariadb.jdbc.MySQLStatement.execute(MySQLStatement.java:288)
>>>     at org.mariadb.jdbc.MySQLStatement.executeQuery(MySQLStatement.java:302)
>>>     at org.mariadb.jdbc.MySQLPreparedStatement.executeQuery(MySQLPreparedStatement.java:112)
>>>     at org.apache.manifoldcf.core.database.Database.execute(Database.java:880)
>>>     at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:683)
>>> Caused by: org.mariadb.jdbc.internal.common.QueryException: You need to set exactly 3 parameters on the prepared statement
>>>     at org.mariadb.jdbc.internal.common.query.MySQLParameterizedQuery.validate(MySQLParameterizedQuery.java:117)
>>>     at org.mariadb.jdbc.internal.mysql.MySQLProtocol.executeQuery(MySQLProtocol.java:976)
>>>     at org.mariadb.jdbc.MySQLStatement.execute(MySQLStatement.java:281)
>>>
>>> Regards,
>>> Markus
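
For anyone reading this thread later, the distinction drawn in the 5:03 AM reply (removeDocument() for a document that is gone for good, noDocument() with a version string for one that still exists but is not indexable) can be sketched roughly as follows. This is only an illustration under stated assumptions: DispositionActivities is a hypothetical stand-in written for this sketch, not the real ManifoldCF activities interface, and only the two method names come from the thread. The parameter lists, the disposePrimary helper, and the sample identifiers are made up.

```java
// Hypothetical stand-in for the two disposition calls named in the thread.
// This is NOT the real ManifoldCF interface; the signatures are assumptions.
interface DispositionActivities {
  void removeDocument(String documentIdentifier);
  void noDocument(String documentIdentifier, String version);
}

// Records each call as a string so the chosen disposition can be inspected.
final class RecordingActivities implements DispositionActivities {
  final java.util.List<String> calls = new java.util.ArrayList<>();

  @Override
  public void removeDocument(String documentIdentifier) {
    calls.add("removeDocument(" + documentIdentifier + ")");
  }

  @Override
  public void noDocument(String documentIdentifier, String version) {
    calls.add("noDocument(" + documentIdentifier + ", " + version + ")");
  }
}

final class DispositionSketch {
  // Hypothetical helper for a primary document that must not be indexed.
  // existsInRepository == false: the document is gone and not expected back,
  //   so it is removed outright.
  // existsInRepository == true: the document exists but is not indexable,
  //   so a version string is recorded; per the thread, the framework then
  //   has no reason to reprocess it until that string changes.
  static void disposePrimary(DispositionActivities activities,
                             String documentIdentifier,
                             boolean existsInRepository,
                             String version) {
    if (!existsInRepository) {
      activities.removeDocument(documentIdentifier);
    } else {
      activities.noDocument(documentIdentifier, version);
    }
  }

  public static void main(String[] args) {
    RecordingActivities recorder = new RecordingActivities();
    disposePrimary(recorder, "doc-deleted", false, "v1");
    disposePrimary(recorder, "doc-unindexable", true, "v7");
    for (String call : recorder.calls) {
      System.out.println(call);
    }
  }
}
```

In a real connector the decision would be made inside processDocuments() against the framework's own activities object; the actual method signatures should be checked in the javadoc for the release in use.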
