I've created CONNECTORS-1120 for this fix. I should have something to try shortly.
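
For context, here is a minimal sketch of the kind of change this points to. It is not the actual CONNECTORS-1120 patch. Since ES 1.4.1 answers a delete for a missing document with a 404, the delete path could treat a 404 as "already gone" instead of an error. The class, enum, and method names below are hypothetical and do not exist in the ManifoldCF connector; only the status-code handling is the point.

```java
// Hypothetical sketch: none of these names come from the ManifoldCF code base.
class DeleteStatusSketch {

    // Possible outcomes of a document delete request against the index.
    enum Outcome { DELETED, ALREADY_GONE, ERROR }

    // Classify the HTTP status code returned for a DELETE request.
    // Assumption: a 404 on delete only means the document was not in the
    // index, so the cleanup thread has nothing left to do for it.
    static Outcome classifyDeleteStatus(int code) {
        if (code >= 200 && code < 300) {
            return Outcome.DELETED;
        }
        if (code == 404) {
            return Outcome.ALREADY_GONE;
        }
        return Outcome.ERROR;
    }

    public static void main(String[] args) {
        // Print the classification for a few representative status codes.
        for (int code : new int[] {200, 404, 500}) {
            System.out.println(code + " -> " + classifyDeleteStatus(code));
        }
    }
}
```

With something like this, only the ERROR outcome would need to abort the job; the other two could both be recorded as a successful removal. Note that a 404 caused by a wrong index name or server location would be hidden by this, so the real fix may need to look at the response body as well.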

Karl

On Fri, Dec 12, 2014 at 9:41 AM, Kamil Żyta <[email protected]> wrote:
>
> On Fri, Dec 12, 2014 at 09:14:40AM -0500, Karl Wright wrote:
> > Hi Kamil,
> >
> > You are getting a 404 error when ManifoldCF tries to delete a document from
> > the ElasticSearch index:
> >
> > >>>>>>
> >       else if (code == 404)
> >       {
> >         setResult(IOutputHistoryActivity.HTTP_ERROR,Result.ERROR, "Page not found: " + response);
> >         throw new ManifoldCFException("Server/page not found");
> >       }
> > <<<<<<
> >
> > The URL it is using is constructed as follows:
> >
> > >>>>>>
> >       String idField = URLEncoder.encode(documentURI);
> >       HttpDelete method = new HttpDelete(config.getServerLocation() +
> >           "/" + config.getIndexName() + "/" + config.getIndexType()
> >           + "/" + idField);
> >       call(method);
> > <<<<<<
> >
> > So there are a number of possibilities. First possibility is that ES was
> > down entirely when this job ended, and so document removal requests failed
> > for a legitimate reason. Second, it may be that the document in question
> > has already been deleted, and while this would formerly return a 200 error
> > code in the version of ES the connector was written for, it now returns a
> > 404. Finally, maybe the REST API changed so much that it is no longer
> > possible to delete a document from the index this way. What version of
> > ElasticSearch are you using, and can you find REST API documentation for
> > that version that you could point me at? Can you do enough research to
> > find out what should work here?
> >
>
> "version" : {
>   "number" : "1.4.1",
>   "build_hash" : "89d3241d670db65f994242c8e8383b169779e2d4",
>   "build_timestamp" : "2014-11-26T15:49:29Z",
>   "build_snapshot" : false,
>   "lucene_version" : "4.10.2"
> },
>
> url for deleting is correct:
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-delete.html
> and I found this:
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/delete-doc.html
>
> "If the document isn’t found, we get a 404 Not Found response code and a
> body like (...)"
>
> K
>
> >
> > On Fri, Dec 12, 2014 at 8:56 AM, Kamil Żyta <[email protected]> wrote:
> > >
> > > Hi,
> > > When I testing ES as indexer some job ends with 'Error: Server/page not
> > > found'. In ES log I have
> > > some too big doc exceptions. How this affect job? Full MCF logs:
> > >
> > > ERROR 2014-12-12 14:45:24,915 (Document cleanup thread '2') - Exception tossed: Server/page not found
> > > org.apache.manifoldcf.core.interfaces.ManifoldCFException: Server/page not found
> > >         at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchConnection.handleResultCode(ElasticSearchConnection.java:234)
> > >         at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchConnection.call(ElasticSearchConnection.java:203)
> > >         at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchDelete.execute(ElasticSearchDelete.java:45)
> > >         at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchConnector.removeDocument(ElasticSearchConnector.java:578)
> > >         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.removeDocument(IncrementalIngester.java:2350)
> > >         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentDeleteMultiple(IncrementalIngester.java:1059)
> > >         at org.apache.manifoldcf.crawler.system.DocumentCleanupThread.run(DocumentCleanupThread.java:189)
> > >
> > > Thanks,
> > > Kamil
> > >
>
