On Fri, Dec 12, 2014 at 09:14:40AM -0500, Karl Wright wrote:
> Hi Kamil,
>
> You are getting a 404 error when ManifoldCF tries to delete a document from
> the ElasticSearch index:
>
> >>>>>>
> else if (code == 404)
> {
>   setResult(IOutputHistoryActivity.HTTP_ERROR, Result.ERROR,
>     "Page not found: " + response);
>   throw new ManifoldCFException("Server/page not found");
> }
> <<<<<<
>
> The URL it is using is constructed as follows:
>
> >>>>>>
> String idField = URLEncoder.encode(documentURI);
> HttpDelete method = new HttpDelete(config.getServerLocation()
>     + "/" + config.getIndexName() + "/" + config.getIndexType()
>     + "/" + idField);
> call(method);
> <<<<<<
>
> So there are a number of possibilities. The first is that ES was down
> entirely when this job ended, so the document removal requests failed for
> a legitimate reason. The second is that the document in question had
> already been deleted; this would formerly have returned a 200 status code
> in the version of ES the connector was written for, but it now returns a
> 404. Finally, the REST API may have changed so much that it is no longer
> possible to delete a document from the index this way. What version of
> ElasticSearch are you using, and can you find REST API documentation for
> that version that you could point me at? Can you do enough research to
> find out what should work here?
>
"version" : {
  "number" : "1.4.1",
  "build_hash" : "89d3241d670db65f994242c8e8383b169779e2d4",
  "build_timestamp" : "2014-11-26T15:49:29Z",
  "build_snapshot" : false,
  "lucene_version" : "4.10.2"
},
The URL used for deleting is correct:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-delete.html
and I also found this:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/delete-doc.html
"If the document isn’t found, we get a 404 Not Found response code and a body
like (...)"
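
If that is the behaviour in 1.4.1, then a 404 on a delete probably just
means the document is already gone, which is what the cleanup thread wants
anyway. Below is a minimal sketch of how the status code of a delete could
be classified under that assumption. The class, enum and method names are
hypothetical and are not part of ManifoldCF; the sketch also does not
distinguish a missing document from any other cause of a 404, which would
require looking at the response body.

```java
// Hypothetical sketch: none of these names exist in ManifoldCF.
class DeleteStatusSketch {

    enum Outcome { DELETED, ALREADY_GONE, FAILED }

    // Classify the HTTP status code returned by a document DELETE request.
    static Outcome classifyDelete(int code) {
        if (code >= 200 && code < 300) {
            return Outcome.DELETED;
        }
        if (code == 404) {
            // Assumption: for a delete, 404 means the document is not in the
            // index, so the removal can be treated as already done.
            return Outcome.ALREADY_GONE;
        }
        // Anything else (for example 5xx) is still reported as a failure.
        return Outcome.FAILED;
    }

    public static void main(String[] args) {
        System.out.println(classifyDelete(200));
        System.out.println(classifyDelete(404));
        System.out.println(classifyDelete(500));
    }
}
```

With something like this, only FAILED would need to throw a
ManifoldCFException; ALREADY_GONE could be recorded in the history and the
job could continue.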
K
>
>
>
> On Fri, Dec 12, 2014 at 8:56 AM, Kamil Żyta <[email protected]> wrote:
> >
> > Hi,
> > While testing ES as an indexer, I see some jobs end with 'Error:
> > Server/page not found'. The ES log also contains some "document too
> > big" exceptions. How does this affect the job? Full MCF logs:
> >
> > ERROR 2014-12-12 14:45:24,915 (Document cleanup thread '2') - Exception
> > tossed: Server/page not found
> > org.apache.manifoldcf.core.interfaces.ManifoldCFException: Server/page not
> > found
> > at
> > org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchConnection.handleResultCode(ElasticSearchConnection.java:234)
> > at
> > org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchConnection.call(ElasticSearchConnection.java:203)
> > at
> > org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchDelete.execute(ElasticSearchDelete.java:45)
> > at
> > org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchConnector.removeDocument(ElasticSearchConnector.java:578)
> > at
> > org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.removeDocument(IncrementalIngester.java:2350)
> > at
> > org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentDeleteMultiple(IncrementalIngester.java:1059)
> > at
> > org.apache.manifoldcf.crawler.system.DocumentCleanupThread.run(DocumentCleanupThread.java:189)
> >
> > Thanks,
> > Kamil
> >