Ah, here it is: http://lucene.472066.n3.nabble.com/ignoreTikaException-value-td3645906.html
Karl

On Thu, Oct 17, 2013 at 8:39 AM, Karl Wright <[email protected]> wrote:

> Hi Roland,
>
> Usually 500 errors are from Tika (aka Solr Cell). If that's what you are
> seeing, there is a way to disable them. I don't remember precisely what
> you do, but it has been posted to this list (and others), so a Google
> search should find that for you.
>
> Thanks!
> Karl
>
> On Thu, Oct 17, 2013 at 8:37 AM, Roland Everaert <[email protected]> wrote:
>
>> So far we have only had to deal with HTTP code 500, because Solr was not
>> able to process some file types. We managed to tell Solr to ignore Tika
>> exceptions. This helps us quite a lot, but Solr has problems processing
>> some file types, and I have not yet found a way to tell Solr to basically
>> skip errors while still logging them.
>>
>> I will check with the customer to get the error, but it showed up
>> yesterday, they have since continued with the indexing (we are still at
>> the initial indexing of the repository), and the logs with the errors
>> have disappeared.
>>
>> Thanks for your support,
>>
>> Roland.
>>
>> On Thu, Oct 17, 2013 at 2:22 PM, Karl Wright <[email protected]> wrote:
>>
>>> Hi Roland,
>>>
>>> It depends on what the error code is. There is quite a bit of logic in
>>> the Solr connector (and in ManifoldCF itself) for handling errors of
>>> different kinds. Fundamentally there are two main kinds of error
>>> condition - one which causes a retry (and can, if so specified, cause
>>> either the offending document to be skipped or the job aborted) and
>>> another which always causes a job to abort. The Solr connector has to
>>> decide based on limited information exactly what to do. General HTTP
>>> error codes such as "500" errors, for example, contain little
>>> information and look just the same whether the error represents a
>>> document Tika is unhappy with, or something more fundamental, like a
>>> complete misconfiguration of Solr.
>>>
>>> If you can provide more detailed information as to the kind of error(s)
>>> you are seeing then we can advise you further.
>>>
>>> Karl
>>>
>>> On Thu, Oct 17, 2013 at 8:17 AM, Roland Everaert <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I helped a customer deploy Solr + ManifoldCF to index files from a
>>>> Windows shared drive. But every time Solr sends back an error message,
>>>> the ManifoldCF jobs abort, which is not really convenient for
>>>> hour-long indexing.
>>>>
>>>> So is it possible to configure ManifoldCF so it doesn't stop every
>>>> time Solr returns an HTTP code other than 200?
>>>>
>>>> Thanks,
>>>>
>>>> Roland.
