We already do that, but Solr is still raising exceptions for some file types. I have to wait for the customer to provide the corresponding Solr log and the message received by the MCF job.

Regards,

Roland.


On Thu, Oct 17, 2013 at 2:41 PM, Karl Wright <[email protected]> wrote:

> Ah, here it is:
>
> http://lucene.472066.n3.nabble.com/ignoreTikaException-value-td3645906.html
>
> Karl
>
>
> On Thu, Oct 17, 2013 at 8:39 AM, Karl Wright <[email protected]> wrote:
>
>> Hi Roland,
>>
>> Usually 500 errors are from Tika (aka Solr Cell). If that's what you are
>> seeing, there is a way to disable them. I don't remember precisely what
>> you do, but it has been posted to this list (and others), so a Google search
>> should find that for you.
>>
>> Thanks!
>> Karl
>>
>>
>> On Thu, Oct 17, 2013 at 8:37 AM, Roland Everaert <[email protected]> wrote:
>>
>>> So far we have only had to deal with HTTP code 500, because Solr was not
>>> able to process some file types. We managed to tell Solr to ignore Tika
>>> exceptions. This helps us quite a lot, but Solr still has problems
>>> processing some file types, and I have not yet found a way to tell Solr
>>> to basically skip errors while still logging them.
>>>
>>> I will check with the customer to get the error, but it showed up
>>> yesterday, and they have continued with the indexing (we are still at
>>> the initial indexing of the repository) and the logs with the errors have
>>> disappeared.
>>>
>>> Thanks for your support,
>>>
>>> Roland.
>>>
>>>
>>> On Thu, Oct 17, 2013 at 2:22 PM, Karl Wright <[email protected]> wrote:
>>>
>>>> Hi Roland,
>>>>
>>>> It depends on what the error code is. There is quite a bit of logic in
>>>> the Solr connector (and in ManifoldCF itself) for handling errors of
>>>> different kinds. Fundamentally there are two main kinds of error condition:
>>>> one which causes a retry (and can, if so specified, cause either the
>>>> offending document to be skipped or the job aborted) and another which
>>>> always causes a job to abort. The Solr connector has to decide based on
>>>> limited information exactly what to do. General HTTP error codes such as
>>>> "500" errors, for example, contain little information and look just the
>>>> same whether the error represents a document Tika is unhappy with, or
>>>> something more fundamental, like a complete misconfiguration of Solr.
>>>>
>>>> If you can provide more detailed information as to the kind of error(s)
>>>> you are seeing then we can advise you further.
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Thu, Oct 17, 2013 at 8:17 AM, Roland Everaert <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I helped a customer deploy Solr + ManifoldCF to index files from a
>>>>> Windows share drive. But every time Solr sends back an error message,
>>>>> the ManifoldCF jobs abort, which is not really convenient for hour-long
>>>>> indexing.
>>>>>
>>>>> So is there a possibility to configure ManifoldCF so it doesn't stop
>>>>> every time Solr returns an HTTP code different from 200?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Roland.
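
For readers following the thread, the two kinds of error condition Karl describes (one that is retried and may end in a skipped document or an aborted job, and one that always aborts) can be sketched roughly as below. This is only an illustration in Python and not the ManifoldCF Solr connector's actual logic: the names `Outcome`, `is_retryable`, `index_document`, `max_attempts` and `skip_on_failure` are hypothetical, and the rule "5xx is retryable, any other non-2xx is fatal" is an assumption made for the example.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    INDEXED = "indexed"
    SKIPPED = "skipped"
    JOB_ABORTED = "job aborted"


def is_retryable(status: int) -> bool:
    # Assumption for this sketch only: 5xx responses are retried,
    # any other non-2xx response is treated as fatal.
    return 500 <= status <= 599


def index_document(
    send: Callable[[str], int],
    doc: str,
    max_attempts: int = 3,
    skip_on_failure: bool = True,
) -> Outcome:
    """Send one document and decide what happens to the job.

    `send` is a hypothetical callable that posts the document to the
    index and returns the HTTP status code.
    """
    for _ in range(max_attempts):
        status = send(doc)
        if 200 <= status <= 299:
            return Outcome.INDEXED
        if not is_retryable(status):
            # Second kind of error: always aborts the job.
            return Outcome.JOB_ABORTED
    # First kind of error: retries are exhausted, so either skip the
    # offending document or abort, depending on configuration.
    return Outcome.SKIPPED if skip_on_failure else Outcome.JOB_ABORTED
```

With `skip_on_failure=True`, a document that keeps returning 500 is dropped and the job carries on, which is the behaviour Roland is asking for. The sketch also shows Karl's caveat: the caller sees only a status code, so it cannot tell a document Tika rejects from a misconfigured Solr.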
