Re: Using ManifoldCF as a web crawler with Tika and Solr

2018-08-14 Thread Karl Wright
Hi Sven, please have a look at the Simple History report to see what happened to the documents you are interested in. The Web Connector will fetch binary documents without any problem, but it sounds like something else in your configuration is causing them to be rejected. The configuration

Using ManifoldCF as a web crawler with Tika and Solr

2018-08-14 Thread Farrenkopf, Sven
I'm using ManifoldCF with Solr and trying to get it working as a web crawler. Crawling the websites (HTML, text) works fine; the problem is that links to binary documents (pdf, xlsx, docx, ...) don't work, even if I put a Tika transformation in the job. I haven't even found a written confirmation