Hi Sven,
Please have a look at the Simple History report to see what happened to the
documents you are interested in.
The Web Connector will fetch binary documents no problem, but it sounds
like you have something else in your configuration that is causing them to
be rejected. The configuration
I'm using ManifoldCF with Solr, trying to get it working as a web crawler.
Crawling the websites (HTML, Text) works fine, the problem is that links to
binary documents (pdf, xlsx, docx, ...) don't work even if I put a
Tika transformation in the job. I haven't even found a written confirmation