ZeroByteFileException / solr
Hello again, it seems like MCF is sending empty data to Solr. Around 3% of my documents (JDBC and FileShare connectors) throw the following exception:

Error from server at http://myserver:myport/solr/mycore: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

While these documents are not even empty, the bigger problem is that the job gets stuck in a loop, retrying endlessly on those documents. At least that is what the "Simple History" indicates. And I can't see why MCF should transfer empty data to Solr in the first place. It's probably me doing something wrong, but the question is WHAT?

Ah, and one more question for dummies like me: is there a simple way to search through the mailing-list archives?

Sven
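One way to narrow this down on the FileShare side is to check whether any of the source files really are zero bytes, since Tika's ZeroByteFileException fires on empty input streams. A minimal sketch, assuming the share is mounted (or copied) to a local path; the helper names here are made up for illustration:

```shell
# find_zero_byte DIR: print every regular file of size exactly 0 bytes under DIR.
# Any hits are prime suspects for the ZeroByteFileException.
find_zero_byte() {
    find "$1" -type f -size 0 -print
}

# count_zero_byte DIR: just the count, for a quick overview.
count_zero_byte() {
    find "$1" -type f -size 0 | wc -l
}

# Hypothetical usage, with /mnt/share as a placeholder mount point:
# find_zero_byte /mnt/share
# echo "zero-byte files: $(count_zero_byte /mnt/share)"
```

If nothing in the share is actually empty, the empty stream is more likely being produced on the MCF side (fetch or transformation), which points at the job configuration rather than the data.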
RE: Driver class not found: net.sourceforge.jtds.jdbc.Driver
It works! Awesome! Thank you! A LOT!

Sven

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Thursday, August 16, 2018 7:39 PM
To: user@manifoldcf.apache.org
Subject: Re: Driver class not found: net.sourceforge.jtds.jdbc.Driver

Hi Sven,

When MCF is built, two entirely distinct versions of the examples are created: a standard version and a "proprietary" version. The standard version does not in general include any proprietary jars, and it leaves connectors that depend on them disabled in the connectors.xml file. The proprietary version is not included in the shipping binary. If you want everything to be put together for you automatically and you rely on proprietary jars, you will need to build it yourself.

However, it's easy to add root-level proprietary jars to the non-proprietary binary. The start options in the examples directories all include references to root-level proprietary jars; you just have to add your JDBC driver to those options files and you should be good to go.

Karl

On Thu, Aug 16, 2018 at 11:36 AM Farrenkopf, Sven <sven.farrenk...@dreso.com> wrote:

Hi there,

I'm trying to get the JDBC connector to work with the provided example. The target database is MSSQL. The JDBC connector seems to be registered successfully, but every time I configure it via the web interface, the connection status shows "Threw exception: 'Driver class not found: net.sourceforge.jtds.jdbc.Driver'".

While several sources (including the ManifoldCF website) state that third-party libraries are required to BUILD the connectors, I have downloaded the binary version and thought there was no need to build anything. (Am I wrong?) So do I have to put a required jar file somewhere within the folders? I tried several jTDS jars in several folders with no luck. I simply want to test an SQL connection with ManifoldCF within the example.

Sven
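Karl's fix (adding the JDBC driver jar to the example's start options files) can be sketched roughly as below. Everything here is an assumption to adapt: the options file name (options.env.unix), the jar path, and the option syntax — mirror whatever classpath entries your own options file already contains.

```shell
# add_jar_to_options OPTIONS_FILE JAR_PATH
# Append a root-level proprietary jar (e.g. the jTDS JDBC driver) to an
# MCF example options file, unless it is already listed there.
# NOTE: the "-cp" line format is an assumption; copy the style of the
# existing classpath entries in your options file.
add_jar_to_options() {
    options_file="$1"
    jar_path="$2"
    if ! grep -q "$jar_path" "$options_file"; then
        printf '%s\n' "-cp $jar_path" >> "$options_file"
    fi
}

# Hypothetical invocation (paths are placeholders):
# add_jar_to_options example/options.env.unix /path/to/jtds-1.3.1.jar
```

The `grep` guard makes the edit idempotent, so re-running a setup script does not stack duplicate classpath entries.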
Driver class not found: net.sourceforge.jtds.jdbc.Driver
Hi there,

I'm trying to get the JDBC connector to work with the provided example. The target database is MSSQL. The JDBC connector seems to be registered successfully, but every time I configure it via the web interface, the connection status shows "Threw exception: 'Driver class not found: net.sourceforge.jtds.jdbc.Driver'".

While several sources (including the ManifoldCF website) state that third-party libraries are required to BUILD the connectors, I have downloaded the binary version and thought there was no need to build anything. (Am I wrong?) So do I have to put a required jar file somewhere within the folders? I tried several jTDS jars in several folders with no luck. I simply want to test an SQL connection with ManifoldCF within the example.

Sven
Using ManifoldCF as a web crawler with Tika and Solr
I'm using ManifoldCF with Solr, trying to get it working as a web crawler. Crawling websites (HTML, text) works fine; the problem is that links to binary documents (PDF, XLSX, DOCX, ...) don't work, even when I put a Tika transformation in the job. I haven't even found written confirmation that the web-crawler connector supports binary documents, although some posts to the mailing lists indicate that it is possible.

The documents are apparently recognized: I put a direct link to a PDF document in the seeds and it is processed when I run the job. But there is no error (Tika errors are not ignored!) and the document is not transferred to Solr. With no error message I have nothing to work with...

Any ideas/hints on what to do? Does somebody know a tutorial for setting up a web crawler with Solr & Tika? I haven't found any on the web, which made me wonder whether I'm attempting something impossible here.

Thanks in advance.

Sven
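When the job itself reports no error, one place to look for a signal is the server logs: grep the Solr (or MCF agents) log for the PDF's URL to see whether an update request for it ever arrived. A minimal sketch; the log path and document URL below are placeholders, and the helper name is made up for illustration:

```shell
# was_doc_posted LOG_FILE DOC_URL
# Succeed (exit 0) if the document URL appears anywhere in the log,
# i.e. some request mentioning it reached the server.
was_doc_posted() {
    grep -q "$2" "$1"
}

# Hypothetical usage (paths/URLs are placeholders):
# if was_doc_posted /var/solr/logs/solr.log "http://example.com/report.pdf"; then
#     echo "Solr saw a request for the PDF"
# else
#     echo "No request for the PDF reached Solr; check the MCF Simple History"
# fi
```

If the URL never shows up in the Solr log, the document is being dropped on the MCF side before indexing (e.g. by a mime-type or length filter in the job), which narrows the search considerably.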