ZeroByteFileException / solr

2018-08-17 Thread Farrenkopf, Sven
Hello again,

It seems like MCF is sending empty data to Solr?

Around 3% of my documents (from the JDBC and FileShare connectors) throw the 
following exception:

Error from server at http://myserver:myport/solr/mycore: 
org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

These documents are not actually empty, but the bigger problem is that the job 
gets stuck in a loop, endlessly retrying those documents. At least, that is 
what the "Simple History" report indicates.

And why would MCF transfer empty data to Solr in the first place? I'm probably 
doing something wrong, but what?
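Tika's message means that the InputStream it was asked to parse contained no 
bytes at all, regardless of what the original file holds. As an illustration 
only, here is a minimal Java sketch of how a processing stage could check for 
an empty stream before handing it to a parser, using 
java.io.PushbackInputStream so the probed byte is not lost. EmptyStreamCheck, 
wrap and isEmpty are hypothetical names; this is not how MCF or Tika actually 
handle the case.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration only; not part of ManifoldCF or Tika.
public class EmptyStreamCheck {

    // Wrap the source stream so that one probed byte can be pushed back.
    static PushbackInputStream wrap(InputStream in) {
        return new PushbackInputStream(in, 1);
    }

    // Returns true if the stream has no bytes left. If a byte is read,
    // it is pushed back so the consumer still sees the complete content.
    static boolean isEmpty(PushbackInputStream in) {
        try {
            int b = in.read();
            if (b == -1) {
                return true;
            }
            in.unread(b);
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        PushbackInputStream empty = wrap(new ByteArrayInputStream(new byte[0]));
        PushbackInputStream oneByte = wrap(new ByteArrayInputStream(new byte[] {42}));

        System.out.println(isEmpty(empty));   // true: this is the case Tika rejects
        System.out.println(isEmpty(oneByte)); // false
        System.out.println(oneByte.read());   // 42: the probed byte was not lost
    }
}
```

Probing a single byte keeps the check cheap and avoids buffering the whole 
document just to find out whether it has any content.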

Ah, and one more question for dummies like me: is there a simple way to search 
the mailing list archives?

Sven


RE: Driver class not found: net.sourceforge.jtds.jdbc.Driver

2018-08-17 Thread Farrenkopf, Sven
It works! Awesome! Thank you! A LOT!

Sven

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Thursday, August 16, 2018 7:39 PM
To: user@manifoldcf.apache.org
Subject: Re: Driver class not found: net.sourceforge.jtds.jdbc.Driver

Hi Sven,

When MCF is built, two entirely distinct versions of the examples are created: 
a standard version and a "proprietary" version. The standard version does not 
in general include any proprietary jars and leaves connectors that depend on 
them disabled in the connectors.xml file. The proprietary version is not 
included in the shipping binary. If you want everything to be put together for 
you automatically and you rely on proprietary jars, you will need to build it 
yourself.

However, it's easy to add root-level proprietary jars to the non-proprietary 
binary. The start options in the example directories all include references 
to root-level proprietary jars; you just have to add your JDBC driver jar to 
those options files and you should be good to go.
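The "Driver class not found" status most likely means the driver class could 
not be loaded from the classpath MCF was started with. One way to check whether 
a jar you added to the options file is actually picked up is to load the class 
by name with Class.forName, which throws ClassNotFoundException when the class 
is missing. A minimal sketch, assuming you run it with the same classpath 
entries as the MCF start options; DriverCheck is a hypothetical helper, not 
part of MCF.

```java
// Hypothetical diagnostic helper; not part of ManifoldCF.
public class DriverCheck {

    // Returns true if the named class can be loaded from the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the jTDS driver class named in the error message.
        String driver = args.length > 0 ? args[0] : "net.sourceforge.jtds.jdbc.Driver";
        if (isOnClasspath(driver)) {
            System.out.println("Found: " + driver);
        } else {
            System.out.println("Not on classpath: " + driver);
        }
    }
}
```

For example: java -cp ".:<path-to-your-jtds-jar>" DriverCheck (use ; instead 
of : as the separator on Windows). If it reports "Not on classpath", MCF 
started with the same classpath will not find the driver either.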

Karl




On Thu, Aug 16, 2018 at 11:36 AM Farrenkopf, Sven 
<sven.farrenk...@dreso.com> wrote:
Hi there,

I'm trying to get the JDBC connector to work with the provided example. The 
target database is MSSQL. The JDBC connector seems to be registered 
successfully.

But every time I configure the JDBC connector via the web interface, the 
connection status shows "Threw exception: 'Driver class not found: 
net.sourceforge.jtds.jdbc.Driver'"

Several sources (including the ManifoldCF website) state that third-party 
libraries are required to BUILD the connectors, but I downloaded the binary 
version and thought there was no need to build anything. (Am I wrong?)

So do I have to put a required jar file in one of the folders somewhere? I 
tried several jTDS jars in several folders with no luck. I simply want to test 
an SQL connection with ManifoldCF using the example.

Sven


Driver class not found: net.sourceforge.jtds.jdbc.Driver

2018-08-16 Thread Farrenkopf, Sven
Hi there,

I'm trying to get the JDBC connector to work with the provided example. The 
target database is MSSQL. The JDBC connector seems to be registered 
successfully.

But every time I configure the JDBC connector via the web interface, the 
connection status shows "Threw exception: 'Driver class not found: 
net.sourceforge.jtds.jdbc.Driver'"

Several sources (including the ManifoldCF website) state that third-party 
libraries are required to BUILD the connectors, but I downloaded the binary 
version and thought there was no need to build anything. (Am I wrong?)

So do I have to put a required jar file in one of the folders somewhere? I 
tried several jTDS jars in several folders with no luck. I simply want to test 
an SQL connection with ManifoldCF using the example.

Sven


Using ManifoldCF as a web crawler with Tika and Solr

2018-08-14 Thread Farrenkopf, Sven
I'm using ManifoldCF with Solr, trying to get it working as a web crawler. 
Crawling the websites (HTML, text) works fine; the problem is that links to 
binary documents (PDF, XLSX, DOCX, ...) don't work, even if I add a Tika 
transformation to the job. I haven't even found written confirmation that the 
web crawler connector supports binary documents, although some posts to the 
mailing lists indicate that it is possible.

The documents are apparently recognized: I put a direct link to a PDF document 
in the seeds, and it is processed when I run the job.

But there is no error (Tika errors are not ignored!), and the document is not 
transferred to Solr. With no error message, I have nothing to work with...

Any ideas or hints on what to do? Does anybody know of a tutorial for setting 
up a web crawler with Solr and Tika? I haven't found any on the web, which made 
me wonder whether I'm trying something impossible here.

Thanks in advance.

Sven