Hi Nick,

Thanks for your reply.

On 16.02.12 16:51, Nick Burch wrote:
On Tue, 14 Feb 2012, Stephan Mühlstrasser wrote:
https://issues.apache.org/jira/browse/TIKA-527
...

The problem is that using the proposed method does not work for me.
Any use of the configuration file apparently sends Tika into an
endless recursion, even without overriding a built-in parser in the
configuration file.

Are you able to produce a unit test that shows the problem?

That's what I was trying to provide with the example in my previous message:


If I understand it correctly, the following configuration file should
have the same effect as the built-in configuration:

$ cat tika-config.xml
<properties>
<parsers>
<parser class="org.apache.tika.parser.DefaultParser"/>
</parsers>
</properties>

The error occurs whenever you invoke the Tika CLI application with this configuration file. Just start it like this: "java -Dtika.config=tika-config.xml -jar tika-app-1.0.jar --list-parsers".

Ah, I'm not sure that's correct. I think you also need to specify the
mimetypes and a detector. Looking at lines 145 to 172 of TikaConfig, it
seems that you either get the defaults with no config, or specify them
all in your own config.


OK, now I see in the source what you mean. The example in TIKA-527 is then incomplete, as it specifies neither mimetypes nor a detector.

Since yesterday I have got my override working by packaging a META-INF/services/org.apache.tika.parser.Parser file into the JAR file together with my parser, so I no longer need the configuration file approach. I still think it could be considered a bug, though, if an incorrect or insufficient configuration file sends Tika into an endless recursion instead of producing a meaningful error message.
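
For anyone who wants to see the META-INF/services mechanism in isolation, here is a minimal sketch. It assumes the JDK's own java.util.ServiceLoader; Tika's internal lookup may differ in detail. The Parser interface and MyParser class are hypothetical stand-ins, not the real org.apache.tika.parser.Parser or my actual parser, and a temporary directory stands in for the JAR file.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

class ServiceFileDemo {

    /** Hypothetical stand-in for org.apache.tika.parser.Parser. */
    public interface Parser {
        String name();
    }

    /** Hypothetical custom parser that would be packaged in the JAR. */
    public static class MyParser implements Parser {
        @Override
        public String name() {
            return "MyParser";
        }
    }

    /** Writes a provider file, then lets ServiceLoader discover the parser. */
    static List<String> discover() throws IOException {
        // Mimic the JAR layout: META-INF/services/<binary name of the interface>
        Path root = Files.createTempDirectory("services-demo");
        Path dir = Files.createDirectories(root.resolve("META-INF").resolve("services"));

        // The file content is the binary name of the implementation class.
        Files.write(dir.resolve(Parser.class.getName()),
                MyParser.class.getName().getBytes(StandardCharsets.UTF_8));

        List<String> names = new ArrayList<>();
        // The class loader sees the temp directory as an extra class path entry.
        try (URLClassLoader loader = new URLClassLoader(
                new URL[] { root.toUri().toURL() },
                ServiceFileDemo.class.getClassLoader())) {
            // ServiceLoader is lazy, so iterate while the loader is still open.
            for (Parser p : ServiceLoader.load(Parser.class, loader)) {
                names.add(p.name());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(discover());
    }
}
```

In a real JAR the provider file is simply a static resource next to the classes, so no configuration file is involved at all.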

Thanks
Stephan

--
_______________________________________________________________
Stephan Mühlstrasser   [email protected]            www.pdflib.com
  PDFlib GmbH, Franziska-Bilek-Weg 9, 80339 München,  Germany
       Court of registry/Amtsgericht München HRB 129497
 Managing Directors/Geschäftsführer: Thomas Merz, Petra Porst
---------------------------------------------------------------
    PDFlib: powerful toolkits for PDF developers since 1997
_______ See www.pdflib.com/products for product details________
