On 23 July 2010 10:08, Torsten Krah <[email protected]> wrote:

> Hi,
>
> I need to extend / override the TikaParser and have written my own plugin:
>
> plugin.xml contains this:
>
> [...]
>
>   <requires>
>      <import plugin="nutch-extensionpoints"/>
>      <import plugin="lib-nekohtml"/>
>      <import plugin="parse-tika"/>
>   </requires>
>
>   <extension point="org.apache.nutch.parse.Parser"
>              id="my.nutch.plugins.parse"
>              name="TSITikaParser">
>
>      <implementation id="my.nutch.plugins.parse.Parser"
>                      class="my.nutch.plugins.parse.TSITikaParser">
>         <parameter name="contentType" value="*"/>
>      </implementation>
>
>   </extension>
>
> [...]
>
> parse-plugins.xml:
>
> [...]
>
> <alias name="parse-tika"
>                extension-id="my.nutch.plugins.parse.Parser" />
>
> [...]
>
>
> The log output reads:
>
> 2010-07-23 10:16:37,071 DEBUG parse.ParseUtil - Parsing [http://localhost/test.pdf] with [my.nutch.plugins.parse.tsitikapar...@d6089a5]
> 2010-07-23 10:16:37,072 ERROR tika.TikaParser - Can't retrieve Tika parser for mime-type application/pdf
> 2010-07-23 10:16:37,076 WARN  fetcher.Fetcher - Error parsing: http://localhost/test.pdf: failed(2,0): Can't retrieve Tika parser for mime-type application/pdf
>
>
> Why does Tika not find its parsers?
>

It's just that you've only declared an alias in parse-plugins.xml but no
association with a mime-type. You haven't made it a 'default' parser, which
you can do either by

specifying

<parameter name="contentType" value="*"/>

in the plugin.xml file of your plugin

or

specifying

    <mimeType name="*">
      <plugin id="parse-tika" />
    </mimeType>

in parse-plugins.xml.
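For reference, putting the mime-type mapping and your existing alias together, parse-plugins.xml might look roughly like the sketch below. This assumes the Nutch 1.x layout of that file (a <parse-plugins> root, <mimeType> entries, and an <aliases> section); the extension-id is the one from your plugin.xml, and the explicit application/pdf entry is only an illustrative, optional addition for the type in your log.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: assumes the Nutch 1.x parse-plugins.xml layout. -->
<parse-plugins>

  <!-- Fallback: any mime-type without a more specific mapping
       is handed to the plugin id "parse-tika". -->
  <mimeType name="*">
    <plugin id="parse-tika" />
  </mimeType>

  <!-- Optional, illustrative: explicit mapping for the type in the log. -->
  <mimeType name="application/pdf">
    <plugin id="parse-tika" />
  </mimeType>

  <aliases>
    <!-- Maps the plugin id "parse-tika" used above to the extension id
         declared in your own plugin.xml. -->
    <alias name="parse-tika"
           extension-id="my.nutch.plugins.parse.Parser" />
  </aliases>

</parse-plugins>
```

Without at least one <mimeType> entry pointing at "parse-tika", the alias alone is never consulted for a given content type.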



-- 
DigitalPebble Ltd

Open Source Solutions for Text Engineering
http://www.digitalpebble.com
