DEBUG tika.TikaParser - Using Tika parser
org.apache.tika.parser.txt.TXTParser for mime-type text/plain

The log line above shows that Tika is being invoked, but it selects
TXTParser. I somehow need to tell Tika to use HtmlParser for mime-type
text/plain, so I will have to dig into the Tika docs.
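
To make the idea concrete, here is a minimal sketch in plain Java of the
decision I am after: if the server says text/plain but the body looks like
HTML, treat it as text/html before choosing a parser. This is not the Tika or
Nutch API. The class MimeOverride, the method effectiveType, and the 1024-byte
sniff window are all hypothetical names and values chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of Tika or Nutch.
final class MimeOverride {

    // Returns the MIME type to hand to the parser: "text/html" when the
    // declared type is text/plain but the body starts like an HTML document,
    // otherwise the declared type unchanged.
    static String effectiveType(String declaredType, byte[] body) {
        if (declaredType == null || body == null) {
            return declaredType;
        }
        // Drop parameters such as "; charset=UTF-8" before comparing.
        String base = declaredType.split(";", 2)[0].trim();
        if (!"text/plain".equalsIgnoreCase(base)) {
            return declaredType;
        }
        // Only inspect the first bytes (assumed window: 1024).
        int n = Math.min(body.length, 1024);
        String head = new String(body, 0, n, StandardCharsets.ISO_8859_1)
                .toLowerCase(Locale.ROOT);
        boolean looksLikeHtml = head.contains("<!doctype html")
                || head.contains("<html")
                || head.contains("<head")
                || head.contains("<body");
        return looksLikeHtml ? "text/html" : declaredType;
    }

    public static void main(String[] args) {
        byte[] page = "<html><body>hello</body></html>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(effectiveType("text/plain", page));
    }
}
```

A check like this would have to run before the parser is selected, since the
log shows the parser is chosen purely from the mime type.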

Is there anything that can be done on the Nutch side?

On Sun, Nov 25, 2012 at 7:27 PM, Sourajit Basak <[email protected]> wrote:

> Some of my target webpages return a mime type of text/plain even though
> they are HTML. I changed "http.accept" to include text/plain and configured
> both Tika and parse-html to see whether they can parse those pages.
> However, both seem to produce no content.
>
> I also changed parse-plugins.xml and the corresponding plugin.xml files to
> match this mime type.
>
> Has anyone encountered this problem?
