DEBUG tika.TikaParser - Using Tika parser org.apache.tika.parser.txt.TXTParser for mime-type text/plain
The above indicates that Tika is being invoked. But somehow I need to tell Tika to use HtmlParser for the mime-type text/plain. I will have to dig into the Tika docs. Is it possible to do anything about this in Nutch?

On Sun, Nov 25, 2012 at 7:27 PM, Sourajit Basak <[email protected]> wrote:

> Some of my target webpages return a mime type of text/plain though they
> are HTML. I changed "http.accept" to include text/plain and configured
> both tika & parse-html to see if those can be parsed. However, both seem to
> produce no content.
>
> I changed parse-plugins.xml & the corresponding plugin.xml's to match this
> mime type.
>
> Has anyone encountered this problem?
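
For reference, the parse-plugins.xml change described in the quoted message might look roughly like the fragment below. This is only a sketch, not the exact configuration used: it assumes the stock plugin id parse-html and shows just the mime-type mapping, without the matching content-type change in that plugin's plugin.xml. As reported in the quoted message, this kind of mapping did not yield any parsed content in this case.

```xml
<!-- Sketch of an entry inside <parse-plugins> in conf/parse-plugins.xml. -->
<!-- Assumption: "parse-html" is the plugin id of the stock Nutch HTML parser. -->
<mimeType name="text/plain">
  <!-- Ask Nutch to try the HTML parser for content served as text/plain -->
  <plugin id="parse-html" />
</mimeType>
```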

