It seems that using Tika as a plug-in to Nutch for processing various non-HTML formats is still somewhat bleeding-edge. Can someone point me to instructions (or just tell me) for using Tika in Nutch to crawl and index MS Office or PDF docs? Or is it now included by default?
Thanks!