Using Tika to crawl doc, pdf, etc.

Kelly Vista Wed, 10 Feb 2010 16:26:00 -0800

It seems like using Tika as a plug-in to Nutch for processing various
non-HTML formats is somewhat bleeding-edge.  Can someone point me to (or
tell me) how I can simply use Tika in Nutch to crawl and index MS
Office or PDF docs?  Or is it now in there by default?


Thanks!
