On Feb 10, 2010, at 4:25pm, Kelly Vista wrote:
It seems like using Tika as a plug-in to Nutch for processing various non-HTML formats is somewhat bleeding-edge. Can someone point me to (or tell me) how I can simply use Tika in Nutch to crawl and index MS Office or PDF docs? Or is it now in there by default?
Should be there by default, once the Tika plug-in gets rolled in.

-- Ken

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g