Hi James,

One of the plugins in Nutch uses Tika 1.2 as a parser wrapper. The list of formats Tika supports can be found below:

http://tika.apache.org/1.2/formats.html

hth

Lewis

On Wed, Dec 12, 2012 at 4:02 PM, James Ford <[email protected]> wrote:
> Hello,
>
> Which document types can nutch parse? I know that it works with PDF but can
> it also parse ms office documents and such?
>
> Thanks,
>
> James Ford
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Parsing-of-document-types-tp4026372.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

--
Lewis

