Hi James,

One of the plugins in Nutch uses Tika 1.2 as a parser wrapper.
The list of formats supported by Tika can be found below:

http://tika.apache.org/1.2/formats.html

hth
Lewis

On Wed, Dec 12, 2012 at 4:02 PM, James Ford <[email protected]> wrote:
> Hello,
>
> Which document types can Nutch parse? I know that it works with PDF, but can
> it also parse MS Office documents and such?
>
> Thanks,
>
> James Ford
>
>
>
> --
> Sent from the Nutch - User mailing list archive at Nabble.com.



