It's OK, I've found it. But now I get some strange errors:
050406 155957 fetch okay, but can't parse http://localhost:8080/testsIndex/file.doc, reason: Content truncated at 70954 bytes. Parser can't handle incomplete msword file.
050406 155958 fetch okay, but can't parse http://localhost:8080/testsIndex/file.pdf, reason: Content truncated at 70957 bytes. Parser can't handle incomplete pdf file.
050406 160001 fetch okay, but can't parse http://localhost:8080/testsIndex/file.rtf, reason: Exception parsing RTF document

Thank you for helping me.
Guillaume

> How do I enable these parsers? Which files must be modified,
> and how?
>
> Thank you very much!
>
> Guillaume
>
> > You have to enable these parsers in your plugin configuration.
> > I know pdf and doc work great myself; I'm not sure whether the
> > others are supported.
> >
> > -byron
> >
> > -----Original Message-----
> > From: "guillaume lefebvre" <[EMAIL PROTECTED]>
> > To: "nutch-user" <[email protected]>
> > Date: Wed, 6 Apr 2005 13:41:43 +0200
> > Subject: PDF, XML, DOC, RTF Parsing
> >
> > > Hi,
> > >
> > > I'm a new user of Nutch.
> > >
> > > I'm having problems indexing PDF, XML, DOC and RTF files. Is
> > > that normal? Does Nutch support PDF, XML, DOC and RTF parsing?
> > >
> > > Thank you!
> > > Guillaume
> > >
> > > Access La Poste email: www.laposte.net; 3615 LAPOSTENET
> > > (€0.34/min); tel: 08 92 68 13 50 (€0.34/min)
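For reference, here is a minimal sketch of the kind of change byron describes, written as a small Python script that edits the XML in conf/nutch-site.xml. It assumes two Nutch properties: `plugin.includes` (a regular expression of plugin IDs to load) and `http.content.limit` (a byte limit on content downloaded over HTTP, where a negative value is assumed to mean "no truncation"). The plugin list in `SETTINGS` is a hypothetical example, not a recommended default; check the IDs against the plugins directory of your own install. The script works on `<property>` elements under whatever the root element is called. It does not address the RTF error, which reports a parser exception rather than truncation.

```python
import xml.etree.ElementTree as ET

# Hypothetical example values (assumptions, not Nutch defaults).
SETTINGS = {
    # Regex of plugin IDs to load; parse-pdf and parse-msword are included here.
    "plugin.includes": "protocol-http|urlfilter-regex|parse-(text|html|pdf|msword)"
                       "|index-basic|query-(basic|site|url)",
    # Assumed: a negative limit disables truncation of downloaded HTTP content.
    "http.content.limit": "-1",
}


def set_properties(xml_text, settings):
    """Return xml_text with every name/value pair in settings applied.

    Existing <property> entries are updated in place; missing ones are
    appended to the root element, whatever its tag name is.
    """
    root = ET.fromstring(xml_text)
    seen = set()
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name in settings:
            value = prop.find("value")
            if value is None:
                value = ET.SubElement(prop, "value")
            value.text = settings[name]
            seen.add(name)
    # Append a new <property> for every setting not already present.
    for name, val in settings.items():
        if name not in seen:
            prop = ET.SubElement(root, "property")
            ET.SubElement(prop, "name").text = name
            ET.SubElement(prop, "value").text = val
    return ET.tostring(root, encoding="unicode")


def get_property(xml_text, name):
    """Return the value of the named property, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


if __name__ == "__main__":
    original = (
        "<nutch-conf><property><name>http.content.limit</name>"
        "<value>65536</value></property></nutch-conf>"
    )
    print(set_properties(original, SETTINGS))
```

To use it on a real install, read conf/nutch-site.xml into a string, pass it through `set_properties`, and write the result back; pages fetched before the change would presumably need to be re-fetched to pick up the new limit.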
