Hi All,
As you all suggested:
1. I have added PowerPoint to the MIME types.
2. In nutch-default.xml, I have also added the PowerPoint plugin to the plugins list.
3. In plugin.xml, I have also added the content type application/powerpoint.

But I am still getting the problem:

050908 105407 fetching http://localhost:8080/search_sample/kmportal3.ppt
050908 105407 fetching http://localhost:8080/search_sample/testpdf.pdf
050908 105407 fetching http://localhost:8080/search_sample/kmportal10.ppt
050908 105407 fetching http://localhost:8080/search_sample/kmportal2.ppt
050908 105407 fetching http://localhost:8080/search_sample/kmportal4.ppt
050908 105407 fetching http://localhost:8080/search_sample/kmportal6.ppt
050908 105407 fetching http://localhost:8080/search_sample/testexcel.xls
050908 105407 fetching http://localhost:8080/search_sample/javaCertStudyNotes.pdf
050908 105407 fetching http://localhost:8080/search_sample/kmportal7.ppt
050908 105408 fetching http://localhost:8080/search_sample/testdoc.doc
050908 105408 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal3.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050908 105408 fetching http://localhost:8080/search_sample/kmportal8.ppt
050908 105409 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal8.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050908 105409 fetching http://localhost:8080/search_sample/kmportal9.ppt
050908 105410 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal9.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050908 105410 fetching http://localhost:8080/search_sample/kmportal11.ppt
050908 105411 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal10.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050908 105411 fetching http://localhost:8080/search_sample/kmportal5.ppt
050908 105412 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal11.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050908 105413 fetching http://localhost:8080/search_sample/kmportal1.ppt
050908 105413 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal1.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050908 105415 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal5.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050908 105416 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal2.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050908 105417 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal4.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050908 105418 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal7.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint

Thanks,
Ayyanar

--- Jérôme Charron <[EMAIL PROTECTED]> wrote:

> > 3. implement a catch-all plugin, which is equivalent to a Unix command
> > strings(1) (I have an implementation of that which I can contribute).
> > And turn it off/on in the config, if it's off, then the unknown content
> > is skipped and logged, if it's on - then make the best effort to extract
> > text.
>
> Andrzej, I really like this solution... +1
> In such a case, other parse-plugin doesn't need anymore to check the
> content-type: if they get some content, they assume it is of the good
> content-type.
>
> Regards
>
> Jérôme
>
> --
> http://motrech.free.fr/
> http://www.frutch.org/

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
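
For reference, the catch-all idea quoted above (a parser that behaves like the Unix strings(1) command) could be sketched roughly as follows. This is only an illustration in plain Java and does not use Nutch's plugin API. The class name StringsExtractor, the method extractStrings, and the minimum run length of 4 are assumptions made for the example, and the on/off config switch mentioned in the quote is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a strings(1)-like text extractor; not part of Nutch.
public class StringsExtractor {

    // Returns every run of printable ASCII (or tab) that is at least minLen
    // characters long, in the order the runs appear in the content.
    public static List<String> extractStrings(byte[] content, int minLen) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (byte b : content) {
            int c = b & 0xFF; // treat the byte as unsigned
            boolean printable = (c >= 0x20 && c <= 0x7E) || c == '\t';
            if (printable) {
                current.append((char) c);
            } else {
                // A non-printable byte ends the current run.
                if (current.length() >= minLen) {
                    runs.add(current.toString());
                }
                current.setLength(0);
            }
        }
        // Flush a run that reaches the end of the content.
        if (current.length() >= minLen) {
            runs.add(current.toString());
        }
        return runs;
    }

    public static void main(String[] args) {
        byte[] sample = {0x00, 0x01, 'N', 'u', 't', 'c', 'h', 0x02,
                         'o', 'k', 0x03, 'p', 'a', 'r', 's', 'e', 'r'};
        // "ok" is shorter than 4 characters, so it is dropped.
        System.out.println(extractStrings(sample, 4)); // prints [Nutch, parser]
    }
}
```

A real catch-all parser would also have to deal with character encodings other than ASCII, which this sketch ignores.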
