Hi Michael,

I have enabled the .ppt extension in crawl-urlfilter.txt,
and Nutch is now fetching the PowerPoint files.

However, I am getting the following error, because the
content type of the .ppt files is not accepted by Nutch:



050906 175342 fetching http://localhost:8080/search_sample/kmportal3.ppt
050906 175342 fetching http://localhost:8080/search_sample/testpdf.pdf
050906 175342 fetching http://localhost:8080/search_sample/kmportal10.ppt
050906 175342 fetching http://localhost:8080/search_sample/testdoc.doc
050906 175342 fetching http://localhost:8080/search_sample/kmportal2.ppt
050906 175342 fetching http://localhost:8080/search_sample/kmportal4.ppt
050906 175342 fetching http://localhost:8080/search_sample/kmportal6.ppt
050906 175342 fetching http://localhost:8080/search_sample/testexcel.xls
050906 175342 fetching http://localhost:8080/search_sample/javaCertStudyNotes.pdf
050906 175342 fetching http://localhost:8080/search_sample/kmportal7.ppt
050906 175342 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal3.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050906 175342 fetching http://localhost:8080/search_sample/kmportal8.ppt
050906 175343 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal8.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050906 175343 fetching http://localhost:8080/search_sample/kmportal9.ppt
050906 175344 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal9.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050906 175344 fetching http://localhost:8080/search_sample/kmportal11.ppt
050906 175347 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal4.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050906 175348 fetching http://localhost:8080/search_sample/kmportal5.ppt
050906 175348 fetching http://localhost:8080/search_sample/kmportal1.ppt
050906 175350 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal7.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050906 175351 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal10.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050906 175353 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal6.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050906 175354 fetch okay, but can't parse http://localhost:8080/search_sample/testexcel.xls, reason: failed(2,203): Content-Type not application/msword: application/vnd.ms-excel
050906 175355 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal11.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050906 175356 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal5.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050906 175358 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal1.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050906 175359 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal2.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
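
To see at a glance which content types are being rejected, the
"can't parse" entries in a log like the one above can be tallied with
a short script. This is only a rough sketch and not part of Nutch: the
function name summarize_failures and the regular expression are
assumptions based purely on the log format shown here, and the sample
text is two entries copied from this log.

```python
import re
from collections import Counter

# Pattern for one "can't parse" entry, written against the log format
# shown in this message (an assumption, not an official Nutch format).
FAILURE = re.compile(
    r"fetch okay, but can't parse\s+(?P<url>\S+?),\s+"
    r"reason: failed\(\d+,\d+\): Content-Type not\s+"
    r"(?P<expected>\S+):\s+(?P<actual>\S+)"
)


def summarize_failures(log_text):
    """Count parse failures per (expected, actual) Content-Type pair."""
    # Collapse all whitespace so entries wrapped across several lines
    # by the mail client still match as a single record.
    flat = " ".join(log_text.split())
    return Counter(
        (m["expected"], m["actual"]) for m in FAILURE.finditer(flat)
    )


# Two entries copied from the log above.
sample = """050906 175342 fetch okay, but can't parse
http://localhost:8080/search_sample/kmportal3.ppt,
reason: failed(2,203): Content-Type not
application/msword: application/powerpoint
050906 175354 fetch okay, but can't parse
http://localhost:8080/search_sample/testexcel.xls,
reason: failed(2,203): Content-Type not
application/msword: application/vnd.ms-excel
"""

for (expected, actual), count in summarize_failures(sample).items():
    print(f"{count} x parser expected {expected}, got {actual}")
```

In this log every failure names application/msword as the expected
type, for both the .ppt and the .xls files, which suggests that these
documents are being handed to the Word parser rather than to a
PowerPoint or Excel parser.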


thanks,
Ayyanar..

--- Michael Nebel <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> Have you checked the filters (regex-urlfilter or
> crawl-urlfilter)? The ".ppt" extension is disabled
> by default.
> 
> Regards
> 
>       Michael
> 
> Ayyanar Inbamohan wrote:
> 
> > Hi all,
> > 
> > I am using the PowerPoint plugin from JIRA, and when I
> > crawl my application, which has a link to a .ppt file,
> > Nutch 0.7 does not fetch the PowerPoint files at all.
> > 
> > I am crawling my local application at
> > 
> > http://localhost:8080/search_sample/index.html
> > 
> > I put this URL in url.intranet, added some hrefs to
> > PowerPoint files in index.html, and then started the
> > crawl, but the files are not being crawled.
> > 
> > 
> > 
> > Thanks in advance..
> > 
> > thanks,
> > Ayyanar....
> > 
> 
> -- 
> Michael Nebel
> http://www.nebel.de/
> http://www.netluchs.de/
> 
> 



        
                
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
