Hi,
 
I have a system with Nutch configured. All the HTML pages, which are 
generated dynamically with servlets and JSP, are indexed correctly by the crawl, 
but I have a problem with the PDF and Word files. My system saves those files in 
a database, and my portal has URLs that serve those files. However, those 
URLs are JSP pages, for example http://www.mydomain.com/myportal/file.jsp?id=xx, 
and this URL returns a PDF file. The crawl doesn't recognize this content. I 
tested my system with URLs such as http://www.mydomain.com/myportal/file.pdf, 
and in this case Nutch indexes the files correctly.
 
Could you help me?
 
Monica
