Hi,
I have a system with Nutch configured. All the HTML pages, which are
generated dynamically with Servlets and JSP, are indexed correctly by the crawl,
but I have a problem with the PDF and Word files. My system saves those files in
a database, and my portal has URLs that serve them. However, those
URLs are JSPs, for example http://www.mydomain.com/myportal/file.jsp?id=xx, and
this URL returns a PDF file. The crawl doesn't recognize this content. I tested
my system with URLs like http://www.mydomain.com/myportal/file.pdf, and in this case
Nutch indexes the file correctly.
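In case it helps with diagnosing this, below is a minimal sketch of how the
Content-Type header sent back by file.jsp could be checked. It is not part of
my portal: the class name ContentTypeCheck and the methods isPdf and
fetchContentType are made up for illustration, and it rests on the assumption
that the crawler may be deciding how to parse a response from that header
rather than from the URL extension.

```java
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical diagnostic helper, not part of Nutch or of the portal.
public class ContentTypeCheck {

    // Returns true when the header value names a PDF, ignoring any
    // parameters after ';' (for example a charset) and letter case.
    static boolean isPdf(String contentType) {
        if (contentType == null) {
            return false;
        }
        String mime = contentType.split(";")[0].trim().toLowerCase();
        return mime.equals("application/pdf");
    }

    // Sends a HEAD request and returns the raw Content-Type header,
    // or null when the server does not send one.
    static String fetchContentType(String address) throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setRequestMethod("HEAD");
        try {
            return conn.getContentType();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws Exception {
        // Pass the URL to check as the first argument.
        if (args.length == 0) {
            System.out.println("usage: java ContentTypeCheck <url>");
            return;
        }
        String type = fetchContentType(args[0]);
        System.out.println("Content-Type: " + type);
        System.out.println("Looks like PDF: " + isPdf(type));
    }
}
```

Running it against both the file.jsp URL and the file.pdf URL would show
whether the two responses declare the same content type.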
Could you help me?
Monica