Re: [Nutch-general] ASP Parser

David Spencer Tue, 10 May 2005 12:45:00 -0700

Seth Taylor wrote:

I've recently installed and configured Nutch from source.  From
what I've read, by default Nutch will parse only text- and HTML-based
documents.  I have a site I'm trying to crawl that is all ASP
pages.  I put the ASP MIME type in the mime-type.xml document.  What
else do I need to do in order for Nutch to crawl ASP pages?

You probably need to check the URL filter (conf/crawl-urlfilter.txt) and make sure the pages are not being rejected. Note that there may be a pattern that rejects URLs containing query arguments, so you might want to disable it if the pages take arguments.
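
To illustrate why a query-rejecting pattern matters, here is a rough Python sketch of how an ordered list of +/- regex rules could decide whether a URL is crawled. It is not Nutch's code. The rule list, the example.com host, and the accepts() helper are all made up for illustration, and the "first matching rule wins" behaviour is an assumption about how the filter file is evaluated, so check it against your own conf/crawl-urlfilter.txt.

```python
import re

# Hypothetical rules written in the style of a +/- regex filter file.
# Assumption: rules are tried in order and the first match decides.
RULES = [
    ("-", r"[?*!@=]"),                               # reject probable query URLs
    ("+", r"^http://([a-z0-9]*\.)*example\.com/"),   # accept the target site
    ("-", r"."),                                     # reject everything else
]


def accepts(url, rules=RULES):
    """Return True if the first rule matching `url` is an accept (+) rule."""
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    # No rule matched at all: treat the URL as rejected.
    return False


if __name__ == "__main__":
    for url in ("http://www.example.com/page.asp",
                "http://www.example.com/page.asp?id=3"):
        print(url, "->", "crawl" if accepts(url) else "skip")
```

With rules like these, page.asp is accepted but page.asp?id=3 is skipped because of the first rule. Dropping that rule (RULES[1:] in the sketch) lets the URL with arguments through.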

I would think that there is no ASP MIME type per se -- surely the average ASP page returns HTML?

Thanks,

Seth

[EMAIL PROTECTED]
