How deep should a good intranet crawl be... 10-20? I still can't get all of my site searchable.
Here is my situation. I want to crawl just one local site for our intranet. We have just moved from a pure HTML site to an ASP-only website. I ran Nutch on the old site and got great results. Since moving to the new site I am having a devil of a time retrieving good information, and a ton of content is missing altogether.

I am not sure which settings I need to change to get good results. One setting I have tried does produce good results, but it seems to crawl other websites and not just my domain: on the last line of the crawl-urlfilter file I replaced the - with a + so that it does not ignore other information.

Our site is www.woodward.edu. I was wondering if someone on this list could crawl this site, and only this domain (woodward.edu), and see what they come up with. I am just stumped as to what to do next.

I am running a nightly build from January 26th, 2006. My criteria for our local search are to be able to search PDF, images, doc, and web content. You can see what the search page currently pulls up at http://search.woodward.edu.

Thanks for any help this list can provide.

Andy Morris
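
The filter change described above can be illustrated with a minimal Python sketch. It is not Nutch code. It assumes the filter tries its rules from top to bottom, that the first matching pattern decides (+ accepts, - rejects), and that a URL matching no rule is rejected. The rule list and the `accepts` helper are hypothetical, and the host pattern is only a guess at what a woodward.edu-only rule might look like.

```python
import re

# Hypothetical rules modelled on a crawl-urlfilter file.
# Each entry is (sign, pattern): "+" accepts, "-" rejects.
RULES = [
    ("-", r"^(file|ftp|mailto):"),                    # skip non-http schemes
    ("+", r"^http://([a-z0-9-]*\.)*woodward\.edu/"),  # assumed domain rule
    ("-", r"."),                                      # skip everything else
]


def accepts(url, rules=RULES):
    """Return True if the first rule that matches the URL is a "+" rule.

    Assumption: first match wins, and a URL that matches no rule is rejected.
    """
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False


# Same rules, but with the sign of the last line flipped from "-" to "+".
LOOSE_RULES = RULES[:-1] + [("+", r".")]

for url in ("http://www.woodward.edu/index.asp", "http://www.example.com/"):
    print(url, accepts(url), accepts(url, LOOSE_RULES))
```

Under these assumptions, flipping the last rule to + accepts every URL that no earlier rule rejected, including pages on other sites. The restriction to a single domain comes from the + line that names the domain, with the final line left as -.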
