Hi,

I wondered whether the config files in the Nutch webapp (i.e. those under WEB-INF/classes), such as nutch-site.xml and crawl-urlfilter.txt, are used by the webapp when searching. The reason I ask is that when I search for something I get back the following URLs:
http://somehost.baplc.com/general/aptrix/apteng.nsf/Content/Engineering+Home/Projects+&+Systems/Engineering+Fit+for+5?OpenDocument&ExpandSection=3
http://somehost.baplc.com/general/aptrix/apteng.nsf/Content/Engineering+Home/Projects+&+Systems/Engineering+Fit+for+5?OpenDocument&ExpandSection=4
http://somehost.baplc.com/general/aptrix/apteng.nsf/Content/Engineering+Home/Projects+&+Systems/Engineering+Fit+for+5?OpenDocument&ExpandSection=5
http://somehost.baplc.com/general/aptrix/apteng.nsf/Content/Engineering+Home/Projects+&+Systems/Engineering+Fit+for+5?OpenDocument&ExpandSection=2
http://somehost.baplc.com/general/aptrix/apteng.nsf/Content/Engineering+Home/Projects+&+Systems/Engineering+Fit+for+5?OpenDocument&ExpandSection=1

These are all effectively the same page. So although I want the crawl to parse these, I want the webapp search to return only the URL up to the query string, e.g.:

http://somehost.baplc.com/general/aptrix/apteng.nsf/Content/Engineering+Home/Projects+&+Systems/Engineering+Fit+for+5

Hope that makes sense.

Thanks for any help,
Ed.
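
P.S. To make the behaviour I am after concrete, here is a rough sketch in plain Java. It is not Nutch code and does not use any Nutch API; the class name QueryStripper and the methods stripQuery and collapse are made up purely for illustration. It cuts each URL at the first '?' and then collapses hits that become identical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, for illustration only; not part of Nutch.
class QueryStripper {

    // Returns the URL up to (but not including) the first '?',
    // or the URL unchanged if it has no query string.
    static String stripQuery(String url) {
        int q = url.indexOf('?');
        return q < 0 ? url : url.substring(0, q);
    }

    // Strips the query from every URL and drops duplicates,
    // keeping the order in which each distinct URL first appeared.
    static List<String> collapse(List<String> urls) {
        Set<String> seen = new LinkedHashSet<>();
        for (String url : urls) {
            seen.add(stripQuery(url));
        }
        return new ArrayList<>(seen);
    }
}
```

Run over the five hits above, collapse would leave a single entry, the Engineering+Fit+for+5 URL without its ?OpenDocument&ExpandSection=N suffix.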
