I've applied the patch from the ticket linked below. I browsed the patch to try to figure out how to use this plugin, and I'm having trouble getting it working.
Before I get into the details: if someone has a source of information describing how Nutch starts up and initializes plugins, so that I can get a feel for whether this patch is even being used properly in the system, I'd very much appreciate it.

----

Here's what I did:

Added patches with
    patch -p0 < subcollection.2.path

Compiled tarball with
    ant tar

Extracted tarball in my runtime location with
    tar -zxvpf - nutch-0.8-dev.tar.gz

Created urls/urls.txt containing my site name (http://www.philadelphiariders.com/)

Edited crawl-urlfilter.xml to accept the aforementioned site name

Edited subcollections.xml and added the following:

    <subcollection>
      <name>wiki</name>
      <id>wiki</name>
      <whitelist>http://www.philadelphiariders.com/wiki</whitelist>
      <blacklist />
    </subcollection>
    <subcollection>
      <name>moto-web</name>
      <id>moto-web</name>
      <whitelist>http://www.philadelphiariders.com/c/dmoz</whitelist>
      <blacklist />
    </subcollection>
    <subcollection>
      <name>gallery</name>
      <id>gallery</id>
      <whitelist>http://www.philadelphiariders.com/gallery</whitelist>
      <blacklist />
    </subcollection>

Crawled/indexed my site with
    ./bin/nutch crawl urls -dir ../nutch-index

When I start Tomcat and do some test searching, I get links from the wiki area as long as no collection field is added to the query. But if I run a query like:

    collection:wiki loudon

which should return documents, I get none. Additionally, if I simply query collection:wiki, I get no hits.

If anyone has any ideas, I'll be very grateful.

Zaheed Haque wrote:

>Maybe this could help you..
>
>http://issues.apache.org/jira/browse/NUTCH-201
>
>Cheers
>

--
Andrew Libby
[EMAIL PROTECTED]
http://philadelphiariders.com/
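One thing that may be worth ruling out: in the subcollections.xml fragment as pasted, the id elements for wiki and moto-web are closed with </name> rather than </id>. That could simply be a typo made while writing the email, but if the file on disk looks the same it is not well-formed XML. The sketch below is only a quick well-formedness check using Python's standard library, not anything Nutch itself provides. The enclosing <subcollections> root element and the helper name well_formedness_error are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if xml_text parses as XML, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


# First entry as it appears in the post. The <subcollections> wrapper is an
# assumption added so the fragment has a single root element.
AS_PASTED = """<subcollections>
  <subcollection>
    <name>wiki</name>
    <id>wiki</name>
    <whitelist>http://www.philadelphiariders.com/wiki</whitelist>
    <blacklist />
  </subcollection>
</subcollections>"""

# Same entry with the id element closed by a matching </id> tag.
CORRECTED = AS_PASTED.replace("<id>wiki</name>", "<id>wiki</id>")

if __name__ == "__main__":
    for label, text in (("as pasted", AS_PASTED), ("corrected", CORRECTED)):
        error = well_formedness_error(text)
        status = "well-formed" if error is None else "parse error: " + error
        print(label, "->", status)
```

To check the real file, read its contents and pass them to the same function. If it parses cleanly, the mismatched tags were only in the email and the cause lies elsewhere.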
