I second that. Though as this file is growing and XML is not easy on the eye, it may make sense to break it into 2-3 files, depending on the Nutch-<subsystem>.

Also, it would be nice (actually, it's more of a request) if each rule could end with a "continue marker", say a <CONT> tag. It's much easier to write rules when you can break them out instead of writing one long rule to do everything.

Example: say you want to remove session IDs and fix the host name in a URL.

Old: http://www44.somedom.com/somepage.jsp?sessionID=11111
New: http://www.somedom.com/somepage.jsp

With a continue marker you can write two separate rules, one for the sessions and one for the "www44".

Again, this is just a request, but if you guys think it's not going to be widely used, feel free to ignore it.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Doug Cutting
Sent: Friday, February 04, 2005 3:49 PM
To: [EMAIL PROTECTED]
Subject: Re: [Nutch-dev] Re: patch available now Re: make URLFilter as plugin

Stefan Groschupf wrote:
>>> I need input from people on the following:
>>> (1) does it make sense to have regex-urlfilter.txt as an attribute
>>> to plugin RegexURLFilter, instead of a property in
>>> nutch-default.xml? The same with prefix-urlfilter.txt?
>
> I personal would say that make sense, in general I think configuration
> values of plugin should be in the plugin.xml Since nutch-config is for
> nutch and not plugins.

I disagree. I think plugins should be configured by the same mechanism as everything else. That way nearly all of a site's customizations are in a single file, nutch-site.xml, and, when they update to a new version of Nutch they won't lose their customizations. In general, folks shouldn't change files that are in CVS, and the plugin.xml is in CVS, authored by developers, not by users.

Doug

-------------------------------------------------------
This SF.Net email is sponsored by: IntelliVIEW -- Interactive Reporting Tool for open source databases. Create drag-&-drop reports. Save time by over 75%! Publish reports on the web. Export to DOC, XLS, RTF, etc.
Download a FREE copy at http://www.intelliview.com/go/osdn_nl
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
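
To make the continue-marker request in the reply above concrete, here is a minimal sketch of how chained rules might behave. It is written in Python purely for illustration and is not Nutch code: the `Rule` type, its `cont` field (standing in for the proposed <CONT> tag), and the `normalize` function are all hypothetical names, and the two regular expressions are assumptions that only cover the example URL from the message.

```python
import re
from typing import List, NamedTuple


class Rule(NamedTuple):
    """One rewrite rule (hypothetical; not an actual Nutch structure)."""
    pattern: str      # regular expression to search for
    replacement: str  # substitution text
    cont: bool        # stands in for the proposed <CONT> marker


def normalize(url: str, rules: List[Rule]) -> str:
    """Apply rules in order; stop after the first matching rule without cont."""
    for rule in rules:
        new_url, count = re.subn(rule.pattern, rule.replacement, url)
        if count == 0:
            continue  # rule did not match; try the next one
        url = new_url
        if not rule.cont:
            break  # no continue marker: this rule is final
    return url


# Two small rules instead of one long one. The session rule is deliberately
# simplistic: it assumes sessionID is the only (or last) query parameter.
RULES = [
    Rule(r"[?&]sessionID=[^&]*", "", cont=True),
    Rule(r"^(https?://)www\d+\.", r"\1www.", cont=False),
]

if __name__ == "__main__":
    print(normalize("http://www44.somedom.com/somepage.jsp?sessionID=11111", RULES))
```

Without the marker on the first rule, processing would stop after the session ID is stripped and the "www44" host would be left untouched, which is the case the request is trying to avoid.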
