I second that. Though, as this file is growing and XML is not easy on the
eye, it may make sense to break it into 2-3 files, split by
Nutch-<subsystem>.


Also, it would be nice (actually, it's more of a request) if each rule
could end with a "Continue Marker", say a <CONT> tag.

It's much easier to write rules when you can break them out instead of
writing one long rule to do everything.   

Example: Say you want to remove session IDs from URLs and fix the host name.

Old: http://www44.somedom.com/somepage.jsp?sessionID=11111
New: http://www.somedom.com/somepage.jsp

You can now write two separate rules, one for the sessions and one for the
"www44".
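
To make the request concrete, here is a rough sketch of the behaviour I have
in mind. It is plain Python, not Nutch code: the rule list, the `normalize`
function and the boolean "continue" flag are all made up for illustration,
and I am assuming that a rule without the marker stops processing once it
matches.

```python
import re

# Hypothetical rule table: (pattern, replacement, continue_marker).
# continue_marker=True plays the role of the proposed <CONT> tag.
RULES = [
    # Rule 1: strip a sessionID query parameter, then keep going.
    (re.compile(r"[?&]sessionID=[^&#]*"), "", True),
    # Rule 2: collapse numbered hosts such as "www44." to "www.", then stop.
    (re.compile(r"^(https?://)www\d+\."), r"\1www.", False),
]


def normalize(url, rules=RULES):
    """Apply rules in order; after a matching rule, continue only if it is marked."""
    for pattern, replacement, cont in rules:
        new_url, count = pattern.subn(replacement, url)
        if count:
            url = new_url
            if not cont:
                break  # assumed default: first matching rule without the marker wins
    return url


if __name__ == "__main__":
    print(normalize("http://www44.somedom.com/somepage.jsp?sessionID=11111"))
```

The session rule here is deliberately naive (it only handles the parameter
as it appears in the example above); the point is just that each rule stays
short and the marker decides whether the next one runs.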

Again, this is just a request -- but if you guys think it's not going to be
widely used, feel free to ignore it.



 

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Doug
Cutting
Sent: Friday, February 04, 2005 3:49 PM
To: [EMAIL PROTECTED]
Subject: Re: [Nutch-dev] Re: patch available now Re: make URLFilter as
plugin

Stefan Groschupf wrote:
>>>
>>> I need input from people on the following:
>>> (1) does it make sense to have regex-urlfilter.txt as an attribute 
>>> to plugin RegexURLFilter, instead of a property in 
>>> nutch-default.xml? The same with prefix-urlfilter.txt?
> 
> 
> 
> I personally would say that makes sense. In general, I think configuration
> values of a plugin should be in the plugin.xml, since nutch-config is for
> Nutch and not for plugins.

I disagree.  I think plugins should be configured by the same mechanism as
everything else.  That way nearly all of a site's customizations are in a
single file, nutch-site.xml, and, when they update to a new version of Nutch
they won't lose their customizations.  In general, folks shouldn't change
files that are in CVS, and the plugin.xml is in CVS, authored by developers,
not by users.

Doug


-------------------------------------------------------
This SF.Net email is sponsored by: IntelliVIEW -- Interactive Reporting
Tool for open source databases. Create drag-&-drop reports. Save time
by over 75%! Publish reports on the web. Export to DOC, XLS, RTF, etc.
Download a FREE copy at http://www.intelliview.com/go/osdn_nl
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers



