On Sun, Jan 30, 2005 at 05:57:46PM +0100, Andrzej Bialecki wrote:
> John X wrote:
> >Hi, All,
> >
> >Attached is a patch for segslice to filter entries by url pattern.
> >If no objection, I will commit tomorrow.
> 
> I couldn't object, because I was away for the weekend... You should give 
> somewhat more time than 1 day if it falls on the weekend...
> 
> Anyway, I think that the functionality is very useful, but why is there 
> only an option to use a single pattern? I'd think that using multiple 
> patterns (with implicit OR) would be much easier, e.g. to filter out 
> multiple sites you could specify one pattern per site, and run the tool 
> only once...
> 
> I solved this in PruneIndexTool by reading the queries/patterns from a 
> file (either specified on the command line or at the default location 
> specified in the nutch config).

Yours is a better model.

Since such a filtering capability is needed by various tools, I guess
we should standardize on it, probably through a shared class?
If we can agree on one common style, that would be best.
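
To make the idea concrete, here is a rough sketch of what such a shared
class might look like. Everything in it is an assumption for discussion:
the name UrlPatternFilter, the one-regex-per-line file format, the '#'
comment convention, and the choice of substring matching are hypothetical
and are not taken from segslice or PruneIndexTool.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical shared helper; not an existing Nutch class.
class UrlPatternFilter {
    private final List<Pattern> patterns = new ArrayList<>();

    // Compile each regular expression once, up front.
    UrlPatternFilter(List<String> regexes) {
        for (String regex : regexes) {
            patterns.add(Pattern.compile(regex));
        }
    }

    // Assumed file format: one regex per line; blank lines and
    // lines starting with '#' are skipped.
    static UrlPatternFilter load(Reader in) throws IOException {
        List<String> regexes = new ArrayList<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            regexes.add(line);
        }
        return new UrlPatternFilter(regexes);
    }

    // Implicit OR: true if any pattern is found somewhere in the URL.
    // A filter with no patterns matches nothing.
    boolean matches(String url) {
        for (Pattern p : patterns) {
            if (p.matcher(url).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        String config = "# one pattern per site\n"
                + "^http://example\\.com/\n"
                + "^http://example\\.org/\n";
        UrlPatternFilter filter = UrlPatternFilter.load(new StringReader(config));
        System.out.println(filter.matches("http://example.org/page.html"));
        System.out.println(filter.matches("http://elsewhere.net/"));
    }
}
```

Each tool would then only need to decide where the pattern file comes
from (a command-line option or a config default) and call matches() per
entry, so several sites can be filtered in a single run.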

What do you and everyone else think?

John


_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers