On Sun, Jan 30, 2005 at 05:57:46PM +0100, Andrzej Bialecki wrote:
> John X wrote:
> >Hi, All,
> >
> >Attached is a patch for segslice to filter entries by url pattern.
> >If no objection, I will commit tomorrow.
>
> I couldn't object, because I was away for the weekend... You should give
> somewhat more time than 1 day if it falls on the weekend...
>
> Anyway, I think that the functionality is very useful, but why there is
> only an option to use a single pattern? I'd think that using multiple
> patterns (with implicit OR) would be much easier, e.g. to filter out
> multiple sites you could specify one pattern per site, and run the tool
> only once...
>
> I solved this in PruneIndexTool by reading the queries/patterns from a
> file (either specified on the command-line or the default location
> specified in the nutch config.

Yours is a better model. Since such a filtering capability is needed by
various tools, I guess we should standardize on it, probably through
a class? If we can agree on one common style, that will be the best.
What do you and everyone else think?

John

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
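
To make the proposal concrete, here is a rough sketch of what a shared filtering class might look like, following the model described in the quoted message: patterns are read from a file, one per line, and a URL is accepted if any of them matches (implicit OR). The class name UrlPatternFilter, its methods, and the one-regex-per-line format with '#' comments are all assumptions made for illustration; none of this is taken from segslice, PruneIndexTool, or the Nutch code base.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical shared filter: a URL is accepted if ANY pattern matches (implicit OR). */
final class UrlPatternFilter {
    private final List<Pattern> patterns;

    private UrlPatternFilter(List<Pattern> patterns) {
        this.patterns = patterns;
    }

    /**
     * Reads one regular expression per line (assumed file format).
     * Blank lines and lines starting with '#' are skipped.
     */
    static UrlPatternFilter load(Reader source) throws IOException {
        List<Pattern> patterns = new ArrayList<>();
        BufferedReader in = new BufferedReader(source);
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            patterns.add(Pattern.compile(line));
        }
        return new UrlPatternFilter(patterns);
    }

    /** Returns true as soon as one pattern is found in the URL. */
    boolean matches(String url) {
        for (Pattern p : patterns) {
            if (p.matcher(url).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // In a real tool the Reader would wrap a file named on the command
        // line or in the configuration; a string keeps the sketch runnable.
        String rules = "# one pattern per site\n"
                + "^http://www\\.example\\.com/\n"
                + "^http://www\\.example\\.org/\n";
        UrlPatternFilter filter = UrlPatternFilter.load(new StringReader(rules));
        System.out.println(filter.matches("http://www.example.com/index.html"));
        System.out.println(filter.matches("http://www.example.net/"));
    }
}
```

With one pattern per site in the file, a single run of a tool could filter several sites at once, and each tool would only need to call load() and matches() rather than carry its own option parsing.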
