Bryan Woliner wrote:

1. I want to prune the URL "http://www.testsite.com/testdir/", but I
don't want to prune any other files in the /testdir/ directory.

2. I want to prune URLs in the range: http://www.testsite.com/[20-40]/

I think you are just unlucky, in the sense that the PruneIndexTool was created with a different goal in mind: namely, to remove offensive or unwanted content containing certain query terms. Because of the way URLs are tokenized, it is indeed quite difficult to construct queries that match specific groups of URLs.

I would suggest the following:

* use a query "url:http url:https", which is a handy trick to retrieve all URLs (if you use other protocols, then add them here).

* implement a PruneChecker, which checks URLs according to a list of regexps.

This should do it. You can lift some code from the urlfilter-regex plugin, such as reading the regexes, checking them, etc.
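
To illustrate the regexp-checking part, here is a minimal sketch in plain Java. It is not taken from Nutch: the class name RegexUrlMatcher and its methods are hypothetical, and the rule format (one regex per line, '#' for comments) is an assumption rather than the exact urlfilter-regex file format. A PruneChecker implementation would presumably read each matched document's URL from the index and pass it to matches(); that wiring is left out because it depends on the Nutch version in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: decides whether a URL matches any regex in a rule list. */
class RegexUrlMatcher {
    private final List<Pattern> patterns = new ArrayList<>();

    /** Builds a matcher from rule lines; blank lines and '#' comment lines are skipped. */
    static RegexUrlMatcher fromLines(List<String> lines) {
        RegexUrlMatcher matcher = new RegexUrlMatcher();
        for (String line : lines) {
            String rule = line.trim();
            if (rule.isEmpty() || rule.startsWith("#")) {
                continue;
            }
            // Pattern.compile throws PatternSyntaxException on a malformed regex.
            matcher.patterns.add(Pattern.compile(rule));
        }
        return matcher;
    }

    /** Returns true if any rule matches somewhere in the URL; anchor with ^ and $ as needed. */
    boolean matches(String url) {
        for (Pattern p : patterns) {
            if (p.matcher(url).find()) {
                return true;
            }
        }
        return false;
    }
}
```

For the two cases in the question, the rules might look like this, assuming that "[20-40]" means the numbered directories 20 through 40 and that everything below them should be pruned:

    # only the directory URL itself, not the files in it
    ^http://www\.testsite\.com/testdir/$
    # anything under /20/ ... /40/
    ^http://www\.testsite\.com/(2[0-9]|3[0-9]|40)/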

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

