Bryan Woliner wrote:
> 1. I want to prune the URL "http://www.testsite.com/testdir/", but I
> don't want to prune any other files in the /testdir/ directory.
> 2. I want to prune URLs in the range: http://www.testsite.com/[20-40]/

I think you are just unlucky, in the sense that the PruneIndexTool was
created with a different goal in mind - namely, to remove offensive or
unwanted content containing certain query terms. Due to the way URLs are
tokenized, it is indeed quite difficult to construct queries that match
specific groups of URLs.
I would suggest the following:
* use a query "url:http url:https", which is a handy trick to retrieve
all URLs (if you use other protocols, then add them here).
* implement a PruneChecker, which checks URLs according to a list of
regexps.
This should do it. You can lift some code from the urlfilter-regex plugin,
such as the parts that read the regexes and check URLs against them.
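
To make the second step more concrete, here is a rough sketch of just the
regexp-matching part, in plain Java with no Nutch dependencies. The class
name UrlRegexPruneRules and the method shouldPrune are made up for
illustration. Wiring this into an actual PruneChecker (and pulling the URL
out of each matching document) is left out, since that depends on the
interface as it appears in your Nutch version. The second pattern in the
usage note assumes that everything under the /20/ to /40/ directories should
go; adjust it if only the directory URLs themselves are meant.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Nutch): decides whether a URL should be
 * pruned, based on a list of regular expressions.
 */
final class UrlRegexPruneRules {
    private final List<Pattern> patterns = new ArrayList<>();

    /** Compiles each non-blank, non-comment line as one regexp. */
    UrlRegexPruneRules(List<String> regexes) {
        for (String regex : regexes) {
            String trimmed = regex.trim();
            // Skip blank lines and '#' comment lines.
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            patterns.add(Pattern.compile(trimmed));
        }
    }

    /** Returns true if any regexp is found in the URL. */
    boolean shouldPrune(String url) {
        if (url == null) {
            return false;
        }
        for (Pattern p : patterns) {
            // find() matches anywhere in the URL, so anchor with ^ and $
            // in the regexp itself where an exact match is wanted.
            if (p.matcher(url).find()) {
                return true;
            }
        }
        return false;
    }
}
```

For the two cases in the question, the rules could look something like this:
"^http://www\\.testsite\\.com/testdir/$" matches the directory URL but not
the files below it, and "^http://www\\.testsite\\.com/(?:[23][0-9]|40)/"
matches anything under the directories 20 through 40.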
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com