Mauro Tortonesi wrote:
> no. i was talking about regexps. they are more expressive
> and powerful than simple globs. i don't see what's the
> point in supporting both.

The problem is that users who are expecting globs will try things like --filter=-file:*.pdf rather than --filter:-file:.*\.pdf. In many cases their expressions will simply work, which will result in significant confusion when some expression doesn't work, such as --filter:-domain:www-*.yoyodyne.com. :-)

It is pretty easy to programmatically convert a glob into a regular expression. One possibility is to make glob the default input and allow regular expressions. For example, the following could be equivalent:

--filter:-domain:www-*.yoyodyne.com
--filter:-domain,r:www-.*\.yoyodyne\.com

Internally, wget would convert the first into the second and then treat it as a regular expression. For the vast majority of cases, glob will work just fine.

One might argue that it's a lot of work to implement regular expressions if the default input format is a glob, but I think we should aim for both lack of confusion and robust functionality. Using ",r" means people get regular expressions when they want them and know what they're doing. The universe of wget users who "know what they're doing" are mostly subscribed to this mailing list; the rest of them send us mail saying "please CC me as I'm not on the list". :-)

If we go this route, I'm wondering if the appropriate conversion from glob to regular expression should take directory separators into account, such as:

--filter:-path:path/to/*

becoming the same as:

--filter:-path,r:path/to/[^/]*

or even:

--filter:-path,r:path[/\\]to[/\\][^/\\]*

Should the glob match "path/to/sub/dir"? (I suspect it shouldn't.)

Tony
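
For illustration, here is a minimal sketch of the glob-to-regex conversion discussed above, written in Python rather than wget's C. The function name glob_to_regex and the separator_aware flag are hypothetical and not part of wget. The sketch assumes only "*" and "?" act as wildcards and that every other character is matched literally.

```python
import re

# Characters that need a backslash to be literal in a regular expression.
REGEX_SPECIAL = set(".^$+(){}[]|\\")

def glob_to_regex(glob, separator_aware=False):
    """Translate a simple glob into a regular expression string.

    Hypothetical helper: only '*' and '?' are wildcards.
    With separator_aware=True, wildcards do not cross '/' or '\\',
    and either separator in the glob matches either separator.
    """
    any_char = r"[^/\\]" if separator_aware else "."
    parts = []
    for ch in glob:
        if ch == "*":
            parts.append(any_char + "*")
        elif ch == "?":
            parts.append(any_char)
        elif separator_aware and ch in "/\\":
            parts.append(r"[/\\]")
        elif ch in REGEX_SPECIAL:
            parts.append("\\" + ch)
        else:
            parts.append(ch)
    return "".join(parts)

print(glob_to_regex("www-*.yoyodyne.com"))
# prints: www-.*\.yoyodyne\.com
print(glob_to_regex("path/to/*", separator_aware=True))
# prints: path[/\\]to[/\\][^/\\]*
```

If the resulting expression is required to match the whole string (an assumption here, for example via re.fullmatch), the separator-aware form matches "path/to/file" but not "path/to/sub/dir", which is the behaviour suspected to be correct above.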