I agree with Tony. I think most basic users, me included, thought
www-*.yoyodyne.com would not match www.yoyodyne.com.

Support globs as default, regexp as the more powerful option.

Ranjit Sandhu
SRA
 

-----Original Message-----
From: Tony Lewis [mailto:[EMAIL PROTECTED] 
Sent: Friday, March 31, 2006 10:03 AM
To: [email protected]
Subject: RE: regex support RFC

Mauro Tortonesi wrote: 

> no. i was talking about regexps. they are more expressive and powerful
> than simple globs. i don't see what's the point in supporting both.

The problem is that users who are expecting globs will try things like
--filter=-file:*.pdf rather than --filter:-file:.*\.pdf. In many cases
their expressions will simply work, which will result in significant
confusion when some expression doesn't work, such as
--filter:-domain:www-*.yoyodyne.com. :-)

It is pretty easy to programmatically convert a glob into a regular
expression. One possibility is to make glob the default input and allow
regular expressions. For example, the following could be equivalent:

--filter:-domain:www-*.yoyodyne.com
--filter:-domain,r:www-.*\.yoyodyne\.com

Internally, wget would convert the first into the second and then treat
it as a regular expression. For the vast majority of cases, glob will
work just fine.
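
To illustrate the conversion described above, here is a minimal sketch in
Python (not wget code). The names glob_to_regex and glob_matches are
hypothetical, and the sketch assumes that "*" and "?" are the only glob
wildcards; bracket expressions such as [a-z] are not handled.

```python
import re

# Regex metacharacters that a glob treats as plain literals.
# Assumption: only "*" and "?" act as wildcards in the glob syntax.
_REGEX_META = set(".^$+{}[]|()\\")

def glob_to_regex(glob):
    """Translate a simple glob into an equivalent regular expression."""
    parts = []
    for ch in glob:
        if ch == "*":
            parts.append(".*")        # any run of characters
        elif ch == "?":
            parts.append(".")         # exactly one character
        elif ch in _REGEX_META:
            parts.append("\\" + ch)   # escape so it matches literally
        else:
            parts.append(ch)
    return "".join(parts)

def glob_matches(glob, text):
    """Return True if the whole of text matches the glob."""
    return re.fullmatch(glob_to_regex(glob), text) is not None

print(glob_to_regex("www-*.yoyodyne.com"))
print(glob_matches("www-*.yoyodyne.com", "www.yoyodyne.com"))
```

With this sketch, the glob www-*.yoyodyne.com converts to
www-.*\.yoyodyne\.com and no longer matches www.yoyodyne.com, whereas the
same string read directly as a regular expression does match it, because
"-*" means "zero or more hyphens".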

One might argue that it's a lot of work to implement regular expressions
if the default input format is a glob, but I think we should aim for
both lack of confusion and robust functionality. Using ",r" means people
get regular expressions when they want them and know what they're doing.
The universe of wget users who "know what they're doing" is mostly
subscribed to this mailing list; the rest of them send us mail saying
"please CC me as I'm not on the list". :-)

If we go this route, I'm wondering if the appropriate conversion from
glob to regular expression should take directory separators into
account, such as:

--filter:-path:path/to/*

becoming the same as:

--filter:-path,r:path/to/[^/]*

or even:

--filter:-path,r:path[/\\]to[/\\][^/\\]*

Should the glob match "path/to/sub/dir"? (I suspect it shouldn't.)
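
A separator-aware variant of the conversion could look like the sketch
below (Python, not wget code). The function name glob_to_path_regex and
its separators parameter are hypothetical. It assumes "*" and "?" never
cross a directory separator, and that the separator characters are "/"
and optionally "\".

```python
import re

# Regex metacharacters that a glob treats as plain literals.
_REGEX_META = set(".^$+{}[]|()\\")

def glob_to_path_regex(glob, separators="/"):
    """Translate a path glob so wildcards stop at directory separators."""
    # Body of a character class listing the separators; a backslash
    # must be doubled to stay literal inside the class.
    sep_class = "".join("\\\\" if s == "\\" else s for s in separators)
    # A single separator is emitted as-is; several become a class.
    any_sep = sep_class if len(separators) == 1 else "[" + sep_class + "]"
    parts = []
    for ch in glob:
        if ch == "*":
            parts.append("[^" + sep_class + "]*")  # stay in one segment
        elif ch == "?":
            parts.append("[^" + sep_class + "]")   # one non-separator
        elif ch in separators:
            parts.append(any_sep)                  # any accepted separator
        elif ch in _REGEX_META:
            parts.append("\\" + ch)                # literal metacharacter
        else:
            parts.append(ch)
    return "".join(parts)

print(glob_to_path_regex("path/to/*"))
print(glob_to_path_regex("path/to/*", "/\\"))
```

Under these assumptions the first call yields path/to/[^/]* and the
second yields path[/\\]to[/\\][^/\\]*, and neither result matches
"path/to/sub/dir" when the whole string must match.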

Tony

