[Imagination running freely. I do not have much experience designing
syntax, but I suffer a lot, helpdesk-style, trying to explain syntax to
users. Hopefully this is somehow useful.]

> we also have to reach consensus on the filtering algorithm. for
> instance, should we simply require that a url passes all the filtering
> rules to allow its download (just like the current -A/R behaviour), or
> should we instead adopt a short circuit algorithm that applies all rules
> in the same order in which they were given in the command line and
> immediately allows the download of an url if it passes the first "allow"
> match? should we also support apache-like deny-from-all and
> allow-from-all policies? and what would be the best syntax to trigger
> the usage of these policies?

Get the best of both: use a syntax permitting a "first match exits" ACL,
where a single ACE permits several statements ANDed together. Cooking up a
simple syntax for users without much regexp experience won't be easy.

One way (probably not the most beautiful syntax) could be a running number:
repeated filters with the same number are ANDed together, while FIRST MATCH
applies between numbers:

download every path containing download on every *.dom.com (including
OTHERWISE avoid anything (else) on sweets.dom.com;
OTHERWISE from peanuts.dom.com get everything except brown stuff (currants
and so on);
OTHERWISE get peanuts from everywhere else

--filter1+=+domain:.+\.dom\.com --filter1=+path:download &&
--filter2-=+domain:sweets\.dom\.com &&
--filter3+=+peanuts\.dom\.com --filter3=-file:brown &&
--filter4+=+path:peanuts &&

(&& omitted later on)
The first filterX (for every X) carries a +/- before the = (permit/deny
ACE); every filterX carries a + or - after the = (whether the pattern must
match or must not match).

Well, I wrote the example and I hate it already; hopefully some better
syntax comes up which doesn't require nested quotes.

Alternatively, require an additional permit/deny switch for every ACE:
--filter1=permit --filter1=+domain:.+\.dom\.com --filter1=+path:download
--filter2=deny   --filter2=+domain:sweets\.dom\.com
--filter3=permit --filter3=+peanuts\.dom\.com --filter3=-file:brown
--filter4=permit --filter4=+path:peanuts
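
To make the intended semantics concrete, here is a rough Python sketch of how such an ACL might be evaluated: the first matching ACE decides, and the statements inside one ACE are ANDed. Nothing here is wget code. The names (decide, url_fields, ace_matches), the meaning given to domain/path/file (host name, URL path, last path segment), the unanchored regexp search, reading the bare peanuts\.dom\.com statement as a domain match, and "deny" as the fallback when no ACE matches are all assumptions of the sketch.

```python
import re
from urllib.parse import urlsplit

# One ACE = (action, [(sign, field, pattern), ...]); statements are ANDed.
# This mirrors the filter1..filter4 example; the bare peanuts\.dom\.com
# statement is assumed to be a domain match.
ACL = [
    ("permit", [("+", "domain", r".+\.dom\.com"), ("+", "path", r"download")]),
    ("deny",   [("+", "domain", r"sweets\.dom\.com")]),
    ("permit", [("+", "domain", r"peanuts\.dom\.com"), ("-", "file", r"brown")]),
    ("permit", [("+", "path", r"peanuts")]),
]


def url_fields(url):
    """Split a URL into the fields the filters refer to (assumed meaning)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return {
        "domain": parts.hostname or "",
        "path": path,
        "file": path.rsplit("/", 1)[-1],  # last path segment
    }


def ace_matches(statements, fields):
    """An ACE matches only if every statement holds (logical AND)."""
    for sign, field, pattern in statements:
        found = re.search(pattern, fields[field]) is not None
        # "+" requires a match, "-" requires the absence of a match.
        if found != (sign == "+"):
            return False
    return True


def decide(url, acl=ACL, default="deny"):
    """Return the action of the first matching ACE, else the default."""
    fields = url_fields(url)
    for action, statements in acl:
        if ace_matches(statements, fields):
            return action
    return default
```

With these assumptions, decide("http://sweets.dom.com/download/x") is permitted by the first ACE before the deny for sweets.dom.com is ever consulted, which is the whole point of first-match ordering. The default argument is where a deny-from-all or allow-from-all policy would plug in.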

With permit and + as defaults, that would become:
--filter1=domain:.+\.dom\.com --filter1=path:download
--filter2=deny   --filter2=domain:sweets\.dom\.com
--filter3=peanuts\.dom\.com --filter3=-file:brown

On the other hand, without the default=permit we could lose the numbers
(and use position instead):
--filter=permit --filter=+domain:.+\.dom\.com --filter=+path:download
--filter=deny   --filter=+domain:sweets\.dom\.com
--filter=permit --filter=+peanuts\.dom\.com --filter=-file:brown
--filter=permit --filter=+path:peanuts

i.e. start with permit or deny (or default to permit for the first ACE only),
following statements are ANDed together as a single ACE until next
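
A minimal sketch of how the positional form might be grouped into ACEs, assuming the values of the repeated --filter options arrive as a plain list in command-line order. The function name group_filters, the set of known field names, and the use of None for a statement without a "field:" prefix are all hypothetical.

```python
KNOWN_FIELDS = ("domain", "path", "file")  # assumed field names


def group_filters(values):
    """Group --filter values into ACEs: (action, [(sign, field, pattern)])."""
    aces = []
    for value in values:
        if value in ("permit", "deny"):
            # A permit/deny keyword opens a new ACE.
            aces.append((value, []))
            continue
        if not aces:
            # Assumption: the first ACE defaults to permit if no keyword given.
            aces.append(("permit", []))
        sign = "+"  # assumed default when no sign is written
        if value[:1] in ("+", "-"):
            sign, value = value[0], value[1:]
        field, sep, pattern = value.partition(":")
        if not sep or field not in KNOWN_FIELDS:
            # No recognised "field:" prefix, keep the whole value as pattern.
            field, pattern = None, value
        aces[-1][1].append((sign, field, pattern))
    return aces


example = [
    "permit", r"+domain:.+\.dom\.com", "+path:download",
    "deny", r"+domain:sweets\.dom\.com",
    "permit", r"+peanuts\.dom\.com", "-file:brown",
    "permit", "+path:peanuts",
]
aces = group_filters(example)
```

Here aces holds four entries, one per permit/deny keyword, each carrying the statements that followed it.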

Considering command line restrictions and so on, for complicated expressions
there should also be a --filter-file=filename, with the same syntax minus the
--filter prefix?
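
As a sketch of what such a file could look like in practice: one statement per line, turned back into the equivalent --filter options. The one-statement-per-line layout, the "#" comment lines, and the function name filter_file_to_args are assumptions, nothing of this exists in wget.

```python
def filter_file_to_args(text):
    """Turn the contents of a hypothetical filter file into --filter options."""
    args = []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and lines starting with "#" are ignored.
        if not line or line.startswith("#"):
            continue
        args.append("--filter=" + line)
    return args


sample = r"""
# everything with "download" on *.dom.com
permit
+domain:.+\.dom\.com
+path:download

deny
+domain:sweets\.dom\.com
"""
args = filter_file_to_args(sample)
```

A side benefit of a file is that the regexps no longer pass through the shell, so no shell quoting is needed.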

I realize much of this syntax can be thrown out of the window simply
considering we can probably reach the same effect with uri filters and more
complicated regexp (perl5 syntax):

Simpler and shorter invocation syntax, but a more complicated regexp
requirement: not a simple thing for the casual user. After all, wget doesn't
try to appeal to programmers only; many examples in the manual will be
BTW, any comments about the dots? Requiring escaped dots in domains would
get old really fast; reversing the behaviour (\. = any char) would be against
the principle of least surprise, since any other regexp syntax does use the
Either way pure windows users will be confused (*.html instead of .*\.html),
but personally I don't think permitting yet another alternate syntax (using
globs) is justified, and a syntax using exclusively globs would be too

-- PREVINET S.p.A. www.previnet.it
-- +39-041-5907073 / +39-041-5917073 ph
-- +39-041-5907472 / +39-041-5917472 fax
