Curtis Hatter wrote:
On Friday 31 March 2006 06:52, Mauro Tortonesi:
while i like the idea of supporting modifiers like "quick" (short
circuit) and maybe "i" (case insensitive comparison), i think that (?i:)
and (?-i:) constructs would be overkill and rather hard to implement.
I figured that
Hrvoje Niksic wrote:
"Tony Lewis" <[EMAIL PROTECTED]> writes:
I don't think ",r" complicates the command that much. Internally,
the only additional work for supporting both globs and regular
expressions is a function that converts a glob into a regexp when
",r" is not requested. That's a strai
Tony Lewis wrote:
Hrvoje Niksic wrote:
I don't see a clear line that connects --filter to glob patterns as used
by the shell.
I want to list all PDFs in the shell, ls -l *.pdf
I want a filter to keep all PDFs, --filter=+file:*.pdf
you don't need --filter for that. you can simply use -A.
31, 2006 10:03 AM
To: wget@sunsite.dk
Subject: RE: regex support RFC
Mauro Tortonesi wrote:
> no. i was talking about regexps. they are more expressive and powerful
> than simple globs. i don't see what's the point in supporting both.
The problem is that users who are expecting globs will try things like
--filter=-file:*.pdf rather than --filter:-file:.*\.pdf.
* Mauro Tortonesi <[EMAIL PROTECTED]> wrote:
>> I'm hoping for ... a "raw" type in addition to "file",
>> "domain", etc.
>
> do you mean you would like to have a regex class working on the
> content of downloaded files as well?
Not exactly. (details below)
> i don't like your "raw" proposal as
> * [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > wget -e robots=off -r -N -k -E -p -H http://www.gnu.org/software/wget/
> >
> > soon leads to non wget related links being downloaded, eg.
> > http://www.gnu.org/graphics/agnuhead.html
>
> In that particular case, I think --no-parent would solve the problem.
On Friday 31 March 2006 06:52, Mauro Tortonesi:
> while i like the idea of supporting modifiers like "quick" (short
> circuit) and maybe "i" (case insensitive comparison), i think that (?i:)
> and (?-i:) constructs would be overkill and rather hard to implement.
I figured that the (?i:) and (?-i:)
Hrvoje Niksic wrote:
> I don't see a clear line that connects --filter to glob patterns as used
> by the shell.
I want to list all PDFs in the shell, ls -l *.pdf
I want a filter to keep all PDFs, --filter=+file:*.pdf
Note that "*.pdf" is not a valid regular expression even though it's what
most
"Tony Lewis" <[EMAIL PROTECTED]> writes:
> I didn't miss the point at all. I'm trying to make a completely different
> one, which is that regular expressions will confuse most users (even if you
> tell them that the argument to --filter is a regular expression).
Well, "most users" will probably n
Hrvoje Niksic wrote:
> But that misses the point, which is that we *want* to make the
> more expressive language, already used elsewhere on Unix, the
> default.
I didn't miss the point at all. I'm trying to make a completely different
one, which is that regular expressions will confuse most users (even if you
tell them that the argument to --filter is a regular expression).
"Tony Lewis" <[EMAIL PROTECTED]> writes:
> Mauro Tortonesi wrote:
>
>> no. i was talking about regexps. they are more expressive
>> and powerful than simple globs. i don't see what's the
>> point in supporting both.
>
> The problem is that users who are expecting globs will try things like
> --fi
Mauro Tortonesi wrote:
> no. i was talking about regexps. they are more expressive
> and powerful than simple globs. i don't see what's the
> point in supporting both.
The problem is that users who are expecting globs will try things like
--filter=-file:*.pdf rather than --filter:-file:.*\.pdf.
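As a sketch of the confusion Tony describes: fed to a regexp engine, "*.pdf" is not even a valid pattern, while the regexp form behaves as intended (Python used only for illustration):

```python
import re

# The glob "*.pdf" is an invalid regexp: the leading "*" has nothing to repeat.
try:
    re.compile("*.pdf")
except re.error as err:
    print("invalid regexp:", err)

# What the glob user actually meant, written as a regexp:
pat = re.compile(r".*\.pdf$")
print(bool(pat.match("manual.pdf")))   # True
print(bool(pat.match("manual.pdfs")))  # False
```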
Mauro Tortonesi wrote:
for consistency and to avoid maintenance problems, i would like wget
to have the same behavior on windows and unix. please, notice that if
we implemented regex support only on unix, windows binaries of wget
built with cygwin would have regex support but native binaries
would not.
Hrvoje Niksic wrote:
Wincent Colaiuta <[EMAIL PROTECTED]> writes:
Are you sure that "www-*" matches "www"?
Yes.
hrvoje is right. try this perl script:
#!/usr/bin/perl -w
use strict;
my @strings = ("www-.yoyodyne.com",
               "www.yoyodyne.com");
foreach my $str (@strings) {
    print "$str matches\n" if $str =~ /^www-*\.yoyodyne\.com$/;
}
(both strings match, since "-*" also matches zero hyphens)
Wincent Colaiuta <[EMAIL PROTECTED]> writes:
> Are you sure that "www-*" matches "www"?
Yes.
> As far as I know "www-*" matches "one w, another w, a third w, a
> hyphen, then 0 or more hyphens".
That would be "www--*" or "www-+".
On 31/03/2006, at 14:37, Hrvoje Niksic wrote:
"*" matches the previous character repeated 0 or more times. This is
in contrast to wildcards, where "*" alone matches any character 0 or
more times. (This is part of why regexps are often confusing to
people used to the much simpler wildcard
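Hrvoje's distinction, sketched in Python for illustration:

```python
import re
from fnmatch import fnmatch

# Regexp: "*" repeats the previous atom, so "www-*" is "www" plus
# zero or more hyphens -- it matches plain "www".
assert re.fullmatch("www-*", "www")
assert re.fullmatch("www-*", "www---")

# Wildcard: "*" alone matches any run of characters, so the hyphen
# in "www-*" is required literally.
assert not fnmatch("www", "www-*")
assert fnmatch("www-mirror", "www-*")
```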
Hrvoje Niksic wrote:
Herold Heiko <[EMAIL PROTECTED]> writes:
Get the best of both, use a syntax permitting a "first match-exits"
ACL, single ACE permits several statements ANDed together. Cooking
up a simple syntax for users without much regexp experience won't be
easy.
I assume ACL stands for "access control list".
Hrvoje Niksic wrote:
Mauro Tortonesi <[EMAIL PROTECTED]> writes:
wget -r --filter=-domain:www-*.yoyodyne.com
This appears to match "www.yoyodyne.com", "www--.yoyodyne.com",
"www---.yoyodyne.com", and so on, if interpreted as a regex.
not really. it would not match www.yoyodyne.com.
Wh
Mauro Tortonesi <[EMAIL PROTECTED]> writes:
>wget -r --filter=-domain:www-*.yoyodyne.com
This appears to match "www.yoyodyne.com", "www--.yoyodyne.com",
"www---.yoyodyne.com", and so on, if interpreted as a regex.
>>>
>>> not really. it would not match www.yoyodyne.com.
>> Why
> From: Oliver Schulze L. [mailto:[EMAIL PROTECTED]
> My personal idea on this is to: enable regex in Unix and
> disable it on
> Windows.
>
> We all use Unix/Linux and regex is really useful. I think not having
We all use Unix/Linux? You would be surprised how many wget users on
Windows are
Curtis Hatter wrote:
On Thursday 30 March 2006 13:42, Tony Lewis wrote:
Perhaps --filter=path,i:/path/to/krs would work.
That would look to be the most elegant method. I do hope that the (?i:) and
(?-i:) constructs are supported since I may not want the entire path/file to
be case (in)?sensitive =), but that will
Oliver Schulze L. wrote:
Hrvoje Niksic wrote:
The regexp API's found on today's Unix systems
might be usable, but unfortunately those are not available on Windows.
My personal idea on this is to enable regex in Unix and disable it on
Windows.
>
We all use Unix/Linux and regex is really useful. I think not having
Hrvoje Niksic wrote:
Mauro Tortonesi <[EMAIL PROTECTED]> writes:
Scott Scriven wrote:
* Mauro Tortonesi <[EMAIL PROTECTED]> wrote:
wget -r --filter=-domain:www-*.yoyodyne.com
This appears to match "www.yoyodyne.com", "www--.yoyodyne.com",
"www---.yoyodyne.com", and so on, if interpreted as a regex.
Mauro Tortonesi <[EMAIL PROTECTED]> writes:
> Scott Scriven wrote:
>> * Mauro Tortonesi <[EMAIL PROTECTED]> wrote:
>>
>>>wget -r --filter=-domain:www-*.yoyodyne.com
>> This appears to match "www.yoyodyne.com", "www--.yoyodyne.com",
>> "www---.yoyodyne.com", and so on, if interpreted as a regex
Scott Scriven wrote:
* Mauro Tortonesi <[EMAIL PROTECTED]> wrote:
wget -r --filter=-domain:www-*.yoyodyne.com
This appears to match "www.yoyodyne.com", "www--.yoyodyne.com",
"www---.yoyodyne.com", and so on, if interpreted as a regex.
not really. it would not match www.yoyodyne.com.
It
* [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> wget -e robots=off -r -N -k -E -p -H http://www.gnu.org/software/wget/
>
> soon leads to non wget related links being downloaded, eg.
> http://www.gnu.org/graphics/agnuhead.html
In that particular case, I think --no-parent would solve the
problem.
Hrvoje Niksic wrote:
The regexp API's found on today's Unix systems
might be usable, but unfortunately those are not available on Windows.
My personal idea on this is to: enable regex in Unix and disable it on
Windows.
We all use Unix/Linux and regex is really useful. I think not having
On Thursday 30 March 2006 13:42, Tony Lewis wrote:
> Perhaps --filter=path,i:/path/to/krs would work.
That would look to be the most elegant method. I do hope that the (?i:) and
(?-i:) constructs are supported since I may not want the entire path/file to
be case (in)?sensitive =), but that will
Curtis Hatter wrote:
> Also any way to add modifiers to the regexs?
Perhaps --filter=path,i:/path/to/krs would work.
Tony
* Jim Wright <[EMAIL PROTECTED]> wrote:
> Suppose you want files from some.dom.com://*/foo/*.png. The
> part I'm thinking of here is "foo as last directory component,
> and png as filename extension." Can the individual rules be
> combined to express this?
Only one rule is needed for that pattern.
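One way such a single combined rule could look, assuming a path rule is matched against the URL path (illustrative Python; the anchoring is the point):

```python
import re

# "foo as the last directory component, png as the filename extension"
# expressed as one rule: [^/] forbids further directory components.
rule = re.compile(r"/foo/[^/]+\.png$")

assert rule.search("/images/foo/logo.png")       # foo is the last directory
assert not rule.search("/foo/bar/logo.png")      # foo is not the last directory
assert not rule.search("/images/foo/logo.jpg")   # wrong extension
```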
* Mauro Tortonesi <[EMAIL PROTECTED]> wrote:
> wget -r --filter=-domain:www-*.yoyodyne.com
This appears to match "www.yoyodyne.com", "www--.yoyodyne.com",
"www---.yoyodyne.com", and so on, if interpreted as a regex.
It would most likely also match "www---zyoyodyneXcom". Perhaps
you want glob
On Thursday 30 March 2006 11:49, you wrote:
> How many keywords do we need to provide maximum flexibility on the
> components of the URI? (I'm thinking we need five.)
>
> Consider http://www.example.com/path/to/script.cgi?foo=bar
>
> --filter=uri:regex could match against any part of the URI
> --filter=domain:regex could match against www.example.com
How many keywords do we need to provide maximum flexibility on the
components of the URI? (I'm thinking we need five.)
Consider http://www.example.com/path/to/script.cgi?foo=bar
--filter=uri:regex could match against any part of the URI
--filter=domain:regex could match against www.example.com
--
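The keyword list breaks off here; assuming the five are uri, domain, path, file, and query (the last three are a guess from the truncated message), the decomposition of the example URL looks like this in Python:

```python
from urllib.parse import urlsplit

u = urlsplit("http://www.example.com/path/to/script.cgi?foo=bar")

print(u.geturl())                    # whole URI            (uri:)
print(u.netloc)                      # www.example.com      (domain:)
print(u.path)                        # /path/to/script.cgi  (path:)
print(u.path.rsplit("/", 1)[-1])     # script.cgi           (file:)
print(u.query)                       # foo=bar              (query:)
```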
On Wednesday 29 March 2006 12:05, you wrote:
> we also have to reach consensus on the filtering algorithm. for
> instance, should we simply require that a url passes all the filtering
> rules to allow its download (just like the current -A/R behaviour), or
> should we instead adopt a short circuit
On Thu, 30 Mar 2006, Mauro Tortonesi wrote:
>
> > I do like the [file|path|domain]: approach. very nice and flexible.
> > (and would be a huge help to one specific need I have!) I suggest also
> > including an "any" option as a shortcut for putting the same pattern in
> > all three options.
> From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
> > I agree. Just how often will there be problems in a single
> wget run due to
> > both some.domain.com and somedomain.com present (famous last
> > words...)
>
> Actually it would have to be somedomain.com -- a "."
> will not match the null string
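Heiko's point about the dot, demonstrated (Python for illustration):

```python
import re

# An unescaped "." matches exactly one character -- never zero:
assert re.fullmatch(r"some.domain\.com", "some.domain.com")     # "." matches "."
assert re.fullmatch(r"some.domain\.com", "someXdomain.com")     # ...or any char
assert not re.fullmatch(r"some.domain\.com", "somedomain.com")  # but not nothing
```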
Herold Heiko <[EMAIL PROTECTED]> writes:
>> From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
>> I don't think such a thing is necessary in practice, though; remember
>> that even if you don't escape the dot, it still matches the (intended)
>> dot, along with other characters. So for quick&dirty usage not
>> escaping dots will "just work".
> From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
> I don't think such a thing is necessary in practice, though; remember
> that even if you don't escape the dot, it still matches the (intended)
> dot, along with other characters. So for quick&dirty usage not
> escaping dots will "just work", and th
Herold Heiko <[EMAIL PROTECTED]> writes:
> Get the best of both, use a syntax permitting a "first match-exits"
> ACL, single ACE permits several statements ANDed together. Cooking
> up a simple syntax for users without much regexp experience won't be
> easy.
I assume ACL stands for "access control list".
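A minimal sketch of the "first match-exits" evaluation Heiko proposes, assuming rules are (sign, regexp) pairs; the names here are illustrative, not wget's:

```python
import re

def accept_url(url, rules, default=True):
    """First-match-exits: the first rule whose regexp matches decides;
    '+' accepts, '-' rejects, and later rules are never consulted."""
    for sign, rx in rules:
        if rx.search(url):
            return sign == '+'
    return default  # no rule matched

rules = [('-', re.compile(r'\.jpg$')),
         ('+', re.compile(r'/software/wget/'))]

# The reject rule fires first and short-circuits the would-be accept:
assert not accept_url("http://example.com/software/wget/shot.jpg", rules)
assert accept_url("http://example.com/software/wget/index.html", rules)
assert accept_url("http://example.com/other.html", rules)  # default applies
```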
Herold Heiko <[EMAIL PROTECTED]> writes:
> BTW any comments about the dots ? Requiring escaped dots in domains would
> become old really fast, reversing behaviour (\. = any char) would be against
> the principle of least surprise, since any other regexp syntax does use the
> opposite.
Modifying t
[Imagination running freely, I do not have a lot of experience designing
syntax, but I suffer a lot in a helpdeskish way trying to explain syntax to
users. Hopefully this can be somehow useful]
> we also have to reach consensus on the filtering algorithm. for
> instance, should we simply require
Jim Wright wrote:
what definition of regexp would you be following?
that's another degree of freedom. hrvoje and i have chosen to integrate
the GNU regex implementation into wget, which supports any one of these
syntaxes:
RE_SYNTAX_EMACS
RE_SYNTAX_AWK
RE_SYNTAX_GNU_AWK
Jim Wright <[EMAIL PROTECTED]> writes:
> what definition of regexp would you be following? or would this be
> making up something new?
It wouldn't be new, Mauro is definitely referring to regexps as
normally understood. The regexp API's found on today's Unix systems
might be usable, but unfortunately those are not available on Windows.
Mauro Tortonesi <[EMAIL PROTECTED]> writes:
> for instance, the syntax for --filter presented above is basically the
> following:
>
> --filter=[+|-][file|path|domain]:REGEXP
I think there should also be "url" for filtering on the entire URL.
People have been asking for that kind of thing a lot ov
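A sketch of parsing that syntax, with Hrvoje's "url" keyword added; the function name and the default sign are assumptions for illustration, not part of the actual proposal:

```python
import re

FILTER_RE = re.compile(r'^([+-]?)(file|path|domain|url):(.+)$')

def parse_filter(arg):
    """Split '[+|-]KIND:REGEXP' into (sign, kind, compiled regexp).
    An omitted sign is treated as '+' (an assumption made here)."""
    m = FILTER_RE.match(arg)
    if not m:
        raise ValueError("bad --filter argument: %r" % arg)
    sign, kind, regexp = m.groups()
    return (sign or '+', kind, re.compile(regexp))

sign, kind, rx = parse_filter(r"-domain:www-*\.yoyodyne\.com")
assert (sign, kind) == ('-', 'domain')
assert rx.search("www.yoyodyne.com")  # "-*" also matches zero hyphens
```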
> for instance, the syntax for --filter presented above is basically the
> following:
>
> --filter=[+|-][file|path|domain]:REGEXP
I think a file 'contents' regexp search facility would be a useful
addition here. eg.
--filter=[+|-][file|path|domain|contents]:REGEXP
The idea is that if the fi
what definition of regexp would you be following? or would this be
making up something new? I'm not quite understanding the comment about
the comma and needing escaping for literal commas. this is true for any
character in the regexp language, so why the special concern for comma?
I do like the [file|path|domain]: approach. very nice and flexible.
hrvoje and i have recently been talking about adding regex support to
wget. we were considering adding a new --filter option which, by
supporting regular expressions, would allow more powerful ways of
filtering urls to download.
for instance the new option could allow the filtering of domain