Hrvoje Niksic wrote:
Tony Lewis [EMAIL PROTECTED] writes:
I don't think ,r complicates the command that much. Internally,
the only additional work for supporting both globs and regular
expressions is a function that converts a glob into a regexp when
,r is not requested. That's a
Curtis Hatter wrote:
On Friday 31 March 2006 06:52, Mauro Tortonesi:
while i like the idea of supporting modifiers like quick (short
circuit) and maybe i (case insensitive comparison), i think that (?i:)
and (?-i:) constructs would be overkill and rather hard to implement.
I figured that the
Scott Scriven wrote:
* Mauro Tortonesi [EMAIL PROTECTED] wrote:
wget -r --filter=-domain:www-*.yoyodyne.com
This appears to match www.yoyodyne.com, www--.yoyodyne.com,
www---.yoyodyne.com, and so on, if interpreted as a regex.
not really. it would not match www.yoyodyne.com.
It would
Mauro Tortonesi [EMAIL PROTECTED] writes:
Scott Scriven wrote:
* Mauro Tortonesi [EMAIL PROTECTED] wrote:
wget -r --filter=-domain:www-*.yoyodyne.com
This appears to match www.yoyodyne.com, www--.yoyodyne.com,
www---.yoyodyne.com, and so on, if interpreted as a regex.
not really. it
Hrvoje Niksic wrote:
Mauro Tortonesi [EMAIL PROTECTED] writes:
Scott Scriven wrote:
* Mauro Tortonesi [EMAIL PROTECTED] wrote:
wget -r --filter=-domain:www-*.yoyodyne.com
This appears to match www.yoyodyne.com, www--.yoyodyne.com,
www---.yoyodyne.com, and so on, if interpreted as a
Oliver Schulze L. wrote:
Hrvoje Niksic wrote:
The regexp APIs found on today's Unix systems
might be usable, but unfortunately those are not available on Windows.
My personal idea on this is to enable regex on Unix and disable it on
Windows.
We all use Unix/Linux and regex is really
Curtis Hatter wrote:
On Thursday 30 March 2006 13:42, Tony Lewis wrote:
Perhaps --filter=path,i:/path/to/krs would work.
That seems to be the most elegant method. I do hope that the (?i:) and
(?-i:) constructs are supported since I may not want the entire path/file to
be case
Mauro Tortonesi [EMAIL PROTECTED] writes:
wget -r --filter=-domain:www-*.yoyodyne.com
This appears to match www.yoyodyne.com, www--.yoyodyne.com,
www---.yoyodyne.com, and so on, if interpreted as a regex.
not really. it would not match www.yoyodyne.com.
Why not?
i may be wrong, but if -
Hrvoje Niksic wrote:
Herold Heiko [EMAIL PROTECTED] writes:
Get the best of both, use a syntax permitting a first match-exits
ACL, single ACE permits several statements ANDed together. Cooking
up a simple syntax for users without much regexp experience won't be
easy.
I assume ACL stands for
On 31/03/2006, at 14:37, Hrvoje Niksic wrote:
* matches the previous character repeated 0 or more times. This is
in contrast to wildcards, where * alone matches any character 0 or
more times. (This is part of why regexps are often confusing to
people used to the much simpler wildcards.)
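The difference described above can be demonstrated in a few lines of Python (used here only for illustration), contrasting `fnmatch` glob matching with regex matching of the same pattern:

```python
import fnmatch
import re

host = "www.yoyodyne.com"

# As a glob, "www-*.yoyodyne.com" requires a literal "-" after "www",
# so it does NOT match "www.yoyodyne.com".
print(fnmatch.fnmatch(host, "www-*.yoyodyne.com"))       # False

# As a regex, "-*" means "zero or more hyphens", so the same string
# DOES match (the unescaped dots also match any character).
print(bool(re.fullmatch(r"www-*.yoyodyne.com", host)))   # True
```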
Wincent Colaiuta [EMAIL PROTECTED] writes:
Are you sure that www-* matches www?
Yes.
As far as I know www-* matches one w, another w, a third w, a
hyphen, then 0 or more hyphens.
That would be www--* or www-+.
Hrvoje Niksic wrote:
Wincent Colaiuta [EMAIL PROTECTED] writes:
Are you sure that www-* matches www?
Yes.
hrvoje is right. try this perl script:
#!/usr/bin/perl -w
use strict;
my @strings = ('www-.yoyodyne.com',
               'www.yoyodyne.com');
foreach my $str (@strings) {
    print "$str matches\n" if $str =~ /^www-*\.yoyodyne\.com$/;
}
Mauro Tortonesi wrote:
for consistency and to avoid maintenance problems, i would like wget
to have the same behavior on windows and unix. please, notice that if
we implemented regex support only on unix, windows binaries of wget
built with cygwin would have regex support but native binaries
Mauro Tortonesi wrote:
no. i was talking about regexps. they are more expressive
and powerful than simple globs. i don't see the point
in supporting both.
The problem is that users who are expecting globs will try things like
--filter=-file:*.pdf rather than --filter:-file:.*\.pdf. In
Tony Lewis [EMAIL PROTECTED] writes:
Mauro Tortonesi wrote:
no. i was talking about regexps. they are more expressive
and powerful than simple globs. i don't see the point
in supporting both.
The problem is that users who are expecting globs will try things like
Hrvoje Niksic wrote:
But that misses the point, which is that we *want* to make the
more expressive language, already used elsewhere on Unix, the
default.
I didn't miss the point at all. I'm trying to make a completely different
one, which is that regular expressions will confuse most users
Tony Lewis [EMAIL PROTECTED] writes:
I didn't miss the point at all. I'm trying to make a completely different
one, which is that regular expressions will confuse most users (even if you
tell them that the argument to --filter is a regular expression).
Well, most users will probably not use
Hrvoje Niksic wrote:
I don't see a clear line that connects --filter to glob patterns as used
by the shell.
I want to list all PDFs in the shell, ls -l *.pdf
I want a filter to keep all PDFs, --filter=+file:*.pdf
Note that *.pdf is not a valid regular expression even though it's what
most
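The point that `*.pdf` is not a valid regular expression can be checked directly; in Python's regex engine (used here only for illustration) the glob form is rejected outright, because `*` has no preceding element to repeat:

```python
import re

# A user expecting glob syntax might type "*.pdf"; as a regex this is
# invalid, since "*" needs something before it to repeat.
try:
    re.compile("*.pdf")
except re.error as e:
    print("invalid regex:", e)

# The regex the user actually wants is ".*\.pdf":
print(bool(re.fullmatch(r".*\.pdf", "manual.pdf")))  # True
```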
On Friday 31 March 2006 06:52, Mauro Tortonesi:
while i like the idea of supporting modifiers like quick (short
circuit) and maybe i (case insensitive comparison), i think that (?i:)
and (?-i:) constructs would be overkill and rather hard to implement.
I figured that the (?i:) and (?-i:)
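For reference, both forms under discussion already exist in PCRE-style engines; a short Python sketch (illustration only, using a hypothetical path) shows a whole-pattern `i` modifier next to the scoped `(?i:)` construct:

```python
import re

path = "/Path/To/KRS/file.html"   # hypothetical example path

# A trailing ",i" modifier would make the whole pattern
# case-insensitive, like passing a global flag:
print(bool(re.search(r"/path/to/krs", path, re.IGNORECASE)))  # True

# The scoped (?i:...) group applies the flag to part of the pattern
# only: here "/path/to/" is case-insensitive but "KRS" must match
# exactly as written.
print(bool(re.search(r"(?i:/path/to/)KRS", path)))  # True
print(bool(re.search(r"(?i:/path/to/)krs", path)))  # False
```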
* Mauro Tortonesi [EMAIL PROTECTED] wrote:
I'm hoping for ... a raw type in addition to file,
domain, etc.
do you mean you would like to have a regex class working on the
content of downloaded files as well?
Not exactly. (details below)
i don't like your raw proposal as it is
Sent: March 31, 2006 10:03 AM
To: wget@sunsite.dk
Subject: RE: regex support RFC
Mauro Tortonesi wrote:
no. i was talking about regexps. they are more expressive and powerful
than simple globs. i don't see the point in supporting both.
The problem is that users who are expecting globs will try
From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
I don't think such a thing is necessary in practice, though; remember
that even if you don't escape the dot, it still matches the (intended)
dot, along with other characters. So for quick & dirty usage not
escaping dots will just work, and those who
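This behaviour of the unescaped dot is easy to verify; a short Python demonstration (illustration only) shows the sloppy pattern working on the intended host while also accepting unintended ones:

```python
import re

# Unescaped dots still match literal dots, so the sloppy pattern
# works on the host the user meant...
print(bool(re.fullmatch(r"www.yoyodyne.com", "www.yoyodyne.com")))   # True

# ...but "." matches any character, so unintended hosts slip through:
print(bool(re.fullmatch(r"www.yoyodyne.com", "wwwXyoyodyneXcom")))   # True

# Escaping the dots restricts the match to the intended host only:
print(bool(re.fullmatch(r"www\.yoyodyne\.com", "wwwXyoyodyneXcom"))) # False
```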
Herold Heiko [EMAIL PROTECTED] writes:
From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
I don't think such a thing is necessary in practice, though; remember
that even if you don't escape the dot, it still matches the (intended)
dot, along with other characters. So for quick & dirty usage not
On Thu, 30 Mar 2006, Mauro Tortonesi wrote:
I do like the [file|path|domain]: approach. very nice and flexible.
(and would be a huge help to one specific need I have!) I suggest also
including an 'any' option as a shortcut for putting the same pattern in
all three options.
do you
On Wednesday 29 March 2006 12:05, you wrote:
we also have to reach consensus on the filtering algorithm. for
instance, should we simply require that a url passes all the filtering
rules to allow its download (just like the current -A/R behaviour), or
should we instead adopt a short circuit
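The two candidate algorithms can be contrasted in a small Python sketch (hypothetical rule representation, not wget's actual code): require every matching rule to accept, versus firewall-style first-match-wins short circuit:

```python
import re

# Hypothetical rule set: each rule is (accept, compiled_regex).
rules = [(False, re.compile(r"\.gif$")),   # -filter: reject GIFs
         (True,  re.compile(r"\.html$"))]  # +filter: accept HTML

def short_circuit(url, rules, default=True):
    """First matching rule decides, like a firewall ACL."""
    for accept, rx in rules:
        if rx.search(url):
            return accept
    return default

def must_pass_all(url, rules):
    """Current -A/-R style: every rule the URL matches must accept it."""
    return all(accept for accept, rx in rules if rx.search(url))

print(short_circuit("index.html", rules))   # True
print(short_circuit("logo.gif", rules))     # False
```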
How many keywords do we need to provide maximum flexibility on the
components of the URI? (I'm thinking we need five.)
Consider http://www.example.com/path/to/script.cgi?foo=bar
--filter=uri:regex could match against any part of the URI
--filter=domain:regex could match against www.example.com
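The example URL splits cleanly into the components such keywords could match against; a Python sketch using `urllib.parse` (keyword names beyond `uri` and `domain` are assumptions based on the discussion):

```python
from urllib.parse import urlparse

u = urlparse("http://www.example.com/path/to/script.cgi?foo=bar")

print(u.netloc)                    # www.example.com      -> domain:
print(u.path)                      # /path/to/script.cgi  -> path:
print(u.path.rsplit("/", 1)[-1])   # script.cgi           -> file:
print(u.query)                     # foo=bar              -> query:
```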
* Mauro Tortonesi [EMAIL PROTECTED] wrote:
wget -r --filter=-domain:www-*.yoyodyne.com
This appears to match www.yoyodyne.com, www--.yoyodyne.com,
www---.yoyodyne.com, and so on, if interpreted as a regex.
It would most likely also match www---zyoyodyneXcom. Perhaps
you want glob patterns
Curtis Hatter wrote:
Also any way to add modifiers to the regexs?
Perhaps --filter=path,i:/path/to/krs would work.
Tony
On Thursday 30 March 2006 13:42, Tony Lewis wrote:
Perhaps --filter=path,i:/path/to/krs would work.
That seems to be the most elegant method. I do hope that the (?i:) and
(?-i:) constructs are supported since I may not want the entire path/file to
be case (in)?sensitive =), but that will
Hrvoje Niksic wrote:
The regexp APIs found on today's Unix systems
might be usable, but unfortunately those are not available on Windows.
My personal idea on this is to: enable regex in Unix and disable it on
Windows.
We all use Unix/Linux and regex is really useful. I think not having
* [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
wget -e robots=off -r -N -k -E -p -H http://www.gnu.org/software/wget/
soon leads to non-wget-related links being downloaded, e.g.
http://www.gnu.org/graphics/agnuhead.html
In that particular case, I think --no-parent would solve the
problem.
what definition of regexp would you be following? or would this be
making up something new? I'm not quite understanding the comment about
the comma and needing escaping for literal commas. this is true for any
character in the regexp language, so why the special concern for comma?
I do like
Mauro Tortonesi [EMAIL PROTECTED] writes:
for instance, the syntax for --filter presented above is basically the
following:
--filter=[+|-][file|path|domain]:REGEXP
I think there should also be url for filtering on the entire URL.
People have been asking for that kind of thing a lot over the
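A minimal sketch of parsing the proposed argument syntax `[+|-][file|path|domain]:REGEXP` (the `url` keyword is the poster's suggested addition; this is an illustration in Python, not wget's implementation):

```python
import re

# Accepts an optional +/- sign, one of the proposed keywords, and a
# regex pattern after the colon.
FILTER_RE = re.compile(r"^([+-]?)(file|path|domain|url):(.*)$")

def parse_filter(arg):
    m = FILTER_RE.match(arg)
    if not m:
        raise ValueError("bad --filter argument: %s" % arg)
    sign, keyword, pattern = m.groups()
    accept = sign != "-"     # no sign defaults to an accept rule
    return accept, keyword, re.compile(pattern)

accept, keyword, rx = parse_filter(r"-domain:www-*\.yoyodyne\.com")
print(accept, keyword)                       # False domain
print(bool(rx.search("www.yoyodyne.com")))   # True
```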
Jim Wright [EMAIL PROTECTED] writes:
what definition of regexp would you be following? or would this be
making up something new?
It wouldn't be new, Mauro is definitely referring to regexps as
normally understood. The regexp API's found on today's Unix systems
might be usable, but