I'm aware that there's a desire to re-write the ftp portion of wget, but
here is a patch against CVS that has so far allowed me to spider ftp URLs.
it's a dirty hack that simply uses the opt.spider variable to keep from
downloading files by returning RETROK (or maybe it was RETRFINISHED) after
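the patch itself isn't reproduced here, but the shape of the hack is roughly the following (a sketch only, not the actual diff; opt.spider, RETROK and RETRFINISHED are the wget identifiers mentioned above, while the exact placement inside src/ftp.c is my guess):

/* Once the FTP control connection has confirmed that the file exists,
   a --spider run reports success instead of starting the data transfer. */
if (opt.spider)
  return RETRFINISHED;          /* or RETROK, depending on where the check sits */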
On Fri, 14 Feb 2003, Max Bowsher wrote:
If and when someone decides to implement it. But there is almost certainly
not going to be another release until after Hrvoje Niksic has returned.
Can someone at FSF do something? [EMAIL PROTECTED], [EMAIL PROTECTED]
This seems like the silliest reason
On Fri, 14 Feb 2003, Daniel Stenberg wrote:
Technically and theoretically, anyone can grab the sources, patch the bugs, add
features and release a new version. That's what the GPL is there for.
In practice, however, that would mean stepping on Hrvoje's toes and I don't
think anyone wants to do
this bug is confirmed in CVS, it looks like there's been a lot of changes
to html-url.c
/a
On Thu, 20 Feb 2003, Jamie Zawinski wrote:
Try this, watch it lose:
wget --debug -e robots=off -nv -m -nH -np \
http://www.dnalounge.com/flyers/
http://www.dnalounge.com/flyers/ does a
a patch was submitted:
http://www.mail-archive.com/wget%40sunsite.dk/msg04645.html
On Thu, 6 Mar 2003, Keith Thompson wrote:
When invoked with an ftp:// URL, wget's --spider option is
silently ignored and the file is downloaded. This applies to wget
version 1.8.2.
To demonstrate:
On Thu, 13 Mar 2003, Max Bowsher wrote:
David Balazic wrote:
So it is do-it-yourself, huh? :-)
More to the point, *no one* is available who has cvs write access.
what if for the time being the task of keeping track of submissions for
wget was done with its debian package?
my guess is that this probably isn't in the manual.
% wget --version
GNU Wget 1.9-beta
Copyright (C) 1995, 1996, 1997, 1998, 2000, 2001 Free Software Foundation, Inc.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
rms put this up last week
quoted from: http://www.gnu.org/help/help.html
How to Help the GNU Project
This list is ordered roughly with the more urgent items near the top.
Please note that many things on this list link to larger, expanded lists.
* We are looking for new maintainers for
Wget Manual: Types of Files
http://www.gnu.org/manual/wget/html_node/wget_19.html
On Thu, 5 Jun 2003, Payal Rathod wrote:
Hi all,
I need some kind of method to download only the questions and answers listed in
this URL. I don't want any pictures, just the questions
and their answers.
The url is
while investigating this bug i noticed the following macro defined on line
197 of wget.h:
/* The same as above, except the comparison is case-insensitive. */
#define BOUNDED_EQUAL_NO_CASE(beg, end, string_literal) \
((end) - (beg) == sizeof (string_literal) - 1 \
 && !strncasecmp ((beg), (string_literal), \
                  sizeof (string_literal) - 1))
This patch seems to do user-agent checks correctly (it might have been
broken previously) with a correction to a string comparison macro.
The patch also uses the value of the --user-agent option when enforcing
robots.txt rules.
this patch is against CVS, more on that here:
yeah, i guess that patch is really bad.
http://www.gnu.org/manual/glibc/html_node/String-Length.html
On Thu, 29 May 2003, Larry Jones wrote:
Aaron S. Hawley writes:
shouldn't it be strlen not sizeof?
No. An array is not converted to a pointer when it is the argument of
sizeof, so sizeof applied to a string literal gives the size of the whole
array, including the terminating NUL; subtracting 1 yields the same count
strlen would return, but as a compile-time constant.
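here's a small standalone program (not from the thread, purely illustrative) that shows the point and exercises the BOUNDED_EQUAL_NO_CASE macro quoted earlier:

#include <stdio.h>
#include <string.h>
#include <strings.h>            /* strncasecmp */

#define BOUNDED_EQUAL_NO_CASE(beg, end, string_literal) \
  ((end) - (beg) == sizeof (string_literal) - 1 \
   && !strncasecmp ((beg), (string_literal), \
                    sizeof (string_literal) - 1))

int
main (void)
{
  const char *text = "User-Agent: Wget";
  const char *beg = text, *end = text + 10;     /* the "User-Agent" token */

  printf ("%d\n", (int) (sizeof "User-Agent" - 1));                 /* 10 */
  printf ("%d\n", (int) strlen ("User-Agent"));                     /* 10 */
  printf ("%d\n", BOUNDED_EQUAL_NO_CASE (beg, end, "user-agent"));  /* 1  */
  return 0;
}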
Aaron S. Hawley writes:
yeah, i guess that patch is really bad.
Yes, it is. ;-)
-Larry Jones
Index: src/init.c
===
RCS file: /pack/anoncvs/wget/src/init.c,v
retrieving revision 1.54
diff -u -u -r1.54 init.c
--- src/init.c 2002/08
On Fri, 30 May 2003, George Prekas wrote:
I have found a bug in Wget version 1.8.2 concerning comment handling ( <!--
comment --> ). Take a look at the following illegal HTML code:
<HTML>
<BODY>
<a href=test1.html>test1.html</a>
<!--
<a href=test2.html>test2.html</a>
<!--
</BODY>
</HTML>
Now, save the
another proprietary protocol brought to you by the folks in Redmond,
Washington.
http://sdp.ppona.com/
http://geocities.com/majormms/
On Sun, 1 Jun 2003, Andrzej Kasperowicz wrote:
How could I download using wget that:
mms://mms.itvp.pl/bush_archiwum/bush.wmv
If wget cannot manage it then
On Wed, 4 Jun 2003, Tony Lewis wrote:
Adding this function to wget seems reasonable to me, but I'd suggest that it
be off by default and enabled from the command line with something
like --quirky_comments.
why not just have the default wget behavior follow comments explicitly
(i've lost track
. Can you change the [insert
Wget comment mode] comment mode to (not) recognize my comments?
i think the idea of quirky comment modes is cool, but is it the better
solution?
/a
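to make the alternatives concrete, here is a small sketch (purely illustrative, not wget's parser; all names are made up) of the lenient behavior being discussed: end a comment at the first "-->", the way most browsers do, instead of pairing up "--" delimiters as strict SGML requires:

#include <stdio.h>
#include <string.h>

/* Return a pointer just past the comment starting at p (which points
   at "<!--"), or end if the comment is never closed. */
static const char *
skip_comment_lenient (const char *p, const char *end)
{
  for (p += 4; p + 3 <= end; p++)
    if (p[0] == '-' && p[1] == '-' && p[2] == '>')
      return p + 3;             /* first "-->" terminates the comment */
  return end;                   /* unterminated comment runs to end of file */
}

int
main (void)
{
  const char *html =
    "<!-- <a href=skip.html>skip</a> --> <a href=keep.html>keep</a>";
  const char *rest = skip_comment_lenient (html, html + strlen (html));
  printf ("parsing resumes at:%s\n", rest);
  return 0;
}

a strict parser would instead insist on paired "--" delimiters and a closing ">", which is exactly what trips over broken markup like the example in the earlier report.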
On Wed, 4 Jun 2003, Aaron S. Hawley wrote:
why not just have the default wget behavior follow comments explicitly
(i've
On Wed, 4 Jun 2003, George Prekas wrote:
snip
i think the idea of quirky comment modes is cool, but is it the better
solution?
Do you think that the current algorithm shouldn't be improved? Even a
little bit, to handle the common mistakes?
i think Wget's default behavior should be
http://www.gnu.org/manual/wget/
On Wed, 11 Jun 2003, Support, DemoG wrote:
hello,
I need help on this subject:
Please tell me what the command line is if i want to get all the files and
subdirectories, with everything they contain, from an ftp site like ftp.mine.com.
also i have the user and pass, and i will
there doesn't seem to be anything wrong with the page.
are you having trouble with recursive wgets with other (all) pages or just
this one?
/a
above the code segment you submitted (line 765 of init.c) is the
comment:
/* Strip the trailing slashes from directories. */
here are the manual notes on this option:
(from Recursive Accept/Reject Options)
`-I list'
`--include-directories=list'
Specify a comma-separated list of directories
=biz.yahoo.com -I /r/ 'http://biz.yahoo.com/r/'
$ ls biz.yahoo.com/
r/  reports/  research/
$
I want only '/r/', but it crawls /r*, which includes /reports/, /research/.
Is it an expected result or a bug?
Thanks a lot!
--- Aaron S. Hawley [EMAIL PROTECTED] wrote:
above
no, i think your original idea of getting rid of the code that removes the
trailing slash is a better idea. i think this would fix it but keep the
degenerate case of root directory (whatever that's about):
Index: src/init.c
===
RCS
you're right, the include-directories option operates much the same way
(my guess: in the interest of speed) as the rest of the accept/reject
options, which (others have also noticed) is a little flaky.
/a
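a tiny standalone illustration (not wget's actual matching code) of why that prefix-style matching makes the trailing slash matter: "/r" accepts anything beginning with "/r", including /reports/ and /research/, while "/r/" does not:

#include <stdio.h>
#include <string.h>

static int
dir_matches (const char *pattern, const char *path)
{
  /* crude prefix match, for illustration only */
  return strncmp (path, pattern, strlen (pattern)) == 0;
}

int
main (void)
{
  printf ("%d\n", dir_matches ("/r",  "/reports/research.html"));  /* 1 */
  printf ("%d\n", dir_matches ("/r/", "/reports/research.html"));  /* 0 */
  printf ("%d\n", dir_matches ("/r/", "/r/earnings.html"));        /* 1 */
  return 0;
}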
On Fri, 13 Jun 2003, wei ye wrote:
Did you test your patch? I patched it on my source
it's available in the CVS version..
information at:
http://www.gnu.org/software/wget/
On Tue, 17 Jun 2003, Roman Dusek wrote:
Dear Sirs,
thanks for WGet, it's a great tool. I would very appreciate one more
option: a possibility to get http page using POST method instead of GET.
Cheers,
I use the --spider option a lot and don't have trouble with most sites.
When using the --spider option for the Mozilla website I get a 500 error
response. Without the --spider option the problem doesn't occur. Any
guesses?
$ wget --debug --spider www.mozilla.org
DEBUG output created by Wget
i submitted a patch in february.
http://www.mail-archive.com/wget%40sunsite.dk/msg04645.html
http://www.geocrawler.com/archives/3/409/2003/2/100/10313375/
On Tue, 17 Jun 2003, Peschko, Edward wrote:
Just upgraded to 1.8.2 and ok, I think I see the problem here...
--spider only works with
Here's a test case for the --spider option. perhaps helpful for
documentation?
using wget on about 17,000 URLs (these are in the FSF/UNESCO Free Software
Directory and are not by any means unique). out of these about 395
generate errors when run with the spider option (--spider) of the wget
no such facility currently exists for wget. this is a question of job
control and is better directed at your operating system.
On Thu, 3 Jul 2003 [EMAIL PROTECTED] wrote:
Hi
I'm calling the wget program via a .bat file on a win2000 PC. Works ok.
I have to schedule the start/stop of this, so
the feature to locally delete mirrored files that were not downloaded from
the server on the most recent wget --mirror has been requested previously.
On Thu, 3 Jul 2003 [EMAIL PROTECTED] wrote:
Hi
Just started to test wget on a win2000 PC. I'm using the mirror functionality,
and it seems to
try:
wget -p
On Tue, 8 Jul 2003 [EMAIL PROTECTED] wrote:
I am able to download an HTML page.
That page has several CGI calls generating images.
When calling that page with a browser, the images are generated and
stored for further use in an image cache on the server.
I was expecting that
shit, i'd just use lynx or links to do
links -source www.washpost.com
but wget could do
wget -O /dev/stdout www.washpost.com
On Wed, 9 Jul 2003, Jerry Coleman wrote:
Is there a way to suppress the creation of a .html file, and instead
redirect the output to stdout? I want to issue a wget
we're all used to J K's personality, now.
On Wed, 9 Jul 2003, Toby Corkindale wrote:
What's your problem?
That has to be the least informative email I've seen in a long time.
tjc
(apologies for top-posting in reply)
On Thu, Jul 03, 2003 at 03:20:28PM +0200, J K wrote:
FUCK
try also:
wget -O - www.washpost.com
On Wed, 9 Jul 2003, Gisle Vanem wrote:
Aaron S. Hawley [EMAIL PROTECTED] said:
but wget could do
wget -O /dev/stdout www.washpost.com
On DOS/Windows too? I think not. There must be a better way.
--gv
how about the -Q, or --quota, option?
On Thu, 10 Jul 2003, fehmi ben njima wrote:
hello
i am using a USB key as a storage disk at school
and i want to download files that are bigger than the capacity of the USB disk
so i want a script or modification to make in the wget source code
so i can specify
how is your request different than --wait ?
On Mon, 16 Jun 2003, Wu-Kung Sun wrote:
I'd like to request an additional (or modified) option
that waits for whatever time is specified by the user, no
more, no less (instead of the linear backoff of
--waitretry, which is just a slightly less obnoxious
Wget maintainer:
http://www.geocrawler.com/archives/3/409/2003/3/0/10399285/
--
The geocrawler archives for Wget are alive again!
On Mon, 14 Jul 2003, Hans Deragon (QA/LMC) wrote:
Hi again.
Some people have reported experiencing the same problem, but nobody
from the development team has
I guess I like Mark's --ignore-length strategy. and it looks like this
could work with a fix to Wget found in this patch:
Index: src/ftp.c
===
RCS file: /pack/anoncvs/wget/src/ftp.c,v
retrieving revision 1.61
diff -u -c -r1.61 ftp.c
On Tue, 12 Aug 2003, Tony Lewis wrote:
Daniel Stenberg wrote:
The GNU project is looking for a new maintainer for wget, as the
current one wishes to step down.
I think that means we need someone who:
1) is proficient in C
2) knows Internet protocols
3) is willing to learn the
searching the web i found out that cygwin has wget and there's also this:
http://kimihia.org.nz/projects/cygwget/
/a
On Wed, 13 Aug 2003, Shell Gellner wrote:
Dear Sirs,
I've downloaded the GNU software but when I try to run the WGET.exe file
it keeps telling me 'is linked to missing
wget doesn't have a javascript interpreter.
On Mon, 8 Sep 2003, Andrzej Kasperowicz wrote:
How can I force wget to download JavaScript links:
http://znik.wbc.lublin.pl/ChemFan/kalkulatory/javascript:wrzenie():
17:04:44 ERROR 404: Not Found.
Wget doesn't currently have http file upload capabilities, but if this XML
message can be sent by cgi POST parameters then Wget could probably do it.
but you'll need to figure out how exactly the XML message is sent using
http.
/a
On Mon, 8 Sep 2003, Vasudha Chiluka wrote:
Hi ,
I need to
I, on the other hand, am actually not sure why you're not able to
have Wget find the marked up (not javascript) image.
Cause it worked for me.
% ls -l www.protcast.com/Grafx/menu-contact_\(off\).jpg
-rw--- 1 ashawley usr 2377 Jan 10 2003
On Wed, 10 Sep 2003, Andreas Belitz wrote:
Hi,
i have found a problem regarding wget --spider.
It works great for any files over http or ftp, but as soon as one of
these two conditions occur, wget starts downloading the file:
1. linked files (i'm not 100% sure about this)
2.
On Wed, 10 Sep 2003, Andreas Belitz wrote:
Hi Aaron S. Hawley,
On Wed, 10. September 2003 you wrote:
ASH actually, what you call download scripts are actually HTTP redirects, and
ASH in this case the redirect is to an FTP server and if you double-check i
ASH think you'll find Wget does
is -nv (non-verbose) an improvement?
$ wget -nv www.johnjosephbachir.org/
12:50:57 URL:http://www.johnjosephbachir.org/ [3053/3053] -> index.html [1]
$ wget -nv www.johnjosephbachir.org/m
http://www.johnjosephbachir.org/m:
12:51:02 ERROR 404: Not Found.
but if you're not satisfied you
[saw this on the web..]
HexCat Software DeepVaccum
http://www.hexcat.com/deepvaccum/
DeepVaccum is donationware, a useful web utility based on the GNU wget command
line tool. The program includes a vast number of options to fine-tune your
downloads through both http and ftp protocols.
DV enables you to
I can verify this in the cvs version.
it appears to be isolated to the recursive behavior.
/a
On Mon, 15 Sep 2003, Dawid Michalczyk wrote:
Hello,
I'm having problems getting the exit status code to work correctly in
the following scenario. The exit code should be 1 yet it is 0
The HTML of those pages contains the meta-tag
<meta name="robots" content="noindex,nofollow" />
and Wget listened, and only downloaded the first page.
Perhaps Wget should give a warning message that the file contained a
meta-robots tag, so that people aren't quite so dumb-founded.
/a
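the warning could look something like this (a sketch only, using wget's usual logprintf logging idiom; meta_robots_nofollow and file are hypothetical names, and where such a check would live in the HTML parser is not something the thread settled):

/* When the parser sees a robots meta tag that forbids following
   links, say so instead of pruning the page silently. */
if (meta_robots_nofollow)
  logprintf (LOG_NOTQUIET,
             _("%s: meta robots tag forbids following links; not descending.\n"),
             file);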
On Fri, 17 Oct
[for those robots.txt fans]
White House site prevents Iraq material being archived
http://www.theage.com.au/articles/2003/10/28/1067233141495.html
By Sam Varghese
October 28, 2003
The White House website http://www.whitehouse.gov/ effectively prevents
search engines indexing and archiving
helmet heads
On Thu, 30 Oct 2003, Aaron S. Hawley wrote:
[for those robots.txt fans]
White House site prevents Iraq material being archived
http
From Various Licenses and Comments about Them
http://www.gnu.ctssn.com/licenses/license-list.html
The OpenSSL license.
The license of OpenSSL is a conjunction of two licenses, one of them
being the license of SSLeay. You must follow both. The combination results
in a copyleft free software
Some sort of URL reporting facility is on the unspoken TODO list.
http://www.mail-archive.com/[EMAIL PROTECTED]/msg05282.html
/a
On Wed, 11 Feb 2004, Olivier SOW wrote:
hi,
I use Wget to check page state with the --spider parameter
I'm looking for a way to get back only the number the server
at the bottom of the man page it says:
SEE ALSO
GNU Info entry for wget.
this is a cryptic suggestion to type the following on the command-line:
info wget
this will give you the GNU Wget user manual where you'll find clear
examples.
info is the documentation format most all GNU
On Mon, 18 Oct 2004, Gerriet M. Denkmann wrote:
So - is this a bug, did I misunderstand the documentation, did I use
the wrong options?
Reasonable request. You just couldn't find the archives:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg06626.html
more:
There are old links to Wget on the web pointing to:
http://www.gnu.org/software/wget/wget.html
The FSF people have a nice symlink system for package web sites. Simply
add a file called `.symlinks' to Wget's CVS web repository with the
following line:
index.html wget.html
Or rename the file to