[wget-patch] --spider FTP

2003-02-06 Thread Aaron S. Hawley
I'm aware that there's a desire to re-write the FTP portion of Wget, but here is a patch against CVS that has so far allowed me to spider FTP URLs. It's a dirty hack that simply uses the opt.spider variable to keep from downloading files by returning RETROK (or maybe it was RETRFINISHED) after
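
A minimal sketch of the hack being described, assuming the check sits in the FTP retrieval code in src/ftp.c; opt.spider, RETROK, and RETRFINISHED are identifiers named in the message, but the placement shown here is a guess:

    /* Hypothetical placement inside the FTP retrieval routine: once
       the control connection has confirmed the file exists, bail out
       before any data is transferred when --spider was given.  */
    if (opt.spider)
      return RETRFINISHED;   /* report success without downloading */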

dev. of wget (was Re: Removing files and directories not present on remote FTP server

2003-02-14 Thread Aaron S. Hawley
On Fri, 14 Feb 2003, Max Bowsher wrote: If and when someone decides to implement it. But there is almost certainly not going to be another release until after Hrvoje Niksic has returned. Can someone at the FSF do something? [EMAIL PROTECTED], [EMAIL PROTECTED] This seems like the silliest reason

Re: dev. of wget (was Re: Removing files and directories not present on remote FTP server

2003-02-14 Thread Aaron S. Hawley
On Fri, 14 Feb 2003, Daniel Stenberg wrote: Technically and theoretically, anyone can grab the sources, patch the bugs, add features and release a new version. That's what the GPL is there for. In practice, however, that would mean stepping on Hrvoje's toes and I don't think anyone wants to do

Re: wget 301 redirects broken in 1.8.2

2003-02-24 Thread Aaron S. Hawley
this bug is confirmed in CVS; it looks like there have been a lot of changes to html-url.c /a On Thu, 20 Feb 2003, Jamie Zawinski wrote: Try this, watch it lose: wget --debug -e robots=off -nv -m -nH -np \ http://www.dnalounge.com/flyers/ http://www.dnalounge.com/flyers/ does a

Re: wget --spider doesn't work for ftp URLs

2003-03-06 Thread Aaron S. Hawley
a patch was submitted: http://www.mail-archive.com/wget%40sunsite.dk/msg04645.html On Thu, 6 Mar 2003, Keith Thompson wrote: When invoked with an ftp:// URL, wget's --spider option is silently ignored and the file is downloaded. This applies to wget version 1.8.2. To demonstrate:

wget future (was Re: Not 100% rfc 1738 compliance for FTP URLs = bug

2003-03-17 Thread Aaron S. Hawley
On Thu, 13 Mar 2003, Max Bowsher wrote: David Balazic wrote: So it is do-it-yourself, huh? :-) More to the point, *no one* is available who has cvs write access. what if, for the time being, the task of keeping track of submissions for wget was done with its Debian package?

Re: Wget a Post Form

2003-03-18 Thread Aaron S. Hawley
my guess is that this probably isn't in the manual. % wget --version GNU Wget 1.9-beta Copyright (C) 1995, 1996, 1997, 1998, 2000, 2001 Free Software Foundation, Inc. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of

[wget] maintainer

2003-03-20 Thread Aaron S. Hawley
rms put this up last week quoted from: http://www.gnu.org/help/help.html How to Help the GNU Project This list is ordered roughly with the more urgent items near the top. Please note that many things on this list link to larger, expanded lists. * We are looking for new maintainers for

Re: method to download this url

2003-06-06 Thread Aaron S. Hawley
Wget Manual: Types of Files http://www.gnu.org/manual/wget/html_node/wget_19.html On Thu, 5 Jun 2003, Payal Rathod wrote: Hi all, I need some kind of method to download only the questions and answers listed in this url. I don't want any pictures, just the questions and their answers. The url is

string comparison macro (was Re: using user-agent to identify for robots.txt

2003-05-30 Thread Aaron S. Hawley
while investigating this bug i noticed the following macro defined on line 197 of wget.h: /* The same as above, except the comparison is case-insensitive. */ #define BOUNDED_EQUAL_NO_CASE(beg, end, string_literal) \ ((end) - (beg) == sizeof (string_literal) - 1 \ && !strncasecmp

Re: using user-agent to identify for robots.txt

2003-05-30 Thread Aaron S. Hawley
This patch seems to do user-agent checks correctly (it might have been broken previously) with a correction to a string comparison macro. The patch also uses the value of the --user-agent option when enforcing robots.txt rules. this patch is against CVS, more on that here:

Re: string comparison macro (was Re: using user-agent to identify for

2003-05-30 Thread Aaron S. Hawley
yeah, i guess that patch is really bad. http://www.gnu.org/manual/glibc/html_node/String-Length.html On Thu, 29 May 2003, Larry Jones wrote: Aaron S. Hawley writes: shouldn't it be strlen not sizeof? No. An array is not converted to a pointer when it is the argument of sizeof, so
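
A standalone illustration of Larry Jones's point about sizeof and string literals, separate from any of the patches in this thread:

    /* sizeof applied to a string literal yields the size of the
       array, including the terminating NUL, at compile time, so
       sizeof "foo" - 1 is the string's length with no runtime cost.
       A pointer is not an array: sizeof on a pointer yields the
       pointer's size, which is why the macro is only safe with a
       genuine string literal as its argument.  */
    #include <assert.h>
    #include <string.h>

    int
    main (void)
    {
      assert (sizeof "wget" - 1 == 4);  /* array size minus the NUL */
      assert (strlen ("wget") == 4);    /* same value, at run time */

      const char *p = "wget";
      assert (sizeof p == sizeof (char *));  /* pointer size, not string length */
      return 0;
    }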

Re: using user-agent to identify

2003-05-30 Thread Aaron S. Hawley
Aaron S. Hawley writes: yeah, i guess that patch is really bad. Yes, it is. ;-) -Larry Jones Index: src/init.c === RCS file: /pack/anoncvs/wget/src/init.c,v retrieving revision 1.54 diff -u -u -r1.54 init.c --- src/init.c 2002/08

Re: Comment handling

2003-05-31 Thread Aaron S. Hawley
On Fri, 30 May 2003, George Prekas wrote: I have found a bug in Wget version 1.8.2 concerning comment handling ( <!-- comment --> ). Take a look at the following illegal HTML code: <HTML> <BODY> <a href=test1.html>test1.html</a> <!-- <a href=test2.html>test2.html</a> <!-- </BODY> </HTML> Now, save the

Re: wget vs mms://*.wmv?

2003-06-04 Thread Aaron S. Hawley
another proprietary protocol brought to you by the folks in redmond, washington. http://sdp.ppona.com/ http://geocities.com/majormms/ On Sun, 1 Jun 2003, Andrzej Kasperowicz wrote: How could I download using wget that: mms://mms.itvp.pl/bush_archiwum/bush.wmv If wget cannot manage it then

Re: Comment handling

2003-06-05 Thread Aaron S. Hawley
On Wed, 4 Jun 2003, Tony Lewis wrote: Adding this function to wget seems reasonable to me, but I'd suggest that it be off by default and enabled from the command line with something like --quirky_comments. why not just have the default wget behavior follow comments explicitly (i've lost track

Re: Comment handling

2003-06-05 Thread Aaron S. Hawley
Can you change the [insert Wget comment mode] comment mode to (not) recognize my comments? i think the idea of quirky comment modes is cool, but is it the better solution? /a On Wed, 4 Jun 2003, Aaron S. Hawley wrote: why not just have the default wget behavior follow comments explicitly (i've

Re: Comment handling

2003-06-05 Thread Aaron S. Hawley
On Wed, 4 Jun 2003, George Prekas wrote: [snip] i think the idea of quirky comment modes is cool, but is it the better solution? Do you think that the current algorithm shouldn't be improved? Even a little bit, to handle the common mistakes? i think Wget's default behavior should be

Re: WGET help needed

2003-06-11 Thread Aaron S. Hawley
http://www.gnu.org/manual/wget/ On Wed, 11 Jun 2003, Support, DemoG wrote: hello, I need help on this subject: Please tell me what is the command line if i wanted to get all the files and subdirectories, with everything they contain, from an ftp site like ftp.mine.com. also i have the user and pass, and i will
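
A plausible one-line answer, assuming Wget's usual credentials-in-URL syntax: wget -m 'ftp://user:password@ftp.mine.com/' (user, password, and host are placeholders). The -m/--mirror option turns on recursion with unlimited depth plus timestamping, so a re-run only fetches what has changed.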

Re: wget recursion options ?

2003-06-12 Thread Aaron S. Hawley
there doesn't seem to be anything wrong with the page. are you having trouble with recursive wgets on other (all) pages, or just this one? /a

Re: trailing '/' of include-directories removed bug

2003-06-12 Thread Aaron S. Hawley
above the code segment you submitted (line 765 of init.c) the comment: /* Strip the trailing slashes from directories. */ here are the manual notes on this option: (from Recursive Accept/Reject Options) `-I list' `--include-directories=list' Specify a comma-separated list of directories

Re: trailing '/' of include-directories removed bug

2003-06-12 Thread Aaron S. Hawley
=biz.yahoo.com -I /r/ 'http://biz.yahoo.com/r/' $ ls biz.yahoo.com/ r/ reports/ research/ $ I want only '/r/', but it crawls /r*, which includes /reports/ and /research/. Is it an expected result or a bug? Thanks a lot! --- Aaron S. Hawley [EMAIL PROTECTED] wrote: above

Re: trailing '/' of include-directories removed bug

2003-06-13 Thread Aaron S. Hawley
no, i think your original idea of getting rid of the code that removes the trailing slash is a better idea. i think this would fix it but keep the degenerate case of root directory (whatever that's about): Index: src/init.c === RCS
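
A self-contained sketch of the behavior under discussion, paraphrasing rather than quoting the slash-stripping block at the cited spot in src/init.c (the function and variable names here are made up):

    #include <stdio.h>
    #include <string.h>

    /* What init.c effectively does today: trim one trailing slash
       from each include-directory entry.  The "end > dir" guard is
       the degenerate root-directory case, sparing a bare "/".
       The thread proposes deleting the stripping entirely.  */
    static void
    strip_trailing_slash (char *dir)
    {
      char *end = dir + strlen (dir) - 1;
      if (end > dir && *end == '/')
        *end = '\0';
    }

    int
    main (void)
    {
      char dir[] = "/r/";
      strip_trailing_slash (dir);
      /* Prints "/r", which later prefix-matches /reports/ and
         /research/ as well, i.e. the reported bug.  */
      printf ("%s\n", dir);
      return 0;
    }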

Re: trailing '/' of include-directories removed bug

2003-06-16 Thread Aaron S. Hawley
you're right, the include-directories option operates much the same way as the rest of the accept/reject options (my guess: in the interest of speed), which (others have also noticed) is a little flaky. /a On Fri, 13 Jun 2003, wei ye wrote: Did you test your patch? I patched it on my source

Re: suggestion

2003-06-17 Thread Aaron S. Hawley
it's available in the CVS version. information at: http://www.gnu.org/software/wget/ On Tue, 17 Jun 2003, Roman Dusek wrote: Dear Sirs, thanks for WGet, it's a great tool. I would very much appreciate one more option: a possibility to get an http page using the POST method instead of GET. Cheers,

--spider problems

2003-06-17 Thread Aaron S. Hawley
I use the --spider option a lot and don't have trouble with most sites. When using the --spider option for the Mozilla website I get a 500 error response. Without the --spider option I don't receive the problem. Any guesses? $ wget --debug --spider www.mozilla.org DEBUG output created by Wget
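
A likely explanation, though the thread doesn't confirm it: with --spider, Wget issues an HTTP HEAD request instead of GET, and some servers or CGI setups mishandle HEAD and answer with a 500. Adding -S/--server-response to the command above would print the full reply headers and show what the server is actually objecting to.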

RE: wget feature requests

2003-06-17 Thread Aaron S. Hawley
i submitted a patch in february. http://www.mail-archive.com/wget%40sunsite.dk/msg04645.html http://www.geocrawler.com/archives/3/409/2003/2/100/10313375/ On Tue, 17 Jun 2003, Peschko, Edward wrote: Just upgraded to 1.8.2 and ok, I think I see the problem here... --spider only works with

--spider v. Server Support

2003-06-18 Thread Aaron S. Hawley
Here's a test case for the --spider option, perhaps helpful for documentation? Using wget on about 17,000 URLs (these are in the FSF/UNESCO Free Software Directory and are not by any means unique). Out of these, about 395 generate errors when run with the spider option (--spider) of the wget

Re: Windows Schedule tool for starting/stopping wget?

2003-07-03 Thread Aaron S. Hawley
no such facility currently exists for wget. this is a question of job control and is better directed at your operating system. On Thu, 3 Jul 2003 [EMAIL PROTECTED] wrote: Hi I'm calling the wget program via a .bat file on a win2000 PC. Works ok. I have to schedule the start/stop of this, so

Re: Deleting files locally, that is not present remote any more?

2003-07-03 Thread Aaron S. Hawley
the feature to locally delete mirrored files that were not downloaded from the server on the most recent wget --mirror has been requested previously. On Thu, 3 Jul 2003 [EMAIL PROTECTED] wrote: Hi Just started to test wget on a win2000 PC. I'm using the mirror functionality, and it seems to

Re: Calling cgi from the downloaded page - Simulating a Browser

2003-07-08 Thread Aaron S. Hawley
try: wget -p On Tue, 8 Jul 2003 [EMAIL PROTECTED] wrote: I am able to download an HTML page. That page has several cgi calls generating images. When calling that page with a browser, the images are generated and stored for further usage in an image cache on the server. I was expecting that

Re: Capture HTML Stream

2003-07-09 Thread Aaron S. Hawley
shit, i'd just use lynx or links to do links -source www.washpost.com but wget could do wget -O /dev/stdout www.washpost.com On Wed, 9 Jul 2003, Jerry Coleman wrote: Is there a way to suppress the creation of a .html file, and instead redirect the output to stdout? I want to issue a wget

Re: Fw: wget with openssl problems

2003-07-09 Thread Aaron S. Hawley
we're all used to J K's personality, now. On Wed, 9 Jul 2003, Toby Corkindale wrote: What's your problem? That has to be the least informative email I've seen in a long time. tjc (apologies for top-posting in reply) On Thu, Jul 03, 2003 at 03:20:28PM +0200, J K wrote: FUCK

Re: Capture HTML Stream

2003-07-09 Thread Aaron S. Hawley
try also: wget -O - www.washpost.com On Wed, 9 Jul 2003, Gisle Vanem wrote: Aaron S. Hawley [EMAIL PROTECTED] said: but wget could do wget -O /dev/stdout www.washpost.com On DOS/Windows too? I think not. There must be a better way. --gv
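
Worth noting for portability: -O - writes to Wget's own standard output on any platform, whereas /dev/stdout is a Unix-ism the operating system has to supply, which is presumably why it fails on DOS/Windows.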

Re: selecting range to download with wget

2003-07-10 Thread Aaron S. Hawley
how about the -Q, or --quota, option? On Thu, 10 Jul 2003, fehmi ben njima wrote: hello i am using a usb key as a storage disk in school and i want to download files that are bigger than the capacity of the usb disk, so i want a script or a modification to make in the wget source code so i can specify
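
One caveat from the Wget manual: the quota never affects downloading a single file, so -Q only stops retrieval between files during recursive or multi-URL downloads; it won't truncate one file that is itself bigger than the disk.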

Re: Feature Request: Fixed wait

2003-06-17 Thread Aaron S. Hawley
how is your request different than --wait ? On Mon, 16 Jun 2003, Wu-Kung Sun wrote: I'd like to request an additional (or modified) option that waits for whatever time specified by the user, no more no less (instead of the linear backoff of --waitretry which is just a slightly less obnoxious

RE: wget: ftp through http proxy not working with 1.8.2. It doeswork with 1.5.3

2003-07-14 Thread Aaron S. Hawley
Wget maintainer: http://www.geocrawler.com/archives/3/409/2003/3/0/10399285/ -- The geocrawler archives for Wget are alive again! On Mon, 14 Jul 2003, Hans Deragon (QA/LMC) wrote: Hi again. Some people have reported experiencing the same problem, but nobody from the development team has

RE: -N option

2003-07-30 Thread Aaron S. Hawley
I guess I like Mark's --ignore-length strategy, and it looks like this could work with a fix to Wget found in this patch: Index: src/ftp.c === RCS file: /pack/anoncvs/wget/src/ftp.c,v retrieving revision 1.61 diff -u -c -r1.61 ftp.c

Re: Reminder: wget has no maintainer

2003-08-14 Thread Aaron S. Hawley
On Tue, 12 Aug 2003, Tony Lewis wrote: Daniel Stenberg wrote: The GNU project is looking for a new maintainer for wget, as the current one wishes to step down. I think that means we need someone who: 1) is proficient in C 2) knows Internet protocols 3) is willing to learn the

Re: help with wget????

2003-08-14 Thread Aaron S. Hawley
searching the web i found out that cygwin has wget and there's also this: http://kimihia.org.nz/projects/cygwget/ /a On Wed, 13 Aug 2003, Shell Gellner wrote: Dear Sirs, I've downloaded the GNU software but when I try to run the WGET.exe file it keeps telling me 'is linked to missing

Re: How to force wget to download Java Script links?

2003-09-08 Thread Aaron S. Hawley
wget doesn't have a javascript interpreter. On Mon, 8 Sep 2003, Andrzej Kasperowicz wrote: How to force wget to download Java Script links: http://znik.wbc.lublin.pl/ChemFan/kalkulatory/javascript:wrzenie(): 17:04:44 ERROR 404: Not Found.

Re: Help needed! How to pass XML message to webserver

2003-09-09 Thread Aaron S. Hawley
Wget doesn't currently have http file upload capabilities, but if this XML message can be sent by cgi POST parameters then Wget could probably do it. but you'll need to figure out how exactly the XML message is sent using http. /a On Mon, 8 Sep 2003, Vasudha Chiluka wrote: Hi , I need to
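
If the server accepts the XML as an ordinary POST body, the POST support in CVS Wget (mentioned in the "Re: suggestion" message above) could presumably handle it with something like: wget --post-file=message.xml --header='Content-Type: text/xml' 'http://server/endpoint' where the file name, URL, and content type are placeholders for whatever the receiving application expects.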

Re: [SPAM?:###] RE: wget -r -p -k -l 5 www.protcast.com doesn't pull some images though they are part of the HREF

2003-09-09 Thread Aaron S. Hawley
I, on the other hand, am not sure why you're not able to have Wget find the marked-up (not javascript) image. Cause it worked for me. % ls -l www.protcast.com/Grafx/menu-contact_\(off\).jpg -rw--- 1 ashawley usr 2377 Jan 10 2003

Re: wget --spider issue

2003-09-10 Thread Aaron S. Hawley
On Wed, 10 Sep 2003, Andreas Belitz wrote: Hi, i have found a problem regarding wget --spider. It works great for any files over http or ftp, but as soon as one of these two conditions occurs, wget starts downloading the file: 1. linked files (i'm not 100% sure about this) 2.

Re: wget --spider issue

2003-09-10 Thread Aaron S. Hawley
On Wed, 10 Sep 2003, Andreas Belitz wrote: Hi Aaron S. Hawley, On Wed, 10. September 2003 you wrote: ASH actually, what you call download scripts are actually HTTP redirects, and ASH in this case the redirect is to an FTP server and if you double-check i ASH think you'll find Wget does

Re: suggestion

2003-09-12 Thread Aaron S. Hawley
is -nv (non-verbose) an improvement? $ wget -nv www.johnjosephbachir.org/ 12:50:57 URL:http://www.johnjosephbachir.org/ [3053/3053] -> index.html [1] $ wget -nv www.johnjosephbachir.org/m http://www.johnjosephbachir.org/m: 12:51:02 ERROR 404: Not Found. but if you're not satisfied you

DeepVaccum

2003-09-13 Thread Aaron S. Hawley
[saw this on the web..] HexCat Software DeepVaccum http://www.hexcat.com/deepvaccum/ DeepVaccum is donationware, a useful web utility based on the GNU wget command line tool. The program includes a vast number of options to fine-tune your downloads through both http and ftp protocols. DV enables you to

Re: possible bug in exit status codes

2003-09-15 Thread Aaron S. Hawley
I can verify this in the cvs version. it appears to be isolated to the recursive behavior. /a On Mon, 15 Sep 2003, Dawid Michalczyk wrote: Hello, I'm having problems getting the exit status code to work correctly in the following scenario. The exit code should be 1 yet it is 0

Re: wget downloading a single page when it should recurse

2003-10-17 Thread Aaron S. Hawley
The HTML of those pages contains the meta tag <meta name="robots" content="noindex,nofollow" /> and Wget listened, and only downloaded the first page. Perhaps Wget should give a warning message that the file contained a meta-robots tag, so that people aren't quite so dumbfounded. /a On Fri, 17 Oct

[wget] OT: White House site prevents Iraq material being archived

2003-10-30 Thread Aaron S. Hawley
[for those robots.txt fans] White House site prevents Iraq material being archived http://www.theage.com.au/articles/2003/10/28/1067233141495.html By Sam Varghese October 28, 2003 The White House website http://www.whitehouse.gov/ effectively prevents search engines from indexing and archiving

Re: [wget] OT: White House site prevents Iraq material being archived (fwd)

2003-11-03 Thread Aaron S. Hawley
helmet heads. On Thu, 30 Oct 2003, Aaron S. Hawley wrote: [for those robots.txt fans] White House site prevents Iraq material being archived http

RE: GNU TLS vs. OpenSSL

2003-11-05 Thread Aaron S. Hawley
From Various Licenses and Comments about Them http://www.gnu.ctssn.com/licenses/license-list.html The OpenSSL license: The license of OpenSSL is a conjunction of two licenses, one of them being the license of SSLeay. You must follow both. The combination results in a copyleft free software

Re: --spider parameter

2004-02-11 Thread Aaron S. Hawley
Some sort of URL reporting facility is on the unspoken TODO list. http://www.mail-archive.com/[EMAIL PROTECTED]/msg05282.html /a On Wed, 11 Feb 2004, Olivier SOW wrote: hi, I use Wget to check page state with the --spider parameter. I'm looking for a way to get back only the number the server

Re: Missing precis man wget info about -D

2004-09-03 Thread Aaron S. Hawley
at the bottom of the man page it says: SEE ALSO GNU Info entry for wget. this is a cryptic suggestion to type the following on the command-line: info wget. this will give you the GNU Wget user manual, where you'll find clear examples. info is the documentation format most all GNU

Re: wget 1.9.1

2004-10-18 Thread Aaron S. Hawley
On Mon, 18 Oct 2004, Gerriet M. Denkmann wrote: So - is this a bug, did I misunderstand the documentation, did I use the wrong options? Reasonable request. You just couldn't find the archives: http://www.mail-archive.com/[EMAIL PROTECTED]/msg06626.html more:

Wget web site broken

2004-12-03 Thread Aaron S. Hawley
There are old links to Wget on the web pointing to: http://www.gnu.org/software/wget/wget.html The FSF people have a nice symlink system for package web sites. Simply add a file called `.symlinks' to Wget's CVS web repository with the following line: index.html wget.html Or rename the file to