-q and -S are incompatible
-q and -S are incompatible; they should perhaps produce an error when combined, and this should be noted in the docs. BTW, there seems to be no way to get the -S output without a progress indicator: both -nv and -q kill them both. P.S. One shouldn't have to confirm each bug submission; once should be enough.
Wget 1.9-beta4 is available for testing
Several bugs fixed since beta3, including a fatal one on Windows. Includes a working Windows implementation of run_with_timeout. Get it from: http://fly.srk.fer.hr/~hniksic/wget/wget-1.9-beta4.tar.gz
Using chunked transfer for HTTP requests?
As I was writing the manual for `--post', I decided that I wasn't happy with this part:

    Please be aware that Wget needs to know the size of the POST data in advance. Therefore the argument to @code{--post-file} must be a regular file; specifying a FIFO or something like @file{/dev/stdin} won't work.

My first impulse was to bemoan Wget's antiquated HTTP code, which doesn't understand "chunked" transfer. But, come to think of it, even if Wget used HTTP/1.1, I don't see how a client can send chunked requests and interoperate with HTTP/1.0 servers.

The thing is, to be certain that you can use chunked transfer, you have to know you're dealing with an HTTP/1.1 server. But you can't know that until you receive a response, and you don't get a response until you've finished sending the request. A chicken-and-egg problem!

Of course, once a response is received, we could remember that we're dealing with an HTTP/1.1 server, but that information is all but useless, since Wget's `--post' is typically used to POST information to one URL and exit. Is there a sane way to stream data to HTTP/1.0 servers that expect POST?
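The interoperability problem is easier to see with the framing spelled out. A chunked request body is a series of hex-length-prefixed chunks terminated by a zero-length chunk; an HTTP/1.0 server that doesn't know the framing would read the chunk-size lines as part of the POST data. A minimal sketch of the encoding (illustrative only, not Wget's code):

```python
def chunk_encode(data: bytes, chunk_size: int = 4096) -> bytes:
    """Frame `data` as an HTTP/1.1 chunked request body."""
    out = bytearray()
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        out += b"%x\r\n" % len(chunk)   # chunk size in hex, then CRLF
        out += chunk + b"\r\n"          # chunk data, then CRLF
    out += b"0\r\n\r\n"                 # zero-length chunk terminates the body
    return bytes(out)
```

Sending such a body to a server that only speaks HTTP/1.0 means the literal bytes `5\r\nhello\r\n...` end up in the POSTed data, which is exactly why the client must know the server's version before choosing this framing.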
Re: Web page "source" using wget?
Hrvoje Niksic wrote:
> Wget 1.9 can send POST data.

That slipped past me. :-) Just in case I'm not the only one in that boat... From NEWS:

** It is now possible to specify that the POST method be used for HTTP requests. For example, `wget --post-data="id=foo&data=bar" URL' will send a POST request with the specified contents.

From wget --help, HTTP options:
  --post-data=STRING    use the POST method; send STRING as the data.
  --post-file=FILE      use the POST method; send contents of FILE.
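For reference, the reason `--post-file` needs a regular file (as discussed elsewhere in this thread) is that an HTTP/1.0-compatible POST carries a Content-Length header computed from the entire body up front. A rough sketch of the wire format (a hypothetical helper, not Wget source):

```python
def build_post_request(host, path, data):
    """Build a minimal HTTP/1.0-style POST request with Content-Length."""
    body = data.encode("ascii")
    headers = (
        f"POST {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        f"Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"   # requires knowing the full body size
        f"\r\n"
    )
    return headers.encode("ascii") + body
```

With a FIFO or /dev/stdin the body size isn't known until EOF, so the Content-Length line can't be written before the data is sent.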
Re: Web page "source" using wget?
"Suhas Tembe" <[EMAIL PROTECTED]> writes:

> Hello Everyone,
>
> I am new to this wget utility, so pardon my ignorance.. Here is a
> brief explanation of what I am currently doing:
>
> 1). I go to our customer's website every day & log in using a User Name & Password.
> 2). I click on 3 links before I get to the page I want.
> 3). I right-click on the page & choose "view source". It opens it up in Notepad.
> 4). I save the "source" to a file & subsequently perform various tasks on that file.
>
> As you can see, it is a manual process. What I would like to do is
> automate this process of obtaining the "source" of a page using
> wget. Is this possible? Maybe you can give me some suggestions.

It's possible; in fact it's what Wget does in its most basic form. Disregarding authentication, the recipe would be:

1) Write down the URL.
2) Type `wget URL' and you get the source of the page in a file named SOMETHING.html, where SOMETHING is the file name that the URL ends with.

Of course, you will also have to specify the credentials to the page, and Tony explained how to do that.
Re: Problems and suggestions
Thanks for the report; I agree with your suggestions. IIRC, the "no such file or directory" is a kludge that comes from the inability of the FTP code to distinguish between different kinds of errors in RETR. In fact, Wget's FTP code is old and somewhat crufty, although it works well in practice. The problems will probably be fixed in a later release, when I get around to revamping the backend implementations. Patches that fix the problems in the current codebase are very likely to be accepted.
Re: Web page "source" using wget?
"Tony Lewis" <[EMAIL PROTECTED]> writes:

> wget http://www.custsite.com/some/page.html --http-user=USER --http-passwd=PASS
>
> If you supply your user ID and password via a web form, it will be
> tricky (if not impossible) because wget doesn't POST forms (unless
> someone added that option while I wasn't looking. :-)

Wget 1.9 can send POST data. But there's a simpler way to handle web sites that use cookies for authorization: make Wget use the site's own cookie. Export cookies as explained in the manual, and specify:

    wget --load-cookies=COOKIE-FILE http://...

Here is an excerpt from the manual section that explains how to export cookies.

`--load-cookies FILE'
     Load cookies from FILE before the first HTTP retrieval. FILE is a textual file in the format originally used by Netscape's `cookies.txt' file.

     You will typically use this option when mirroring sites that require that you be logged in to access some or all of their content. The login process typically works by the web server issuing an HTTP cookie upon receiving and verifying your credentials. The cookie is then resent by the browser when accessing that part of the site, and so proves your identity.

     Mirroring such a site requires Wget to send the same cookies your browser sends when communicating with the site. This is achieved by `--load-cookies'--simply point Wget to the location of the `cookies.txt' file, and it will send the same cookies your browser would send in the same situation. Different browsers keep textual cookie files in different locations:

     Netscape 4.x.
          The cookies are in `~/.netscape/cookies.txt'.

     Mozilla and Netscape 6.x.
          Mozilla's cookie file is also named `cookies.txt', located somewhere under `~/.mozilla', in the directory of your profile. The full path usually ends up looking somewhat like `~/.mozilla/default/SOME-WEIRD-STRING/cookies.txt'.

     Internet Explorer.
          You can produce a cookie file Wget can use by using the File menu, Import and Export, Export Cookies. This has been tested with Internet Explorer 5; it is not guaranteed to work with earlier versions.

     Other browsers.
          If you are using a different browser to create your cookies, `--load-cookies' will only work if you can locate or produce a cookie file in the Netscape format that Wget expects.

If you cannot use `--load-cookies', there might still be an alternative. If your browser supports a "cookie manager", you can use it to view the cookies used when accessing the site you're mirroring. Write down the name and value of the cookie, and manually instruct Wget to send those cookies, bypassing the "official" cookie support:

    wget --cookies=off --header "Cookie: NAME=VALUE"
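For the "write down the name and value" fallback, it helps to know that the Netscape cookies.txt format is seven tab-separated fields per line (domain, flag, path, secure, expiry, name, value), with #-prefixed comments. A small sketch that turns such a file into a Cookie header value (my own helper for illustration, not part of Wget):

```python
def cookie_header(cookies_txt, host):
    """Build a "Cookie:" header value from Netscape-format cookies.txt text,
    keeping only cookies whose domain matches `host`."""
    pairs = []
    for line in cookies_txt.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip comments and blank lines
        fields = line.split("\t")
        if len(fields) != 7:
            continue                      # not a valid cookie line
        domain, _flag, _path, _secure, _expiry, name, value = fields
        if host.endswith(domain.lstrip(".")):
            pairs.append(f"{name}={value}")
    return "; ".join(pairs)
```

The resulting string is exactly what you would pass via `--header "Cookie: ..."` when bypassing the built-in cookie support.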
Re: Web page "source" using wget?
Suhas Tembe wrote:
> 1). I go to our customer's website every day & log in using a User Name &
> Password.
[snip]
> 4). I save the "source" to a file & subsequently perform various tasks on
> that file.
>
> What I would like to do is automate this process of obtaining the "source"
> of a page using wget. Is this possible?

That depends on how you enter your user name and password. If it's via an HTTP user ID and password, that's pretty easy:

    wget http://www.custsite.com/some/page.html --http-user=USER --http-passwd=PASS

If you supply your user ID and password via a web form, it will be tricky (if not impossible) because wget doesn't POST forms (unless someone added that option while I wasn't looking. :-)

Tony
Problems and suggestions
I'm a big fan of wget. I've been using it for quite a while now, and am now testing 1.9beta3 on win2k. First of all, I'd like to suggest a couple of things:

# It should be possible to tell wget to ignore a couple of errors:
  FTPLOGINC     // FTPs often give out this error when they're full. I want it to keep trying
  CONREFUSED    // the FTP may be temporarily down
  FTPLOGREFUSED // the FTP may be full
  FTPSRVERR     // freakish errors happen every once in a while

# If I tell it to download files from a list and a download fails, it should still obey the waitretry timeout, as was pointed out by someone else earlier (I don't have the time right now to go look for the post).

And now for the problem: apparently, when wget has a problem during a transfer in win32 and dies, it then starts saying "failed: No such file or directory". This has happened to me on HTTP; I have to check to see what the real error is (I'm guessing CONREFUSED), but it shouldn't be giving this error anyway.

Thanks for everything ;)
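The retry behavior requested above could look something like this: treat a fixed set of error codes as transient and keep retrying with a capped backoff, the way --waitretry works for other errors. A sketch under those assumptions (the error names are the ones from the report, not real Wget identifiers):

```python
import time

# Hypothetical transient-error set, mirroring the codes named in the report.
TRANSIENT = {"FTPLOGINC", "CONREFUSED", "FTPLOGREFUSED", "FTPSRVERR"}

def fetch_with_retries(fetch, url, tries=5, waitretry=10):
    """Retry fetch(url) on transient errors with linear backoff capped at
    `waitretry` seconds. fetch() returns None on success, an error code string
    on failure. Returns True on success, False on giving up."""
    for attempt in range(1, tries + 1):
        err = fetch(url)
        if err is None:
            return True
        if err not in TRANSIENT:
            return False                         # permanent error: give up now
        if attempt < tries:
            time.sleep(min(attempt, waitretry))  # 1s, 2s, ... capped at waitretry
    return False
```

The key point is the distinction between transient codes (retried) and everything else (failed immediately), which is exactly what the first suggestion asks wget to make configurable.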
Web page "source" using wget?
Hello Everyone,

I am new to this wget utility, so pardon my ignorance. Here is a brief explanation of what I am currently doing:

1). I go to our customer's website every day & log in using a User Name & Password.
2). I click on 3 links before I get to the page I want.
3). I right-click on the page & choose "view source". It opens it up in Notepad.
4). I save the "source" to a file & subsequently perform various tasks on that file.

As you can see, it is a manual process. What I would like to do is automate this process of obtaining the "source" of a page using wget. Is this possible? Maybe you can give me some suggestions. Thanks in advance.

Suhas
New win binary was (RE: Compilation breakage in html-parse.c)
> From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
>
> This might be one cause for compilation breakage in html-parse.c.
> It's a Gcc-ism/c99-ism/c++-ism, depending on how you look at it, fixed
> by this patch:
>
> 2003-10-03  Hrvoje Niksic  <[EMAIL PROTECTED]>
>
>         * html-parse.c (convert_and_copy): Move variable declarations
>         before statements.

Either this or another patch resolved it - I didn't have time to track it down for good. I didn't even read the ChangeLog, just did a quick export, make, minimal test, and put it up on the site.

New msvc binary from current cvs at http://xoomer.virgilio.it/hherold (yes, the ISP decided to change the URL. Old URLs do still work).

Heiko
--
-- PREVINET S.p.A. www.previnet.it
-- Heiko Herold [EMAIL PROTECTED]
-- +39-041-5907073 ph
-- +39-041-5907472 fax
RE: Bug in Windows binary?
> From: Gisle Vanem [mailto:[EMAIL PROTECTED]
>
> "Jens Rösner" <[EMAIL PROTECTED]> said:
> ...
> I assume Heiko didn't notice it because he doesn't have that function
> in his kernel32.dll. Heiko and Hrvoje, will you correct this ASAP?
>
> --gv

Probably. Currently I'm compiling and testing on NT 4.0 only. Besides that, I'm VERY tight on time at the moment, so testing usually means "does it run? Does it download one sample http and one https site? Yes? Put it up for testing!".

Heiko
Re: no-clobber add more suffix
Hi Sergey!

-nc does not only apply to .htm(l) files; all files are considered, at least in all wget versions I know of. I cannot comment on your suggestion to restrict -nc to a user-specified list of file types. I personally don't need it, but I could imagine certain situations where this could indeed be helpful. Hopefully someone with more knowledge than me can elaborate a bit more on this :)

CU
Jens

> `--no-clobber' is very usfull option, but i retrive document not only with
> .html/.htm suffix.
>
> Make addition option that like -A/-R define all allowed/rejected rules
> for -nc option.
Re: can wget disable HTTP Location Forward ?
There is currently no way to disable following redirects. A patch to do so has been submitted recently, but I didn't see a good reason why one would need it, so I didn't add the option. Your mail is a good argument, but I don't know how prevalent that behavior is. What is it with servers that can't be bothered to return 404? Are there lots of them nowadays? Is a new default setting of Apache or IIS to blame, or are people intentionally screwing up their configurations?
Re: subscribe wget
To subscribe to this list, please send mail to <[EMAIL PROTECTED]>.
can wget disable HTTP Location Forward ?
hello bug-wget,

When I use wget to grab a page that does not exist, the server usually returns a location redirect to its home page (instead of a 404 error), and wget then follows this link, which causes an unnecessary HTTP request. So my question is: which option should I use to disable this default behavior? Thanks.

Regards,
fwei [EMAIL PROTECTED] 2003-10-06
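Until such an option exists, a script driving wget (or any HTTP client) can implement the behavior itself: issue the request and, if the response is a redirect pointing back at the site's home page, treat it as a miss instead of following it. A sketch of that decision (a hypothetical helper, not a wget feature):

```python
def treat_as_missing(status, location, homepage):
    """Return True if a redirect response points back at the site's
    home page, i.e. the server is likely masking a 404 with a redirect."""
    redirect_codes = (301, 302, 303, 307)
    if status not in redirect_codes or not location:
        return False
    # Compare ignoring a trailing slash difference.
    return location.rstrip("/") == homepage.rstrip("/")
```

A wrapper script would check the Location header from the initial response against the known home page URL before letting the download proceed.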
no-clobber add more suffix
`--no-clobber' is a very useful option, but I retrieve documents not only with the .html/.htm suffix. Please add an option that, like -A/-R, defines the allowed/rejected rules for -nc.