Re: Referrer Faking and other nifty features
On 12 Apr 2002 at 17:21, Thomas Lussnig wrote:

> So that if one fd becomes -1, the loader takes a new URL and initiates
> the download. Then the scheduling would work with select(int, ...).
> What about this idea?

It would certainly make handling the logging output a bit of a challenge, especially the progress indication.
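For the curious, here is a minimal sketch of that idea: one fd slot per transfer, select() driving the reads, and a finished slot immediately refilled from the URL queue. The start_download() and consume() helpers are hypothetical stubs, not wget code.

    /* Sketch of select()-based parallel downloads; not wget source. */
    #include <sys/select.h>
    #include <unistd.h>

    #define SLOTS 4

    /* Hypothetical stubs: open a socket for the next queued URL
       (returning -1 when the queue is empty) and handle received data. */
    static int  start_download(void) { return -1; }
    static void consume(int slot, char *buf, ssize_t len)
    { (void)slot; (void)buf; (void)len; }

    static void download_loop(void)
    {
        int fd[SLOTS];
        char buf[4096];

        for (int i = 0; i < SLOTS; i++)
            fd[i] = start_download();

        for (;;) {
            fd_set rfds;
            int maxfd = -1, active = 0;

            FD_ZERO(&rfds);
            for (int i = 0; i < SLOTS; i++) {
                if (fd[i] == -1)
                    continue;
                FD_SET(fd[i], &rfds);
                if (fd[i] > maxfd)
                    maxfd = fd[i];
                active++;
            }
            if (active == 0)
                break;                       /* queue drained, all slots idle */

            if (select(maxfd + 1, &rfds, NULL, NULL, NULL) <= 0)
                continue;

            for (int i = 0; i < SLOTS; i++) {
                if (fd[i] == -1 || !FD_ISSET(fd[i], &rfds))
                    continue;
                ssize_t n = read(fd[i], buf, sizeof buf);
                if (n > 0)
                    consume(i, buf, n);      /* data for transfer in slot i */
                else {                       /* EOF or error: slot is free */
                    close(fd[i]);
                    fd[i] = start_download();
                }
            }
        }
    }

    int main(void) { download_loop(); return 0; }

The logging trouble shows up right in consume(): with several slots progressing at once, the per-file progress bars would have to be multiplexed somehow.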
Re: Referrer Faking and other nifty features
On 2002-04-03 08:50 -0500, Dan Mahoney, System Admin wrote:

> > > 1) referrer faking (i.e., wget automatically supplies a referrer
> > > based on the, well, referring page)
> >
> > It is the --referer option, see (wget)HTTP Options, from the Info
> > documentation.
>
> Yes, that allows me to specify _A_ referrer, like www.aol.com. When I'm
> trying to help my users mirror their old angelfire pages or something
> like that, very often the link has to come from the same directory. I'd
> like to see something where, when wget follows a link to another page
> or another image, it automatically supplies the URL of the page it
> followed to get there. Is there a way to do this?

Somebody already asked for this and AFAICT, there's no way to do that.

> > > 3) Multi-threading.
> >
> > I suppose you mean downloading several URIs in parallel. No, wget
> > doesn't support that. Sometimes, however, one may start several wget
> > processes in parallel, thanks to the shell (the & operator on Bourne
> > shells).
>
> No, I mean downloading multiple files from the SAME URI in parallel,
> instead of downloading files one-by-one-by-one (thus saving time on a
> fast pipe).

This doesn't make sense to me. When downloading from a single server, the bottleneck is generally either the server or the link; in either case, there's nothing to be gained by attempting several simultaneous transfers. Unless there are several servers at the same IP and the bottleneck is the server, not the link?

-- 
André Majorel <URL:http://www.teaser.fr/~amajorel/>
std::disclaimer ("Not speaking for my employer");
Re: Referrer Faking and other nifty features
Andre Majorel wrote:

> > Yes, that allows me to specify _A_ referrer, like www.aol.com. When
> > I'm trying to help my users mirror their old angelfire pages or
> > something like that, very often the link has to come from the same
> > directory. I'd like to see something where when wget follows a link
> > to another page, or another image, it automatically supplies the URL
> > of the page it followed to get there. Is there a way to do this?
>
> Somebody already asked for this and AFAICT, there's no way to do that.

Not only is it possible, it is the behavior (at least in wget 1.8.1). If you run with -d, you will see that every GET after the first one includes the appropriate referer.

If I execute:

    wget -d -r http://www.exelana.com --referer=http://www.aol.com

the first request is reported as:

    GET / HTTP/1.0
    User-Agent: Wget/1.8.1
    Host: www.exelana.com
    Accept: */*
    Connection: Keep-Alive
    Referer: http://www.aol.com

but the third request is:

    GET /left.html HTTP/1.0
    User-Agent: Wget/1.8.1
    Host: www.exelana.com
    Accept: */*
    Connection: Keep-Alive
    Referer: http://www.exelana.com/

The second request is for robots.txt and uses the referer from the command line.

Tony
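In other words, the recursion just threads each page's URL through as the Referer for the requests it spawns. Here is a tiny self-contained sketch of that idea, with a fake fetch() and canned links standing in for the real HTTP and HTML machinery (hypothetical names, not wget's actual source):

    #include <stdio.h>
    #include <string.h>

    struct page { const char *url; const char **links; int nlinks; };

    /* Fake fetch(): prints the request it would make, then returns a
       canned link structure instead of really parsing HTML. */
    static struct page *fetch(const char *url, const char *referer)
    {
        static const char *root_links[] = { "http://www.exelana.com/left.html" };
        static struct page root = { "http://www.exelana.com/", root_links, 1 };
        static struct page leaf = { "http://www.exelana.com/left.html", NULL, 0 };

        printf("GET %s\nReferer: %s\n\n", url, referer ? referer : "(none)");

        if (strcmp(url, root.url) == 0) return &root;
        if (strcmp(url, leaf.url) == 0) return &leaf;
        return NULL;
    }

    static void retrieve_tree(const char *url, const char *referer)
    {
        struct page *p = fetch(url, referer);   /* request carries referer */
        for (int i = 0; p && i < p->nlinks; i++)
            retrieve_tree(p->links[i], p->url); /* links inherit THIS page */
    }

    int main(void)
    {
        /* The start URL uses the command-line referer, as in the -d log. */
        retrieve_tree("http://www.exelana.com/", "http://www.aol.com");
        return 0;
    }

Running it prints the same pattern as the debug output above: the first request carries http://www.aol.com and the request for left.html carries http://www.exelana.com/.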
Re: Referrer Faking and other nifty features
Good morning,

Please note that I am just a wget user, so there may be errors.

On Tue, Apr 02, 2002 at 11:50:03PM -0500, Dan Mahoney, System Admin wrote:

> 1) referrer faking (i.e., wget automatically supplies a referrer based
> on the, well, referring page)

It is the --referer option, see (wget)HTTP Options, from the Info documentation.

> 2) The regex support like in the gold package that I can no longer
> find.

No; however, you may use shell globs, see (wget)Accept/Reject Options.

> 3) Multi-threading.

I suppose you mean downloading several URIs in parallel. No, wget doesn't support that. Sometimes, however, one may start several wget processes in parallel, thanks to the shell (the & operator on Bourne shells).

> Also, I have in the past encountered a difficulty with the ~ being
> escaped the wrong way, has this been fixed? I know at one point one
> site suggested you modify url.c to fix this.

AFAIK, I have never had that problem; maybe it has been fixed.

> Finally, is there a way to utilize the persistent cookie file that
> lynx generates to feed wget?

There is the --load-cookies=FILE option, see (wget)HTTP Options.

How to read the Info documentation: type "info wget" from a shell. The "?" key may help you. Use the "g" command followed by "HTTP Options" to go to the node HTTP Options.

Have a nice day.

-- 
fabrice bauzac
Software should be free.  http://www.gnu.org/philosophy/why-free.html
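Incidentally, the persistent cookie file lynx writes is the old Netscape cookies.txt format: one cookie per line, seven tab-separated fields (domain, subdomain flag, path, secure flag, expiry, name, value), with '#' comment lines. That is the layout --load-cookies expects. A minimal reader for it, just to illustrate the format (this is not wget's own parser):

    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        FILE *fp = fopen(argc > 1 ? argv[1] : "cookies.txt", "r");
        if (fp == NULL) { perror("fopen"); return 1; }

        char line[4096];
        while (fgets(line, sizeof line, fp)) {
            if (line[0] == '#' || line[0] == '\n')
                continue;                    /* comment or blank line */

            /* Split into the seven tab-separated fields.  (strtok
               collapses empty fields; good enough for a sketch.) */
            char *field[7];
            int n = 0;
            for (char *p = strtok(line, "\t\n"); p && n < 7;
                 p = strtok(NULL, "\t\n"))
                field[n++] = p;
            if (n != 7)
                continue;                    /* malformed line, skip it */

            printf("domain=%s path=%s expires=%s  %s=%s\n",
                   field[0], field[2], field[4], field[5], field[6]);
        }
        fclose(fp);
        return 0;
    }

Point it at the file lynx saves its cookies to and you should see one line per stored cookie; wget reads that same file directly via --load-cookies=FILE.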