Re: Referrer Faking and other nifty features

2002-04-12 Thread Ian Abbott

On 12 Apr 2002 at 17:21, Thomas Lussnig wrote:

 So that if one fd becomes -1, the loader takes a new URL and initiates
 the download.
 
 And then scheduling would work with select(int, ...). What about this
 idea?

It would certainly make handling the logging output a bit of a
challenge, especially the progress indication.
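In the meantime, one way to approximate this from the outside is to run
several wget processes at once and give each one its own log file with -o,
so the progress output does not get interleaved.  A rough sketch (the URLs
and file names are only illustrative):

  # Each wget writes its progress to its own log instead of the terminal.
  wget -o part1.log http://example.com/file1.iso &
  wget -o part2.log http://example.com/file2.iso &
  wait    # the shell blocks here until both background downloads finish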



Re: Referrer Faking and other nifty features

2002-04-03 Thread Andre Majorel

On 2002-04-03 08:50 -0500, Dan Mahoney, System Admin wrote:

   1) referrer faking (i.e., wget automatically supplies a referrer
   based on the, well, referring page)
 
  It is the --referer option, see (wget)HTTP Options, from the Info
  documentation.
 
 Yes, that allows me to specify _A_ referrer, like www.aol.com.  When I'm
 trying to help my users mirror their old angelfire pages or something like
 that, very often the link has to come from the same directory.  I'd like
 to see something where when wget follows a link to another page, or
 another image, it automatically supplies the URL of the page it followed
 to get there.  Is there a way to do this?

Somebody already asked for this and AFAICT, there's no way to do
that.

   3) Multi-threading.
 
  I suppose you mean downloading several URIs in parallel.  No, wget
  doesn't support that.  Sometimes, however, one may start several wget
  in parallel, thanks to the shell (the & operator on Bourne shells).
 
 No, I mean downloading multiple files from the SAME URI in parallel,
 instead of downloading files one-by-one-by-one (thus saving time on a fast
 pipe).

This doesn't make sense to me. When downloading from a single
server, the bottleneck is generally either the server or the link;
in either case, there's nothing to gain by attempting several
simultaneous transfers. Unless there are several servers at the
same IP and the bottleneck is the server, not the link?

-- 
André Majorel URL:http://www.teaser.fr/~amajorel/
std::disclaimer (Not speaking for my employer);



Re: Referrer Faking and other nifty features

2002-04-03 Thread Tony Lewis

Andre Majorel wrote:

  Yes, that allows me to specify _A_ referrer, like www.aol.com.  When I'm
  trying to help my users mirror their old angelfire pages or something like
  that, very often the link has to come from the same directory.  I'd like
  to see something where when wget follows a link to another page, or
  another image, it automatically supplies the URL of the page it followed
  to get there.  Is there a way to do this?

 Somebody already asked for this and AFAICT, there's no way to do
 that.

Not only is it possible, it is the behavior (at least in wget 1.8.1). If you
run with -d, you will see that every GET after the first one includes the
appropriate referer.

If I execute: wget -d -r http://www.exelana.com --referer=http://www.aol.com

The first request is reported as:
GET / HTTP/1.0
User-Agent: Wget/1.8.1
Host: www.exelana.com
Accept: */*
Connection: Keep-Alive
Referer: http://www.aol.com

But, the third request is:
GET /left.html HTTP/1.0
User-Agent: Wget/1.8.1
Host: www.exelana.com
Accept: */*
Connection: Keep-Alive
Referer: http://www.exelana.com/

The second request is for robots.txt and uses the referer from the command
line.

Tony




Re: Referrer Faking and other nifty features

2002-04-02 Thread fabrice bauzac

Good morning,

Please note that I am only a wget user, so there may be errors.

On Tue, Apr 02, 2002 at 11:50:03PM -0500, Dan Mahoney, System Admin
wrote:

 1) referrer faking (i.e., wget automatically supplies a referrer
 based on the, well, referring page)

It is the --referer option, see (wget)HTTP Options, from the Info
documentation.
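For example, to supply a fixed referrer by hand (both URLs here are only
illustrative):

  # Claim that the request was reached by following a link from page.html.
  wget --referer=http://www.example.com/page.html \
       http://www.example.com/image.jpg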

 2) The regex support like in the gold package that I can no longer
 find.

No; however you may use shell globs, see (wget)Accept/Reject Options.
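For example (the host name is made up):

  # Recursive download that keeps only JPEG and PNG files; -A and -R take
  # comma-separated suffixes or shell-style patterns.
  wget -r -A '*.jpg,*.png' http://www.example.com/gallery/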

 3) Multi-threading.

I suppose you mean downloading several URIs in parallel.  No, wget
doesn't support that.  Sometimes, however, one may start several wget
in parallel, thanks to the shell (the & operator on Bourne shells).
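For example, two downloads started in parallel from a Bourne shell (the
URLs are made up):

  wget http://example.com/a.tar.gz &   # first download, in the background
  wget http://example.com/b.tar.gz &   # second download, in the background
  wait                                 # wait for both to complete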

 Also, I have in the past encountered a difficulty with the ~ being
 escaped the wrong way; has this been fixed?  I know at one point one
 site suggested you modify url.c to fix this.

AFAIK, I have never had that problem; maybe it has been fixed.

 Finally, is there a way to utilize the persistent cookie file that
 lynx generates to feed wget?

There is the --load-cookies=FILE option, see (wget)HTTP Options.
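A sketch, assuming lynx keeps its persistent cookies in ~/.lynx_cookies
(the path depends on your lynx configuration, and --load-cookies expects a
Netscape-style cookies.txt file):

  # Reuse the cookies lynx has already stored for the site.
  wget --load-cookies ~/.lynx_cookies http://members.example.com/private.html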

How to read the Info documentation: type `info wget' from a shell.
The ? key may help you.  Press `g', then type `HTTP Options', to go to
the node HTTP Options.
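With GNU info you can also open a node directly from the shell, for
example:

  info '(wget)HTTP Options'    # jump straight to the HTTP Options node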

Have a nice day.

-- 
fabrice bauzac
Software should be free.  http://www.gnu.org/philosophy/why-free.html