> > Thus it seems that it should not matter what the sequence of the 
> > options is. If it does, I suggest that the developers of wget place 
> > appropriate info in the manual.
> 
> Yes, you are right. Anyway, I have often found that it is quite tricky
> to set up your command line to get exactly what you want.
> The way I do it always works fine for me.

Could the developers confirm whether the sequence of options matters or not?

> > The log shows that you haven't downloaded all the graphics from the main 
> > page, and also that you haven't downloaded this link:
> > http://lists.feedle.net/pipermail/minerals/
> 
> Well, I didn't verify it with the homepage itself. I initially tried without
> -e robots=off and got a message blocking further downloading.
> 
> With this option I could achieve further access for downloading.
> I have only tried the one link from above.

I doubt it. I tried it without the option and I did not get all the 
graphics. The -p option doesn't work as it should.

> > I could try to use the -D option, but then probably everything would be 
> > downloaded from the lists.feedle.net despite the -np option used, 
> > wouldn't it? 
> 
> I don't know exactly how these two options interact with each other.
> Ever tried the -m option?

Of course I tried it; haven't you noticed that in my previous posts?

> Very often when mirroring I use this line:
> 
> wget -P work:1/ -r -l 2 -H -nc -p "http://www.xxx.xx"

This is not really proper mirroring, merely downloading.
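
To spell out the difference, here is a minimal sketch that only prints 
the two command lines side by side and downloads nothing. The URL is the 
placeholder from the quoted line, reading the stray "p" there as -p is my 
assumption, and the variable names are made up for illustration. 
According to the manual, -m is currently shorthand for 
-r -N -l inf --no-remove-listing.

```shell
#!/bin/sh
# Placeholder URL taken from the quoted command line.
URL="http://www.xxx.xx"

# Quoted command: recursion limited to depth 2 (-r -l 2), spanning
# hosts (-H), never re-fetching files that already exist (-nc).
download_cmd="wget -P work:1/ -r -l 2 -H -nc -p $URL"

# Mirroring: -m turns on recursion with unlimited depth plus
# time-stamping, so changed files are re-fetched on later runs.
mirror_cmd="wget -m -p $URL"

# Print both command lines for comparison; nothing is downloaded.
echo "$download_cmd"
echo "$mirror_cmd"
```

The point is that -nc with a fixed depth never refreshes files already on 
disk, while time-stamping is what makes repeated runs behave like a 
mirror.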

> This would have the side effect of downloading other links recursively,
> including from other hosts if there are any.

You see...

> But of course you can define a list of allowed dirs and excluded dirs.
> I never tried this though.

What's the point of mirroring if I have to define allowed and excluded 
directories every time? 
I want to run the mirror automatically and periodically from cron, and 
therefore the options should be as general as possible, so that no matter 
what changes are made on the site I would still have the site properly 
mirrored without amending the options all the time.
Of course, some definitions of directories and sites might be 
necessary from time to time, but as I have shown in my correspondence 
here, it is not possible to define everything in such a way that mirroring 
works properly for all the web elements and web pages on a particular site.
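
To make concrete what I mean by options that are as general as possible, 
here is a minimal sketch of a cron-friendly wrapper. The site URL, the 
target directory and the DRY_RUN switch are assumptions of mine and not 
something from this thread; the wget options are the ones discussed here 
(-m, -np, -p, -k, -P). It is a sketch of the intent, not a claim that it 
mirrors every site correctly.

```shell
#!/bin/sh
# Assumed values (not from the thread): override via the environment.
SITE_URL="${SITE_URL:-http://www.example.org/}"
MIRROR_DIR="${MIRROR_DIR:-$HOME/mirror}"

# Generic option set:
#   -m   mirror (recursion plus time-stamping)
#   -np  never ascend to the parent directory
#   -p   also fetch page requisites such as images
#   -k   convert links for offline browsing
#   -P   directory prefix under which files are saved
set -- -m -np -p -k -P "$MIRROR_DIR" "$SITE_URL"

# DRY_RUN defaults to 1 so that a test run only prints the command.
if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "wget $*"
else
    wget "$@"
fi
```

A crontab entry such as "0 3 * * * DRY_RUN=0 /path/to/mirror.sh" (the 
path is hypothetical) would then run it nightly without any per-site 
directory lists.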

> After all, you probably shouldn't forget the -k option, so you can browse
> these sites offline.

I use it.

My conclusion is (and I am really sorry to say this, because I liked wget 
until now): 
Wget sucks (for mirroring at least)!

It is useful only for very simple tasks, but when one wants to use it for 
mirroring sites it is almost useless. Mirroring cannot be done fully 
properly with Wget, as can be seen in my previous e-mails.

Summary:
1. The -p option doesn't do what it should be doing. It should download 
all graphics, whatever their source, and it does not.
2. The -P option, used together with the link-converting options, doesn't 
allow the links to be properly converted (at least in the current stable 
wget).
3. The -D and -I options do not handle paths (directories) in URLs. 
4. The -np option should, IMHO, respect the paths given to the -D and -I 
options.
5. Everything possible should be done to enable proper mirroring of web 
sites.

The multitude of options in Wget is just an illusion. In real life Wget 
cannot cope with mirroring sites. It is not possible to set Wget's options 
in such a way that sites with some foreign elements (graphics) or web pages 
scattered over several servers (links to different domains) are mirrored 
correctly. And even if a site did not have the above problems, the problem 
with proper conversion of the links would still exist.

Does anyone know of any software for the linux/unix shell that would be up 
to the task of proper mirroring?

a.
