> > Thus it seems that it should not matter what the sequence of the
> > options is. If it does, I suggest that the developers of wget put
> > appropriate info in the manual.
>
> Yes, you're right. Anyway, I have often found that it's sometimes quite
> tricky to set up your command line to get exactly what you want.
> The way I do it always works fine for me.
Could the developers confirm whether the sequence of options matters or not?

> > The log shows that you haven't downloaded all the graphics from the main
> > page, and also that you haven't downloaded that link:
> > http://lists.feedle.net/pipermail/minerals/
>
> Well, I didn't verify it with the homepage itself. I initially tried
> without -e robots=off and got a message blocking further downloading.
> With this option I could get further access for downloading.
> I have only tried the one link from above.

I doubt it. I tried it without the option and I still did not get all the
graphics. The -p option doesn't work as it should.

> > I could try to use the -D option, but then probably everything would be
> > downloaded from lists.feedle.net despite the -np option used,
> > wouldn't it?
>
> I don't know exactly how these two options interact with each other.
> Ever tried the -m option?

Of course I tried; haven't you noticed in my previous posts?

> Very often when mirroring I use this line:
>
> wget -P work:1/ -r -l 2 -H -nc -p "http://www.xxx.xx"

This is not really proper mirroring, merely downloading.

> This would have the side effect of downloading other links recursively and
> from other hosts if there are any.

You see...

> But of course you can define a list of allowed dirs and excluded dirs.
> I never tried this though.

What's the point of mirroring if I have to define allowed and excluded
directories every time? I want to run the mirror automatically and
periodically from cron, so the options should be as general as possible:
no matter what changes are made on the site, it should still be mirrored
properly without me amending the options all the time. Of course some
definitions of directories and sites might be necessary from time to time,
but as I have shown in my correspondence here, it is not possible to define
everything in a way that makes mirroring work properly for all the web
elements and web pages of a particular site.
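For reference, the kind of unattended cron run I mean can be sketched
roughly like this. The URL, destination directory, and schedule below are
placeholders of my own, not taken from this thread, and as argued above
this combination still runs into the -p / -D / -np limitations:

```shell
#!/bin/sh
# Sketch of an unattended mirror run, meant to be called from cron.
# SITE and DEST are hypothetical placeholders.
SITE="http://www.example.org/"
DEST="$HOME/mirror"

# -m            : mirror mode (shorthand for -r -N -l inf --no-remove-listing)
# -p            : also fetch page requisites (inline images, style sheets)
# -k            : convert links for offline browsing
# -np           : do not ascend to the parent directory
# -e robots=off : ignore robots.txt, as discussed earlier in the thread
wget -m -p -k -np -e robots=off -P "$DEST" "$SITE"

# A matching crontab entry (nightly at 03:15) might look like:
#   15 3 * * *  /usr/local/bin/mirror-site.sh >/dev/null 2>&1
```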
> After all, you maybe shouldn't forget the -k option, so you can browse
> these sites offline.

I use it.

My conclusion is (and I am really sorry to say this, because I liked wget
until now): Wget sucks (for mirroring at least)! It is useful only for very
simple tasks, but when one wants to use it for mirroring sites it is almost
useless; mirroring cannot be done fully properly with Wget, as can be seen
in my previous e-mails.

Summary:

1. The -p option doesn't do what it should. It doesn't download all
   graphics regardless of where the graphics come from.
2. The -P option used together with the link-converting options doesn't
   allow the links to be properly converted (at least in the current
   stable wget).
3. The -D and -I options do not include paths (directories) in URLs.
4. The -np option should IMHO respect the paths given with the -D and -I
   options.
5. Just everything should be done to enable proper mirroring of web sites.

The multitude of options in Wget is just an illusion. In real life Wget
cannot cope with mirroring sites. It is not possible in Wget to set the
options in such a way that sites with some foreign elements (graphics) or
web pages scattered over several servers (links to different domains) are
mirrored correctly. And even if a site did not have the above problems,
the problem with proper conversion of the links would still exist.

Does anyone know any software for the linux/unix shell which would cope
with the task of proper mirroring?

a.