Quoting Thierry Pichevin ([EMAIL PROTECTED]):

> I used the command:
> wget -r -l6 -np -k http://www.apec.asso.fr/metiers/environnement
> 
> 1. small problem: it creates an arborescence 
> www.apec.asso.fr/metiers/environnement, whereas I would have
> expected only the subdirectories of 'environnement' to come

This is the general behaviour of wget. If you want to get just the
subdirectories, you will need to use `--cut-dirs' and
`--no-host-directories'.
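As a hedged sketch (the exact --cut-dirs count depends on how many
leading path components you want dropped), the quoted command could be
rewritten like this: `-nH' (short for `--no-host-directories') removes
the www.apec.asso.fr/ directory, and `--cut-dirs=2' strips the two
leading path components metiers/environnement, so retrieved files land
directly under the current directory:

```shell
# -nH         : do not create the www.apec.asso.fr/ host directory
# --cut-dirs=2: strip "metiers/environnement" from local paths
wget -r -l6 -np -k -nH --cut-dirs=2 \
    http://www.apec.asso.fr/metiers/environnement/
```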

> 2. Big problem: many files don't come in: for example the file
> 'environnement/directeur_environnement/temoignage.html'.  This file
> is normally reached from the main page by clicking
> 'directeur_environnement' (under the title "communication et mediation")
> and on the next page by clicking 'Délégué Régional de l'Ademe
> Haute-Normandie' (under the title 'temoignage', on the right).  Note
> that other files in 'environnement/directeur_environnement/' do come
> in... The missing files seem to have a common feature: they are
> viewed via a popup window when clicking the link... is this the
> problem?

These URLs are actually javascript calls. Wget ignores javascript, as
it cannot interpret it in any way. It would probably be possible to
modify wget's internal HTML parser to apply some heuristic to extract
candidate URLs from a `javascript:' URL, but no one has written the
code yet.
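To illustrate, such a heuristic might pull quoted strings out of the
`javascript:' URL and keep the ones that look like page paths. This is
only a rough sketch of the idea, not anything in wget; the function
name and the popup call in the example are made up:

```python
import re

def extract_candidate_urls(js_url):
    # Hypothetical heuristic: collect every single- or double-quoted
    # string inside the javascript: URL ...
    candidates = re.findall(r"['\"]([^'\"]+)['\"]", js_url)
    # ... and keep only those that look like document paths.
    return [c for c in candidates
            if re.search(r"\.(html?|php|asp)\b", c)]

# A popup link of the kind the original poster describes:
print(extract_candidate_urls(
    "javascript:openWin('directeur_environnement/temoignage.html')"))
# ['directeur_environnement/temoignage.html']
```

Real pages would of course defeat a regexp like this easily (URLs built
by string concatenation, for instance), which is presumably why nobody
has bothered to implement it.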

-- jan

--------------------+------------------------------------------------------
 Jan Prikryl        | vr|vis center for virtual reality and visualisation
 <[EMAIL PROTECTED]> | http://www.vrvis.at
--------------------+------------------------------------------------------
