Re: missing files

2006-05-10 Thread Curtis Hatter
On Tuesday 09 May 2006 06:18, you wrote:
 Hi all,
 I have a problem: I'm trying to download an entire directory from a site,
 and I'm using the command wget -r -I directory_name site_name.
 It seems to work, but at a certain point it stops. I'm sure that some files
 are missing which I can download manually, and I'm sure that they
 are files like the others that wget can download. Any clue about that?
 Thanks

Have you checked to see if they have a robots.txt file that may restrict the 
download? If it does, you'll have to turn off robots with '-e robots=off' on the 
command line.
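With the command from the original post, that would look something like the
following ('directory_name' and 'site_name' are the placeholders from that
post, not real names):

```shell
# -e robots=off makes Wget ignore robots.txt restrictions;
# -r recurses, -I limits recursion to the given directory.
wget -r -e robots=off -I /directory_name http://site_name/
```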

Curtis


Re: missing files

2006-05-10 Thread Mosetti Giancarlo
Even in this case, how is it possible to discriminate between files?
I mean, why can I download some files but not others that have
similar features? It sounds strange to me... 
Thanks anyway
G


On Wednesday 10 May 2006 15:15, Curtis Hatter wrote:
 On Tuesday 09 May 2006 06:18, you wrote:
  Hi all,
  I have a problem: I'm trying to download an entire directory from a site,
  and I'm using the command wget -r -I directory_name site_name.
  It seems to work, but at a certain point it stops. I'm sure that
  some files are missing which I can download manually, and I'm sure that
  they are files like the others that wget can download. Any clue about
  that? Thanks

 Have you checked to see if they have a robots.txt file that may restrict
 the download? If it does, you'll have to turn off robots with '-e robots=off'
 on the command line.

 Curtis

-- 
For me, the cultured man is not the one who knows when Napoleon was born, but 
the one who knows where to go and look for that information in the one moment 
of his life when he needs it, and in two minutes.
Umberto Eco


Re: missing files

2006-05-10 Thread Steven M. Schweda
 [...]  Any clue about that?

   Not in your posting.  You might say which Wget version you're using,
on which sort of system, and which files are not getting fetched, and
then show the links to those files in the HTML which Wget should have
followed.  Without some actual information about what's happening
(clues), it's not possible to say much that might be useful.
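For reference, the kind of information asked for above could be gathered with
commands like these ('directory_name' and 'site_name' are still the
placeholders from the original post):

```shell
wget --version | head -n 1   # which Wget version
uname -a                     # which sort of system
# -d writes debug output: the log records every link Wget saw and
# the reason any link was rejected or not followed.
wget -d -r -I /directory_name http://site_name/ 2> wget-debug.log
```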



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: missing files

2006-05-10 Thread Curtis Hatter
On Wednesday 10 May 2006 09:28, you wrote:
 Even in this case, how is it possible to discriminate between files?
 I mean, why can I download some files but not others that have
 similar features? It sounds strange to me...
 Thanks anyway
 G

Check the link: http://www.robotstxt.org/wc/norobots-rfc.txt

It explains how one can craft a robots.txt file to keep programs like Wget or 
LWP from fetching specific documents.
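For illustration, a robots.txt along these lines (a made-up example, not the
actual site's file) would explain why some files download while similar-looking
ones don't: access is granted or denied per path prefix, as described in the
RFC draft linked above.

```
# Hypothetical robots.txt: all robots (including Wget, which
# honors robots.txt by default) must skip /docs/, except the
# /docs/public/ subtree.
User-agent: *
Disallow: /docs/
Allow: /docs/public/
```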

As Steven noted: what platform are you running on? What version of 
Wget? Which links won't Wget download?

What is the site? If the material is acceptable for a company to download, and 
it's not very large, I can try to download it and see if I can recreate your 
problem.

Curtis


Re: missing files

2006-05-10 Thread Mosetti Giancarlo
Thanks a lot, Curtis.
Unfortunately, the material is very large.
Anyway, I will check the link, and I'm already using
your suggestion. I will keep you informed.
Thanks again.
G 





On Wednesday 10 May 2006 15:41, Curtis Hatter wrote:
 On Wednesday 10 May 2006 09:28, you wrote:
  Even in this case, how is it possible to discriminate between files?
  I mean, why can I download some files but not others that have
  similar features? It sounds strange to me...
  Thanks anyway
  G

 Check the link: http://www.robotstxt.org/wc/norobots-rfc.txt

 It explains how one can craft a robots.txt file to keep programs like Wget
 or LWP from fetching specific documents.

 As Steven noted: what platform are you running on? What version of
 Wget? Which links won't Wget download?

 What is the site? If the material is acceptable for a company to download,
 and it's not very large, I can try to download it and see if I can recreate
 your problem.

 Curtis



missing files

2006-05-09 Thread Mosetti Giancarlo
Hi all,
I have a problem: I'm trying to download an entire directory from a site, 
and I'm using the command wget -r -I directory_name site_name.
It seems to work, but at a certain point it stops. I'm sure that some 
files are missing which I can download manually, and I'm sure that they are 
files like the others that wget can download. Any clue about that?
Thanks 


Re: MISSING FILES USING WGET

2001-03-08 Thread toto


As far as I understand, the problem is that the missing files are
not directly referenced in the page, but only via javascript, which
wget cannot follow.
   However, in my case, I know where the missing files are located
(they are in a subdirectory). So what I would need is another script
that could download all the files contained in a given directory, down to
a given level of subdirectory. I think that this, together with the use
of wget, would enable me to download everything required to mirror the
site.
  Do you know if this is possible? Do you know any such script?
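No separate script should be needed if the server publishes directory index
pages: Wget's own recursion, restricted to the subtree and limited in depth,
can do this. A sketch with placeholder names (it only works when the server
actually generates listings for the directory):

```shell
# Fetch everything under 'subdir' down to two levels, and never
# climb into the parent directory (-np).
wget -r -np -l2 http://site_name/subdir/
```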

(please answer to [EMAIL PROTECTED])

Thanks

Thierry Pichevin




Re: MISSING FILES USING WGET

2001-03-08 Thread Thierry Pichevin

If no link points to the
document you're interested in, then wget can't possibly know about its
existence. Unless you tell it on the command line.

And how can this be achieved? Thanks!
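For reference, URLs that Wget cannot discover can simply be listed as extra
arguments, or collected in a file and passed with -i (the names below are
placeholders):

```shell
# Name the unreferenced documents explicitly ...
wget -x http://site_name/dir/hidden1.html http://site_name/dir/hidden2.html
# ... or keep them in a file, one URL per line.
wget -x -i extra-urls.txt
```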


T. Pichevin




MISSING FILES USING WGET

2001-03-07 Thread Thierry Pichevin


Dear everybody


I am trying to use Wget to make a mirror site of:
http://www.apec.asso.fr/metiers/environnement

I used the command:
wget -r -l6 -np -k http://www.apec.asso.fr/metiers/environnement

1. Small problem: it creates a directory tree 
www.apec.asso.fr/metiers/environnement, whereas I would have expected only the 
subdirectories of 'environnement' to come in.

2. Big problem: many files don't come in: for example the file 
'environnement/directeur_environnement/temoignage.html'.
This file is normally obtained from the main page by clicking 
'directeur_environnement' (under the title "communication et mediation") and on the 
next page by clicking on 'Délégué Régional de l'Ademe Haute-Normandie' (under 
the title 'temoignage', on the right).
   Note that other files in 'environnement/directeur_environnement/' do come in... The 
missing files seem to have a common feature: they are viewed via a popup window 
when clicking on the link... is this the problem? 
   
   Please answer to [EMAIL PROTECTED]
   
   Thanks 
   
   Thierry Pichevin
   
   




Re: MISSING FILES USING WGET

2001-03-07 Thread Jan Prikryl

Quoting Thierry Pichevin ([EMAIL PROTECTED]):

 I used the command:
 wget -r -l6 -np -k http://www.apec.asso.fr/metiers/environnement
 
 1. small problem: it creates an arborescence 
 www.apec.asso.fr/metiers/environnement, whereas I would have
 expected only the subdirectories of 'environnement' to come

This is the general behaviour of wget. If you want to get just the
subdirectories, you will need to use `--cut-dirs' and
`--no-host-directories'.
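For the URL in question that might look like the following: -nH
(--no-host-directories) drops the 'www.apec.asso.fr' directory, and
--cut-dirs=2 strips the 'metiers/environnement' components, so only the
subdirectories of 'environnement' land in the current directory.

```shell
wget -r -l6 -np -k -nH --cut-dirs=2 \
     http://www.apec.asso.fr/metiers/environnement/
```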

 2. Big problem: many files don't come in: for example the file
 'environnement/directeur_environnement/temoignage.html'.  This file
 is normally obtained from the main page by clicking
 'directeur_environnement' (under the title "communication et mediation")
 and on the next page by clicking on 'Délégué Régional de l'Ademe
 Haute-Normandie' (under the title 'temoignage', on the right).  Note
 that other files in 'environnement/directeur_environnement/' do come
 in... The missing files seem to have a common feature: they are
 viewed via a popup window when clicking on the link... is this the
 problem?

These URLs are actually javascript calls. Wget ignores javascript, as
it cannot interpret it in any way. It would probably be possible to
modify wget's internal HTML parser to try some heuristic to extract
possible URLs from a `javascript:' URL, but no one has written the code
yet.   
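Pending such support in Wget itself, a crude external version of that
heuristic can be scripted: scan the saved pages for the quoted target inside
each window.open() call (a common pop-up idiom) and hand the resulting list
back to Wget with -i. A sketch, with made-up file and link names:

```shell
# Sample of the kind of markup involved (made-up content).
cat > page.html <<'EOF'
<a href="javascript:window.open('temoignage/popup1.html')">lire</a>
<a href="javascript:window.open('temoignage/popup2.html')">lire</a>
EOF

# Extract the quoted argument of each window.open() call.
grep -oE "window\.open\('[^']+'\)" page.html \
  | sed -E "s/window\.open\('([^']+)'\)/\1/" > extra-urls.txt

cat extra-urls.txt
# The extracted paths are relative, so the site's base URL would still
# need to be prepended before feeding them to: wget -x -i extra-urls.txt
```

This only catches one idiom; pages that build URLs by string concatenation in
javascript would defeat any such pattern match.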

-- jan

+--
 Jan Prikryl| vr|vis center for virtual reality and visualisation
 [EMAIL PROTECTED] | http://www.vrvis.at
+--