Re: wget problem

Rajesh Thu, 03 Jul 2003 18:13:57 -0700

Hi Tony,

Thanks for your reply. I have tried using the command wget 
--user-agent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)", but it didn't 
work.

I have one more question. In each directory on the main server I have a 
welcome.cfm file (the DirectoryIndex order is welcome.cfm welcome.htm 
welcome.html index.html). But when I run wget on the mirror server, it saves 
welcome.cfm on the mirror server under the name index.html.

Why does it change the file name from welcome.cfm to index.html?
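
One likely explanation, offered as a sketch and not as a description of wget's 
actual source: when a link points at a directory URL ending in "/", the server 
picks welcome.cfm through DirectoryIndex but normally does not tell the client 
that name, so wget falls back to index.html. The function local_name and the 
www.example.com URLs below are made up for illustration, and the logic ignores 
query strings and other special cases.

```shell
# Simplified illustration (NOT wget's real code) of how a local file name
# can be derived from the URL that was requested.
local_name() {
    case "$1" in
        */) echo "index.html" ;;   # directory URL: no file name to reuse
        *)  echo "${1##*/}" ;;     # otherwise keep the last path component
    esac
}

# Hypothetical URLs, used only to show the two cases.
local_name "http://www.example.com/about/"             # prints index.html
local_name "http://www.example.com/about/welcome.cfm"  # prints welcome.cfm
```

Newer wget releases document a --default-page=NAME option that changes this 
fallback name; whether the installed version has it can be checked with 
wget --help.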

How can I mirror a web site using scp? So far I can only copy one file at a 
time with it.
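
For reference, scp can copy a whole directory tree with -r. A minimal sketch, 
assuming a login www-admin on a host main.example.org and a document root of 
/var/www/html (all three are placeholders, not the real setup). By default the 
script only prints the command; set DRY_RUN to an empty value to run it.

```shell
# Placeholder values (assumptions): adjust to the real host and paths.
SRC_HOST="www-admin@main.example.org"
SRC_DIR="/var/www/html"
DEST_PARENT="/var/www"

# Dry run unless the caller explicitly sets DRY_RUN to an empty value.
DRY_RUN=${DRY_RUN-1}

mirror_with_scp() {
    # -r copies directories recursively; -p preserves times and modes.
    # The source directory is recreated inside DEST_PARENT.
    ${DRY_RUN:+echo} scp -r -p "${SRC_HOST}:${SRC_DIR}" "${DEST_PARENT}/"
}

mirror_with_scp
```

Note that, according to the scp manual page, scp -r follows symbolic links it 
meets while walking the tree, so the mirror receives copies of the link 
targets and not the links themselves.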

Thanks,
Rajesh.

>From: "Tony Lewis" <[EMAIL PROTECTED]>
>To: "Rajesh" <[EMAIL PROTECTED]>, <[EMAIL PROTECTED]>, <[EMAIL PROTECTED]>
>Subject: Re: wget problem
>Date: Thu, 3 Jul 2003 07:46:33 -0700
>
>Rajesh wrote:
>
>> Wget is not mirroring the web site properly. For example, it is not copying
>> symbolic links from the main web server. The target directories do exist on
>> the mirror server.
>
>wget can only mirror what can be seen from the web. Symbolic links will be
>treated as hard references (assuming that some web page points to them).
>
>If you cannot get there from http://www.sl.nsw.gov.au/ via your browser,
>wget won't get the page.
>
>Also, some servers change their behavior depending on the client. You may
>need to use a user agent that looks like a browser to mirror some sites. For
>example:
>
>wget --user-agent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
>
>will make it look like wget is really Internet Explorer running on Windows
>XP.
>
>> Another problem is that some of the files are different on the mirror web
>> server.
>> her you again. For eg: compare these 2 attached files.....
>>
>> penrith1.cfm is the file after wget copied from the main server.
>> penrith1.cfm.org is the actual file sitting on the main server.
>
>wget is storing what the web server returned, which may or may not be the
>precise file stored on your system.
>
>In particular, I notice that penrith1.cfm contains "<!--Requested: 17:30:40
>Thursday 3 July 2003 -->". That implies that all or part of the output is
>generated programmatically.
>
>You might try using wget to replicate an FTP version of the website.
>
>Then again, perhaps wget is the wrong tool for your task. Have you
>considered using secure copy (scp) instead?
>
>HTH,
>
>Tony
>

Unix System Administrator
State Library of NSW
Macquarie Street
Sydney - 2000

Email: [EMAIL PROTECTED]
Ph: 02-92731711

====================================
This email and any attachments to it are privileged and confidential. If you
are not the intended recipient, please notify the sender and delete it. The
contents of this email are not given or endorsed by the State Library of New
South Wales unless otherwise indicated by an authorised officer of the
Library. Copyright law may also apply to the contents of this email.
====================================
