Hi,


  I am using wget version 1.10.2 to crawl a secured site that we are
developing for our customer, and I noticed two things. First, wget is not
downloading all of the binaries on the website: it downloads about 30% of
them and then skips the rest of the documents. Second, I don't see any log
file with an error message saying it was unable to download anything during
spidering. I am not sure I am doing the right thing; could you take a look
at the following .wgetrc file and the command line I run and let me know?



.wgetrc

-------------------



exclude_directories = /ascp/commerce/catalog,/ascp/commerce/checkout,/ascp/commerce/user,/ascp/commerce/common,/ascp/commerce/javascript,/ascp/commerce/css

include_directories = /ascp/commerce,/ascp/commerce/scp/downloads

dir_prefix=\spiderfiles\ascpProd\wget

domains=www.mysite.com

no_parent=on

secure_protocol=SSLv3
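
If it would help with diagnosing this, I can also add logging directives to
the same .wgetrc so that skipped URLs are recorded somewhere (a rough
sketch; wget.log is just a placeholder filename):

# write all output to a file instead of the console
logfile = wget.log

# debug output, the same as running wget with -d
debug = on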





command line

-------------------

wget -r -l 5 --save-headers --no-check-certificate https://www.mysite.com
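
I can also re-run the same command with debug logging turned on so the
skipped URLs show up somewhere (wget.log is just a placeholder name):

wget -r -l 5 --save-headers --no-check-certificate -d -o wget.log https://www.mysite.com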



In addition, I noticed that the metadata written to the downloaded files
shows only HTTP as the scheme, which is somewhat weird. Do you know anything
about that?



Regards,
