[Bug-wget] [PATCH] Allow openSSL compiled without SSLv2
Hi,

the attached patch adds support for an OpenSSL library compiled without SSLv2, in which case wget behaves as if it were using the GnuTLS backend, that is, doing SSLv3 only.

# Bazaar merge directive format 2 (Bazaar 0.90)
# revision_id: cristian@linux-us4g-20110411021140-k71ctv0bcygv05mj
# target_branch: bzr://bzr.savannah.gnu.org/wget/trunk/
# testament_sha1: 0b8aab4ce061b99614d52e9fa063e5f604cd0124
# timestamp: 2011-04-10 23:25:17 -0300
# base_revision_id: gscriv...@gnu.org-20110407105651-ofq3ntt3w0h6zkq9
#
# Begin patch
=== modified file 'src/openssl.c'
--- src/openssl.c 2011-04-04 14:56:51 +
+++ src/openssl.c 2011-04-11 02:11:40 +
@@ -187,8 +187,10 @@
       meth = SSLv23_client_method ();
       break;
     case secure_protocol_sslv2:
+#ifndef OPENSSL_NO_SSL2
       meth = SSLv2_client_method ();
       break;
+#endif
     case secure_protocol_sslv3:
       meth = SSLv3_client_method ();
       break;

# Begin bundle
IyBCYXphYXIgcmV2aXNpb24gYnVuZGxlIHY0CiMKQlpoOTFBWSZTWTL7pGAAAXhfgAAQWGf/91Kl
zgCwUANa5u9avddMdo4aIk8qeTyJMw00mU9CZM1Hqep6j1DCAZRDTGppTeEp+qaaaAaA00aA
DQEkhoQJpplJ6eqemphD1DQyGmQNAj1JT0R6INGhoZAAElEwqeKekeSeUeFNDTRtINABo09T
DebO3wy7zPm0CmOnswQYUnz1fe8Kyy7YsMx4fQPzKYzAYzmk0qwlXzef7myR73QL0ZRdEkMy1SVa
p9NXxxJBkHC3NNLAdUE+ksVJCKypafCYeTue0SpBPkjoX3wWGRCFrWkaxiGMYGFzEYKgjnDD4g1G
k5UeUVhceQcTP1WfSf1bAky2PgHS6fucUBhFq+W86/U+YrBFCVG5i181Uw2jYgLCm5LTc0alGySm
16BMGMq9Vj+HfQpLEqOVw4dhgraqguKSqiUcxPKIzW4E8DgGXNv/T0mIYDADSFlZNMKi1Mpij+IB
gxHYx8ch2kzzHgB9ejROs4c5QoXzFF5Yq8ImjQlkysbdcclNy1ysIRIFsM4Swta12Ly7GAdHeo0W
hUNRfboKBL5qqNAtxndeDZcFYtz7FW7bGFbZBCd8CcE02kVClGTPxGuSHhqSYrx5eoCiJUJgzPJH
RmzAdTL8XzV17XLIN7nwN7YRhRWwNmyQsGhl590SOsNs0zQJNoeUqMlw98d+3eLOBFp44P4bELIY
lqiDgxgjF+DBRSdBnJcR8hcDXu4wHA1BzJS6qKZCIdFx1zQ4Z7tryobFDG2iuQaL/Cvwx2pR3zZ0
TYb/GWXeaIlANz0DGh9MchtfC7KSI9HVAfP2QXkKSMmOjtVVrqD4QFkmBv9bTO4TPGwP+aQhg3V7
U2IGqxMrDpS60fUitWQQC5bhZzjTOcL3wmbSDmk7AOxtO73TH6VqxtXg85tYZsopYVOzhHEKQOTt
aD5ZBQsKhZtPQ6sMVMrQYE7QgpsEtFNj46FfUo1wM+Qut3OacZJksYYxjIOEHjODhICTLsWAREUa
A2yIp1ATNUd95poeB7ANLzSVcu5zKDMkCAgt8EJdifXVaKIfDa6K8cCYclZ1WgpLdHF5XLuUFHXJ
Ks+IPs4MXVBpqi6LfTareKvhaBcjOV3VS9PD+DtY+hQGtrhUDjeXxC8JMKmgoBFbd+tNEu++dUsw
Yg+ogBmuMwOcymNMjQlhysOQMimA1m//F3JFOFCQMvukYA==
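The effect of the `#ifndef OPENSSL_NO_SSL2` guard in the patch can be tried out without linking against OpenSSL. The following is only a minimal sketch: `pick_method_name` is a hypothetical stand-in that returns a string instead of calling the real `SSLv*_client_method ()` functions, the `secure_protocol_auto` label for the first case is an assumption (the hunk does not show it), and `OPENSSL_NO_SSL2` is defined by hand here to simulate a library built without SSLv2.

```c
#include <stddef.h>

/* Assumption: defined by hand to simulate an OpenSSL build without SSLv2.
   Remove this line to simulate a build that still provides SSLv2. */
#define OPENSSL_NO_SSL2 1

/* Enumerator names follow the case labels visible in the patch;
   secure_protocol_auto is assumed for the SSLv23 case. */
enum secure_protocol
{
  secure_protocol_auto,
  secure_protocol_sslv2,
  secure_protocol_sslv3
};

/* Hypothetical stand-in for the switch in src/openssl.c: returns the
   name of the method that would be selected for PROTO.  */
static const char *
pick_method_name (enum secure_protocol proto)
{
  const char *meth = NULL;

  switch (proto)
    {
    case secure_protocol_auto:
      meth = "SSLv23";
      break;
    case secure_protocol_sslv2:
#ifndef OPENSSL_NO_SSL2
      meth = "SSLv2";
      break;
#endif
      /* With OPENSSL_NO_SSL2 defined, the sslv2 label has no body and
         falls through to the sslv3 case below.  */
    case secure_protocol_sslv3:
      meth = "SSLv3";
      break;
    }
  return meth;
}
```

With the macro defined, asking for `secure_protocol_sslv2` yields the same result as `secure_protocol_sslv3`, because the empty case label falls through. Without it, the SSLv2 branch is compiled in as before.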
Re: [Bug-wget] How do I tell wget not to follow links in a file?
Okay, I have filed bug #33044 for this issue at https://savannah.gnu.org/bugs/index.php?33044. I've also moved the demo to http://davidskalinder.com/wgettest/ and added a bunch of directories to the unwanted link page to make the problem clearer.

It strikes me that this issue must come up fairly frequently, especially for sites with fairly flat directory hierarchies. For example, any site which keeps a recent updates page that includes a link to a previous updates page, both of which contain links to many root-level directories, would be affected. A user who wanted to maintain an up-to-date mirror of such a site would have no option but to download the entire site every week.

HTH
DS

> On 04/07/2011 05:26 AM, Giuseppe Scrivano wrote:
>> David Skalinder da...@skalinder.net writes:
>>
>>> I want to mirror part of a website that contains two links pages, each of which contains links to many root-level directories and also to the other links page. I want to download recursively all the links from one links page, but not from the other: that is, I want to tell wget "download links1 and follow all of its links, but do not download or follow links from links2".
>>>
>>> I've put a demo of this problem up at http://fangjaw.com/wgettest -- there is a diagram there that might state the problem more clearly.
>>>
>>> This functionality seems so basic that I assume I must be overlooking something. Clearly wget has been designed to give users control over which files they download; but all I can find is that -X controls both saving and link-following at the directory level, while -R controls saving at the file level but still follows links from unsaved files.
>>
>> why doesn't -X work in the scenario you have described? If all links from `links2' are under /B, you can exclude them using something like:
>
> That scenario seems rather unlikely, unless we're talking about autogenerated folder index files...
>
> This issue would be resolved if wget had a way to avoid its current behavior of always unconditionally downloading HTML files regardless of what rejection rules say. Then you can just reject that single file (and if need be, download it as part of a separate session).
>
> --
> Micah J. Cowan
> http://micah.cowan.name/

I think that's right. As I mention on the demo page, links2 could easily contain links to hundreds of different directories, in which case you're out of luck. As Micah notes, if -R did not download the files at all (or even just downloaded them but did not queue their links), that should fix the problem.

Also, if a user could alter the robots.txt file, I think she could make wget act correctly by including something like

User-agent: *
Disallow: /wgettest/links2.html

But obviously, most wget users won't have access to the server side. Since (I assume) wget knows how to follow that robots instruction, it seems like it should be able to follow a similar instruction from the client side.

David
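To make the idea of a client-side "do not fetch" rule a little more concrete, here is a minimal sketch of robots.txt-style prefix matching. It is not wget's actual code: `path_is_disallowed` and its rule array are hypothetical names, and the matching is the plain "URL path starts with the Disallow value" check, with rule paths written with a leading slash.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of wget: returns true when PATH begins
   with any of the NRULES prefixes in RULES, in the manner of a
   robots.txt "Disallow:" line.  */
static bool
path_is_disallowed (const char *path, const char *const *rules,
                    size_t nrules)
{
  for (size_t i = 0; i < nrules; i++)
    {
      size_t len = strlen (rules[i]);

      /* An empty Disallow value excludes nothing, so skip it.  */
      if (len > 0 && strncmp (path, rules[i], len) == 0)
        return true;
    }
  return false;
}
```

With the single rule `/wgettest/links2.html`, a check like this would skip links2.html while still allowing links1.html and everything else under /wgettest/, which is the behavior asked for above.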
Re: [Bug-wget] How do I tell wget not to follow links in a file?
It just occurred to me that since wget will perform this task properly if it gets the rule from robots.txt, maybe this issue could be worked around by proxying or spoofing the remote site's robots.txt file locally? That is, I write

User-agent: *
Disallow: /wgettest/links2.html

into a file, save it in my home directory, and then somehow tell wget that davidskalinder.com/robots.txt is actually located at /home/user/robots.txt?

Does anybody know a convenient way of doing this? Or is there an easier workaround I'm overlooking?
Re: [Bug-wget] [PATCH] Allow openSSL compiled without SSLv2
Thanks for the patch. Committed and pushed.

Cheers,
Giuseppe

Cristian Rodríguez crrodrig...@opensuse.org writes:

> Hi, the attached patch adds support for an OpenSSL library compiled without SSLv2, in which case wget behaves as if it were using the GnuTLS backend, that is, doing SSLv3 only.