[Bug-wget] Difficulty downloading a site from archive.org
I've been looking at downloading a site that's on archive.org. I don't have the site in front of me now, but here are two example pages showing the kind of structure I'm working with. Notice that archive.org spreads the website across various directories:

http://web.archive.org/web/20090429823419/http://users.dickens.com/~goodrevs/help/INDEX.HTM
http://web.archive.org/web/20090421420227/http://users.dickens.com/~goodrevs/home.html

Of course I don't want to download the whole of the internet, so I wouldn't want to fetch the whole archive.org domain! All the URLs I want have the string http://users.dickens.com/~goodrevs/ in them. But notice that they're not all within the same directory higher up: one page is in 20090429823419, another is in 20090421420227, but they are all under http://users.dickens.com/~goodrevs/ within archive.org.

How should I go about this? What are my options?
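The selection rule described above (keep a URL only if it contains the original site's prefix, whichever snapshot directory it sits in) can be sketched as a plain-text filter. This is only an illustration under stated assumptions: `filter_wanted` is a hypothetical helper name, the third URL is an invented counterexample, and the sketch filters a list of URLs rather than driving wget itself.

```shell
#!/bin/sh
# Prefix shared by every page of the original site (taken from the message).
wanted='http://users.dickens.com/~goodrevs/'

# filter_wanted (hypothetical helper): read URLs on stdin, one per line, and
# print only those that contain the wanted prefix as a fixed string.
filter_wanted() {
    grep -F -e "$wanted"
}

# Two URLs from the message plus one invented URL that should be dropped.
printf '%s\n' \
    'http://web.archive.org/web/20090429823419/http://users.dickens.com/~goodrevs/help/INDEX.HTM' \
    'http://web.archive.org/web/20090421420227/http://users.dickens.com/~goodrevs/home.html' \
    'http://web.archive.org/web/20090421420227/http://example.com/other.html' |
    filter_wanted
```

Matching with `grep -F` treats the prefix as a literal string, so the `.` and `~` characters in the URL are not interpreted as pattern syntax.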
Re: [Bug-wget] wget-1.13 on AIX
Hi,

On my 6.1 system I do not have flex, so I removed the include from css.c and it compiled. On my 5.3 system I do have flex. Removing it from css.l worked as well.

I am configuring --without-ssl, but I'm assuming that will not make a difference for this.

Thanks,
Perry

On Aug 12, 2011, at 10:20 AM, Giuseppe Scrivano wrote:

> Hello Perry,
>
> thanks for reporting it. Does it work correctly if you drop the #include "wget.h" line from css.l?
>
> === modified file 'src/css.l'
> --- src/css.l 2011-01-01 12:19:37 +0000
> +++ src/css.l 2011-08-12 15:18:23 +0000
> @@ -36,7 +36,6 @@
>  #define YY_NO_INPUT
> -#include "wget.h"
>  #include "css-tokens.h"
>  %}
>
> Thanks,
> Giuseppe
>
> Perry Smith pedz...@gmail.com writes:
>
> > Hi,
> >
> > I've tried this on AIX 5.3 and 6.1. The problem is with src/css.c. In essence it is doing this:
> >
> >     #include <stdio.h>
> >     #include <string.h>
> >     #include <errno.h>
> >     #include <stdlib.h>
> >     #include <inttypes.h>
> >     #define _LARGE_FILES
> >     #include <unistd.h>
> >
> > The #define of _LARGE_FILES is actually done in config.h via wget.h.
> >
> > I understand that AIX is very hard to deal with, but this seems like a bad idea for any platform. If you are going to declare that you want _LARGE_FILES support, you need to do that before any system includes.
> >
> > What this causes is that _LARGE_FILES and _LARGE_FILE_API both get defined, and that causes one place to declare (for example)
> >
> >     #define ftruncate ftruncate64
> >
> > (this is in unistd.h around line 733) and then later we have:
> >
> >     extern int ftruncate(int, off_t);
> >     #ifdef _LARGE_FILE_API
> >     extern int ftruncate64(int, off64_t);
> >     #endif
> >
> > (around line 799), which the compiler complains about with:
> >
> >     /usr/include/unistd.h:801: error: conflicting types for 'ftruncate64'
> >     /usr/include/unistd.h:799: error: previous declaration of 'ftruncate64' was here
> >
> > There are actually several pairs of these.
> >
> > With the above code snippet, if you move the #define to the top (or completely remove it), the compile works fine.
> >
> > It just seems like it would be prudent to declare things like _LARGE_FILES in config.h (like you do) but put config.h as the first include of each file, so that the entire code base knows which interface the program wants to use.
> >
> > What I did was to move css.c to _css.c. I put an #ifndef _CONFIG_H wrapper inside config.h, and then the new css.c was simply:
> >
> >     #include "config.h"
> >     #include "_css.c"
> >
> > and that worked for my 5.3 system. I have not tried it on my 6.1 system yet.
> >
> > I hope this helps someone.
> >
> > Thank you,
> > pedz
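The suggestion that config.h should be the first include of every file can be checked mechanically. The following is a minimal sketch, not part of wget's build: `first_include` and `includes_first` are hypothetical helper names, the two sample files are invented for the demonstration, and the check is a simple substring match on the first #include line (so it would also accept a header whose name merely contains the given string).

```shell
#!/bin/sh
# first_include (hypothetical helper): print the first #include line of file $1.
first_include() {
    awk '/^[ \t]*#[ \t]*include/ { print; exit }' "$1"
}

# includes_first (hypothetical helper): succeed if the first #include line of
# file $1 mentions header $2 (substring match, good enough for a sketch).
includes_first() {
    case "$(first_include "$1")" in
        *"$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Demonstration on two invented files: one mimics the problematic order from
# the report, the other mimics the wrapper that includes config.h first.
tmp=$(mktemp -d)
printf '%s\n' '#include <stdio.h>' '#define _LARGE_FILES' '#include <unistd.h>' > "$tmp/bad.c"
printf '%s\n' '#include "config.h"' '#include "_css.c"' > "$tmp/good.c"

for f in "$tmp/bad.c" "$tmp/good.c"; do
    if includes_first "$f" config.h; then
        echo "${f##*/}: ok"
    else
        echo "${f##*/}: config.h is not the first include"
    fi
done
rm -rf "$tmp"
```

In a tree where sources reach config.h through another header (the report says wget.h does this), the second argument would be that header's name instead.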
Re: [Bug-wget] [wget 1.13] [configure error] Forcing to use GnuTLS? --with-ssl was given, but GNUTLS is not available
Jochen Roderburg roderb...@uni-koeln.de writes:

> And in general they seem to want to steer the users away from openssl to gnutls, and in order to do that the configure script doesn't even mention this option any longer. :-(
>
> And in the same vein the option --with-libssl-prefix has completely disappeared, which used to be helpful when you had your preferred ssl library in a non-standard place. Now you have to trick around with compiler options to achieve that.

It is fixed in the current development version, and the fix will be included in the wget release I am going to do in the next few days. It was already reported on this mailing list some days ago, and it was the reason why wget 1.13 wasn't released :-)

Cheers,
Giuseppe
Re: [Bug-wget] Difficulty downloading a site from archive.org
On 08/12/2011 11:56 AM, phil curb wrote:

> I've been looking at downloading a site that's on archive.org

Archive.org's TOS on their website expressly forbids the use of downloading agents, and names wget explicitly.

All URLs in pages on archive.org point at the _original_ (either modern or nonexistent) locations they pointed to when they were archived. These links are pretty much never the ones you want. Archive.org then embeds some JavaScript that goes through and rewrites all these URLs to point at archive.org. This means that in a browser, you'll see the correct URLs when you hover, and when you click to follow.

The problem, of course, is that tools like wget won't run the script, so the original (useless) URLs remain, and wget tries to follow these. There's not really a lot you can do about it without rolling up your sleeves and hacking around the problem.

But as I say, their TOS forbids you from accessing their site with wget anyway... they want you to always use their site directly.

(I'd be interested in knowing whether folks actually have legal obligations to respect the TOS of an unrestricted-access site like that... I imagine it might even vary by location.)

--
Micah J. Cowan
http://micah.cowan.name/
Re: [Bug-wget] Difficulty downloading a site from archive.org
Micah Cowan wrote:

> (I'd be interested in knowing whether folks actually have legal obligations to respect TOS to an unrestricted-access site like that... I imagine it might even vary by location)

What terms of service? I didn't see any terms of service (perhaps because I didn't look for them and wouldn't want to read them anyway). :-)

Tony
Re: [Bug-wget] [wget 1.13] [configure error] Forcing to use GnuTLS? --with-ssl was given, but GNUTLS is not available
If you want to use OpenSSL then you have to pass --with-ssl=openssl.

I hope this will be mentioned in README and/or INSTALL, and that configure.ac will be fixed to say something better than the stupid "--with-ssl was given, but GNUTLS is not available" (especially when --with-ssl hasn't been explicitly given at all; this really does confuse people). I suppose a plain ./configure would give me the same error too.
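As a sketch of what this looks like on the command line: the first invocation uses only the option named above. The second additionally passes the standard autoconf variables CPPFLAGS and LDFLAGS, which is one common way to point a build at a library in a non-standard location; the /opt/openssl prefix is a made-up placeholder, and whether this is sufficient for a given wget release is an assumption, not something confirmed in this thread.

```shell
# Select OpenSSL explicitly instead of the GnuTLS default.
./configure --with-ssl=openssl

# Same, with OpenSSL installed under a non-standard prefix.
# /opt/openssl is a hypothetical path; substitute the real location.
./configure --with-ssl=openssl \
    CPPFLAGS="-I/opt/openssl/include" \
    LDFLAGS="-L/opt/openssl/lib"
```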