[Bug-wget] Difficulty downloading a site from archive.org

2011-08-13 Thread phil curb
I've been looking at downloading a site that's on archive.org.

I don't have the site in front of me now, but here are two example pages
showing the kind of structure I'm working with. Notice that archive.org
spreads the website across various directories:

http://web.archive.org/web/20090429823419/http://users.dickens.com/~goodrevs/help/INDEX.HTM

http://web.archive.org/web/20090421420227/http://users.dickens.com/~goodrevs/home.html

Of course, I don't want to download the whole of the internet, so I wouldn't
want to do the whole archive.org domain!

All the URLs I want have the string http://users.dickens.com/~goodrevs/ in
them.

But notice that they're not all within the same directory higher up: one page
is in 20090429823419, another is in 20090421420227.

But they are all under http://users.dickens.com/~goodrevs/ within archive.org.

How should I go about this?

What are my options?
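The URL layout in the two example links above can be made concrete with a small filter. This is only a sketch, assuming every archived page has the form `http://web.archive.org/web/<digits>/<original URL>` (inferred from those two links, not from any archive.org documentation); `wayback_original` and `wanted` are hypothetical names. It keeps a URL when the embedded original URL starts with the site prefix, whichever timestamp directory it sits in. It only classifies URLs; it does not fetch anything.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: for a URL assumed to look like
   http://web.archive.org/web/<digits>/<original URL>, return a pointer
   to the embedded original URL, or NULL if the URL has another shape. */
static const char *wayback_original(const char *url)
{
    static const char base[] = "http://web.archive.org/web/";
    const size_t base_len = sizeof base - 1;

    assert(url != NULL);
    if (strncmp(url, base, base_len) != 0)
        return NULL;

    const char *p = url + base_len;
    const char *digits = p;
    while (isdigit((unsigned char) *p))   /* skip the snapshot timestamp */
        p++;
    if (p == digits || *p != '/')         /* need at least one digit, then '/' */
        return NULL;
    return p + 1;
}

/* Hypothetical filter: keep a URL only if the page it archives lives
   under `site`, regardless of the snapshot directory it was filed in. */
static bool wanted(const char *url, const char *site)
{
    assert(site != NULL);
    const char *orig = wayback_original(url);
    return orig != NULL && strncmp(orig, site, strlen(site)) == 0;
}
```

Matching on the prefix after the timestamp, rather than on the site string anywhere in the URL, avoids accepting a URL that merely mentions the site somewhere else, for example in a query string.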



Re: [Bug-wget] wget-1.13 on AIX

2011-08-13 Thread Perry Smith
Hi,

On my 6.1 system I do not have flex, so I removed the include from css.c and
it compiled.  On my 5.3 system I do have flex, and removing it from css.l
worked as well.

I am configuring --without-ssl, but I'm assuming that will not make a
difference for this.

Thanks,
Perry

On Aug 12, 2011, at 10:20 AM, Giuseppe Scrivano wrote:

 Hello Perry,
 
 thanks for reporting it.  Does it work correctly if you drop the
 #include "wget.h" line from css.l?
 
 === modified file 'src/css.l'
 --- src/css.l 2011-01-01 12:19:37 +
 +++ src/css.l 2011-08-12 15:18:23 +
 @@ -36,7 +36,6 @@
 
 #define YY_NO_INPUT
 
 -#include "wget.h"
 #include "css-tokens.h"
 
 %}
 
 
 Thanks,
 Giuseppe
 
 
 
 Perry Smith pedz...@gmail.com writes:
 
 Hi,
 
 I've tried this on AIX 5.3 and 6.1.
 
 The problem is with src/css.c.  In essence it is doing this:
 
 #include <stdio.h>
 #include <string.h>
 #include <errno.h>
 #include <stdlib.h>
 #include <inttypes.h>
 #define _LARGE_FILES
 #include <unistd.h>
 
 
 The #define of _LARGE_FILES is actually done in config.h via wget.h.
 
 I understand that AIX is very hard to deal with, but this seems like a
 bad idea for any platform.  If you are going to declare that you want
 _LARGE_FILES support, you need to do that before any system includes.
 What this causes is that both _LARGE_FILES and _LARGE_FILE_API get
 defined, and that causes one place to declare (for example)
 
 #define ftruncate   ftruncate64
 
 
 (this is in unistd.h around line 733)
 
 and then later we have:
 
 extern int  ftruncate(int, off_t);
 #ifdef _LARGE_FILE_API
 extern int  ftruncate64(int, off64_t);
 #endif
 
 
 (around line 799) which the compiler complains about with:
 
 /usr/include/unistd.h:801: error: conflicting types for 'ftruncate64'
 /usr/include/unistd.h:799: error: previous declaration of 'ftruncate64' was here
 
 
 There are actually several pairs of these.
 
 With the above code snippet, if you move the #define to the top (or
 remove it completely), the compile works fine.
 
 It just seems like it would be prudent to declare things like
 _LARGE_FILES in config.h (like you do) but put config.h as the first
 include of each file so that the entire code base knows which
 interface the program wants to use.
 
 What I did was to move css.c to _css.c.  I put an #ifndef _CONFIG_H wrapper 
 inside config.h and then the new css.c was simply:
 
 #include "config.h"
 #include "_css.c"
 
 and that worked for my 5.3 system.  I have not tried it on my 6.1 system yet.
 
 I hope this helps someone.
 
 Thank you,
 pedz
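The ordering Perry proposes (large-file macros first, system headers after) can be sketched in a single file. This is an illustration, not wget's real config.h: `CONFIG_H_SKETCH` and `off_t_bits` are hypothetical names, `_LARGE_FILES` is the AIX macro from the report, and `_FILE_OFFSET_BITS 64` is added as the glibc/Solaris counterpart so the sketch also means something off AIX.

```c
/* Stand-in for a guarded config.h; it must precede every system header. */
#ifndef CONFIG_H_SKETCH
#define CONFIG_H_SKETCH
#define _LARGE_FILES 1          /* AIX: request the large-file interface */
#define _FILE_OFFSET_BITS 64    /* glibc/Solaris counterpart */
#endif

/* Only now the system headers, so they all see the same choice. */
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>             /* where the ftruncate declarations live */

/* Width of off_t, in bits, as seen by this translation unit. */
static size_t off_t_bits(void)
{
    size_t bits = sizeof(off_t) * CHAR_BIT;
    /* With the macros above in effect, off_t is expected to be 64 bits
       wide on systems that offer a large-file interface. */
    assert(bits >= 64);
    return bits;
}
```

On 64-bit systems `off_t` is usually 64 bits anyway, so the ordering matters most for 32-bit builds like the AIX ones in this report: there, defining the macros after `<sys/types.h>` leaves the earlier headers' choice of `off_t` in place while later headers assume the large-file interface, which is roughly the mismatch behind the "conflicting types" errors.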




Re: [Bug-wget] [wget 1.13] [configure error] Forcing to use GnuTLS? --with-ssl was given, but GNUTLS is not available

2011-08-13 Thread Giuseppe Scrivano
Jochen Roderburg roderb...@uni-koeln.de writes:

 And in general they seem to want to steer users away from openssl
 towards gnutls, and to do that the configure script doesn't even
 mention this option any longer.  :-(

 And in the same vein, the option --with-libssl-prefix has completely
 disappeared, which used to be helpful when you had your preferred ssl
 library in a non-standard place. Now you have to fiddle with
 compiler options to achieve that.

It is fixed in the current development version, and the fix will be
included in the wget release I am going to do in the next few days.

It was already reported on this mailing list some days ago, and it was
the reason why wget 1.13 wasn't released :-)

Cheers,
Giuseppe



Re: [Bug-wget] Difficulty downloading a site from archive.org

2011-08-13 Thread Micah Cowan

On 08/12/2011 11:56 AM, phil curb wrote:

I've been looking at downloading a site that's on archive.org


Archive.org's TOS on their website expressly forbids the use of 
downloading agents, and names wget explicitly.


All links in archived pages on archive.org point at the _original_
locations (now either modern or nonexistent) they pointed to when they
were archived. These links are pretty much never the ones you want.
Then they embed some JavaScript that goes through and rewrites all these
URLs to point at archive.org. This means that in a browser, you'll see
the correct URLs when you hover over a link and when you click to follow it.


The problem, of course, is that tools like wget won't run the script, so
the original (useless) URLs remain, and wget tries to follow them. There's
not really a lot you can do about it without rolling up your sleeves and
hacking around the problem. But as I say, their TOS forbids you from
accessing their site with wget anyway... they want you to always use
their site directly.


(I'd be interested in knowing whether folks actually have legal 
obligations to respect TOS to an unrestricted-access site like that... I 
imagine it might even vary by location)


--
Micah J. Cowan
http://micah.cowan.name/
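The rewriting that the embedded script performs in the browser, and that wget does not, amounts roughly to the transformation below. It is a sketch under an assumption: archived pages are taken to live at `http://web.archive.org/web/<timestamp>/<original URL>`, the shape seen in the example links at the top of this thread; that format and the name `wayback_url` are assumptions, not a documented archive.org API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the archive.org form of `orig` for the
   snapshot `stamp` into `out`, i.e.
   http://web.archive.org/web/<stamp>/<orig>.
   Returns 0 on success, -1 if the result does not fit in `out`. */
static int wayback_url(char *out, size_t out_size,
                       const char *stamp, const char *orig)
{
    assert(out != NULL && stamp != NULL && orig != NULL);
    int n = snprintf(out, out_size, "http://web.archive.org/web/%s/%s",
                     stamp, orig);
    /* snprintf reports the length it wanted; >= out_size means truncation. */
    return (n < 0 || (size_t) n >= out_size) ? -1 : 0;
}
```

Without this step a recursive fetcher only ever sees the original links, which is why it heads off to the modern (or nonexistent) sites instead of staying inside the archive.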



Re: [Bug-wget] Difficulty downloading a site from archive.org

2011-08-13 Thread Tony Lewis
Micah Cowan wrote:

 (I'd be interested in knowing whether folks actually have legal 
 obligations to respect TOS to an unrestricted-access site like that... I 
 imagine it might even vary by location)

What terms of service? I didn't see any terms of service (perhaps because I
didn't look for them and wouldn't want to read them anyway). :-)

Tony




Re: [Bug-wget] [wget 1.13] [configure error] Forcing to use GnuTLS? --with-ssl was given, but GNUTLS is not available

2011-08-13 Thread Douglas Mencken
  If you want to use OpenSSL then you have to pass --with-ssl=openssl.

I hope this will be mentioned in README and/or INSTALL, and that
configure.ac will be fixed to say something better than the stupid
"--with-ssl was given, but GNUTLS is not available" (especially when
--with-ssl hasn't been given explicitly at all; this really confuses
people). I suppose plain ./configure would give me the same error too.