Re: Please help
kayode giwa [EMAIL PROTECTED] writes:

I am new to wget and I was wondering if anyone out there can assist me with the following error messages in my config.log file. What do I need to do to get wget working? Please respond!

$ ./configure
PATH: /usr/ucb

## ----------- ##
## Core tests. ##
## ----------- ##

configure:1502: configuring for GNU Wget 1.10
configure:1539: checking build system type
configure:1557: result: sparc-sun-solaris2.9
configure:1565: checking host system type
configure:1579: result: sparc-sun-solaris2.9
configure:1659: checking whether make sets $(MAKE)
configure:1683: result: no
configure:1702: checking for a BSD-compatible install
configure:1757: result: ./install-sh -c
configure:1819: checking for gcc
configure:1848: result: no
configure:1899: checking for cc
configure:1915: found /usr/ucb/cc
configure:1925: result: cc
configure:2089: checking for C compiler version
configure:2092: cc --version </dev/null >&5
/usr/ucb/cc: language optional software package not installed
configure:2095: $? = 1
configure:2097: cc -v </dev/null >&5
/usr/ucb/cc: language optional software package not installed
configure:2100: $? = 1
configure:2102: cc -V </dev/null >&5
/usr/ucb/cc: language optional software package not installed
configure:2105: $? = 1
configure:2128: checking for C compiler default output file name
configure:2131: cc conftest.c >&5
/usr/ucb/cc: language optional software package not installed
configure:2134: $? = 1
configure: failed program was:
| /* confdefs.h. */

configure cannot figure out where the compiler is; specifically, it finds /usr/ucb/cc, which on Solaris is a backward-compatibility stub and not a real compiler. You need the Sun C compiler (a.k.a. Sun Studio or Forte) or the GNU Compiler Collection, gcc. The latter can be installed from various sources, e.g. http://www.blastwave.org/

-- 
Peter
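As a concrete follow-up (the paths below are assumptions; Blastwave packages typically land under /opt/csw), once gcc is installed you can point configure at it either through PATH or through the CC variable that configure honors:

```shell
# Put the real compiler's directory ahead of /usr/ucb in the search
# path (adjust /opt/csw/bin to wherever gcc actually lives)...
PATH=/opt/csw/bin:/usr/bin:$PATH
export PATH
./configure

# ...or name the compiler explicitly, bypassing the PATH search:
CC=gcc ./configure
```

Either way, re-run configure from a clean tree so the cached result for the broken /usr/ucb/cc is not reused.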
Re: timestamps when downloading multiple files
Hrvoje Niksic wrote:

Mauro Tortonesi [EMAIL PROTECTED] writes:

i agree with hrvoje. but this is just a side-effect of the real problem: the semantics of -O with a multiple-file download are not well defined.

-O with multiple URLs concatenates all content to the given file. This is intentional and supported: for example, it makes `wget -O- URL1 URL2 URL3' behave like `cat FILE1 FILE2 FILE3', only for URLs, and without creating temporary files. It's a useful feature; well, at least I find it useful. Maybe not for HTML pages, but I use it for certain data files, where concatenating does make sense. In this case the questions about -r/-k are irrelevant.
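The `cat' analogy can be checked locally without any network access; here file1 and file2 (names invented for the illustration) stand in for the bodies wget would fetch from URL1 and URL2:

```shell
# Two stand-in "response bodies".
printf 'first\n'  > file1
printf 'second\n' > file2

# `wget -O combined URL1 URL2` leaves the same result as:
cat file1 file2 > combined

cat combined
# first
# second
```

That is, -O names a single sink, not a per-URL output name, which is why the timestamp question below even arises.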
Re: Question
Mauro Tortonesi [EMAIL PROTECTED] writes:

On Saturday 09 July 2005 10:34 am, Abdurrahman ÇARKACIOĞLU wrote:

MS Internet Explorer can save a web page as a whole. That means all the images, tables, etc. can be saved as a single file. It is called a Web Archive, single file (*.mht). Is this possible with wget?

not at the moment, but it's a planned feature for wget 2.0.

Really? I've never heard of a .mht web archive; it seems a Windows-only thing.
Re: timestamps when downloading multiple files
Hrvoje Niksic wrote:

Jeroen Demeyer [EMAIL PROTECTED] writes:

I am a big fan of wget, but I discovered a minor annoyance (not sure if it even is a bug): when downloading multiple files with wget to a single output (e.g. wget -O out http://file1 http://file2 http://file3), the timestamp of the resulting file becomes the timestamp of the *last* file downloaded. I think it would make more sense if the timestamp were that of the most recent file downloaded.

It probably doesn't make sense to set *any* explicit timestamp on a file created with -O from multiple URLs. The current behavior is merely a side-effect of the implementation. But just removing the code that sets the time-stamp would break the behavior for people who use -O with a single URL. Changing the current behavior would require complicating that part of the code; I'm not sure that anything would be gained by such a change. Do you have a use case that breaks on current behavior that would be fixed by introducing the change?

How about something like this? (see attachment). This works for me. Note that I have zero experience with wget hacking, so I have no idea what this might break.

Jeroen

Index: src/http.c
===================================================================
--- src/http.c  (revision 2042)
+++ src/http.c  (working copy)
@@ -1995,6 +1995,7 @@
   const char *tmrate;
   uerr_t err;
   time_t tml = -1, tmr = -1;    /* local and remote time-stamps */
+  time_t mintmtouch = -1;       /* minimum time-stamp for the local file */
   wgint local_size = 0;         /* the size of the local file */
   size_t filename_len;
   struct http_stat hstat;       /* HTTP status */
@@ -2124,6 +2125,20 @@
           got_head = false;
         }
     }
+
+  /* Look at modification time of our output_document.  If we concatenate
+     multiple documents, we want the resulting local timestamp to be the
+     maximum of all remote time-stamps.  In other words, we should never
+     touch the output_document such that it becomes older. */
+  if (opt.output_document && output_stream_regular)
+    {
+      if (stat (opt.output_document, &st) == 0)
+        /* If the file is empty and always_rest is off,
+           then ignore the modification time. */
+        if (st.st_size > 0 || opt.always_rest)
+          mintmtouch = st.st_mtime;
+    }
+
   /* Reset the counter. */
   count = 0;
   *dt = 0;
@@ -2368,7 +2383,8 @@
       else
         fl = *hstat.local_file;
       if (fl)
-        touch (fl, tmr);
+        /* the time becomes the maximum of mintmtouch and tmr */
+        touch (fl, (mintmtouch != (time_t) (-1) && mintmtouch > tmr) ? mintmtouch : tmr);
     }
   /* End of time-stamping section. */
Re: robots.txt takes precedence over -p
Ignoring robots.txt may help reduce frustration for users who aren't familiar with robots.txt and can't figure out why the pages they want aren't downloading. The problem with trying to define a default behavior for wget is that it lies somewhere between a web crawler and a web browser. Most of the time I've had to tell wget to ignore robots.txt, so I'd rather have that be the default behavior. Maybe a little one-click survey on the wget web site could help you guys make a decision.

Frank

Post, Mark K wrote:

I would say the analogy is closer to a very rapid person operating a web browser. I've never been greatly inconvenienced by having to re-run a download while ignoring the robots.txt file. As I said, respecting robots.txt is not a requirement, but it is polite. I prefer my tools to be polite unless I tell them otherwise.

Mark Post

-----Original Message-----
From: Mauro Tortonesi [mailto:[EMAIL PROTECTED]]
Sent: Monday, August 08, 2005 8:35 PM
To: Post, Mark K
Cc: [EMAIL PROTECTED]
Subject: Re: robots.txt takes precedence over -p

On Monday 08 August 2005 07:30 pm, Post, Mark K wrote:

I hope that doesn't happen. While respecting robots.txt is not an absolute requirement, it is considered polite. I would not want the default behavior of wget to be considered impolite.

IMVHO, hrvoje has a good point when he says that wget behaves like a web browser and, as such, should not be required to respect the robots standard.

-- 
Frank McCown
Old Dominion University
http://www.cs.odu.edu/~fmccown
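For reference, the default being debated can already be overridden per invocation: robots handling is controlled by the `robots' wgetrc variable, which can be set on the command line with -e (the URL below is a placeholder):

```shell
# Fetch a page and its requisites while ignoring robots.txt:
wget -e robots=off --page-requisites http://example.com/page.html
```

Putting `robots = off' in ~/.wgetrc makes this the personal default without changing wget's shipped behavior.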
Re: Question
While the MHT format is not extremely popular yet, I'm betting it will continue to grow in popularity. It encapsulates an entire web page (graphics, JavaScript, style sheets, etc.) into a single text file, which makes it much easier to email and store. See RFC 2557 for more info: http://www.faqs.org/rfcs/rfc2557.html

It is currently supported by Netscape and Mozilla Thunderbird.

Frank

Hrvoje Niksic wrote:

Mauro Tortonesi [EMAIL PROTECTED] writes:

On Saturday 09 July 2005 10:34 am, Abdurrahman ÇARKACIOĞLU wrote:

MS Internet Explorer can save a web page as a whole. That means all the images, tables, etc. can be saved as a single file. It is called a Web Archive, single file (*.mht). Is this possible with wget?

not at the moment, but it's a planned feature for wget 2.0.

Really? I've never heard of a .mht web archive; it seems a Windows-only thing.

-- 
Frank McCown
Old Dominion University
http://www.cs.odu.edu/~fmccown
Re: Question
On Tuesday 09 August 2005 04:37 am, Hrvoje Niksic wrote:

Mauro Tortonesi [EMAIL PROTECTED] writes:

On Saturday 09 July 2005 10:34 am, Abdurrahman ÇARKACIOĞLU wrote:

MS Internet Explorer can save a web page as a whole. That means all the images, tables, etc. can be saved as a single file. It is called a Web Archive, single file (*.mht). Is this possible with wget?

not at the moment, but it's a planned feature for wget 2.0.

Really? I've never heard of a .mht web archive; it seems a Windows-only thing.

oops, my fault. i was in a hurry and i misunderstood what Abdurrahman was asking. what i wanted to say is that we talked about supporting the same html file download mode as firefox, in which you save all the related files in a directory with the same name as the document you downloaded. i think that would be nice. sorry for the misunderstanding.

-- 
Aequam memento rebus in arduis servare mentem...
Mauro Tortonesi                           http://www.tortonesi.com
University of Ferrara - Dept. of Eng.     http://www.ing.unife.it
Institute for Human Machine Cognition     http://www.ihmc.us
GNU Wget - HTTP/FTP file retrieval tool   http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linux             http://www.deepspace6.net
Ferrara Linux User Group                  http://www.ferrara.linux.it
Re: Question
Mauro Tortonesi [EMAIL PROTECTED] writes:

oops, my fault. i was in a hurry and i misunderstood what Abdurrahman was asking. what i wanted to say is that we talked about supporting the same html file download mode as firefox, in which you save all the related files in a directory with the same name as the document you downloaded. i think that would be nice. sorry for the misunderstanding.

No problem. Once wget -r/-p is taught to parse links on the fly instead of expecting to find them in fixed on-disk locations, writing to MHT should be easy. It seems to be a MIME-like format that builds on the existing concept of multipart/related messages. Instead of converting links to local files, we'd convert them to identifiers (free-form strings) defined with Content-ID.
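As a rough sketch of what RFC 2557 describes (the boundary string, Content-IDs, URLs, and base64 payload below are invented for illustration), an MHT archive is a multipart/related MIME message whose root part is the HTML and whose other parts are the page requisites, referenced via cid: links:

```
MIME-Version: 1.0
Content-Type: multipart/related; boundary="----=_boundary";
              type="text/html"

------=_boundary
Content-Type: text/html; charset="utf-8"
Content-Location: http://example.com/page.html

<html><body><img src="cid:img0001"></body></html>
------=_boundary
Content-Type: image/png
Content-Transfer-Encoding: base64
Content-ID: <img0001>

iVBORw0KGgo...
------=_boundary--
```

This is the sense in which link conversion would target Content-ID identifiers rather than local file names.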
Problems downloading from specific site
Hi, list! I would like to offer you a friendly challenge. Can you download something from the site www.babene.ru using wget? I always receive the message "ERROR 403: Forbidden", but using Firefox or IE I can download the pictures without any problem. I have already tried some user-agent strings, but without success. Thanks in advance.

Reginald0
Re: Problems downloading from specific site
Quoting Reginaldo O. Andrade [EMAIL PROTECTED]:

I would like to offer you a friendly challenge. Can you download something from the site www.babene.ru using wget? I always receive the message "ERROR 403: Forbidden", but using Firefox or IE I can download the pictures without any problem. I have already tried some user-agent strings, but without success.

Not an uncommon problem ;-) They check the Referer header, which a browser usually sends and which points to the page you are coming from. You can do the following with wget:

wget --referer=http://www.babene.ru/ http://www.babene.ru/

Best regards,
J.Roderburg