Re: Please help

2005-08-09 Thread Peter FELECAN
kayode giwa [EMAIL PROTECTED] writes:

 I am new to wget and I was wondering if anyone out
 there can assist me with the following error messages
 in my config.log file.
 What do I need to do to get wget working? Please
 respond!!
   


 $ ./configure 


 PATH: /usr/ucb


 ## ----------- ##
 ## Core tests. ##
 ## ----------- ##

 configure:1502: configuring for GNU Wget 1.10
 configure:1539: checking build system type
 configure:1557: result: sparc-sun-solaris2.9
 configure:1565: checking host system type
 configure:1579: result: sparc-sun-solaris2.9
 configure:1659: checking whether make sets $(MAKE)
 configure:1683: result: no
 configure:1702: checking for a BSD-compatible install
 configure:1757: result: ./install-sh -c
 configure:1819: checking for gcc
 configure:1848: result: no
 configure:1899: checking for cc
 configure:1915: found /usr/ucb/cc
 configure:1925: result: cc
 configure:2089: checking for C compiler version
 configure:2092: cc --version </dev/null >&5
 ***/usr/ucb/cc:  language optional software package
 not installed***
 configure:2095: $? = 1
 configure:2097: cc -v </dev/null >&5
 /usr/ucb/cc:  language optional software package not
 installed
 configure:2100: $? = 1
 configure:2102: cc -V </dev/null >&5
 ***/usr/ucb/cc:  language optional software package
 not installed*
 configure:2105: $? = 1
 configure:2128: checking for C compiler default output
 file name
 configure:2131: cc conftest.c >&5
 /usr/ucb/cc:  language optional software package
 not installed **
 configure:2134: $? = 1
 configure: failed program was:
 | /* confdefs.h.  */
 | 

Configure cannot figure out where the compiler is; specifically, it
finds /usr/ucb/cc, which, on Solaris, is a backward-compatibility stub
and not a real compiler. You need the Sun C compiler (a.k.a. Sun Studio
or Forte) or the GNU Compiler Collection (gcc). The latter can be
installed from various sources, e.g., http://www.blastwave.org/
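
Once a real compiler is installed, you can point configure at it
explicitly. For example (the paths below are only the usual defaults;
adjust them to wherever your compiler actually lives):

  # gcc as installed from Blastwave usually lives under /opt/csw/bin
  CC=/opt/csw/bin/gcc ./configure

  # or, with Sun Studio / Forte
  CC=/opt/SUNWspro/bin/cc ./configure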

-- 
Peter


Re: timestamps when downloading multiple files

2005-08-09 Thread Jeroen Demeyer
Hrvoje Niksic wrote:
 Mauro Tortonesi [EMAIL PROTECTED] writes:
 
 
i agree with hrvoje. but this is just a side-effect of the real
problem: the semantics of -O with a multiple-file download are not
well defined.
 
 
 -O with multiple URLs concatenates all content to the given file.
 This is intentional and supported: for example, it makes `wget -O-
 URL1 URL2 URL3' behave like `cat FILE1 FILE2 FILE3', only for URLs,
 and without creating temporary files.  It's a useful feature.

Well, at least I find it useful.  Maybe not for HTML pages, but I use it
for certain data files, where concatenating does make sense.  In this
case the questions about -r/-k are irrelevant.
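
A made-up example of that kind of use, concatenating two parts of a
data set into one local file:

  wget -O parts.dat http://example.com/part1.dat http://example.com/part2.dat

which does over HTTP what `cat part1.dat part2.dat > parts.dat' would
do locally.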


Re: Question

2005-08-09 Thread Hrvoje Niksic
Mauro Tortonesi [EMAIL PROTECTED] writes:

 On Saturday 09 July 2005 10:34 am, Abdurrahman ÇARKACIOĞLU wrote:
 MS Internet Explorer can save a web page as a whole. That means all the
 images and tables can be saved as a single file. It is called a Web
 Archive, single file (*.mht).

 Is this possible for wget?

 not at the moment, but it's a planned feature for wget 2.0.

Really?  I've never heard of a .mht web archive; it seems to be a
Windows-only thing.


Re: timestamps when downloading multiple files

2005-08-09 Thread Jeroen Demeyer
Hrvoje Niksic wrote:
 Jeroen Demeyer [EMAIL PROTECTED] writes:
 
 
I am a big fan of wget, but I discovered a minor annoyance (not sure
if it even is a bug):

When downloading multiple files with wget to a single output
(e.g. wget -Oout http://file1 http://file2 http://file3), the
timestamp of the resulting file becomes the timestamp of the *last*
file downloaded.

I think it would make more sense if the timestamp were the timestamp
of the most recent file downloaded.
 
 
 It probably doesn't make sense to set *any* explicit timestamp on a
 file created with -O from multiple URLs.  The current behavior is
 merely a side-effect of the implementation.  But just removing the
 code that sets the time-stamp would break the behavior for people who
 use -O with a single URL.
 
 Changing the current behavior would require complicating that part of
 the code; I'm not sure that anything would be gained by such a change.
 Do you have a use case that breaks under the current behavior and
 would be fixed by introducing the change?

How about something like this?  (see attachment).  This works for me.
Note that I have zero experience with wget hacking, so I have no idea
what this might break.

Jeroen
Index: src/http.c
===================================================================
--- src/http.c  (revision 2042)
+++ src/http.c  (working copy)
@@ -1995,6 +1995,7 @@
   const char *tmrate;
   uerr_t err;
   time_t tml = -1, tmr = -1;   /* local and remote time-stamps */
+  time_t mintmtouch = -1;      /* minimum time-stamp for the local file */
   wgint local_size = 0;        /* the size of the local file */
   size_t filename_len;
   struct http_stat hstat;      /* HTTP status */
@@ -2124,6 +2125,20 @@
          got_head = false;
        }
     }
+
+  /* Look at modification time of our output_document.  If we concatenate
+     multiple documents, we want the resulting local timestamp to be the
+     maximum of all remote time-stamps.  In other words, we should never
+     touch the output_document such that it becomes older. */
+  if (opt.output_document && output_stream_regular)
+    {
+      if (stat (opt.output_document, &st) == 0)
+        /* If the file is empty and always_rest is off,
+           then ignore the modification time. */
+        if (st.st_size > 0 || opt.always_rest)
+          mintmtouch = st.st_mtime;
+    }
+
   /* Reset the counter.  */
   count = 0;
   *dt = 0;
@@ -2368,7 +2383,8 @@
          else
            fl = *hstat.local_file;
          if (fl)
-           touch (fl, tmr);
+           /* the time becomes the maximum of mintmtouch and tmr */
+           touch (fl, (mintmtouch != (time_t) (-1) && mintmtouch > tmr) ? mintmtouch : tmr);
        }
   /* End of time-stamping section.  */
 


Re: robots.txt takes precedence over -p

2005-08-09 Thread Frank McCown
Ignoring robots.txt may help reduce frustration among users who aren't
familiar with robots.txt and can't figure out why the pages they want
aren't being downloaded.


The problem with trying to define a default behavior for wget is that
it lies somewhere between a web crawler and a web browser.


Most of the time I've had to tell wget to ignore robots.txt.  Therefore 
I'd rather have that be the default behavior.  Maybe a little one-click 
survey on the wget web site could help you guys make a decision.
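
For the record, turning the check off is already only a switch away;
for example (the URL is just a placeholder):

  wget -e robots=off -r -p http://example.com/page.html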


Frank



Post, Mark K wrote:

I would say the analogy is closer to a very rabid person operating a web
browser.  I've never been greatly inconvenienced by having to re-run a
download while ignoring the robots.txt file.  As I said, respecting
robots.txt is not a requirement, but it is polite.  I prefer my tools to
be polite unless I tell them otherwise.

Mark Post

-Original Message-
From: Mauro Tortonesi [mailto:[EMAIL PROTECTED] 
Sent: Monday, August 08, 2005 8:35 PM

To: Post, Mark K
Cc: [EMAIL PROTECTED]
Subject: Re: robots.txt takes precedence over -p


On Monday 08 August 2005 07:30 pm, Post, Mark K wrote:

I hope that doesn't happen.  While respecting robots.txt is not an 
absolute requirement, it is considered polite.  I would not want the 
default behavior of wget to be considered impolite.



IMVHO, hrvoje has a good point when he says that wget behaves like a web
browser and, as such, should not be required to respect the robots
standard.


--
Frank McCown
Old Dominion University
http://www.cs.odu.edu/~fmccown


Re: Question

2005-08-09 Thread Frank McCown
While the MHT format is not extremely popular yet, I'm betting it will
continue to grow in popularity.  It encapsulates an entire web page
(graphics, JavaScript, style sheets, etc.) into a single text file.
This makes it much easier to email and store.


See RFC 2557 for more info:
http://www.faqs.org/rfcs/rfc2557.html

It is currently supported by Netscape and Mozilla Thunderbird.

Frank


Hrvoje Niksic wrote:

Mauro Tortonesi [EMAIL PROTECTED] writes:



On Saturday 09 July 2005 10:34 am, Abdurrahman ÇARKACIOĞLU wrote:

MS Internet Explorer can save a web page as a whole. That means all the
images and tables can be saved as a single file. It is called a Web
Archive, single file (*.mht).

Is this possible for wget?


not at the moment, but it's a planned feature for wget 2.0.



Really?  I've never heard of a .mht web archive; it seems to be a
Windows-only thing.


--
Frank McCown
Old Dominion University
http://www.cs.odu.edu/~fmccown


Re: Question

2005-08-09 Thread Mauro Tortonesi
On Tuesday 09 August 2005 04:37 am, Hrvoje Niksic wrote:
 Mauro Tortonesi [EMAIL PROTECTED] writes:
  On Saturday 09 July 2005 10:34 am, Abdurrahman ÇARKACIOĞLU wrote:
  MS Internet Explorer can save a web page as a whole. That means all
  the images and tables can be saved as a single file. It is called a
  Web Archive, single file (*.mht).
 
  Is this possible for wget?
 
  not at the moment, but it's a planned feature for wget 2.0.

 Really?  I've never heard of a .mht web archive; it seems to be a
 Windows-only thing.

oops, my fault. i was in a hurry and i misunderstood what Abdurrahman was
asking. what i wanted to say is that we talked about supporting the same
html file download mode as firefox, in which you save all the related
files in a directory with the same name as the document you downloaded. i
think that would be nice. sorry for the misunderstanding.
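
for reference, firefox's save-complete mode produces an on-disk layout
roughly like this (the names below are just placeholders):

  page.html
  page_files/
    logo.png
    style.css
    script.js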

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.http://www.ing.unife.it
Institute for Human  Machine Cognition  http://www.ihmc.us
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it


Re: Question

2005-08-09 Thread Hrvoje Niksic
Mauro Tortonesi [EMAIL PROTECTED] writes:

 oops, my fault. i was in a hurry and i misunderstood what
 Abdurrahman was asking. what i wanted to say is that we talked about
 supporting the same html file download mode as firefox, in which you
 save all the related files in a directory with the same name as the
 document you downloaded. i think that would be nice. sorry for the
 misunderstanding.

No problem.  Once wget -r/-p is taught to parse links on the fly
instead of expecting to find them in fixed on-disk locations, writing
to MHT should be easy.  It seems to be a MIME-like format that builds
on the existing concept of multipart/related messages.

Instead of converting links to local files, we'd convert them to
identifiers (free-form strings) defined with content-id.
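
For illustration, a minimal multipart/related archive in the spirit of
RFC 2557 could look roughly like this (boundary, URLs and content are
invented):

  MIME-Version: 1.0
  Content-Type: multipart/related; boundary="frontier"; type="text/html"

  --frontier
  Content-Type: text/html; charset="utf-8"
  Content-Location: http://example.com/index.html

  <html><body><img src="cid:logo"></body></html>

  --frontier
  Content-Type: image/png
  Content-Transfer-Encoding: base64
  Content-ID: <logo>

  iVBORw0KGgo... (base64 data)
  --frontier--

The HTML part refers to the image through its Content-ID rather than a
file name, which is the kind of link rewriting described above.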


Problems downloading from specific site

2005-08-09 Thread Reginaldo O. Andrade
Hi, list!


 I would like to offer you a friendly challenge. Can you download
something from the site www.babene.ru using wget? I always
receive the message "ERROR 403: Forbidden", but using Firefox or IE I
can download the pictures without any problem. I have already tried
some user-agent strings, but without success.


 Thanks in advance.


Reginald0




Re: Problems downloading from specific site

2005-08-09 Thread Jochen Roderburg
Quoting Reginaldo O. Andrade [EMAIL PROTECTED]:

I would like to offer you a friendly challenge. Can you download
 something from the site www.babene.ru using wget? I always receive the
 message "ERROR 403: Forbidden", but using Firefox or IE I can download
 the pictures without any problem. I have already tried some user-agent
 strings, but without success.

Not an uncommon problem ;-)
They check the referer, which a browser usually sends and which points to the
page you are coming from.  You can do the following with wget:

wget --referer=http://www.babene.ru/  http://www.babene.ru/.
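
If the referer alone is not enough, it can be combined with a
browser-like user-agent string, for example (the picture URL below is
only a placeholder):

wget --referer=http://www.babene.ru/ --user-agent="Mozilla/5.0" \
     http://www.babene.ru/picture.jpg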

Best regards,
J.Roderburg