Re: Wget 1.11.3 - case sensitivity and URLs

2008-06-13 Thread Steven M. Schweda
   In the VMS world, where file name case may matter, but usually
doesn't, the normal scheme is to preserve case when creating files, but
to do case-insensitive comparisons on file names.

From Tony Lewis:

 To have the effect that Allan seeks, I think the option would have to
 convert all URIs to lower case at an appropriate point in the process.

   I think that that's the wrong way to look at it.  Implementation
details like name hashing may also need to be adjusted, but this
shouldn't be too hard.
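
   For illustration, a standalone sketch of that scheme (not wget code;
the names here are invented): keep whatever case the server supplied
when storing the name, but make the hash and the comparison fold case,
so that "File.HTML" and "file.html" land in the same hash slot.

#include <ctype.h>

/* Case-insensitive comparison, like strcasecmp(). */
static int
name_compare_ci (const char *a, const char *b)
{
  while (*a && tolower ((unsigned char) *a) == tolower ((unsigned char) *b))
    a++, b++;
  return tolower ((unsigned char) *a) - tolower ((unsigned char) *b);
}

/* A hash which folds case the same way, so it agrees with the
   comparison above.  (djb2-style, for brevity.) */
static unsigned long
name_hash_ci (const char *s)
{
  unsigned long h = 5381;
  for (; *s; s++)
    h = h * 33 + tolower ((unsigned char) *s);
  return h;
}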



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Looking for 1.9.1 user manual

2008-04-01 Thread Steven M. Schweda
From: Kevin.Low

 What I'm looking for today is simply, the 1.9.1 user manual.  [...]

   Do you seek anything which is not part of the usual source kit(s), as
seen, for example, at:

  http://ftp.gnu.org/gnu/wget/

?  (Pick a version, any version...)



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: FW: cannot log on to Oracle portal/apache - full request - ignore pevious

2008-03-27 Thread Steven M. Schweda
From: Kevin.Low

 [...]  I think knowing gcc 3.2 is not the culprit will help.

   It probably would, but we don't know that.  GCC 3.2 seems to date
back to around August 2002, and there were also 3.2.1, 3.2.2, and 3.2.3
over the next several months, so it's certainly pretty old, and it was
probably not entirely defect-free.

   With a transcript showing what happened, someone might be able to
assign blame (always the first and most important step in problem
resolution).  After that, many things are possible.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Toward a 1.11.1 release

2008-03-20 Thread Steven M. Schweda
 [...]  Is it even useful to _do_ prereleases?

   I was waiting for the version which integrated the (previously
suggested) VMS-related changes.  (There are some generic FTP-related
fixes hidden among the VMS-related ones, too, of course.)

   Perhaps the Summer of Code thing will turn up someone with interests
broader than Linux.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: need help

2008-03-13 Thread Steven M. Schweda
From: Gary Lubrani

   Not the most descriptive subject I've ever seen.

 checking for C compiler default output file name...
 configure: error: C compiler cannot create executables

   Apparently your C compiler is not working as expected.

 See `config.log' for more details.

   Well?  Any clues there?  Are you working in a directory where you
have write permission?  Can you compile a simple test program?
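
   For example, something as small as this (the file name is arbitrary)
should compile and run if the compiler and the directory are usable:

/* hello.c -- trivial check that the C compiler can create an executable. */
#include <stdio.h>

int
main (void)
{
  printf ("hello, world\n");
  return 0;
}

If "cc hello.c" (or "gcc hello.c") fails in the same directory where
configure failed, the problem is with the compiler installation or the
directory, not with wget.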



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: how to parse a webpage to download links of certain type?

2008-03-09 Thread Steven M. Schweda
From: shirish

 [...] not directories [...]

alp $ wget -h
[...]
Directories:
  -nd, --no-directories   don't create directories.
[...]

   Sounds as if it may be worth a try.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Wget continue option and buggy webserver

2008-02-19 Thread Steven M. Schweda
From: Charles

 In wget 1.10, [...]

   Have you tried this in something like a current release (1.11, or
even 1.10.2)?

  http://ftp.gnu.org/gnu/wget/

 [...] but for some reason (buggy server), [...]

   How should wget know that it's getting a bogus error from your buggy
server, and not getting a valid error from a working server?



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: seg fault ~30G

2007-12-24 Thread Steven M. Schweda
From: Hunter

 I'm getting a seg fault anytime I approach 30G in transfer with wget.
 
 I did a google search, but didn't see a resolution.  Is there one I
 simply cannot find?

   It's hard to say.  I'll tell you what I can't find here, and that's a
useful problem report, which would include things like the wget version
(wget -V), the OS you're using and its version, and the actual wget
command you used (and its output).  As usual, adding -d to the command
might be informative.  In a case where the program explodes, a traceback
showing where it was when it died could also be helpful.  (Not knowing
your OS makes it hard to suggest how to get a traceback.)  Evidence that
you have adequate free disk space could be reassuring, too.

   There isn't anything magic about 30G (as there is about, say, 2G or
4G), so I'd guess that it'd more likely be a problem in your environment
than in wget, but with the available evidence, that is only a guess.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: gzip question

2007-12-19 Thread Steven M. Schweda
From: Christopher Eastwood

 Does wget automatically decompress gzip compressed files?

   I don't think so.  Have you any evidence that it does this?  (Wget
version?  OS?  Example with transcript?)

   Is there a
 way to get wget NOT to decompress gzip compressed files, but to download
 them as the gzipped file?

   Just specify the gzip-compressed file, so far as I know.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: gzip question

2007-12-19 Thread Steven M. Schweda
From: Christopher Eastwood

 wget --header='Accept-Encoding: gzip, deflate' http://{gzippedcontent}

   Doctor, it hurts when I do this.
   Don't do that.

   What does it do without --header='Accept-Encoding: gzip, deflate'?

 [...] (Wget version?  OS?  Example with transcript?)

   Still waiting for those data.  Also, when I say "Example", I normally
mean "An actual example", that is, one which can be tested and verified.

   Adding -d to the wget command can also be informative.

   SMS.


Re: Avoiding DoS fame *** Please cc me (non-subscriber)

2007-11-18 Thread Steven M. Schweda
From: Ezequiel Garzón Lucero

 Could anybody tell me the default value for the -w option. Based on
 wget's speed, I imagine it's not even 1, right?

   Zero, I assume.

 But then, how come
 wget users are not flagged as DoS offenders (at least not all the
 time)?

   Some users do not always ask for recursion through an entire site. 
My most frequent use of wget is to fetch a single file.  Recursion,
while sometimes useful, is not particularly common for me.

   Also, I'm not able to disable (or even greatly inconvenience) a
server whose bandwidth is greater than mine.  With my limited (DSL)
bandwidth, that leaves much of the world safe from an attack by me.

 [...]  does anybody
 know what are the standard thresholds for repeated requests?

   I'd say that it depends on the target of the requests.  If I see
annoying stuff in my Web or FTP server logs, I complain to the ISP for
the pest.  If it recurs, I block that IP address.  Most serious
denial-of-service attacks use more than one attacker.  A single wget
user can't do very much harm.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: .1, .2 before suffix rather than after

2007-11-04 Thread Steven M. Schweda
   I don't care particularly how this stuff works, but if you'd like to
do me a favor, please make sure, whatever the final scheme is, that it's
easy to add the #ifdef for VMS to bypass the whole mess, because the
file version numbers on VMS obviate it.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: RFE: run-time change of limit-rate multi-stream download

2007-11-01 Thread Steven M. Schweda
From: L Walsh

 Say one runs the first wget.  Lets say it is a simple 1-DVD download.
 Then you start a 2nd download of another DVD.  Instead of 2 copies
 of wget running and competing with each other, what if the 2nd copy
 told the 1st copy about the 2nd download, and the 2nd download
 was 'enqueued' in a 'line' behind 1st.

   Perhaps you need an operating system.  On VMS, one could create a
wget-specific batch queue, set its job limit to one, and submit all the
non-compete wget jobs to it.  The queue manager would run the submitted
jobs one at a time, first-come-first-served, with the terminal output
logged to a file (of your choice).  If you ask (SUBMIT /NOTIFY), you can
get a message broadcast to your terminal(s) when a job ends.

  http://h71000.www7.hp.com/index.html

(Where would you like to put the axle on that new wheel?)



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Using wget through FTP proxy server

2007-10-24 Thread Steven M. Schweda
From: Alan Watt

 I'm using wget version 1.9.1 for Solaris 8 (SPARC). [...]

   I don't deal with proxies, so I don't know much about this, but you
might do better with the current released version, 1.10.2.  I don't know
if a suitable binary kit is generally available, but if you can't find
one, and you can't build it from the source, I can build one on Solaris
10, if you think that that might be useful.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: More portability stuff [Re: gettext configuration]

2007-10-23 Thread Steven M. Schweda
From: Micah Cowan [EMAIL PROTECTED]

 Next problem on Tru64:
  [...]
  ld:
  Unresolved:
  siggetmask
 
We ain't got no siggetmask().  None on VMS (out as far as V8.3),
 either, should I ever get so far.
 
 siggetmask is an obsolete BSDism; POSIX has the sigprocmask function,
 which we should prefer. We should also do feature-testing, and not
 assume there's a portable way to block/unblock signals.

   Note that sigprocmask() does appear on VMS, but apparently not until
V8.2, which is ahead of many users (including me, in part).  More
portability would be better in this region.  Can't sigsetmask() or
sigblock() do the same job if you tell them not to change anything?
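
   For what it's worth, a rough sketch of feature-tested code (the
HAVE_SIGPROCMASK macro is an invented, autoconf-style assumption here),
including the "change nothing" trick asked about above:

#include <signal.h>

static void
block_sigint (void)
{
#ifdef HAVE_SIGPROCMASK
  sigset_t set;
  sigemptyset (&set);
  sigaddset (&set, SIGINT);
  sigprocmask (SIG_BLOCK, &set, NULL);      /* POSIX */
#else
  sigblock (sigmask (SIGINT));              /* old BSD style */
#endif
}

static void
read_mask_without_changing_it (void)
{
#ifdef HAVE_SIGPROCMASK
  sigset_t old;
  sigprocmask (SIG_SETMASK, NULL, &old);    /* NULL new set: no change */
  (void) old;
#else
  int old = sigblock (0);                   /* block nothing; returns old mask */
  sigsetmask (old);                         /* restore it: effectively a no-op */
#endif
}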

   SMS.


Re: wget -o question

2007-10-01 Thread Steven M. Schweda
From: Micah Cowan

 But, since any specific transaction is unlikely to take such a long
 time, the spread of the run is easily deduced by the start and end
 times, and, in the unlikely event of multiple days, counting time
 regressions.

   And if the pages in books were all numbered 1, 2, 3, 4, 5, 6, 7, 8,
9, 0, 1, 2, 3, ..., the reader could easily deduce the actual number for
any page, but most folks find it more convenient when all the necessary
data are right there in one place.

   But hey.  You're the boss.

   SMS.


Re: wget -o question

2007-09-30 Thread Steven M. Schweda
From: Micah Cowan

  -  tms = time_str (NULL);
  +  tms = datetime_str (NULL);

 Does anyone think there's any general usefulness for this sort of
 thing?

   I don't care much, but it seems like a fairly harmless change with
some benefit.  Of course, I use an OS where a directory listing which
shows date and time does so using a consistent and constant format,
independent of the age of a file, so I may be biased.

 Though if I were considering such a change, I'd probably just have wget
 mention the date at the start of its run, rather than repeat it for each
 transaction. Obviously wouldn't be a high-priority change... :)

   That sounds reasonable, except for a job which begins shortly before
midnight.  I'd say that it makes more sense to do it the same way every
time.  Otherwise, why bother displaying the hour every time, when it
changes so seldom?  Or the minute?  Eleven bytes more per file in the
log doesn't seem to me to be a big price to pay for consistent
simplicity.  Or you could let the victim specify a strptime() format
string, and satisfy everyone.  Personally, I'd just change time_str() to
datetime_str() in a couple of places.
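
   A minimal sketch (not the actual utils.c code) of a
datetime_str()-style formatter, which uses one fixed format, date
included, regardless of the age of the file:

#include <stdio.h>
#include <time.h>

static const char *
datetime_str_sketch (void)
{
  static char buf[32];
  time_t now = time (NULL);
  struct tm *tm = localtime (&now);

  if (tm == NULL)
    return "????-??-?? ??:??:??";
  strftime (buf, sizeof (buf), "%Y-%m-%d %H:%M:%S", tm);
  return buf;
}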



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Mirroring redirected web sites

2007-08-21 Thread Steven M. Schweda
From: Theo Wollenleben

 [...]  For a single file I also tried `wget -N -O
 local_copy_of_file'. Apparently Wget doesn't check the timestamp of
 `local_copy_of_file', so it doesn't work either.  [...]

   The implementation of -O defeats -N (among other options).  Look
around at http://www.mail-archive.com/wget@sunsite.dk/ for details.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: VMS support/getpass [Re: Gnulib getpass, and wget password prompting]

2007-08-10 Thread Steven M. Schweda
From: Micah Cowan [EMAIL PROTECTED]

 My preference would be to use getpass, which seems suitably abstracted
 already, and modify it as needed to support VMS. Hopefully, gnulib
 upstream would be interested in those changes (worth checking, at the
 very least), and it can be merged back up.

   I suppose that that wouldn't be _so_ terrible.  I would probably have
made the general/OS-specific split somewhere else, but then we'd most
likely be arguing about /dev/tty v. SYS$COMMAND, or something else, so
it's probably not really worse this way.  I'll relax and wait for things
to deteriorate.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: VMS support/getpass [Re: Gnulib getpass, and wget password prompting]

2007-08-10 Thread Steven M. Schweda
From: Tony Lewis

 I think you should give Micah the benefit of the doubt.  [...]

  Am I complaining too much again?  He's not doing everything I want
before I want it done, but that's not unusual.  I think that I'm still
complaining _to_ him, not _about_ him.  That was the intent, anyway.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Gnulib getpass, and wget password prompting

2007-08-08 Thread Steven M. Schweda
From: Micah Cowan

 [...]
 Gnulib actually has quite a large number of modules designed for
 portability; I imagine we could benefit from several of them.

   Well, yeah, where portability is limited to various UNIX-like
systems and Windows.  As I said
(http://www.mail-archive.com/wget@sunsite.dk/msg10077.html), it's all
useless on VMS, so I'd prefer some kind of easier-to-deal-with level of
abstraction.

   If you're completely disinterested in (or hostile to) having this
program run on VMS, we could save a lot of my time by declaring it now.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Average download throughput using Wget

2007-08-07 Thread Steven M. Schweda
From: sankalp_karpe

 [...] Wget (windows version) [...]

   "windows version" does not reveal the wget version.  The output from
"wget -V" might.

 (i) [...]

   The final speed reported ("(118.64 KB/s)") should be the average
speed for the whole download, that is, the full byte count divided by
the full download time.

 (ii) [...]

   The final speed reported ("(118.64 KB/s)") should be what you want,
and it's unlikely that any simple calculation using the intermediate
rates will give you what you want.

   Some algebra would be helpful to explain why not.  An old related
problem looks like this:

   A motorist is making a trip of 100km.  After traveling for one hour,
he notices that he has gone only 50km.  How fast does he need to go for
the next 50km to get an average speed for the whole trip of 100 km/h?
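
   For completeness, the algebra behind that (average speed is total
distance over total time, so averaging 100 km/h over the whole trip
requires the whole trip to take one hour):

   \bar{v} = \frac{d_{\mathrm{total}}}{t_{\mathrm{total}}}
           = \frac{100\ \mathrm{km}}{t_{\mathrm{total}}} = 100\ \mathrm{km/h}
     \quad\Longrightarrow\quad t_{\mathrm{total}} = 1\ \mathrm{h}.

The first 50 km already consumed that hour, leaving no time at all for
the second 50 km, so no finite speed will do; in particular the
"obvious" answer of 150 km/h is wrong.  The same trap applies to
averaging wget's intermediate rates: divide total bytes by total time
instead.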



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: FTP OS-dependence, and new FTP RFC

2007-08-04 Thread Steven M. Schweda
From: Hrvoje Niksic

 I agree that string-of-CWDs would be better than the current solution.

   Well, that's good news.  See, for example, the discussion around:
  http://www.mail-archive.com/wget@sunsite.dk/msg08233.html
Also:
  http://www.mail-archive.com/wget@sunsite.dk/msg08447.html



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Wget 1.10.2 does not continue download when file in a subdirectory

2007-07-28 Thread Steven M. Schweda
From: Martin MOKREJŠ

  I think the following happens due to a bug in wget unable to look
 into a subdirectory for the file to be restarted in download:
 [...]

   Is this the same problem as this?:

  http://www.mail-archive.com/wget@sunsite.dk/msg09707.html



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: patch: prompt for password

2007-07-25 Thread Steven M. Schweda
From: Matthew Woehlke

 [...]
 +#include <termios.h> /* FIXME probably not portable? */
 [...]

   This would certainly be a problem on VMS, which has its own terminal
handling scheme, and no support for termios.

   Rather than installing a load of UNIX-specific (probably not
portable) code into the middle of an otherwise fairly portable code
segment, why not create a couple of functions, like, say,
terminal_echo_disable() and terminal_chars_restore(), segregate your
implementation of them into some UNIX-specific place, and let the rest
of us supply our own?  Or, for a real adventure, you could look at some
considerably more portable program (like Info-ZIP [Un]Zip or Kermit),
and see how more experienced people have handled this problem, and then
do all the work yourself.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Problem with combinations of the -O , -p, and -k parameters in wget

2007-07-22 Thread Steven M. Schweda
From: Michiel de Boer

 [...] Therefore I use -O to write to a more sensible name.  [...]

   Unfortunately, -O does not do name conversion, it simply directs
all the program output to a specified file, and this causes bad behavior
when -O is combined with many other options.  Use the Search feature
at http://www.mail-archive.com/wget@sunsite.dk/ (for -O) to find many
similar complaints involving -O.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: ignoring robots.txt

2007-07-18 Thread Steven M. Schweda
From: Josh Williams

 As far as I can tell, there's nothing in the man page about it.

   It's pretty well hidden.

  -e robots=off

At this point, I normally just grind my teeth instead of complaining
about the differences between the command-line options and the commands
in the .wgetrc start-up file.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: bug and patch: blank spaces in filenames causes looping

2007-07-06 Thread Steven M. Schweda
From various:

 [...]
char filecopy[2048];
if (file[0] != '"') {
  sprintf(filecopy, "\"%.2047s\"", file);
} else {
  strncpy(filecopy, file, 2047);
}
 [...]
 It should be:
 
  sprintf(filecopy, "\"%.2045s\"", file);
 [...]

   I'll admit to being old and grumpy, but am I the only one who
shudders when one small code segment contains 2048, 2047, and 2045
as separate, independent literal constants, instead of using a macro, or
sizeof, or something which would let the next fellow change one buffer
size in one place, instead of hunting all over the code looking for
every 20xx which might be related?

   Just a thought.
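
   For illustration only, one way to keep the size in a single place
(a sketch, not the actual patch):

#include <stdio.h>
#include <string.h>

#define FILECOPY_SIZE 2048

static void
quote_filename (char filecopy[FILECOPY_SIZE], const char *file)
{
  if (file[0] != '"')
    /* Leave room for the two added quotes and the terminating NUL. */
    snprintf (filecopy, FILECOPY_SIZE, "\"%.*s\"",
              (int) (FILECOPY_SIZE - 3), file);
  else
    {
      strncpy (filecopy, file, FILECOPY_SIZE - 1);
      filecopy[FILECOPY_SIZE - 1] = '\0';
    }
}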



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Downloading 7GB-file via FTP

2007-07-05 Thread Steven M. Schweda
From veejar:

 I use wgetpro-0.1.3_1 on FreeBSD 6.2 RELEASE.

   Great.  How is that related to normal wget?  On which wget version
was it based?

   The current released version of wget, 1.10.2, should have no problems
with large files, assuming that the FTP server and the local file system
have no problems with large files.

   If real wget fails, complain here.  If some other "based on GNU
Wget" program fails, it might make more sense to complain to the people
who wrote that program.

  http://wgetpro.sourceforge.net/   ???



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: timestamping and output document

2007-06-26 Thread Steven M. Schweda
From: purp

 Don't know if that's a known issue, [...]

   Try the Search feature at:

  http://www.mail-archive.com/wget@sunsite.dk/

For example:

  http://www.mail-archive.com/search?q=%22-O%22+%22-N%22[EMAIL PROTECTED]

where you can see several previous similar complaints, and the
explanation.

 [...]  This is GNU Wget 1.9.1.

   Why?  Wget 1.10.2 has been available since about October 2005.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: timestamping and output document

2007-06-26 Thread Steven M. Schweda
   For the record:

  http://www.mail-archive.com/search?q=%22-O%22+%22-N%22[EMAIL PROTECTED]

was actually more like:

  http://www.mail-archive.com/search?q=%22-O%22+%22-N%22l= wget at 
sunsite.dk

before it got PROTECTED.

   SMS.


Re: wget -P not working.

2007-06-20 Thread Steven M. Schweda
From: Itamar Reis Peixoto

 Anyone can fix this bug for me ?
 https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=229744

   I don't know.  Talk to the people who broke it?

 with wget-1.10.2-3.2.1.i386.rpm works, the file was downloaded on /etc, with 
 newer versions ( 1.10.2-8.fc6.1 ) the file will be downloaded on current 
 directory

http://netenberg.com/forum/viewtopic.php?t=5430

 [...] please check the wget version that you have on your server. 
 If it is wget-1.10.2-3.3.fc5 or wget-1.10.2-7.el5 or
 wget-1.10.2-8.fc6.1, we suggest that you replace it immediately
 with an older and/or stabler version. This version
 does not honor the -P switch. 
 An alternate version that we suggest is wget-1.10.2-3.2.1 

   It sounds to me as if an older kit will solve the problem.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Problem with --reject option

2007-06-11 Thread Steven M. Schweda
From: Glenn Nieuwenhuyse

 wget -T 1 -t 1 -r --reject=robots.* [...]
 
 I would expect this not to download the robots.txt file, but still it
 does.

   Perhaps because robots.txt is a special case, and is not selected
by following links, and so is unaffected by the --reject option.

   A search for robot in the manual should reveal this:

  http://www.gnu.org/software/wget/manual/wget.html

robots = on/off
 Specify whether the norobots convention is respected by Wget,
 on by default. This switch controls both the /robots.txt and
 the nofollow aspect of the spec. See Robot Exclusion, for more
 details about this. Be sure you know what you are doing before
 turning this off.

So, adding -e robots=off to your command might help.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: problem with HTTP mirroring

2007-06-11 Thread Steven M. Schweda
From: Alexander Simon


 When calling
 wget -A.pdf,.PDF,.doc,.DOC,.java,.class,.JAVA,.CLASS,.zip,.ZIP -m -nH
 -nd -l1 --header="Accept-language: de, en;q=0.8"
 "http://wwwseidl.informatik.tu-muenchen.de/lehre/vorlesungen/SS07/info2/index.php"
 , wget should load some PDF files (i1.pdf, i2.pdf, i3.pdf, ...) that are 
 linked on this site.

   As I read the HTML, i1.pdf appears to be on a different server:

  <a href="http://www2.in.tum.de/~seidl/Courses/SS2007/i1.pdf">PDF</a>

Perhaps this option would help:

  -H,  --span-hosts          go to foreign hosts when recursive.


   wget -h shows some other potentially useful options under
Recursive accept/reject:

  -D,  --domains=LIST  comma-separated list of accepted domains.
   --exclude-domains=LIST  comma-separated list of rejected domains.
[...]



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Recursive function does not work with -O

2007-06-08 Thread Steven M. Schweda
From: Gekko

 [...] returns the first page it downloads only, and does not continue
 to  download the other links, while omitting the -O - allows the
 downloading to work.

   That's right.  In recursive HTTP operation, wget expects to read its
own output files to find the links to follow.  It's not designed to read
its one-and-only -O output file to find links while it is writing that
file.  It would not be impossible to arrange this sort of thing, but it
would be complicated, and it's not obvious that it would be particularly
useful.  Why would you want to do this?  It should be relatively easy to
get the same effect with a normal wget -r command and a shell script
to go through the resulting files and cat them into a single mess.  I
still don't know why you'd want to do it, however.

   Thanks for including the wget and OS info in the question.  It's a
rare thing to get all the useful info around here.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: NULL ptr dereferences found with Calysto static checker

2007-06-06 Thread Steven M. Schweda
From: Domagoj Babic

 + wget-1.10.2/src/utils.c:287 // localtime can return NULL

   I'd say that it's sloppy code, but the probability of seeing a
failure in the real world, while not zero, must be vanishingly small. 
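
   For reference, the kind of guard the checker is asking for (a
minimal, standalone illustration; localtime() may return NULL for an
unrepresentable time):

#include <stdio.h>
#include <time.h>

static void
print_date (time_t t)
{
  struct tm *tm = localtime (&t);

  if (tm == NULL)                 /* rare, but possible */
    {
      fprintf (stderr, "localtime() failed\n");
      return;
    }
  printf ("%04d-%02d-%02d\n",
          tm->tm_year + 1900, tm->tm_mon + 1, tm->tm_mday);
}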



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: can someone help me with wget?

2007-06-03 Thread Steven M. Schweda
From: shades13

 I have been having problems [...]

   1. People with real names tend to get more respect than others.

   2.  As usual, it might help to know which wget version you're using
on which operating system.

   I don't see any links on "http://www.talcomic.com/".  (Nor much of
anything else.)

   Which links did you see on "http://www.cad-comic.com/comic.php" which
wget was not following?  Hint: '<script
src="http://www.google-analytics.com/urchin.js" type="text/javascript">'
is not a link which wget will follow.

   wget -d ... may give you a better idea of what wget is doing.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Return Value 2

2007-06-01 Thread Steven M. Schweda
From: Robert Denton

 Can you glean from the code what would cause an exit value 0?

   Zero is the success code, which is the default, so anything which
does not set 1 or 2 should leave 0.

 0: -I have never seen this one -

   What do you see when everything goes right?  (Or do your jobs always
fail in some way?)  Along the same line, what, exactly, is "one of my
devices"?



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Return Value 2

2007-05-31 Thread Steven M. Schweda
From: Robert Denton

 So, can I take this to mean that 'wget exited with value 2' always
 indicates a problem with switches/options?

   No bets, but a search through the code for "exit" suggests that
that's approximately true, if you count a problem reading or
interpreting .wgetrc as a problem with switches/options.

 Where can I get a full list of exit values and what they indicate?

   I know of none.  Skimming the exit search results suggests that
your choices may be limited to 0, 1, and 2.  (From which one might
deduce that the designer was not a fan of AIX or VMS, where error
messages and/or exit status values tend to be more informative.)



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Wget 1.10.2 + FC6 + FTP mirroring in root folder

2007-05-30 Thread Steven M. Schweda
From: Richard Dale

 [...] An upgrade to the latest revision of 1.10.2 exhibited the
 problems and a downgrade avoided the problems.

   Do these apparently different variants of wget version 1.10.2 say
different things in the wget -V report?



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Crash

2007-05-29 Thread Steven M. Schweda
From: Adrian Sandor

  Apparently there's more than a little code in src/cookies.c which is
  not ready for NULL values in the attr and value members of the
  cookie structure.
 
 Does that mean wget is buggy or does brinkster break the cookie specification?

   Wget is certainly buggy, but as I said, I don't do much with cookies,
so I don't know if missing/null values are legal or not.

 I tried it, and it solves my problem.

   Glad to hear it.  Thanks for the report.

  Will there be an official wget
 patch for this?

   Ask the wget maintainer.  I can't even get the changes _I_ want into
the official code.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Crash

2007-05-28 Thread Steven M. Schweda
From: Adrian Sandor

 [...]
 Stored cookie www14.brinkster.com -1 (ANY) /aditsu/ session insecure 
 [expiry none] (null) (null)
 [...]
 Segmentation fault

   Apparently there's more than a little code in src/cookies.c which is
not ready for NULL values in the attr and value members of the
cookie structure.  (It's more luck than design that you get (null)
in the debug message, instead of it blowing up right there.)

   Double your money back if you're not completely satisfied, but, if
you can build from the sources, you could try this one:

  http://antinode.org/ftp/wget/wget-1_10_2c_vms/cookies.c

I don't do much with cookies, so the end cases may not be handled
correctly, but it does seem to explode less.  (If nothing else, it could
boost the ego of the next fellow who does it right.)



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: convert-links + output-document

2007-05-16 Thread Steven M. Schweda
From: Poppa Pump

 [...] I cannot rename the
 file after I run the wget command with convert-links
 because I need a unique name before the download.

   Can you create a (uniquely named?) temporary directory, do the work
in there (without --output-document), and rename/move the results later?



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Loading cookies that were set by Javascript

2007-05-15 Thread Steven M. Schweda
From: Poppa Pump

 [...] but these are set using Javascript. [...]

   Wget doesn't do JavaScript.  I suspect that you're doomed.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: --page-requisites and --post-data options

2007-05-11 Thread Steven M. Schweda
From aulaulau:

 [...] you will use --page-requisites and --post-data options together.

   Probably not something anyone considered.

 Is there a way to do it with wget options ?

   Perhaps use --post-data to get the primary page, and then use -i
primary_page (perhaps with -F, perhaps with --page-requisites) to get
the other pieces?



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: simple wget question

2007-05-11 Thread Steven M. Schweda
From: R Kimber

 What I'm trying to download is what I might express as:
 
 http://www.stirling.gov.uk/*.pdf

   At last.

 but I guess that's not possible.

   In general, it's not.  FTP servers often support wildcards.  HTTP
servers do not.  Generally, an HTTP server will not give you a list of
all its files the way an FTP server often will, which is why I asked (so
long ago) "If there's a Web page which has links to all of them, [...]".

   I just wondered if it was possible
 for wget to filter out everything except *.pdf - i.e. wget would look
 at a site, or a directory on a site, and just accept those files that
 match a pattern.

   Wget has options for this, as suggested before (wget -h):

[...]
Recursive accept/reject:
  -A,  --accept=LIST   comma-separated list of accepted extensions.
  -R,  --reject=LIST   comma-separated list of rejected extensions.
[...]

but, like many of us, it's not psychic.  It needs explicit URLs or else
instructions (-r) to follow links which it sees in the pages it sucks
down.  If you don't have a list of the URLs you want, and you don't have
URLs for one or more Web pages which contain links to the items you
want, then you're probably out of luck.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: sending Post Data and files

2007-05-09 Thread Steven M. Schweda
 1.) How can I send Post Data with Line Breaks? I can not press enter
 and \n or \r or \r\n dont work...

   Put the data into a file, and use --post-file=FILE_NAME?

 Is it possible to send a File with a name?

   Other than with --post-file=FILE_NAME?

 Is it possible to send two files?

   I believe not.  At least not using --post-file.

 [...] Input type=file [...]

   I've never tried that, so I know even less about that than I do about
--post-file (which I did use once).



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: sending Post Data and files

2007-05-09 Thread Steven M. Schweda
From: Tony Lewis

 You don't need a line break because parameters are separated by
 ampersands; a=1&b=2

   You need a line break if you need a line break, as when you wish to
set a variable to multiple lines of text.  For example:

  subject=Test message&msg_text=   This is a test.
  This is only a test.
  If this had been an actual message,
  It would have been delivered appropriately.

Put that stuff into a file named, say, test.dat, and specify
--post-file=test.dat.  The server should then set the variable
"subject" to "Test message", and "msg_text" to:

 This is a test.
  This is only a test.
  If this had been an actual message,
  It would have been delivered appropriately.

(with the line breaks).



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: simple wget question

2007-05-06 Thread Steven M. Schweda
From: R Kimber

 If I have a series of files such as
 
 http://www.stirling.gov.uk/elections07abcd.pdf
 http://www.stirling.gov.uk/elections07efg.pdf
 http://www.stirling.gov.uk/elections07gfead.pdf
  
 etc
 
 is there a single wget command that would download them all, or would I
 need to do each one separately?

   It depends.  As usual, it might help to know your wget version and
operating system, but in this case, a more immediate mystery would be
what you mean by them all, and how one would know which such files
exist.

   If there's a Web page which has links to all of them, then you could
use a recursive download starting with that page.  Look through the
output from wget -h, paying particular attention to the sections
Recursive download and Recursive accept/reject.  If there's no such
Web page, then how would wget be able to divine the existence of these
files?

   If you're running something older than version 1.10.2, you might try
getting the current released version first.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: wget suggestion

2007-05-03 Thread Steven M. Schweda
From: Robert La Ferla

 There needs to be a way to tell wget to reject all domains EXCEPT those
 that are accepted. This should include subdomains. Ie. I just want to
 download www.mydomain.com and cache.mydomain.com. I thought the
 --domains option would work this way but it doesn't.

   Can you provide any evidence that it doesn't?  Useful info might
include the wget version, your OS and version, the command you used, and
the results you got.  Adding -d to the command often reveals more than
not using it.  A real example is usually more useful than a fictional
example.

   If you can't exhibit the actual failure and explain how to reproduce
it, you might do better with a psychic hot-line, as most of us are not
skilled in remote viewing.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: wget suggestion

2007-05-03 Thread Steven M. Schweda
From: Robert La Ferla

 GNU Wget 1.10.2

   Ok.  Running on what?

 Capture this sub-site and not the rest of the site so that you can  
 view it locally.  i.e.  just www.boston.com and cache.boston.com
 
 http://www.boston.com/ae/food/gallery/cheap_eats/

   What is a sub-site?  Do you mean this page, or this page and all
the pages to which it links, excluding off-site pages, or what?

   I have a better idea.  Read this again:

 Can you provide any evidence that it doesn't?  Useful info might
  include the wget version, your OS and version, the command you  
  used, and
  the results you got.  Adding -d to the command often reveals more  
  than
  not using it.  A real example is usually more useful than a fictional
  example.
 
 If you can't exhibit the actual failure and explain how to  
  reproduce
  it, you might do better with a psychic hot-line, as most of us are not
  skilled in remote viewing.

   You might also consider phrasing your demands as polite requests in
future.  Phrases like "I would like to learn how to", or "Can you
explain how to" can be useful for this.  Even better would be, "I tried
this command <insert command here>, and I got this result <insert result
here>, but I was expecting something more like this <insert expected
result here>, and I definitely didn't expect this <insert undesirable
result here>."



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: How can I compile a list of URLs matching a pattern?

2007-05-01 Thread Steven M. Schweda
From: Karim Ali

 [...] I want to traverse a given site,
 but only retrieve the URL's that match a particular pattern.
 [...]
 [...]  I'd like it if wget
 would just return the URL's it finds during its recursive
 traversal, but not return the data.  [...]

   If wget is to traverse a given site, it needs to fetch the HTML
documents from the server so it can search them for links to other HTML
documents.  How should it do this if it does not return the data?

   Have you looked at these?:

  -A,  --accept=LIST   comma-separated list of accepted extensions.

   --spider  don't download anything.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: feature suggestion - make option to use system date instead multiple version number

2007-04-27 Thread Steven M. Schweda
From: Alvydas

 I guess it would relatively easy and quite useful to add an option
 to name file.20070426142800 file.20070426142955 ... instead just numbers.

   The relevant code is in src/utils.c: unique_name(), and should be
easy enough to change.  On a fast system, however, one-second resolution
(or multiple users) could lead to non-unique names, so it would be wise
to do something a little more like the existing code, but with a
date-time string added in.
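
   A sketch of that combination (not the unique_name() code itself; the
function name here is invented): a date-time stamp plus a counter, so
that two files created within the same second still get distinct names.

#include <stdio.h>
#include <time.h>

static void
timestamped_name (char *buf, size_t size, const char *file)
{
  static unsigned counter = 0;
  char stamp[32];
  time_t now = time (NULL);
  struct tm *tm = localtime (&now);

  if (tm == NULL || strftime (stamp, sizeof (stamp), "%Y%m%d%H%M%S", tm) == 0)
    stamp[0] = '\0';
  /* e.g. "file.20070426142800.3" */
  snprintf (buf, size, "%s.%s.%u", file, stamp, counter++);
}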



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: q: wget -r -Apdf http:// ..

2007-04-19 Thread Steven M. Schweda
From: b

 GNU Wget 1.10+devel

   Wget 1.10.2 is the current released version, but that probably won't
help you with this problem.

  http://www.gnu.org/software/wget/wget.html

But a close look at the output from wget might help.

   These files may be PDF files, but the names in the links (with all
the query data, ?id=xxx) are XXX.pdf?id=xxx, not
XXX.pdf, so your -Apdf causes this behavior:

[...]
20:59:20 (145.31 KB/s) - `NewZert/isht.comdirect.de/html/cer/pdf/ML-RAEZ_1PFlyer
[1].pdf!id=6dfbfc0f556ec1b78536ae411ee684f19110b63c568a53b046721f62f6' saved
 [905090]

Removing NewZert/isht.comdirect.de/html/cer/pdf/ML-RAEZ_1PFlyer[1].pdf!id=6dfbfc
0f556ec1b78536ae411ee684f19110b63c568a53b046721f62f6 since it should be reje
cted.
[...]

and that "Removing [...]" is not what you want.  (A name like that does
not match "pdf", so wget is doing what you asked it to do.)

   I think that you'll need to remove the -Apdf from your wget
command.  When wget is finished, you can remove the non-PDF files and/or
rename the PDF files to remove the query data from the file names, but I
don't know how to make wget do the whole job without any help.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Bug using recursive get and stdout

2007-04-17 Thread Steven M. Schweda
   A quick search at "http://www.mail-archive.com/wget@sunsite.dk/" for
"-O" found:

  http://www.mail-archive.com/wget@sunsite.dk/msg08746.html
  http://www.mail-archive.com/wget@sunsite.dk/msg08748.html

   The way -O is implemented, there are all kinds of things which are
incompatible with it, -r among them.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: [enhancement request] goals to programmatically parse output ( -o or -a)

2007-04-14 Thread Steven M. Schweda
From: Thomas Harding

 - all outputs lines will uses semicolumn (;) separated fields

   And this won't cause confusion if there's a semi-colon in a file
name?

 Wy wget version is GNU Wget 1.9.1 (and prefer to not change, causes it
 is used by apt-methods...)

   And the current _released_ version is 1.10.2, so who else will be
interested in changes to 1.9.1?



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Feature suggestion for WGET

2007-04-11 Thread Steven M. Schweda
From: Daniel Clarke - JAS Worldwide

 I'd like to suggest a feature for WGET:  the ability to download a file
 and then delete it afterwards.

   Assuming that you'd like to delete it on the FTP server, and not
locally, the basics of this seem pretty easy to add:

   0. Documentation.

   1. Some kind of command-line option to control the new
source-delete feature (or whatever you decide to call it).

   2. src/ftp-basic.c: Add a new function, ftp_dele() (very nearly
ftp_retr() converted to send DELE instead of RETR, and to expect a
2xx success response instead of a 1xx).

   3. src/ftp.h: Add function prototype for ftp_dele().

   4. src/ftp.c: In getftp(), if ftp_retr() succeeds, and the new
source-delete option is enabled, call the new ftp_dele().

   5.  src/ftp.c: Add a bunch of new debug and error message code to
deal with ftp_dele() activity and failures.

   I've done steps 2, 3, and 4 in my experimental code, and the basic
functionality seems to be there.  If anyone is eager to do the whole job
and wants to see my rough code, just let me know.
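
   For anyone curious what step 2 boils down to on the wire, here is a
standalone sketch (not the experimental code mentioned above; the
function name is invented, and multi-line replies and real error
handling are skipped): send "DELE <file>" on the already-authenticated
control connection and expect a 2xx reply.

#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* 'csock' is an FTP control connection which is already logged in.
   Return 0 on a 2xx reply, -1 otherwise. */
static int
ftp_delete_remote (int csock, const char *file)
{
  char req[512], resp[512];
  ssize_t n;

  snprintf (req, sizeof (req), "DELE %s\r\n", file);
  if (write (csock, req, strlen (req)) < 0)
    return -1;

  n = read (csock, resp, sizeof (resp) - 1);
  if (n <= 0)
    return -1;
  resp[n] = '\0';                 /* e.g. "250 DELE command successful" */
  return (resp[0] == '2') ? 0 : -1;
}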



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Suggesting Feature: Download anything newer than...

2007-04-08 Thread Steven M. Schweda
 but if you're going to add
 --not-before, you might as well add --not-after too.

   I'd suggest --before (<) and --since (>=), but I may be prejudiced by
exposure to VMS, where many commands use similar qualifiers.  (But if
you _like_ longer, more complicated option names, ...)

 Me add?!?  ;-)

   It's that, or wait for someone else to do it.  You decide.

 Is adding such features being worked on by someone - or
 should I start cramming C and RFCs, and *try to* make
 a patch for it myself?

   I know that _I_ wasn't working on them.  I don't think that you need
any RFC's for this, mostly just code theft from other parts of the
program.  It could be an educational experience.  (I hate educational
experiences.)  If my experience is any guide, getting any changes into
the main development code may be more of a challenge than getting those
changes to work properly.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Using a variable to get files in sequence

2007-04-04 Thread Steven M. Schweda
From: Williamts99

 Thanks for your response, I am using Linux.

   You might benefit from studying a shell scripting primer, like, for
example:

http://developer.apple.com/documentation/OpenSource/Conceptual/ShellScripting/index.html

Or try asking Google to look for something like:

  linux shell scripting primer

   Roughly, I'd start with something like this:


#!/bin/sh

n_min=1
n_max=8

n=$n_min
while [ $n -le $n_max ] ; do

echo n = $n
n=` expr $n + 1 `

done



   Adjust n_min and n_max as appropriate, and replace the echo
command with an appropriate wget command.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Cannot write to auto-generated file name

2007-04-03 Thread Steven M. Schweda
From Tony Lewis:

 In which case, wget should do something reasonable (generate an error
 message, truncate the file name, etc.).  [...]

   Sadly, this is easier said than done.  Around here (VMS), the
complaint is "i/o error".  I haven't tried it on a UNIX, but it could
easily be different there, too.  VMS offers a system service which can
be used to parse a file specification and test it for legality, but I
don't know how you would do it elsewhere.  On some Linux system(s),
there seems to be a distinctive code/message ("File name too long"):

  http://www.mail-archive.com/wget@sunsite.dk/msg09711.html

   Simply truncating the name would be asking for collisions, and etc.
would seem to involve actual work, especially when converting links to
local.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Using a variable to get files in sequence

2007-04-03 Thread Steven M. Schweda
From: Williamts99

 Is there any way [...] to force wget to use the wildcards?

   Sure.  You did it.  Unfortunately, there's no way to force the HTTP
server to use wildcards.

   One could probably write a script to do this sort of thing, but,
without knowing which OS you're using, it's difficult to guess exactly
how it might best be done.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Special characters in http

2007-03-31 Thread Steven M. Schweda
From: Alan Thomas

   What is happening?  [...]

   I'm no Windows expert, but, as you said, these are special
characters.  Have you tried quoting the URL?  In UNIX, apostrophes and
quotation marks are popular; in VMS, quotation marks; in Windows, at
least one of those should be effective.

   Note that you may need to use -O, because otherwise the
wget-generated output file name may be too ugly for your file system.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Wget doesn't use characters after '&' when saving a URL as a filename

2007-03-28 Thread Steven M. Schweda
From: Ed

 I use wget to fetch [...]

   Would it be asking too much to see the actual command you used, and
its actual output?  We're not all psychic out here.

   As usual, it might be interesting to know which wget version is
involved here.

 Mac OS X by the way

   That's not a "by the way" item any more than the wget version is. 
And "Mac OS X" covers far too much ground, too.

 How do I get the full file name?

   It depends.  Perhaps it involves giving the whole URL to wget.
Launching into guesswork based on insufficient evidence, have you tried
quoting the URL in the command?  Your shell could be doing the
truncation at the ampersand, which is a shell-special character.  If I
could see the command you used and its output, I wouldn't need to guess.

   Didn't the stuff like "[1]+  Done" seem out of place?



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Is it possible to log transfer times in milliseconds?

2007-03-28 Thread Steven M. Schweda
   Assuming that you're using wget version 1.10.2 (or similar), it
appears (src/ptimer.c) that the program already uses a time resolution
of a millisecond (or better), given underlying run-time library support
at that resolution.  The formatted output (retr_rate(): src/retr.c) is
limited to a form which is more convenient for most users.

   If you want the results to have any meaning, you should examine the
wget code to see exactly what is being timed (at which events the timer
starts and stops), to see if wget is measuring what you want measured. 
A term like response time is pretty vague all by itself.

   It should be easy enough to modify the formatted output code to
provide more digits than the existing code does.  (Whether these would
be _significant_ figures would depend on the underlying OS timer
resolution.)  I don't see how you could get this sort of output without
changing the code, so you'd need to decide whether you wanted to add a
command-line option (or to use some other method) to enable the new
elapsed time format, or if you just wanted to maintain a separate code
stream for a modified wget program which always uses the new format.
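
   The formatting part of such a change is trivial; a sketch (the
function name is invented, and "secs" would come from whatever the
existing timer measured):

#include <stdio.h>

static void
print_elapsed_ms (double secs)
{
  /* 0.0123 s prints as "0.012 s (12 ms)". */
  printf ("%.3f s (%.0f ms)\n", secs, secs * 1000.0);
}

The hard part, as noted above, is deciding which interval is actually
being measured and whether the underlying timer resolution makes the
extra digits meaningful.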

   Getting changes like this into the main product code stream is
someone else's decision.  If I were you, I'd expect to have to make the
changes and maintain the different code myself into the indefinite
future.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: wildcards in filenames

2007-03-26 Thread Steven M. Schweda
From: Alan Thomas

 [...]  Putting /l*.htm at the end of the URL did not work:
 
 Warning: wildcards not supported in HTTP.
 
 Putting l*.htm after the URL (separated with a space) did not work
 either.

   It's not a UNIX problem, it's an HTTP problem/limitation.  You can't
ask the HTTP (Web) server to send you all the l*.htm files.  It's just
not one of the allowed requests.  You _can_ do this sort of thing with
an _FTP_ server, but not with an HTTP server.

   Wget follows hyperlinks, so if the Web page of interest here has a
bunch of links to files, and you would like it to follow only some of
them, it appears to me that the best you can get (from wget version
1.10.2) is these:

[...]
Recursive accept/reject:
  -A,  --accept=LIST   comma-separated list of accepted extensions.
  -R,  --reject=LIST   comma-separated list of rejected extensions.
[...]

which don't appear to do what you want.

   You might need to suck down the first Web page, edit it locally to
remove the links you'd like not to follow, and then ask wget to start
with the modified page.  (Which sounds like a lot of work.)  If you can
get the stuff using FTP instead of HTTP, it would look a lot easier.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


wget-1.10.2 pwd/cd bug

2007-03-24 Thread Steven M. Schweda
   It's starting to look like a consensus.  A Google search for:
wget DONE_CWD
finds:

  http://www.mail-archive.com/wget@sunsite.dk/msg08741.html



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


re: Huh?...NXDOMAINS

2007-03-23 Thread Steven M. Schweda
   Around here:

[...] can't find ga13.gamesarena.com.au: Non-existent host/domain

If that's your complaint, then I don't see what wget is supposed to do
about it.  What would you like it to do, make up an address?  What's
Australian for broken link?



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: ezmlm response

2007-03-14 Thread Steven M. Schweda
From: Bruce

 For beginners, *does* wget support large files greater than 2.1gb? I
 looked and looked through the docs and found nothing to confirm or deny
 this question... does anyone know?

   From NEWS (which is included in the source kit):

  GNU Wget NEWS -- history of user-visible changes.
  [...]
  * Changes in Wget 1.10.

  ** Downloading files larger than 2GB, sometimes referred to as large
  files, now works on systems that support them.  This includes the
  majority of modern Unixes, as well as MS Windows.
  [...]

You may expect still to have problems if the HTTP or FTP server supplies
bad file size data for large files.


 there's another problem to be resolved first -- NXDOMAINS.
 Can wget negotiate this namespace trickery, and if so how?

   Huh?



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Question re web link conversions

2007-03-12 Thread Steven M. Schweda
From: Alan Thomas

   As usual, wget without a version does not adequately describe the
wget program you're using, Internet Explorer without a version does
not adequately describe the Web browser you're using, and I can only
assume that you're doing all this on some version or other of Windows.

   It might help to know which of everything you're using.  (But it
might not.)

   Using GNU Wget 1.10.2c built on VMS Alpha V7.3-2 (wget -V), I had
no such trouble with either a Mozilla or an old Netscape 3 browser.  (I
did need to rename the resulting file to something with fewer exotic
characters before I could get either browser to admit that the file
existed, but it's hard to see how that could matter much.)

   It's not obvious to me how any browser could invent a URL to which to
go Back, so my first guess is operator error, but it's even less obvious
to me how anything wget could do could cause this behavior, either.

   You might try it with Firefox or any browser with no history which
might confuse a Back button.  If there's a way to blame wget for this,
I'll be amazed.  (That has happened before, however.)



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Naming output file

2007-03-10 Thread Steven M. Schweda
From: Alan Thomas

 Is there a way to tell wget how to name an output file (i.e., not
 what it  is named by the site from which I am retrieving).  

  -O,  --output-document=FILE    write documents to FILE.

   Note that using -O has some side effects which bother some users.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: file numbering bug

2007-03-08 Thread Steven M. Schweda
From: Robert Dick

 When serializing successive copies of a page, the serial number appears
 at the end of the extension, i.e, what should be file1.html is called
 file.html.1 I'm using wget ver. 1.10.2. with the default options on
 Windows ME ...

   I can see how that might annoy a Windows user, but it would probably
be a terrible idea to change the file name as you suggest, because it
would break any HTML links to file.html which might appear in any
other file.

   If you don't like the .nnn suffix, then you'll need to clean it up
later, or else don't download the same file twice into the same
directory.  (Or you could use VMS, where file version numbers are a
natural part of the file system, so the .nnn suffix is not needed, and
this problem does not arise.)



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: wget with -nc bails out when it finds the first file that already exists

2007-03-06 Thread Steven M. Schweda
From: Pete Redest

 wget bailed out on the first file:

 File `pure-data.cvs.sourceforge.net/pure-data/doc/tutorials/index.html'
 already there; not retrieving.
 
 Aborted

   It seems to work for me (same commands):

[...]
File `pure-data_cvs_sourceforge_net/pure-data/doc/tutorials/footils/index.html'
already there; not retrieving.

File `pure-data_cvs_sourceforge_net/pure-data/doc/tutorials/intro/index.html' al
ready there; not retrieving.

--00:58:43--  http://pure-data.cvs.sourceforge.net/pure-data/doc/tutorials/messa
geoddness/
   = `pure-data_cvs_sourceforge_net/pure-data/doc/tutorials/messageoddn
ess/index.html'
Resolving pure-data.cvs.sourceforge.net... 66.35.250.81
Connecting to pure-data.cvs.sourceforge.net|66.35.250.81|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

[ = ] 2,870 --.--K/s
[...]


 [...] If this bailing-out on first already-existing file is what is
 intended, the design is deficient, and wget is less than useful. 

   I'll tell you what's less than useful, and that's a problem report
which omits significant facts, such as the program version, the system
type, the OS and version, and so on.  Around here:

alp $ wget -V
GNU Wget 1.10.2c built on VMS Alpha V7.3-2.
[...]

   If you're using anything other than wget 1.10.2, then I'd suggest
trying the current released version.  If that fails, try complaining
again (and better).



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: wget -O not preserving execute permissions

2007-02-28 Thread Steven M. Schweda
From Andrew Hall:

   As usual, it might help to have some basic information, like the wget
version, the system type and OS on which it's being run, and an actual
wget command.

 I notice when using wget -O that execute permissions on files are not
 preserved.

   With -O, wget opens the output file before it talks to the server,
so it doesn't have that information at that time.  Wget (including with
-O) allows a user to fetch multiple files with one command.  Whose
file permissions would you like it to use?  -O does not work the way
many (most?) people seem to think that it does, which leads to faulty
expectations.

 So a file which on the webserver is rwxr-xr-x will be written as
 rw-r--r--

   With your umask, I'd expect that _any_ file which you can get the Web
server to send will be written with rw-r--r--.  In most cases, file
permissions on a Web server are not even available to the client.  An
FTP server is more likely to supply this kind of info.

 Is this intentional?

   I'd say it was more accidental than intentional.

 Is there a way I can preserve execute permissions?

   The easiest way might be to use FTP and not -O.  Which do you like
better after a download, mv or chmod?  And how do you know which
permissions the file had originally?
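
   If all you need is the execute bit, the quick fix after an HTTP
download is simply (hypothetical file name):

   wget http://www.example.com/bin/some_script.sh
   chmod +x some_script.sh

especially as the HTTP response usually carries no permission
information for wget to preserve in the first place.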



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Ascii transfers

2007-02-27 Thread Steven M. Schweda
From bruce . furber:

 I want to use wget to transfer files from an IBM mainframe FTP server
 which stores files in EBCDIC to Linux on our S390X machine. 
 
 Even though I use the type=a option, the files are transferred in Binary
 (EBCDIC). The .listing files are OK in ASCII. 

   As wget 1.10.2 is written (and -d should show), specifying
;type=a will request an ASCII transfer (TYPE A) instead of the
default IMAGE transfer (TYPE I), but the standard wget code does not
process the received data properly.  That is, it does not adjust the
line endings, and it certainly does no EBCDIC-ASCII code conversion. 
(Not adjusting the line endings does allow it to do -c continuation
easily, which would be either unreliable or very difficult if the data
were processed properly upon receipt.)

   My VMS-compatible wget 1.10.2c will adjust the line endings (for a
UNIX or VMS host -- I don't care about -c), but that still won't
convert EBCDIC to ASCII.

   Of course, everything depends on what the FTP _server_ does when it
gets a request for an ASCII transfer.  Assuming that wget really _is_
requesting ASCII and you're still getting EBCDIC, then you're probably
doomed to use some external EBCDIC-ASCII code converter program.

   Example -d output showing default and type=a behavior:

alp $ wget -d  ftp://alp-l/wget/wget-1_9_1e_vms/vms_notes.txt
DEBUG output created by Wget 1.10.2c built on VMS V7.3-2.

--23:10:29--  ftp://alp-l/wget/wget-1_9_1e_vms/vms_notes.txt
   => `vms_notes.txt'
[...]
257 SYS$SYSDEVICE:[ANONYMOUS] is current directory.
done.
==> TYPE I ...
--> TYPE I

200 TYPE set to IMAGE.
[...]


While, on the other hand:

alp $ wget -d  ftp://alp-l/wget/wget-1_9_1e_vms/vms_notes.txt;type=a
DEBUG output created by Wget 1.10.2c built on VMS V7.3-2.

--23:10:11--  ftp://alp-l/wget/wget-1_9_1e_vms/vms_notes.txt;type=a
   => `vms_notes.txt'
[...]
257 SYS$SYSDEVICE:[ANONYMOUS] is current directory.
done.
==> TYPE A ...
--> TYPE A

200 TYPE set to ASCII.
[...]



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: wget having trouble with large files

2007-02-17 Thread Steven M. Schweda
From: Niels Möller

 [...]
 I'm using wget-1.9.1, [...]

   You might try version 1.10.2, which offers large-file support.

  http://www.gnu.org/software/wget/wget.html



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: wget URL's

2007-02-13 Thread Steven M. Schweda
From: u01jmg3

 Hi, I just wondered if you can use wget to just visit/hit a URL rather
 than downloading anything? Regards.

   --spider?  (Assuming that by "visit/hit [...] rather than
downloading anything" you mean a HEAD request rather than a GET
request.)
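
   For example (hypothetical URL):

   wget --spider http://www.example.com/some/page.html

With --spider, wget asks the server about the page and reports whether
it exists, but saves nothing locally.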



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: wget problem with IBM Http Server2 = apache 2

2007-02-03 Thread Steven M. Schweda
   In your problem report, I see version numbers for everything but
wget.

   Does adding -d to the wget command tell you anything?

   Anything in the Web server logs?



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: is there any plan about supporting different charsets?

2007-02-01 Thread Steven M. Schweda
From: Willener, Pat

 http://en.wikipedia.org/wiki/Big5

   Ok.  Thanks for the pointer.

From: Leo Jay

 the attachment is a sample .listing file.

   I don't know if anyone plans to do anything about multi-byte
characters anywhere in wget, and I know that I can't read them, but I
see no reason why the existing code (with extensions already suggested)
should not be able to handle any byte-character string you specify for a
month name, whether or not it makes any sense as byte characters.  (One
could add an array of different spellings of "total", too.)

   That is, I believe that you could append your big5_months[] strings
to the existing months[] array (and add as many other sets (of twelve)
as you'd like), and then make changes something like:

[...]
   #define MONTHS_LEN (sizeof( months)/ sizeof( months[ 0]))

   for (i = 0; i < MONTHS_LEN; i++)
[...]
   if (i != MONTHS_LEN)
[...]
   month = i% 12;
[...]

   Assuming that the strings like 26+ 0xa4+ 0xeb are day numbers, it
appears that you got pretty lucky with wget's simple-minded
day_number-to-integer conversion method.  Not much work needed there.

   Note that a few bytes of storage could be saved by specifying empty
strings () instead of duplicates, where other languages look like
English.  For example:

  static const char *months[] = {
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",   /* English. */
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
    "",    "",    "Mär", "",    "Mai", "",      /* German. */
    "",    "",    "",    "Okt", "",    "Dez"
  };
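
   The matching loop might then look something like this (a rough
sketch only, with illustrative variable names, not the exact code from
src/ftp-ls.c; the empty entries can never match, so they are harmless):

   #define MONTHS_LEN (sizeof (months) / sizeof (months[0]))

   for (i = 0; i < MONTHS_LEN; i++)
     if (months[i][0] != '\0' && strcmp (tok, months[i]) == 0)
       break;                   /* tok holds the month token. */

   if (i != MONTHS_LEN)
     month = i % 12;            /* Fold any language row back to 0-11. */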


   As for getting changes like this into the main development code, I'm
probably the wrong person to ask, as I've been trying for years to get a
set of VMS-related changes adopted with no obvious success.

   A while back, another fellow had a similar complaint about German
month names:

  http://www.mail-archive.com/wget@sunsite.dk/msg07775.html

I seem to have sent him some private e-mail, but I didn't post anything
to the forum at that time.  But it does show that there is some interest
in this problem other than yours.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: is there any plan about supporting different charsets?

2007-01-31 Thread Steven M. Schweda
From: Leo Jay

 i had already hacked the src/ftp-ls.c to meet my need before i posted
 this thread.
 but my approach is just hard coding, which i think is not a good way
 to solve this
 problem and lack of flexibility. so, i wonder if the wget developers
 have any plan to
 solve this problem. and i think their solution must be very elegant
 (at least than mine).

   Wget developers are people who develop wget.  Anyone can do it.

 and the attachment is my modification for big5 charset.
 could you please have a look at it for its correctness? thanks.

   What is a big5 charset?  I can't look for correctness until I know
what you're trying to do.  You may know what you want, but it's not
clear to me.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: wget error report

2007-01-30 Thread Steven M. Schweda
From: Daniele Annesi

 I think it is a Bug:
 using wget for multiple files :
 es.
 wget ftp://user:[EMAIL PROTECTED]/*.zip
 in the time of each file the seconds are set to 00

   That's not an error report.  An error report would tell the reader
which version of wget you were using (wget -V), on which system type
you were using it, and the OS version, at least.

   It would also help to know how the FTP server reports the date-times
in its listings, as that's where wget gets the information.  If the
server doesn't provide the seconds, how can wget set them?  (And of
course, without more information we can't see the date-time data for
ourselves.)



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: is there any plan about supporting different charsets?

2007-01-30 Thread Steven M. Schweda
From: Leo Jay

 since the responds of ftp server could be in different charsets, and
 wget can't cope with charsets other than English, i'd like to know is
 there any plan about supporting different charsets?

   Are you complaining about dates in different languages, or file names
in different character sets?



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: is there any plan about supporting different charsets?

2007-01-30 Thread Steven M. Schweda
From: Leo Jay

   since the responds of ftp server could be in different charsets, and
   wget can't cope with charsets other than English, i'd like to know is
   there any plan about supporting different charsets?
 
 Are you complaining about dates in different languages, or file names
  in different character sets?
 
 i'm talking about dates in different languages.
 
 i haven't tried file names in different charsets,
 but i'm sure wget can't cope with dates in different languages.

   If you look in src/ftp-ls.c: ftp_parse_unix_ls(), you should find an
array of month names:

static const char *months[] = {
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
};

If by dates in different languages you mean that non-English month
names are the only problem, then it should be fairly easy to extend this
with month names in other languages, and then change the code below (if
(i != 12), month = i;) to something a little more complex, to handle
the new possibilities.

   If the order of the tokens also changes, then you may need to dive
into the hideously complex parsing code, and make it even more hideously
complex.  (The fellow who designed the date format(s) for ls was
obviously targeting an intelligent human audience, not another computer
program.  The order and simplicity of a VMS DIRECTORY listing shows some
evidence of actual design, and parsing such a listing is relatively
trivial, but that won't help you any.)

   I might offer a few more details, but your specification of the
problem is not complete enough to make that practical.  If you can list
a set of date forms which must be interpreted, then it might be possible
to say how hard it would be to do the job.  (I assume that there is no
actual ambiguity in the month name strings for the languages you would
like to support, but that could make the problem impossible to solve for
some languages.)



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Newbie Question - DNS Failure

2007-01-22 Thread Steven M. Schweda
From: Terry Babbey

  Built how?
 Installed using swinstall

   How the depot contents were built probably matters more.

 Second guess:  If DNS works for everyone else, I'd try building wget
  (preferably a current version, 1.10.2) from the source, and see if that
  makes any difference.  [...]
 
 Started to try that and got some error messages during the build. I may
 need to re-investigate.

   As usual, it might help if you showed what you did, and what happened
when you did it.  Data like which compiler (and version) could also be
useful.

   On an HP-UX 11.23 Itanium system, starting with my VMS-compatible kit
(http://antinode.org/dec/sw/wget.html, which shouldn't matter much
here), I seemed to have no problems building using the HP C compiler,
other than getting a bunch of warnings related to socket stuff, which
seem to be harmless.  (Built using CC=cc ./configure and make.)

td176 cc -V
cc: HP C/aC++ B3910B A.06.13 [Nov 27 2006]

And I see no obvious name resolution problems:

td176 ./wget http://www.lambton.on.ca
--23:42:04--  http://www.lambton.on.ca/
   => `index.html'
Resolving www.lambton.on.ca... 192.139.190.140
Connecting to www.lambton.on.ca|192.139.190.140|:80... failed: Connection refuse
d.

td176 ./wget -V
GNU Wget 1.10.2c built on hpux11.23.
[...]

   That's on an HP TestDrive system, which is behind a restrictive
firewall, which, I assume, explains the connection problem.  (At least
it got an IP address for the name.)  And it's not the same OS version,
and who knows which patches have been applied to either system, and so
on.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Newbie Question - DNS Failure

2007-01-20 Thread Steven M. Schweda
From: Terry Babbey

 I installed wget on a HP-UX box using the depot package.

   Great.  Which depot package?  (Anyone can make a depot package.) 
Which wget version (wget -V)?  Built how?  Running on which HP-UX
system type?  OS version?

 Resolving www.lambton.on.ca... failed: host nor service provided, or not
 known.

   First guess:  You have a DNS problem, not a wget problem.  Can any
other program on the system (Web browser, nslookup, ...) resolve names
any better?

   Second guess:  If DNS works for everyone else, I'd try building wget
(preferably a current version, 1.10.2) from the source, and see if that
makes any difference.  (Who knows what name resolver is linked in with
the program in the depot?)

   Third guess:  Try the ITRC forum for HP-UX, but you'll probably need
more info than this there, too:

   http://forums1.itrc.hp.com/service/forums/familyhome.do?familyId=117



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Possibly bug

2007-01-17 Thread Steven M. Schweda
From: Yuriy Padlyak

 Have been downloading slackware-11.0-install-dvd.iso, but It seems wget
 downloaded more then filesize and I found: 
 
 -445900K .. .. .. .. ..119%
 18.53 KB/s 
 
 in  wget-log.

   As usual, it would help if you provided some basic information. 
Which wget version (wget -V)?  On which system type?  OS and version? 
Guesswork follows.

   Wget versions before 1.10 did not support large files, and a DVD
image could easily exceed 2GB.  Negative file sizes are a common symptom
when using a small-file program with large files.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Downloading multiple pages

2007-01-17 Thread Steven M. Schweda
From: graham hadgraft

 I need some help using an application [...]

   You seem to need some help asking for help.

 wget -r -l2  -A html -X cgi-bin -D www.somewebsite.co.uk/ -P
 /home/httpd/vhosts/somewebsite.co.uk/catalogs/somewebsite/swish_site/
 http://www.somewebsite.co.uk/questions/
 
 This only index the index page of this folder. It wil not follow the
 links on the page. What would be the appropriate command to use to
 index all pages from that folder.

   Did it occur to you that it might matter which version of wget you're
using, and on which system type (and version)?  Or that it might be
difficult for someone else to guess what happens when no one else can
see the Web page which seems to be causing your trouble?  Does it
actually have links to other pages?



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: SI units

2007-01-14 Thread Steven M. Schweda
From: Lars Hamren

 Download speeds are reported as K/s, where, I assume, K is short
 for kilobytes.
 
 The correct SI prefix for thousand is k, not K:
 
 http://physics.nist.gov/cuu/Units/prefixes.html

   To gain some insight on this, try a Google search for:

  k 1024

   I've seen contrary comments from people who apparently know no actual
science, and who think that they know something about computers, claiming
that 1000 is wrong, and that only 1024 is legitimate for k or K.

   You have my best wishes in your quest to set the world straight on
this one.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547



Re: Issue/query with the Wget

2007-01-03 Thread Steven M. Schweda
From: Manish Gupta

 Issue: when i pass a 300 MB file to wget in one shot, it willl not able to 
 download the file at the client side.

   Is this _really_ a problem, or are you only afraid that it might be a
problem?

   300MB is not a large file.  2GB (or, sometimes, 4GB) is a large file.

   The latest released wget version (1.10.2) should work with large
files on systems which support large files.

 Do wget has the feature of buffer where it is holding the stream, if it there 
 then by increasing or specifying th buffer limit, i think we can overcome the 
 issue.

   Wget writes the data to a file.  If you have the disk space, it
should work.  People often use wget to download CD and DVD image files. 
Some older wget versions (without large-file support) had some problems
with files bigger than 2GB (or 4GB, depending on the OS), but not
version 1.10.2.  Some _servers_ have problems with large files, but
those are not wget problems.

   As usual, it would help to know which version of wget you're using,
on which host system type you're using it, and the OS version there.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: re: 4 gig ceiling on wget download of wiki database. Wikipedia database being blocked?

2006-12-24 Thread Steven M. Schweda
From: Jonathan Bazemore:

 I've repeatedly tried [...]

   If it's still true that you're using wget 1.9, you can probably try
until doomsday with little chance of success.  Wget 1.9 does not support
large files.  Wget 1.10.2 does support large files.

Try the current version of wget, 1.10.2, which offers large-file
 support on many systems, possibly including your unspecified one.

   Still my advice.

   In the future, it might help if you would supply some useful
information, like the wget version you're using, and the system type
you're using it on.  Also, actual commands used and actual output which
results would be more useful than vague descriptions like "consistently
breaking" and "will not resume".

 I've used a file splitting program to break the
 partially downloaded database file into smaller parts
 of differing size.  Here are my results: [...]

   So, what, you're messing with the partially downloaded file, and you
expect wget to figure out what to do?  Good luck.

 [...] wget (to my knowledge) doesn't do error checking
 in the file itself, it just checks remote and local
 file sizes and does a difference comparison,
 downloading the remainder if the file size is smaller
 on the client side.

   Only if it can cope with a number as big as the size of the file. 
Wget 1.9 uses 32-bit integers for file size, and that's not enough bits
for numbers over 4G.  And if you start breaking up the partially
downloaded file, what's it supposed to use for the size of the data
already downloaded?

 Wikipedia doesn't have tech support, [...]

   Perhaps because they'd get too many questions like this one too many
times.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: problem at 4 gigabyte mark downloading wikipedia database file.

2006-12-22 Thread Steven M. Schweda
From: Jonathan Bazemore:

 [...] I am using wget 1.9 [...] up to about the 4 gig mark [...]

   Try the current version of wget, 1.10.2, which offers large-file
support on many systems, possibly including your unspecified one.

  http://www.gnu.org/software/wget/wget.html



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Wget timestamping is flawed across timezones

2006-12-21 Thread Steven M. Schweda
From: Remko Scharroo:

 Can this be fixed?

   Of course it can be fixed, but someone will need to fix it, which
would involve defining the user interface and adding the code to do the
actual time offset.  I assume that the user will need to specify the
offset.

   For an indication of what could be done, you might look for
WGET_TIMEZONE_DIFFERENTIAL in my VMS-adapted src/ftp-ls.c:
ftp_parse_vms_ls().

  http://antinode.org/dec/sw/wget.html

   This is a common problem on VMS systems, which normally (sadly) use
local time instead of, say, UTC.  One result of this is that FTP servers
on VMS tend to provide file date-times in the server's local time.

   I chose to add an environment variable (a VMS logical name on a VMS
system) as the user interface for code simplicity (less work for me),
and partly because VMS uses a similar logical name
(SYS$TIMEZONE_DIFFERENTIAL) to specify the offset from UTC to local
time, so the concept would already be familiar to a VMS user.

   I use WGET_TIMEZONE_DIFFERENTIAL in the code only for a VMS FTP
server, but I assume that it could easily be adapted to the other
ftp_parse*_ls() functions.  (Or a new command-line option could be used
to specify the offset.)  When I did the work, I probably didn't consider
the possibility that any non-VMS FTP servers would provide file
date-times in non-UTC.  Otherwise I might have made it more general.
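
   The basic idea is only a few lines (a rough sketch, not the actual
code from my kit; the variable names here are illustrative, and whether
you add or subtract depends on the convention you pick for the offset):

   /* Offset, in seconds, between the server's local time and UTC,
      read from the environment, as with SYS$TIMEZONE_DIFFERENTIAL. */
   char *tzd = getenv ("WGET_TIMEZONE_DIFFERENTIAL");
   if (tzd != NULL)
     timestamp -= atoi (tzd);   /* Server-local time back to UTC. */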

   Trying to get my VMS-related changes into the main Wget development
stream has been sufficiently unsuccessful that I don't spend much time
working on adding features and fixes which are not trivially easy and
which I don't actually need myself.  But I wouldn't try to discourage
anyone else.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: FTP SYST NULL dereferencing crash (found by someone else)

2006-12-19 Thread Steven M. Schweda
From: Ulf Harnhammar [EMAIL PROTECTED]

+  if (request == NULL)
+{
+  xfree (respline);
+  return FTPSRVERR;
+}

   Well, yeah, if you prefer returning an error code to trying a little
harder.  I prefer my change:

if (request == NULL)
  *server_type = ST_OTHER;

Why punish the user when the FTP server behaves badly?



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: trouble loading and installing wget

2006-12-11 Thread Steven M. Schweda
From: Siddiqui, Kashif

 I'm trying to install wget on my itanium 11.23 system [...]

   I assume that that's HP-UX 11.23, as in:

[EMAIL PROTECTED] uname -a
HP-UX td176 B.11.23 U ia64 1928826293 unlimited-user license

 /usr/lib/hpux32/dld.so: Unsatisfied code symbol '__umodsi3' in load
 module '/usr/local/bin/wget'.

   And where did you get _that_ copy of wget?

 If I use the source code and run the configure script, then do a 'make
 install' I get the following error:
 [...]
 gcc -I. -I. -O  -DHAVE_CONFIG_H
 -DSYSTEM_WGETRC=\"/usr/local/etc/wgetrc\"
 -DLOCALEDIR=\"/usr/local/share/locale\" -O -c connect.c
 
 In file included from connect.c:41:
 
 /usr/include/sys/socket.h:535: error: static declaration of 'sendfile'
 follows non-static declaration
 [...]

   Complaints about header files are often caused by a bad GCC
installation (or an OS upgrade which confuses GCC).

   I just tried building my VMS-oriented 1.10.2c kit using GCC on one of
the HP TestDrive systems, and I had some trouble ('ld: Unsatisfied
symbol libintl_gettext in file getopt.o'), but that's much later than
compiling connect.c, which got only the (usual) warnings about the
pointers.  That's with:

http://antinode.org/dec/sw/wget.html
http://antinode.org/ftp/wget/wget-1_10_2c_vms/wget-1_10_2c_vms.zip 

[EMAIL PROTECTED] gcc --version
gcc (GCC) 3.4.3
[...]

And I have no idea whether the GCC installation there is good or bad. 
(But it seems to be better than yours.)

   I also tried it using HP's C compiler (CC=cc ./configure):

[EMAIL PROTECTED] cc -V
cc: HP C/aC++ B3910B A.06.12 [Aug 17 2006]

Here, the make ran to an apparently successful completion, but real
testing is not convenient on the TestDrive systems, so I can't say
whether it would actually work better than what you have.

[EMAIL PROTECTED] ./src/wget -V
GNU Wget 1.10.2c built on hpux11.23.
[...]

   So, I'd suggest using HP's C compiler, or else re-installing GCC. 
After that, I'd suggest using the ITRC HP-UX forum:

http://forums1.itrc.hp.com/service/forums/familyhome.do?familyId=117

 Any idea's and assistance [...]

   That's ideas, by the way.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: hacking 'prefix'

2006-12-01 Thread Steven M. Schweda
   I give up.  What are you doing, what are you doing it with, what are
you doing it on, what happens, and what would you like to have happen
instead?  (Hint: Actual commands and their output would help more than
vague descriptions.)



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Accents in PHP parameter

2006-11-30 Thread Steven M. Schweda
 14:13:04 ERROR 406: Not Acceptable.

   It looks to me as if the Web server does not like these characters. 
Adding -d to the wget command might tell you more about what wget is
doing.

   Do you have any evidence of a URL like this which works in, say, a
Web browser?

 GNU Wget 1.7

   1.10.2 is the latest released version.  If there is a problem with
wget 1.7, _and_ if it's still a problem in 1.10.2, then someone might
wish to work on it.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Documentation error?

2006-11-29 Thread Steven M. Schweda
From: Ian

   As usual, it might help to know which wget version you're using
(wget -V) and on which system type you're using it.

 The documentation section 7.2 states:

   _Which_ documentation section 7.2?

   wget -r -l1 --no-parent -A.gif http://www.server.com/dir/

   I don't normally use -A, but a Google search for
  wget -A
found this:

  http://www.gnu.org/software/wget/manual/html_node/Types-of-Files.html

which suggests that -A gif might work better than -A.gif.
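
   For example (hypothetical URL):

   wget -r -l1 --no-parent -A gif,jpg http://www.example.com/dir/

-A takes a comma-separated list of accepted suffixes (or wildcard
patterns), so more than one file type can be listed at once.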

   Adding -d to the wget command might also be informative.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: linux version crashes when reaching the max size limit

2006-11-07 Thread Steven M. Schweda
From: Toni Casueps

   1.  "It crashed" is not a helpful description of what happened.  What
actually happened?

   2.  If the file is too large for a FAT32 file system, what would
you like to happen?  4294967295 looks like 2^32-1, which (from what I've
read) is the maximum size of a file on a FAT32 file system.

   3.  Wget 1.10.2 is the latest released version.  Complaints about
older versions normally lead to a suggestion to try the latest version.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: wget 1.10.1 segfaults after SYST

2006-11-06 Thread Steven M. Schweda
From: kaneda

[...]
 ==> SYST ... Segmentation fault (core dumped)
[...]

This sounds like the same problem as the one under wget 1.10.1
segfaults after SYST.  For details and the solution(s), try the thread
beginning at:

  http://www.mail-archive.com/wget@sunsite.dk/msg09371.html

   It _was_ nice to see a problem report with some useful info (wget
version, host OS, etc.) for a change.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: BUG - .listing has sprung into existence

2006-10-30 Thread Steven M. Schweda
From: Sebastian

   Doctor, it hurts when I do this.

   Don't do that.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: new wget bug when doing incremental backup of very large site

2006-10-21 Thread Steven M. Schweda
From dev:

 I checked and the .wgetrc file has continue=on. Is there any way to
 surpress the sending of getting by byte range? I will read through the
 email and see if I can gather some more information that may be needed.

   Remove continue=on from .wgetrc?

   Consider:

  -N,  --timestamping            don't re-retrieve files unless newer than
                                 local.
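
   For example (hypothetical URL):

   wget -r -N http://www.example.com/archive/

With -N, wget compares the remote time stamp (and size) against any
local copy and skips files which have not changed, which is usually
closer to what an incremental backup wants than -c, which just asks
the server for a byte range starting at the end of the local file.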



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


  1   2   >