Re: Wget 1.11.3 - case sensitivity and URLs
In the VMS world, where file name case may matter, but usually doesn't, the normal scheme is to preserve case when creating files, but to do case-insensitive comparisons on file names. From Tony Lewis: "To have the effect that Allan seeks, I think the option would have to convert all URIs to lower case at an appropriate point in the process." I think that that's the wrong way to look at it. Implementation details like name hashing may also need to be adjusted, but this shouldn't be too hard. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street, Saint Paul MN 55105-2547, (+1) 651-699-9818
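As an illustration of the scheme described above (preserve case on creation, compare without it), here is a small C sketch; the function names and the hash are invented for the example, not taken from wget's sources:

```c
#include <assert.h>
#include <ctype.h>

/* Case-insensitive string hash: two names differing only in case
 * must land in the same bucket, so the hash folds case.  (Sketch;
 * wget's actual hash function differs.) */
static unsigned long
ci_hash (const char *s)
{
  unsigned long h = 5381;
  for (; *s != '\0'; s++)
    h = h * 33 + (unsigned char) tolower ((unsigned char) *s);
  return h;
}

/* Case-insensitive comparison; the stored name keeps its case. */
static int
ci_equal (const char *a, const char *b)
{
  for (; *a != '\0' && *b != '\0'; a++, b++)
    if (tolower ((unsigned char) *a) != tolower ((unsigned char) *b))
      return 0;
  return *a == *b;
}
```

With both the hash and the comparison folding case, a lookup for "index.html" finds a file created as "Index.HTML" without renaming anything.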
Re: Looking for 1.9.1 user manual
From: Kevin.Low What I'm looking for today is simply, the 1.9.1 user manual. [...] Do you seek anything which is not part of the usual source kit(s), as seen, for example, at: http://ftp.gnu.org/gnu/wget/ ? (Pick a version, any version...)
Re: FW: cannot log on to Oracle portal/apache - full request - ignore previous
From: Kevin.Low [...] I think knowing gcc 3.2 is not the culprit will help. It probably would, but we don't know that. GCC 3.2 seems to date back to around August 2002, and there were also 3.2.1, 3.2.2, and 3.2.3 over the next several months, so it's certainly pretty old, and it was probably not entirely defect-free. With a transcript showing what happened, someone might be able to assign blame (always the first and most important step in problem resolution). After that, many things are possible.
Re: Toward a 1.11.1 release
[...] Is it even useful to _do_ prereleases? I was waiting for the version which integrated the (previously suggested) VMS-related changes. (There are some generic FTP-related fixes hidden among the VMS-related ones, too, of course.) Perhaps the Summer of Code thing will turn up someone with interests broader than Linux.
Re: need help
From: Gary Lubrani Not the most descriptive subject I've ever seen. checking for C compiler default output file name... configure: error: C compiler cannot create executables Apparently your C compiler is not working as expected. See `config.log' for more details. Well? Any clues there? Are you working in a directory where you have write permission? Can you compile a simple test program?
Re: how to parse a webpage to download links of certain type?
From: shirish [...] not directories [...] alp $ wget -h [...] Directories: -nd, --no-directories don't create directories. [...] Sounds as if it may be worth a try.
Re: Wget continue option and buggy webserver
From: Charles In wget 1.10, [...] Have you tried this in something like a current release (1.11, or even 1.10.2)? http://ftp.gnu.org/gnu/wget/ [...] but for some reason (buggy server), [...] How should wget know that it's getting a bogus error from your buggy server, and not getting a valid error from a working server?
Re: seg fault ~30G
From: Hunter I'm getting a seg fault anytime I approach 30G in transfer with wget. I did a google search, but didn't see a resolution. Is there one I simply cannot find? It's hard to say. I'll tell you what I can't find here, and that's a useful problem report, which would include things like the wget version (wget -V), the OS you're using and its version, and the actual wget command you used (and its output). As usual, adding -d to the command might be informative. In a case where the program explodes, a traceback showing where it was when it died could also be helpful. (Not knowing your OS makes it hard to suggest how to get a traceback.) Evidence that you have adequate free disk space could be reassuring, too. There isn't anything magic about 30G (as there is about, say, 2G or 4G), so I'd guess that it'd more likely be a problem in your environment than in wget, but with the available evidence, that is only a guess.
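For what it's worth, the "magic" of 2G and 4G comes from 32-bit byte counts, which a small sketch can make concrete (the helper function is invented for illustration, not wget code):

```c
#include <assert.h>
#include <stdint.h>

/* A signed 32-bit byte count wraps negative at 2^31 (2 GiB), and an
 * unsigned one wraps to zero at 2^32 (4 GiB).  30 GB sits on no such
 * boundary, which is why a failure there more likely points at the
 * environment than at an integer-width bug. */
static int
fits_in_32bit_signed (int64_t bytes)
{
  return bytes >= 0 && bytes <= INT32_MAX;
}
```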
Re: gzip question
From: Christopher Eastwood Does wget automatically decompress gzip compressed files? I don't think so. Have you any evidence that it does this? (Wget version? OS? Example with transcript?) Is there a way to get wget NOT to decompress gzip compressed files, but to download them as the gzipped file? Just specify the gzip-compressed file, so far as I know.
Re: gzip question
From: Christopher Eastwood wget --header='Accept-Encoding: gzip, deflate' http://{gzippedcontent} Doctor, it hurts when I do this. Don't do that. What does it do without --header='Accept-Encoding: gzip, deflate'? [...] (Wget version? OS? Example with transcript?) Still waiting for those data. Also, when I say "Example", I normally mean "An actual example", that is, one which can be tested and verified. Adding -d to the wget command can also be informative. SMS.
Re: Avoiding DoS fame *** Please cc me (non-subscriber)
From: Ezequiel Garzón Lucero Could anybody tell me the default value for the -w option? Based on wget's speed, I imagine it's not even 1, right? Zero, I assume. But then, how come wget users are not flagged as DoS offenders (at least not all the time)? Some users do not always ask for recursion through an entire site. My most frequent use of wget is to fetch a single file. Recursion, while sometimes useful, is not particularly common for me. Also, I'm not able to disable (or even greatly inconvenience) a server whose bandwidth is greater than mine. With my limited (DSL) bandwidth, that leaves much of the world safe from an attack by me. [...] does anybody know what are the standard thresholds for repeated requests? I'd say that it depends on the target of the requests. If I see annoying stuff in my Web or FTP server logs, I complain to the ISP for the pest. If it recurs, I block that IP address. Most serious denial-of-service attacks use more than one attacker. A single wget user can't do very much harm.
Re: .1, .2 before suffix rather than after
I don't care particularly how this stuff works, but if you'd like to do me a favor, please make sure, whatever the final scheme is, that it's easy to add the #ifdef for VMS to bypass the whole mess, because the file version numbers on VMS obviate it.
Re: RFE: run-time change of limit-rate multi-stream download
From: L Walsh Say one runs the first wget. Let's say it is a simple 1-DVD download. Then you start a 2nd download of another DVD. Instead of 2 copies of wget running and competing with each other, what if the 2nd copy told the 1st copy about the 2nd download, and the 2nd download was 'enqueued' in a 'line' behind the 1st. Perhaps you need an operating system. On VMS, one could create a wget-specific batch queue, set its job limit to one, and submit all the non-compete wget jobs to it. The queue manager would run the submitted jobs one at a time, first-come, first-served, with the terminal output logged to a file (of your choice). If you ask (SUBMIT /NOTIFY), you can get a message broadcast to your terminal(s) when a job ends. http://h71000.www7.hp.com/index.html (Where would you like to put the axle on that new wheel?)
Re: Using wget through FTP proxy server
From: Alan Watt I'm using wget version 1.9.1 for Solaris 8 (SPARC). [...] I don't deal with proxies, so I don't know much about this, but you might do better with the current released version, 1.10.2. I don't know if a suitable binary kit is generally available, but if you can't find one, and you can't build it from the source, I can build one on Solaris 10, if you think that that might be useful.
Re: More portability stuff [Re: gettext configuration]
From: Micah Cowan [EMAIL PROTECTED] Next problem on Tru64: [...] ld: Unresolved: siggetmask We ain't got no siggetmask(). None on VMS (out as far as V8.3), either, should I ever get so far. siggetmask is an obsolete BSDism; POSIX has the sigprocmask function, which we should prefer. We should also do feature-testing, and not assume there's a portable way to block/unblock signals. Note that sigprocmask() does appear on VMS, but apparently not until V8.2, which is ahead of many users (including me, in part). More portability would be better in this region. Can't sigsetmask() or sigblock() do the same job if you tell them not to change anything? SMS.
Re: wget -o question
From: Micah Cowan But, since any specific transaction is unlikely to take such a long time, the spread of the run is easily deduced by the start and end times, and, in the unlikely event of multiple days, counting time regressions. And if the pages in books were all numbered 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, ..., the reader could easily deduce the actual number for any page, but most folks find it more convenient when all the necessary data are right there in one place. But hey. You're the boss. SMS.
Re: wget -o question
From: Micah Cowan - tms = time_str (NULL); + tms = datetime_str (NULL); Does anyone think there's any general usefulness for this sort of thing? I don't care much, but it seems like a fairly harmless change with some benefit. Of course, I use an OS where a directory listing which shows date and time does so using a consistent and constant format, independent of the age of a file, so I may be biased. Though if I were considering such a change, I'd probably just have wget mention the date at the start of its run, rather than repeat it for each transaction. Obviously wouldn't be a high-priority change... :) That sounds reasonable, except for a job which begins shortly before midnight. I'd say that it makes more sense to do it the same way every time. Otherwise, why bother displaying the hour every time, when it changes so seldom? Or the minute? Eleven bytes more per file in the log doesn't seem to me to be a big price to pay for consistent simplicity. Or you could let the victim specify a strftime() format string, and satisfy everyone. Personally, I'd just change time_str() to datetime_str() in a couple of places.
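The difference being argued about can be sketched with strftime(); the formats here are illustrative, not necessarily wget's actual ones:

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Time-only stamp ("HH:MM:SS") versus full date-time stamp
 * ("YYYY-MM-DD HH:MM:SS") for each log entry.  The eleven extra
 * bytes per line buy an unambiguous timestamp even when a job
 * crosses midnight. */
static size_t
stamp (char *buf, size_t len, const struct tm *tm, int with_date)
{
  return strftime (buf, len,
                   with_date ? "%Y-%m-%d %H:%M:%S" : "%H:%M:%S", tm);
}
```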
Re: Mirroring redirected web sites
From: Theo Wollenleben [...] For a single file I also tried `wget -N -O local_copy_of_file'. Apparently Wget doesn't check the timestamp of `local_copy_of_file', so it doesn't work either. [...] The implementation of -O defeats -N (among other options). Look around at http://www.mail-archive.com/wget@sunsite.dk/ for details.
Re: VMS support/getpass [Re: Gnulib getpass, and wget password prompting]
From: Micah Cowan [EMAIL PROTECTED] My preference would be to use getpass, which seems suitably abstracted already, and modify it as needed to support VMS. Hopefully, gnulib upstream would be interested in those changes (worth checking, at the very least), and it can be merged back up. I suppose that that wouldn't be _so_ terrible. I would probably have made the general/OS-specific split somewhere else, but then we'd most likely be arguing about /dev/tty v. SYS$COMMAND, or something else, so it's probably not really worse this way. I'll relax and wait for things to deteriorate.
Re: VMS support/getpass [Re: Gnulib getpass, and wget password prompting]
From: Tony Lewis I think you should give Micah the benefit of the doubt. [...] Am I complaining too much again? He's not doing everything I want before I want it done, but that's not unusual. I think that I'm still complaining _to_ him, not _about_ him. That was the intent, anyway.
Re: Gnulib getpass, and wget password prompting
From: Micah Cowan [...] Gnulib actually has quite a large number of modules designed for portability; I imagine we could benefit from several of them. Well, yeah, where portability is limited to various UNIX-like systems and Windows. As I said (http://www.mail-archive.com/wget@sunsite.dk/msg10077.html), it's all useless on VMS, so I'd prefer some kind of easier-to-deal-with level of abstraction. If you're completely uninterested in (or hostile to) having this program run on VMS, we could save a lot of my time by declaring it now.
Re: Average download throughput using Wget
From: sankalp_karpe [...] Wget (windows version) [...] "Wget (windows version)" does not reveal the wget version. The output from wget -V might. (i) [...] The final speed reported (118.64 KB/s) should be the average speed for the whole download, that is, the full byte count divided by the full download time. (ii) [...] The final speed reported (118.64 KB/s) should be what you want, and it's unlikely that any simple calculation using the intermediate rates will give you what you want. Some algebra would be helpful to explain why not. An old related problem looks like this: A motorist is making a trip of 100km. After traveling for one hour, he notices that he has gone only 50km. How fast does he need to go for the next 50km to get an average speed for the whole trip of 100 km/h?
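The algebra behind the puzzle, as a small sketch (the function is invented for illustration): average speed is total distance over total time, so a 100 km trip at an average of 100 km/h allows 1.0 h in total; the first 50 km already used 1.0 h, leaving zero time for the rest, so no finite speed works.

```c
#include <assert.h>
#include <math.h>

/* Speed required over the remaining distance to hit a target average
 * speed for the whole trip.  Returns INFINITY when the remaining time
 * budget is already exhausted, i.e. the target average is unreachable. */
static double
required_speed (double total_km, double target_kmh,
                double done_km, double done_h)
{
  double time_left = total_km / target_kmh - done_h;
  if (time_left <= 0.0)
    return INFINITY;
  return (total_km - done_km) / time_left;
}
```

The same reasoning is why averaging wget's intermediate rates does not yield the overall rate: the overall figure weights each rate by the time spent at it.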
Re: FTP OS-dependence, and new FTP RFC
From: Hrvoje Niksic I agree that string-of-CWDs would be better than the current solution. Well, that's good news. See, for example, the discussion around: http://www.mail-archive.com/wget@sunsite.dk/msg08233.html Also: http://www.mail-archive.com/wget@sunsite.dk/msg08447.html
Re: Wget 1.10.2 does not continue download when file in a subdirectory
From: Martin MOKREJŠ I think the following happens due to a bug in wget unable to look into a subdirectory for the file to be restarted in download: [...] Is this the same problem as this?: http://www.mail-archive.com/wget@sunsite.dk/msg09707.html
Re: patch: prompt for password
From: Matthew Woehlke [...] +#include <termios.h> /* FIXME probably not portable? */ [...] This would certainly be a problem on VMS, which has its own terminal handling scheme, and no support for termios. Rather than installing a load of UNIX-specific (probably not portable) code into the middle of an otherwise fairly portable code segment, why not create a couple of functions, like, say, terminal_echo_disable() and terminal_chars_restore(), segregate your implementation of them into some UNIX-specific place, and let the rest of us supply our own? Or, for a real adventure, you could look at some considerably more portable program (like Info-ZIP [Un]Zip or Kermit), and see how more experienced people have handled this problem, and then do all the work yourself.
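One way the suggested split might look on the UNIX side; terminal_echo_disable() and terminal_chars_restore() are the hypothetical names from above, and a VMS build would supply its own pair behind the same interface:

```c
#include <assert.h>
#include <termios.h>
#include <unistd.h>

/* The (probably not portable) termios details live here, in a
 * UNIX-specific file; callers see only the two function names. */
static struct termios saved_tio;

int
terminal_echo_disable (int fd)
{
  struct termios tio;
  if (!isatty (fd) || tcgetattr (fd, &saved_tio) != 0)
    return -1;                  /* not a terminal; caller decides */
  tio = saved_tio;
  tio.c_lflag &= ~ECHO;
  return tcsetattr (fd, TCSAFLUSH, &tio);
}

int
terminal_chars_restore (int fd)
{
  return tcsetattr (fd, TCSAFLUSH, &saved_tio);
}
```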
Re: Problem with combinations of the -O , -p, and -k parameters in wget
From: Michiel de Boer [...] Therefore I use -O to write to a more sensible name. [...] Unfortunately, -O does not do name conversion, it simply directs all the program output to a specified file, and this causes bad behavior when -O is combined with many other options. Use the Search feature at http://www.mail-archive.com/wget@sunsite.dk/ (for -O) to find many similar complaints involving -O.
Re: ignoring robots.txt
From: Josh Williams As far as I can tell, there's nothing in the man page about it. It's pretty well hidden. -e robots=off At this point, I normally just grind my teeth instead of complaining about the differences between the command-line options and the commands in the .wgetrc start-up file.
Re: bug and patch: blank spaces in filenames causes looping
From various: [...] char filecopy[2048]; if (file[0] != '"') { sprintf(filecopy, "\"%.2047s\"", file); } else { strncpy(filecopy, file, 2047); } [...] It should be: sprintf(filecopy, "\"%.2045s\"", file); [...] I'll admit to being old and grumpy, but am I the only one who shudders when one small code segment contains 2048, 2047, and 2045 as separate, independent literal constants, instead of using a macro, or sizeof, or something which would let the next fellow change one buffer size in one place, instead of hunting all over the code looking for every 20xx which might be related? Just a thought.
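The grumble restated as code: derive every limit from one constant, so the next fellow changes a single buffer size in a single place (a sketch of the idea, not the actual patch):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One size, one place: every limit below derives from FILECOPY_SIZE,
 * so no stale 2047 or 2045 can survive a resize. */
#define FILECOPY_SIZE 2048

static void
quote_copy (char dest[FILECOPY_SIZE], const char *file)
{
  if (file[0] != '"')
    /* Leave room for two quotes and the NUL: size - 3 payload bytes. */
    snprintf (dest, FILECOPY_SIZE, "\"%.*s\"",
              (int) (FILECOPY_SIZE - 3), file);
  else
    /* Already quoted; snprintf truncates and NUL-terminates safely. */
    snprintf (dest, FILECOPY_SIZE, "%s", file);
}
```

Using snprintf() for both branches also removes the original strncpy() hazard of an unterminated buffer.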
Re: Downloading 7GB-file via FTP
From veejar: I use wgetpro-0.1.3_1 on FreeBSD 6.2 RELEASE. Great. How is that related to normal wget? On which wget version was it based? The current released version of wget, 1.10.2, should have no problems with large files, assuming that the FTP server and the local file system have no problems with large files. If real wget fails, complain here. If some other "based on GNU Wget" program fails, it might make more sense to complain to the people who wrote that program. http://wgetpro.sourceforge.net/ ???
Re: timestamping and output document
From: purp Don't know if that's a known issue, [...] Try the Search feature at: http://www.mail-archive.com/wget@sunsite.dk/ For example: http://www.mail-archive.com/search?q=%22-O%22+%22-N%22[EMAIL PROTECTED] where you can see several previous similar complaints, and the explanation. [...] This is GNU Wget 1.9.1. Why? Wget 1.10.2 has been available since about October 2005.
Re: timestamping and output document
For the record: http://www.mail-archive.com/search?q=%22-O%22+%22-N%22[EMAIL PROTECTED] was actually more like: http://www.mail-archive.com/search?q=%22-O%22+%22-N%22&l=wget at sunsite.dk before it got PROTECTED. SMS.
Re: wget -P not working.
From: Itamar Reis Peixoto Can anyone fix this bug for me? https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=229744 I don't know. Talk to the people who broke it? with wget-1.10.2-3.2.1.i386.rpm it works, the file was downloaded to /etc; with newer versions (1.10.2-8.fc6.1) the file will be downloaded to the current directory http://netenberg.com/forum/viewtopic.php?t=5430 [...] please check the wget version that you have on your server. If it is wget-1.10.2-3.3.fc5 or wget-1.10.2-7.el5 or wget-1.10.2-8.fc6.1, we suggest that you replace it immediately with an older and/or stabler version. This version does not honor the -P switch. An alternate version that we suggest is wget-1.10.2-3.2.1 It sounds to me as if an older kit will solve the problem.
Re: Problem with --reject option
From: Glenn Nieuwenhuyse wget -T 1 -t 1 -r --reject=robots.* [...] I would expect this not to download the robots.txt file, but still it does. Perhaps because robots.txt is a special case, and is not selected by following links, and so is unaffected by the --reject option. A search for robot in the manual should reveal this: http://www.gnu.org/software/wget/manual/wget.html robots = on/off Specify whether the norobots convention is respected by Wget, on by default. This switch controls both the /robots.txt and the nofollow aspect of the spec. See Robot Exclusion, for more details about this. Be sure you know what you are doing before turning this off. So, adding -e robots=off to your command might help.
Re: problem with HTTP mirroring
From: Alexander Simon When calling wget -A.pdf,.PDF,.doc,.DOC,.java,.class,.JAVA,.CLASS,.zip,.ZIP -m -nH -nd -l1 --header="Accept-language: de, en;q=0.8" http://wwwseidl.informatik.tu-muenchen.de/lehre/vorlesungen/SS07/info2/index.php , wget should load some PDF files (i1.pdf, i2.pdf, i3.pdf, ...) that are linked on this site. As I read the HTML, i1.pdf appears to be on a different server: <a href="http://www2.in.tum.de/~seidl/Courses/SS2007/i1.pdf">PDF</a> Perhaps this option would help: -H, --span-hosts go to foreign hosts when recursive. wget -h shows some other potentially useful options under Recursive accept/reject: -D, --domains=LIST comma-separated list of accepted domains. --exclude-domains=LIST comma-separated list of rejected domains. [...]
Re: Recursive function does not work with -O
From: Gekko [...] returns the first page it downloads only, and does not continue to download the other links, while omitting the -O - allows the downloading to work. That's right. In recursive HTTP operation, wget expects to read its own output files to find the links to follow. It's not designed to read its one-and-only -O output file to find links while it is writing that file. It would not be impossible to arrange this sort of thing, but it would be complicated, and it's not obvious that it would be particularly useful. Why would you want to do this? It should be relatively easy to get the same effect with a normal wget -r command and a shell script to go through the resulting files and cat them into a single mess. I still don't know why you'd want to do it, however. Thanks for including the wget and OS info in the question. It's a rare thing to get all the useful info around here.
Re: NULL ptr dereferences found with Calysto static checker
From: Domagoj Babic + wget-1.10.2/src/utils.c:287 // localtime can return NULL I'd say that it's sloppy code, but the probability of seeing a failure in the real world, while not zero, must be vanishingly small.
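A defensive sketch of the kind of guard the checker is asking for; the wrapper is invented for illustration, not taken from utils.c:

```c
#include <assert.h>
#include <time.h>

/* localtime() may return NULL (e.g. for a time_t it cannot
 * represent), so fall back to the epoch rather than dereferencing
 * a NULL pointer.  Rarely triggered, but cheap to guard. */
static struct tm
safe_localtime (time_t t)
{
  struct tm *tm = localtime (&t);
  if (tm == NULL)
    {
      time_t zero = 0;
      tm = localtime (&zero);   /* the epoch should always convert */
    }
  return *tm;
}
```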
Re: can someone help me with wget?
From: shades13 I have been having problems [...] 1. People with real names tend to get more respect than others. 2. As usual, it might help to know which wget version you're using on which operating system. I don't see any links on http://www.talcomic.com/ . (Nor much of anything else.) Which links did you see on http://www.cad-comic.com/comic.php which wget was not following? Hint: <script src="http://www.google-analytics.com/urchin.js" type="text/javascript"> is not a link which wget will follow. wget -d ... may give you a better idea of what wget is doing.
Re: Return Value 2
From: Robert Denton Can you glean from the code what would cause an exit value 0? Zero is the success code, which is the default, so anything which does not set 1 or 2 should leave 0. "0: - I have never seen this one" - What do you see when everything goes right? (Or do your jobs always fail in some way?) Along the same line, what, exactly, is "one of my devices"?
Re: Return Value 2
From: Robert Denton So, can I take this to mean that 'wget exited with value 2' always indicates a problem with switches/options? No bets, but a search through the code for "exit" suggests that that's approximately true, if you count a problem reading or interpreting .wgetrc as a problem with switches/options. Where can I get a full list of exit values and what they indicate? I know of none. Skimming the "exit" search results suggests that your choices may be limited to 0, 1, and 2. (From which one might deduce that the designer was not a fan of AIX or VMS, where error messages and/or exit status values tend to be more informative.)
Re: Wget 1.10.2 + FC6 + FTP mirroring in root folder
From: Richard Dale [...] An upgrade to the latest revision of 1.10.2 exhibited the problems and a downgrade avoided the problems. Do these apparently different variants of wget version 1.10.2 say different things in the wget -V report?
Re: Crash
From: Adrian Sandor Apparently there's more than a little code in src/cookies.c which is not ready for NULL values in the attr and value members of the cookie structure. Does that mean wget is buggy or does brinkster break the cookie specification? Wget is certainly buggy, but as I said, I don't do much with cookies, so I don't know if missing/null values are legal or not. I tried it, and it solves my problem. Glad to hear it. Thanks for the report. Will there be an official wget patch for this? Ask the wget maintainer. I can't even get the changes _I_ want into the official code.
Re: Crash
From: Adrian Sandor [...] Stored cookie www14.brinkster.com -1 (ANY) /aditsu/ session insecure [expiry none] (null) (null) [...] Segmentation fault Apparently there's more than a little code in src/cookies.c which is not ready for NULL values in the attr and value members of the cookie structure. (It's more luck than design that you get (null) in the debug message, instead of it blowing up right there.) Double your money back if you're not completely satisfied, but, if you can build from the sources, you could try this one: http://antinode.org/ftp/wget/wget-1_10_2c_vms/cookies.c I don't do much with cookies, so the end cases may not be handled correctly, but it does seem to explode less. (If nothing else, it could boost the ego of the next fellow who does it right.)
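The sort of guard src/cookies.c needs can be sketched briefly; the helper names are invented for illustration, not the actual patch:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Treat a missing (NULL) cookie member as an empty string before
 * handing it to printf()/str*() routines, instead of relying on the
 * luck that makes printf print "(null)" on some platforms. */
static const char *
field_or_empty (const char *field)
{
  return field != NULL ? field : "";
}

/* Example use: format "attr=value" without risking a NULL dereference. */
static int
format_cookie_pair (char *buf, size_t len,
                    const char *attr, const char *value)
{
  return snprintf (buf, len, "%s=%s",
                   field_or_empty (attr), field_or_empty (value));
}
```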
Re: convert-links + output-document
From: Poppa Pump [...] I cannot rename the file after I run the wget command with convert-links because I need a unique name before the download. Can you create a (uniquely named?) temporary directory, do the work in there (without --output-document), and rename/move the results later?
Re: Loading cookies that were set by Javascript
From: Poppa Pump [...] but these are set using Javascript. [...] Wget doesn't do JavaScript. I suspect that you're doomed.
Re: --page-requisites and --post-data options
From aulaulau: [...] you will use --page-requisites and --post-data options together. Probably not something anyone considered. Is there a way to do it with wget options? Perhaps use --post-data to get the primary page, and then use -i primary_page (perhaps with -F, perhaps with --page-requisites) to get the other pieces?
Re: simple wget question
From: R Kimber What I'm trying to download is what I might express as: http://www.stirling.gov.uk/*.pdf At last. but I guess that's not possible. In general, it's not. FTP servers often support wildcards. HTTP servers do not. Generally, an HTTP server will not give you a list of all its files the way an FTP server often will, which is why I asked (so long ago) "If there's a Web page which has links to all of them, [...]". I just wondered if it was possible for wget to filter out everything except *.pdf - i.e. wget would look at a site, or a directory on a site, and just accept those files that match a pattern. Wget has options for this, as suggested before (wget -h): [...] Recursive accept/reject: -A, --accept=LIST comma-separated list of accepted extensions. -R, --reject=LIST comma-separated list of rejected extensions. [...] but, like many of us, it's not psychic. It needs explicit URLs or else instructions (-r) to follow links which it sees in the pages it sucks down. If you don't have a list of the URLs you want, and you don't have URLs for one or more Web pages which contain links to the items you want, then you're probably out of luck.
Re: sending Post Data and files
1.) How can I send Post Data with Line Breaks? I can not press enter and \n or \r or \r\n don't work... Put the data into a file, and use --post-file=FILE_NAME? Is it possible to send a File with a name? Other than with --post-file=FILE_NAME? Is it possible to send two files? I believe not. At least not using --post-file. [...] Input type=file [...] I've never tried that, so I know even less about that than I do about --post-file (which I did use once).
Re: sending Post Data and files
From: Tony Lewis You don't need a line break because parameters are separated by ampersands; a=1&b=2 You need a line break if you need a line break, as when you wish to set a variable to multiple lines of text. For example: subject=Test message&msg_text= This is a test. This is only a test. If this had been an actual message, It would have been delivered appropriately. Put that stuff into a file named, say, test.dat, and specify --post-file=test.dat. The server should then set the variable subject to "Test message", and msg_text to: This is a test. This is only a test. If this had been an actual message, It would have been delivered appropriately. (with the line breaks).
Re: simple wget question
From: R Kimber If I have a series of files such as http://www.stirling.gov.uk/elections07abcd.pdf http://www.stirling.gov.uk/elections07efg.pdf http://www.stirling.gov.uk/elections07gfead.pdf etc is there a single wget command that would download them all, or would I need to do each one separately? It depends. As usual, it might help to know your wget version and operating system, but in this case, a more immediate mystery would be what you mean by "them all", and how one would know which such files exist. If there's a Web page which has links to all of them, then you could use a recursive download starting with that page. Look through the output from wget -h, paying particular attention to the sections Recursive download and Recursive accept/reject. If there's no such Web page, then how would wget be able to divine the existence of these files? If you're running something older than version 1.10.2, you might try getting the current released version first.
Re: wget suggestion
From: Robert La Ferla There needs to be a way to tell wget to reject all domains EXCEPT those that are accepted. This should include subdomains. I.e., I just want to download www.mydomain.com and cache.mydomain.com. I thought the --domains option would work this way but it doesn't. Can you provide any evidence that it doesn't? Useful info might include the wget version, your OS and version, the command you used, and the results you got. Adding -d to the command often reveals more than not using it. A real example is usually more useful than a fictional example. If you can't exhibit the actual failure and explain how to reproduce it, you might do better with a psychic hot-line, as most of us are not skilled in remote viewing. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: wget suggestion
From: Robert La Ferla GNU Wget 1.10.2 Ok. Running on what? Capture this sub-site and not the rest of the site so that you can view it locally. i.e. just www.boston.com and cache.boston.com http://www.boston.com/ae/food/gallery/cheap_eats/ What is a sub-site? Do you mean this page, or this page and all the pages to which it links, excluding off-site pages, or what? I have a better idea. Read this again: Can you provide any evidence that it doesn't? Useful info might include the wget version, your OS and version, the command you used, and the results you got. Adding -d to the command often reveals more than not using it. A real example is usually more useful than a fictional example. If you can't exhibit the actual failure and explain how to reproduce it, you might do better with a psychic hot-line, as most of us are not skilled in remote viewing. You might also consider phrasing your demands as polite requests in future. Phrases like I would like to learn how to, or Can you explain how to can be useful for this. Even better would be, I tried this command insert command here, and I got this result insert result here, but I was expecting something more like this insert expected result here, and I definitely didn't expect this insert undesirable result here. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: How can I compile a list of URLs matching a pattern?
From: Karim Ali [...] I want to traverse a given site, but only retrieve the URLs that match a particular pattern. [...] [...] I'd like it if wget would just return the URLs it finds during its recursive traversal, but not return the data. [...] If wget is to traverse a given site, it needs to fetch the HTML documents from the server so it can search them for links to other HTML documents. How should it do this if it does not return the data? Have you looked at these?: -A, --accept=LIST comma-separated list of accepted extensions. --spider don't download anything. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: feature suggestion - make option to use system date instead multiple version number
From: Alvydas I guess it would be relatively easy and quite useful to add an option to name files file.20070426142800 file.20070426142955 ... instead of just numbers. The relevant code is in src/utils.c: unique_name(), and should be easy enough to change. On a fast system, however, one-second resolution (or multiple users) could lead to non-unique names, so it would be wise to do something a little more like the existing code, but with a date-time string added in. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
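A sketch of that combined scheme from the shell (the base name "file" is hypothetical; inside wget this logic would live in unique_name() itself): start with a timestamped name and fall back to an extra numeric suffix only when a same-second collision occurs.

```shell
base=file
stamp=$(date +%Y%m%d%H%M%S)
name="${base}.${stamp}"
n=1
# If the timestamped name already exists, append a counter until it is unique.
while [ -e "$name" ] ; do
  name="${base}.${stamp}.${n}"
  n=$(( n + 1 ))
done
echo "$name" > chosen_name.txt
cat chosen_name.txt
```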
Re: q: wget -r -Apdf http:// ..
From: b GNU Wget 1.10+devel Wget 1.10.2 is the current released version, but that probably won't help you with this problem. http://www.gnu.org/software/wget/wget.html But a close look at the output from wget might help. These files may be PDF files, but the names in the links (with all the query data, ?id=xxx) are XXX.pdf?id=xxx, not XXX.pdf, so your -Apdf causes this behavior: [...] 20:59:20 (145.31 KB/s) - `NewZert/isht.comdirect.de/html/cer/pdf/ML-RAEZ_1PFlyer[1].pdf!id=6dfbfc0f556ec1b78536ae411ee684f19110b63c568a53b046721f62f6' saved [905090] Removing NewZert/isht.comdirect.de/html/cer/pdf/ML-RAEZ_1PFlyer[1].pdf!id=6dfbfc0f556ec1b78536ae411ee684f19110b63c568a53b046721f62f6 since it should be rejected. [...] and that Removing [...] is not what you want. (888 is not pdf, so it's doing what you asked it to do.) I think that you'll need to remove the -Apdf from your wget command. When wget is finished, you can remove the non-PDF files and/or rename the PDF files to remove the query data from the file names, but I don't know how to make wget do the whole job without any help. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
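The after-the-fact renaming suggested above might look like this in the shell. The sample file names are invented stand-ins for the long ?id=... names wget saves; the loop strips everything from the first "?" onward:

```shell
# Create two stand-in files with query data fused into their names, as wget
# would save them on a UNIX file system.
touch 'a.pdf?id=123' 'b.pdf?id=456'
# Strip the query suffix from each matching name.
for f in *'?'* ; do
  [ -e "$f" ] || continue          # the glob may match nothing
  mv -- "$f" "${f%%\?*}"
done
ls
```

Colliding base names (two downloads differing only in query data) would need extra handling; this sketch assumes they are distinct.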
Re: Bug using recursive get and stdout
A quick search at http://www.mail-archive.com/wget@sunsite.dk/ for -O found: http://www.mail-archive.com/wget@sunsite.dk/msg08746.html http://www.mail-archive.com/wget@sunsite.dk/msg08748.html The way -O is implemented, there are all kinds of things which are incompatible with it, -r among them. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: [enhancement request] goals to programmatically parse output ( -o or -a)
From: Thomas Harding - all output lines will use semicolon (;) separated fields And this won't cause confusion if there's a semi-colon in a file name? My wget version is GNU Wget 1.9.1 (and I prefer to not change it, because it is used by apt-methods...) And the current _released_ version is 1.10.2, so who else will be interested in changes to 1.9.1? Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: Feature suggestion for WGET
From: Daniel Clarke - JAS Worldwide I'd like to suggest a feature for WGET: the ability to download a file and then delete it afterwards. Assuming that you'd like to delete it on the FTP server, and not locally, the basics of this seem pretty easy to add: 0. Documentation. 1. Some kind of command-line option to control the new source-delete feature (or whatever you decide to call it). 2. src/ftp-basic.c: Add a new function, ftp_dele() (very nearly ftp_retr() converted to send DELE instead of RETR, and to expect a 2xx success response instead of a 1xx). 3. src/ftp.h: Add function prototype for ftp_dele(). 4. src/ftp.c: In getftp(), if ftp_retr() succeeds, and the new source-delete option is enabled, call the new ftp_dele(). 5. src/ftp.c: Add a bunch of new debug and error message code to deal with ftp_dele() activity and failures. I've done steps 2, 3, and 4 in my experimental code, and the basic functionality seems to be there. If anyone is eager to do the whole job and wants to see my rough code, just let me know. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: Suggesting Feature: Download anything newer than...
but if you're going to add --not-before, you might as well add --not-after too. I'd suggest --before () and --since (=), but I may be prejudiced by exposure to VMS, where many commands use similar qualifiers. (But if you _like_ longer, more complicated option names, ...) Me add?!? ;-) It's that, or wait for someone else to do it. You decide. Is adding such features being worked on by someone - or should I start cramming C and RFCs, and *try to* make a patch for it myself? I know that _I_ wasn't working on them. I don't think that you need any RFC's for this, mostly just code theft from other parts of the program. It could be an educational experience. (I hate educational experiences.) If my experience is any guide, getting any changes into the main development code may be more of a challenge than getting those changes to work properly. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: Using a variable to get files in sequence
From: Williamts99 Thanks for your response, I am using Linux. You might benefit from studying a shell scripting primer, like, for example: http://developer.apple.com/documentation/OpenSource/Conceptual/ShellScripting/index.html Or try asking Google to look for something like: linux shell scripting primer Roughly, I'd start with something like this: #!/bin/sh n_min=1 n_max=8 n=$n_min while [ $n -le $n_max ] ; do echo n = $n n=` expr $n + 1 ` done Adjust n_min and n_max as appropriate, and replace the echo command with an appropriate wget command. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
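Filling in that last step, the loop with a wget command substituted for the echo might look like the sketch below. The URL pattern is invented for illustration; the loop only prints the commands it would run, so drop the outer echo to execute them:

```shell
#!/bin/sh
n_min=1
n_max=8
n=$n_min
: > commands.txt
while [ "$n" -le "$n_max" ] ; do
  # A real run would be: wget "http://example.com/file${n}.dat"
  echo "wget http://example.com/file${n}.dat" >> commands.txt
  n=` expr $n + 1 `
done
cat commands.txt
```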
Re: Cannot write to auto-generated file name
From Tony Lewis: In which case, wget should do something reasonable (generate an error message, truncate the file name, etc.). [...] Sadly, this is easier said than done. Around here (VMS), the complaint is i/o error. I haven't tried it on a UNIX, but it could easily be different there, too. VMS offers a system service which can be used to parse a file specification and test it for legality, but I don't know how you would do it elsewhere. On some Linux system(s), there seems to be a distinctive code/message (File name too long): http://www.mail-archive.com/wget@sunsite.dk/msg09711.html Simply truncating the name would be asking for collisions, and etc. would seem to involve actual work, especially when converting links to local. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: Using a variable to get files in sequence
From: Williamts99 Is there any way [...] to force wget to use the wildcards? Sure. You did it. Unfortunately, there's no way to force the HTTP server to use wildcards. One could probably write a script to do this sort of thing, but, without knowing which OS you're using, it's difficult to guess exactly how it might best be done. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: Special characters in http
From: Alan Thomas What is happening? [...] I'm no Windows expert, but, as you said, these are special characters. Have you tried quoting the URL? In UNIX, apostrophes and quotation marks are popular; in VMS, quotation marks; in Windows, at least one of those should be effective. Note that you may need to use -O, because otherwise the wget-generated output file name may be too ugly for your file system. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: Wget doesn't use characters after '&' when saving a URL as a filename
From: Ed I use wget to fetch [...] Would it be asking too much to see the actual command you used, and its actual output? We're not all psychic out here. As usual, it might be interesting to know which wget version is involved here. Mac OS X by the way That's not a by the way item any more than the wget version is. And Mac OS X covers far too much ground, too. How do I get the full file name? It depends. Perhaps it involves giving the whole URL to wget. Launching into guesswork based on insufficient evidence, have you tried quoting the URL in the command? Your shell could be doing the truncation at the ampersand, which is a shell-special character. If I could see the command you used and its output, I wouldn't need to guess. Didn't the stuff like [1]+ Done seem out of place? Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
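The shell-truncation theory is easy to demonstrate without wget at all (the URL is a made-up stand-in): unquoted, the shell treats & as a background operator and everything after it never reaches the command, which is also where the stray [1]+ Done job notice comes from; quoted, the URL arrives whole.

```shell
url='http://example.com/get?file=abc&name=xyz.mp3'
# Quoted: the whole URL, ampersand included, is one argument to the command.
echo "$url" > quoted.txt
cat quoted.txt
```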
Re: Is it possible to log transfer times in milliseconds?
Assuming that you're using wget version 1.10.2 (or similar), it appears (src/ptimer.c) that the program already uses a time resolution of a millisecond (or better), given underlying run-time library support at that resolution. The formatted output (retr_rate(): src/retr.c) is limited to a form which is more convenient for most users. If you want the results to have any meaning, you should examine the wget code to see exactly what is being timed (at which events the timer starts and stops), to see if wget is measuring what you want measured. A term like response time is pretty vague all by itself. It should be easy enough to modify the formatted output code to provide more digits than the existing code does. (Whether these would be _significant_ figures would depend on the underlying OS timer resolution.) I don't see how you could get this sort of output without changing the code, so you'd need to decide whether you wanted to add a command-line option (or to use some other method) to enable the new elapsed time format, or if you just wanted to maintain a separate code stream for a modified wget program which always uses the new format. Getting changes like this into the main product code stream is someone else's decision. If I were you, I'd expect to have to make the changes and maintain the different code myself into the indefinite future. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: wildcards in filenames
From: Alan Thomas [...] Putting /l*.htm at the end of the URL did not work: Warning: wildcards not supported in HTTP. Putting l*.htm after the URL (separated with a space) did not work either. It's not a UNIX problem, it's an HTTP problem/limitation. You can't ask the HTTP (Web) server to send you all the l*.htm files. It's just not one of the allowed requests. You _can_ do this sort of thing with an _FTP_ server, but not with an HTTP server. Wget follows hyperlinks, so if the Web page of interest here has a bunch of links to files, and you would like it to follow only some of them, it appears to me that the best you can get (from wget version 1.10.2) is these: [...] Recursive accept/reject: -A, --accept=LIST comma-separated list of accepted extensions. -R, --reject=LIST comma-separated list of rejected extensions. [...] which don't appear to do what you want. You might need to suck down the first Web page, edit it locally to remove the links you'd like not to follow, and then ask wget to start with the modified page. (Which sounds like a lot of work.) If you can get the stuff using FTP instead of HTTP, it would look a lot easier. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
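A side note on trying the FTP route: quoting matters there too, because unquoted the shell expands l*.htm against local files before wget ever sees it. A quick demonstration (directory and file names are invented):

```shell
mkdir -p globdemo
touch globdemo/local.htm
# Unquoted: the shell substitutes any matching local names for the pattern.
( cd globdemo && echo l*.htm ) > unquoted.txt
# Quoted: the pattern survives intact, to be sent to the (FTP) server.
echo 'l*.htm' > quoted.txt
cat unquoted.txt quoted.txt
```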
wget-1.10.2 pwd/cd bug
It's starting to look like a consensus. A Google search for: wget DONE_CWD finds: http://www.mail-archive.com/wget@sunsite.dk/msg08741.html Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
re: Huh?...NXDOMAINS
Around here: [...] can't find ga13.gamesarena.com.au: Non-existent host/domain If that's your complaint, then I don't see what wget is supposed to do about it. What would you like it to do, make up an address? What's Australian for broken link? Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: ezmlm response
From: Bruce For beginners, *does* wget support large files greater than 2.1gb? I looked and looked through the docs and found nothing to confirm or deny this question. Does anyone know? From NEWS (which is included in the source kit): GNU Wget NEWS -- history of user-visible changes. [...] * Changes in Wget 1.10. ** Downloading files larger than 2GB, sometimes referred to as large files, now works on systems that support them. This includes the majority of modern Unixes, as well as MS Windows. [...] You may expect still to have problems if the HTTP or FTP server supplies bad file size data for large files. there's another problem to be resolved first -- NXDOMAINS. Can wget negotiate this namespace trickery, and if so how? Huh? Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: Question re web link conversions
From: Alan Thomas As usual, wget without a version does not adequately describe the wget program you're using, Internet Explorer without a version does not adequately describe the Web browser you're using, and I can only assume that you're doing all this on some version or other of Windows. It might help to know which of everything you're using. (But it might not.) Using GNU Wget 1.10.2c built on VMS Alpha V7.3-2 (wget -V), I had no such trouble with either a Mozilla or an old Netscape 3 browser. (I did need to rename the resulting file to something with fewer exotic characters before I could get either browser to admit that the file existed, but it's hard to see how that could matter much.) It's not obvious to me how any browser could invent a URL to which to go Back, so my first guess is operator error, but it's even less obvious to me how anything wget could do could cause this behavior, either. You might try it with Firefox or any browser with no history which might confuse a Back button. If there's a way to blame wget for this, I'll be amazed. (That has happened before, however.) Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: Naming output file
From: Alan Thomas Is there a way to tell wget how to name an output file (i.e., not what it is named by the site from which I am retrieving). -O, --output-document=FILE write documents to FILE. Note that using -O has some side effects which bother some users. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: file numbering bug
From: Robert Dick When serializing successive copies of a page, the serial number appears at the end of the extension, i.e., what should be file1.html is called file.html.1 I'm using wget ver. 1.10.2. with the default options on Windows ME ... I can see how that might annoy a Windows user, but it would probably be a terrible idea to change the file name as you suggest, because it would break any HTML links to file.html which might appear in any other file. If you don't like the .nnn suffix, then you'll need to clean it up later, or else don't download the same file twice into the same directory. (Or you could use VMS, where file version numbers are a natural part of the file system, so the .nnn suffix is not needed, and this problem does not arise.) Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: wget with -nc bails out when it finds the first file that already exists
From: Pete Redest wget bailed out on the first file: File `pure-data.cvs.sourceforge.net/pure-data/doc/tutorials/index.html' already there; not retrieving. Aborted It seems to work for me (same commands): [...] File `pure-data_cvs_sourceforge_net/pure-data/doc/tutorials/footils/index.html' already there; not retrieving. File `pure-data_cvs_sourceforge_net/pure-data/doc/tutorials/intro/index.html' already there; not retrieving. --00:58:43-- http://pure-data.cvs.sourceforge.net/pure-data/doc/tutorials/messageoddness/ => `pure-data_cvs_sourceforge_net/pure-data/doc/tutorials/messageoddness/index.html' Resolving pure-data.cvs.sourceforge.net... 66.35.250.81 Connecting to pure-data.cvs.sourceforge.net|66.35.250.81|:80... connected. HTTP request sent, awaiting response... 200 OK Length: unspecified [text/html] [ <=> ] 2,870 --.--K/s [...] [...] If this bailing-out on the first already-existing file is what is intended, the design is deficient, and wget is less than useful. I'll tell you what's less than useful, and that's a problem report which omits significant facts, such as the program version, the system type, the OS and version, and so on. Around here: alp $ wget -V GNU Wget 1.10.2c built on VMS Alpha V7.3-2. [...] If you're using anything other than wget 1.10.2, then I'd suggest trying the current released version. If that fails, try complaining again (and better). Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: wget -O not preserving execute permissions
From Andrew Hall: As usual, it might help to have some basic information, like the wget version, the system type and OS on which it's being run, and an actual wget command. I notice when using wget -O that execute permissions on files are not preserved. With -O, wget opens the output file before it talks to the server, so it doesn't have that information at that time. Wget (including with -O) allows a user to fetch multiple files with one command. Whose file permissions would you like it to use? -O does not work the way many (most?) people seem to think that it does, which leads to faulty expectations. So a file which on the webserver is rwxr-xr-x will be written as rw-r--r-- With your umask, I'd expect that _any_ file which you can get the Web server to send will be written with rw-r--r--. In most cases, file permissions on a Web server are not even available to the client. An FTP server is more likely to supply this kind of info. Is this intentional? I'd say it was more accidental than intentional. Is there a way I can preserve execute permissions? The easiest way might be to use FTP and not -O. Which do you like better after a download, mv or chmod? And how do you know which permissions the file had originally? Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
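When the original mode is known (from an FTP .listing, say), the chmod option mentioned above is a one-liner after the transfer. A minimal sketch, with a stand-in file name and a mode assumed to be rwxr-xr-x:

```shell
# Stand-in for a freshly downloaded script, saved with umask-style permissions.
touch downloaded.sh
chmod 644 downloaded.sh
# Restore the (assumed known) original mode after the transfer.
chmod 755 downloaded.sh
ls -l downloaded.sh | cut -c1-10 > mode.txt
cat mode.txt
```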
Re: Ascii transfers
From bruce . furber: I want to use wget to transfer files from an IBM mainframe FTP server which stores files in EBCDIC to Linux on our S390X machine. Even though I use the type=a option, the files are transferred in Binary (EBCDIC). The .listing files are OK in ASCII. As wget 1.10.2 is written (and -d should show), specifying ;type=a will request an ASCII transfer (TYPE A) instead of the default IMAGE transfer (TYPE I), but the standard wget code does not process the received data properly. That is, it does not adjust the line endings, and it certainly does no EBCDIC-ASCII code conversion. (Not adjusting the line endings does allow it to do -c continuation easily, which would be either unreliable or very difficult if the data were processed properly upon receipt.) My VMS-compatible wget 1.10.2c will adjust the line endings (for a UNIX or VMS host -- I don't care about -c), but that still won't convert EBCDIC to ASCII. Of course, everything depends on what the FTP _server_ does when it gets a request for an ASCII transfer. Assuming that wget really _is_ requesting ASCII and you're still getting EBCDIC, then you're probably doomed to use some external EBCDIC-ASCII code converter program. Example -d output showing default and type=a behavior: alp $ wget -d ftp://alp-l/wget/wget-1_9_1e_vms/vms_notes.txt DEBUG output created by Wget 1.10.2c built on VMS V7.3-2. --23:10:29-- ftp://alp-l/wget/wget-1_9_1e_vms/vms_notes.txt => `vms_notes.txt' [...] 257 SYS$SYSDEVICE:[ANONYMOUS] is current directory. done. ==> TYPE I ... --> TYPE I 200 TYPE set to IMAGE. [...] While, on the other hand: alp $ wget -d ftp://alp-l/wget/wget-1_9_1e_vms/vms_notes.txt;type=a DEBUG output created by Wget 1.10.2c built on VMS V7.3-2. --23:10:11-- ftp://alp-l/wget/wget-1_9_1e_vms/vms_notes.txt;type=a => `vms_notes.txt' [...] 257 SYS$SYSDEVICE:[ANONYMOUS] is current directory. done. ==> TYPE A ... --> TYPE A 200 TYPE set to ASCII. [...] Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
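One practical detail when trying ;type=a yourself: the semicolon is special to most UNIX shells, so the URL should be quoted or the shell will cut the command at the ";". A sketch with an invented host and path:

```shell
url='ftp://ftp.example.com/pub/notes.txt;type=a'
# Unquoted, the shell would treat ";type=a" as a separate command.
# Quoted, wget receives the type parameter intact:
#   wget -d "$url"
echo "$url" > url.txt
cat url.txt
```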
Re: wget having trouble with large files
From: Niels Möller [...] I'm using wget-1.9.1, [...] You might try version 1.10.2, which offers large-file support. http://www.gnu.org/software/wget/wget.html Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: wget URL's
From: u01jmg3 Hi, I just wondered if you can use wget to just visit/hit a URL rather than downloading anything? Regards. --spider? (Assuming that by visit/hit [...] rather than downloading anything you mean a HEAD request rather than a GET request.) Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: wget problem with IBM Http Server2 = apache 2
In your problem report, I see version numbers for everything but wget. Does adding -d to the wget command tell you anything? Anything in the Web server logs? Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: is there any plan about supporting different charsets?
From: Willener, Pat http://en.wikipedia.org/wiki/Big5 Ok. Thanks for the pointer. From: Leo Jay the attachment is a sample .listing file. I don't know if anyone plans to do anything about multi-byte characters anywhere in wget, and I know that I can't read them, but I see no reason why the existing code (with extensions already suggested) should not be able to handle any byte-character string you specify for a month name, whether or not it makes any sense as byte characters. (One could add an array of different spellings of "total", too.) That is, I believe that you could append your big5_months[] strings to the existing months[] array (and add as many other sets (of twelve) as you'd like), and then make changes something like: [...] #define MONTHS_LEN (sizeof( months) / sizeof( months[ 0])) for (i = 0; i < MONTHS_LEN; i++) [...] if (i != MONTHS_LEN) [...] month = i % 12; [...] Assuming that the strings like 26+ 0xa4+ 0xeb are day numbers, it appears that you got pretty lucky with wget's simple-minded day_number-to-integer conversion method. Not much work needed there. Note that a few bytes of storage could be saved by specifying empty strings ("") instead of duplicates, where other languages look like English. For example: static const char *months[] = { "Jan", "Feb", "Mar", "Apr", "May", "Jun", /* English. */ "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "", "", "Mär", "", "Mai", "", /* German. */ "", "", "", "Okt", "", "Dez" }; As for getting changes like this into the main development code, I'm probably the wrong person to ask, as I've been trying for years to get a set of VMS-related changes adopted with no obvious success. A while back, another fellow had a similar complaint about German month names: http://www.mail-archive.com/wget@sunsite.dk/msg07775.html I seem to have sent him some private e-mail, but I didn't post anything to the forum at that time. But it does show that there is some interest in this problem other than yours. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: is there any plan about supporting different charsets?
From: Leo Jay i had already hacked the src/ftp-ls.c to meet my need before i posted this thread. but my approach is just hard coding, which i think is not a good way to solve this problem and lacks flexibility. so, i wonder if the wget developers have any plan to solve this problem. and i think their solution must be very elegant (at least more so than mine). Wget developers are people who develop wget. Anyone can do it. and the attachment is my modification for big5 charset. could you please have a look at it for its correctness? thanks. What is a big5 charset? I can't look for correctness until I know what you're trying to do. You may know what you want, but it's not clear to me. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: wget error report
From: Daniele Annesi I think it is a Bug: using wget for multiple files: e.g. wget ftp://user:[EMAIL PROTECTED]/*.zip in the time of each file, the seconds are set to 00 That's not an error report. An error report would tell the reader which version of wget you were using (wget -V), on which system type you were using it, and the OS version, at least. It would also help to know how the FTP server reports the date-times in its listings, as that's where wget gets the information. If the server doesn't provide the seconds, how can wget set them? (And of course, without more information we can't see the date-time data for ourselves.) Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: is there any plan about supporting different charsets?
From: Leo Jay since the responses of the FTP server could be in different charsets, and wget can't cope with charsets other than English, i'd like to know is there any plan about supporting different charsets? Are you complaining about dates in different languages, or file names in different character sets? Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: is there any plan about supporting different charsets?
From: Leo Jay since the responses of the FTP server could be in different charsets, and wget can't cope with charsets other than English, i'd like to know is there any plan about supporting different charsets? Are you complaining about dates in different languages, or file names in different character sets? i'm talking about dates in different languages. i haven't tried file names in different charsets, but i'm sure wget can't cope with dates in different languages. If you look in src/ftp-ls.c: ftp_parse_unix_ls(), you should find an array of month names: static const char *months[] = { "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" }; If by dates in different languages you mean that non-English month names are the only problem, then it should be fairly easy to extend this with month names in other languages, and then change the code below (if (i != 12), month = i;) to something a little more complex, to handle the new possibilities. If the order of the tokens also changes, then you may need to dive into the hideously complex parsing code, and make it even more hideously complex. (The fellow who designed the date format(s) for ls was obviously targeting an intelligent human audience, not another computer program. The order and simplicity of a VMS DIRECTORY listing shows some evidence of actual design, and parsing such a listing is relatively trivial, but that won't help you any.) I might offer a few more details, but your specification of the problem is not complete enough to make that practical. If you can list a set of date forms which must be interpreted, then it might be possible to say how hard it would be to do the job. (I assume that there is no actual ambiguity in the month name strings for the languages you would like to support, but that could make the problem impossible to solve for some languages.) Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: Newbie Question - DNS Failure
From: Terry Babbey Built how? Installed using swinstall How the depot contents were built probably matters more. Second guess: If DNS works for everyone else, I'd try building wget (preferably a current version, 1.10.2) from the source, and see if that makes any difference. [...] Started to try that and got some error messages during the build. I may need to re-investigate. As usual, it might help if you showed what you did, and what happened when you did it. Data like which compiler (and version) could also be useful. On an HP-UX 11.23 Itanium system, starting with my VMS-compatible kit (http://antinode.org/dec/sw/wget.html, which shouldn't matter much here), I seemed to have no problems building using the HP C compiler, other than getting a bunch of warnings related to socket stuff, which seem to be harmless. (Built using CC=cc ./configure and make.) td176 cc -V cc: HP C/aC++ B3910B A.06.13 [Nov 27 2006] And I see no obvious name resolution problems: td176 ./wget http://www.lambton.on.ca --23:42:04-- http://www.lambton.on.ca/ => `index.html' Resolving www.lambton.on.ca... 192.139.190.140 Connecting to www.lambton.on.ca|192.139.190.140|:80... failed: Connection refused. td176 ./wget -V GNU Wget 1.10.2c built on hpux11.23. [...] That's on an HP TestDrive system, which is behind a restrictive firewall, which, I assume, explains the connection problem. (At least it got an IP address for the name.) And it's not the same OS version, and who knows which patches have been applied to either system?, and so on. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: Newbie Question - DNS Failure
From: Terry Babbey I installed wget on an HP-UX box using the depot package. Great. Which depot package? (Anyone can make a depot package.) Which wget version (wget -V)? Built how? Running on which HP-UX system type? OS version? Resolving www.lambton.on.ca... failed: host nor service provided, or not known. First guess: You have a DNS problem, not a wget problem. Can any other program on the system (Web browser, nslookup, ...) resolve names any better? Second guess: If DNS works for everyone else, I'd try building wget (preferably a current version, 1.10.2) from the source, and see if that makes any difference. (Who knows what name resolver is linked in with the program in the depot?) Third guess: Try the ITRC forum for HP-UX, but you'll probably need more info than this there, too: http://forums1.itrc.hp.com/service/forums/familyhome.do?familyId=117 Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street (+1) 651-699-9818 Saint Paul MN 55105-2547
Re: Possibly bug
From: Yuriy Padlyak Have been downloading slackware-11.0-install-dvd.iso, but it seems wget downloaded more than the file size, and I found: -445900K .. .. .. .. ..119% 18.53 KB/s in wget-log. As usual, it would help if you provided some basic information. Which wget version (wget -V)? On which system type? OS and version? Guesswork follows. Wget versions before 1.10 did not support large files, and a DVD image could easily exceed 2GB. Negative file sizes are a common symptom when using a small-file program with large files. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street (+1) 651-699-9818 Saint Paul MN 55105-2547
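The negative sizes fall out of simple integer truncation. As a minimal illustration (my own sketch; wget 1.9 effectively kept sizes in 32-bit integers rather than using an explicit cast like this):

```c
#include <stdint.h>

/* Illustrative: truncate a byte count to a signed 32-bit integer,
   as a small-file program effectively does.  Counts between 2 GB
   and 4 GB wrap to negative values. */
static long long
as_32bit (long long bytes)
{
  return (int32_t) bytes;
}
```

A 2.5 GB image (2684354560 bytes) exceeds 2^31 - 1, so the truncated value comes out negative, which is just the sort of number that ends up in a progress line like the -445900K above.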
Re: Downloading multiple pages
From: graham hadgraft I need some help using an application [...] You seem to need some help asking for help. wget -r -l2 -A html -X cgi-bin -D www.somewebsite.co.uk/ -P /home/httpd/vhosts/somewebsite.co.uk/catalogs/somewebsite/swish_site/ http://www.somewebsite.co.uk/questions/ This only indexes the index page of this folder. It will not follow the links on the page. What would be the appropriate command to use to index all pages from that folder? Did it occur to you that it might matter which version of wget you're using, and on which system type (and version)? Or that it might be difficult for someone else to guess what happens when no one else can see the Web page which seems to be causing your trouble? Does it actually have links to other pages? Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street (+1) 651-699-9818 Saint Paul MN 55105-2547
Re: SI units
From: Lars Hamren Download speeds are reported as K/s, where, I assume, K is short for kilobytes. The correct SI prefix for thousand is k, not K: http://physics.nist.gov/cuu/Units/prefixes.html To gain some insight on this, try a Google search for: k 1024 I've seen contrary comments from people who apparently know no actual science, and who think that they know something about computers, claiming that 1000 is wrong, and that only 1024 is legitimate for k or K. You have my best wishes in your quest to set the world straight on this one. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street (+1) 651-699-9818 Saint Paul MN 55105-2547
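For what it's worth, the gap between the two conventions is easy to quantify; this fragment is purely my own illustration of the arithmetic:

```c
/* SI: k = 1000; IEC binary: Ki = 1024.  The conventions diverge by
   about 2.4% per prefix step (1024/1000 = 1.024). */
static long long
si_kilo (int steps)
{
  long long v = 1;
  while (steps-- > 0)
    v *= 1000;
  return v;
}

static long long
iec_kibi (int steps)
{
  long long v = 1;
  while (steps-- > 0)
    v *= 1024;
  return v;
}
```

At three prefix steps (giga vs. gibi) the difference is already about 7%, which is why the 1000-vs-1024 argument never quite dies.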
Re: Issue/query with the Wget
From: Manish Gupta Issue: when I pass a 300 MB file to wget in one shot, it will not be able to download the file at the client side. Is this _really_ a problem, or are you only afraid that it might be a problem? 300MB is not a large file. 2GB (or, sometimes, 4GB) is a large file. The latest released wget version (1.10.2) should work with large files on systems which support large files. Does wget have a buffer feature where it holds the stream? If so, then by increasing or specifying the buffer limit, I think we can overcome the issue. Wget writes the data to a file. If you have the disk space, it should work. People often use wget to download CD and DVD image files. Some older wget versions (without large-file support) had some problems with files bigger than 2GB (or 4GB, depending on the OS), but not version 1.10.2. Some _servers_ have problems with large files, but those are not wget problems. As usual, it would help to know which version of wget you're using, on which host system type you're using it, and the OS version there. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street (+1) 651-699-9818 Saint Paul MN 55105-2547
Re: re: 4 gig ceiling on wget download of wiki database. Wikipedia database being blocked?
From: Jonathan Bazemore: I've repeatedly tried [...] If it's still true that you're using wget 1.9, you can probably try until doomsday with little chance of success. Wget 1.9 does not support large files. Wget 1.10.2 does support large files. Try the current version of wget, 1.10.2, which offers large-file support on many systems, possibly including your unspecified one. Still my advice. In the future, it might help if you would supply some useful information, like the wget version you're using, and the system type you're using it on. Also, actual commands used and actual output which results would be more useful than vague descriptions like "consistently breaking" and "will not resume". I've used a file splitting program to break the partially downloaded database file into smaller parts of differing size. Here are my results: [...] So, what, you're messing with the partially downloaded file, and you expect wget to figure out what to do? Good luck. [...] wget (to my knowledge) doesn't do error checking in the file itself; it just checks remote and local file sizes and does a difference comparison, downloading the remainder if the file size is smaller on the client side. Only if it can cope with a number as big as the size of the file. Wget 1.9 uses 32-bit integers for file size, and that's not enough bits for numbers over 4GB. And if you start breaking up the partially downloaded file, what's it supposed to use for the size of the data already downloaded? Wikipedia doesn't have tech support, [...] Perhaps because they'd get too many questions like this one too many times. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street (+1) 651-699-9818 Saint Paul MN 55105-2547
Re: problem at 4 gigabyte mark downloading wikipedia database file.
From: Jonathan Bazemore: [...] I am using wget 1.9 [...] up to about the 4 gig mark [...] Try the current version of wget, 1.10.2, which offers large-file support on many systems, possibly including your unspecified one. http://www.gnu.org/software/wget/wget.html Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: Wget timestamping is flawed across timezones
From: Remko Scharroo: Can this be fixed? Of course it can be fixed, but someone will need to fix it, which would involve defining the user interface and adding the code to do the actual time offset. I assume that the user will need to specify the offset. For an indication of what could be done, you might look for WGET_TIMEZONE_DIFFERENTIAL in my VMS-adapted src/ftp-ls.c: ftp_parse_vms_ls(). http://antinode.org/dec/sw/wget.html This is a common problem on VMS systems, which normally (sadly) use local time instead of, say, UTC. One result of this is that FTP servers on VMS tend to provide file date-times in the server's local time. I chose to add an environment variable (a VMS logical name on a VMS system) as the user interface, partly for code simplicity (less work for me), and partly because VMS uses a similar logical name (SYS$TIMEZONE_DIFFERENTIAL) to specify the offset from UTC to local time, so the concept would already be familiar to a VMS user. I use WGET_TIMEZONE_DIFFERENTIAL in the code only for a VMS FTP server, but I assume that it could easily be adapted to the other ftp_parse*_ls() functions. (Or a new command-line option could be used to specify the offset.) When I did the work, I probably didn't consider the possibility that any non-VMS FTP servers would provide file date-times in non-UTC. Otherwise I might have made it more general. Trying to get my VMS-related changes into the main Wget development stream has been sufficiently unsuccessful that I don't spend much time working on adding features and fixes which are not trivially easy and which I don't actually need myself. But I wouldn't try to discourage anyone else. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street (+1) 651-699-9818 Saint Paul MN 55105-2547
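A rough sketch of that environment-variable scheme, for the curious. The variable name follows the VMS-adapted code mentioned above, but the helper, the sign convention, and the use of strtol() are my assumptions, not the actual ftp_parse_vms_ls() code:

```c
#include <stdlib.h>
#include <time.h>

/* Sketch: convert a server-local timestamp to UTC using an offset
   (seconds east of UTC) taken from a WGET_TIMEZONE_DIFFERENTIAL-style
   environment variable.  local = UTC + offset, so UTC = local - offset.
   With the variable unset, the timestamp passes through unchanged. */
static time_t
server_local_to_utc (time_t server_local)
{
  const char *tzd = getenv ("WGET_TIMEZONE_DIFFERENTIAL");
  long offset = tzd ? strtol (tzd, NULL, 10) : 0L;
  return server_local - (time_t) offset;
}
```

A command-line option would work just as well; the environment variable merely keeps the plumbing minimal and mirrors the SYS$TIMEZONE_DIFFERENTIAL convention a VMS user already knows.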
Re: FTP SYST NULL dereferencing crash (found by someone else)
From: Ulf Harnhammar [EMAIL PROTECTED]

+      if (request == NULL)
+        {
+          xfree (respline);
+          return FTPSRVERR;
+        }

Well, yeah, if you prefer returning an error code to trying a little harder. I prefer my change:

      if (request == NULL)
        *server_type = ST_OTHER;

Why punish the user when the FTP server behaves badly? Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street (+1) 651-699-9818 Saint Paul MN 55105-2547
Re: trouble loading and installing wget
From: Siddiqui, Kashif I'm trying to install wget on my Itanium 11.23 system [...] I assume that that's HP-UX 11.23, as in: [EMAIL PROTECTED] uname -a HP-UX td176 B.11.23 U ia64 1928826293 unlimited-user license /usr/lib/hpux32/dld.so: Unsatisfied code symbol '__umodsi3' in load module '/usr/local/bin/wget'. And where did you get _that_ copy of wget? If I use the source code and run the configure script, then do a 'make install' I get the following error: [...] gcc -I. -I. -O -DHAVE_CONFIG_H -DSYSTEM_WGETRC=\"/usr/local/etc/wgetrc\" -DLOCALEDIR=\"/usr/local/share/locale\" -O -c connect.c In file included from connect.c:41: /usr/include/sys/socket.h:535: error: static declaration of 'sendfile' follows non-static declaration [...] Complaints about header files are often caused by a bad GCC installation (or an OS upgrade which confuses GCC). I just tried building my VMS-oriented 1.10.2c kit using GCC on one of the HP TestDrive systems, and I had some trouble ('ld: Unsatisfied symbol libintl_gettext in file getopt.o'), but that's much later than compiling connect.c, which got only the (usual) warnings about the pointers. That's with: http://antinode.org/dec/sw/wget.html http://antinode.org/ftp/wget/wget-1_10_2c_vms/wget-1_10_2c_vms.zip [EMAIL PROTECTED] gcc --version gcc (GCC) 3.4.3 [...] And I have no idea whether the GCC installation there is good or bad. (But it seems to be better than yours.) I also tried it using HP's C compiler (CC=cc ./configure): [EMAIL PROTECTED] cc -V cc: HP C/aC++ B3910B A.06.12 [Aug 17 2006] Here, the make ran to an apparently successful completion, but real testing is not convenient on the TestDrive systems, so I can't say whether it would actually work better than what you have. [EMAIL PROTECTED] ./src/wget -V GNU Wget 1.10.2c built on hpux11.23. [...] So, I'd suggest using HP's C compiler, or else re-installing GCC.
After that, I'd suggest using the ITRC HP-UX forum: http://forums1.itrc.hp.com/service/forums/familyhome.do?familyId=117 Any idea's and assistance [...] That's ideas, by the way. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: hacking 'prefix'
I give up. What are you doing, what are you doing it with, what are you doing it on, what happens, and what would you like to have happen instead? (Hint: Actual commands and their output would help more than vague descriptions.) Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: Accents in PHP parameter
14:13:04 ERROR 406: Not Acceptable. It looks to me as if the Web server does not like these characters. Adding -d to the wget command might tell you more about what wget is doing. Do you have any evidence of a URL like this which works in, say, a Web browser? GNU Wget 1.7 1.10.2 is the latest released version. If there is a problem with wget 1.7, _and_ if it's still a problem in 1.10.2, then someone might wish to work on it. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: Documentation error?
From: Ian As usual, it might help to know which wget version you're using (wget -V) and on which system type you're using it. The documentation section 7.2 states: _Which_ documentation section 7.2? wget -r -l1 --no-parent -A.gif http://www.server.com/dir/ I don't normally use -A, but a Google search for wget -A found this: http://www.gnu.org/software/wget/manual/html_node/Types-of-Files.html which suggests that -A gif might work better than -A.gif. Adding -d to the wget command might also be informative. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: linux version crashes when reaching the max size limit
From: Toni Casueps 1. It crashed is not a helpful description of what happened. What actually happened? 2. If the file is too large for a FAT32 file system, what would you like to happen? 4294967295 looks like 2^32-1, which (from what I've read) is the maximum size of a file on a FAT32 file system. 3. Wget 1.10.2 is the latest released version. Complaints about older versions normally lead to a suggestion to try the latest version. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street (+1) 651-699-9818 Saint Paul MN 55105-2547
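That 4294967295 figure is no coincidence: FAT32 records each file's size in an unsigned 32-bit field, so 2^32 - 1 bytes is the ceiling. A trivial check of the arithmetic (mine, not wget code):

```c
#include <stdint.h>

/* FAT32 keeps file sizes in an unsigned 32-bit field, so the largest
   representable file is 2^32 - 1 = 4294967295 bytes (just under 4 GB). */
static unsigned long long
fat32_max_file_size (void)
{
  return UINT32_MAX;
}
```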
Re: wget 1.10.1 segfaults after SYST
From: kaneda [...] == SYST ... Segmentation fault (core dumped) [...] This sounds like the same problem as the one under wget 1.10.1 segfaults after SYST. For details and the solution(s), try the thread beginning at: http://www.mail-archive.com/wget@sunsite.dk/msg09371.html It _was_ nice to see a problem report with some useful info (wget version, host OS, et c.) for a change. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: BUG - .listing has sprung into existence
From: Sebastian Doctor, it hurts when I do this. Don't do that. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street(+1) 651-699-9818 Saint Paul MN 55105-2547
Re: new wget bug when doing incremental backup of very large site
From dev: I checked and the .wgetrc file has continue=on. Is there any way to suppress the sending of the byte-range request? I will read through the email and see if I can gather some more information that may be needed. Remove continue=on from .wgetrc? Consider: -N, --timestamping  don't re-retrieve files unless newer than local. Steven M. Schweda [EMAIL PROTECTED] 382 South Warwick Street (+1) 651-699-9818 Saint Paul MN 55105-2547