Re: help installing opera on xo
On Feb 8, 2008 9:58 PM, Jacqui Lahr [EMAIL PROTECTED] wrote: hi. I've been trying to install opera on the OLPC XO with info from the wiki opera site, and I get messages to contact you. I've tried both codes(?), with the tarball and without. I have been using Macs since the 512, and in my 75-yr-old ignorance I thought I could just type it in and enter it... not!!! I was able to upgrade the build after several attempts, and after reading the glowing reports about opera I thought I could handle it... not!!! All the if-onlys are driving me crazier.. save me please.. thanks, Les Lahr I fail to understand what this has to do with GNU Wget.
Re: [PATCH] Reduce COW sections data by marking data constant
On Jan 31, 2008 8:21 PM, Diego 'Flameeyes' Pettenò [EMAIL PROTECTED] wrote: char *foo = "ab" - 4 + 3 = 9 bytes How did you get 9?
Re: Redirects across hosts
On Dec 12, 2007 1:46 PM, Micah Cowan [EMAIL PROTECTED] wrote: And, what do you think about enabling that option by default when recursive mode is on? Well, I think it's obvious that we need the option. But I don't think it should be enabled by default. By default, shouldn't we want to capture as much information as possible? IMO, wget should only be limited by arguments explicitly invoked by the user.
Re: Wget exit codes
On Dec 9, 2007 7:03 PM, Stuart Moore [EMAIL PROTECTED] wrote: Could the exit code used be determined by a flag? E.g. by default it uses the Unix convention, 0 for any success; with an --extended_error_codes flag or similar, it uses extra error codes depending on the type of success (but for sanity uses the same codes for failure with or without the flag). That should allow both of you to use it for scripting. I like this idea. Like Micah said, there should _NOT_ be any non-zero return value for a case in which the command was successful, even if it didn't download anything (if that's what we asked it to do, then it was successful). I think it would behoove us to have multiple return values for different *errors*, but not for different cases of success. I think this would be a very simple and helpful patch. (Well, "simple" may be an understatement, because we'd have to go through every possible point of failure to create a return value... meh.)
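A script consuming such extended codes would simply branch on $?. Here is a quick sketch; the numeric codes and their meanings are invented for illustration only, not anything wget actually emitted at the time of this thread:

```shell
#!/bin/sh
# Branch on a hypothetical set of extended wget exit codes.
# The numbering below is made up for illustration; the only real
# convention in wget at the time was 0 = success, nonzero = failure.
classify_exit() {
  case "$1" in
    0) echo "success" ;;
    1) echo "generic failure" ;;
    4) echo "network failure" ;;
    8) echo "server issued an error response" ;;
    *) echo "unknown ($1)" ;;
  esac
}

classify_exit 0   # -> success
classify_exit 8   # -> server issued an error response
```

A wrapper script could then retry only on network failures and give up immediately on server errors, which is exactly the kind of distinction a single catch-all exit code makes impossible.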
Re: Mirroring a site on the Internet Archive
On 12/7/07, Brian [EMAIL PROTECTED] wrote: For the life of me, I cannot convince wget to download an old copy of a website from the Internet Archive. I think the URL within a URL is somehow messing it up:

wget -e robots=off --base=http://web.archive.org/web/19990125085924/http://gnu.org/ -r -Gbase http://web.archive.org/web/19990125085924/http://gnu.org/

How can I get this to work? Cheers, Brian

Hey! We've seen this issue a lot. IIRC, the --base option does no good in this instance because the problem is actually a parsing error. I hacked around it a bit once, and I was able to make it download the files, but in an extremely funky directory hierarchy - so horrible that I couldn't even find the files I wanted. The fact of the matter is that wget (in its current state) cannot handle archive.org websites. Sorry :-( You're welcome to have a go at the code to see if you can figure it out, though :-)
Re: .1, .2 before suffix rather than after
On Nov 29, 2007 6:20 PM, David Ginger [EMAIL PROTECTED] wrote: So can I ask, is a wget2 actually being developed? Go ahead, but I'll answer that question before you do ;-) The answer is no - not at the moment. But we've been discussing it for several months. It will be a while before any code is actually written.
Re: Wget Name Suggestions
On 11/29/07, Micah Cowan [EMAIL PROTECTED] wrote: A new discussion page on the wiki: http://wget.addictivecode.org/Wget2Names (Does it sound a bit too much like something that extracts names from wget output? :) ) I really like the name `fetch` because it does what it says it does. It's more UNIX-like than the other names :-)
Re: wget2
On 11/29/07, Micah Cowan [EMAIL PROTECTED] wrote: - Alan has prior history on this list. Check the archives: yeah, I remember him. And is it just me, or does it seem that something's going to go down tonight with wget 2? ;-)
Re: .1, .2 before suffix rather than after
On 11/29/07, Micah Cowan [EMAIL PROTECTED] wrote: Yeah... of course they won't be able to edit the wiki that way. I doubt you'd get the slashdot effect from just the people who're interested in editing the wiki. You may get a handful of developers and a few thousand people who only want to read it :-)
Re: .1, .2 before suffix rather than after
On 11/29/07, Micah Cowan [EMAIL PROTECTED] wrote: Well, the trouble with that is that I'm running all of Wget's stuff (plus my own personal mail and whatnot) on a little VPS. I'm rather concerned that the traffic will kill me. I'm already worried about it potentially hitting SlashDot or Digg because it's the first Wget release in quite a while. D: Tada! http://en.wikipedia.org/wiki/Coral_Content_Distribution_Network There's also archive.org.
Re: wget2
On 11/29/07, Micah Cowan [EMAIL PROTECTED] wrote: Well don't look at _me_; I'm not the one who brought it up! ;) heh. I wasn't looking for some grand unveiling. It just seems to be attracting a lot of attention, and we should probably start putting more effort into it. I'm going to re-read some of the current Wget code tonight and start playing around with my own attempts at a wget2. I think we should simplify the name for this release to something like the `fetch` command (which is available, btw ;-).
Re: .1, .2 before suffix rather than after
On 11/29/07, Micah Cowan [EMAIL PROTECTED] wrote: I dunno, man, I think our current wget2 roadmap goals are already pretty wild-and-crazy. ;) I agree. I think we should create an announcement asking for developers to help and submit it to digg and slashdot. The new features may get some excitement going and start rumors. :-P ^^ in all seriousness ^^
Re: wget2
On 11/29/07, Alan Thomas [EMAIL PROTECTED] wrote: Sorry for the misunderstanding. Honestly, Java would be a great language for what wget does. Lots of built-in support for web stuff. However, I was kidding about that. wget has a ton of great functionality, and I am a reformed C/C++ programmer (or a recent Java convert). But I love using wget! I vote we stick with C. Java is slower and more prone to environmental problems. Wget needs to be as independent as we can possibly make it. A lot of the systems that wget is used on (including mine) do not even have Java installed. That would be a HUGE requirement for many people.
Re: .1, .2 before suffix rather than after
On 11/4/07, Micah Cowan [EMAIL PROTECTED] wrote: Christian Roche has submitted a revised version of a patch to modify the unique-name-finding algorithm to generate names in the pattern foo-n.html rather than foo.html.n. The patch looks good, and will likely go in very soon. That's something I had meant to submit a bug report for a while back, but somehow never found the time to do it. I guess it wasn't my top priority, since GNU/Linux is usually smart enough to ignore file extensions anyway. A couple of minor detail questions: what do you guys think about using foo.n.html instead of foo-n.html? And (this one to Gisle), how would this naming convention affect DOS (and, BTW, how does the current one hold up on DOS)? Well, this problem is mainly for win32 users, so I think we need to keep sloppy coding in mind. It's been my experience that *many* win32 programs will treat everything after the first period as the file extension. Honestly, I don't see any reason to risk the annoyance of these kinds of bugs. Just go with the dash. (On a side note, have you thought of running FreeDOS in a virtual machine?)
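For what it's worth, the dash scheme under discussion is easy to model. Here's a rough shell sketch of the proposed foo-n.html renaming (my own approximation of the behavior, not Christian's actual patch):

```shell
#!/bin/sh
# Generate a unique name in the proposed foo-n.html style: the counter
# goes before the final suffix instead of being appended after it.
unique_name() {
  file=$1 n=$2
  base=${file%.*}      # "foo.html" -> "foo"
  ext=${file##*.}      # "foo.html" -> "html"
  if [ "$base" = "$file" ]; then
    # No suffix at all; fall back to appending the counter.
    echo "$file-$n"
  else
    echo "$base-$n.$ext"
  fi
}

unique_name foo.html 1    # -> foo-1.html
unique_name archive 2     # -> archive-2
```

Note that only the last dot is treated as the suffix separator here; a name like a.b.c becomes a.b-1.c, which matches the "GNU/Linux doesn't care, win32 looks at the first period" concern either way.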
Re: .1, .2 before suffix rather than after
On 11/4/07, Hrvoje Niksic [EMAIL PROTECTED] wrote: It just occurred to me that this change breaks backward compatibility. It will break scripts that try to clean up after Wget or that in any way depend on the current naming scheme. You mean the scripts that fix the same problem this patch does? ;-)
Re: Thoughts on Wget 1.x, 2.0 (*LONG!*)
On 10/26/07, Micah Cowan [EMAIL PROTECTED] wrote: And, of course, when I say there would be two Wgets, what I really mean by that is that the more exotic-featured one would be something else entirely than a Wget, and would have a separate name. I think the idea of having two Wgets is good. I too have been concerned about the resources required in creating the all-out version 2.0. The current code for Wget is a bit mangled, but I think the basic concepts surrounding it are very good ones. Although the code might suck for those trying to read it, I think it could be great with a little regular maintenance. There still remains the question, though, of whether version 2 will require a complete rewrite. Considering how fundamental these changes are, I don't think we would have much of a choice. You mentioned that they could share code for recursion, but I don't see how. IIRC, the code for recursion in the current version is very dependent on the current methods of operation. It would probably have to be rewritten to be shared. As for libcurl, I see no reason why not. Also, would these be two separate GNU projects? Would they be packaged in the same source code, like finch and pidgin? I do believe the next question at hand is what version 2's official mascot will be. I propose Lenny the tortoise ;)

   _      .. Lenny
- (_\/ \_,
   'uuuu~'
Re: subscribing from this list
On 10/15/07, patrick robinson [EMAIL PROTECTED] wrote: Hello, I want to unsubscribe from this list but lost my registration e-mail. How is this performed? You can find this (and other information) on the Wget wiki: http://wget.addictivecode.org/ To unsubscribe from a list, send an email to [EMAIL PROTECTED] For more information on list commands, send an email to [EMAIL PROTECTED]
Re: subscribing from this list
On 10/15/07, Micah Cowan [EMAIL PROTECTED] wrote: Note that this doesn't help him much if he's lost his registration e-mail. Patrick, you'll probably have to go bug the staff at www.dotsrc.org, who hosts this list; send an email to [EMAIL PROTECTED] E-mail *address* or just the e-mail? I don't see how having the e-mail is important.
Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]
On 10/13/07, Tony Godshall [EMAIL PROTECTED] wrote: OK, so let's go back to basics for a moment. wget's default behavior is to use all available bandwidth. Is this the right thing to do? Or is it better to back off a little after a bit? Tony IMO, this should be handled by the operating system, not by individual applications. That's one of the reasons I believe this should be a module instead: it's more or less a hack to patch what the environment should be doing for wget, not vice versa. In my experience, GNU/Linux tends to consume all available resources without bias, seemingly on a first-come, first-served *until you're done* basis. This should be brought to the attention of the LKML. However, other operating systems do not seem to have this problem as much. Even Windows networks seem to prioritise packets. This is a problem I've been having major headaches with lately. It would be nice if wget had a patch for this problem, but that would not solve the problem of my web browser or sftp client consuming all the network resources.
Re: wget default behavior [was Re: working on patch to limit to percent of bandwidth]
On 10/13/07, Tony Godshall [EMAIL PROTECTED] wrote: Well, you may have such problems but you are very much reaching in thinking that my --linux-percent has anything to do with any failing in linux. It's about dealing with unfair upstream switches, which, I'm quite sure, were not running Linux. Let's not hijack this into a linux-bash. I really don't know what you were trying to say here. I use GNU/Linux.
Re: WGET Negative Counter Glitch
On 10/13/07, Micah Cowan [EMAIL PROTECTED] wrote: Hi Joshua, There is a very strong likelihood that this has been fixed in the current development version of Wget. Could you try with that? If you're a Windows user, you can get a binary from http://www.christopherlewis.com/WGet/WGetFiles.htm; otherwise, you'd need to compile from the repository source: http://wget.addictivecode.org/RepositoryAccess I believe you're right. IIRC, this issue was closed on Savannah a couple of months ago. I'd find the ticket number, but I don't have time ATM. Good luck, Joshua! :-)
Re: working on patch to limit to percent of bandwidth
On 10/12/07, Tony Godshall [EMAIL PROTECTED] wrote: Again, I do not claim to be unobtrusive. Merely to reduce obtrusiveness. I do not and cannot claim to be making wget *nice*, just nicER. You can't deny that dialing back is nicer than not. Personally, I think this is a great idea. But I do agree that the documentation is a bit messy right now (as well as the code). If this doesn't make it into the current trunk, I think it'd make a great module in version 2.
Re: working on patch to limit to percent of bandwidth
On 10/12/07, Hrvoje Niksic [EMAIL PROTECTED] wrote: Personally I don't see the value in attempting to find out the available bandwidth automatically. It seems too error prone, no matter how much heuristics you add into it. --limit-rate works because reading the data more slowly causes it to (eventually) also be sent more slowly. --limit-percentage is impossible to define in precise terms, there's just too much guessing. Yeah, that is a good point. Hence, I vote for it to become a module.
Re: working on patch to limit to percent of bandwidth
On 10/8/07, A. P. Godshall [EMAIL PROTECTED] wrote: Anyhow, does this seem like something others of you could use? Should I submit the patch to the submit list or should I post it here for people to hash out any parameterization niceties etc first? Go ahead and send it on here so we can comment on the code :-)
Re: bug in escaped filename calculation?
On 10/4/07, Brian Keck [EMAIL PROTECTED] wrote: I would have sent a fix too, but after finding my way through http.c and retr.c I got lost in url.c. You and me both. A lot of the code needs to be rewritten... there's a lot of spaghetti code in there. I hope Micah chooses to do a complete rewrite for version 2 so I can get my hands dirty and understand the code better.
Re: Wrong log output for wget -c
On 9/9/07, Jochen Roderburg [EMAIL PROTECTED] wrote: Hi, This is now an easy case for a change ;-) In the log output for wget -c we have the line: "The sizes do not match (local 0) -- retrieving." This always shows 0 as the local size in the current svn version. The variable which is printed here is local_size, which is initialized to 0 and used nowhere else. I think this variable was just forgotten in a recent code reorganization. Comparing an old version with the current one, I think the information is now in hstat.orig_file_size. I attach my little patch for this. I have also seen another, much more complicated and rare log output problem with restarted requests, but so far I have not been able to reconstruct a real-life example of it. It happens when, on multiple retries, the Range request is not honoured by the server and the transfer starts again at byte 0. It looked like not all variables for the display of the progress bar are correctly adjusted in this situation. I'll keep on trying ;-) Hi! Thanks for your contribution. I just looked over your patch and it looks good. I've committed the changes to: svn://[EMAIL PROTECTED]/wget/branches/bugs/b21057 After Micah (the maintainer) inspects it, it should go right into the trunk. Thanks!
Re: forum download, cookies?
On 9/12/07, Juhana Sadeharju [EMAIL PROTECTED] wrote: A forum has topics which are available only to members. How can I use wget to download a copy of the pages in that case? How do I get the proper cookies, and how do I get wget to use them correctly? I use IE on PC/Windows and wget on a unix computer. I could use Lynx on the unix computer if needed. (PC/Windows has Firefox, but I cannot install anything new. If Firefox has a downloader plugin suitable for forum downloading, that would be ok.) Juhana Firefox stores a cookies.txt file in the profile directory. In Windows, I believe this is located at C:\Documents and Settings\{username}\Application Data\Mozilla\Firefox\Profiles\{profile}\cookies.txt. GNU Wget is compatible with this cookies file. Just use the `--load-cookies file` option.
Re: Abort trap
On 9/11/07, Hex Star [EMAIL PROTECTED] wrote: When I try to execute the command (minus quotes) wget -P ftp.usask.ca -r -np -passive-ftp ftp://ftp.usask.ca/pub/mirrors/apple/ wget works for a bit and then terminates with the following error: xmalloc.c:186: failed assertion `ptr != NULL' Abort trap What causes this error? What does this error mean? Is this due to a server misconfiguration? Thanks! :) P.S. I am not subscribed to this list, please cc all replies to me... thanks! :) "failed assertion" means that somewhere along the line, a variable's value was not what it should have been. I'll check into it. Thanks!
Re: Wget automatic download from RSS feeds
On 9/12/07, Erik Bolstad [EMAIL PROTECTED] wrote: Hi! I'm doing a master's thesis on online news at the University of Oslo, and need software that can download html pages based on RSS feeds. I suspect that Wget could be modified to do this. - Do you know if there are any ways to get Wget to read RSS files and download new files every hour or so? - If not: have you heard of software that can do this? I am very grateful for all help and tips. Wget does not do this. That would be a great feature, but I don't believe parsing the RSS feed is Wget's job. Wget just fetches the files. I recommend you look for a program that simply parses the RSS feed and dumps the URLs to a file for Wget to fetch. Piping.. that's what UNIX is all about ;-) I don't have any recommendations, unfortunately. If you aren't able to find one, let me know, and I'll try to come up with one. Josh
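To make the piping idea concrete, here is a rough sketch that scrapes <link> URLs out of a feed with sed and hands the list to wget. The sample feed is made up, and a real setup would be better served by a proper XML parser, since real-world feeds don't guarantee one tag per line:

```shell
#!/bin/sh
# Extract <link> URLs from an RSS file into a list wget can consume.
# Naive regex-based sketch; assumes each <link> sits on its own line.
cat > feed.xml <<'EOF'
<rss><channel>
<item><link>http://example.com/story1.html</link></item>
<item><link>http://example.com/story2.html</link></item>
</channel></rss>
EOF

sed -n 's|.*<link>\(.*\)</link>.*|\1|p' feed.xml > urls.txt
cat urls.txt
# An hourly cron job could then run: wget -i urls.txt
```

The division of labor is the point: the feed parser produces a plain URL list, and wget's existing -i option does the fetching, so neither tool grows a feature it doesn't need.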
Re: Abort trap
On 9/13/07, Hex Star [EMAIL PROTECTED] wrote: wget 1.9+cvs-dev Try it in either the latest release or (preferably) the subversion trunk and let us know if you still have the same problem. The version you're using is an old trunk version, so we can safely assume that it has plenty of fixed bugs anyways.
Re: Announcing... The Wget Wgiki!
On 9/7/07, Micah Cowan [EMAIL PROTECTED] wrote: Doh! Of course, it's .org. Fortunately all the other links, including the ones from the site at gnu.org, seem to be correct. Unfortunately for you, your typo is now an official piece of free software history! :D Just poking. :-P
Re: wget syntax problem ?
On 9/6/07, Alan Thomas [EMAIL PROTECTED] wrote: I know this is probably something simple I screwed up, but the following commands in a Windows batch file return the error "Bad command or file name" for the wget command:

cd ..
wget --convert-links --directory-prefix=C:\WINDOWS\Profiles\Alan000\Desktop\wget\CNN\ --no-clobber "http://www.cnn.com"

Don't use backslashes in filenames. If you do, use `\\` instead.
Re: wget syntax problem ?
On 9/6/07, Micah Cowan [EMAIL PROTECTED] wrote: Not really; we've been Cc'ing you. I don't think we knew whether you were subscribed or not, and so Cc'd you in case you weren't. Also, many of us just habitually hit Reply All to reply to the message, so we don't accidentally send it to the message's author only. :) aye. Gmail doesn't have that problem, though. If it finds a duplicate message from a mailing list, it only shows me the one from the list. Kind of nice.
Re: wget ignores --user and --password if you have a .netrc
On 9/3/07, Andreas Kohlbach [EMAIL PROTECTED] wrote: Hi, though the man page of wget mentions .netrc, I assume this is a bug. To my understanding, if you provide a --user=user and --password=password at the command line, this should override any setting elsewhere, such as in the .netrc. It doesn't. And it took me quite some time and bothering other guys to realise that wget seems to be ignoring --user and --password at the command line if a .netrc exists with matching content.

Indeed. Whether this is a bug or not needs some discussion, I think, but here's a patch to fix your problem.

Index: src/netrc.c
===================================================================
--- src/netrc.c (revision 2376)
+++ src/netrc.c (working copy)
@@ -59,6 +59,7 @@
 search_netrc (const char *host, const char **acc, const char **passwd,
               int slack_default)
 {
+  if (strlen(opt.user) && strlen(opt.passwd)) return;
   acc_t *l;
   static int processed_netrc;
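The one-liner is just enforcing precedence: if credentials were given on the command line, skip the .netrc lookup entirely. The same rule modeled in shell (pick_cred is a hypothetical helper for illustration, not anything in wget):

```shell
#!/bin/sh
# Credential precedence: a non-empty command-line value wins;
# otherwise fall back to whatever .netrc supplied.
pick_cred() {
  cmdline=$1 netrc=$2
  if [ -n "$cmdline" ]; then
    echo "$cmdline"
  else
    echo "$netrc"
  fi
}

pick_cred "cliuser" "netrcuser"   # -> cliuser
pick_cred ""        "netrcuser"   # -> netrcuser
```

This is the behavior users generally expect from any program: explicit command-line arguments beat configuration files.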
Re: Fix for Warning C4142 in windows
On 9/2/07, Christopher G. Lewis [EMAIL PROTECTED] wrote: Warning_C4142_Fix.diff Windows added support for intptr_t and uintptr_t with Visual Studio 2003 (MSVER 1310). This patch removes 60+ warnings from the MSWindows build. Holy crap, that's a lot of warnings for such a small patch. Thanks!
Re: I can download with a browser, but not with wget
On 8/23/07, Micah Cowan [EMAIL PROTECTED] wrote: --user-agent Mozilla does the trick. Apparently Intel's website does not like wget. :) Stinky buzzards. What did we ever do to them?
Re: url.c (in_url_list_p): why bool verbose?
On 8/22/07, Josh Williams [EMAIL PROTECTED] wrote: In src/url.c, function in_url_list_p, there is an argument called bool verbose, but it is never used. Furthermore, the verbose option is defined in our options struct. Should this argument be removed? Below is a patch of this change.

Index: src/spider.c
===================================================================
--- src/spider.c	(revision 2336)
+++ src/spider.c	(working copy)
@@ -67,7 +67,7 @@
 };
 
 static bool
-in_url_list_p (const struct url_list *list, const char *url, bool verbose)
+in_url_list_p (const struct url_list *list, const char *url)
 {
   const struct url_list *ptr;
 
@@ -100,7 +100,7 @@
       list->url = referrer ? xstrdup (referrer) : NULL;
       hash_table_put (visited_urls_hash, xstrdup (url), list);
     }
-  else if (referrer && !in_url_list_p (list, referrer, false))
+  else if (referrer && !in_url_list_p (list, referrer))
     {
       /* Append referrer at the end of the list */
       struct url_list *newnode;
Re: url.c (in_url_list_p): why bool verbose?
On 8/22/07, Micah Cowan [EMAIL PROTECTED] wrote: This looks like very reasonable, Josh. Feel free to check this change directly into the trunk (with a note in src/ChangeLog). That I will, when I get home tonight. The stupid network at the college is blocking subversion. I'm going to have to come up with some sort of proxy or something, because this is really bugging the bejebers out of me. Do you want this in the main trunk?
Re: -R and HTML files
On 8/22/07, Micah Cowan [EMAIL PROTECTED] wrote: What would be the appropriate behavior of -R then? I think the default option should be to download the html files to parse the links, but it should discard them afterwards if they do not match the acceptance list. But, as you stated, I believe that the user _should_ be given the choice.
Re: --spider requires --recursive
On 8/18/07, Micah Cowan [EMAIL PROTECTED] wrote: I'm not convinced. To me, the name spider implies recursion, and it's counter-intuitive for it not to. As to wasted functionality, what's wrong with -O /dev/null (or NUL or whatever) for simply checking existence? I see his point. The difference is that the --spider option will only look for broken links on a given page, such as the bookmarks.html example. If we were to force recursion, it would fan out across the different pages. Perhaps we only want to check the links on _that_one_ page. Recursion wouldn't be helpful in that instance. But it could be argued that you could just set the recursion level to zero (or is it one?) to prevent that behavior.
--spider requires --recursive
Is there any particular reason the --spider option requires --recursive? As it is now, we run into the following error if we omit --recursive:

[EMAIL PROTECTED]:~/cprojects/wget/src$ ./wget http://www.google.com --spider
Spider mode enabled. Check if remote file exists.
--00:37:21--  http://www.google.com/
Resolving www.google.com... 209.85.165.147, 209.85.165.104, 209.85.165.99, ...
Connecting to www.google.com|209.85.165.147|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 0 [text/html]
Remote file exists but recursion is disabled -- not retrieving.
[EMAIL PROTECTED]:~/cprojects/wget/src$

The only explanation I can think of is that it serves the purpose of checking whether a file exists. Should --spider imply --recursive?
Re: Manual and --help difference
On 8/2/07, dmitry over [EMAIL PROTECTED] wrote: Hi, In `man wget` I see the text:

--http-user=user
--http-password=password
[..]

but in `wget --help` I see:

  --http-user=USER        set http user to USER.
  --http-passwd=PASS      set http password to PASS.

Check --http-passwd against --http-password and fix it, please. What version of wget are you using? I don't see this problem in 1.10.2 _or_ in the trunk.
Re: patch: prompt for password
On 7/25/07, Matthew Woehlke [EMAIL PROTECTED] wrote: Any reason you're not replying to the list? (Unless there is, please direct replies to the list.) No, I was in a hurry at the time and forgot to change the e-mail address before I sent it. I personally *must have* this patch; storing my login password in a file is not acceptable ;-), and I have a script that needs to use wget. Said script is already interactive, so asking for a password is not an issue in this context. Micah has already stated that he intends to add this functionality eventually, but if he didn't I would be forced to perpetually maintain this patch on my own, or else... use something similar to wget that is not wget :-). We've been discussing optional user interaction a lot lately, but we haven't decided how to go about it yet. Even so, your patch may well come in handy. If you're interested, we could use someone to help develop an *optional* user interface (browse the list archives if you need some ideas).
Re: ignoring robots.txt
On 7/18/07, Maciej W. Rozycki [EMAIL PROTECTED] wrote: There is no particular reason, so we do. As far as I can tell, there's nothing in the man page about it.
Re: Why --exclude-directories, and not --exclude-paths?
On 7/17/07, Hrvoje Niksic [EMAIL PROTECTED] wrote: -R allows excluding files. If you use a wildcard character in -R, it will treat it as a pattern and match it against the entire file name. If not, it will treat it as a suffix (not really an extension, it doesn't care about . being there or not). -X always excludes directories and allows wildcards. It was supposed to be a DWIM thing. I wrote a patch to add an option --exclude-files. (That was me he was talking about, btw.) It may not be *technically* necessary at this point since -R allows it, but this is more precise, and its job is more clearly defined. I haven't committed it to the svn yet, but you can see it here: https://savannah.gnu.org/bugs/?20454
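Hrvoje's description of -R's dual behavior can be modeled in a few lines: a pattern containing a wildcard is glob-matched against the whole file name, while anything else is treated as a suffix. A shell sketch of that DWIM rule (my approximation, not wget's actual matching code):

```shell
#!/bin/sh
# Model of -R/-A matching: wildcard patterns glob-match the whole
# file name; plain strings are treated as suffixes.
matches() {
  name=$1 pattern=$2
  case $pattern in
    *[*?[]*)                 # pattern contains a wildcard: glob-match
      case $name in
        $pattern) return 0 ;;
        *)        return 1 ;;
      esac ;;
    *)                       # no wildcard: plain suffix match
      case $name in
        *"$pattern") return 0 ;;
        *)           return 1 ;;
      esac ;;
  esac
}

matches photo.jpg '*.jpg' && echo "wildcard hit"
matches photo.jpg jpg     && echo "suffix hit"
```

Note the suffix branch doesn't care whether a dot is present, which matches the description above: "jpg" and ".jpg" both reject photo.jpg.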
Re: Maximum 20 Redirections HELP!!!
On 7/16/07, Jaymz Goktug YUKSEL [EMAIL PROTECTED] wrote: Hello everyone, Is there a command to override the maximum redirections? Attached is a patch for this problem. Let me know if you have any problems with it. It was written for the latest trunk in the svn, so you *may* have to compile an unstable release. You can browse the source (with this patch) at http://addictivecode.org/svn/wget/branches/bugs/b20499/. To download it, run:

svn co svn://addictivecode.org/wget/branches/bugs/b20499 wget-maxredirect

To compile, run:

./autogen.sh
./configure
make

As this is an unstable release, you probably do not wish to install it, but rather to run it from ./src/wget with the --max-redirect option.

Index: src/options.h
===================================================================
--- src/options.h	(revision 2280)
+++ src/options.h	(working copy)
@@ -38,6 +38,8 @@
   bool recursive;		/* Are we recursive? */
   bool spanhost;		/* Do we span across hosts in recursion? */
+  int maxredirect;		/* Maximum number of times we'll allow
+				   a page to redirect. */
   bool relative_only;		/* Follow only relative links. */
   bool no_parent;		/* Restrict access to the parent directory. */

Index: src/init.c
===================================================================
--- src/init.c	(revision 2280)
+++ src/init.c	(working copy)
@@ -182,6 +182,7 @@
   { "loadcookies",	&opt.cookies_input,	cmd_file },
   { "logfile",		&opt.lfilename,		cmd_file },
   { "login",		&opt.ftp_user,		cmd_string },/* deprecated*/
+  { "maxredirect",	&opt.maxredirect,	cmd_number_inf },
   { "mirror",		NULL,			cmd_spec_mirror },
   { "netrc",		&opt.netrc,		cmd_boolean },
   { "noclobber",	&opt.noclobber,		cmd_boolean },

Index: src/retr.c
===================================================================
--- src/retr.c	(revision 2280)
+++ src/retr.c	(working copy)
@@ -567,13 +567,7 @@
   return dlrate;
 }
 
-/* Maximum number of allowed redirections.  20 was chosen as a
-   reasonable value, which is low enough to not cause havoc, yet
-   high enough to guarantee that normal retrievals will not be hurt by
-   the check.  */
-#define MAX_REDIRECTIONS 20
-
 #define SUSPEND_POST_DATA do {                  \
   post_data_suspended = true;                   \
   saved_post_data = opt.post_data;              \
@@ -746,10 +740,10 @@
       mynewloc = xstrdup (newloc_parsed->url);
 
       /* Check for max. number of redirections.  */
-      if (++redirection_count > MAX_REDIRECTIONS)
+      if (++redirection_count > opt.maxredirect)
         {
           logprintf (LOG_NOTQUIET, _("%d redirections exceeded.\n"),
-                     MAX_REDIRECTIONS);
+                     opt.maxredirect);
           url_free (newloc_parsed);
           url_free (u);
           xfree (url);

Index: src/main.c
===================================================================
--- src/main.c	(revision 2280)
+++ src/main.c	(working copy)
@@ -189,6 +189,7 @@
     { "level", 'l', OPT_VALUE, "reclevel", -1 },
     { "limit-rate", 0, OPT_VALUE, "limitrate", -1 },
     { "load-cookies", 0, OPT_VALUE, "loadcookies", -1 },
+    { "max-redirect", 0, OPT_VALUE, "maxredirect", -1 },
     { "mirror", 'm', OPT_BOOLEAN, "mirror", -1 },
     { "no", 'n', OPT__NO, NULL, required_argument },
     { "no-clobber", 0, OPT_BOOLEAN, "noclobber", -1 },
@@ -497,6 +498,8 @@
     N_("\
        --header=STRING             insert STRING among the headers.\n"),
     N_("\
+       --max-redirect              maximum redirections allowed per page.\n"),
+    N_("\
        --proxy-user=USER           set USER as proxy username.\n"),
     N_("\
        --proxy-password=PASS       set PASS as proxy password.\n"),

Index: ChangeLog
===================================================================
--- ChangeLog	(revision 2280)
+++ ChangeLog	(working copy)
@@ -1,3 +1,7 @@
+2007-07-16  Joshua David Williams  [EMAIL PROTECTED]
+
+	* Added new option --max-redirect
+
 2007-07-09  Micah Cowan  [EMAIL PROTECTED]
 
 	* README, util/wget.spec: Removed references to wget.sunsite.dk.
Re: -nd not working as I would expect.
On 7/16/07, Dax Mickelson [EMAIL PROTECTED] wrote: I've read the man page about 10 times now and I'm sure this issue is my own stupidity but I can't see where or how. [..] Thus I would expect to get a directory full of index.html.n files along with a bunch of .zip files! Alas, all I get is: You have quite a few unnecessary (and repetitive) options which I have omitted. There are too many to mention in detail, so please take note of these for future reference (and rtfm :-). I don't have time to walk you through it right now, unfortunately, but here's the command you need:

wget http://librivox.org/ --output-file logs --progress=dot --no-directories --recursive --level=100 -Aindex.html*,*zip* -Dlibrivox.org,archive.org,www.archive.org --span-hosts --follow-ftp
Re: -nd not working as I would expect.
On 7/16/07, Dax Mickelson [EMAIL PROTECTED] wrote: Thanks for the quick reply. I truly did RTFM (or at least RTF'Man'). Sorry for the dumb question and I knew it must be me but I just couldn't see it. I'm running the file now and it is looking good so far! Nah, it wasn't a dumb question. To be honest, it took me quite a while to get that one working. Cheers!
Re: Maximum 20 Redirections HELP!!!
On 7/16/07, Jaymz Goktug YUKSEL [EMAIL PROTECTED] wrote: Hey Josh, Thank you very much for that patch; this was what I was looking for. I think this is going to solve my problem! Thank you very much, and have a good one! Cordially, James You're welcome :-) Let me know how it turns out. The only testing I did on it was checking to make sure my code compiled; I haven't actually tried the option.
Re: Maximum 20 Redirections HELP!!!
On 7/17/07, Tony Lewis [EMAIL PROTECTED] wrote: Just forward the patch to [EMAIL PROTECTED] and let them test it. :-) Hmm. .org, maybe? Delivery to the following recipient failed permanently: [EMAIL PROTECTED] Technical details of permanent failure: PERM_FAILURE: DNS Error: Domain name not found
Re: bug and patch: blank spaces in filenames causes looping
On 7/15/07, Rich Cook [EMAIL PROTECTED] wrote: I think you may well be correct. I am now unable to reproduce the problem where the server does not recognize a filename unless I give it quotes. In fact, as you say, the server ONLY recognizes filenames WITHOUT quotes and quoting breaks it. I had to revert to the non- quoted code to get proper behavior. I am very confused now. I apologize profusely for wasting your time. How embarrassing! I'll save this email, and if I see the behavior again, I will provide you with the details you requested below. I wouldn't say it was a waste of time. Actually, I think it's good for us to know that this problem exists on some servers. We're considering writing a patch to recognise servers that do not support spaces. If the standard method fails, then it will retry as an escaped character. Nothing has been written for this yet, but it has been discussed, and may be implemented in the future.
--base does not consider references to root directory
Consider this example, which happens to be how I realised this problem: wget http://www.mxpx.com/ -r --base=. Here, I want the entire site to be downloaded with each link pointing to the local file. This works for some links, but it does not take references to the root directory into account, such as this: <a href="/index.php">Home</a> Here, wget just ignores the --base parameter and leaves the link as /index.php. I realise that this may seem like a sticky situation, but consider this solution: Let's say that I have a photo album on my personal homepage with the following directory scheme: / /photos/ /photos/hawaii /photos/concerts In /photos/concerts/index.html, I have a link to /index.html. When wget parses the html, it could then become: ../../index.html. All we need to know is how many directories deep we are. Would this be an acceptable solution? If so, I'd be glad to write a patch.
Re: --base does not consider references to root directory
On 7/14/07, Matthias Vill [EMAIL PROTECTED] wrote: So you would suggest handling it in the way that when I use wget --base=/some/serverdir http://server/serverdir/ /.* will be interpreted as /some/.*, so if you have a link like /serverdir/ it would go back to /some/serverdir, right? Correct. I guess this would be okay. Just one question: if there is a link back to /serverdir/ and base is something like /my/dir/, shouldn't this also be fetched from inside /my/dir/ and not /my/serverdir/? Take a look at the directory structure: /my/dir /my/dir/www.foo.bar /my/dir/www.foo.bar/serverdir Suppose we have a link in /my/dir/www.foo.bar/serverdir like this: <a href="/jobs.php">Jobs</a> This link (if called locally) would try to fetch a file from the root directory of the operating system, not the website. It would probably get a 403 or a 404 error. What we would want it to look like is this: <a href="../jobs.php">Jobs</a> This method will work no matter what the --base parameter is.
Re: --base does not consider references to root directory
On 7/14/07, Matthias Vill [EMAIL PROTECTED] wrote: I think I got your point. Now I think this could result in different problems, like what should happen with wget -r --base=/home/matthias/tmp http://server/with/a/complicated/structure/and/to/many/dirs/a.php If you now have a link to /index.html, you would try to access some file above /, or am I wrong? In the case of http://server/with/a/complicated/structure/and/to/many/dirs/a.php, a link to /index.php would look like this: <a href="../../../../../../../../index.php">Home</a> (Assuming I counted it correctly.) It's just a matter of knowing how many directories deep we are so we know how many times to concatenate the ../
--delete-after and --spider should not create (and leave) directories
It has come to my attention that --delete-after and --spider leave empty directories when they have finished. IMHO, we should force --no-directories since we're not leaving any of the files we're downloading. I have submitted a patch here - https://savannah.gnu.org/bugs/index.php?20466 Do any of you have any objections to this change?