Re: [Bug-wget] mirroring one sourceforge package?
    ftp://ftp.heanet.ie/mirrors/sourceforge/b/project/bi/biblatex-biber/biblatex-biber/current/

Thank you, thank you! That is perfect.

    I wonder if it's possible that that file is a redirection from a

Just FWIW, I also tried with --max-redirect=0 and --max-redirect=1, but they seemed to hang forever and/or have no effect, depending on exactly what options were specified. Not sure there is any bug there, just mentioning.

    Adding -R login.php seems a decent workaround

Indeed, I tried that and it worked better. Then I thought I would try to exclude the numerous stats items, but failed. I tried

  wget -m -np -nv -R login.php -X stats http://sourceforge.net/projects/biblatex-biber/files/biblatex-biber/current/
  wget -m -np -nv -R login.php,stats http://sourceforge.net/projects/biblatex-biber/files/biblatex-biber/current/
  wget -m -np -nv -R login.php,stats\* http://sourceforge.net/projects/biblatex-biber/files/biblatex-biber/current/

and none of them actually stopped url's like

  http://sourceforge.net/projects/biblatex-biber/files/biblatex-biber/current/binaries/stats/timeline

from showing up.

Thanks again for all the responses,
karl
Re: [Bug-wget] mirroring one sourceforge package?
On 03/31/2011 03:45 PM, Karl Berry wrote:
> Then I thought I would try to exclude the numerous stats items, but failed. I tried
>
>   wget -m -np -nv -R login.php -X stats http://sourceforge.net/projects/biblatex-biber/files/biblatex-biber/current/
>   wget -m -np -nv -R login.php,stats http://sourceforge.net/projects/biblatex-biber/files/biblatex-biber/current/
>   wget -m -np -nv -R login.php,stats\* http://sourceforge.net/projects/biblatex-biber/files/biblatex-biber/current/
>
> and none of them actually stopped url's like
>
>   http://sourceforge.net/projects/biblatex-biber/files/biblatex-biber/current/binaries/stats/timeline

Yeah... -X matches a full directory path, so for the above you have to use

  -X /projects/biblatex-biber/files/biblatex-biber/current/binaries/stats

If it can occur deeper down in the hierarchy, there's no help but to add more -X, replacing that final /stats with /*/stats, /*/*/stats, etc, until you feel like you've covered enough of them.

-R always matches only the filename portion of the URL (not including anything before the final slash, or anything after a ?).

There's currently no way to request a match against something anywhere in the URL (though this was planned to be addressed at some point, and may in fact already have something in the current dev sources, I don't know).

--
Micah J. Cowan
http://micah.cowan.name/
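The two matching rules described above can be sketched in C. This is only an illustration of the behaviour as stated in this message, not wget's actual code: the function names (dir_excluded, filename_portion, file_rejected) are made up for the example, and using POSIX fnmatch() with FNM_PATHNAME to model "* does not cross a slash" is an assumption.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of -X: the pattern must match the whole directory
   path. FNM_PATHNAME keeps '*' from matching across '/' (an assumption). */
int dir_excluded(const char *pattern, const char *dir)
{
    return fnmatch(pattern, dir, FNM_PATHNAME) == 0;
}

/* Copy the filename portion of a URL into buf: the text after the final
   slash that precedes any '?', and before that '?'. */
void filename_portion(const char *url, char *buf, size_t buflen)
{
    assert(buflen > 0);
    size_t end = strcspn(url, "?");   /* stop at the query string, if any */
    size_t start = 0;
    for (size_t i = 0; i < end; i++)
        if (url[i] == '/')
            start = i + 1;            /* remember position after last '/' */
    size_t n = end - start;
    if (n >= buflen)
        n = buflen - 1;               /* truncate rather than overflow */
    memcpy(buf, url + start, n);
    buf[n] = '\0';
}

/* Hypothetical model of -R: only the filename portion is compared. */
int file_rejected(const char *pattern, const char *url)
{
    char name[256];
    filename_portion(url, name, sizeof name);
    return fnmatch(pattern, name, 0) == 0;
}
```

Under this model, a bare "stats" pattern does not match the full directory path .../current/binaries/stats, and "stats*" is compared only against the filename "timeline", which is consistent with none of the three attempts above having any effect.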
[Bug-wget] mirroring one sourceforge package?
The bug (?) -- running

  wget -m -np -nv \
    http://sourceforge.net/projects/biblatex-biber/files/biblatex-biber/current/

ends up downloading many things above that directory, despite the -np. Doesn't that seem wrong? This is with wget 1.12 compiled from the original source.

The request: does anyone know a way to mirror one package (ideally a subdirectory thereof) from sourceforge? Specifically, I want

  http://sourceforge.net/projects/biblatex-biber/files/biblatex-biber/current/

(I don't actually care about using wget, could be anything. :)

I did a bunch of searching to no avail. My friends at CTAN have tried from time to time over the years but never found any really reliable method -- always ends up coming down to scraping the HTML :(.

One possible answer, Joining the sourceforge mirror network (http://sourceforge.net/apps/trac/sourceforge/wiki/Instructions%20for%20joining%20the%20SourceForge.net%20mirror%20network) is not what I want to do. I just want one package, not to help sf.

I suspect there is no good way, given sf's policies and setup, but thought I would ask here.

Thanks,
Karl
Re: [Bug-wget] mirroring one sourceforge package?
(03/30/2011 02:37 PM), Karl Berry wrote:
> The bug (?) -- running
>
>   wget -m -np -nv \
>     http://sourceforge.net/projects/biblatex-biber/files/biblatex-biber/current/
>
> ends up downloading many things above that directory, despite the -np. Doesn't that seem wrong? This is with wget 1.12 compiled from the original source.

Definitely a bug; reproduced with Ubuntu Lucid's wget 1.12.

Running with --debug, I see a lot of:

  Deciding whether to enqueue "http://sourceforge.net/blog/".
  Going to blog would escape projects/biblatex-biber/files/biblatex-biber/current with no_parent on.
  Decided NOT to load it.

And then:

  Deciding whether to enqueue "https://sourceforge.net/blog/".
  Allowing path blog/ because of rule `'.
  Decided to load it.

That link was apparently found in https://sourceforge.net/account/login.php

So it looks like wget is correctly blocking the http URL, but incorrectly permitting the https URL.

Adding -R login.php seems a decent workaround; I let it run awhile (not forever), and it seemed okay, though it did get a single link (so far) outside the expected hierarchy (once again, an https link; this time to a wiki page; the page fortunately appears not to have incurred other renegade links AFAICT).

--
HTH,
Micah J. Cowan
http://micah.cowan.name/
Re: [Bug-wget] mirroring one sourceforge package?
Thanks Tony.

I wonder if it's possible that that file is a redirection from a correct URL. Because wget would expect to download all URLs from a redirection, and would use the redirected name (but AIUI the current dev sources wouldn't use that name without --trust-server-name or something).

In any event, it seems pretty clear that something busted between 1.11.4 and 1.12.

-mjc

(03/30/2011 03:06 PM), Tony Lewis wrote:
> It works as I would expect in 1.11.4, with the exception of downloading this file:
>
>   sourceforge.net/projects/biblatex-biber/files/index.html
>
> Tony
Re: [Bug-wget] mirroring one sourceforge package?
Micah Cowan mi...@cowan.name writes:

> So it looks like wget is correctly blocking the http URL, but incorrectly permitting the https URL.

We check if the two schemes are similar but at the same time we require the port to be identical. I have relaxed this condition, now the two ports must be identical only in the case the same protocol is used.

I have pushed this patch:

=== modified file 'src/recur.c'
--- src/recur.c 2011-01-01 12:19:37 +0000
+++ src/recur.c 2011-03-30 23:36:05 +0000
@@ -563,7 +563,8 @@
   if (opt.no_parent
       && schemes_are_similar_p (u->scheme, start_url_parsed->scheme)
       && 0 == strcasecmp (u->host, start_url_parsed->host)
-      && u->port == start_url_parsed->port
+      && (u->scheme != start_url_parsed->scheme
+          || u->port == start_url_parsed->port)
       && !(opt.page_requisites && upos->link_inline_p))
     {
       if (!subdir_p (start_url_parsed->dir, u->dir))

Applying it and launching wget using the same arguments used by Karl, I get:

$ find sourceforge.net/ -maxdepth 3
sourceforge.net/
sourceforge.net/projects
sourceforge.net/projects/biblatex-biber
sourceforge.net/projects/biblatex-biber/files
sourceforge.net/robots.txt

Just in time before the release :-)

Cheers,
Giuseppe
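To see why the old condition let the https link through, here is a small stand-alone sketch of the before/after test. It is not the code from src/recur.c: the struct and function names (mini_url, no_parent_applies_old, no_parent_applies_new) are invented for the illustration, the page-requisites term is left out, and treating http and https as "similar" schemes is an assumption based on the debug output earlier in the thread.

```c
#include <strings.h>

/* Hypothetical, stripped-down stand-ins for wget's URL fields. */
enum mini_scheme { MINI_HTTP, MINI_HTTPS, MINI_FTP };

struct mini_url {
    enum mini_scheme scheme;
    const char *host;
    int port;
};

/* Assumption: http and https count as similar; anything else must be equal. */
int mini_schemes_similar(enum mini_scheme a, enum mini_scheme b)
{
    if (a == b)
        return 1;
    return (a == MINI_HTTP && b == MINI_HTTPS)
        || (a == MINI_HTTPS && b == MINI_HTTP);
}

/* Old condition: the no_parent directory check is only applied when the
   ports are identical, so https (443) vs http (80) skips the check. */
int no_parent_applies_old(const struct mini_url *u, const struct mini_url *start)
{
    return mini_schemes_similar(u->scheme, start->scheme)
        && strcasecmp(u->host, start->host) == 0
        && u->port == start->port;
}

/* Relaxed condition: ports must match only when the scheme is the same. */
int no_parent_applies_new(const struct mini_url *u, const struct mini_url *start)
{
    return mini_schemes_similar(u->scheme, start->scheme)
        && strcasecmp(u->host, start->host) == 0
        && (u->scheme != start->scheme || u->port == start->port);
}
```

With a start URL on http port 80 and a link on https port 443 to the same host, the old function returns 0 (the directory check is skipped and the link is allowed), while the relaxed one returns 1, so the subdirectory test is applied to the https link as well.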