Re: [Bug-wget] mirroring one sourceforge package?

2011-03-31 Thread Karl Berry

ftp://ftp.heanet.ie/mirrors/sourceforge/b/project/bi/biblatex-biber/biblatex-biber/current/

Thank you, thank you!  That is perfect.

I wonder if it's possible that that file is a redirection from a

Just FWIW, I also tried with --max-redirect=0 and --max-redirect=1, but
they seemed to hang forever and/or have no effect, depending on exactly
what options were specified.  Not sure there is any bug there, just
mentioning.

Adding -R login.php seems a decent workaround

Indeed, I tried that and it worked better.  Then I thought I would try
to exclude the numerous stats items, but failed.  I tried

wget -m -np -nv -R login.php -X stats \
  http://sourceforge.net/projects/biblatex-biber/files/biblatex-biber/current/
wget -m -np -nv -R login.php,stats \
  http://sourceforge.net/projects/biblatex-biber/files/biblatex-biber/current/
wget -m -np -nv -R login.php,stats\* \
  http://sourceforge.net/projects/biblatex-biber/files/biblatex-biber/current/

and none of them actually stopped URLs like

http://sourceforge.net/projects/biblatex-biber/files/biblatex-biber/current/binaries/stats/timeline

from showing up.

Thanks again for all the responses,
karl



Re: [Bug-wget] mirroring one sourceforge package?

2011-03-31 Thread Micah Cowan
On 03/31/2011 03:45 PM, Karl Berry wrote:
> Then I thought I would try
> to exclude the numerous stats items, but failed.  I tried
>
> wget -m -np -nv -R login.php -X stats \
>   http://sourceforge.net/projects/biblatex-biber/files/biblatex-biber/current/
> wget -m -np -nv -R login.php,stats \
>   http://sourceforge.net/projects/biblatex-biber/files/biblatex-biber/current/
> wget -m -np -nv -R login.php,stats\* \
>   http://sourceforge.net/projects/biblatex-biber/files/biblatex-biber/current/
>
> and none of them actually stopped URLs like
>
> http://sourceforge.net/projects/biblatex-biber/files/biblatex-biber/current/binaries/stats/timeline

Yeah... -X matches a full directory path, so for the above you have to use

-X /projects/biblatex-biber/files/biblatex-biber/current/binaries/stats

If it can occur deeper down in the hierarchy, there's no help but to add
more -X, replacing that final /stats with /*/stats, /*/*/stats, etc,
until you feel like you've covered enough of them.
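
As a rough sketch of that approach, the snippet below builds such a
comma-separated -X list.  The helper name build_excludes and the depth
limit are made up for illustration; they are not part of wget, and how
wget treats each * in the pattern should be checked against the manual.

```shell
#!/bin/sh
# Hypothetical helper (not part of wget): print a comma-separated list
# of "stats" directories located 0..max_depth levels below a base path,
# suitable as the argument to -X.
build_excludes() {
    base=$1
    max_depth=$2
    list=
    mid=
    i=0
    while [ "$i" -le "$max_depth" ]; do
        # Append base/stats, base/*/stats, base/*/*/stats, ...
        list="${list:+$list,}$base$mid/stats"
        mid="$mid/*"
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}

# Example: print the list for two levels below current/.
build_excludes /projects/biblatex-biber/files/biblatex-biber/current 2
```

The result could then be passed along the lines of
wget -m -np -nv -R login.php -X "$(build_excludes ... 2)" URL,
quoting the list so the shell does not expand the wildcards itself.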

-R always matches only the filename portion of the URL (not including
anything before the final slash, or anything after a ?). There's
currently no way to request a match against something anywhere in the
URL (though this was planned to be addressed at some point, and may in
fact already have something in the current dev sources, I don't know).
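
To make that concrete, here is a small sketch of which part of a URL -R
gets to see, going only by the description above.  The function name
filename_part is invented for this example, and wget's real matching
code is not reproduced here.

```shell
#!/bin/sh
# Hypothetical helper: print the portion of a URL that -R patterns are
# compared against, per the description above (nothing before the final
# slash, nothing after a "?").
filename_part() {
    url=${1%%\?*}                 # drop the query string, if any
    printf '%s\n' "${url##*/}"    # keep only what follows the final slash
}

# The stats URL from this thread reduces to "timeline", so a pattern
# such as stats\* never gets a chance to match it.
filename_part 'http://sourceforge.net/projects/biblatex-biber/files/biblatex-biber/current/binaries/stats/timeline'
```

This is why -R login.php works (login.php is the final component of its
URL) while -R stats\* does not: "stats" is a directory in those URLs.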

-- 
Micah J. Cowan
http://micah.cowan.name/



[Bug-wget] mirroring one sourceforge package?

2011-03-30 Thread Karl Berry
The bug (?) -- running
  wget -m -np -nv \
  http://sourceforge.net/projects/biblatex-biber/files/biblatex-biber/current/
ends up downloading many things above that directory, despite the -np.
Doesn't that seem wrong?
This is with wget 1.12 compiled from the original source.

The request: does anyone know a way to mirror one package (ideally a
subdirectory thereof) from sourceforge?  Specifically, I want
http://sourceforge.net/projects/biblatex-biber/files/biblatex-biber/current/
(I don't actually care about using wget, could be anything. :)

I did a bunch of searching to no avail.  My friends at CTAN have tried
from time to time over the years but never found any really reliable
method -- always ends up coming down to scraping the HTML :(.

One possible answer, joining the SourceForge mirror network
(http://sourceforge.net/apps/trac/sourceforge/wiki/Instructions for
joining the SourceForge.net mirror network), is not what I want to do.  I
just want one package, not to help sf.

I suspect there is no good way, given sf's policies and setup, but
thought I would ask here.

Thanks,
Karl



Re: [Bug-wget] mirroring one sourceforge package?

2011-03-30 Thread Micah Cowan
(03/30/2011 02:37 PM), Karl Berry wrote:
> The bug (?) -- running
>   wget -m -np -nv \
>   http://sourceforge.net/projects/biblatex-biber/files/biblatex-biber/current/
> ends up downloading many things above that directory, despite the -np.
> Doesn't that seem wrong?
> This is with wget 1.12 compiled from the original source.

Definitely a bug; reproduced with Ubuntu Lucid's wget 1.12.

Running with --debug, I see a lot of:

Deciding whether to enqueue "http://sourceforge.net/blog/".
Going to "blog" would escape
"projects/biblatex-biber/files/biblatex-biber/current" with no_parent on.
Decided NOT to load it.

And then:

Deciding whether to enqueue "https://sourceforge.net/blog/".
Allowing path blog/ because of rule `'.
Decided to load it.

That link was apparently found in https://sourceforge.net/account/login.php

So it looks like wget is correctly blocking the http URL, but
incorrectly permitting the https URL.

Adding -R login.php seems a decent workaround; I let it run awhile (not
forever), and it seemed okay, though it did get a single link (so far)
outside the expected hierarchy (once again, an https link; this time to
a wiki page; the page fortunately appears not to have incurred other
renegade links AFAICT).

-- 
HTH,
Micah J. Cowan
http://micah.cowan.name/



Re: [Bug-wget] mirroring one sourceforge package?

2011-03-30 Thread Micah Cowan
Thanks Tony.

I wonder if it's possible that that file is a redirection from a
correct URL. Because wget would expect to download all URLs from a
redirection, and would use the redirected name (but AIUI the current dev
sources wouldn't use that name without --trust-server-name or something).

In any event, it seems pretty clear that something busted between 1.11.4
and 1.12.

-mjc

(03/30/2011 03:06 PM), Tony Lewis wrote:
> It works as I would expect in 1.11.4, with the exception of downloading this
> file:
> sourceforge.net/projects/biblatex-biber/files/index.html
>
> Tony



Re: [Bug-wget] mirroring one sourceforge package?

2011-03-30 Thread Giuseppe Scrivano
Micah Cowan <mi...@cowan.name> writes:

> So it looks like wget is correctly blocking the http URL, but
> incorrectly permitting the https URL.

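We check whether the two schemes are similar, but at the same time we
require the ports to be identical.  Since http and https use different
default ports (80 and 443), the -np check was never applied to https
links when the start URL was http.

I have relaxed this condition: the two ports must now be identical only
when the same scheme is used.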
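
For illustration only, the old and relaxed port conditions can be
modelled as two small shell functions.  The names np_applies_old and
np_applies_new are invented here; the sketch assumes the schemes are
already "similar" and the hosts are equal, and it is not the wget code.

```shell
#!/bin/sh
# Hypothetical model: each function succeeds (exit status 0) when the
# -np directory check would be applied to a link.
# Arguments: link scheme, link port, start scheme, start port.

# Old rule: ports must always be identical.
np_applies_old() {
    [ "$2" = "$4" ]
}

# Relaxed rule: ports must be identical only when the schemes are equal.
np_applies_new() {
    [ "$1" != "$3" ] || [ "$2" = "$4" ]
}

# An https link (port 443) found while mirroring an http start URL (port 80):
np_applies_old https 443 http 80 && echo "old: checked" || echo "old: skipped"
np_applies_new https 443 http 80 && echo "new: checked" || echo "new: skipped"
```

Under the old rule the check is skipped for that link, so it is never
tested against the start directory; under the relaxed rule it is.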

I have pushed this patch:

=== modified file 'src/recur.c'
--- src/recur.c 2011-01-01 12:19:37 +0000
+++ src/recur.c 2011-03-30 23:36:05 +0000
@@ -563,7 +563,8 @@
   if (opt.no_parent
       && schemes_are_similar_p (u->scheme, start_url_parsed->scheme)
       && 0 == strcasecmp (u->host, start_url_parsed->host)
-      && u->port == start_url_parsed->port
+      && (u->scheme != start_url_parsed->scheme
+          || u->port == start_url_parsed->port)
       && !(opt.page_requisites && upos->link_inline_p))
     {
       if (!subdir_p (start_url_parsed->dir, u->dir))

Applying it and launching wget using the same arguments used by Karl, I
get:

$ find sourceforge.net/ -maxdepth 3
sourceforge.net/
sourceforge.net/projects
sourceforge.net/projects/biblatex-biber
sourceforge.net/projects/biblatex-biber/files
sourceforge.net/robots.txt

Just in time before the release :-)

Cheers,
Giuseppe