On Wed, 2004-02-25 at 03:38, Mark R. Diggory wrote:
> >>3.) For Infrastructure, all this needs to be properly secured and 
> >>maintained according to Apache standards.
> > 
> > Yes.

> I need a lot of help here; if I'm not doing something up to par, I need 
> to hear about it. The help you've provided in this area is greatly 
> appreciated.

Note that we are notified of the changes in dist/ _after_ things have
happened there.  This means we can only try to explain what should
have been done if something isn't exactly how we envisioned it.  It
is almost impossible for Infrastructure to take a proactive stance
without becoming a substantial bottleneck.  I think the simplest
solution is to ask on infrastructure@ when unsure about something.

> > So basically, the entire .../dist/java-repository directory is not
> > being used at all.  And all the while we are pushing all this data
> > to all of our mirrors (~200).
> > 
> >   http://www.apache.org/mirrors/

> No, it is being used. All Apache projects that use Maven are being 
> instructed to publish to java-repository; its contents (plus others) are 
> aggregate mirrored on www.ibiblio.org/maven at this time. This way, we 
> control the vertical and horizontal when it comes to what is released 
> into Maven and what other projects can build on.

With 'being used' I meant clients downloading from it.  To justify the
bandwidth to the mirrors, each package needs to be downloaded at least
X times from a mirror.  I'm too lazy to come up with X right now ;).
Note that now that there are mostly symlinks, this point is pretty moot.
Keep up the symlinks work, since that costs next to nothing
bandwidth (and diskspace) wise.

> > That looks like a priority then.  It will actually make
> > all the mirroring worth it.  And it should help the user as well,
> > since closer hosts usually mean quicker downloads. 
> Yesssss, we need to make our tools take advantage of it.


> >>>>When it comes to things like the ibiblio maven repository, it would only 
> >>>>maintain full version releases of apache projects. 
> >>>
> >>>Can you explain why ibiblio is special here?  I mean, what you describe
> >>>is what is supposed to be on all the mirrors right?
> >>
> >>Just because it is the "default" repository used by the Maven Client.
> > 
> > 
> > I'd say the default should be www.apache.org, and from there it should
> > select the 'best' mirror.  Note that for any mirror use, and that
> > includes ibiblio, integrity checking is a must for an application
> > like Maven.
> > 
> It's a struggle because www.apache.org currently doesn't represent the 
> canonical contents for maven itself. I'm not sure how positive the Maven 
> group is about "distributed" retrieval, but it is something I see as 
> very important.

If it helps I can bring this up with the Infrastructure team.  My guess
is that the verdict will be that if the tool isn't using the content
on the mirrors, it should not be published on www.apache.org/dist.
Would that convince the rest of the Maven group?

> Can www.apache.org automatically reroute a download request to a 
> mirrored location?

It could, but we don't.  We let the user decide which mirror
to use (but default to a recommended one).  Take a look at:


To get an idea.  You would duplicate this functionality
in the Maven client code, maybe backed up by some
server-side tool to do the 'closeness' calculation
(I'm guessing this would be a server capability).

> If so, what is the service/tool/script that supports 
> this sort of functionality (suspect its some sort of Apache_mod or cgi 
> script)?

Read all about mirroring at:


> > The 'only mirroring that would be done' equals pushing everything
> > in there out to approximately 200 mirrors.  And then it isn't used.
> But we do want to reach a point where it is used via tools.

All I'm saying is that it would be nice to reach this point sooner
rather than later ;)

> I'd suspect that, if the redirection can be automated, then it's eventually
> going to be totally transparent to Maven as a client.

I think the client should be proactive, but that is just my opinion.  It
should be able to try a mirror, and if it finds it fails, try another.
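
As a rough illustration of the "try a mirror, and if it fails, try another"
behaviour, here is a minimal Python sketch.  It is not Maven's actual code:
the function name, the mirror list and the injectable `opener` parameter are
all assumptions made for the example, and a real client would also verify
the checksum of whatever it downloads.

```python
import urllib.error
import urllib.request


def fetch_with_fallback(mirrors, path, opener=urllib.request.urlopen, timeout=10):
    """Try each mirror in order; return (mirror, bytes) from the first that works.

    `mirrors` is an ordered list of base URLs (hypothetical values, e.g. a
    preferred mirror first and www.apache.org last).  `opener` is injectable
    so the logic can be exercised without network access.
    """
    errors = []
    for base in mirrors:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        try:
            with opener(url, timeout=timeout) as response:
                return base, response.read()
        except (urllib.error.URLError, OSError) as exc:
            # Remember why this mirror failed and move on to the next one.
            errors.append((url, exc))
    raise RuntimeError("all mirrors failed: %r" % (errors,))
```

The order of the list decides which mirror is preferred, so any 'closeness'
calculation could live outside this function and simply sort the list.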

[... rsync and httpd config comments ...]

> Wow, I'm very relieved now...I'll stop making such "brash" assumptions.

No problem.  We have the facts straight now and that's what matters.

> >>No, it is in the directory structure (no db) and md5's should exist next 
> >>to the files. There is a bug in maven caused by the fact that on BSD 
> >>checksums are generated by "md5", not "md5sum" like on Linux. This needs 
> >>to be addressed; for example, you see my md5 was bad on the math jar 
> >>(which I just fixed).
> > 
> > Does this mean you are running maven on minotaur?  Or was it that the
> > platform of the one who ran maven was BSD? 
> It's not like "Maven" is run on minotaur. Maven is run on a client; it 
> establishes an ssh session and performs the md5sum command on minotaur, 
> but the client side script that does this isn't configurable, and is 
> hardcoded to call Linux/GNU md5sum. I will submit a bug to Maven to make 
> it more configurable. Currently all md5 checksums generated using Maven 
> are broken because of this; I think others have recognized this and 
> generate them by hand on their own.

Hint: generate the md5 sums on the client side.  That way you won't be
dependent on a tool on the server side, and, more importantly, you will
be sure the md5 was created from the file _before_ transfer.
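
A minimal sketch of what client-side generation could look like, using
Python's standard hashlib so that neither "md5" nor "md5sum" has to exist on
the server.  The helper names and the ".md5" sidecar layout (hex digest as
the first token of the file) are assumptions for illustration, not what the
Maven plugin does.

```python
import hashlib
from pathlib import Path


def md5_hex(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_md5_sidecar(path):
    """Write '<path>.md5' next to the file, before anything is uploaded."""
    sidecar = Path(str(path) + ".md5")
    sidecar.write_text(md5_hex(path) + "\n")
    return sidecar


def verify_md5_sidecar(path):
    """Check a file against its sidecar; assumes the digest is the first token."""
    expected = Path(str(path) + ".md5").read_text().split()[0].lower()
    return md5_hex(path) == expected
```

The same verify step can be run by a client after downloading from a mirror,
which covers the integrity check mentioned earlier in the thread.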

> > May I also suggest PGP
> > signatures?  You could verify if the package is signed by a trusted
> > source, and if the package integrity is not compromised (an md5 is
> > easily replaced, a PGP sig is somewhat harder ;).
> Yes, I've been working on a signature plugin that basically uses GnuPG 

GnuPG?  Not something you'll find on minotaur.  Try PGP ;).

> on the server,

You should only do this for verification of the signature.

>  it's approached the same way as the md5 stuff, but with more 
> configurable parameters for the command and options to be called. The 
> challenge is that with GPG, you don't want to store the private key on 
> minotaur, so the file would need to be signed on the client side and then 
> published.

Of course.  You never ever want to store the private key on a machine
you don't control.  It would even be best to keep your private key on
offline storage; I mean, realistically, how often do you sign a release,
or sign other keys for that matter?  Worth the pain of using offline
storage, weighed against the integrity of what you signed.  But opinions
vary on this point :).
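
To make the split concrete: signing happens where the private key lives (the
client), and the server or a downloader only ever verifies.  The sketch
below just builds the command lines for a GnuPG-style tool on the client;
the helper names and the ".asc" file naming are assumptions, and a PGP
installation would need its own equivalent invocation.

```python
import subprocess


def sign_command(artifact, key_id=None):
    """Build the client-side command for an ASCII-armored detached signature."""
    cmd = ["gpg", "--armor", "--output", artifact + ".asc"]
    if key_id:
        # Pick a specific signing key; otherwise gpg uses its default key.
        cmd += ["--local-user", key_id]
    cmd += ["--detach-sign", artifact]
    return cmd


def verify_command(artifact):
    """Build the verification command; this needs only the public key."""
    return ["gpg", "--verify", artifact + ".asc", artifact]


def run(cmd):
    """Run a command built above and report whether it exited successfully."""
    return subprocess.run(cmd, check=False).returncode == 0
```

Only the artifact and its .asc file would then be published; the private key
never leaves the machine that ran the signing command.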

> > See http://www.apache.org/~henkp/sig/ for stats.
> > 
> > Sorry for the, I imagine, somewhat critical feedback.  FWIW I do
> > appreciate the project's goal.
> > 
> > Sander
> No, no, it's all very important to address.

Thanks, just making sure you don't get the wrong impression.

