Note that I was not subscribed, so I am breaking the thread with this post.

> From: "Mark R. Diggory" <[EMAIL PROTECTED]>
> Date: Sun, 22 Feb 2004 00:59:58 -0500

> I'll try to expand on the functionalities of Maven below.
> 
> Sander Striker wrote:
> 
> > On Sat, 2004-02-21 at 01:01, Mark R. Diggory wrote:
> > 
> >>Noel J. Bergman wrote:
> >>
> >>>>The issue is... the jars/distributables are placed into the
> >>>>java-repository using maven.
> > 
> > 
> > Can you explain this a bit?  I thought Maven was used to fetch
> > projects and dependencies.  Of course I can read up on Maven,
> > but a quick summary of the technicalities would be appreciated.
> > 
> 
> Maven is used to both fetch jars from the repository and to publish the 
> jars to the repository. In regards to the latter, it does this basically 
> through ssh sessions where it completes a number of commands (scp, md5, 
> chmod, chgrp). Because it's encapsulated within Maven, the user can rely 
> on Maven's deployment mechanism to set up the jar/signature in the 
> repository for their project; since it's scripted, it is done the same 
> way every time. This takes a great deal of the effort involved with 
> publishing jars to the repository out of the user's hands.
> 
> Maven is really doing nothing more than acting as an ssh client for the 
> user and automating the deployment process for them using their apache 
> account.
> 
> This benefits Maven because it can rely on the repository being 
> maintained in a structure it can predict and locate dependencies within.

I understand.  But it has to be fairly mature before one can
deploy it and recommend it to the PMCs.  Also, you can't
force all the projects to use it.  For those that don't, you
need some way of hand-maintaining 'shadow' packages in
java-repository, correct?
 
[...]
> > That would mean that this entire area would have to be rw to all
> > groups producing releases that are to be in there.  This kind of means
> > apcvs group ownership, which I don't really fancy doing.  The other
> > way around, control and access of each project's dist/ area separated,
> > and symlinking to that from java-repository, seems a bit sa[fn]er to
> > me.
> 
> Ultimately we are seeking a convergence here between what the repository 
> folks want to see, the maven users want to see and the infrastructure 
> folks want to see.
> 
> 1.) For the repository (and Maven) folks, we want to see the contents of 
> dist become standardized according to the Repository URI specification. 
> This means "all" distributables (java or not) are organized according to 
> this specification.

But this is fairly utopian at this point, no?  Is the Repository URI spec
stable?  Is the tool mature?

> 2.) For Maven users, no matter what happens, we need to maintain a 
> functionally working repository that works with the existing version of 
> Maven.

Isn't the repository format versioned?  Can Maven advise the user
to upgrade?

> 3.) For Infrastructure, all this needs to be properly secured and 
> maintained according to Apache standards.

Yes.

> The java-repository structure is broken down into
> 
> .../java-repository/<project>/<jars|distributables|...>/<foo-version.ext>
> 
> this would mean each project would need to maintain a separate set of 
> symlinks for "jars", "distributables", "...".

I'm assuming you are stating this as fact, correct?
Given a 'regular' release in .../dist/<project>/, would
Maven be able to automate the creation of the symlinks?
> 
> >>>That sounds OK to me, but folks like Sander and others more involved in
> >>>mirroring should be put in the loop.  Everything we put under dist/ affects
> >>>100s of mirrors.
> > 
> > 
> > Not me specifically, but Infrastructure.  Others are more actively
> > maintaining the mirrors list and monitoring the mirrors.  The mirrors
> > are a precious resource and we want to be careful not to 'scare' any
> > mirrors away with actions on our end.
> > 
> > 
> >>Yes, I learned that the hard way when we created the contents of 
> >>java-repository... that was not a happy weekend. I don't make any "rash" 
> >>changes to dist any more...Only well thought out moves. But we are in a 
> >>state of cleanup now as well, we have to consider what we are going to 
> >>do next.
> > 
> > 
> > If you are making large changes to the directory structure and the
> > majority of the files is already on the mirrors, send a mail to
> > mirrors@, attach a shell script that moves everything around locally,
> > and give them a heads up on when this shuffle is happening.  This
> > saves a _lot_ of bandwidth.
> > 
> > Also, when adding a lot, make sure to inform the mirrors, so they
> > are prepared.
> > 
> > 
> >>>>Discussion about how to finalize the directory structure such
> >>>>that "Repository", "Dist/Mirror" and "Maven" has to move forward.
> > 
> > 
> > I don't parse this, but since Noel can read it, I am probably missing
> > context/background.
> > 
> 
> Just that these groups are all focused on different aspects of the 
> distributables in the dist directory:
> 
> The Repository project's URL structure is important in standardizing and 
> improving the dist contents into a more formal structure.
> 
> The Maven project represents a working example of a tool that implements 
> itself upon this structure.
> 
> Between them, the dist directory maintainers and the mirrors out there 
> represent a "control" on the whole situation; if it doesn't work for 
> them, then it's not realistic as a strategy.

I'm assuming that you mean that Maven is the tool implementing the
"control"?

[...]
> > Is Maven using the mirrors today, like getting the list of active
> > mirrors from the main site and finding the closest?  Or is it only
> > using the main site and perhaps ibiblio?
> > 
> Currently, all Maven clients use www.ibiblio.org/maven to retrieve 
> content.

So basically, the entire .../dist/java-repository directory is not
being used at all.  And all the while we are pushing all this data
to all of our mirrors (~200).

  http://www.apache.org/mirrors/

> www.ibiblio.org is also a mirror of /java-repository for all 
> its apache content.

Just so I'm clear: are there non-ASF packages distributed in
/java-repository on ibiblio?  As in, does ibiblio have a java-repository
which contains more than the one on www.apache.org/dist/java-repository?

> Actually Maven users DO NOT go to 
> www.apache.org/dist/java-repository to download files, and only Apache 
> developers can publish to www.apache.org/dist/java-repository.

Erm, I'm confused.  If no one is going to pull things from
there, why do we have/need it?

> What server is used is currently based on the configuration of the Maven 
> client, servers currently do not maintain any capability to hand this 
> client off to another mirror. I think, in the future as the Repository 
> comes into existence and machine readable metadata or mechanisms for 
> directing clients off to mirrors come into existence, then clients like 
> Maven will implement such capabilities.

That looks like a priority then.  It would actually make
all the mirroring worth it.  And it should help the user as well,
since closer hosts usually mean quicker downloads.

> >>When it comes to things like the ibiblio maven repository, it would only 
> >>maintain full version releases of apache projects. 
> > 
> > Can you explain why ibiblio is special here?  I mean, what you describe
> > is what is supposed to be on all the mirrors right?
> 
> Just because it is the "default" repository used by the Maven Client.

I'd say the default should be www.apache.org, and from there it should
select the 'best' mirror.  Note that for any mirror use, and that
includes ibiblio, integrity checking is a must for an application
like Maven.

[...]
> >>And the only publishing of jars by actual humans (Release Managers) 
> >>would be the full releases onto
> >>
> >>www/www.apache.org/dist/java-repository
> > 
> > Symlinks I hope.  Mirrors handle symlinks efficiently, that is,
> > if they follow our rsync instructions.
> 
> The only mirroring that would be done would be via:
> 
> www.apache.org/dist

The 'only mirroring that would be done' equals pushing everything
in there out to approximately 200 mirrors.  And then it isn't used.

> All other content in cvs.apache.org or archive.apache.org is not to be 
> "synced" as its not to be published out to mirrors, such content are 
> "developer build" and not for public consumption.

Correct.  This is exactly the mirroring policy.

[...]
> > Take a look at http://www.apache.org/~henkp/md5/, specifically
> > the fyi: some duplicates section.  Dups are a waste of bandwidth
> > and diskspace.
> 
> Yes, approx 50% of instances of duplication on this page are currently 
> caused by avalon components (avalon also was using their dist directory 
> as a private maven repository). For example:
> 
> avalon/excalibur-component/jars/excalibur-component-1.1.jar
> java-repository/excalibur-component/jars/excalibur-component-1.1.jar
> 
> I understand it can be the policy that when rsyncing, if the symlink and 
> the target directory do not have the same ownership, that it will not be 
> followed.

Er no.  We advertise this as the rsync options to use:

  rsync -rtlzv --delete www.apache.org::apache-dist /local/path/to/mirror

  -r  recurse into directories
  -t  preserve times
  -l  copy symlinks as symlinks
  -z  compress file data
  -v  increase verbosity
  --delete  delete files that don't exist on sender 

Note that we do _not_ ask to include:

 -p  preserve permissions
 -o  preserve owner (root only)
 -g  preserve group

IOW, on the mirrors, the tree exists as if you cp'd it (ownership and
permission wise).  rsync is even sensitive to the umask setting.

> I believe this creates a problem in that I cannot simply 
> create symlinks from java-repository/excalibur-component/ to 
> avalon/excalibur-component/ as they will not be followed by rsync.

This is simply not true, per above.  You are probably thinking of
the 'SymLinksIfOwnerMatch' option we are recommending in the httpd.conf
of the mirrors.  This is not a problem, since the copy of dist/ on
the mirrors will in its entirety have the same owner.  On www.apache.org
we have 'FollowSymLinks' enabled, so it's not a problem there either.
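
To make the 'SymLinksIfOwnerMatch' condition concrete: httpd follows a
symlink only when the link and its target have the same owner.  A minimal
sketch of that comparison in Python follows.  The function name is my own,
and this only illustrates the rule; it is not how httpd implements it.

```python
import os


def link_owner_matches_target(link_path):
    """Return True when a symlink and the file it points to share an owner.

    Hypothetical helper illustrating the SymLinksIfOwnerMatch rule.
    """
    # lstat() inspects the link itself; stat() follows it to the target.
    link_owner = os.lstat(link_path).st_uid
    target_owner = os.stat(link_path).st_uid
    return link_owner == target_owner
```

On a mirror populated without -o, every file and every link ends up owned
by the user running rsync, so this comparison holds for the whole tree.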

> However, the other 50% of duplicates within the java-repository 
> directory should be properly alleviated with symlinking, I can work on 
> this as I now (as of a couple days ago) own all the files :-). I will 
> start working on a script I can run periodically which will accomplish this.

Like I said, don't worry about the ownership.  The only thing you need
to worry about is setting your umask to 002, so that other members
of the group are able to make modifications like you ;).
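
For what it's worth, a rough sketch of what such a periodic script could
do, in Python.  Everything here is an assumption on my part: the function
names, the choice to compare by md5 of the content, and the idea of
treating one tree (say avalon/) as canonical and the other (say
java-repository/) as the one that receives the links.

```python
import hashlib
import os


def file_md5(path, chunk_size=65536):
    """Hex md5 of a file's content, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def link_duplicates(canonical_dir, shadow_dir):
    """Replace files under shadow_dir that duplicate a file under
    canonical_dir with relative symlinks.  Returns the replaced paths."""
    # Index the canonical tree by content digest (first file seen wins).
    by_digest = {}
    for root, _dirs, files in os.walk(canonical_dir):
        for name in files:
            path = os.path.join(root, name)
            if not os.path.islink(path):
                by_digest.setdefault(file_md5(path), path)

    replaced = []
    for root, _dirs, files in os.walk(shadow_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                continue  # already a link, nothing to do
            target = by_digest.get(file_md5(path))
            if target is None:
                continue  # no identical file in the canonical tree
            # Relative links survive being copied to a different
            # local path on a mirror; absolute ones would not.
            relative = os.path.relpath(target, start=root)
            temporary = path + ".tmp-link"
            os.symlink(relative, temporary)
            os.replace(temporary, path)  # swap the link in atomically
            replaced.append(path)
    return replaced
```

Something like link_duplicates("avalon", "java-repository"), run from
the dist/ directory, would then turn the excalibur-component jar quoted
above into a link.  Do a dry run on a copy first.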

> > I'll ask Henk to disable the checks for presence of md5 in the
> > dist/java-repository, since that doesn't seem to be applicable
> > there.  It seems to me that you do want to do some verification
> > in maven, but you are probably storing signature information
> > somewhere in the maven 'database'?
> 
> No, it is in the directory structure (no db), and md5s should exist next 
> to the files. There is a bug in Maven caused by the fact that on BSD 
> checksums are generated by "md5", not "md5sum" as on Linux. This needs 
> to be addressed; for example, you can see my md5 was bad on the math jar 
> (which I just fixed).

Does this mean you are running Maven on minotaur?  Or was it that the
platform of the one who ran Maven was BSD?  May I also suggest PGP
signatures?  You could verify that the package is signed by a trusted
source and that its integrity is not compromised (an md5 is
easily replaced, a PGP sig is somewhat harder ;).

See http://www.apache.org/~henkp/sig/ for stats.
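
On the md5 versus md5sum difference: BSD md5 prints "MD5 (name) = <hex>",
while GNU md5sum prints "<hex>  name".  A minimal sketch of a check that
accepts either form, in Python.  The function names are hypothetical, and
I am assuming the checksum sits on the first line of a file next to the
jar.  This only catches accidental corruption; it does not replace a
signature check.

```python
import hashlib
import re

# BSD md5 output: "MD5 (name) = <hex>"
_BSD = re.compile(r"^MD5 \((?P<name>.+)\) = (?P<digest>[0-9a-fA-F]{32})$")
# GNU md5sum output: "<hex>  name" (a bare digest is accepted too)
_GNU = re.compile(r"^(?P<digest>[0-9a-fA-F]{32})(?:\s+\*?(?P<name>.+))?$")


def parse_md5_line(line):
    """Return the lowercase hex digest from either format, or None."""
    line = line.strip()
    for pattern in (_BSD, _GNU):
        match = pattern.match(line)
        if match:
            return match.group("digest").lower()
    return None


def md5_of_file(path, chunk_size=65536):
    """Hex md5 of a file's content, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, md5_path):
    """True when the digest stored in md5_path matches the file at path."""
    with open(md5_path, "r") as handle:
        expected = parse_md5_line(handle.readline())
    return expected is not None and expected == md5_of_file(path)
```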

Sorry for the, I imagine, somewhat critical feedback.  FWIW I do
appreciate the project's goal.

Sander
