I'll try to expand on the functionalities of Maven below.
Sander Striker wrote:
On Sat, 2004-02-21 at 01:01, Mark R. Diggory wrote:
Noel J. Bergman wrote:
The issue is... the jars/distributables are placed into the java-repository using maven.
Can you explain this a bit? I thought Maven was used to fetch projects and dependencies. Of course I can read up on Maven, but a quick summary of the technicalities would be appreciated.
Maven is used both to fetch jars from the repository and to publish jars to it. For the latter, it basically opens ssh sessions in which it runs a number of commands (scp, md5, chmod, chgrp). Because this is encapsulated within Maven, the user can rely on Maven's deployment mechanism to set up the jar/signature in the repository for their project; since it's scripted, it is done the same way every time. This takes a great deal of the effort involved in publishing jars to the repository out of the user's hands.
Maven is really doing nothing more than acting as an ssh client for the user and automating the deployment process for them using their apache account.
This benefits Maven because it can rely on the repository being maintained in a structure it can predict and locate dependencies within.
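To make the sequence concrete, here is a minimal sketch of the kind of commands such a deployment might issue. It is not Maven's actual implementation: the function name, the `<groupId>/jars/` layout, the 664 mode, and the host, user and group names are all assumptions for illustration, and the `.md5` file is assumed to have been generated locally beforehand. The function only builds the command lines; it does not run them.

```python
import posixpath
import shlex


def deploy_commands(local_jar, user, host, repo_root, group_id, unix_group):
    """Return the argv lists for one hypothetical jar deployment."""
    jar_name = posixpath.basename(local_jar)
    # Assumed layout: <repo_root>/<group_id>/jars/<artifact>-<version>.jar
    remote_dir = posixpath.join(repo_root, group_id, "jars")
    remote_jar = posixpath.join(remote_dir, jar_name)
    target = f"{user}@{host}"

    def remote(*argv):
        # Quote each argument so the remote shell sees it unchanged.
        return ["ssh", target, shlex.join(argv)]

    return [
        remote("mkdir", "-p", remote_dir),
        ["scp", local_jar, f"{target}:{remote_jar}"],
        ["scp", local_jar + ".md5", f"{target}:{remote_jar}.md5"],
        # Group ownership and mode are illustrative values only.
        remote("chgrp", unix_group, remote_jar, remote_jar + ".md5"),
        remote("chmod", "664", remote_jar, remote_jar + ".md5"),
    ]
```

Because the steps are generated rather than typed by hand, every release goes through the same sequence, which is the point made above about the deployment being scripted.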
So, currently, if you look in something like the commons project.properties, you'll see that it points to the central repository as the location to which files are "published".
The "convergence issues" we currently have for the repository: 1.) We want single copies of files on the mirrors.
This is the core point.
Yes, we all agree on this one...
My best conclusion is keep "jars" in the java-repository, do not keep them in your /dist/<project>/<binaries> directory. Remove all [jar/zip/tar files] from the java-repository.
Symlink the appropriate java-repository dir into its appropriate "dist" directory.
That would mean that this entire area would have to be rw to all
groups producing releases that are to be in there. This kind of means
apcvs group ownership, which I don't really fancy doing. The other
way around, control and access of each project's dist/ area separated,
and symlinking to that from java-repository, seems a bit sa[fn]er to
Ultimately we are seeking a convergence here between what the repository folks want to see, the maven users want to see and the infrastructure folks want to see.
1.) For the repository (and Maven) folks, we want to see the contents of dist become standardized according to the Repository URI specification. This means "all" distributables (java or not) are organized according to this specification.
2.) For Maven users, no matter what happens, we need to maintain a functionally working repository that works with the existing version of Maven.
3.) For Infrastructure, all this needs to be properly secured and maintained according to Apache standards.
The java-repository structure is broken down into
this would mean each project would need to maintain a separate set of symlinks for "jars", "distributables", "...".
That sounds OK to me, but folks like Sander and others more involved in mirroring should be put in the loop. Everything we put under dist/ affects 100s of mirrors.
Not me specifically, but Infrastructure. Others are more actively maintaining the mirrors list and monitoring the mirrors. The mirrors are a precious resource and we want to be careful not to 'scare' any mirrors away with actions on our end.
Yes, I learned that the hard way when we created the contents of java-repository... that was not a happy weekend. I don't make any "rash" changes to dist any more, only well-thought-out moves. But we are in a state of cleanup now as well, so we have to consider what we are going to do next.
If you are making large changes to the directory structure and the majority of the files are already on the mirrors, send a mail to mirrors@, attach a shell script that moves everything around locally, and give them a heads up on when this shuffle is happening. This saves a _lot_ of bandwidth.
Also, when adding a lot, make sure to inform the mirrors, so they are prepared.
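As an illustration of the "shell script that moves everything around locally" idea, here is a small sketch that renders such a script from a list of (old, new) relative paths. The function name and the example paths in the usage below are hypothetical; the real list would come from whatever reshuffle is actually planned.

```python
import posixpath
import shlex


def move_script(moves):
    """Render a POSIX shell script from (old, new) relative path pairs."""
    lines = [
        "#!/bin/sh",
        "# Run from the top of the mirrored dist/ tree.",
        "set -e",
    ]
    for old, new in moves:
        parent = posixpath.dirname(new)
        if parent:
            # Make sure the destination directory exists before moving.
            lines.append(f"mkdir -p {shlex.quote(parent)}")
        lines.append(f"mv {shlex.quote(old)} {shlex.quote(new)}")
    return "\n".join(lines) + "\n"
```

A mirror operator who runs the generated script before the next rsync already has the files in their new locations, so rsync has nothing to re-transfer for them.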
Discussion about how to finalize the directory structure such that "Repository", "Dist/Mirror" and "Maven" has to move forward.
I don't parse this, but since Noel can read it, I am probably missing context/background.
Just that these groups are all focused on different aspects of the distributables in the dist directory:
The Repository project's URL structure is important in standardizing and improving the dist contents into a more formal structure.
The Maven project represents a working example of a tool that implements itself upon this structure.
Between them, the dist directory maintainers and the mirrors out there represent a "control" on the whole situation: if it doesn't work for them, then it's not realistic as a strategy.
Currently, all Maven clients use www.ibiblio.org/maven to retrieve content. www.ibiblio.org is also a mirror of /java-repository for all its Apache content. Actually, Maven users DO NOT go to www.apache.org/dist/java-repository to download files, and only Apache developers can publish to www.apache.org/dist/java-repository.
That would be good.
In our last discussion, I think one of the conclusions that was arrived at as well, was the idea of breaking the java-repository up into two different locations.
www/cvs.apache.org/dist/java-repository --> nightly builds
www/www.apache.org/dist/java-repository --> official releases.
The idea was that nightly/weekly builds are not things we want to see on mirrors, but they should be available for developers, and that official releases of jars are things we want to see mirrored.
Is Maven using the mirrors today, like getting the list of active mirrors from the main site and finding the closest? Or is it only using the main site and perhaps ibiblio?
Which server is used is currently based on the configuration of the Maven client; servers currently do not have any capability to hand the client off to another mirror. I think that in the future, as the Repository comes into existence along with machine-readable metadata or mechanisms for directing clients off to mirrors, clients like Maven will implement such capabilities.
When it comes to things like the ibiblio maven repository, it would only maintain full version releases of apache projects.
Can you explain why ibiblio is special here? I mean, what you describe is what is supposed to be on all the mirrors right?
Just because it is the "default" repository used by the Maven Client.
If you're an Apache project and need to be on the bleeding edge for a component, then you can simply add
as your first repository location and get your apache jars straight off the nightly builds...
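A rough sketch of what "first repository location" means on the client side: the client holds an ordered list of repository base URLs and tries each in turn for a given artifact. The function name, the `<groupId>/jars/` path layout and the nightly URL used in the usage below are assumptions for illustration, not Maven's actual code or a real server.

```python
def candidate_urls(repositories, group_id, artifact_id, version, kind="jar"):
    """List download URLs in the order a client would try them."""
    # Assumed layout: <base>/<groupId>/<kind>s/<artifactId>-<version>.<kind>
    relative = f"{group_id}/{kind}s/{artifact_id}-{version}.{kind}"
    return [f"{base.rstrip('/')}/{relative}" for base in repositories]
```

With a hypothetical nightly repository listed ahead of http://www.ibiblio.org/maven, the nightly copy is tried first and the default repository only serves as the fallback.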
The big question is how to facilitate this in a build process. I think the last decision on the Jakarta Commons/General/Maven lists was that we would automate the build process for releasing the nightly jars into
And the only publishing of jars by actual humans (Release Managers) would be the full releases onto
Symlinks I hope. Mirrors handle symlinks efficiently, that is, if they follow our rsync instructions.
The only mirroring that would be done would be via:
All other content in cvs.apache.org or archive.apache.org is not to be "synced", as it's not to be published out to mirrors; such content is "developer build" material and not for public consumption.
Within the www.apache.org/dist directory, yes symlinking should be used to resolve duplication.
Take a look at http://www.apache.org/~henkp/md5/, specifically the fyi: some duplicates section. Dups are a waste of bandwidth and diskspace.
Yes, approx 50% of instances of duplication on this page are currently caused by avalon components (avalon also was using their dist directory as a private maven repository). For example:
I understand it can be the policy that, when rsyncing, a symlink will not be followed if it and its target directory do not have the same ownership. I believe this creates a problem, in that I cannot simply create symlinks from java-repository/excalibur-component/ to avalon/excalibur-component/, as they will not be followed by rsync.
However, the other 50% of duplicates within the java-repository directory should be properly alleviated with symlinking. I can work on this, as I now (as of a couple of days ago) own all the files :-). I will start working on a script I can run periodically which will accomplish this.
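A minimal sketch of what such a periodic script could look like, assuming duplicates are identified by equal size and MD5 and that the first copy found (in sorted order) is the one to keep. The function names and the dry-run default are my own choices for illustration; this is not an existing tool, and it does nothing about the ownership concern raised above.

```python
import hashlib
import os


def file_md5(path):
    """Hex MD5 of a file, read in chunks to keep memory use small."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def symlink_duplicates(root, dry_run=True):
    """Replace files with equal size and MD5 by relative symlinks.

    Returns (duplicate, kept_original) pairs. With dry_run=True
    nothing on disk is changed.
    """
    first_seen = {}
    replaced = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Leave existing symlinks and special files alone.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            key = (os.path.getsize(path), file_md5(path))
            original = first_seen.setdefault(key, path)
            if original == path:
                continue
            replaced.append((path, original))
            if not dry_run:
                target = os.path.relpath(original, os.path.dirname(path))
                os.remove(path)
                os.symlink(target, path)
    return replaced
```

Running it with the default `dry_run=True` first gives a list of what would change, which seems prudent given how many mirrors pick up anything done under dist/.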
I'll ask Henk to disable the checks for presence of md5 in the dist/java-repository, since that doesn't seem to be applicable there. It seems to me that you do want to do some verification in maven, but you are probably storing signature information somewhere in the maven 'database'?
No, it is in the directory structure (no db), and md5s should exist next to the files. There is a bug in Maven caused by the fact that on BSD checksums are generated by "md5", not "md5sum" as on Linux; this needs to be addressed. For example, you can see my md5 was bad on the math jar (which I just fixed).
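To illustrate the md5/md5sum difference: BSD "md5" prints lines of the form "MD5 (file) = <digest>", while GNU "md5sum" prints "<digest>  file". Below is a small sketch of a verifier that accepts either form (or a bare digest) in the .md5 file. The function names are hypothetical, and this is not how Maven itself handles the files.

```python
import hashlib
import re

# BSD md5:    MD5 (name.jar) = <32 hex digits>
_BSD = re.compile(r"^MD5 \(.*\) = ([0-9a-fA-F]{32})$")
# GNU md5sum: <32 hex digits>  name.jar   (also matches a bare digest)
_GNU = re.compile(r"^([0-9a-fA-F]{32})(?:\s+\*?.*)?$")


def extract_md5(text):
    """Return the lowercase hex digest from a checksum line, or None."""
    line = text.strip()
    for pattern in (_BSD, _GNU):
        match = pattern.match(line)
        if match:
            return match.group(1).lower()
    return None


def checksum_matches(data, checksum_text):
    """True if the MD5 of data equals the digest found in checksum_text."""
    expected = extract_md5(checksum_text)
    return expected is not None and hashlib.md5(data).hexdigest() == expected
```

Computing the digest with a library call on the client side, rather than shelling out to whichever command the platform provides, is one way to sidestep the platform difference entirely.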
-Mark
--
Mark Diggory
Software Developer
Harvard MIT Data Center
http://www.hmdc.harvard.edu