Key issues I can see needing to be addressed are the following.
1.) Get projects to be as "responsible" for their content in java-repository as they are for the content of their "project" directories in dist.
2.) Resolve the only remaining duplication: Avalon runs its own "Avalon Repository" in /dist/avalon, so its contents are currently duplicated in java-repository/ibiblio.
3.) Maintain proper permissions on all the directory contents of the repository (group write; group ownership of files and new directories should be the user's primary group).
4.) Find a way around the current "shortsightedness" in Maven, where the commands executed on the server side are all GNU/Linux commands and do not map to their BSD equivalents (e.g. BSD's md5 versus GNU's md5sum). So using the "repository" goals in Maven fails to produce proper md5 checksums.
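As a rough illustration of issue 4, one way to sidestep the md5/md5sum difference would be to compute the checksum in a platform-neutral way rather than shelling out to either command. This is only a sketch in Python using the standard hashlib module. The function names and the "<file>.md5" sidecar layout (a file holding just the hex digest) are my own assumptions for illustration, not what Maven actually does.

```python
import hashlib
from pathlib import Path


def md5_hex(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large jars are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_md5_sidecar(path):
    """Write '<file>.md5' next to the file (assumed layout: hex digest only)."""
    path = Path(path)
    sidecar = path.with_name(path.name + ".md5")
    sidecar.write_text(md5_hex(path) + "\n", encoding="ascii")
    return sidecar
```

Because the digest is computed in-process, the result does not depend on which checksum tool the server happens to have installed.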
Nicola Ken Barozzi wrote:
Some action items:
1- how can we make mirroring work for both of us? (IIRC Ruper already showed it's easy to do, but I need help from Adam, who already tried it)
2- How does Mark think we can proceed in not making it compulsory to physically have jars in a defined location?
These two are a "double-whammy", but I think I have a possible solution to the whole subject. Currently we think of the "Repository" as just that: "A Repository", a physical location for the jars. But what if we defined the URLs for a "Repository" to simply be pointers or addresses that, when resolved by a client, point to the proper location of that resource? This, in essence, makes the repository into a resolver or a naming service.
In my line of work (Digital Libraries) we already have services that accomplish this task (actually we have two competing/complementary naming systems):
PURLs (Persistent Uniform Resource Locators) http://www.purl.org http://www.oclc.org/research/projects/purl/download.htm
Handles (analogous to publicIds, ISBNs, the Dewey Decimal System, etc.) http://www.handle.net/ http://www.handle.net/download.html
If you're wondering how PURLs and Handles stack up, here's some comparison documentation.
So what we are really talking about is a naming system that provides for the "resolution" of registered names to physical locations. Interestingly, I think the lack of this sort of separation of "Resolution" from "Storage" is exactly the issue that is causing "friction" in our community.
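To make the separation of "Resolution" from "Storage" a little more concrete, here is a minimal sketch in Python of what a resolver does: it keeps a table of registered names and answers a lookup with a redirect to a physical location. The class name, the use of HTTP status 302, and the example.org hosts are all assumptions of mine for illustration. This is not the API of the PURL or Handle server software.

```python
class NameResolver:
    """Maps registered logical names to one or more physical locations."""

    def __init__(self):
        self._locations = {}

    def register(self, name, *locations):
        """Register a name with its locations, most preferred first."""
        if not locations:
            raise ValueError("at least one location is required")
        self._locations[name] = list(locations)

    def redirect(self, name):
        """Build the (status, headers) pair a redirecting server might send."""
        locations = self._locations.get(name)
        if not locations:
            return 404, {}
        # Assumption: always hand out the first registered location.
        return 302, {"Location": locations[0]}


resolver = NameResolver()
resolver.register(
    "some-group/jars/some-lib-1.0.jar",  # hypothetical registered name
    "http://mirror-a.example.org/dist/some-lib-1.0.jar",
    "http://mirror-b.example.org/dist/some-lib-1.0.jar",
)
status, headers = resolver.redirect("some-group/jars/some-lib-1.0.jar")
```

The client only ever knows the registered name. Where the jar physically lives is a detail of the table, which is the point of the separation.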
I think it's quite possible that one could completely and transparently replace the underlying URL-based repository syntax in both Maven and other tools with a resolving layer. To clarify this, here are a few examples.
1.) An example using PURLs.
This is currently a URL pointing to a physical resource on ibiblio.
If this were not a physical resource but a PURL in the PURL naming system, then it could redirect the client (using existing PURL server software) to the appropriate resource (mirrored or not).
would actually resolve (through redirection) to
could also point to
This provides a layer of flexibility: it solves the issue of projects needing to place their content in a specific structure/location, and it also solves the issue of name changes over time.
2.) So if we decide that we want to have different "groupIDs" in Maven for a specific project, the naming system maintains the old naming structure pointing to the jars, so that dependent projects are still able to resolve the resource.
We are currently planning to adopt a more hierarchical naming approach.
We could (at little cost in both maintenance and disk space) maintain the old naming resolution and the new one. In fact, this is the very foundation of the PURL system: the old URIs stay persistent over time.
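A small sketch of how old and new names could coexist: both names point at the same table entry, so nothing is duplicated and a later change of location is seen under either name. This is Python for illustration only. The class and method names, the group names and the example.org hosts are hypothetical, and a real PURL server keeps its table in a database rather than in memory.

```python
class Entry:
    """One registered resource; every name for it shares this object."""

    def __init__(self, location):
        self.location = location


class NameTable:
    def __init__(self):
        self._names = {}

    def register(self, name, location):
        self._names[name] = Entry(location)

    def add_alias(self, existing_name, new_name):
        # The old name is kept; the new one shares the same entry.
        self._names[new_name] = self._names[existing_name]

    def move(self, name, new_location):
        # Updating the shared entry retargets every alias at once.
        self._names[name].location = new_location

    def resolve(self, name):
        return self._names[name].location


table = NameTable()
table.register("old-group/jars/lib-1.0.jar",
               "http://mirror.example.org/dist/lib-1.0.jar")
table.add_alias("old-group/jars/lib-1.0.jar",
                "org/example/new-group/jars/lib-1.0.jar")
```

Dependent projects that still use the old name keep working, and the cost is one extra row in the table rather than a second copy of the jar.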
3.) With such a level of "redirection", we can also maintain archival and production releases of the content without the actual location specifier changing. So when Apache retires commons-collection-1.0.jar from production and removes it from the mirrors, instead placing it onto archives.apache.org, the resolver entry in the PURL database can be adjusted to point at the new location.
now points to the following location instead:
3- We really should see that the md5s are compatible between Ruper and Maven (IOW Mark meet Markus, Markus meet Mark :-)
4.) I also think that signatures and md5 checksums can be maintained as user metadata both on the filesystem and in the naming system's database. Clients can then compare the md5s/signatures in the naming system with those on the filesystem, and so verify that the resolved resource is the same as was intended when the name was registered with the naming system.
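The client-side check could look roughly like the following Python sketch: recompute the md5 of the downloaded file and compare it with the value recorded when the name was registered. The function name is my own, and how the registered value is actually fetched from the naming system is left out, since that depends on the resolver software. Signature verification would be an additional step not shown here.

```python
import hashlib


def matches_registered_md5(path, registered_md5, chunk_size=65536):
    """Return True if the file's MD5 equals the digest stored at registration."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    # Normalise case and whitespace so 'ABC...\n' and 'abc...' compare equal.
    return digest.hexdigest() == registered_md5.strip().lower()
```

A mismatch would mean that whatever the name now resolves to is not the resource that was originally registered.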
5.) Finally, naming systems can be "chained" together in such a way that
http://www.ibibilio.org/maven/apache is actually a PURL to the Apache naming resolver http://repository.apache.org so that
is actually resolved by ibiblio to:
which is then resolved by repository.apache.org to:
http://<your mirror here>/jakarta/commons/collections/jars/...
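The chaining can be sketched as repeated prefix rewriting: each resolver owns a URL prefix and hands the remainder of the path to the next one, until no resolver claims the URL any more. Again this is only a Python illustration. The rule format, the hop limit and the example.org hosts are assumptions, and a real client would simply follow HTTP redirects rather than hold the rules itself.

```python
def resolve_chain(url, rules, max_hops=10):
    """Follow (prefix, target) rewrites until no rule matches the URL."""
    for _ in range(max_hops):
        for prefix, target in rules:
            if url.startswith(prefix):
                # Keep the path below the prefix and hand it to the next resolver.
                url = target + url[len(prefix):]
                break
        else:
            return url  # no rule matched: this is the physical location
    raise RuntimeError("too many redirections")


# Hypothetical two-level chain: a public name, a project resolver, a mirror.
rules = [
    ("http://names.example.org/maven/apache", "http://resolver.example.org"),
    ("http://resolver.example.org", "http://mirror.example.org/dist"),
]
final = resolve_chain(
    "http://names.example.org/maven/apache/group/jars/lib-1.0.jar", rules)
```

The hop limit is there because chained tables can be misconfigured into a loop, and a client should give up rather than follow redirections forever.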
It may seem complex at first, but it's not really, and it is something that the library community has been working on for years now, so it is well tested. The examples I've shown are PURL-based, but Handles and PURLs are complementary, and bridges between the two resolving systems do exist.
-- Mark Diggory Software Developer Harvard MIT Data Center http://www.hmdc.harvard.edu