On Nov 7, 2010, at 1:29 PM, jhumble wrote:

> One possibility to get repeatable builds without filling up an artifacts
> repository too fast could be to make Maven store the fully qualified pom
> files in the artifacts repo and an md5 of the binary but not necessarily the
> actual binary. I know artifacts repos already store some of this
> information.
> 
> That way you could make sure sufficient metadata is publicly available such
> that they can be reproduced, without using up loads of disk space. You could
> also happily delete older binaries, safe in the knowledge that people could
> reproduce them from the metadata in the artifacts repo.
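
The checksum half of that idea is easy to sketch. This is only an illustration of comparing a rebuilt binary against an md5 recorded in the repository; the function names are made up for the example, and it assumes a rebuild is byte-for-byte identical, which often isn't true for jars because archive entries carry timestamps.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex md5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_md5(path, recorded_md5):
    """Compare a rebuilt artifact against the md5 string kept in the repo.

    `recorded_md5` stands in for whatever the repository stored next to the
    POM (a hypothetical input here); whitespace and letter case are ignored.
    """
    return md5_of_file(path) == recorded_md5.strip().lower()
```

If the digests differ, that only says the bytes differ, not that the sources did.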

One of the things I like about snapshots is that a snapshot simply means 
"latest".  The trouble with timestamped snapshots is that they aren't 
guaranteed to exist (the repository is not typically assumed to be reliable), 
and they aren't 100% reproducible: the timestamp includes the time it took to 
build the artifact and all the artifacts before it, so there's no way to know 
exactly what point in time the sources came from.  Even if one could find the 
correct timestamp to check out from to get the same binary, whatever subsystem 
creates the timestamp on upload (wagon?) probably doesn't like being told what 
to call the snapshot.  

It follows that the only way to get a reproducible build is either to tag the 
original sources or to know the SCM revision ID.  The revision ID is a natural 
tag that is generated automatically, and it does not clutter the named tag 
space with thousands of tags that have no organizational meaning.  On my CI 
builds, the first step is to grab the revision ID from SVN and write it to a 
properties file that can be used when the UI is generated.  Where the version 
number helps users identify the general features to expect of the current 
software, the revision ID is great for filing issues, so devs don't have to 
guess which sources have the issue.  
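
As a rough sketch of that CI step (not my actual build script): ask `svnversion` for the working copy's revision and write it to a properties file.  The property keys `build.version` and `build.revision` and the function names are invented for the example.

```python
import subprocess
from pathlib import Path


def read_svn_revision(working_copy="."):
    """Return the revision string that the svnversion tool reports.

    Note that svnversion can report a range or suffixes (for example
    "4123:4168M") when the working copy is mixed-revision or modified.
    """
    result = subprocess.run(
        ["svnversion", str(working_copy)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def render_build_properties(version, revision):
    """Format the two values as Java-style key=value lines.

    The key names are hypothetical; use whatever the UI expects.
    """
    return f"build.version={version}\nbuild.revision={revision}\n"


def write_build_properties(path, version, revision):
    """Write the properties file that a later build step can read."""
    Path(path).write_text(render_build_properties(version, revision),
                          encoding="utf-8")
```

A CI job would call something like `write_build_properties("build.properties", "1.2-SNAPSHOT", read_svn_revision())` before compiling, so the revision ends up in the packaged application.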

When the sources all come from the same SCM repository tree, the rev ID makes 
it a cinch to reproduce the build.  Of course, a better solution would span 
multiple trees and still be reproducible.

It just seems like the rev ID is really useful here for identifying 
reproducible builds without creating releases every time.  Does it fit with 
your ideas?  If so, a hypothetical repository manager plugin could maintain 
information about snapshot dependencies based on SCM rev ID, thus allowing for 
reproducibility without modifying Maven or existing snapshot mechanics.  Such a 
plugin might be able to generate a POM carrying extra rev ID metadata that the 
repo manager would recognize.  Existing SNAPSHOT-style identifiers would keep 
working for developer desktops (avoiding SCM thrash), while synthetic POMs 
would provide reproducibility.
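
To make the plugin idea a bit more concrete, here is a toy sketch.  Everything in it is hypothetical: no repository manager exposes this API, the class and function names are invented, and `scm.revision` is just a property name I picked.  It keeps an in-memory map from snapshot coordinates to the SCM revisions that were deployed, and emits a minimal synthetic POM that carries the revision as a property.

```python
import xml.etree.ElementTree as ET


class SnapshotRevisionIndex:
    """Hypothetical index: (groupId, artifactId, version) -> SCM revisions."""

    def __init__(self):
        self._revisions = {}

    def record(self, group_id, artifact_id, version, revision):
        """Remember that this snapshot was deployed from the given revision."""
        key = (group_id, artifact_id, version)
        self._revisions.setdefault(key, []).append(revision)

    def latest(self, group_id, artifact_id, version):
        """Return the most recently recorded revision for the snapshot."""
        return self._revisions[(group_id, artifact_id, version)][-1]


def synthetic_pom(group_id, artifact_id, version, revision):
    """Build a minimal POM string with the revision stored as a property.

    The XML namespace declaration that real POMs usually carry is left out
    to keep the sketch short.
    """
    project = ET.Element("project")
    ET.SubElement(project, "modelVersion").text = "4.0.0"
    ET.SubElement(project, "groupId").text = group_id
    ET.SubElement(project, "artifactId").text = artifact_id
    ET.SubElement(project, "version").text = version
    properties = ET.SubElement(project, "properties")
    # "scm.revision" is an assumed property name, not a Maven convention.
    ET.SubElement(properties, "scm.revision").text = str(revision)
    return ET.tostring(project, encoding="unicode")
```

Desktops would keep resolving `1.0-SNAPSHOT` as usual; only a consumer that wants a reproducible build would ask the index for the revision and check out the sources at that point.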


