You will be missing the key ingredient: the application POMs, which
tell you which artifacts are actually used.
You might get some interesting information about things like log4j which
is probably used by lots of things inside Maven Central.
You will be grossly misled about the use of things like CXF since it is
hardly ever called by a library that would be submitted to Maven Central
but is frequently used by projects that are in private repositories.
You may be able to visualize a "where used" between libraries but you
will have a lot of nodes that appear "never used", which is not true.
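To make the "never used" problem concrete, here is a minimal Python sketch of a "where used" count built only from the POMs you have. Everything in it is an assumption for illustration: the sample POMs, the com.example and org.example coordinates, and the helper names make_pom, coordinates and inbound_counts are all made up. It also ignores parent inheritance, properties and dependencyManagement, which real POMs rely on.

```python
import xml.etree.ElementTree as ET
from collections import Counter

POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def make_pom(group, artifact, deps=()):
    """Build a tiny, hypothetical POM string for the example."""
    dep_xml = "".join(
        f"<dependency><groupId>{g}</groupId><artifactId>{a}</artifactId></dependency>"
        for g, a in deps
    )
    return (
        '<project xmlns="http://maven.apache.org/POM/4.0.0">'
        f"<groupId>{group}</groupId><artifactId>{artifact}</artifactId>"
        f"<dependencies>{dep_xml}</dependencies></project>"
    )


def coordinates(pom_xml):
    """Return (own 'groupId:artifactId', list of dependency coordinates)."""
    root = ET.fromstring(pom_xml)

    def ga(node):
        # Real POMs may inherit groupId from a parent; not handled here.
        g = node.findtext("m:groupId", default="", namespaces=POM_NS)
        a = node.findtext("m:artifactId", default="", namespaces=POM_NS)
        return f"{g}:{a}"

    deps = [ga(d) for d in root.findall("m:dependencies/m:dependency", POM_NS)]
    return ga(root), deps


def inbound_counts(poms):
    """Count how many POMs in the sample depend on each artifact."""
    counts = Counter()
    known = set()
    for pom_xml in poms:
        own, deps = coordinates(pom_xml)
        known.add(own)
        counts.update(set(deps))  # count each dependent project once
    # Artifacts in the sample that no other POM in the sample references.
    never_used = sorted(known - set(counts))
    return counts, never_used


# Hypothetical sample: two libraries use log4j, nothing uses the framework.
SAMPLE_POMS = [
    make_pom("log4j", "log4j"),
    make_pom("com.example", "lib-a", [("log4j", "log4j")]),
    make_pom("com.example", "lib-b", [("log4j", "log4j")]),
    make_pom("org.example", "framework"),
]

counts, never_used = inbound_counts(SAMPLE_POMS)
print(counts.most_common())
print(never_used)
```

In this sample the logging library looks popular, while the framework lands in never_used simply because its users (applications in private repositories) are not in the data set.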
You will have to figure out a way to separate three kinds of projects:
projects that are still used and produced a ton of revisions 5 years ago
but nothing since; projects that are mature yet still active but only
produce new versions every 18 months since they are stable and work; and
projects that were very active and then died as they became unnecessary
due to newer technologies being introduced.
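A small Python sketch of why release dates alone will not separate these cases. The release histories, the reference date and the function name release_profile are all invented for illustration; they are not taken from any real project.

```python
from datetime import date
from statistics import median


def release_profile(release_dates, today):
    """Summarize a release history using nothing but its dates."""
    ordered = sorted(release_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return {
        "releases": len(ordered),
        "days_since_last": (today - ordered[-1]).days,
        "median_gap_days": median(gaps) if gaps else None,
    }


TODAY = date(2012, 9, 5)  # assumed reference date

# Hypothetical project that is still widely used but stopped releasing.
quiet_but_used = [date(2007, month, 1) for month in range(1, 13)]

# Hypothetical project that died after the same burst of releases.
abandoned = [date(2007, month, 1) for month in range(1, 13)]

# Hypothetical mature project that releases roughly every 18 months.
mature_active = [date(2008, 3, 1), date(2009, 9, 1), date(2011, 3, 1), date(2012, 9, 1)]

for name, history in [
    ("quiet_but_used", quiet_but_used),
    ("abandoned", abandoned),
    ("mature_active", mature_active),
]:
    print(name, release_profile(history, TODAY))
```

The mature project stands out by its recent release and long gaps, but the first two histories produce identical profiles. Telling them apart needs usage data, which is exactly what the application POMs would have supplied.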
You will also have trouble with projects that repackage their artifacts
between major releases and change the GAV (groupId, artifactId, version)
structure by redistributing the functionality.
Not sure that your project is going to produce any useful information
and I fear that it will be misleading to anyone who does not look deeper
into the raw data.
Visualization may just make it easier for incorrect conclusions to be
drawn.
Ron
On 09/04/2012 10:20 PM, Matt Taylor wrote:
Perhaps this is already in existence somewhere. If so please point me in
the right direction.
I want to know what the most popular dependencies are, not based on
downloads, but based on dependencies from other projects.
I want to explore the full dependency graph and see its evolution over
'time' (for instance seeing how fast versions of artifacts are adopted).
I want to create a visual representation of all the dependencies just
because it would look cool.
In general I want total access to all the metadata (pom files essentially)
in the maven central repo, so I can see how the world's software fits
together on a 'global' scale.
Eventually I would like to explore the jar artifacts as well to get deeper
insights into what methods/classes are being referenced as well, but that
is phase 2. :)
From googling around it appears that, understandably, it is improper to
simply wget the entire repo. However, there don't seem to be any publicly
available torrents, or other resources for me to get access to this data.
http://search.maven.org/#stats
457GB is a lot of data, but it isn't an unimaginable amount, and most of
that is no doubt the artifacts, not the metadata (pom files).
So I really have two questions:
1. What is the easiest path to getting rsync type access of the full repo
(I'd quite understand if I needed to pay a fee for this level of access).
2. Failing that, what would be a legitimate way of just getting all the pom
files?
Basically I want to be a good guy and not put undue load on the servers, but
at the same time I really want the data.
Thanks,
Matt Taylor
http://blog.matthewjosephtaylor.com
--
Ron Wheeler
President
Artifact Software Inc
email: rwhee...@artifact-software.com
skype: ronaldmwheeler
phone: 866-970-2435, ext 102