Re: A Maven extension for dependency tracking
Thanks Tamas. I'm looking at https://github.com/apache/maven-resolver/pull/182 today (and apologies for the delay - other tasks...). See inline.

On Fri, 27 May 2022 at 21:19, Tamás Cservenák wrote:

> Howdy, inline, also PR updated, simplified, and added a "demo" listener
> that does exactly what you wanted.
>
> On Fri, May 27, 2022 at 8:24 AM Grzegorz Grzybek wrote:
>
> > Hello and thank you very much for your time ;)
> >
> > This looks exactly how I imagined it ;) - that the path is reachable via
> > the RequestTrace!
> > Doing everything in the RepositoryListener (correct me if I'm wrong, but
> > artifactResolved() is called both after remote access and when the artifact
> > is found in local repo?) looks very clean, because it's natural to register
> > such listeners - much more natural than extending some crucial classes from
> > the resolver.
>
> Yes, now you can do everything as a listener. There is a "demo" added that
> does exactly what you want.
> Still, the warning stands: listener "steals" time from collection,
> collecting is "hot", so be quick! :)
> But now we are thread-safe as well, so "parallel pom" download will work as
> well.
> (there is one thing I need to fix: for this safety I need to COPY the path
> list, as once multithreaded, that list will change!!!)

So in the ideal situation (no listener registered), the only cost would be the copy. I'll check the collectStepTrace() impact by building some of my projects and I'll let you know.

> > I remember you mentioned this "end graph", but I didn't find a place (hook,
> > listener) where I can get it - could you please point me to the class?
>
> I was talking about another extension point to be added, which is not there.
> But now I am unsure if it is needed or not...

OK, so no full tree, but just a path collected in RequestTrace - that's what I needed ;)

> > I think artifactResolved() callback was called not only for POMs... and all
> > the changes made to the collector were supposed to prepare the dependency
> > path, so I didn't see a problem here. But you're the expert ;)
>
> Yes, the event is called a bit more: for every artifact being resolved. That
> means it is called for POMs (when artifactDescriptorRequest is run in the
> collector), but that one request may trigger SEVERAL events, like for the
> POM, then for its parent POM, then for the parent's parent POM etc. This is
> the model builder building the effective POM for a given artifact: if it has
> a parent POM, that needs to be resolved as well.

And THAT was exactly the reason I wanted to track everything. Yes - I wanted parent poms, grandparent poms, parents of boms, etc...

> Hence, there is that little "trick" in place that ensures that the tree is
> written only once, see the demo listener.
> It could be improved even more (like in the case of mvnd, where you may
> have several ongoing sessions at once).

Thanks - I promise to check PR#182 this week.

regards
Grzegorz Grzybek

> HTH
> Tamas
Re: A Maven extension for dependency tracking
Howdy,

inline, also PR updated, simplified, and added a "demo" listener that does exactly what you wanted.

On Fri, May 27, 2022 at 8:24 AM Grzegorz Grzybek wrote:

> Hello and thank you very much for your time ;)
>
> This looks exactly how I imagined it ;) - that the path is reachable via
> the RequestTrace!
> Doing everything in the RepositoryListener (correct me if I'm wrong, but
> artifactResolved() is called both after remote access and when the artifact
> is found in local repo?) looks very clean, because it's natural to register
> such listeners - much more natural than extending some crucial classes from
> the resolver.

Yes, now you can do everything as a listener. There is a "demo" added that does exactly what you want.
Still, the warning stands: a listener "steals" time from collection, and collecting is "hot", so be quick! :)
But now we are thread-safe as well, so "parallel pom" download will work too.
(There is one thing I need to fix: for this safety I need to COPY the path list, as once multithreaded, that list will change!!!)

> I remember you mentioned this "end graph", but I didn't find a place (hook,
> listener) where I can get it - could you please point me to the class?

I was talking about another extension point to be added, which is not there. But now I am unsure if it is needed or not...

> I think artifactResolved() callback was called not only for POMs... and all
> the changes made to the collector were supposed to prepare the dependency
> path, so I didn't see a problem here. But you're the expert ;)

Yes, the event is called a bit more: for every artifact being resolved. That means it is called for POMs (when artifactDescriptorRequest is run in the collector), but that one request may trigger SEVERAL events, like for the POM, then for its parent POM, then for the parent's parent POM etc. This is the model builder building the effective POM for a given artifact: if it has a parent POM, that needs to be resolved as well.

Hence, there is that little "trick" in place that ensures that the tree is written only once, see the demo listener.
It could be improved even more (like in the case of mvnd, where you may have several ongoing sessions at once).

HTH
Tamas
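
The idea of reaching the dependency path through the RequestTrace can be sketched roughly as below. This is only an illustration under assumptions: `Trace` is a hypothetical stand-in for org.eclipse.aether.RequestTrace, reduced to a payload and a parent link (it borrows the method names getData(), getParent() and newChild()), the coordinates are made up, and none of this is the code from PR #182. Walking the parent links upwards gives the path from the root request down to the artifact being resolved, and returning a fresh list gives the caller its own copy.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for org.eclipse.aether.RequestTrace:
// an immutable node with a payload and a link to its parent.
final class Trace {
    private final Trace parent;
    private final Object data;

    Trace(Trace parent, Object data) {
        this.parent = parent;
        this.data = data;
    }

    Trace newChild(Object childData) {
        return new Trace(this, childData);
    }

    Trace getParent() {
        return parent;
    }

    Object getData() {
        return data;
    }
}

final class TracePathDemo {
    // Walks from the given trace up to the root and returns the payloads root-first.
    static List<String> pathOf(Trace leaf) {
        Deque<String> path = new ArrayDeque<>();
        for (Trace t = leaf; t != null; t = t.getParent()) {
            path.addFirst(String.valueOf(t.getData()));
        }
        // A fresh list, so the caller owns its copy of the path.
        return new ArrayList<>(path);
    }

    public static void main(String[] args) {
        // Made-up coordinates, for illustration only.
        Trace root = new Trace(null, "com.example:app:1.0");
        Trace leaf = root.newChild("com.example:lib:2.0")
                         .newChild("com.example:lib-parent:2.0 (pom)");
        System.out.println(String.join(" -> ", pathOf(leaf)));
    }
}
```

Because the walk only allocates a small list per call, a listener doing this stays cheap, which matters given the warning that collection is "hot".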
Re: A Maven extension for dependency tracking
Hello and thank you very much for your time ;)

On Tue, 24 May 2022 at 15:54, Tamás Cservenák wrote:

> Howdy,
>
> take a look at this:
> https://github.com/apache/maven-resolver/pull/182
> (demos are "mutilated" just to run them and observe output, changes there
> are unrelated)

This looks exactly how I imagined it ;) - that the path is reachable via the RequestTrace!
Doing everything in the RepositoryListener (correct me if I'm wrong, but artifactResolved() is called both after remote access and when the artifact is found in local repo?) looks very clean, because it's natural to register such listeners - much more natural than extending some crucial classes from the resolver.

> But still, I am on edge: I still don't see why all this is "better" than
> just observing the collection end graph (having it all, and then just
> writing out the reverse dep trees?)

I remember you mentioned this "end graph", but I didn't find a place (hook, listener) where I can get it - could you please point me to the class?

> Also, this is an "early" phase, the collection, hence only POMs are being
> downloaded. And the fact that a POM is downloaded (collected) does NOT mean
> the JAR will be downloaded (resolved) as well

I think artifactResolved() callback was called not only for POMs... and all the changes made to the collector were supposed to prepare the dependency path, so I didn't see a problem here. But you're the expert ;)

kind regards
Grzegorz Grzybek

> Thanks
> T
Re: A Maven extension for dependency tracking
Howdy,

take a look at this:
https://github.com/apache/maven-resolver/pull/182
(demos are "mutilated" just to run them and observe output, changes there are unrelated)

But still, I am on edge: I still don't see why all this is "better" than just observing the collection end graph (having it all, and then just writing out the reverse dep trees?)

Also, this is an "early" phase, the collection, hence only POMs are being downloaded. And the fact that a POM is downloaded (collected) does NOT mean the JAR will be downloaded (resolved) as well.

Thanks
T
Re: A Maven extension for dependency tracking
On Tue, 24 May 2022 at 11:17, Tamás Cservenák wrote:

> Howdy,
>
> inline only the "interesting" part:
>
> > So, after playing a bit with 1.8.0[.1] of the BF/DF resolvers and your #176
> > PR, I see that example
> > org.eclipse.aether.internal.impl.collect.DependencyCollectorDelegate#dependencyCollected()
> > extension point you've introduced is a bit too early for my use case...
> > It's invoked during dependency collection, but I think it'd be better to
> > simply use "full path" when there's actual download (or resolution from
> > local repository).
>
> This part I don't quite get: "too early"? What do you mean here?
> As events you use are fired "even before" as I see...

By "too early" I mean that your dependencyCollected() was already printing the path, while in my case I was only pushing the current dependency on top of the stack, and the full stack was available later in an implementation of org.eclipse.aether.AbstractRepositoryListener.

However, I didn't check whether these are not simply the same thing - I thought that I could have several pushes onto the stack before my implementation of org.eclipse.aether.AbstractRepositoryListener#artifactDownloaded() was called - but I may be wrong.

> > My whole need to extend resolver was to collect the path from initial to
> > final dependency, so the stack is available when it's needed.
>
> Isn't the PR doing that? Or did I miss something?

As above - I was only thinking that there's a difference between these two:
- your PR calls dependencyCollected() just after node.getChildren().add() (DF) or context.getParent().getChildren().add() (BF) and prints the reversed dependency path
- my extension pushes the dependency (DF only in resolver 1.6.3) and prints the path in org.eclipse.aether.RepositoryListener

If these are effectively the same, sorry for the confusion ;)

> > Initially I thought that org.eclipse.aether.RequestTrace should be the
> > thing I could use to get current dependency path, but I found it's not
> > possible.
>
> Yes, my hopes were geared toward RequestTrace as well, as it could
> represent a tree just fine, but
>
> > Maybe your DependencyCollectorDelegate#dependencyCollected() could simply
> > "expose" the List path somewhere? Maybe in Maven session?
> > as attribute?
>
> In a moment parallel collection comes into picture (
> https://github.com/apache/maven-resolver/pull/178)
> this will be not enough, as there is one session but multiple threads are
> working on it. Hence,
> I agree that "hooking" onto existing events would be the best, but, sadly
> they are quite disconnected
> from collectors, hence, it is hard to "couple" them in the right manner
>
> Cleanest would be if the event would carry its own copy of path

Yeah... org.eclipse.aether.RepositoryEvent is very "public", so finding a path there would be great. However² I print most of the tracking information not in org.eclipse.aether.RepositoryListener but in an overridden org.eclipse.aether.repository.LocalRepositoryManager#find() (because I wanted to know ALL the traces back to the ultimate origin of the dependency - even if it's already downloaded). And org.eclipse.aether.RepositoryEvent can't help here - that's why I needed the static stack...

I think that indeed - #176 is something that could be both useful and cheap (defaults to an empty method); dependencyCollected() could be invoked with (as in your PR):
- context.parents (BF)
- args.nodes.nodes (DF)

This way my extension would be DF/BF independent and it could also ignore the parallel/serial downloader.

regards
Grzegorz Grzybek

> Thanks
> T
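
The "push the current dependency, read the full stack later" approach described in this message can be illustrated with a small depth-first walk. This is a sketch under stated assumptions, not resolver code: the graph, the artifact names and the `collect` method are all hypothetical, and the real DF collector does far more work per node. The point is only that, during a depth-first traversal, the stack always holds the path from the root to the current node, so reading it top-down gives the reversed dependency path.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class StackTrackingDemo {
    // Hypothetical dependency graph: artifact -> direct dependencies.
    static final Map<String, List<String>> GRAPH = new HashMap<>();
    static {
        GRAPH.put("app", Arrays.asList("lib-a", "lib-b"));
        GRAPH.put("lib-a", Arrays.asList("lib-c"));
    }

    // Depth-first walk: push on entry, pop on exit, so the stack is always
    // the path from the root down to the node currently being visited.
    static void collect(String node, Deque<String> stack, List<List<String>> seen) {
        stack.push(node);
        // Iterating an ArrayDeque used as a stack goes top-first, so this copy
        // is the reversed path: current node first, root last.
        seen.add(new ArrayList<>(stack));
        for (String child : GRAPH.getOrDefault(node, Collections.emptyList())) {
            collect(child, stack, seen);
        }
        stack.pop();
    }

    // Returns one reversed path per visited node, in visiting order.
    static List<List<String>> collectPaths(String root) {
        List<List<String>> seen = new ArrayList<>();
        collect(root, new ArrayDeque<>(), seen);
        return seen;
    }

    public static void main(String[] args) {
        for (List<String> reversedPath : collectPaths("app")) {
            System.out.println(String.join(" <- ", reversedPath));
        }
    }
}
```

A single shared stack like this only works for a single-threaded depth-first walk; with breadth-first or parallel collection each node would need to carry its own parent chain.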
Re: A Maven extension for dependency tracking
Howdy,

inline only the "interesting" part:

> So, after playing a bit with 1.8.0[.1] of the BF/DF resolvers and your #176
> PR, I see that example
> org.eclipse.aether.internal.impl.collect.DependencyCollectorDelegate#dependencyCollected()
> extension point you've introduced is a bit too early for my use case...
> It's invoked during dependency collection, but I think it'd be better to
> simply use "full path" when there's actual download (or resolution from
> local repository).

This part I don't quite get: "too early"? What do you mean here?
As far as I can see, the events you use are fired "even before"...

> My whole need to extend resolver was to collect the path from initial to
> final dependency, so the stack is available when it's needed.

Isn't the PR doing that? Or did I miss something?

> Initially I thought that org.eclipse.aether.RequestTrace should be the
> thing I could use to get current dependency path, but I found it's not
> possible.

Yes, my hopes were geared toward RequestTrace as well, as it could represent a tree just fine, but

> Maybe your DependencyCollectorDelegate#dependencyCollected() could simply
> "expose" the List path somewhere? Maybe in Maven session?
> as attribute?

Once parallel collection comes into the picture (https://github.com/apache/maven-resolver/pull/178), this will not be enough, as there is one session but multiple threads working on it. Hence, I agree that "hooking" onto existing events would be the best, but sadly they are quite disconnected from the collectors, so it is hard to "couple" them in the right manner.

The cleanest would be if the event carried its own copy of the path.

Thanks
T
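
What "the event carries its own copy of the path" could mean is sketched below. It is a minimal illustration, assuming a hypothetical `PathEvent` class and made-up artifact names; it is not org.eclipse.aether.RepositoryEvent and not anything from the PRs. The event takes a defensive, unmodifiable snapshot when it is created, so later changes to the collector's live list (for example from another thread) cannot alter what a listener sees.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical event type; only the snapshot idea is the point here.
final class PathEvent {
    private final String artifact;
    private final List<String> path;

    PathEvent(String artifact, List<String> livePath) {
        this.artifact = artifact;
        // Copy first, then wrap: the event no longer shares state with the collector.
        this.path = Collections.unmodifiableList(new ArrayList<>(livePath));
    }

    String getArtifact() {
        return artifact;
    }

    List<String> getPath() {
        return path;
    }
}

final class PathEventDemo {
    public static void main(String[] args) {
        // The collector's working list, which keeps changing as collection proceeds.
        List<String> livePath = new ArrayList<>(Arrays.asList("app", "lib-a"));
        PathEvent event = new PathEvent("lib-a", livePath);

        // The collector moves on; the event's snapshot stays as it was.
        livePath.add("lib-c");
        System.out.println("event path: " + event.getPath());
        System.out.println("live path:  " + livePath);
    }
}
```

The copy costs one small allocation per event, which is the trade-off against the "collector is hot" concern: it is only worth paying when a listener is actually registered.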
Re: A Maven extension for dependency tracking
Hello

I've finally found some time to check your PR#176 Tamás... Here are my comments and answers (also to previous messages).

> https://github.com/apache/maven-resolver/pull/176
>
> So here is some implementation "demo" (that could be made into extension
> point), as explained in Draft PR description.
> BUT, also as written in PR, am getting a feeling that doing this is
> "dangerous", and a simple callback with whole collected graph would be
> better

I've built Maven 3.8.5.1 (special local version) with maven-resolver 1.8.0.1 (special local version) with PR#176 included. I was easily able to switch between DF and BF collectors (-Daether.collector.impl) and I could find the "path" to the "top" (or "current") dependency:
- BF: org.eclipse.aether.internal.impl.collect.bf.DependencyProcessingContext#parents
- DF: org.eclipse.aether.internal.impl.collect.df.NodeStack#nodes

> - 1st: Personally, from a Resolver perspective, I'd just provide an API
>   (basically the author extending resolver should implement) and make it
>   simple to "click in" (Sisu component discovery).
> - 2nd: resolver IMHO should not provide any out of the box component
>   implementation at all

I agree - there should be no additional processing without an explicit extension (custom Sisu/Plexus component).

> So 1st would provide a "stable" extension point for users who would like to
> "integrate" with resolver at this point (like you did), but it could become
> possible using simply this new API, instead of the hoops and loops your code
> was forced to do (as resolver is quite "closed" in this respect).

Indeed - I had to shade several resolver classes simply to make them public (with protected methods). With DF/BF resolvers, it'd be even more important to have some clear contract.

> As for 2nd point, while I do like your idea of "decorating" local
> repository, I'd try a bit different route: I'd integrate this
> https://github.com/lambdazen/bitsy that makes possible to use Apache
> Tinkerpop's Gremlin queries to ask about the built graph for example...

At first glance, it looks like overkill ;) But I probably didn't check enough...

So, after playing a bit with 1.8.0[.1] of the BF/DF resolvers and your #176 PR, I see that the example org.eclipse.aether.internal.impl.collect.DependencyCollectorDelegate#dependencyCollected() extension point you've introduced is a bit too early for my use case... It's invoked during dependency collection, but I think it'd be better to simply use the "full path" when there's an actual download (or resolution from the local repository).

My whole need to extend resolver was to collect the path from initial to final dependency, so the stack is available when it's needed.

Initially I thought that org.eclipse.aether.RequestTrace should be the thing I could use to get the current dependency path, but I found it's not possible.

Maybe your DependencyCollectorDelegate#dependencyCollected() could simply "expose" the List path somewhere? Maybe in Maven session? as attribute?

kind regards
Grzegorz Grzybek
Re: A Maven extension for dependency tracking
Hello!

Thanks for your comments and PR - I needed to switch to different tasks, but soon (next week?) I'm going to spend more time on it.

I have yet to get a feeling for the graph/stack that could be passed around, and to check these DF/BF dependency collectors (as I didn't see them in resolver 1.6.3).

I'll keep the https://issues.apache.org/jira/browse/MRESOLVER-248 tab open till I check it ;)

kind regards
Grzegorz Grzybek
Re: A Maven extension for dependency tracking
Howdy,

https://github.com/apache/maven-resolver/pull/176

So here is a "demo" implementation (that could be made into an extension point), as explained in the draft PR description.

BUT, as also written in the PR, I am getting a feeling that doing this is "dangerous", and that a simple callback with the whole collected graph would be better.

WDYT?

Tamas

On Mon, May 2, 2022 at 4:18 PM Tamás Cservenák wrote:
> Howdy,
>
> just a few short answers:
> [...]
Re: A Maven extension for dependency tracking
Howdy,

just a few short answers:
- 1st: Personally, from a Resolver perspective, I'd just provide an API (basically what the author extending resolver should implement) and make it simple to "click in" (Sisu component discovery).
- 2nd: resolver IMHO should not provide any out-of-the-box component implementation at all.

So the 1st would provide a "stable" extension point for users who would like to "integrate" with resolver at this point (like you did), but it would become possible using simply this new API, instead of the hoops and loops your code was forced to go through (as resolver is quite "closed" in this respect).

As for the 2nd point, while I do like your idea of "decorating" the local repository, I'd try a slightly different route: I'd integrate https://github.com/lambdazen/bitsy, which makes it possible to use Apache TinkerPop's Gremlin queries to ask about the built graph, for example...

And one big remark: the collector is the "hottest point" in resolver (heap- and CPU-wise), so ANY "new API" implementation should be aware that each "lost" millisecond directly affects resolver collection speed. But I think that for "research kind" of stuff, just "recording the process result" should fit in just fine. I don't see this as a "standard" feature of Maven, but who knows? :)

Just my 5 cents...

HTH
Tamas

On Mon, May 2, 2022 at 4:09 PM Grzegorz Grzybek wrote:
> Thank you Tamás for checking my experiment
> [...]
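A minimal sketch of the "each lost millisecond" remark above: do nothing on the hot path except copy the current path, and leave formatting and writing until collection is over. Everything here is an assumption for illustration: `PathRecorder`, `onStep` and `render` are hypothetical names, not the resolver API, and paths are modelled as plain lists of coordinate strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical recorder; NOT part of the resolver API.
final class PathRecorder {

    // Thread-safe queue: several collector threads may record concurrently.
    private final Queue<List<String>> snapshots = new ConcurrentLinkedQueue<>();

    /** Hot path: take an immutable copy of the current root-first path and return. */
    void onStep(List<String> currentPath) {
        snapshots.add(Collections.unmodifiableList(new ArrayList<>(currentPath)));
    }

    /** Cold path, run once after collection: render each snapshot leaf-first. */
    List<String> render() {
        List<String> out = new ArrayList<>();
        for (List<String> path : snapshots) {
            StringBuilder sb = new StringBuilder();
            for (int i = path.size() - 1, depth = 0; i >= 0; i--, depth++) {
                if (depth > 0) {
                    sb.append('\n');
                    for (int s = 0; s < depth; s++) {
                        sb.append(' ');
                    }
                    sb.append("-> ");
                }
                sb.append(path.get(i));
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        PathRecorder recorder = new PathRecorder();
        List<String> path = new ArrayList<>(Arrays.asList(
                "org.apache.maven.plugins:maven-site-plugin:jar:3.11.0",
                "org.apache.maven.doxia:doxia-site-renderer:jar:1.11.1"));
        recorder.onStep(path);
        path.clear(); // the collector moves on; the snapshot is unaffected
        recorder.render().forEach(System.out::println);
    }
}
```

The copy in `onStep` is the only per-step cost in this sketch. It is also what keeps a snapshot valid once the caller goes on mutating its own list.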
Re: A Maven extension for dependency tracking
Thank you Tamás for checking my experiment.

I'm just finishing my work before tomorrow's national holiday, but will read your information more carefully soon.

Whether it's DFS or BFS, as long as there's tracking from the initial to the ultimate dependency, it's enough. DFS sounds more "natural" here though. I didn't check the CollectResult class yet - is it created per dependency or for the entire project?

And yes - I didn't check multithreading, as in the normal scenario (just `mvn clean install`) I didn't observe concurrency issues accessing the stack. Mind that I know a bit about Maven "components", but there are definitely a few missing things in my understanding.

Checking your output, I see there are two aspects of this potential enhancement to the resolver:
- 1st - how to effectively collect the "reverse dependency tree" in the context of DFS/BFS/multithreading
- 2nd - how to write the information

The 2nd aspect could include:
- whether there should be ".tracking" for each GAV directory in the local repo (tracking for the purpose of the entire local repository)
- maybe there should be a configurable output location for a single report of a build? (tracking for the purpose of a single project)
- which format to use (human consumable or machine readable?)

For now I've used resolver 1.6.3 from Maven 3.8.5, but I'll look at the `main` branch too.

kind regards
Grzegorz Grzybek

On Mon, May 2, 2022 at 3:57 PM Tamás Cservenák wrote:
> What I missed to mention: in my case the trees in the gist are about
> "resolving maven-core 3.5.8", but I guess you figured it out from the
> tree
> [...]
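For the per-GAV ".tracking" option, here is a rough sketch of how the file location could be derived, using the "groupId_artifactId_type_classifier_version.dep" pattern from the original post. The class and method names are made up for illustration. Two things are assumptions: that the standard local repository layout (groupId dots turned into directories) applies, and that an empty classifier is simply skipped in the file name, as the example file name in the thread suggests.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; names are illustrative and not taken from the extension.
final class TrackingPaths {

    /** Builds "groupId_artifactId_type[_classifier]_version.dep"; an empty classifier is skipped. */
    static String fileName(String groupId, String artifactId, String type,
                           String classifier, String version) {
        List<String> parts = new ArrayList<>();
        parts.add(groupId);
        parts.add(artifactId);
        parts.add(type);
        if (classifier != null && !classifier.isEmpty()) {
            parts.add(classifier);
        }
        parts.add(version);
        return String.join("_", parts) + ".dep";
    }

    /** Directory of an artifact version, assuming the default local repository layout. */
    static Path artifactDir(Path localRepo, String groupId, String artifactId, String version) {
        return localRepo.resolve(groupId.replace('.', '/')).resolve(artifactId).resolve(version);
    }

    /** Tracking file for a resolved dependency, named after the artifact that pulled it in. */
    static Path trackingFile(Path localRepo, String groupId, String artifactId, String version,
                             String dependentFileName) {
        return artifactDir(localRepo, groupId, artifactId, version)
                .resolve(".tracking")
                .resolve(dependentFileName);
    }

    public static void main(String[] args) {
        Path repo = Paths.get(System.getProperty("user.home"), ".m2", "repository");
        String name = fileName("org.apache.maven.plugins", "maven-dependency-plugin",
                "jar", "", "3.1.2");
        System.out.println(trackingFile(repo, "log4j", "log4j", "1.2.12", name));
    }
}
```

A single per-build report would only need a different base directory passed in place of the local repository root. The naming part could stay the same.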
Re: A Maven extension for dependency tracking
What I forgot to mention: in my case the trees in the gist are about "resolving maven-core 3.5.8", but I guess you figured it out from the tree.

T

On Mon, May 2, 2022 at 3:55 PM Tamás Cservenák wrote:
> Howdy,
>
> I did some experiment, that (partially re-using your code to dump the rev
> tree) produces this output:
> https://gist.github.com/cstamas/598a3266f943984442c00df30520294f
> [...]
Re: A Maven extension for dependency tracking
Howdy,

I did an experiment that (partially re-using your code to dump the reverse tree) produces this output:
https://gist.github.com/cstamas/598a3266f943984442c00df30520294f

(note: resolver 1.8.0 has two collector implementations: the original Depth-First and the new Breadth-First, called DF and BF respectively)

The code is not pushed anywhere yet, but I plan to make an API for this, and as you can see, it works for both collector implementations. Also, I hook ONLY into the collector, as that's the place where the graph is being built, but this is logically equivalent to your "More interesting ... 2nd case".

Will ping once again when I have the changes.

Thanks
Tamas

On Thu, Apr 28, 2022 at 9:01 PM Tamás Cservenák wrote:
> Howdy,
>
> This is very cool, I was actually tinkering on very similar issues in
> resolver coming from totally different angles.
> [...]
Re: A Maven extension for dependency tracking
Howdy,

This is very cool, I was actually tinkering with very similar issues in resolver, coming from a totally different angle.

And yes, the resolver collector is not quite "extension" friendly, but we will make it right.
Just FYI, in the latest resolver (1.8.0) there are actually two implementations: depth-first (original) and breadth-first.

Looking at your code: collection is the most critical part of the resolver regarding performance and memory, so "hooking" into it (like sending events for each step) might not be the best, but still, what kind of extension points would you envision in the collector?

For example, to achieve what you want, it would be completely enough to receive the final CollectResult (the full graph), no? As -- from a resolver perspective -- that would be simplest, especially now that we have two collector implementations...

Also, in case of multithreading, your shared stack would not cut it, would it?

I personally was also looking into these, especially after some of the latest additions to resolver in 1.8.0 and current master.

Thanks
T

On Thu, Apr 28, 2022 at 12:45 PM Grzegorz Grzybek wrote:
> Hello
>
> TL;DR: https://github.com/grgrzybek/tracking-maven-extension
> [...]
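To illustrate the "receive the final graph" idea: once the whole graph is available, the reverse dependency paths for any artifact can be recovered with a plain walk after collection, with no shared stack to guard while collecting. This is only a sketch under assumptions. `Node` is a hypothetical stand-in for a graph node (not the resolver's own node type), coordinates are plain strings, and the tiny graph in `main` is made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch; Node stands in for a node of the collected graph.
final class ReversePaths {

    static final class Node {
        final String gav;
        final List<Node> children = new ArrayList<>();

        Node(String gav) {
            this.gav = gav;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Returns every path to the target, each listed leaf-first (target, parent, ..., root). */
    static List<List<String>> pathsTo(Node root, String target) {
        List<List<String>> result = new ArrayList<>();
        walk(root, target, new ArrayDeque<>(), result);
        return result;
    }

    private static void walk(Node node, String target, Deque<Node> stack,
                             List<List<String>> result) {
        if (stack.contains(node)) {
            return; // cycle guard: this node is already on the current path
        }
        stack.push(node);
        if (node.gav.equals(target)) {
            List<String> path = new ArrayList<>();
            for (Node n : stack) {
                path.add(n.gav); // iteration starts at the most recently pushed node
            }
            result.add(path);
        }
        for (Node child : node.children) {
            walk(child, target, stack, result);
        }
        stack.pop();
    }

    public static void main(String[] args) {
        Node log4j = new Node("log4j:log4j:pom:1.2.12");
        Node logging = new Node("commons-logging:commons-logging:jar:1.1").add(log4j);
        Node plugin = new Node("org.apache.maven.plugins:maven-dependency-plugin:jar:3.1.2")
                .add(logging);
        for (List<String> path : pathsTo(plugin, "log4j:log4j:pom:1.2.12")) {
            System.out.println(String.join(" -> ", path));
        }
    }
}
```

The walk uses a stack local to one call, so it does not matter whether the graph was built depth-first, breadth-first, or by several threads.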
A Maven extension for dependency tracking
Hello

TL;DR: https://github.com/grgrzybek/tracking-maven-extension

I'd like to share a proof of concept I made. It all started with a question: "why am I getting log4j:log4j:1.2.12 in my local Maven repository when building a trivial project with a fresh local repo?"

I knew it's possible to `grep -r --include=*.pom 1.2.12` for the poms that declare old log4j, but I needed something better.

In short - I managed to persist the information available in the org.eclipse.aether.internal.impl.collect.DefaultDependencyCollector.Args#nodes stack.
I wrote a Maven extension that can be put into $MAVEN_HOME/lib/ext or used with "-Dmaven.ext.class.path", which does two things:

   1. adds an org.eclipse.aether.RepositoryListener component that writes some information when a dependency is FIRST downloaded from a remote repository
   2. adds an org.eclipse.aether.impl.DependencyCollector component (an extension of org.eclipse.aether.internal.impl.collect.DefaultDependencyCollector) that writes some information when a dependency is resolved against the local repository when it's already there (where no download is needed)

In the first case, I write something like this:

    ~~~
    Downloaded artifact log4j:log4j:pom::1.2.12 (repository: central (https://repo.maven.apache.org/maven2, default, releases))
       -> commons-logging:commons-logging:jar:1.1 (compile) (context: plugin)
          -> commons-digester:commons-digester:jar:1.8 (compile) (context: plugin)
             -> org.apache.velocity:velocity-tools:jar:2.0 (compile) (context: plugin)
                -> org.apache.maven.doxia:doxia-site-renderer:jar:1.11.1 (compile) (context: plugin)
                   -> org.apache.maven.plugins:maven-site-plugin:jar:3.11.0 () (context: plugin)
       Reading descriptor for artifact log4j:log4j:jar::1.2.12 (context: plugin) (scope: ?) (repository: central (https://repo.maven.apache.org/maven2, default, releases))
       Transitive dependencies collection for org.apache.maven.plugins:maven-site-plugin:jar:3.11.0 ()
       Resolution of plugin org.apache.maven.plugins:maven-site-plugin:3.11.0 (org.apache:apache:25)
    ~~~
    Downloaded artifact log4j:log4j:jar::1.2.12 (repository: central (https://repo.maven.apache.org/maven2, default, releases))
       Resolution of plugin com.mycila:license-maven-plugin:3.0 (org.apache.camel:camel-buildtools:3.17.0-SNAPSHOT)

I simply write some information from the available org.eclipse.aether.RepositoryEvent and the event's org.eclipse.aether.RequestTrace.

More interesting information is written in the 2nd case. Because I wanted to track ALL attempts to resolve log4j:log4j:1.2.12 (and any other dependency), I needed some structure. And I decided on this:

   - every dependency directory (where e.g. _remote.repositories is written along with the jar/pom/sha1/md5/...) gets a ".tracking" directory
   - in the ".tracking" directory I write files with names following this pattern: "groupId_artifactId_type_classifier_version.dep", e.g., org.apache.maven.plugins_maven-dependency-plugin_jar_3.1.2.dep
   - each such file contains a _reverse dependency tree_ that shows me why the given dependency was resolved.

For example, take ~/.m2/repository/log4j/log4j/1.2.12/.tracking/org.apache.maven.plugins_maven-dependency-plugin_jar_3.1.2.dep (the path itself already contains the information that org.apache.maven.plugins:maven-dependency-plugin:3.1.2 depends (directly or indirectly) on log4j:log4j:1.2.12).
The content of this file is:

    log4j:log4j:pom:1.2.12
     -> commons-logging:commons-logging:jar:1.1 (compile) (context: plugin)
      -> commons-digester:commons-digester:jar:1.8 (compile) (context: plugin)
       -> org.apache.velocity:velocity-tools:jar:2.0 (compile) (context: plugin)
        -> org.apache.maven.doxia:doxia-site-renderer:jar:1.7.4 (compile) (context: plugin)
         -> org.apache.maven.reporting:maven-reporting-impl:jar:3.0.0 (compile) (context: plugin)
          -> org.apache.maven.plugins:maven-dependency-plugin:jar:3.1.2 () (context: plugin)

It's kind of obvious - dependency-plugin, through maven-reporting-impl, doxia, velocity, commons-digester and commons-logging, "depends" on the malicious log4j:1.2.12 library every security scanner screams about.

Since I wrote this extension, I keep it in my $MAVEN_HOME/lib/ext and build everything at work with it. Now I know why my ~/.m2/repository/org/codehaus/plexus/plexus-utils/ directory contains 57 different versions of plexus-utils, for example.

For example, why 1.0.4 from 2005?

    org.codehaus.plexus:plexus-utils:pom:1.0.4
     -> org.codehaus.plexus:plexus-container-default:jar:1.0-alpha-9-stable-1 (compile) (context: plugin)
      -> org.codehaus.plexus:plexus-velocity:jar:1.2 (compile) (context: plugin)
       -> org.apache.maven.doxia:doxia-site-renderer:jar:1.11.1 (compile) (context: plugin)
        -> org.apache.maven.plugins:maven-javadoc-plugin:jar:3.3.2 () (context: plugin)

Why Guava 10.0.1?

    com.google.guava:guava:pom:10.0.1
     -> org.eclipse.sisu:org.eclipse.sisu.plexus:jar:0.0.0.M5 (compile) (context: plugin)
      ->