Hi Lukas,

>> It currently detects mails with identical subjects (after prefixes are
>> removed) within a 180 day window. This is not a very sophisticated
>> matching system, but given that it's an API client and not in the core,
>> I'm much happier to experiment and build up sophistication as and when
>> it's needed.
>>
>
> Simplicity is certainly valuable.
>
> Ralf and I envisioned the much more sophisticated algorithm for
> similar patch detection (from pasta, https://github.com/lfd/PaStA)
> integrated into a workflow with patchwork.
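
For concreteness, the subject matching described in my quoted text above
amounts to roughly the following. This is only a sketch and not the
actual pw-pared code: the prefix pattern, the function names and the
(id, subject, date) tuple layout are all assumptions made up for
illustration, and I am reading "within a 180 day window" as "the two
mails are at most 180 days apart".

```python
import re
from datetime import datetime, timedelta

# Assumed prefix pattern (hypothetical): leading "Re:"/"Fwd:" markers and
# bracketed tags such as "[PATCH v2 1/3]".
_PREFIX = re.compile(r"^\s*(?:(?:re|fwd?):\s*|\[[^\]]*\]\s*)+", re.IGNORECASE)

# Window taken from the description: 180 days.
WINDOW = timedelta(days=180)


def normalise_subject(subject):
    """Strip leading reply markers and bracketed tags from a subject."""
    return _PREFIX.sub("", subject).strip()


def find_related(patches):
    """Return (earlier_id, later_id) pairs with identical normalised subjects.

    `patches` is an iterable of (patch_id, subject, date) tuples; this
    layout is an assumption of the sketch, not the Patchwork data model.
    """
    by_subject = {}
    for patch_id, subject, date in patches:
        key = normalise_subject(subject)
        by_subject.setdefault(key, []).append((date, patch_id))

    related = set()
    for entries in by_subject.values():
        entries.sort()  # oldest first
        for i, (first_date, first_id) in enumerate(entries):
            for later_date, later_id in entries[i + 1:]:
                if later_date - first_date > WINDOW:
                    break  # everything after this is even further away
                related.add((first_id, later_id))
    return related
```

Feeding it a v1 and a v2 of the same patch sent a few weeks apart
yields one pair; a mail with the same subject from a year earlier is
left out because it falls outside the window.
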

Yes, PaStA was definitely something I had in mind when I wrote this. The
capitalisation in PaReD is a small homage to PaStA.

I have also been pointed at this abandoned attempt to get similar
functionality into Gerrit:
https://gerrit-review.googlesource.com/c/gerrit/+/91253

I am very open to moving in that direction if it turns out that more
detection at that level of sophistication is required to get acceptable
accuracy.

> Daniel, you have seen the small steps we have taken:
>
> - Mete (an intern at BMW, my employer at the time) implemented the
>   "related patches" feature for patchwork in 2019.
> - Rohit (a Google Summer of Code student in 2020, mentored by Ralf and
>   me) implemented an "export, compute, import" toolchain between
>   patchwork and pasta, some more details are described in
>   https://github.com/lfd/PaStA/blob/master/documentation/pasta-patchwork.md.
>
> Unfortunately, IMHO, we hit two challenging implementation tasks with this
> work:
> 1. Performance issue computing relations with pasta
> 2. The lack of being able to limit the computation to new incoming
>    patches: pasta was designed as a run-once off-line analysis tool, not
>    as a continuously running online analysis; changing that is possible,
>    but touches on various internal aspects throughout the whole tool.
>
> At that point, we have not continued the work yet and I personally
> believe that exploring simpler solutions than the complex pasta
> heuristics is worth a try (even if just to save power consumption of
> servers in the long run...).
>
> For completeness, I need to mention that Konstantin's b4 tool also
> detects the "latest patch series" when you ask it to pick a patch
> series from a kernel mailing list. I do not know how it determines
> that (and I hope that Konstantin can comment here), but it is probably
> also a simple heuristic searching for similar/same subject lines of
> the patch series cover letter.
> It would be nice if that functionality
> could be invoked as some kind of library function/separate client tool
> for patchwork as well.
>
> I hope that others can also come up with simple PaReD variants, such
> as parsing lore.kernel.org Links in the 'patch comment section' (so
> below the "---"), as once named the best way for developers to refer
> to previous versions in a ksummit-discuss email thread. I always hope
> that once a tool provides a significant benefit for tracking and
> managing previous versions, more developers pick up the needed
> conventions that patches would need to follow to benefit from such a
> tool.

I hope so too! I have been pleased by the proliferation of checks across
kernel.org's patchwork; I hope this will be the next thing to spread!

>> You can get the code at https://github.com/daxtens/pw-pared . I'm using
>> the same license as Patchwork, for a number of reasons, but in part
>> because we may one day want to migrate the functionality into the
>> patchwork core. Patches are welcome.
>>
>> You can see some examples of where PaReD has set up meaningful relations
>> at:
>>
>> - https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210802073929.907431-2-kj...@linux.ibm.com/
>> - https://patchwork.ozlabs.org/project/patchwork/patch/20210823182833.3976100-6-ra...@google.com/
>>
>> Some very obvious things that doing this has exposed:
>>
>> - the relations display should show the status of each related patch
>>   (e.g. New, Superseded, Accepted)
>>
>> - Series relations would make a lot of sense - probably even more sense
>>   from a human point of view - and we should probably build those at
>>   some point.
>>
>
> Agree. This is something Ralf, Mete, Rohit and I discussed as well.
>
> Extending a patch relation to a patch series relation is conceptually simple:
>
> If two patch series S1 and S2 with patches p1, ..., pn in series S1
> and patches r1, ..., rm in series S2 share a critical amount of
> related patches, i.e., for a large set of pairs of indices (i, j) in
> I: pi and rj are related to each other, then the series S1 and S2 are
> related to each other. Further, one could come up with a separate
> similarity relation among cover letters, and weigh that into the
> measure for related patch series. Fine-tune the weights and
> thresholds, evaluate it on a representative dataset and you are
> done... Conceptually clear, but this involves quite some work.

As with patch relations, I think we'd want to start with the
infrastructure and API, although having learned from the experience
with patch relations, I think we'd also want to release a tool that
performs basic detection of series relations at the same time!

>> - PaReD requires an API token for a maintainer account (much like for
>>   pushing checks) which is annoying and one day we should sort out
>>   fine-grained permissions.
>>
>> Ask your patchwork instance admin if a maintainer account for PaReD is
>> right for you!
>>
>
> I am looking forward to more implementations and more instances
> running and trying out this feature.
>
> Daniel, thanks for moving this feature yet a step further.
>

Thanks for the long-running effort you have coordinated to land patch
relations as a feature in Patchwork and extend their capabilities!

Kind regards,
Daniel
_______________________________________________
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork
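
P.S. To make the series-relation idea quoted above a little more
concrete, here is a rough sketch of the "critical amount of related
patches" test. Nothing like this exists in Patchwork or pw-pared today
as far as this thread says: the function name, the 0.5 default threshold
and the choice to measure against the smaller series are all assumptions
for illustration. It also assumes the patch relation is symmetric and
leaves out the cover letter similarity entirely.

```python
def series_related(series_a, series_b, patches_related, threshold=0.5):
    """Decide whether two series are related, given a patch-level relation.

    series_a, series_b: lists of patches (any objects).
    patches_related: callable(p, r) -> bool, assumed to be symmetric.
    threshold: assumed cut-off for the fraction of matched patches.
    """
    if not series_a or not series_b:
        return False

    # Measure against the smaller series so that a short respin of part
    # of a long series can still count as related.
    if len(series_a) <= len(series_b):
        small, large = series_a, series_b
    else:
        small, large = series_b, series_a

    # Count patches in the smaller series that have at least one related
    # patch in the other series.
    matched = sum(
        1 for p in small if any(patches_related(p, r) for r in large)
    )
    return matched / len(small) >= threshold
```

With subjects standing in for patches and plain equality as the patch
relation, series_related(["a", "b", "c"], ["a", "b", "x", "y"],
lambda p, r: p == r) matches two of three patches and so passes the
default threshold. Choosing the real threshold would need the kind of
evaluation on a representative dataset that Lukas mentions.
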