On Mon, 10 Jun 2024 14:39:01 -0500 Alberto Perez Bogantes via tcpdump-workers <tcpdump-workers@lists.tcpdump.org> wrote:
> We believe that this functionality is well suited for tcpdump because
> much of the logic used to print an IP address for a specific packet
> can be reused to access that IP and anonymize it. The logic for
> dissecting packet headers can be slightly adapted to implement this
> feature, including anonymization of application headers. For example,
> much of the code written to print an IP address offered by DHCP can
> be used to access that address and anonymize it.

Better late than never.  Nik Sultana discussed this feature with me in
April.  Whilst trying to explain the difficulties of the earlier pull
request 615, I (rather unexpectedly for myself) came to the same point
of view as above.  Let me paste a copy of my off-list message to
clarify:

--------8<--------8<--------8<--------8<--------8<--------8<--------
From a high-level perspective, anonymisation is a feature with sound
practical use cases.  To that end, several pieces of software exist
that can anonymise an existing .pcap file.  That's a pretty
straightforward part of the problem.

The complicated part of the problem is whether this feature should be
implemented in tcpdump code.  On one hand, this would complicate the
code base and add a long-term maintenance burden, so there is an
argument that this should be done entirely in an external program, same
as savefile compression, ideally using something as generic as a
popen() call.  A good example of such a design would be
gzip/bzip2/compress/xz support in tar: it does not implement
compression itself, but allows an easy interface with an external
compression program, even a custom one.  Also, such an approach would
not require implementing the anonymisation code in C.

On the other hand, such an external program would have to handle
various link-level types and protocol headers in order to modify the
L2/L3 addresses, which would be a duplication of the work already done
in tcpdump DLT and protocol dissectors.

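As a rough illustration of the external-program design mentioned above,
the following C sketch pipes a buffer into an external filter command
using popen().  The helper name write_through_filter() is hypothetical
and is not tcpdump code; it only shows the general shape of delegating
work to another program, under the assumption of a POSIX system with
/bin/sh available.

```c
/* popen()/pclose() are POSIX; request them explicitly for strict modes. */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of tcpdump): write `len` bytes from `buf`
 * to the standard input of the shell command `cmd`.
 * Returns 0 if everything was written and the command exited with
 * status 0, otherwise -1.
 */
static int
write_through_filter(const char *cmd, const void *buf, size_t len)
{
    FILE *pipe_fp;
    int status;

    assert(cmd != NULL && buf != NULL);

    /* Start the external program with its stdin connected to a pipe. */
    pipe_fp = popen(cmd, "w");
    if (pipe_fp == NULL)
        return -1;

    if (fwrite(buf, 1, len, pipe_fp) != len) {
        (void)pclose(pipe_fp);
        return -1;
    }

    /* pclose() waits for the command and returns its wait status. */
    status = pclose(pipe_fp);
    return status == 0 ? 0 : -1;
}
```

For example, write_through_filter("gzip -c > out.gz", buf, len) would
hand the bytes to gzip.  Note that such a filter only sees an opaque
byte stream, which is exactly why the anonymisation case is harder: the
external program would still need to parse the packets itself.
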
To add to that, a compressor program does not have to understand the
internal structure of a tar archive, so the problem space is not
exactly the same, and what is the best fit for tar is not necessarily
the best fit for tcpdump.

In the earlier tcpdump pull request 615 it was not immediately obvious
which approach would be better, so it ended up as a discussion without
a conclusion, and then real life and COVID overwhelmed the matter.

It seems to me that if the Crypto-PAn method becomes a part of tcpdump
code, it would be fair to allow more than one method in principle and
to parameterise tcpdump accordingly, for example:

    --anon-method=cryptopan --anon-parameters=<...>

Then the older implementation would have a way in too, if anyone wants
to revive it.

I guess the best next step would be to raise the problem statement and
the current "definition of done" on tcpdump-workers and to see if
anybody else can improve it.
--------8<--------8<--------8<--------8<--------8<--------8<--------

So there is no reason not to make this a pull request and to hope that
it will progress (be patient though).  With this in mind, I looked very
briefly at the branch with the new feature, and it would be helpful to
make some generic cleanups because the same points would come up on a
pull request:

* Use one author e-mail for all commits.
* Make the number of commits right (squash the "fix this" commits, make
  each set of logically unified changes one complete commit).
* Add a change log entry and update the man page.
* Run ./build_matrix.sh and fix any warnings/errors (currently the
  CMake build fails with "libcryptopANT header not found"; obviously,
  not all build environments have the library, so it should be opt-in
  and later maybe auto-detected).

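To make the --anon-method / --anon-parameters idea from the pasted
message a bit more concrete, here is a minimal C sketch of how a method
name could select an entry from a table of anonymisation methods.  All
names in it (struct anon_method, find_anon_method(),
parse_anon_options(), the parameter check) are hypothetical; neither
option exists in tcpdump today, and the parameter format is left
undefined on purpose.  It assumes a system that provides
getopt_long().

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical method descriptor; nothing here exists in tcpdump. */
struct anon_method {
    const char *name;
    /* Returns 0 if the method-specific parameter string is acceptable. */
    int (*check_params)(const char *params);
};

static int
cryptopan_check_params(const char *params)
{
    /* Placeholder: a real method would parse its parameters here. */
    return (params != NULL && params[0] != '\0') ? 0 : -1;
}

static const struct anon_method anon_methods[] = {
    { "cryptopan", cryptopan_check_params },
    /* Other methods would be registered here. */
};

/* Look a method up by name; returns NULL if it is not registered. */
static const struct anon_method *
find_anon_method(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(anon_methods) / sizeof(anon_methods[0]); i++)
        if (strcmp(anon_methods[i].name, name) == 0)
            return &anon_methods[i];
    return NULL;
}

/*
 * Parse the two proposed long options.  Returns the selected method, or
 * NULL if no method was given, the method is unknown, or its parameters
 * were rejected.  On success *params_out points at the parameter string.
 */
static const struct anon_method *
parse_anon_options(int argc, char *const argv[], const char **params_out)
{
    enum { OPT_ANON_METHOD = 256, OPT_ANON_PARAMETERS };
    static const struct option longopts[] = {
        { "anon-method", required_argument, NULL, OPT_ANON_METHOD },
        { "anon-parameters", required_argument, NULL, OPT_ANON_PARAMETERS },
        { NULL, 0, NULL, 0 }
    };
    const char *method_name = NULL;
    const char *params = NULL;
    const struct anon_method *method;
    int c;

    assert(params_out != NULL);

    while ((c = getopt_long(argc, argv, "", longopts, NULL)) != -1) {
        switch (c) {
        case OPT_ANON_METHOD:
            method_name = optarg;
            break;
        case OPT_ANON_PARAMETERS:
            params = optarg;
            break;
        default:
            return NULL;    /* unknown option or missing argument */
        }
    }
    if (method_name == NULL)
        return NULL;
    method = find_anon_method(method_name);
    if (method == NULL || method->check_params(params) != 0)
        return NULL;
    *params_out = params;
    return method;
}
```

With a table like this, reviving an older implementation would amount
to adding one more entry rather than adding another set of
command-line options.
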
-- 
Denis Ovsienko