On Mon, 10 Jun 2024 14:39:01 -0500
Alberto Perez Bogantes via tcpdump-workers
<tcpdump-workers@lists.tcpdump.org> wrote:

> We believe that this functionality is well suited for tcpdump because
> much of the logic used to print an IP address for a specific packet
> can be reused to access that IP and anonymize it. The logic for
> dissecting packet headers can be slightly adapted to implement this
> feature, including anonymization of application headers. For example,
> much of the code written to print an IP address offered by DHCP can
> be used to access that address and anonymize it.

Better late than never.  Nik Sultana discussed this feature with me in
April.  Whilst trying to explain difficulties of the earlier pull
request 615, I (rather unexpectedly for myself) came to the same point
of view as above.  Let me paste a copy of my off-list message to
clarify:

--------8<--------8<--------8<--------8<--------8<--------8<--------
From a high-level perspective, anonymisation is a feature with sound
practical use cases.  To that end, several pieces of software exist
that can anonymise an existing .pcap file.  That's a pretty
straightforward part of the problem.  The complicated part of the
problem is whether this feature should be implemented in tcpdump code.

On one hand, this would complicate the code base and add long-term
maintenance burden, so there is an argument that this should be done
entirely in an external program, same as savefile compression, ideally
using something as generic as a popen() call.  A good example of such
a design would be gzip/bzip2/compress/xz support in tar: it does not
implement compression itself, but allows an easy interface with an
external compression program, even a custom one.  Also, such an
approach would not require implementing the anonymisation code in C.
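
To make the popen() idea concrete, here is a minimal sketch of piping
a buffer through an external filter program.  The helper name
write_through_filter() and the filter command are hypothetical; this is
not existing tcpdump code, only an illustration of how little the
calling program needs to know about the filter.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: feed len bytes from buf to the standard input of
 * an external command.  Returns the number of bytes written, or -1 if
 * the command could not be started, the write was short, or the command
 * exited with a non-zero status.
 */
static long
write_through_filter(const char *command, const void *buf, size_t len)
{
    assert(command != NULL);

    /* "w" means we write and the child process reads on its stdin. */
    FILE *filter = popen(command, "w");
    if (filter == NULL)
        return -1;

    size_t written = fwrite(buf, 1, len, filter);

    /* pclose() flushes, waits for the child and returns its status. */
    int status = pclose(filter);
    if (written != len || status != 0)
        return -1;
    return (long)written;
}
```

A caller could pass any command line here, for example a custom
anonymiser script, in the same way tar accepts a custom compression
program.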

On the other hand, such an external program would have to handle various
link-level types and protocol headers in order to modify the L2/L3
addresses, which would be a duplication of the work already done in
tcpdump DLT and protocol dissectors.  To add to that, a compressor
program does not have to understand the internal structure of a tar
archive, so the problem space is not exactly the same and what is the
best fit for tar is not necessarily the best fit for tcpdump.
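
As a rough illustration of that duplication, the sketch below rewrites
the IPv4 addresses of a single Ethernet frame in place.  Everything in
it is an assumption made for the example: the function names are
hypothetical, the XOR mapping is only a placeholder (it is not
Crypto-PAn and is not a sound anonymisation method), and only untagged
Ethernet with IPv4 is handled.  VLAN tags, other link-layer types, IPv6
and the TCP/UDP checksums that cover the addresses are all left out,
which is the kind of work the tcpdump dissectors already know about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETHER_HDR_LEN   14
#define ETHERTYPE_IPV4  0x0800
#define IPV4_MIN_HDRLEN 20

/* One's complement sum over the IPv4 header (length is a multiple of 4). */
static uint16_t
ipv4_header_checksum(const uint8_t *hdr, size_t hdr_len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < hdr_len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * Hypothetical example: XOR the source and destination IPv4 addresses
 * with key (placeholder mapping only) and fix the header checksum.
 * Returns 1 if the frame was modified, 0 if it was left alone.
 */
static int
anonymise_ethernet_ipv4(uint8_t *frame, size_t len, uint32_t key)
{
    assert(frame != NULL);

    if (len < ETHER_HDR_LEN + IPV4_MIN_HDRLEN)
        return 0;
    if (((frame[12] << 8) | frame[13]) != ETHERTYPE_IPV4)
        return 0;

    uint8_t *ip = frame + ETHER_HDR_LEN;
    if ((ip[0] >> 4) != 4)
        return 0;
    size_t ihl = (size_t)(ip[0] & 0x0f) * 4;
    if (ihl < IPV4_MIN_HDRLEN || len < ETHER_HDR_LEN + ihl)
        return 0;

    /* Source address is at offset 12, destination at offset 16. */
    for (size_t i = 0; i < 4; i++) {
        uint8_t k = (uint8_t)(key >> (24 - 8 * i));
        ip[12 + i] ^= k;
        ip[16 + i] ^= k;
    }

    /* The header checksum covers the addresses, so recompute it. */
    ip[10] = 0;
    ip[11] = 0;
    uint16_t csum = ipv4_header_checksum(ip, ihl);
    ip[10] = (uint8_t)(csum >> 8);
    ip[11] = (uint8_t)(csum & 0xff);
    return 1;
}
```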

In the earlier tcpdump pull request 615 it was not immediately obvious
which approach would be better, so it ended up as a discussion without
a conclusion, and then real life and COVID overwhelmed the matter.  It
seems to me, if the Crypto-PAn method becomes a part of tcpdump code,
it would be fair to allow more than one method in principle and to
parameterise tcpdump accordingly, for example:

--anon-method=cryptopan --anon-parameters=<...>

Then the older implementation would have a way to be integrated too,
if anyone wants to revive it.
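
A minimal sketch of how such parameterisation might be parsed with
getopt_long() follows.  The two option names are the ones proposed
above and do not exist in tcpdump today; struct anon_config,
parse_anon_options() and the table of known methods are hypothetical
names made up for the example.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>
#include <string.h>

/* Values above the char range, so they cannot clash with short options. */
#define OPTION_ANON_METHOD     256
#define OPTION_ANON_PARAMETERS 257

struct anon_config {
    const char *method;      /* e.g. "cryptopan", NULL if not requested */
    const char *parameters;  /* method-specific string, NULL if absent */
};

/* Hypothetical list of methods; more entries could be added later. */
static const char *const known_methods[] = { "cryptopan", NULL };

/* Returns 0 on success, -1 on an unknown option or unknown method. */
static int
parse_anon_options(int argc, char **argv, struct anon_config *cfg)
{
    static const struct option longopts[] = {
        { "anon-method", required_argument, NULL, OPTION_ANON_METHOD },
        { "anon-parameters", required_argument, NULL,
          OPTION_ANON_PARAMETERS },
        { NULL, 0, NULL, 0 }
    };
    int opt;

    assert(cfg != NULL);
    cfg->method = NULL;
    cfg->parameters = NULL;
    optind = 1;  /* start scanning from the first argument */

    while ((opt = getopt_long(argc, argv, "", longopts, NULL)) != -1) {
        switch (opt) {
        case OPTION_ANON_METHOD:
            cfg->method = optarg;
            break;
        case OPTION_ANON_PARAMETERS:
            cfg->parameters = optarg;
            break;
        default:
            return -1;
        }
    }

    if (cfg->method == NULL)
        return 0;  /* anonymisation not requested */
    for (size_t i = 0; known_methods[i] != NULL; i++)
        if (strcmp(cfg->method, known_methods[i]) == 0)
            return 0;
    return -1;  /* method name not in the table */
}
```

The point of the table is that a second method would only need a new
entry and its own interpretation of the parameters string.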

I guess the best next step would be to raise the problem statement and
the current "definition of done" on tcpdump-workers and to see if
anybody else can improve it.
--------8<--------8<--------8<--------8<--------8<--------8<--------

So there is no reason not to make this a pull request and to hope that
it will make progress (be patient though).  With this in mind, I looked
very briefly at the branch with the new feature, and it would be
helpful to make some generic cleanups because the same points would
come up on a pull request:

* Use one author e-mail for all commits.
* Make the number of commits right (squash the "fix this" commits, make
  a set of logically unified changes in one complete commit).
* Add a change log entry and update the man page.
* Run ./build_matrix.sh and fix any warnings/errors (currently the
  build fails with CMake with "libcryptopANT header not found";
  obviously, not all build environments have the library, so it should
  be opt-in and later maybe auto-detected).

-- 
    Denis Ovsienko