Re: [tcpdump-workers] endianness of portable BPF bytecode

2022-06-10 Thread Denis Ovsienko via tcpdump-workers
--- Begin Message ---
On Fri, 10 Jun 2022 14:26:34 -0700
Guy Harris wrote:

> On Jun 10, 2022, at 1:59 PM, Denis Ovsienko via tcpdump-workers
> wrote:
> 
> > Below is a draft of such a file format.  It addresses the following
> > needs:
> > * There is a header with a signature string to avoid false positive
> >  detection as some other file type that begins exactly with
> > particular bytecode (ran into this during disassembly experiments).
> > * There are version fields to address possible future changes to the
> >  encoding (either backward-compatible or not).  
> 
> Is the idea that a change that's backward-compatible (so that code
> that handles the new format needs no changes to handle the old
> format, but code that handles only the old format can't handle the
> new format) would involve a change to the minor version number, but a
> change that's not backward-compatible (so that to handle both
> versions would require two code paths for the two versions) would
> involve a change to the major version number?

Yes, more or less.  The draft format had a couple more fields not long
ago; with those, the version semantics seemed more apparent (for a while
I thought Linux kernel cBPF was a superset of libpcap cBPF, but upon
closer inspection of the header files they seem identical).

In any case, the idea is that it is very convenient to be able to bump
the major version and redefine everything beyond the signature and the
version fields.  Forward- and backward-compatibility between minor
versions can be considered now.
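
A minimal sketch of how a reader might apply that versioning rule.  The
function name, the return strings and the "supported" constants are
hypothetical, and the handling of a newer minor version follows the
reading quoted above (old code cannot handle a newer format); the draft
itself leaves minor-version compatibility open.

```python
# Versions this hypothetical reader implements (illustrative values only).
SUPPORTED_MAJOR = 0
SUPPORTED_MINOR = 0


def check_version(major: int, minor: int) -> str:
    """Classify a file's version fields against what this reader supports."""
    if major != SUPPORTED_MAJOR:
        # A different major version may redefine everything after the
        # signature and the version fields, so nothing else can be trusted.
        return "reject"
    if minor > SUPPORTED_MINOR:
        # Same major, newer minor: assumed here to be unreadable by old code.
        return "newer-minor"
    # Same major, same or older minor: new code handles old files unchanged.
    return "ok"


if __name__ == "__main__":
    for version in [(0, 0), (0, 1), (1, 0)]:
        print(version, check_version(*version))
```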

> > File format:
> > 
> >  0                   1                   2                   3
> >  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
> > +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> > |      'c'      |      'B'      |      'P'      |      'F'      |
> > +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> 
> Is the 'c' part of the retronym "cBPF" for the "classic BPF"
> instruction set, as opposed to the eBPF instruction set?  (I didn't
> find any file format for saving eBPF programs, so this format could
> be used for that as well, with the magic number 'e' 'B' 'P' 'F'.)

Yes, it is.  In online documentation "eBPF" seems to clash with "BPF"
a lot, so it seems better to avoid the confusion early.

As it turned out after some research, the nominal binary format for eBPF
is ELF.  This is one of the most useful online documents I found:
https://www.man7.org/linux/man-pages/man8/tc-bpf.8.html
As you can see there and in the references to the Linux kernel
documentation, ELF eBPF seems to cover different bit widths, relocation
types, debug information, lookup maps, multiple executable sections and
whatnot.

However, on the one hand most of these features significantly overshoot
the packet capture problem space, and on the other hand they don't seem
to address the simple practical need to capture the parameters and
context of a cBPF compilation and to reproduce it later.  So I figured
it would be better to leave the eBPF solution space alone and to use a
separate purpose-designed file format for cBPF.

Most of the meta-data TLVs below are meant to help a developer
understand the context and reproduce the compilation.

> > Type=0x02 (LINKTYPE_ID)
> > Length=4
> > Value=  
> 
> This could be 2 bytes long - pcapng limits link-layer types to 16
> bits, and pcap now can use the upper 16 bits of the link-layer type
> field for other purposes.

Fine.

> > Type=0x03 (LINKTYPE_NAME)
> > Length is variable
> > Value=  
> 
> E.g. either its LINKTYPE_xxx name or its DLT_xxx name?

Yes.  The intent is to capture the input to
pcap_datalink_name_to_val() if the latter was involved.

> > Type=0x04 (COMMENT)
> > Length is variable
> > Value=<generating software description>
> 
> "Generating software description" as in the code that generated the
> BPF program?

"libpcap x.y.z", "my script v1.0" or something like that.

> > Type=0x05 (TIMESTAMP)
> > Length=8
> > Value=  
> 

> Is this the time the code was generated?

Yes.

> Is it a 64-bit time_t, or a 32-bit time_t and a 32-bit
> microseconds/nanoseconds value?  I'd recommend the former, unless we
> expect classic BPF to be dead by 2038.

It is the 64-bit integer time.
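
To illustrate why the 64-bit form matters, here is a small sketch that
packs a timestamp as an 8-byte big-endian integer.  Treating the value
as signed seconds since the Unix epoch is an assumption; the thread only
says "64-bit integer time".  The function names are hypothetical.

```python
import struct


def encode_timestamp(seconds: int) -> bytes:
    """Pack seconds since the epoch as a signed 64-bit big-endian value."""
    return struct.pack(">q", seconds)


def decode_timestamp(value: bytes) -> int:
    """Unpack the 8-byte value of a TIMESTAMP TLV."""
    if len(value) != 8:
        raise ValueError("TIMESTAMP value must be exactly 8 bytes")
    return struct.unpack(">q", value)[0]


# 2**31 seconds after the epoch is 2038-01-19T03:14:08Z, the first instant
# that a signed 32-bit time_t cannot represent.
FIRST_UNREPRESENTABLE_32BIT = 2**31


def fits_in_32_bits(seconds: int) -> bool:
    """Return True if the value can be packed as a signed 32-bit integer."""
    try:
        struct.pack(">i", seconds)
    except struct.error:
        return False
    return True
```

With these helpers, fits_in_32_bits(2**31) is False, while the 64-bit
encoding round-trips the same value without loss.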

-- 
Denis Ovsienko
--- End Message ---
___
tcpdump-workers mailing list
tcpdump-workers@lists.tcpdump.org
https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers


Re: [tcpdump-workers] endianness of portable BPF bytecode

2022-06-10 Thread Guy Harris via tcpdump-workers
--- Begin Message ---
On Jun 10, 2022, at 1:59 PM, Denis Ovsienko via tcpdump-workers wrote:

> Below is a draft of such a file format.  It addresses the following
> needs:
> * There is a header with a signature string to avoid false positive
>  detection as some other file type that begins exactly with particular
>  bytecode (ran into this during disassembly experiments).
> * There are version fields to address possible future changes to the
>  encoding (either backward-compatible or not).

Is the idea that a change that's backward-compatible (so that code that handles 
the new format needs no changes to handle the old format, but code that handles 
only the old format can't handle the new format) would involve a change to the 
minor version number, but a change that's not backward-compatible (so that to 
handle both versions would require two code paths for the two versions) would 
involve a change to the major version number?

> File format:
> 
>  0                   1                   2                   3
>  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |      'c'      |      'B'      |      'P'      |      'F'      |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Is the 'c' part of the retronym "cBPF" for the "classic BPF" instruction set, 
as opposed to the eBPF instruction set?  (I didn't find any file format for 
saving eBPF programs, so this format could be used for that as well, with the 
magic number 'e' 'B' 'P' 'F'.)

> Type=0x02 (LINKTYPE_ID)
> Length=4
> Value=

This could be 2 bytes long - pcapng limits link-layer types to 16 bits, and 
pcap now can use the upper 16 bits of the link-layer type field for other 
purposes.

> Type=0x03 (LINKTYPE_NAME)
> Length is variable
> Value=

E.g. either its LINKTYPE_xxx name or its DLT_xxx name?

> Type=0x04 (COMMENT)
> Length is variable
> Value=<generating software description>

"Generating software description" as in the code that generated the BPF program?

> Type=0x05 (TIMESTAMP)
> Length=8
> Value=

Is this the time the code was generated?

Is it a 64-bit time_t, or a 32-bit time_t and a 32-bit microseconds/nanoseconds 
value?  I'd recommend the former, unless we expect classic BPF to be dead by 
2038.
--- End Message ---


Re: [tcpdump-workers] endianness of portable BPF bytecode

2022-06-10 Thread Denis Ovsienko via tcpdump-workers
--- Begin Message ---
On Thu, 2 Jun 2022 20:58:38 +0100
Denis Ovsienko via tcpdump-workers wrote:

> If there is no convention in place yet, I would like to propose
> declaring big-endian as the implicit/default byte order, then
> particular file format(s) with headers can override that as needed.

Below is a draft of such a file format.  It addresses the following
needs:
* There is a header with a signature string to avoid false positive
  detection as some other file type that begins exactly with particular
  bytecode (ran into this during disassembly experiments).
* There are version fields to address possible future changes to the
  encoding (either backward-compatible or not).
* The explicit instruction count enables detection of truncated or
  malformed files on the one hand, and on the other hand it separates
  the bytecode from the optional meta-data.
* The optional meta-data captures some of the bytecode compilation
  context to aid in debugging.

All multiple-byte fields are big-endian.

File format:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      'c'      |      'B'      |      'P'      |      'F'      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      's'      |      'a'      |      'v'      |      'e'      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      'f'      |      'i'      |      'l'      |      'e'      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   MajorVer=0  |    MinorVer   |       InstructionCount=n      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|                         instruction 1                         |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|                         instruction 2                         |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
~                                                               ~
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|                         instruction n                         |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|                  optional trailing TLV space                  |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
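
The layout above could be read roughly as follows.  This is a sketch
under assumptions, not a reference reader: the field widths (8-bit
MajorVer, 8-bit MinorVer, 16-bit InstructionCount, 8 bytes per
instruction) are read off the diagrams, and split_file() is a
hypothetical name.

```python
import struct

SIGNATURE = b"cBPFsavefile"
# Signature, MajorVer, MinorVer, InstructionCount; all big-endian.
HEADER = struct.Struct(">12sBBH")
INSN_SIZE = 8  # assumed: 16-bit opcode, 8-bit jt, 8-bit jf, 32-bit k


def split_file(data: bytes):
    """Return (major, minor, bytecode, tlv_space) or raise ValueError."""
    if len(data) < HEADER.size:
        raise ValueError("truncated header")
    signature, major, minor, count = HEADER.unpack_from(data, 0)
    if signature != SIGNATURE:
        raise ValueError("not a cBPF savefile")
    code_end = HEADER.size + count * INSN_SIZE
    if len(data) < code_end:
        # The explicit instruction count is what makes truncation detectable.
        raise ValueError("truncated bytecode")
    # Whatever follows the last instruction is the optional TLV space.
    return major, minor, data[HEADER.size:code_end], data[code_end:]
```

The instruction count does double duty here: it validates the file
length and marks where the bytecode ends and the meta-data begins.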


Instruction:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             opcode            |       jt      |       jf      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               k                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
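
As an illustration of the instruction encoding, the sketch below packs
and unpacks instructions as big-endian (opcode, jt, jf, k) tuples,
assuming the widths match struct bpf_insn (16, 8, 8 and 32 bits).  The
helper names are hypothetical; the sample program is the familiar
four-instruction filter for "ip" on Ethernet.

```python
import struct

# 16-bit opcode, 8-bit jt, 8-bit jf, 32-bit k, all big-endian.
INSN = struct.Struct(">HBBI")


def pack_program(insns):
    """Serialize a list of (opcode, jt, jf, k) tuples."""
    return b"".join(INSN.pack(*insn) for insn in insns)


def unpack_program(blob: bytes):
    """Parse serialized bytecode back into (opcode, jt, jf, k) tuples."""
    if len(blob) % INSN.size != 0:
        raise ValueError("bytecode length is not a multiple of 8")
    return [INSN.unpack_from(blob, off) for off in range(0, len(blob), INSN.size)]


IP_FILTER = [
    (0x28, 0, 0, 0x0000000C),  # ldh [12]
    (0x15, 0, 1, 0x00000800),  # jeq #0x800, jt 2, jf 3
    (0x06, 0, 0, 0x00040000),  # ret #262144
    (0x06, 0, 0, 0x00000000),  # ret #0
]

if __name__ == "__main__":
    print(pack_program(IP_FILTER).hex())
```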


TLV format:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      Type     |             Length            |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               ~
|                                                               |
~                    Value (0 or more bytes)                    ~
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Length value does not include Type and Length.
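
A sketch of walking the trailing TLV space under that rule.  The 8-bit
Type and 16-bit big-endian Length widths are an assumption read off the
diagram, and iter_tlvs() is a hypothetical name.

```python
import struct

# Assumed widths: 8-bit Type followed by a 16-bit big-endian Length.
TLV_HEADER = struct.Struct(">BH")


def iter_tlvs(space: bytes):
    """Yield (type, value) pairs from the optional trailing TLV space."""
    offset = 0
    while offset < len(space):
        if len(space) - offset < TLV_HEADER.size:
            raise ValueError("truncated TLV header")
        tlv_type, length = TLV_HEADER.unpack_from(space, offset)
        offset += TLV_HEADER.size
        # Length counts only the Value bytes, not Type and Length themselves.
        if len(space) - offset < length:
            raise ValueError("truncated TLV value")
        yield tlv_type, space[offset:offset + length]
        offset += length
```

Because Length excludes the 3-byte header, a reader can skip a TLV of an
unknown Type by advancing exactly Length bytes past the header.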

Type=0x00 (FILTER)
Length is variable
Value=

Type=0x01 (OPTIMIZED)
Length=1
Value= (0 for off, 1 for on)

Type=0x02 (LINKTYPE_ID)
Length=4
Value=

Type=0x03 (LINKTYPE_NAME)
Length is variable
Value=

Type=0x04 (COMMENT)
Length is variable
Value=<generating software description>

Type=0x05 (TIMESTAMP)
Length=8
Value=
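
For illustration, the meta-data TLVs listed above could be emitted as
follows.  Several details are assumptions rather than part of the draft:
the 8-bit Type and 16-bit Length header, ASCII strings without a
terminating NUL, an unsigned 4-byte LINKTYPE_ID and a signed 8-byte
TIMESTAMP in seconds.  The function names and sample values are
hypothetical.

```python
import struct

# Type codes as listed in the draft.
FILTER, OPTIMIZED, LINKTYPE_ID, LINKTYPE_NAME, COMMENT, TIMESTAMP = range(6)


def tlv(tlv_type: int, value: bytes) -> bytes:
    """Encode one TLV; Length covers the Value bytes only."""
    return struct.pack(">BH", tlv_type, len(value)) + value


def build_metadata(filter_text, optimized, linktype, linktype_name,
                   comment, timestamp) -> bytes:
    """Concatenate one TLV of each type into a trailing TLV space."""
    return b"".join([
        tlv(FILTER, filter_text.encode("ascii")),
        tlv(OPTIMIZED, bytes([1 if optimized else 0])),
        tlv(LINKTYPE_ID, struct.pack(">I", linktype)),
        tlv(LINKTYPE_NAME, linktype_name.encode("ascii")),
        tlv(COMMENT, comment.encode("ascii")),
        tlv(TIMESTAMP, struct.pack(">q", timestamp)),
    ])


if __name__ == "__main__":
    meta = build_metadata("ip", True, 1, "EN10MB", "my script v1.0", 1654898400)
    print(meta.hex())
```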

-- 
Denis Ovsienko
--- End Message ---