After an extended period on the sidelines, the
layer 2 filtering project is resubmitting this
PSARC fast track for consideration. No list of
changes relative to the original specification
is provided, as various parts of the design have
been completely revisited, effectively obsoleting
the original specification.

Additional materials (diff-marked man pages)
can be found in the case directory:
gld.7d.txt
hook_nic_event.9s.txt
ipf.4.txt
ipnat.4.txt
net_getlifaddr.9f.txt

Darren


Abstract
========
This case will extend PSARC/2005/334 by adding the ability to intercept
packets at the MAC layer using the PFHooks infrastructure.

This case makes only one change, an addition, to the interfaces that were
committed to by PSARC/2008/219 (see "new NIC event" below for details).

Release Binding
---------------
This case seeks a patch binding.

Introduction
============
The PFHooks project, PSARC/2005/334, provides the ability to intercept
packets at the IP layer by adding hooks into the network stack.

Since its integration, there have been customer requirements for the ability
to intercept packets at the MAC layer as well. This ability is also needed to
enforce security rules for xVM guest domains and exclusive-IP zones.

Goals
-----
This case seeks to meet the following goals:
* provide hooks in the MAC layer that allow consumers to register to
  intercept packets;

* provide the netinfo interface for the MAC layer that gives consumers access
  to interface information, and the ability to inject or emit packets directly;

* modify IPFilter to allow the administrator to specify layer 2 rules, which
  include ethernet filtering rules and IP filtering/NAT rules.

Out of scope
------------
This project only provides the ability to specify ethernet filtering rules
to match ethernet packets, and IP filtering/NAT rules to match/modify IP
packets at the MAC layer. Providing the ability to specify rules to filter
non-ethernet packets by matching the MAC header is out of scope for this
project.

The detailed design of each major component is described below.

Netinfo interface for MAC layer
===============================
netinfo interfaces
------------------
The hooks provided will generate events for NH_PHYSICAL_IN and
NH_PHYSICAL_OUT, using the same interface as IPv4 and IPv6 do in
PSARC/2005/334.

The following functions will be supported through the netinfo(9f) framework:
net_getifname()
net_phylookup()
net_phygetnext()
net_getlifaddr()
net_inject()
net_getmtu()

All of the other functions in the netinfo(9f) framework will return a value
indicating that they are unsupported. The return values of the above
functions only have meaning within the scope of the corresponding family:
for example, it is not correct to take a value returned by net_getifname()
on the ethernet net_data_t handle and pass it to net_phylookup() for IPv4.

The callback for these events will receive a pointer to a hook_pkt_event_t
structure that has the following fields filled out:

hpe_ifp - 0 for NH_PHYSICAL_OUT, otherwise a value indicating which
          interface the NH_PHYSICAL_IN event is associated with;
hpe_ofp - 0 for NH_PHYSICAL_IN, otherwise a value indicating which
          interface the NH_PHYSICAL_OUT event is associated with;
hpe_hdr - points to the start of the MAC header;
hpe_mb  - points to the start of the mblk_t that holds hpe_hdr;
hpe_mp  - points to the mblk_t that is the start of the packet.
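
To make the field semantics concrete, here is a minimal userland sketch of how a callback might derive the direction and interface of a packet from hpe_ifp/hpe_ofp, and read the ethertype at hpe_hdr. It is a model, not kernel code: model_pkt_event_t, phy_if_t as a plain integer, and the helper names are assumptions, and the ethertype helper assumes an Ethernet header that is contiguous in hpe_mb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: interface identifiers are plain integers, 0 meaning "none". */
typedef uintptr_t phy_if_t;

/* Hypothetical userland stand-in for hook_pkt_event_t (fields as above). */
typedef struct model_pkt_event {
	phy_if_t	hpe_ifp;	/* set only for NH_PHYSICAL_IN */
	phy_if_t	hpe_ofp;	/* set only for NH_PHYSICAL_OUT */
	const uint8_t	*hpe_hdr;	/* start of the MAC header */
	void		*hpe_mb;	/* mblk_t holding hpe_hdr (opaque) */
	void		*hpe_mp;	/* first mblk_t of the packet (opaque) */
} model_pkt_event_t;

typedef enum { DIR_IN, DIR_OUT, DIR_INVALID } pkt_dir_t;

/* Exactly one of hpe_ifp/hpe_ofp is non-zero; it names the interface. */
static pkt_dir_t
model_pkt_direction(const model_pkt_event_t *ev, phy_if_t *ifp)
{
	if (ev->hpe_ifp != 0 && ev->hpe_ofp == 0) {
		*ifp = ev->hpe_ifp;
		return (DIR_IN);
	}
	if (ev->hpe_ofp != 0 && ev->hpe_ifp == 0) {
		*ifp = ev->hpe_ofp;
		return (DIR_OUT);
	}
	return (DIR_INVALID);
}

/* Ethernet only: the type field is bytes 12-13, in network byte order. */
static uint16_t
model_ether_type(const model_pkt_event_t *ev)
{
	assert(ev->hpe_hdr != NULL);
	return ((uint16_t)((ev->hpe_hdr[12] << 8) | ev->hpe_hdr[13]));
}
```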

Name to interface resolution
----------------------------
Since Clearview UV, all data link related operations use link names, and
this applies to IPFilter as well. When the administrator wants to specify a
rule that applies to a certain interface, the link name is used to identify
that interface. So the link name constitutes the interface name for the
MAC layer netinfo.

Since layer 2 filtering is based on the MAC client that the Crossbow project
is introducing, this project introduces the MAC client index as the MAC
layer interface pointer, to uniquely identify a MAC layer interface in the
kernel. This is similar to the existing ifindex that is used as the IP layer
interface pointer today.

Netinfo provides functions to translate from an interface name (link name)
to the corresponding interface pointer (MAC client index) and back, via
net_phylookup() and net_getifname(). Because these functions can be called
in the data path, the existing routines such as dls_mgmt_get_linkid() and
dls_mgmt_get_linkinfo() cannot be used, as they involve door calls.
Thus we propose to add a link name <-> link id hash table in dls, and
provide the following routines to translate between link name and MAC name.
The MAC layer netinfo will use these routines to implement the mapping
between link name and MAC client index.

+------------------------------------------------------------+
| Interface                                 | Classification |
|-------------------------------------------+----------------|
| dls_devnet_macname2linkname(const char *, | consolidation  |
|     char *, const size_t);                | private        |
| dls_devnet_linkname2macname(const char *, | consolidation  |
|     char *, const size_t);                | private        |
+------------------------------------------------------------+
     Table: Functions for link name and MAC name mapping
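
A minimal userland sketch of such a table follows, assuming a small fixed-size chained hash that is indexed both by link name and by MAC name, so that a lookup in either direction is a memory-only operation with no door call. The model_* names, table sizes and errno choices are hypothetical, and locking is omitted; only the argument shape mirrors dls_devnet_macname2linkname() and dls_devnet_linkname2macname() above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define	MODEL_NAMELEN	32	/* hypothetical name buffer size */
#define	MODEL_NBUCKETS	16
#define	MODEL_MAXENT	64

typedef struct name_ent {
	char		ne_link[MODEL_NAMELEN];	/* link name, e.g. "net0" */
	char		ne_mac[MODEL_NAMELEN];	/* MAC name, e.g. "bge0" */
	struct name_ent	*ne_link_next;		/* chain in link-name hash */
	struct name_ent	*ne_mac_next;		/* chain in MAC-name hash */
} name_ent_t;

static name_ent_t ents[MODEL_MAXENT];
static size_t nents;
static name_ent_t *by_link[MODEL_NBUCKETS];
static name_ent_t *by_mac[MODEL_NBUCKETS];

/* djb2 string hash reduced to a bucket number. */
static unsigned
hash_name(const char *s)
{
	unsigned h = 5381;

	while (*s != '\0')
		h = h * 33 + (unsigned char)*s++;
	return (h % MODEL_NBUCKETS);
}

/* Add a mapping; in dls this would happen when a link is created. */
static int
model_add(const char *link, const char *mac)
{
	name_ent_t *e;
	unsigned hl, hm;

	if (nents == MODEL_MAXENT || strlen(link) >= MODEL_NAMELEN ||
	    strlen(mac) >= MODEL_NAMELEN)
		return (ENOSPC);
	e = &ents[nents++];
	(void) strcpy(e->ne_link, link);
	(void) strcpy(e->ne_mac, mac);
	hl = hash_name(link);
	hm = hash_name(mac);
	e->ne_link_next = by_link[hl];
	by_link[hl] = e;
	e->ne_mac_next = by_mac[hm];
	by_mac[hm] = e;
	return (0);
}

/* Copy src into a caller buffer of size len; fail rather than truncate. */
static int
copy_out(char *dst, const char *src, const size_t len)
{
	if (strlen(src) >= len)
		return (ENOSPC);
	(void) strcpy(dst, src);
	return (0);
}

static int
model_macname2linkname(const char *mac, char *link, const size_t len)
{
	name_ent_t *e;

	for (e = by_mac[hash_name(mac)]; e != NULL; e = e->ne_mac_next)
		if (strcmp(e->ne_mac, mac) == 0)
			return (copy_out(link, e->ne_link, len));
	return (ENOENT);
}

static int
model_linkname2macname(const char *link, char *mac, const size_t len)
{
	name_ent_t *e;

	for (e = by_link[hash_name(link)]; e != NULL; e = e->ne_link_next)
		if (strcmp(e->ne_link, link) == 0)
			return (copy_out(mac, e->ne_mac, len));
	return (ENOENT);
}
```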

new NIC event
-------------
The status of the network in the operating system often changes, from a
system being temporarily unplugged from the network, to an interface's IP
address changing as a result of DHCP. Thus the PFHooks framework provides an
event notification mechanism for this.

The callback for these events will receive a pointer to a hook_nic_event_t
structure that has the following fields filled out:

hne_protocol - network protocol for events, returned from net_lookup

hne_nic      - physical interface associated with event

hne_lif      - logical interface (if any) associated with event

hne_event    - type of event occurring. The current list of events
               available is:

        NE_PLUMB
               indicates that an interface has just been created

        NE_UNPLUMB
               indicates that an interface has just been destroyed and that
               no more events should be received for it

        NE_UP
               indicates that an interface has changed state to "up" and
               may now generate packet events.

        NE_DOWN
               indicates that an interface has changed state to "down" and
               will no longer generate packet events.

        NE_ADDRESS_CHANGE
               indicates that an address on an interface has changed.

hne_data     - pointer to extra data about event or NULL if none

hne_datalen  - size of data pointed to by hne_data (can be 0)

NE_NAME_CHANGE event
~~~~~~~~~~~~~~~~~~~~
As Clearview UV (PSARC/2006/499, PSARC/2007/527, PSARC/2008/002) introduces
the ability to rename a data link, we need to capture this event in order to
update IPFilter rules accordingly. Thus we propose an extension to
PSARC/2008/219 that adds a new hook event, NE_NAME_CHANGE, to nic_event_t to
indicate that an interface has been renamed. This event is only available to
the layer 2 netinfo. In IP, a change of interface name is represented by an
NE_UNPLUMB and NE_PLUMB event pair.

typedef enum nic_event {
         NE_PLUMB = 1,
         NE_UNPLUMB,
         NE_UP,
         NE_DOWN,
         NE_ADDRESS_CHANGE,
+        NE_NAME_CHANGE
} nic_event_t;
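
Because NE_NAME_CHANGE is appended at the end of the enum, the numeric values of the existing events are unchanged. The sketch below shows how a consumer might dispatch on nic_event_t while ignoring values it does not recognise, so that consumers of the IP layer events are unaffected. The consumer_action_t actions and the mapping from events to actions are hypothetical, not IPFilter's actual handling.

```c
#include <assert.h>

/* nic_event_t as extended by this case (same order as above). */
typedef enum nic_event {
	NE_PLUMB = 1,
	NE_UNPLUMB,
	NE_UP,
	NE_DOWN,
	NE_ADDRESS_CHANGE,
	NE_NAME_CHANGE
} nic_event_t;

/* Hypothetical actions a filtering consumer might take. */
typedef enum {
	ACT_NONE,		/* nothing to do */
	ACT_RESOLVE_RULES,	/* re-resolve interface names in rules */
	ACT_FORGET_IF,		/* drop state held for the interface */
	ACT_REFRESH_ADDRS	/* reload cached interface addresses */
} consumer_action_t;

static consumer_action_t
model_nic_event_action(nic_event_t ev)
{
	switch (ev) {
	case NE_PLUMB:
	case NE_NAME_CHANGE:
		/* a rule's name may now map to a new or different link */
		return (ACT_RESOLVE_RULES);
	case NE_UNPLUMB:
		return (ACT_FORGET_IF);
	case NE_ADDRESS_CHANGE:
		return (ACT_REFRESH_ADDRS);
	case NE_UP:
	case NE_DOWN:
	default:
		/* includes events this consumer does not understand */
		return (ACT_NONE);
	}
}
```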

Design considerations
~~~~~~~~~~~~~~~~~~~~~
IPFilter rules always match by name, and only the current link names are
used for matching, not old names. Upon an NE_NAME_CHANGE event, IPFilter will
walk all the layer 2 rules and resolve the interface name stored in each rule
structure into an interface pointer. So when a link is renamed, rules using
the old link name are invalidated, and rules using the new link name are
activated. If there's a filtering rule that applies to interface bge0, and
someone renames bge0 to net0, then the rule no longer matches packets
received on the link formerly known as bge0.

Also, IPFilter has been designed to allow users to specify rules with
interface names that do not exist at the time the rules are loaded, and for
those interface names to be resolved when the interfaces are added to the
system. Thus, the mapping from the link name to the link id needs to happen
in the kernel. Changing IPFilter to use link ids instead of link names will
not work.
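
The resolve-by-current-name behaviour, including rules that name a link which does not exist yet, can be modelled in a few lines of userland C. This is a sketch under stated assumptions: mlink_t, l2rule_t, index 0 meaning "unresolved", and the linear lookup are hypothetical stand-ins for the kernel's rule structures and net_phylookup().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define	IF_UNRESOLVED	0	/* hypothetical "no such link" index */

typedef struct mlink {
	const char	*l_name;	/* current link name */
	unsigned	l_index;	/* hypothetical MAC client index, never 0 */
} mlink_t;

typedef struct l2rule {
	const char	*r_ifname;	/* name as written in the rule */
	unsigned	r_ifp;		/* resolved index or IF_UNRESOLVED */
} l2rule_t;

/* Look a link up by its current name only; old names never match. */
static unsigned
lookup(const mlink_t *links, size_t nlinks, const char *name)
{
	size_t i;

	for (i = 0; i < nlinks; i++)
		if (strcmp(links[i].l_name, name) == 0)
			return (links[i].l_index);
	return (IF_UNRESOLVED);
}

/* Walk every layer 2 rule and resolve its interface name again. */
static void
resolve_rules(l2rule_t *rules, size_t nrules, const mlink_t *links,
    size_t nlinks)
{
	size_t i;

	for (i = 0; i < nrules; i++)
		rules[i].r_ifp = lookup(links, nlinks, rules[i].r_ifname);
}
```

Running resolve_rules() on every NE_NAME_CHANGE (and on plumb/unplumb) is what turns a rename of bge0 to net0 into "the bge0 rule goes inactive, the net0 rule goes active".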

Protocol & Hook registration
============================
Protocol registration
---------------------
With the IP layer netinfo today we have three protocols: IPv4, IPv6 and ARP.
For the MAC layer, each MAC plugin type is treated as a different protocol,
so we'll have ethernet, wifi and ib. These protocols will be registered using
net_protocol_register() when the corresponding MAC plugin is loaded.

Hook registration
-----------------
IPFilter will register hooks for MAC layer protocols in the following cases:

when the first ethernet filtering rule is added
- register the ethernet hook

when the first "layer2" IP filtering/NAT rule is added
- register the ethernet, wifi and ib hooks

when it receives a notification indicating that a protocol is registered
- register the hook if there are rules for that corresponding protocol.

Since layer 2 filtering functionality is enabled automatically when the
first layer 2 rule is added, the corresponding hook needs to be registered
at that point so that packets can be passed to IPFilter from the hook
framework.

It is possible that a rule for a layer 2 protocol is added before the
corresponding protocol is registered. Suppose the user has added a layer 2
IP filtering rule on a system that only has ethernet cards, and later plugs
a wifi card into the system and sets it up. In this case, when the wifi MAC
plugin is loaded, the protocol is registered, IPFilter is notified via the
callback notification mechanism provided by the PFHooks API project, and
IPFilter registers the hook for that protocol so it can receive and match
wifi packets.
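
The registration rules above, together with the automatic disabling when the last layer 2 rule is removed, can be sketched as a reconciliation function run after every rule change and every protocol-registration notification. The l2state_t structure and the function names are hypothetical; real hook registration would go through the PFHooks API rather than setting a flag.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { PROTO_ETHER, PROTO_WIFI, PROTO_IB, PROTO_COUNT } l2proto_t;

typedef struct l2state {
	unsigned	ether_rules;		/* "family ether" rules loaded */
	unsigned	layer2_ip_rules;	/* "layer2" IP filter/NAT rules */
	bool		proto_registered[PROTO_COUNT];	/* plugin loaded? */
	bool		hooked[PROTO_COUNT];	/* hook currently registered? */
} l2state_t;

/* Does any loaded rule need packets from this protocol? */
static bool
wants_proto(const l2state_t *s, l2proto_t p)
{
	if (s->layer2_ip_rules > 0)
		return (true);	/* "layer2" IP rules need every MAC type */
	return (p == PROTO_ETHER && s->ether_rules > 0);
}

/*
 * A hook can only exist once its protocol is registered, and is kept
 * only while some rule needs it.
 */
static void
sync_hooks(l2state_t *s)
{
	int p;

	for (p = 0; p < PROTO_COUNT; p++)
		s->hooked[p] = s->proto_registered[p] &&
		    wants_proto(s, (l2proto_t)p);
}
```

Usage follows the wifi scenario above: with only the ethernet plugin loaded, adding a "layer2" rule hooks ethernet alone; when the wifi plugin registers later, the next sync_hooks() call hooks wifi as well.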

IPFilter changes
================
Users can use ipf(1M) to add ethernet filtering rules in addition to IP
filtering rules; these ethernet filtering rules are marked with "family
ether". Users can also add IP filtering/NAT rules and mark them with the
"layer2" keyword so that these rules are processed at the MAC layer instead
of the IP layer. Unlike IPv6, no special command line switch is required to
load these rules.

The "layer2" IP filtering/NAT rules go in the existing ipf.conf, ipf6.conf
and ipnat.conf files, as appropriate. The "family ether" rules go in a new
configuration file, ipf-ether.conf.

The layer 2 filtering functionality will be enabled automatically when the
first ethernet rule or "layer2" IPFilter rule is added, and disabled when
the last such rule is removed. This functionality is only available in the
global zone.

Also, ipmon has been updated to print log records with ethernet
information, but the output of this command is Volatile.

Rule processing
---------------
Currently the processing order in IPFilter is:

[INPUT] -> IP NAT -> IP firewall -> { IP }  -> IP firewall -> IP NAT -> [OUTPUT]

With layer 2 filtering the processing order would become:

[INPUT] -> L2 firewall -> "layer2" IP NAT -> "layer2" IP firewall ->
... -> IP NAT -> IP firewall -> { IP }  -> IP firewall -> IP NAT -> ...
-> L2 firewall -> "layer2" IP firewall -> "layer2" IP NAT -> [OUTPUT]
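
One way to encode the two halves of this path as data, so that its ordering properties can be checked mechanically, is sketched below; the stage_t names are hypothetical labels for the boxes in the diagram, not IPFilter identifiers.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
	ST_L2_FW,	/* "family ether" rules */
	ST_L2_NAT,	/* "layer2" IP NAT rules */
	ST_L2_IPFW,	/* "layer2" IP filter rules */
	ST_IP_NAT,	/* IP layer NAT rules */
	ST_IP_FW,	/* IP layer filter rules */
	ST_IP		/* the IP stack itself */
} stage_t;

#define	NSTAGES(a)	(sizeof (a) / sizeof ((a)[0]))

/* Inbound: from the wire up to IP. */
static const stage_t in_path[] = {
	ST_L2_FW, ST_L2_NAT, ST_L2_IPFW, ST_IP_NAT, ST_IP_FW, ST_IP
};

/* Outbound: from IP down to the wire. */
static const stage_t out_path[] = {
	ST_IP, ST_IP_FW, ST_IP_NAT, ST_L2_FW, ST_L2_IPFW, ST_L2_NAT
};

/* Position of a stage in a path, or n if the stage is absent. */
static size_t
stage_pos(const stage_t *path, size_t n, stage_t st)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (path[i] == st)
			return (i);
	return (n);
}
```

In this encoding NAT precedes filtering on input and follows it on output at both layers, and the L2 firewall always runs before the "layer2" IP firewall.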

Input processing
~~~~~~~~~~~~~~~~
Take input processing of an IP packet as an example:

- MAC level filtering rules are processed first. These rules match on MAC
headers to determine whether a packet should be passed or blocked.
Administrators use these rules to match on MAC addresses, MAC type, VLAN ID,
etc.

- L2filter jumps over the MAC header, determines whether this is an IP
packet, and does some sanity checking before passing it up to the "layer2"
IP rules for further processing.

- Then "layer2" IP NAT rules are processed. Like IP layer NAT rules, these
rules do NAT for IP packets, but at the MAC layer instead of the IP layer.

- Then "layer2" IP Filtering rules are processed. These rules provide IP
Filtering at MAC layer.

- L2filter finishes processing and the packet is delivered up in the stack.
When the packet reaches IP, IP layer filtering/NAT processing is invoked,
and it works just as it does today.
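
The second step, skipping the MAC header and checking for IP, might look roughly like the following userland sketch. It assumes an Ethernet frame held in one contiguous buffer, handles at most one 802.1Q tag, and applies only a minimal IPv4/IPv6 header sanity check; the function and type names are hypothetical and the real L2filter checks may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	ETHERTYPE_IP	0x0800
#define	ETHERTYPE_IPV6	0x86dd
#define	ETHERTYPE_VLAN	0x8100
#define	ETHER_HDR_LEN	14
#define	VLAN_TAG_LEN	4

typedef struct l2info {
	uint16_t	type;		/* ethertype after any VLAN tag */
	int		has_vlan;	/* 802.1Q tag present? */
	uint16_t	vlan_id;	/* 12-bit VID when has_vlan */
	size_t		l3off;		/* offset of the IP header */
} l2info_t;

static uint16_t
get16(const uint8_t *p)
{
	return ((uint16_t)((p[0] << 8) | p[1]));
}

/* Return 0 and fill *li for a sane IPv4/IPv6 packet, -1 otherwise. */
static int
l2_find_ip(const uint8_t *frame, size_t len, l2info_t *li)
{
	size_t off = ETHER_HDR_LEN;
	size_t hlen;

	if (len < ETHER_HDR_LEN)
		return (-1);
	li->type = get16(frame + 12);
	li->has_vlan = 0;
	li->vlan_id = 0;
	if (li->type == ETHERTYPE_VLAN) {
		if (len < ETHER_HDR_LEN + VLAN_TAG_LEN)
			return (-1);
		li->has_vlan = 1;
		li->vlan_id = (uint16_t)(get16(frame + 14) & 0x0fff);
		li->type = get16(frame + 16);
		off += VLAN_TAG_LEN;
	}
	li->l3off = off;
	if (li->type == ETHERTYPE_IP) {
		if (len < off + 20 || (frame[off] >> 4) != 4)
			return (-1);
		hlen = (size_t)(frame[off] & 0x0f) * 4;	/* IHL in bytes */
		return ((hlen >= 20 && len >= off + hlen) ? 0 : -1);
	}
	if (li->type == ETHERTYPE_IPV6)
		return ((len >= off + 40 && (frame[off] >> 4) == 6) ? 0 : -1);
	return (-1);	/* not IP: no "layer2" IP processing */
}
```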

Design considerations
~~~~~~~~~~~~~~~~~~~~~
This processing order is designed so that

- the processing order between IP NAT rules and IP filtering rules is
consistent with existing IPFilter today;

- since MAC level filtering rules are processed before "layer2" IP rules,
it is possible down the road to combine filtering at both levels, allowing
or blocking a packet based on a mixture of L2 and L3 criteria and thus
providing more fine-grained control.

Changes to output
-----------------
With layer 2 filtering, each type of rule has its own distinct processing
order, so the output of ipfstat/ipnat has been modified to show the rules in
a way that helps users understand the processing order. The change only
applies to the global zone; output in non-global zones remains unchanged.

Example
~~~~~~~

# ipfstat -io
Ethernet rules:
empty list for ipfilter(out)
pass in family ether all
pass in family ether from 1:2:3:4:5:6 to any

layer 2 IP rules:
empty list for ipfilter(out)
pass in proto icmp from 1.1.1.1 to 2.2.2.2 layer2
block in proto tcp from 3.3.3.3 to 4.4.4.4 layer2

IP rules:
pass in all
pass out all

# ipnat -l
List of layer 2 active MAP/Redirect filters:
map bge1 from 2.3.4.5/32 to 6.7.8.9/32 -> 1.1.2.2/32 layer2

List of active MAP/Redirect filters:

List of active sessions:

Examples
--------
Prevent MAC address spoofing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Suppose we have a domU with an interface vnic0. We may want to ensure that:
- packets from this domU can only use its own source MAC address, preventing
this domU from impersonating others;
- packets with this source MAC address can only come from this domU,
preventing others from impersonating this domU.

Say vnic0 has MAC address 11:22:33:44:55:66; the rules would be something
like:

block out family ether from 11:22:33:44:55:66 to any
block out on vnic0 family ether from any to any
pass out on vnic0 family ether from 11:22:33:44:55:66 to any
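
Assuming IPFilter's usual semantics (the last matching rule wins when no rule uses "quick", and a packet that matches no rule is passed), the three rules combine as in this sketch. MAC addresses are compared as strings purely for brevity, and the erule_t type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct erule {
	bool		pass;	/* pass (true) or block (false) */
	const char	*ifname;	/* NULL means any interface */
	const char	*src;	/* NULL means any source MAC */
} erule_t;

/* The three outbound rules above, in load order. */
static const erule_t rules[] = {
	{ false, NULL, "11:22:33:44:55:66" },
	{ false, "vnic0", NULL },
	{ true, "vnic0", "11:22:33:44:55:66" },
};

static bool
matches(const erule_t *r, const char *ifname, const char *src)
{
	return ((r->ifname == NULL || strcmp(r->ifname, ifname) == 0) &&
	    (r->src == NULL || strcmp(r->src, src) == 0));
}

/* Last matching rule wins; pass if nothing matches. */
static bool
evaluate_out(const char *ifname, const char *src)
{
	bool verdict = true;
	size_t i;

	for (i = 0; i < sizeof (rules) / sizeof (rules[0]); i++)
		if (matches(&rules[i], ifname, src))
			verdict = rules[i].pass;
	return (verdict);
}
```

The outcome: vnic0 with its own address passes, vnic0 with any other address is blocked, the address on any other interface is blocked, and unrelated traffic on other interfaces is untouched.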

Prevent IP address spoofing
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Suppose we want to prevent a domU from using others' IP addresses; we could
use something like:

block out on vnic0 from any to any layer2
pass out on vnic0 from 1.1.1.1 to any layer2

where 1.1.1.1 is the IP address assigned to vnic0.

VLAN packets filtering
~~~~~~~~~~~~~~~~~~~~~~

Say we want to block some IP traffic; below are examples of how this is
done with regard to VLANs:

- block all IP traffic regardless of VLAN:

block in family ether type 0x800

- block all IP traffic belonging to a VLAN:

block in family ether type 0x800 with vlan

- block all IP traffic NOT belonging to a VLAN:

block in family ether type 0x800 with not vlan

- block all IP traffic for a specific VLAN (e.g. 100):

block in family ether type 0x800 vlan 100
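
The four VLAN qualifiers can be modelled as a predicate over a parsed frame, as in the sketch below. It assumes that "type" is compared against the ethertype found after any 802.1Q tag; the frame_info_t and vrule_t types are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct frame_info {
	uint16_t	type;		/* ethertype after any 802.1Q tag */
	bool		has_vlan;	/* frame carried a VLAN tag */
	uint16_t	vlan_id;	/* VID when has_vlan */
} frame_info_t;

typedef enum {
	VLAN_ANY,	/* no VLAN qualifier */
	VLAN_WITH,	/* "with vlan" */
	VLAN_WITHOUT,	/* "with not vlan" */
	VLAN_ID		/* "vlan N" */
} vlan_match_t;

typedef struct vrule {
	uint16_t	type;		/* "type 0x800" */
	vlan_match_t	vmatch;
	uint16_t	vlan_id;	/* used only for VLAN_ID */
} vrule_t;

static bool
vrule_matches(const vrule_t *r, const frame_info_t *f)
{
	if (f->type != r->type)
		return (false);
	switch (r->vmatch) {
	case VLAN_WITH:
		return (f->has_vlan);
	case VLAN_WITHOUT:
		return (!f->has_vlan);
	case VLAN_ID:
		return (f->has_vlan && f->vlan_id == r->vlan_id);
	case VLAN_ANY:
	default:
		return (true);
	}
}
```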

Ioctl compatibility
-------------------
ABI compatibility with the old structure definitions is preserved by this
case.

IPFILTER_VERSION (see ipnat(7i)) is used to keep track of the user
application's version, so old binaries can still work after this change.
The kernel code handles ioctl input/output based on the version number to
make this a compatible change. No change is required for user applications
using these interfaces.

The related data structures natlookup_t and nat_t remain the same, and the
ioctls SIOCGNATL/SIOCSTPUT will work correctly. Users can set a flag,
IPN_LAYER2, in natlookup_t or nat_t to indicate that they are looking up or
inserting, respectively, a layer 2 NAT session rather than a layer 3 one.
For compatibility, the flag is not set by default, which indicates a layer 3
session.
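
The effect of the flag on a lookup can be sketched as follows. The value of MODEL_IPN_LAYER2, the reduced model_natlookup_t structure and the two key tables are hypothetical; the point illustrated is only that a caller which never sets the flag, such as an existing binary, keeps getting layer 3 sessions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	MODEL_IPN_LAYER2	0x1	/* hypothetical flag value */

/* Reduced stand-in for natlookup_t. */
typedef struct model_natlookup {
	uint32_t	nl_flags;	/* MODEL_IPN_LAYER2 selects layer 2 */
	uint32_t	nl_key;		/* stands in for the address/port tuple */
} model_natlookup_t;

typedef struct nat_table {
	const uint32_t	*keys;	/* session keys in this table */
	size_t		nkeys;
} nat_table_t;

/* Return 0 if a session exists in the selected table, -1 otherwise. */
static int
model_natlookup(const nat_table_t *l3, const nat_table_t *l2,
    const model_natlookup_t *nl)
{
	/* Flag clear, the default for existing binaries, means layer 3. */
	const nat_table_t *t = (nl->nl_flags & MODEL_IPN_LAYER2) ? l2 : l3;
	size_t i;

	for (i = 0; i < t->nkeys; i++)
		if (t->keys[i] == nl->nl_key)
			return (0);
	return (-1);
}
```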


Interfaces
==========
+----------------------------------------+----------------+
| Interface                              | Classification |
+----------------------------------------+----------------+
| dls_devnet_macname2linkname            |     Private    |
| dls_devnet_linkname2macname            |     Private    |
+----------------------------------------+----------------+
| NE_NAME_CHANGE                         |    Committed   |
| NHF_ETHER                              |    Committed   |
| NHF_WIFI                               |    Committed   |
| NHF_IB                                 |    Committed   |
+----------------------------------------+----------------+
| "ipfilter_hook_eth_in"                 |   Uncommitted  |
| "ipfilter_hook_eth_out"                |   Uncommitted  |
| "ipfilter_hook_wifi_in"                |   Uncommitted  |
| "ipfilter_hook_wifi_out"               |   Uncommitted  |
| "ipfilter_hook_ib_in"                  |   Uncommitted  |
| "ipfilter_hook_ib_out"                 |   Uncommitted  |
+----------------------------------------+----------------+
| "family ether"                         |     Volatile   |
| "layer2"                               |     Volatile   |
+----------------------------------------+----------------+
| IPFILTER_VERSION                       |    Committed   |
| ioctl SIOCGNATL                        |    Committed   |
| ioctl SIOCSTPUT                        |    Committed   |
| ioctl SIOCSTLCK                        |    Committed   |
| struct natlookup                       |   Uncommitted  |
| struct nat                             |   Uncommitted  |
| IPN_LAYER2                             |     Volatile   |
| /usr/include/netinet/ipl.h             |   Uncommitted  |
| /usr/include/netinet/ip_fil.h          |   Uncommitted  |
| /usr/include/netinet/ip_nat.h          |   Uncommitted  |
+----------------------------------------+----------------+

