After an extended period on the sidelines, the layer 2 filtering project
is resubmitting this PSARC fast track for consideration. No list of
changes between this and the original specification is provided, because
various parts of the design have been completely revisited, effectively
obsoleting the original specification.

Additional materials (diff marked man pages) can be found in the
case directory:

  gld.7d.txt
  hook_nic_event.9s.txt
  ipf.4.txt
  ipnat.4.txt
  net_getlifaddr.9f.txt

Darren

Abstract
========

This case extends PSARC/2005/334 by adding the ability to intercept
packets at the MAC layer using the PFHooks infrastructure.

This case makes only one change, an addition, to the interfaces that
were committed to by PSARC/2008/219 (see "new NIC event" below for
details).

Release Binding
---------------

This case seeks a patch binding.

Introduction
============

The PFHooks project, PSARC/2005/334, provides the ability to intercept
packets at the IP layer by adding hooks into the network stack. Since
its integration, customers have asked for the ability to intercept
packets at the MAC layer as well. The same ability is also needed to
enforce security rules for xVM guest domains and exclusive zones.

Goals
-----

This case seeks to meet the following goals:

* provide hooks in the MAC layer that consumers can register on to
  intercept packets;

* provide a netinfo interface for the MAC layer that gives consumers
  access to interface information and the ability to inject or emit
  packets directly;

* modify IPFilter to allow the administrator to specify layer 2 rules,
  which include ethernet filtering rules and IP filtering/NAT rules.

Out of scope
------------

This project only provides the ability to specify ethernet filtering
rules to match ethernet packets, and IP filtering/NAT rules to
match/modify IP packets at the MAC layer. Providing the ability to
specify rules that filter non-ethernet packets by matching the MAC
header is out of scope for this project.

The detailed design is described below for each major component.

Netinfo interface for MAC layer
===============================

netinfo interfaces
------------------

The hooks provided will generate events for NH_PHYSICAL_IN and
NH_PHYSICAL_OUT, using the same interface as IPv4 and IPv6 do in
PSARC/2005/334.

The following functions will be supported through the netinfo(9f)
framework:

  net_getifname()
  net_phylookup()
  net_phygetnext()
  net_getlifaddr()
  net_inject()
  net_getmtu()

All of the other functions in the netinfo(9f) framework will return a
value indicating that they are unsupported.

The return values for the above functions only have meaning within the
scope of the corresponding family - it is not correct to use a value
returned by net_getifname() using the ethernet net_data_t handle with
net_phylookup() for IPv4.

The callback for these events will receive a pointer to a
hook_pkt_event_t structure that has the following fields filled out:

  hpe_ifp - 0 for NH_PHYSICAL_OUT, otherwise a value indicating which
            interface the NH_PHYSICAL_IN event is associated with;
  hpe_ofp - 0 for NH_PHYSICAL_IN, otherwise a value indicating which
            interface the NH_PHYSICAL_OUT event is associated with;
  hpe_hdr - points to the start of the MAC header;
  hpe_mb  - points to the start of the mblk_t that holds hpe_hdr;
  hpe_mp  - points to the mblk_t that is the start of the packet.

Name to interface resolution
----------------------------

After Clearview UV, all data link related operations use link names, and
this applies to IPFilter as well. When the administrator wants to
specify a rule that works on a certain interface, the link name is used
to specify which interface the rule applies to. The link name therefore
constitutes the interface name for MAC layer netinfo.

Since layer 2 filtering is based on the MAC client that the Crossbow
project is introducing, this project introduces the MAC client index as
the MAC layer interface pointer, to uniquely identify a MAC layer
interface in the kernel. This is similar to the existing ifindex that is
used as the IP layer interface pointer today.

Netinfo provides functions to translate from an interface name (link
name) to the corresponding interface pointer (MAC client index) and
back, via net_phylookup() and net_getifname().
These functions can be called in the data path, so existing procedures
such as dls_mgmt_get_linkid() and dls_mgmt_get_linkinfo() cannot be used
because they involve door calls. Thus we propose to add a link name <->
link id hash table in dls, and to provide the following routines to
translate between link name and mac name. The MAC layer netinfo will use
these routines to implement the mapping between link name and MAC client
index.

+------------------------------------------------------------+
| Interface                                 | Classification |
|------------------------------------------------------------|
| dls_devnet_macname2linkname(const char *, |                |
|     char *, const size_t);                | consolidation  |
| dls_devnet_linkname2macname(const char *, | private        |
|     char *, const size_t);                |                |
+------------------------------------------------------------+
       Table: Functions for link name and mac name mapping

new NIC event
-------------

The status of the network in the operating system often changes, from a
system being temporarily unplugged from the network to an interface's IP
address changing as a result of DHCP. The PFHooks framework therefore
provides an event notification mechanism for such changes.

The callback for these events will receive a pointer to a
hook_nic_event_t structure that has the following fields filled out:

  hne_protocol - network protocol for events, returned from net_lookup
  hne_nic      - physical interface associated with event
  hne_lif      - logical interface (if any) associated with event
  hne_event    - type of event occurring. The current list of events
                 available is:
      NE_PLUMB          indicates that an interface has just been
                        created
      NE_UNPLUMB        indicates that an interface has just been
                        destroyed and that no more events should be
                        received for it
      NE_UP             indicates that an interface has changed state
                        to "up" and may now generate packet events
      NE_DOWN           indicates that an interface has changed state
                        to "down" and will no longer generate packet
                        events
      NE_ADDRESS_CHANGE indicates that an address on an interface has
                        changed
  hne_data     - pointer to extra data about event or NULL if none
  hne_datalen  - size of data pointed to by hne_data (can be 0)

NE_NAME_CHANGE event
~~~~~~~~~~~~~~~~~~~~

As Clearview UV (PSARC/2006/499, PSARC/2007/527, PSARC/2008/002)
introduces the ability to rename a data link, we need to capture this
event in order to update IPFilter rules accordingly. Thus we propose an
extension to PSARC/2008/219 that adds a new hook event, NE_NAME_CHANGE,
to nic_event_t to indicate that an interface has been renamed. This
event is only available to layer 2 netinfo. In IP, a change of interface
name is represented by an NE_UNPLUMB and NE_PLUMB event pair.

    typedef enum nic_event {
            NE_PLUMB = 1,
            NE_UNPLUMB,
            NE_UP,
            NE_DOWN,
            NE_ADDRESS_CHANGE,
    +       NE_NAME_CHANGE
    } nic_event_t;

Design considerations
~~~~~~~~~~~~~~~~~~~~~

IPFilter rules always match by name, and only the current link names are
used for matching, not old names. Upon an NE_NAME_CHANGE event, IPFilter
will walk all the layer 2 rules and resolve the interface name stored in
each rule structure into an interface pointer. So when a link is
renamed, rules using the old link name are invalidated, and rules using
the new link name are activated. If there is a filtering rule that
applies to interface bge0, and someone renames bge0 to net0, then the
rule no longer matches packets received on the link formerly known as
bge0.

Also, IPFilter has been designed to allow users to specify rules with
interface names that do not exist at the time the rules are loaded, and
for those interface names to be resolved at the time the interfaces are
added to the system. Thus, the mapping from the link name to the linkid
needs to happen in the kernel. Changing IPFilter to use the linkid
instead of the link name will not work.

Protocol & Hook registration
============================

Protocol registration
---------------------

With IP layer netinfo today we have 3 protocols: IPv4, IPv6 and ARP.
For the MAC layer, each MAC plugin type is treated as a different
protocol, so we'll have ethernet, wifi and ib. These protocols will be
registered using net_protocol_register() when the corresponding MAC
plugin is loaded.

Hook registration
-----------------

IPFilter will register hooks for MAC layer protocols in the following
cases:

  when the first ethernet filtering rule is added
    - register the ethernet hook
  when the first "layer2" IP filtering/NAT rule is added
    - register the ethernet, wifi and ib hooks
  when it receives a notification indicating that a protocol is
  registered
    - register the hook if there are rules for that corresponding
      protocol.

Since layer 2 filtering functionality is enabled automatically when the
first layer 2 rule is added, the corresponding hook needs to be
registered at that point so that packets can be passed to IPFilter from
the hook framework.

It is possible for a rule for a layer 2 protocol to be added before the
corresponding protocol is registered. Suppose a user has added a layer 2
IP filtering rule on a system that only has ethernet cards, and then
plugs a wifi card into the system and sets it up. When the wifi MAC
plugin is loaded, the protocol will be registered, IPFilter will be
notified via the callback notification mechanism provided by the PFHooks
API project, and it will register the hook for that protocol so it can
receive and match wifi packets.

IPFilter changes
================

Users can use ipf(1M) to add ethernet filtering rules in addition to IP
filtering rules; these ethernet filtering rules are marked with "family
ether". They can also add IP filtering/NAT rules and mark them with the
"layer2" keyword so that these rules will be processed at the MAC layer
instead of the IP layer. Unlike IPv6, no special command line switch is
required to load these rules.

The "layer2" IP filtering/NAT rules go in the existing ipf.conf,
ipf6.conf and ipnat.conf, respectively. The "family ether" rules go in a
new configuration file, ipf-ether.conf.

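To illustrate the three cases listed under "Hook registration" above,
the following is a minimal user-space sketch in C of the bookkeeping
involved. It is not IPFilter code: the l2_state_t structure, the P_*
indices and the l2_* function names are all hypothetical, and setting a
flag stands in for the real hook registration call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical indices for the three MAC layer protocols. */
enum { P_ETHER, P_WIFI, P_IB, P_MAX };

/* Hypothetical bookkeeping; none of these names come from IPFilter. */
typedef struct l2_state {
	bool		proto_present[P_MAX];	/* registered by MAC plugin */
	bool		hook_active[P_MAX];	/* our hook is registered */
	unsigned	ether_rules;		/* "family ether" rule count */
	unsigned	layer2_rules;		/* "layer2" IP/NAT rule count */
} l2_state_t;

/* Does any loaded rule need a hook on protocol p? */
static bool
l2_rules_want(const l2_state_t *s, int p)
{
	if (s->layer2_rules > 0)
		return (true);	/* "layer2" rules use all three hooks */
	return (p == P_ETHER && s->ether_rules > 0);
}

/* Register the hook once it is both wanted and possible. */
static void
l2_sync_hook(l2_state_t *s, int p)
{
	if (l2_rules_want(s, p) && s->proto_present[p])
		s->hook_active[p] = true;
}

/* Case 1: an ethernet filtering rule is added. */
void
l2_add_ether_rule(l2_state_t *s)
{
	s->ether_rules++;
	l2_sync_hook(s, P_ETHER);
}

/* Case 2: a "layer2" IP filtering/NAT rule is added. */
void
l2_add_layer2_rule(l2_state_t *s)
{
	s->layer2_rules++;
	for (int p = 0; p < P_MAX; p++)
		l2_sync_hook(s, p);
}

/* Case 3: notification that a MAC layer protocol has been registered. */
void
l2_protocol_registered(l2_state_t *s, int p)
{
	s->proto_present[p] = true;
	l2_sync_hook(s, p);
}
```

In the wifi scenario described above, l2_add_layer2_rule() would be
called while only the ethernet protocol is present, and the later call
to l2_protocol_registered() for wifi is what activates the wifi hook.
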
The layer 2 filtering functionality will be enabled automatically when
the first ethernet rule or "layer2" IPFilter rule is added, and disabled
when the last such rule is removed. This functionality is only available
in the global zone.

Also, ipmon has been updated to print out log records with ethernet
information, but the output of this command is volatile.

Rule processing
---------------

Currently the processing order in IPFilter is:

[INPUT] -> IP NAT -> IP firewall -> { IP } -> IP firewall -> IP NAT
-> [OUTPUT]

With layer 2 filtering the processing order would become:

[INPUT] -> L2 firewall -> "layer2" IP NAT -> "layer2" IP firewall -> ...
-> IP NAT -> IP firewall -> { IP } -> IP firewall -> IP NAT -> ...
-> L2 firewall -> "layer2" IP firewall -> "layer2" IP NAT -> [OUTPUT]

Input processing
~~~~~~~~~~~~~~~~

Take input processing for an IP packet as an example:

- MAC level filtering rules are processed first. These rules match on
  MAC headers to determine if a packet should be passed or blocked.
  Administrators use these rules to match on MAC addresses, MAC type,
  VLAN ID, etc.

- L2filter jumps over the MAC header, determines if this is an IP
  packet, and does some sanity checking before passing it up to the
  "layer2" IP rules for further processing.

- Then "layer2" IP NAT rules are processed. Like IP layer NAT rules,
  these rules do NAT for IP packets, but it is done at the MAC layer
  instead of the IP layer.

- Then "layer2" IP filtering rules are processed. These rules provide IP
  filtering at the MAC layer.

- L2filter finishes processing and the packet is delivered up the
  stack. When the packet reaches IP, IP layer filtering/NAT processing
  is invoked, and it works just as it does today.

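The input ordering above can be sketched as a small staged pipeline.
This is a user-space illustration in C only, not IPFilter code: the
l2_pkt_t structure, the ST_* stage numbers and l2_input() are
hypothetical names, each rule set is replaced by a stub that merely
records that it ran, and passing non-IP frames straight up after the MAC
level rules is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical verdict type; not an IPFilter definition. */
typedef enum { L2_PASS = 0, L2_BLOCK = 1 } l2_verdict_t;

/* Hypothetical stage numbers, in the order used on input. */
enum { ST_ETHER = 1, ST_L2_NAT = 2, ST_L2_FILTER = 3 };

/* Hypothetical packet summary used to drive and observe the stubs. */
typedef struct l2_pkt {
	int	is_ip;		/* an IP header follows the MAC header */
	int	block_at;	/* stage whose stub rules block, 0 = none */
	int	visited[3];	/* stages that ran, in order */
	size_t	nvisited;
} l2_pkt_t;

/* Stub for one rule set: record the visit and return its verdict. */
static l2_verdict_t
run_stage(l2_pkt_t *p, int stage)
{
	p->visited[p->nvisited++] = stage;
	return (p->block_at == stage ? L2_BLOCK : L2_PASS);
}

l2_verdict_t
l2_input(l2_pkt_t *p)
{
	/* MAC level ("family ether") rules see every frame first. */
	if (run_stage(p, ST_ETHER) == L2_BLOCK)
		return (L2_BLOCK);

	/* Assumption: frames that are not IP skip the "layer2" IP rules. */
	if (!p->is_ip)
		return (L2_PASS);

	/* "layer2" IP NAT runs before "layer2" IP filtering on input. */
	if (run_stage(p, ST_L2_NAT) == L2_BLOCK)
		return (L2_BLOCK);
	return (run_stage(p, ST_L2_FILTER));
}
```

A frame that passes all three stages is then delivered up the stack,
where the existing IP layer NAT and filtering run unchanged.
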
Design considerations
~~~~~~~~~~~~~~~~~~~~~

This processing order is designed so that:

- The processing order between IP NAT rules and IP filtering rules is
  consistent with IPFilter today;

- Since MAC level filtering rules are processed before layer2 IP rules,
  it will be possible in the future to combine filtering at both levels
  and allow or block a packet based on a mixture of L2 and L3 criteria,
  thus providing more fine grained control.

Changes to output
-----------------

With layer 2 filtering, each type of rule has its own distinct
processing order. The output of ipfstat/ipnat has been modified so that
rules are shown in a way that helps users understand the processing
order. The change only applies to the global zone; output in non-global
zones remains unchanged.

Example
~~~~~~~

  # ipfstat -io
  Ethernet rules:
  empty list for ipfilter(out)
  pass in family ether all
  pass in family ether from 1:2:3:4:5:6 to any
  layer 2 IP rules:
  empty list for ipfilter(out)
  pass in proto icmp from 1.1.1.1 to 2.2.2.2 layer2
  block in proto tcp from 3.3.3.3 to 4.4.4.4 layer2
  IP rules:
  pass in all
  pass out all

  # ipnat -l
  List of layer 2 active MAP/Redirect filters:
  map bge1 from 2.3.4.5/32 to 6.7.8.9/32 -> 1.1.2.2/32 layer2

  List of active MAP/Redirect filters:

  List of active sessions:

Examples
--------

Prevent MAC address spoofing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Suppose we have a domU with an interface vnic0. We may want to ensure
that:

- packets from this domU can only use its own source MAC address,
  preventing this domU from pretending to be someone else;

- packets from this source MAC address can only come from this domU,
  preventing others from pretending to be this domU.

Say vnic0 has MAC address 11:22:33:44:55:66; the rules would be
something like:

  block out family ether from 11:22:33:44:55:66 to any
  block out on vnic0 family ether from any to any
  pass out on vnic0 family ether from 11:22:33:44:55:66 to any

Prevent IP address spoofing
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Suppose we'd want to prevent a domU from using others' IP addresses. We
can probably go with:

  block out on vnic0 from any to any layer2
  pass out on vnic0 from 1.1.1.1 to any layer2

where 1.1.1.1 is the IP address assigned to vnic0.

VLAN packet filtering
~~~~~~~~~~~~~~~~~~~~~

Say we'd want to block some IP traffic. Below are examples of how it is
done with regard to VLANs:

- block all IP traffic regardless of VLAN:
    block in family ether type 0x800

- block all IP traffic belonging to a VLAN:
    block in family ether type 0x800 with vlan

- block all IP traffic NOT belonging to a VLAN:
    block in family ether type 0x800 with not vlan

- block all IP traffic for a specific VLAN (e.g. 100):
    block in family ether type 0x800 vlan 100

Ioctl compatibility
-------------------

ABI compatibility with the old structure definitions is preserved by
this case. IPFILTER_VERSION (see ipnat(7i)) is used to keep track of the
user application's version, so old binaries still work after this
change. The kernel code handles the ioctl input/output based on the
version number to make this a compatible change. No change is required
for user applications using the interfaces.

The related data structures natlookup_t and nat_t remain the same, and
the ioctls SIOCGNATL/SIOCSTPUT will work correctly. A user can set a
flag, IPN_LAYER2, in natlookup_t and nat_t, respectively, to indicate
that it is looking up/inserting a layer 2 NAT session rather than a
layer 3 one. For compatibility, the flag is not set by default, which
indicates a layer 3 session.

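The default behaviour of the IPN_LAYER2 flag can be sketched as follows.
This is an illustration only: the case does not state the bit value of
IPN_LAYER2 or which field carries it, so SKETCH_IPN_LAYER2, the
sketch_natlookup_t stand-in structure and nat_table_for() are all
hypothetical and do not reflect the real natlookup_t layout.

```c
#include <assert.h>

/* Placeholder bit: the real value of IPN_LAYER2 is not given here. */
#define	SKETCH_IPN_LAYER2	0x1

/* Stand-in for the flags word of natlookup_t / nat_t. */
typedef struct sketch_natlookup {
	int	snl_flags;
} sketch_natlookup_t;

/* Hypothetical selector for the two kinds of NAT session. */
typedef enum { NAT_TABLE_L3, NAT_TABLE_L2 } nat_table_t;

/*
 * Decide which kind of NAT session a lookup/insert request refers to.
 * A request from an old binary never sets the flag, so it falls through
 * to the layer 3 case, which is the behaviour it had before this case.
 */
nat_table_t
nat_table_for(const sketch_natlookup_t *nl)
{
	if (nl->snl_flags & SKETCH_IPN_LAYER2)
		return (NAT_TABLE_L2);
	return (NAT_TABLE_L3);
}
```

Because the check is a single bit test with layer 3 as the fall-through,
existing callers that zero the structure before use need no change.
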
Interfaces
==========

+----------------------------------------+----------------+
| Interface                              | Classification |
+----------------------------------------+----------------+
| dls_devnet_macname2linkname            | Private        |
| dls_devnet_linkname2macname            | Private        |
+----------------------------------------+----------------+
| NE_NAME_CHANGE                         | Committed      |
| NHF_ETHER                              | Committed      |
| NHF_WIFI                               | Committed      |
| NHF_IB                                 | Committed      |
+----------------------------------------+----------------+
| "ipfilter_hook_eth_in"                 | Uncommitted    |
| "ipfilter_hook_eth_out"                | Uncommitted    |
| "ipfilter_hook_wifi_in"                | Uncommitted    |
| "ipfilter_hook_wifi_out"               | Uncommitted    |
| "ipfilter_hook_ib_in"                  | Uncommitted    |
| "ipfilter_hook_ib_out"                 | Uncommitted    |
+----------------------------------------+----------------+
| "family ether"                         | Volatile       |
| "layer2"                               | Volatile       |
+----------------------------------------+----------------+
| IPFILTER_VERSION                       | Committed      |
| ioctl SIOCGNATL                        | Committed      |
| ioctl SIOCSTPUT                        | Committed      |
| ioctl SIOCSTLCK                        | Committed      |
| struct natlookup                       | Uncommitted    |
| struct nat                             | Uncommitted    |
| IPN_LAYER2                             | Volatile       |
| /usr/include/netinet/ipl.h             | Uncommitted    |
| /usr/include/netinet/ip_fil.h          | Uncommitted    |
| /usr/include/netinet/ip_nat.h          | Uncommitted    |
+----------------------------------------+----------------+