> The other question that came to my mind was, what do you do about
> overlapping or duplicate fragments? I thought about this a little more and
> realized that what you're doing is very similar to one of fragment crop or
> fragment drop-ovl, but you haven't specified which. Once you choose a
> definite method for dealing with fragmentation oddities, then your
> procedure reduces to:
This is another option that's probably better left to sysadmins. Basically, "wormhole" reassembles just enough fragments to make filtering decisions reliably. After that, it effectively falls back on EITHER crop or drop, which should be up to the firewall's admin to choose (since it's equally simple to implement either). So you'd have something like "wormhole reassemble-drop-ovl" or "wormhole reassemble-crop-ovl".

As for arguments that these kinds of tiny frags, such as ones too small to contain proper headers, or overlapping frags, cannot be generated by legitimate applications, I strongly urge you all not to underestimate the human capacity for utter stupidity. Don't forget that it violates standards to generate fragments with the don't-fragment bit set, yet there is a significant entity (whose name need not be mentioned) which has nevertheless accomplished this amazing feat of utter brainlessness. I recall watching a certain amount of debate ensue as people's dependence on this service caused inconveniences to pf users and developers alike. By remaining more flexible, we stand to be less affected by third-party idiocy.

> But this is almost the same as fragment crop and fragment drop-ovl; all
> you've done is add the TCP segment part to the front, and that part is the
> same as fragment reassemble.
>
> I am not a pf hacker, and I know nothing about the internals, but I suspect
> that it would be fairly easy to implement this. Furthermore I don't think
> that it carries any additional risks on top of those already in place for
> fragment crop and fragment drop-ovl; in fact it should decrease them
> somewhat.
>
> > Wormhole reassembly is mainly intended for low-throughput low-latency
> > networks, where store-and-forward time can be greatly reduced.
>
> Well, yes, but it also assumes you're on a medium-to-high trust network.
> If you have a firewalled network with a VPN concentrator inside, and you
> have a separate pf firewall between the VPN concentrator and the rest of
> your network, then you can trust (hopefully) the first firewall to clear
> out most IP layer oddities. But it's a bad idea to expose this sort of
> scheme to the outside world.

I'm sorry I didn't specify this in my first posting, but the fact that this is intended ONLY for medium-trust networks is a consideration every administrator needs to take into account before selecting any normalization method. If your network is low-trust and you expect users to try to punch through firewalls or attack services using layer-3 or -4 techniques such as fragmentation, you really have no reason to use anything other than fragment reassemble. In fact, if you're on a very low-trust network, you should probably consider some kind of TCP proxy for intense normalization of traffic all the way up to layer 4 or 5. I can't name any specific solution like this (other than inetd+nc, which is probably a stupid idea in a production environment), but I'm certain solutions exist.

> Furthermore, one of your implicit assumptions is that both directions of
> the TCP stream are slow. You gave an example of a star-configuration VPN
> each of whose endpoints is a dialup link; but the VPN concentrator itself
> is probably not on a dialup! In that case the benefit provided by this
> method is mostly negated; either the sending side will be fast, so all the
> fragments will arrive quickly, or the receiving side will be fast, so the
> reassembled segment can be sent quickly.

This is not necessarily so. The major factor to consider here is the *path* latency and throughput. The throughput achieved at any endpoint is only the throughput of the path.
The latency of the path is the sum of the latencies of each segment of the path, and is largely determined by the size of the data quantum (the smallest unit that can be forwarded as soon as it has finished being stored). The sole reason for using wormholing is to reduce latency by reducing the effective data quantum. When I said my idea "doesn't compromise security," I was referring to the fact that it does not let attackers reach any point beyond a firewall that they could not already reach under fragment reassembly, and I still believe that is true.

> You don't need a high-throughput connection for a memory exhaustion attack
> like I'm proposing to do damage; it depends mostly on how the receiving
> system handles fragments. I can't remember when it was fixed--I think
> 3.2--but pf used to try to use an infinite amount of space to reassemble
> fragments. But kernel memory can't be paged out, so if an attacker sent
> enough fragments, pf would panic. Even if the attacker didn't send that
> many packets, pf could easily steal 64M of memory that it couldn't give
> back, and that was without 64M of data being sent; that's enough to
> seriously hurt a system! A more modern pf will start dropping incomplete
> segments, but if you send enough fragments, you'll be able to attack
> legitimately fragmented segments.

In this case, wormhole reassembly does not really affect pf's response to an assault of floating fragments (although less memory is now consumed, it wasn't an issue to begin with), except for one thing. Now the system behind pf will be expected to cache some reasonably sized, non-overlapping, consecutive fragments (hopefully a smaller number than were originally received by pf). The system being protected *will* be loaded down caching these fragments, but a good sysadmin should test the application to determine whether this risk is significant.
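The effect of the data quantum on path latency can be sketched with a toy store-and-forward model (my own illustration, not anything from pf; it ignores per-fragment header overhead, processing time, and propagation delay, and assumes equal-speed links):

```python
def path_latency(hops, packet_bytes, quantum_bytes, link_bps):
    """End-to-end delay for one packet over `hops` equal-speed links,
    when each hop must buffer a full quantum before forwarding it."""
    quanta = -(-packet_bytes // quantum_bytes)    # ceiling division
    per_quantum = quantum_bytes * 8 / link_bps    # serialization time of one quantum
    # the leading quantum pays the delay at every hop; the rest pipeline behind it
    return (hops + quanta - 1) * per_quantum

# 1500-byte packet across two 128 kbit/s upstream links:
full = path_latency(2, 1500, 1500, 128000)  # full reassembly at each hop: 0.1875 s
frag = path_latency(2, 1500, 500, 128000)   # forwarding 500-byte fragments: 0.125 s
```

Under these (assumed) numbers, forwarding fragments shaves a third off the end-to-end delay without changing the throughput of the path at all; shrinking the quantum further pushes the latency toward a single quantum's serialization time per hop.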
If the throughput is low enough, the fragments on the protected machine will time out quickly enough that they never accumulate to a level that causes problems.

> It sounds to me like you have in mind some specific application for this
> technique, but I'm not sure what it is. If you shared what it was, we might
> be able to find a better solution.

Yes, I do. I'm constructing a star-configuration VPN out of a series of home LANs, each on a DSL or cablemodem link. I know that star is probably not the best configuration for performance reasons, but other configurations would be too complex to organize (since I'm managing all these home LANs alone, I need some simplicity). Also, there are not too many LANs, so I'm not too worried about throughput issues; I don't need much, I just really want the virtual privacy.

Now the problem comes down to latency. Since all the LANs will be on ADSL or cablemodem links, each will only be able to transmit to the others at its slower upstream speed (some of these have extreme asymmetry, like 3M down vs 128k up). So if one non-central LAN transmits to another non-central LAN, the throughput of the connection will probably be reasonable for my needs, but the latency would be double what it would be if one of the parties were the central LAN. I would like to achieve consistent (and preferably minimal) latency between each pair of endpoints.

An attack using heavily fragmented packets can NOT be carried out, since fragment reassembly is already performed by the local OpenBSD router on each VPN segment. This means that a single packet may pass through this route:

transmitter -> VPN client pf -> VPN master pf -> VPN client pf -> receiver

The first client pf will reassemble this packet and forward it to the master. But the master reassembles all fragments by default, to ensure that these packets can be filtered even if any VPN client router is compromised. The master then forwards to the other client, which has to reassemble before sending to the receiver.
The major latency issue is caused by the master needing to cache the full packet before forwarding it to the other client. If I could eliminate this, I would have acceptable latencies throughout my VPN. Switching the master, or even all systems, over to wormholing would improve latency between endpoints, and yet I could be certain that all IP-layer and transport-layer filtration rules in place were still being followed. Even if a DoS attack *were* possible through this configuration, I would not be concerned, as most of my VPN clients are either OpenBSD systems, with their own fragment-reassembling firewalls, or Windows systems, which do a fine job of crashing on their own without an attacker :-D.

As you can see, wormholing is not necessary for most applications, but for that one system which is forwarding packets between two narrow links, latency can be much improved. So why don't I use drop or crop on that middle router, instead of suggesting this new scrub form? As I understand it, scrub rules are fairly simple in their implementation, and there's not an extreme amount of flexibility in deciding which fragments need to go into which scrubber. I don't think I could create a ruleset that scrubs all packets with the optimal scrubber and yet does not create leaks. There are some other reasons too. I want to ensure that, at the least, the first fragment is normalized for remote filters, but I trust my pf's and client stacks to handle fragments after that reasonably well. I also think that the min-frag argument could be useful for reducing overhead with minimal sacrifice of latency; I'll need to do some tuning, but I'll probably strike a good balance between latency and throughput at some point.

> I don't think that you could practically time things well enough; I think
> there are too many random variables here. Even if you can, I'm not sure of
> the utility of this method.

Let's consider this. You have an SSH server on a 100Base-TX LAN.
You habitually remote-admin the system from a location far away (fairly high latency). An attacker wishes to compromise the root account, but has only managed to acquire a non-root user's account. This same attacker has also found a way onto the LAN (very low, fairly predictable latency), although not close enough to the server itself. Now, in the course of trying to root the system, your attacker does not want you (root) to be able to intervene; he does not care whether he's detected in the process, so he can be as unsubtle as necessary. I know that this is a rather odd scenario (why can't the security guards just throw him out?), but there are possible cases where this could happen. You need to connect to sshd as root to lock the system down: go to securelevel 2, kick the user, whatever.

Now, this clever person has been testing your firewall, and has figured out the logic that goes into connection timeouts (you just connect, then send a test packet after a certain interval, and see if the connection has timed out; this can be automated). The attacker also knows that a linear function is used to reduce state timeouts for each connection that's established. And the attacker also has a good guess as to the latency you'll be connecting from (say, he can guess you have at least 100ms of round-trip time). Of course, the sysadmin has state max enabled for sshd, to prevent an attacker from exhausting all sshd connections and blocking out the admin, right? So this attacker opens a single SSH connection to the machine and logs in (now the state is established, and timeout reductions will never reduce the timeout to less than the attacker's keepalive interval). The attacker then runs a "script" (though it would have to be well-coded indeed) to occupy all remaining states available to sshd. When all states are occupied, naturally, pf reduces the timeouts for these connections, and all connections start to drop. Legit clients may be disconnected over time (especially if the attacker finds some way to increase their latency), and the attacker need only maintain his own state by sending keepalives fast enough.

Now, the attacker's "plug" connections start timing out. So what can the attacker do? By timing the sending of a short burst of SYNs, based on his latency and what he knows about the timeouts, he can keep almost all states filled. Though this does not prevent the administrator from creating a state, the attacker can drive state timeouts very low indeed, below the 100ms round-trip time the sysadmin needs. So by the time the SYN/ACK arrives at the sysadmin's ssh client and his ACK arrives at pf, the state is gone. The sysadmin could always try to forge an ACK timed to arrive at the firewall just after 50ms (half the round-trip time) and try to sneak in, but thanks to state modulation, this won't work :-D. Although you may think that the latency involved is unpredictable, one-way latencies on most networks, I find, have a small variance. And besides, the sysadmin only needs to manage to get a single connection and log in to stop the attacker.

Now, suppose pf does not allow timeouts to be reduced so close to 0. Let's say that timeouts reach a certain "floor" value when the state table fills up (such as 1 second). How does this affect the attacker? He can now occupy ALL states for up to a second. Since he has gathered timeout information, he should have an excellent guess as to when his connections will time out, and, being on a high-throughput low-latency network, he can time his burst of SYN packets to shut that tiny opening very fast. As network conditions approach ideal, this window gets smaller and smaller, and even on real networks it can be extremely small (100 microseconds? less?). The sysadmin will have a much more difficult time timing SYN bursts to snatch up one of these states (although, if he gets a SYN through, he'll probably make the timeout and get in).
Thus, the attacker likely has quite some time to get his root kit working before the admin can sneak in and catch him.

So, now we add timeout modulation to pf. Nothing fancy. Each time a state is created, a random percentage value is generated, between 0 and a user-specified value (probably 0-10%). When setting a new timeout for a state (e.g. at creation, or when traffic has just arrived), instead of setting the state's timeout to its stock value (e.g. instead of setting the timeout to 86400 for an established TCP connection under optimization=normal), you set the timeout to that value minus the percentage for that state, and start counting down normally from there (that same state may end up at 81257, for instance). Instead of a percentage, one could use absolute seconds.

Now, instead of knowing within a couple hundred, or even tens of, milliseconds when a timeout will expire, your attacker is faced with a probabilistic function. Although he may know the long-term behavior of the system (e.g. what the average timeout is, the variance, or even the probability distribution), he can't time a burst very well now, since timeouts can be entire seconds off his guess. So the sysadmin, using otherwise normal methods, can force a connection after a reasonable number of attempts (I wouldn't be surprised if he made it within 100 or even 10 attempts, which could be made very close together). This means that if the root kit attempts take longer than the attacker expected, he's in quite a bit of trouble.

The first time I thought about using probabilistic approaches to foil state exhaustion attacks was when I was researching the question "why aren't there syncookies for *BSD?" As it turns out, BSD TCP/IP stacks have a much simpler and more effective solution. When being SYN-flooded, the default behavior [according to some postings I googled up a long time ago] is to drop one random old half-open connection for each new one received. Consider the alternatives.
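The modulation scheme described above could be sketched roughly like this (my own illustration of the proposal, not pf code; all names are made up):

```python
import random

class ModulatedState:
    """Per-state timeout modulation: each state draws one random
    reduction percentage at creation and reuses it for every reset."""

    def __init__(self, max_jitter_pct=10.0):
        # drawn once per state, somewhere in 0 to max_jitter_pct percent
        self.jitter_pct = random.uniform(0.0, max_jitter_pct)

    def reset_timeout(self, base_seconds):
        # e.g. a stock timeout of 86400s may consistently come out near 81257s
        return base_seconds * (1.0 - self.jitter_pct / 100.0)

state = ModulatedState()
t = state.reset_timeout(86400)   # somewhere in [77760, 86400]
```

The key design point is that the jitter is fixed per state rather than per reset: the state still counts down consistently from the attacker's point of view, but *which* reduced value it counts down from differs unpredictably from state to state, which is what breaks the burst timing.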
A stupid method would be to ignore all new connections altogether, which means that an attacker can totally close off a system by maintaining enough of a SYN flood. Another method is syncookies, which can eat up a lot of CPU power that a server may not be able to spare. Another method is the first one plus adaptive timeout reduction (as per pf right now), which works much better, but can behave similarly to the first method if the attacker has enough power (i.e. is close enough to the victim) and can predict timeout values. The best method I've seen implemented so far is the "drop a random old connection" method, which is the only one that does not necessarily punish clients for having a high latency -- after all, your sysadmin may have a much longer round-trip time than your attacker.

In fact, the timing attack I mentioned above can even be used from higher-latency connections (though high latency usually implies high latency variance, it's still possible). However, I'm pretty sure high throughput is necessary, since the SYNs within a burst need to be closely spaced, and the attacker's own SSH client connection needs to be able to work around the SYN bursts.

So why couldn't an attacker SYN-flood the server to a point where timeouts are low enough that the attacker can't generate SYNs fast enough, and the sysadmin can get in? Other than the sysadmin's own latency, there is the problem of state quantization. If there is a maximum of 100 occupied states, and one drops, then there are 99 states occupied. This means that not all timeout values are possible, but only a certain discrete set. This set could be such that at 99 states, the timeouts are less than the admin's round-trip time (50ms) but not less than the attacker's ability to generate SYNs (the attacker could easily generate SYN bursts at 50ms intervals).
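The "drop a random old connection" behavior can be sketched as follows (a toy model of the idea as I understand it from those postings, not actual BSD stack code):

```python
import random

def handle_syn(half_open, new_entry, limit):
    """When the half-open queue is full, evict one randomly chosen
    old entry rather than refusing the new SYN outright."""
    if len(half_open) >= limit:
        victim = random.randrange(len(half_open))
        half_open.pop(victim)
    half_open.append(new_entry)

# a flooded queue keeps accepting new SYNs at the cost of random old ones
queue = [f"conn{i}" for i in range(128)]
handle_syn(queue, "legit-admin-syn", 128)
```

The point is that the admin's SYN always gets a slot; whether it survives long enough to complete the handshake is then a probability game, one the flooder cannot rig in his favor by exploiting latency differences.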
The goal here, remember, isn't an all-out SYN flood to take down the server, but merely to hammer out legit connections while the attacker pokes around in an otherwise healthy server. So, without state timeout modulation, there is a certain degree of doubt as to whether a sysadmin can always get in to his sshd in a time of crisis. But with state timeout modulation, the doubt is now on the shoulders of the attacker, who can't effectively predict state timeouts. It seems to me that it should always be the attacker doing the doubting. Total predictability [determinism] is generally a bad thing in attack resistance, and can often be exploited for SOME devious purpose, regardless of where it's found. After all, I didn't always wear a white hat.

I might also add that an alternative, or perhaps complementary, solution to state timeout modulation (it may be even more effective if the two are combined) is random early drop for states. After reaching a threshold, new connections could randomly be denied, with a probability proportional to the number of remaining states occupied, reaching 100% when all states are full. This means that an attacker must generate exponentially [I think] larger SYN bursts to occupy each additional state. All three methods (state timeout modulation, random state drop, and state timeout modulation with random early drop) are effective at stopping the attack I've described, though they all have slightly different effects, and are probably most effective when properly combined.
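A random-early-drop admission check along those lines might look like this (a sketch of the proposal, with made-up parameter names; not existing pf behavior):

```python
import random

def admit_new_state(occupied, limit, threshold):
    """Admit new states freely below `threshold`; above it, drop with a
    probability rising linearly to 1.0 when the state table is full."""
    if occupied < threshold:
        return True
    p_drop = (occupied - threshold) / (limit - threshold)
    return random.random() >= p_drop   # random() is in [0.0, 1.0)

# below the threshold admission is certain; at the limit, denial is certain
```

Because the expected number of SYNs needed to claim each successive state grows as p_drop rises, filling the last few slots becomes far more expensive for the flooder, while a legitimate client retrying a handful of times still has good odds of getting through.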
