Failed sysupgrade from 6.6 to 6.7 amd64
After all these years of trouble-free upgrades, I ran into my first problem. I used sysupgrade to go from 6.6/amd64 to 6.7. The upgrade process was successful, but after bsd.upgrade did its thing and rebooted the system, the new kernel would not boot. It got to the "boot>" prompt and started loading the kernel, but the system rebooted right after showing the "booting hd0a:bsd: 12957+2753552..." line. I tried booting bsd.sp, bsd.rd, and bsd.booted with identical results. I was able to boot from cd67.iso. I tried downloading the original kernel, but that didn't work either. Re-running the upgrade didn't help. Finally, I decided to upgrade to 6.8, so I did that from cd68.iso, which fixed the problem. I also replaced the bootx64.efi file on the EFI partition after this upgrade, but I'm not actually sure whether it differed from the old one. Obviously I'm curious what the issue may have been, but mostly I'm wondering whether any upgrade steps may have been missed as a result of never fully booting the 6.7 OS and running the post-upgrade steps there. Thanks!
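For anyone in the same spot: the pieces that normally run automatically on the first boot after an upgrade can be re-run by hand once the system is up. A rough sketch only; the exact set varies by release, so treat this as an outline rather than the authoritative list:

  # as root, after booting the upgraded system
  sysmerge         # merge configuration file changes into /etc
  fw_update        # fetch firmware updates for the new release
  pkg_add -u       # update installed packages against the new release

Running these manually should cover most of what was skipped by never completing a normal 6.7 first boot.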
Re: Disk I/O performance of OpenBSD 5.9 on Xen
On Sat, Jul 16, 2016 at 6:37 AM, Mike Belopuhov <m...@belopuhov.com> wrote:
> On 14 July 2016 at 14:54, Maxim Khitrov <m...@mxcrypt.com> wrote:
>> On Wed, Jul 13, 2016 at 11:47 PM, Tinker <ti...@openmailbox.org> wrote:
>>> On 2016-07-14 07:27, Maxim Khitrov wrote:
>>> [...]
>>>> No, the tests are run sequentially. Write performance is measured
>>>> first (20 MB/s), then rewrite (12 MB/s), then read (37 MB/s), then
>>>> seeks (95 IOPS).
>>>
>>> Okay, you are on a totally weird platform. Or, on an OK platform with a
>>> totally weird configuration.
>>>
>>> Or on an OK platform and configuration with a totally weird underlying
>>> storage device.
>>>
>>> Are you on a magnet disk, are you using a virtual block device or virtual
>>> SATA connection, or some legacy interface like IDE?
>>>
>>> I get some feeling that your hardware + platform + configuration crappiness
>>> factor is fairly much through the ceiling.
>>
>> Dell R720 and R620 servers, 10 gigabit Ethernet SAN, Dell MD3660i
>> storage array, 1.2 TB 10K RPM SAS disks in RAID6. I don't think there
>> is anything crappy or weird about the configuration. Test results for
>> CentOS on the same system: 170 MB/s write, 112 MB/s rewrite, 341 MB/s
>> read, 746 IOPS.
>>
>> I'm assuming that there are others running OpenBSD on Xen, so I was
>> hoping that someone else could share either bonnie++ or even just dd
>> performance numbers. That would help us figure out if there really is
>> an anomaly in our setup.
>
> Hi,
>
> Since you have already discovered that we don't provide a driver
> for the paravirtualized disk interface (blkfront), I'd say that most likely
> your setup is just fine, but emulated pciide performance is subpar.
>
> I plan to implement it, but right now the focus is on making networking
> and specifically interrupt delivery reliable and efficient.
>
> Regards,
> Mike

Hi Mike,

Revisiting this issue with OpenBSD 6.1-RELEASE and the new xbf driver on XenServer 7.0. The write performance is much better at 74 MB/s (still slower than other OSs, but good enough). IOPS also improved from 95 to 167. However, the read performance actually got worse and is now at 16 MB/s. Here are the full bonnie++ results:

Version 1.97          ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1       -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine          Size           K/sec %CP K/sec %CP           K/sec %CP  /sec %CP
web4.dhcp.bhsai.   8G           76191  43 10052  17           16044  25 167.3  43
Latency                         168ms     118ms               416ms     488ms

Here are two dd runs for writing and reading:

$ dd if=/dev/zero of=test bs=1M count=2048
2147483648 bytes transferred in 25.944 secs (82771861 bytes/sec)
$ dd if=test of=/dev/null bs=1M
2147483648 bytes transferred in 123.505 secs (17387767 bytes/sec)

Here's the dmesg output:

pvbus0 at mainbus0: Xen 4.6
xen0 at pvbus0: features 0x2705, 32 grant table frames, event channel 3
xbf0 at xen0 backend 0 channel 8: disk
scsibus1 at xbf0: 2 targets
sd0 at scsibus1 targ 0 lun 0: <Xen, phy xvda 768, > SCSI3 0/direct fixed
sd0: 73728MB, 512 bytes/sector, 150994944 sectors
xbf1 at xen0 backend 0 channel 9: cdrom
xbf1: timed out waiting for backend to connect

Any ideas on why the read performance is so poor?

Thanks,
Max
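A follow-up thought on narrowing down a slow read path: take the filesystem out of the picture and vary the block size against the raw device. This is a sketch, not from the original thread; adjust the disk name to match your dmesg, and make sure you only read:

  # sequential read from the raw device, bypassing the filesystem
  dd if=/dev/rsd0c of=/dev/null bs=64k count=16384
  # same amount of data, much larger requests
  dd if=/dev/rsd0c of=/dev/null bs=1m count=1024

If throughput scales with the block size, the per-request overhead of the virtual disk interface is a more likely bottleneck than the storage behind it.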
Re: Disk I/O performance of OpenBSD 5.9 on Xen
On Wed, Jul 13, 2016 at 11:47 PM, Tinker <ti...@openmailbox.org> wrote: > On 2016-07-14 07:27, Maxim Khitrov wrote: > [...] >> >> No, the tests are run sequentially. Write performance is measured >> first (20 MB/s), then rewrite (12 MB/s), then read (37 MB/s), then >> seeks (95 IOPS). > > > Okay, you are on a totally weird platform. Or, on an OK platform with a > totally weird configuration. > > Or on an OK platform and configuration with a totally weird underlying > storage device. > > Are you on a magnet disk, are you using a virtual block device or virtual > SATA connection, or some legacy interface like IDE? > > I get some feeling that your hardware + platform + configuration crappiness > factor is fairly much through the ceiling. Dell R720 and R620 servers, 10 gigabit Ethernet SAN, Dell MD3660i storage array, 1.2 TB 10K RPM SAS disks in RAID6. I don't think there is anything crappy or weird about the configuration. Test results for CentOS on the same system: 170 MB/s write, 112 MB/s rewrite, 341 MB/s read, 746 IOPS. I'm assuming that there are others running OpenBSD on Xen, so I was hoping that someone else could share either bonnie++ or even just dd performance numbers. That would help us figure out if there really is an anomaly in our setup.
Re: Disk I/O performance of OpenBSD 5.9 on Xen
On Wed, Jul 13, 2016 at 11:10 AM, Tinker <ti...@openmailbox.org> wrote:
> On 2016-07-13 22:57, Maxim Khitrov wrote:
>> On Wed, Jul 13, 2016 at 10:53 AM, Tinker <ti...@openmailbox.org> wrote:
>>> On 2016-07-13 20:01, Maxim Khitrov wrote:
>>>> We're seeing about 20 MB/s write, 35 MB/s read, and 70 IOPS
>>>
>>> What do you mean 70, you mean 70 000 IOPS?
>>
>> Sadly, no. It was actually 95, I looked at the wrong column before:
>>
>> Write (K/sec), %cpu, Rewrite (K/sec), %cpu, Read (K/sec), %cpu, Seeks (/sec), %cpu
>> 20075, 22, 12482, 42, 37690, 47, 95.5, 68
>
> So that is.. 20075 + 12482 + 37690 = 70247 IOPS?
>
> or 70MB/sec total throughput?

No, the tests are run sequentially. Write performance is measured first (20 MB/s), then rewrite (12 MB/s), then read (37 MB/s), then seeks (95 IOPS).
Re: Disk I/O performance of OpenBSD 5.9 on Xen
On Wed, Jul 13, 2016 at 10:53 AM, Tinker <ti...@openmailbox.org> wrote:
> On 2016-07-13 20:01, Maxim Khitrov wrote:
>> We're seeing about 20 MB/s write, 35 MB/s read, and 70 IOPS
>
> What do you mean 70, you mean 70 000 IOPS?

Sadly, no. It was actually 95, I looked at the wrong column before:

Write (K/sec), %cpu, Rewrite (K/sec), %cpu, Read (K/sec), %cpu, Seeks (/sec), %cpu
20075, 22, 12482, 42, 37690, 47, 95.5, 68
Disk I/O performance of OpenBSD 5.9 on Xen
Hi all, We're seeing about 20 MB/s write, 35 MB/s read, and 70 IOPS with OpenBSD 5.9 amd64 on XenServer 7.0 (tested using bonnie++). The virtual disks are LVM over iSCSI. Linux hosts get well over 100 MB/s in both directions. I'm assuming that this is because there is no disk driver for Xen yet, but I wanted to see if others are getting similar numbers. Any suggestions for improving this performance? -Max
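For anyone who wants to post comparable numbers: the figures above came from bonnie++, but even a plain dd pair is useful. A sketch (sizes are examples; use a file at least twice the guest's RAM so caching doesn't inflate the results):

  # sequential write, then read back
  dd if=/dev/zero of=testfile bs=1m count=4096
  dd if=testfile of=/dev/null bs=1m
  # fuller picture with bonnie++ (from packages); -n 0 skips the
  # small-file tests, -s is the working set size in MB
  bonnie++ -d /var/tmp -s 16384 -n 0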
Re: APC UPS & sensorsd - how?
On Wed, Feb 24, 2016 at 3:38 AM, lilit-aibolit wrote:
> On 03/22/2015 05:44 PM, T. Ribbrock wrote:
>> Then, I re-applied power, but that, too, was never flagged by sensorsd.
>> For some reason, it looks like sensorsd only ever detects a status change
>> (for these rules) when it gets started - but not afterwards. Regards, Thomas
>
> Have you succeeded in getting status changes while sensorsd is running?

low=0:high=0 has been working well for me:
https://marc.info/?l=openbsd-misc&m=144529176814155&w=2
Re: sensorsd, upd, and state changes
On Mon, Dec 8, 2014 at 3:45 PM, David Higgs wrote:
> On Mon, Dec 8, 2014 at 3:37 PM, trondd wrote:
>> On Mon, Dec 8, 2014 at 3:23 PM, trondd wrote:
>>> On Mon, Dec 8, 2014 at 11:47 AM, David Higgs wrote:
>>>> sysctl(8) will display Off if the value is zero, and On for nonzero.
>>>> So, using the "closed interval" rule above, you should use "high=0"
>>>> for indicators that you consider in "good" state when Off (i.e.
>>>> ShutdownImminent), and "low=1" for indicators that you consider in
>>>> "good" state when On (i.e. ACPresent).
>>>
>>> Isn't saying high=0 kind of the same thing as saying low=1?
>>
>> Oh, I think I get this. Since the sensor doesn't trigger if it is on the
>> limit, only outside the limit, you have to set up which is the OK state.
>>
>> Still a little confusing, but I guess there is no way to automatically
>> know whether an indicator is supposed to be Off or On in its good state?
>
> Kind of. The high/low difference is what values you consider "within"
> normal operating parameters (and the output of %l). The upd(4) code
> hasn't yet been taught how to map specific indicator values to OK /
> WARN / CRITICAL status. Currently any value successfully read is
> marked OK.
>
> I'm working with tech@ and slowly writing diffs to improve these things.
>
> --david

Resurrecting an old thread since I just ran into the same problem in 5.8. To summarize, upd(4) exposes some SENSOR_INDICATOR-type sensors for attached UPSes, such as ACPresent = On/Off, and it's not clear how to configure sensorsd(8) to execute a command when this value changes. Also, upd always sets the sensor status to "OK," so sensorsd never triggers commands for status changes; we have to use low/high limits until this is fixed. One proposed hack was to use "low=1:high=2" in sensorsd.conf, but this doesn't seem to work for everybody.

Has anyone tried using "low=0:high=0"? I'm pretty sure that should solve the problem in all cases.

The low/high range is inclusive at both ends. Off is 0, but On can be any other int64 value, including negative. For my UPS, ACPresent = On is actually a value of -1. I know this because when I set "low=-1:high=-1" sensorsd reports "upd0.indicator2: within limits: On". That being the case, "low=1:high=2" would not work because the value changes from -1 (On) to 0 (Off), and is always below the lower limit.

Using "low=0:high=0" should always work for On -> Off -> On transitions, but it will show On as outside (below or above) the limits. If you want On to be within limits, then play with the values until you figure out whether On is 1, -1, or something else entirely. That may not be as reliable; I'm not actually sure whether this value is UPS-specific or something that upd determines.

-Max
Re: sensorsd, upd, and state changes
On Mon, Oct 19, 2015 at 2:31 PM, David Higgs <hig...@gmail.com> wrote: > On Mon, Oct 19, 2015 at 11:11 AM, Maxim Khitrov <m...@mxcrypt.com> wrote: >> >> On Mon, Dec 8, 2014 at 3:45 PM, David Higgs <hig...@gmail.com> wrote: >> > On Mon, Dec 8, 2014 at 3:37 PM, trondd <tro...@gmail.com> wrote: >> >> On Mon, Dec 8, 2014 at 3:23 PM, trondd <tro...@gmail.com> wrote: >> >>> On Mon, Dec 8, 2014 at 11:47 AM, David Higgs <hig...@gmail.com> wrote: >> >>>> >> >>>> >> >>>> sysctl(8) will display Off if the value is zero, and On for nonzero. >> >>>> So, using the "closed interval" rule above, you should use "high=0" >> >>>> for indicators that you consider in "good" state when Off (i.e. >> >>>> ShutdownImminent), and "low=1" for indicators that you consider in >> >>>> "good" state when On (i.e. ACPresent). >> >>>> >> >>> >> >>> Isn't saying high=0 kind of the same thing as saying low=1? >> >> >> >> >> >> Oh, I think I get this. Since the sensor doesn't trigger if it is on >> >> the >> >> limit, only outside the limit, you have to set up which is the OK >> >> state. >> >> >> >> Still a little confusing but I guess there is no way to automatically >> >> know >> >> if an indicator is supposed to be Off or On when it's in it's good >> >> state? >> >> >> > >> > Kind of. The high/low difference is what values you consider "within" >> > normal operating parameters (and the output of %l). The upd(4) code >> > hasn't yet been taught how to map specific indicator values to OK / >> > WARN / CRITICAL status. Currently any value successfully read is >> > marked OK. >> > >> > I'm working with tech@ and slowly writing diffs to improve these things. >> > >> > --david >> >> Resurrecting an old thread since I just ran into the same problem in >> 5.8. To summarize, upd(4) exposes some SENSOR_INDICATOR-type sensors >> for attached UPSes, such as ACPresent = On/Off, and it's not clear how >> to configure sensorsd(8) to execute a command when this value changes. >> Also, upd always sets sensor status to "OK," so sensorsd never >> triggers commands for status changes; we have to use low/high limits >> until this is fixed. One proposed hack was to use "low=1:high=2" in >> sensorsd.conf, but this doesn't seem to work for everybody. >> >> Has anyone tried using "low=0:high=0"? I'm pretty sure that should >> solve the problem in all cases. >> >> The low/high range is inclusive at both ends. Off is 0, but On can be >> any other int64 value, including negative. For my UPS, ACPresent = On >> is actually a value of -1. I know this because when I set >> "low=-1:high=-1" sensorsd reports "upd0.indicator2: within limits: >> On". That being the case, "low=1:high=2" would not work because the >> value changes from -1 (On) to 0 (Off), and is always below the lower >> limit. >> >> Using "low=0:high=0" should always work for On -> Off -> On >> transitions, but it will show On as outside (below or above) the >> limits. If you want On to be within limits, then just play with the >> values until you figure out whether On is 1, -1, or something else >> entirely. That may not be as reliable. I'm not actually sure whether >> this value is UPS-specific or something that upd determines. > > > Yes, the values reported are UPS-specific. You may need to adjust the > ranges, but (as previously discussed) you can just use either high or low > (not both) to detect transition between good and bad indicator states. Why not both? The low limit is initialized to LLONG_MIN and high to LLONG_MAX. 
For "indicator" sensors, the logic we are trying to express is either value == 0 or value != 0. For the former (i.e. a sensor that should be "Off" normally), "low=0:high=0" is exactly what you want. For the latter, sensorsd.conf doesn't give you a way of negating the range (possible feature request?), but if you know that ACPresent = On is really -1 for your UPS, then "high=-1" is sufficient. This is, of course, assuming that the On value will never be positive in the future. I just tested all of this, and it works perfectly. For UPSes that use 1 to indicate On, instead of "low=1:high=2" you can simplify that to "low=1". Alternatively, use "low=0:high=0" everywhere, which will be the most reliable method, and provide an extra parameter to your script to indicate which value to consider "normal." The downside is that sensorsd will complain when the value is On and stay silent when it's Off. -Max
Re: Firewall question: is using a NIC with multiple jacks considered insecure?
On Mon, Jul 27, 2015 at 7:37 AM, Christian Weisgerber <na...@mips.inka.de> wrote:
> On 2015-07-27, Quartz <qua...@sneakertech.com> wrote:
>> Some years ago I remember reading that when using OpenBSD (or any OS,
>> really) as a router+firewall it was considered inadvisable from a
>> security standpoint to have the different networks all attached to a
>> single network card with multiple ethernet ports. The thinking being
>> that it was theoretically possible for an attacker to exploit bugs in
>> the card's chip to short circuit the path and route packets directly
>> across the card in a way pf can't control. It was also suggested that
>> in addition to using different physical cards, the cards should really
>> use different chipsets too, in case an unknown driver bug allows a
>> short circuit.
>
> Those are not realistic concerns.

The Intel 82574L "packet of death" comes to mind as one example of a bug in the EEPROM that allowed an attacker to bring down an interface: http://blog.krisk.org/2013/02/packets-of-death.html

These days you have bypass features in hardware that allow packets to flow from one interface to another even if the firewall is turned off. Who knows what other bugs in such functionality will be discovered in the future?

Having said that, just throwing random chipsets into the mix is probably not the right solution. You may actually be increasing your attack surface. If this is a real concern for you, I think multiple firewalls, one behind the other (and using different chipsets, if you really want to), is a better way to go.
Re: Firewall question: is using a NIC with multiple jacks considered insecure?
On Mon, Jul 27, 2015 at 11:10 AM, Quartz <qua...@sneakertech.com> wrote:
>> These days you have bypass features in hardware that allow packets to
>> flow from one interface to another even if the firewall is turned off.
>
> Can you elaborate on this?

Search for "intel nic bypass mode" and you'll find lots of details. It's an increasingly common feature in server network adapters. If the host OS is down, the NIC continues forwarding packets between two ports without any processing. Some older implementations used a physical jumper to enable or disable this feature. Now it's all done in software and can even be configured remotely. For example: http://www.lannerinc.com/applications/product-features/lan-bypass
Re: OpenBSD 5.7 Released
On Fri, May 1, 2015 at 4:00 AM, OpenBSD Store Misc <m...@openbsdstore.com> wrote:
> one of the master CD's was damaged in transit to the production facility

The NSA agent needed more time to record an alternate version of the song.
Re: pf to read protocol information from /etc/services ?
On Fri, Feb 27, 2015 at 3:40 PM, Research <resea...@nativemethods.com> wrote:
> UDP is meaningless in the context of HTTP.

Well, actually... https://en.wikipedia.org/wiki/QUIC

Not really standard, but still. I now allow UDP on ports 80 and 443 to make Google Chrome happy.
Preserving unbound cache across reboots
Hi all,

I wrote two simple functions for rc.shutdown and rc.local that save/restore the unbound cache when the system is restarted. Since each record has a relative TTL field, the cache can only be restored within a short time window to avoid serving stale data to clients. I set this window to 10 minutes; enough to survive a reboot, but not for any extended downtime. Is there any interest in including this functionality in the base OS (moved to /etc/rc)?

- Max

--- /var/backups/etc_rc.shutdown.current	Mon Aug  4 21:03:16 2014
+++ /etc/rc.shutdown	Fri Jan 30 10:06:11 2015
@@ -8,3 +8,17 @@
 powerdown=NO	# set to YES for powerdown
 
 # Add your local shutdown actions here.
+
+save_unbound_cache() {
+	local db=/var/db/unbound.cache
+	/etc/rc.d/unbound check || return
+	echo -n 'saving unbound cache: '
+	if unbound-control dump_cache > $db; then
+		chmod 0600 $db
+		echo 'done.'
+	else
+		rm -f $db
+	fi
+}
+
+save_unbound_cache
--- /var/backups/etc_rc.local.current	Mon Aug  4 21:03:16 2014
+++ /etc/rc.local	Fri Jan 30 10:07:00 2015
@@ -4,3 +4,17 @@
 # can be done AFTER your system goes into securemode.  For actions
 # which should be done BEFORE your system has gone into securemode
 # please see /etc/rc.securelevel.
+
+restore_unbound_cache() {
+	local db=/var/db/unbound.cache
+	test -s $db && /etc/rc.d/unbound check || return
+	echo -n 'restoring unbound cache: '
+	if [ $(($(date '+%s') - $(stat -qf '%m' $db))) -lt 600 ]; then
+		unbound-control load_cache < $db
+	else
+		echo 'failed (cache expired).'
+	fi
+	rm -f $db
+}
+
+restore_unbound_cache
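To try the round trip by hand outside the rc scripts (assuming unbound-control is already set up for your unbound instance):

  # dump the running cache to a file, then load it back
  unbound-control dump_cache > /var/db/unbound.cache
  unbound-control load_cache < /var/db/unbound.cache

dump_cache writes the cache as text to stdout and load_cache reads the same format from stdin, which is why the functions above use plain redirection.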
Re: Preserving unbound cache across reboots
On Fri, Jan 30, 2015 at 12:54 PM, Ingo Schwarze <schwa...@usta.de> wrote:
> Hi,
>
> Maxim Khitrov wrote on Fri, Jan 30, 2015 at 10:22:23AM -0500:
>
>> I wrote two simple functions for rc.shutdown and rc.local that
>> save/restore the unbound cache when the system is restarted. Since
>> each record has a relative TTL field, the cache can only be restored
>> within a short time window to avoid serving stale data to clients. I
>> set this window to 10 minutes; enough to survive a reboot, but not for
>> any extended downtime. Is there any interest in including this
>> functionality in the base OS (moved to /etc/rc)?
>
> The purpose of rebooting is to reset the system to a clean state, so
> clearing caches looks like a feature rather than a bug. Given that even
> "unbound-control reload" flushes the cache, a reboot should certainly
> do that, too. So i wouldn't even recommend showing this to people as
> something they might add to their local scripts if they want to. It
> just seems wrong.
>
> Also note that the unbound-control(8) manual explicitly marks
> load_cache as a debugging feature and warns that it may cause wrong
> data to be served. On top of that, the version of unbound(8) running
> after the reboot might be newer than the version running before, so
> compatibility is questionable as well, so your proposal is very fragile
> at best.
>
> Besides, even if the goal would be desirable, which it is not, my
> feeling is that this code is too specialized for adding to the boot
> scripts.

Fair enough, though I would note that this feature is available in pfSense, which also uses unbound. Some resolvers persist their cache to disk automatically, so it's not that strange an idea. I wanted to share the code anyway for others who might be interested in doing the same thing.

My thinking on this is that if the cache was valid before the reboot, there is no good reason to clear it two minutes later just because the kernel was upgraded. It creates a traffic spike and a noticeable performance hit for the clients, especially with DNSSEC enabled. An explicit reload is different because you do it when you change the unbound configuration.

Version upgrades are easily handled and I've now added that to my scripts, so thanks for the suggestion.
Re: pf: question about tables derived from interface group
On Sun, Dec 28, 2014 at 6:38 AM, Harald Dunkel <ha...@afaics.de> wrote:
> Hi folks,
>
> pfctl can give me an extended list of tables showing interface group
> names, self, etc. Sample:
>
> # pfctl -g -sT
> egress
> egress:0
> extern
> extern:network
> intern:network
> nospamd
> self
> spamd-white
> unroutable
>
> How can I query the value of the special tables?

These tables are under the hidden _pf anchor:

pfctl -a _pf -t extern -T show
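The same pattern works for any of the other special tables in the -g listing; only the table name changes. For example, using the self table from the sample above:

  pfctl -a _pf -t self -T show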
Re: pf: question about tables derived from interface group
On Sun, Dec 28, 2014 at 9:35 AM, Harald Dunkel <ha...@afaics.de> wrote:
> On 12/28/14 13:51, Maxim Khitrov wrote:
>> These tables are under the hidden _pf anchor:
>>
>> pfctl -a _pf -t extern -T show
>
> That's cool. Where did you find this? Searching on openbsd.org for _pf
> revealed only http://www.openbsd.org/papers/ven05-henning/mgp00011.txt .
>
> This is surely something that should go to the man page or to the FAQs
> for pf.

I read the source code when I wanted to know how (and if) this was implemented and whether there is any performance penalty associated with this construct.
Re: OT: Does OpenBSD run on SuperMicro MicroCloud models, and may be on 5037MC-H12TRF
On Thu, May 15, 2014 at 8:51 PM, Daniel Ouellet <dan...@presscom.net> wrote:
> I was also looking at these two if the above one wasn't supported. But
> if I remember the Atom SoC one is not working on OpenBSD yet, but I
> could be wrong.
>
> SuperServer 5038MA-H24TRF
> http://www.supermicro.com/products/system/3U/5038/SYS-5038MA-H24TRF.cfm

I have the Supermicro A1SRi-2758F motherboard (Atom C2758 Rangeley). No issues running OpenBSD 5.5 amd64.
Support for Intel QuickAssist on Atom Rangeley CPUs?
I'm about to purchase a new Supermicro Atom board for a firewall. The decision is between Atom C2750 (Avoton) and C2758 (Rangeley) CPUs. The latter is marketed as a communications processor and exchanges Turbo Boost for QuickAssist, which seems to be an FPGA-type thing for accelerating certain cryptographic and data compression functions. Is there support for this in OpenBSD and does anyone have any practical experience with using this hardware for VPN, SSL/TLS, or anything else of that sort? I'm not sure whether the 2.4 - 2.6 GHz Turbo Boost on C2750 will make any significant performance difference for a firewall, but I'd rather have that if QuickAssist is not supported. - Max
Re: When are default 'set prio' priorities set?
On Fri, Dec 20, 2013 at 4:11 PM, Maxim Khitrov <m...@mxcrypt.com> wrote:
> I was under the impression that the packet priority was always set to 3
> prior to the pf ruleset evaluation (ignoring VLAN and CARP for a
> moment), and that 'set prio' on an inbound rule only affected returning
> traffic that matched the state entry. Here's an artificial example:
>
> pass out on $wan
> pass in on $lan set prio 7
>
> What will be the priority of outbound packets on the $wan interface, 3
> or 7? Looking at the code in pf.c, the priority is copied to
> m->m_pkthdr.pf.prio, but I'm not sure where this value is initialized
> or reset.

I think I figured this out, but I would appreciate a confirmation. The m_pkthdr.pf.prio value is set to IFQ_DEFPRIO (3) in sys/kern/uipc_mbuf.c when a new mbuf is allocated. It is not modified after that except by pf rules. Therefore, packets going out on $wan in my example will have their priority set to 7.

Essentially, priorities behave the same as tags. The difference is that priorities are saved in the state entries, so all subsequent packets coming in on $lan and matching an existing state will have a priority of 7 when going out on $wan. Returning packets will keep a default priority of 3 after crossing $wan, but this will be changed to 7 when they match the state outbound on $lan. Correct?
Re: How to segregate forwarded and firewall-generated traffic in pf?
On Thu, Dec 19, 2013 at 8:33 AM, Camiel Dobbelaar <c...@sentia.nl> wrote:
> On 18/12/13 22:32, Camiel Dobbelaar wrote:
>> On 18/12/13 14:50, Maxim Khitrov wrote:
>>> On Wed, Dec 18, 2013 at 8:42 AM, Camiel Dobbelaar <c...@sentia.nl> wrote:
>>>> On 18/12/13 13:53, Maxim Khitrov wrote:
>>>>> When writing outbound rules in pf, is there an accepted best practice
>>>>> for only matching packets that are either forwarded or
>>>>> firewall-generated? The best that I could come up with is
>>>>> 'received-on all' as a way of identifying forwarded packets, but that
>>>>> option can't be negated to match packets that were not received on
>>>>> any inbound interface (i.e. those generated by the firewall itself).
>>>>> Another option is 'from (self)', but then you have to be careful with
>>>>> any preceding nat rules. Ideally, I want a solution that doesn't
>>>>> depend on the context. I also tried to use tags in combination with
>>>>> 'received-on', but it became rather messy and created conflicts with
>>>>> other tag usage. What is everyone else using to solve this problem?
>>>>
>>>> Check the user option in pf.conf(5):
>>>>
>>>> user user
>>>>     This rule only applies to packets of sockets owned by the
>>>>     specified user. For outgoing connections initiated from the
>>>>     firewall, this is the user that opened the connection. For
>>>>     incoming connections to the firewall itself, this is the user
>>>>     that listens on the destination port. For forwarded connections,
>>>>     where the firewall is not a connection endpoint, the user and
>>>>     group are unknown.
>>>
>>> I tried that a while ago and it doesn't work as documented:
>>>
>>> http://marc.info/?l=openbsd-bugs&m=137650531124231&w=2
>>> http://marc.info/?l=openbsd-bugs&m=137658379014570&w=2
>>
>> Nice of you to lure me in like this, and spend a few hours looking at
>> the code. :-)
>>
>> I'd say the feature is indeed broken, and probably has been for more
>> than 10 years. in_pcblookup_listen() in pf.c is the culprit. The
>> destination IP does not seem to matter for the socket lookup and will
>> match anything. As you noticed, this makes forwarded traffic match too.
>>
>> So I guess the only way to make this work at all is to match the source
>> and destination IP's yourself first in pf.conf like this:
>>
>> pass in from any to self port 22 user root
>> pass out from self to any user camield
>
> I think a documentation fix for pf.conf(5) is all that can be done.
>
> The diff adds the following paragraph:
>
>     When listening sockets are bound to the wildcard address, pf(4)
>     cannot determine if a connection is destined for the firewall
>     itself. To avoid false matches on just the destination port,
>     combine a user rule with source or destination address self.
>
> Also, it deletes all mentions of the unknown user since it's useless.
> And the example is updated. Better?

Not sure if you were asking me or other developers, but I think an update to the man page is fine. However, are you certain that pf cannot determine where the packet is going? It should be possible to perform a routing check to find out whether the destination IP belongs to the firewall, and thus may be accepted by a wildcard address, or if it's going to be forwarded to some other destination and should only match 'user unknown'. I think something similar is already being done by the urpf-failed check, only in reverse.
When are default 'set prio' priorities set?
I was under the impression that the packet priority was always set to 3 prior to the pf ruleset evaluation (ignoring VLAN and CARP for a moment), and that 'set prio' on an inbound rule only affected returning traffic that matched the state entry. Here's an artificial example:

pass out on $wan
pass in on $lan set prio 7

What will be the priority of outbound packets on the $wan interface, 3 or 7? Looking at the code in pf.c, the priority is copied to m->m_pkthdr.pf.prio, but I'm not sure where this value is initialized or reset.
Re: How to segregate forwarded and firewall-generated traffic in pf?
On Thu, Dec 19, 2013 at 7:57 AM, Giancarlo Razzolini <grazzol...@gmail.com> wrote:
> On 18-12-2013 21:33, Andy Lemin wrote:
>> Fantastic! Thanks Camiel :)
>>
>> Sent from my iPhone
>>
>> On 18 Dec 2013, at 21:32, Camiel Dobbelaar <c...@sentia.nl> wrote:
>>> On 18/12/13 14:50, Maxim Khitrov wrote:
>>>> On Wed, Dec 18, 2013 at 8:42 AM, Camiel Dobbelaar <c...@sentia.nl> wrote:
>>>>> On 18/12/13 13:53, Maxim Khitrov wrote:
>>>>>> When writing outbound rules in pf, is there an accepted best
>>>>>> practice for only matching packets that are either forwarded or
>>>>>> firewall-generated? The best that I could come up with is
>>>>>> 'received-on all' as a way of identifying forwarded packets, but
>>>>>> that option can't be negated to match packets that were not
>>>>>> received on any inbound interface (i.e. those generated by the
>>>>>> firewall itself). Another option is 'from (self)', but then you
>>>>>> have to be careful with any preceding nat rules. Ideally, I want a
>>>>>> solution that doesn't depend on the context. I also tried to use
>>>>>> tags in combination with 'received-on', but it became rather messy
>>>>>> and created conflicts with other tag usage. What is everyone else
>>>>>> using to solve this problem?
>>>>>
>>>>> Check the user option in pf.conf(5):
>>>>>
>>>>> user user
>>>>>     This rule only applies to packets of sockets owned by the
>>>>>     specified user. For outgoing connections initiated from the
>>>>>     firewall, this is the user that opened the connection. For
>>>>>     incoming connections to the firewall itself, this is the user
>>>>>     that listens on the destination port. For forwarded connections,
>>>>>     where the firewall is not a connection endpoint, the user and
>>>>>     group are unknown.
>>>>
>>>> I tried that a while ago and it doesn't work as documented:
>>>>
>>>> http://marc.info/?l=openbsd-bugs&m=137650531124231&w=2
>>>> http://marc.info/?l=openbsd-bugs&m=137658379014570&w=2
>>>
>>> Nice of you to lure me in like this, and spend a few hours looking at
>>> the code. :-)
>>>
>>> I'd say the feature is indeed broken, and probably has been for more
>>> than 10 years. in_pcblookup_listen() in pf.c is the culprit. The
>>> destination IP does not seem to matter for the socket lookup and will
>>> match anything. As you noticed, this makes forwarded traffic match too.
>>>
>>> So I guess the only way to make this work at all is to match the
>>> source and destination IP's yourself first in pf.conf like this:
>>>
>>> pass in from any to self port 22 user root
>>> pass out from self to any user camield
>>>
>>> Regards,
>>> Cam
>
> There are so many ways to do this. self rules, user, etc. But I'd say
> that you could also use tags to do policy based matching of packets
> that are firewall generated or firewall forwarded. Tags can be assigned
> before any nat matching rules take place, so you do not need to worry
> with them messing up your packet flow.

That's pretty much what I managed to come up with yesterday. I have the following two rules at the top:

match out from (self) tag SELF
block out log quick received-on all tagged SELF

The second rule is mostly a sanity check. It ensures that you can't accidentally add a SELF tag to an inbound packet and have it processed as a firewall-generated packet. These are followed by a few rules common to forwarded and firewall-generated packets. Finally, I split the ruleset like so:

anchor out quick tagged SELF {
	block return log
	# Rules for firewall-generated traffic
	...
}

# Rules for forwarded traffic
...

This seems like a good enough solution, but it would be cleaner if we could do '!received-on all'. There is also a risk here that one of the preceding rules could overwrite the SELF tag.
How to segregate forwarded and firewall-generated traffic in pf?
When writing outbound rules in pf, is there an accepted best practice for only matching packets that are either forwarded or firewall-generated? The best that I could come up with is 'received-on all' as a way of identifying forwarded packets, but that option can't be negated to match packets that were not received on any inbound interface (i.e. those generated by the firewall itself). Another option is 'from (self)', but then you have to be careful with any preceding nat rules. Ideally, I want a solution that doesn't depend on the context. I also tried to use tags in combination with 'received-on', but it became rather messy and created conflicts with other tag usage. What is everyone else using to solve this problem?
Re: How to segregate forwarded and firewall-generated traffic in pf?
On Wed, Dec 18, 2013 at 8:42 AM, Camiel Dobbelaar <c...@sentia.nl> wrote:
> On 18/12/13 13:53, Maxim Khitrov wrote:
>> When writing outbound rules in pf, is there an accepted best practice
>> for only matching packets that are either forwarded or
>> firewall-generated? The best that I could come up with is
>> 'received-on all' as a way of identifying forwarded packets, but that
>> option can't be negated to match packets that were not received on any
>> inbound interface (i.e. those generated by the firewall itself).
>> Another option is 'from (self)', but then you have to be careful with
>> any preceding nat rules. Ideally, I want a solution that doesn't
>> depend on the context. I also tried to use tags in combination with
>> 'received-on', but it became rather messy and created conflicts with
>> other tag usage. What is everyone else using to solve this problem?
>
> Check the user option in pf.conf(5):
>
> user user
>     This rule only applies to packets of sockets owned by the specified
>     user. For outgoing connections initiated from the firewall, this is
>     the user that opened the connection. For incoming connections to
>     the firewall itself, this is the user that listens on the
>     destination port. For forwarded connections, where the firewall is
>     not a connection endpoint, the user and group are unknown.

I tried that a while ago and it doesn't work as documented:

http://marc.info/?l=openbsd-bugs&m=137650531124231&w=2
http://marc.info/?l=openbsd-bugs&m=137658379014570&w=2
Re: How to control set prio
On Wed, Aug 7, 2013 at 12:10 PM, Henning Brauer <lists-open...@bsws.de> wrote:
> * Михаил Швецов <mishve...@rambler.ru> [2013-08-07 14:55]:
>> How can i see that set prio works?
>
> it just does.

Sometimes it doesn't: http://www.openbsd.org/cgi-bin/cvsweb/src/sys/net/pf.c#rev1.862

I got into a habit of separating prioritization from filtering with a bunch of match ... set prio ... rules at the start of the ruleset. Seemed like a good idea at the time.

I agree that some visualization of set prio operation is needed. Perhaps systat could show the number of packets assigned to each priority level for each interface over the last N seconds? I know that the design goal was to keep this as simple as possible, but some stats would be helpful in understanding what is happening and catching config errors.
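Until something like that exists, per-rule counters are the closest approximation I know of. Not the per-priority histogram wished for above, but enough to confirm that the prio rules are actually matching (a sketch using standard tools):

  # one-shot: each rule's Evaluations/Packets counters
  pfctl -v -s rules
  # live view of the same counters
  systat rules

If a 'match ... set prio ...' rule shows its packet counter increasing, the priority is being applied to that traffic.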
Re: 10G NIC recommendation
On Wed, Aug 14, 2013 at 7:09 PM, Diana Eichert <deich...@wrench.com> wrote:
> What I want to do: create a netflow collector using OpenBSD by looking
> at data fed from a tap.
>
> I know which 10G NICs are supported by OpenBSD; what I'd like to hear
> is a recommendation on which one of the following to use.
>
> $ apropos 10G
> che, cheg (4) - Chelsio Communications 10Gb Ethernet device
> ix (4) - Intel 82598/82599/X540 PCI Express 10Gb Ethernet device
> ixgb (4) - Intel PRO/10GbE 10Gb Ethernet device
> myx (4) - Myricom Myri-10G PCI Express 10Gb Ethernet device
> oce (4) - Emulex OneConnect 10Gb Ethernet device
> tht, thtc (4) - Tehuti Networks 10Gb Ethernet device
> xge (4) - Neterion Xframe/Xframe II 10Gb Ethernet device
>
> I do have a few Myricom 10G-PCIE2-8B2-2S available already. However I
> have funds available to get something else if one of the other cards
> performs better.

My only experience is with the X540, but I have no complaints. Here's a discussion of some testing that I did last week: http://marc.info/?l=openbsd-misc&m=137588569703330&w=2
Re: 10GbE (Intel X540) performance on OpenBSD 5.3
On Thu, Aug 8, 2013 at 9:35 PM, John Jasen <jja...@realityfailure.org> wrote:
> You may want to test jumbo frames, just to see what would happen. I
> would expect you to see closer to 10 Gb/s with the same number of
> interrupts.

Results for jumbo frames are below (spoiler: 10 Gbps, same number of interrupts, 40% CPU0 usage).

> On 08/08/2013 08:26 PM, Maxim Khitrov wrote:
>> Active Processor Cores: All
>
> I would turn that off, or at least make it only dual core.

No effect, results are also below.

>> That's... a bit faster. The CPU in the desktops is Intel i7-3770,
>> which is very similar to the Xeon E3-1275v2. Is this a FreeBSD vs
>> OpenBSD difference?
>
> Could be. It might be worth testing FreeBSD on your packet forwarding
> boxes, just to see if you get similar results.

I installed FreeBSD on a USB flash drive, booted the backup firewall from that, and ran iperf -c 127.0.0.1 -t 60:

[ 3] 0.0-60.0 sec 373 GBytes 53.4 Gbits/sec

Almost the same as the desktops, so this performance boost is due to FreeBSD (which keeps all cores at 70% load) and not the hardware. Now for jumbo frames:

# s1: iperf -s
# c1: iperf -c s1 -t 60 -m
[ 3] 0.0-60.0 sec 69.1 GBytes 9.89 Gbits/sec
[ 3] MSS size 8192 bytes (MTU 8232 bytes, unknown interface)

With MTU set to 9000 along the entire path, a single client can max out the 10 gigabit link through the firewall. This also addresses the question of PCIe bandwidth - not an issue. I just had to double kern.ipc.nmbjumbo9 to 12800 on all FreeBSD hosts before I could enable jumbo frames (got "ix0: Could not setup receive structures" otherwise). Both clients together:

# s1: iperf -s
# s2: iperf -s
# c1: nc gw 1234 ; iperf -c s1 -t 60
# c2: nc gw 1234 ; iperf -c s2 -t 60
[ 3] 0.0-60.0 sec 34.6 GBytes 4.95 Gbits/sec
[ 3] 0.0-60.0 sec 34.5 GBytes 4.94 Gbits/sec

During all of these tests, systat shows 8k interrupts on each interface, and CPU0 usage is 40% interrupt, 60% idle. Going back to 1500 MTU, disabling Hardware Prefetcher and Adjacent Cache Line Prefetch in BIOS has no effect:

# c1->s1
[ 3] 0.0-60.0 sec 29.5 GBytes 4.22 Gbits/sec
# c1->s1, c2->s2
[ 3] 0.0-60.0 sec 14.8 GBytes 2.12 Gbits/sec
[ 3] 0.0-60.0 sec 15.7 GBytes 2.25 Gbits/sec

Same goes for disabling two of the cores:

# c1->s1
[ 3] 0.0-60.0 sec 30.7 GBytes 4.39 Gbits/sec
# c1->s1, c2->s2
[ 3] 0.0-60.0 sec 15.2 GBytes 2.18 Gbits/sec
[ 3] 0.0-60.0 sec 15.2 GBytes 2.17 Gbits/sec

Same with the bsd.sp kernel and all but one of the cores disabled:

# c1->s1
[ 3] 0.0-60.0 sec 31.3 GBytes 4.48 Gbits/sec
# c1->s1, c2->s2
[ 3] 0.0-60.0 sec 15.0 GBytes 2.15 Gbits/sec
[ 3] 0.0-60.0 sec 16.1 GBytes 2.30 Gbits/sec

Finally, I went back to all cores enabled, the bsd.mp kernel, Hardware Prefetcher and Adjacent Cache Line Prefetch enabled:

# c1->s1
[ 3] 0.0-60.0 sec 30.9 GBytes 4.43 Gbits/sec
# c1->s2, c2->s2
[ 3] 0.0-60.0 sec 16.8 GBytes 2.40 Gbits/sec
[ 3] 0.0-60.0 sec 14.0 GBytes 2.00 Gbits/sec

As you can see, none of these tweaks had any measurable impact. The firewall can only handle so many packets per second. To push more packets through, I need to reduce the per-packet processing overhead.
Here's a simple illustration of this fact using just the c1->s1 test:

# pf disabled (set skip on {ix0, ix1}):
[ 3] 0.0-60.0 sec 37.4 GBytes 5.35 Gbits/sec
# pf enabled, no state on ix0:
[ 3] 0.0-60.1 sec 8.28 GBytes 1.18 Gbits/sec
# pf enabled, keep state:
[ 3] 0.0-60.0 sec 30.8 GBytes 4.41 Gbits/sec
# pf enabled, keep state (sloppy):
[ 3] 0.0-60.0 sec 31.2 GBytes 4.46 Gbits/sec
# pf enabled, modulate state:
[ 3] 0.0-60.0 sec 28.3 GBytes 4.05 Gbits/sec
# pf enabled, modulate state & scrub (random-id reassemble tcp):
[ 3] 0.0-60.0 sec 25.8 GBytes 3.69 Gbits/sec

The interesting thing about the last test is that systat shows double the number of interrupts (32k total, 16k per interface) and CPU0 is about 5% idle instead of the usual 10%. The rest is self-evident. More work per packet = lower throughput. This is also another confirmation that the sloppy state tracker has no performance benefits.

Unless someone has any other ideas on how to reduce the per-packet processing time, I think ~4.5 Gbps is the most that my hardware can handle at the default MTU. A bit disappointing, but it was the fastest CPU that I could get from Lanner and also my first step beyond 1 gigabit. If OpenBSD starts using multiple cores for interrupt processing in the future, 10+ Gbps should be easy to achieve. FreeBSD is an option if performance is critical, but for now I'd rather have all the 4.6+ pf improvements.
Re: 10GbE (Intel X540) performance on OpenBSD 5.3
On Fri, Aug 9, 2013 at 11:52 AM, Henning Brauer <lists-open...@bsws.de> wrote:
> * Maxim Khitrov <m...@mxcrypt.com> [2013-08-09 17:47]:
>> and ran iperf
>> # s1: iperf -s
>> # c1: iperf -c s1 -t 60 -m
>> # s1: iperf -s
>> # s2: iperf -s
>> # c1: nc gw 1234 ; iperf -c s1 -t 60
>> # c2: nc gw 1234 ; iperf -c s2 -t 60
>
> your tests are flawed. you are testing iperf ('s lack of) performance.
> use tcpbench. or an ixia.

These aren't available from FreeBSD packages. What about nuttcp?

# c1: nuttcp -t -T60 s1
5442.6100 MB / 10.10 sec = 4521.6131 Mbps 34 %TX 60 %RX 1233 host-retrans 0.19 msRTT

# c1: nuttcp -t -T60 s1
# c2: nuttcp -t -T60 s2
15960.2372 MB / 60.10 sec = 2227.8129 Mbps 15 %TX 32 %RX 10532 host-retrans 0.19 msRTT
17349.9260 MB / 60.10 sec = 2421.8063 Mbps 19 %TX 33 %RX 10932 host-retrans 0.20 msRTT

TCP tests don't look any different. UDP is slightly better:

# c1: nuttcp -t -u -R 10g -T 60 s1
36592.9785 MB / 60.00 sec = 5116.0419 Mbps 96 %TX 48 %RX 21725 / 37492935 drop/pkt 0.05794 %loss

# c1: nuttcp -t -u -R 10g -T 60 s1
# c2: nuttcp -t -u -R 10g -T 60 s2
22217.3467 MB / 60.00 sec = 3105.9963 Mbps 96 %TX 38 %RX 14801348 / 37551911 drop/pkt 39.42 %loss
22270.5674 MB / 60.01 sec = 3113.3326 Mbps 96 %TX 40 %RX 14875602 / 37680663 drop/pkt 39.48 %loss
Re: 10GbE (Intel X540) performance on OpenBSD 5.3
Thanks to everyone for your advice! I'll try to respond to all the questions at once and provide some more information about the testing that I did today.

The BIOS on these firewalls is current. For power-saving options, when I first configured these systems I tried turning Intel EIST (SpeedStep) off, but this caused OpenBSD to panic during boot. The panic text is copied at the end of this message, but the keyboard didn't work at the ddb prompt (not even Ctrl-Alt-Del), so I couldn't run any commands. Here's what my performance-related BIOS settings look like:

Hyper-threading: Disabled
Active Processor Cores: All
Limit CPUID Maximum: Disabled
Execute Disable Bit: Enabled
Intel Virtualization Technology: Disabled
Hardware Prefetcher: Enabled
Adjacent Cache Line Prefetch: Enabled
EIST: Enabled
Turbo Mode: Enabled
CPU C3 Report: Disabled
CPU C6 Report: Disabled
CPU C7 Report: Disabled
VT-d: Disabled

I doubt that disabling EIST would have a significant performance advantage. Latency may suffer a bit while the CPU raises its frequency when the traffic hits, but I don't think this would affect throughput testing. Tomorrow, I'll try disabling other cores and using the bsd.sp kernel to see if that performs any better. Might also play with the hardware prefetcher settings.

Today, I started testing forwarding performance with pf enabled. I put the second firewall aside and installed the X540-T2 cards into four identical Dell OptiPlex 9010 desktops. Two servers (s1 & s2) and two clients (c1 & c2). Each pair was connected through a Dell PowerConnect 8164 10GbE switch to a separate port on the firewall. The two switches had no other connections.

I installed FreeBSD 9.1-RELEASE amd64 on the desktops. As a side note, iperf doesn't crash on FreeBSD when running in UDP mode, so I think it's a problem with the OpenBSD package. For these tests I stuck with TCP and 1500 MTU. Also, I noticed that a 10 second test is not always sufficient to get consistent results, so I'm now running all tests for 60 seconds.

First test is iperf on 127.0.0.1 to compare these desktops with the 11.6 Gbps that I got on the firewall:

# c1: iperf -s
# c1: iperf -c 127.0.0.1 -t 60
[ 3] 0.0-59.9 sec 402 GBytes 57.7 Gbits/sec

That's... a bit faster. The CPU in the desktops is Intel i7-3770, which is very similar to the Xeon E3-1275v2. Is this a FreeBSD vs OpenBSD difference?

Second test is c1 -> c2 via the 8164 switch (not involving the firewall yet):

# c2: iperf -s
# c1: iperf -c c2 -t 60
[ 4] 0.0-60.1 sec 40.2 GBytes 5.74 Gbits/sec

A single desktop can't saturate the link, at least with the default settings, but two on each side should be plenty to test the firewall to its limit.

Third test is c1 -> s1 through the firewall with pf stateful filtering:

# s1: iperf -s
# c1: iperf -c s1 -t 60
[ 3] 0.0-60.0 sec 30.0 GBytes 4.29 Gbits/sec

I watched systat and top on the firewall while this test was running. 16k interrupts evenly split between ix0 and ix1, and ~90% interrupt usage on CPU0.

Fourth test is c1 -> s1 and c2 -> s2. I used a netcat server on the firewall (nc -l 1234) to synchronize both clients. They started iperf as soon as I killed the server with Ctrl-C:

# s1: iperf -s
# s2: iperf -s
# c1: nc gw 1234; iperf -c s1 -t 60
# c2: nc gw 1234; iperf -c s2 -t 60
[ 3] 0.0-60.0 sec 14.4 GBytes 2.07 Gbits/sec
[ 3] 0.0-60.0 sec 15.8 GBytes 2.26 Gbits/sec

An even split of the single client performance, indicating that the firewall is the bottleneck. No changes in systat and top, so it does look like the CPU is the limiting factor.
Finally, I used "set skip on {ix0, ix1}" to disable pf on these two interfaces and re-ran the same test:

[ 3] 0.0-60.0 sec 18.1 GBytes 2.59 Gbits/sec
[ 3] 0.0-60.0 sec 16.3 GBytes 2.34 Gbits/sec

A small improvement, but I think it's fair to say that pf isn't the problem. Will do some more testing tomorrow. Here's the boot panic when I disable SpeedStep in BIOS:

acpiec0 at acpi0: Failed to read resource settings
acpicpu0 at acpi0
Store to default type! 100 01a4
Called: \_PR_.CPU0._PDC
arg0: 0x801af588 cnt:01 stk:00 buffer: 0c {01, 00, 00, 00, 01, 00, 00, 00, 3b, 03, 00, 00}
panic: aml_die aml_store:2621
Stopped at Debugger+0x5: leave
Debugger() at Debugger+0x5
panic() at panic+0xe4
_aml_die() at _aml_die+0x183
aml_store() at aml_store+0xbb
aml_parse() at aml_parse+0xcd7
aml_eval() at aml_eval+0x1c8
aml_evalnode() at aml_evalnode+0x63
acpicpu_set_pdc() at acpicpu_set_pdc+0x8c
acpicpu_attach() at acpicpu_attach+0x9e
config_attach() at config_attach+0x1d4
end trace frame: 0x80e6da90, count: 0
10GbE (Intel X540) performance on OpenBSD 5.3
Hi all,

I'm looking for performance measuring and tuning advice for 10 gigabit Ethernet. I have a pair of Lanner FW-8865 systems that will be used as firewalls for the local network. Each one has a Xeon E3-1270v2 CPU, Intel X540 10GbE NIC (PCIe 3.0 8x), and 8GB DDR3-1600 ECC RAM. Before putting them into production I wanted to do some throughput testing, so I connected one directly to the other (via ix0 interfaces) and used iperf to see how much data I can push through. I also disabled pf for now, but will do some additional testing with it enabled later on. The kernel is 5.3 amd64 GENERIC.MP.

The initial iperf runs couldn't go beyond ~3.2 Gbps:

# server: iperf -s
# client: iperf -c 192.168.1.3
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.2 sec 3.84 GBytes 3.22 Gbits/sec

Increasing the TCP window size to 256 KB (seems to be the upper limit) brings this up to ~4.2 Gbps:

# server: iperf -s -w 256k
# client: iperf -c 192.168.1.3 -w 256k
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.1 sec 4.96 GBytes 4.22 Gbits/sec

Increasing the MTU on both ix0 interfaces to 9000 gives me ~7.2 Gbps:

# server: ifconfig ix0 mtu 9000 && iperf -s -w 256k
# client: ifconfig ix0 mtu 9000 && iperf -c 192.168.1.3 -w 256k -m
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 8.39 GBytes 7.21 Gbits/sec
[ 3] MSS size 8948 bytes (MTU 8988 bytes, unknown interface)

This is where I'm stuck at the moment. When running iperf on 127.0.0.1, which should only test CPU and memory, I get 11.6 Gbps. I've read the Network Tuning and Performance Guide @ calomel.org, but none of the tips there help me in getting beyond 7 Gbps on the physical interfaces. I'm also slightly concerned about the performance at the default MTU of 1500.

Looking at `ifconfig ix0 hwfeatures` output (below), it seems that the ix driver does not support any checksum offloading for the X540. I wonder if that could be a reason for the poor performance?

ix0: flags=28843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,NOINET6> mtu 9000
	hwfeatures=30<VLAN_MTU,VLAN_HWTAGGING> hardmtu 16110
	lladdr 00:90:0b:56:12:0c
	priority: 0
	groups: LAN SVR
	media: Ethernet autoselect (10GbaseT full-duplex)
	status: active
	inet 192.168.1.2 netmask 0xffffff00 broadcast 192.168.1.255

Are there any sysctl parameters that I should play with? Any other system stats that I should monitor? I did a few runs while watching `top` and `systat vmstat`, but didn't see any problem indications there.

I should also note that I couldn't run iperf in UDP mode - the client segfaults any time I increase the bandwidth beyond 300 Mbps. No idea why, but I'm more interested in TCP performance anyway.

- Max
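One generic knob worth checking while benchmarking, since the iperf window appeared to cap out at 256k: the default TCP socket buffer sizes. A sketch; the values are examples to experiment with, not tuned recommendations:

  # OpenBSD: the defaults are conservative; larger buffers can lift
  # single-stream TCP throughput at 10G
  sysctl net.inet.tcp.recvspace=262144
  sysctl net.inet.tcp.sendspace=262144

Whether this helps depends on where the bottleneck really is; if interrupt processing on one core is the limit, bigger buffers won't change much.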
Re: 10GbE (Intel X540) performance on OpenBSD 5.3
On Wed, Aug 7, 2013 at 10:31 AM, Martin Schröder <mar...@oneiros.de> wrote:
> 2013/8/7 Maxim Khitrov <m...@mxcrypt.com>:
>> I've read the Network Tuning and Performance Guide @ calomel.org,
>
> Ignore that site and search the list archives.

Understood :) I found a number of recommendations for the things to keep an eye on, but nothing that gave me any ideas on what else to try for improving the performance.

Specifically, I looked at netstat -m on both systems, where everything was well below the limits (500 mbufs in use during the test). I see about 8200/8700 (ix0/total) interrupts in systat with 1500 MTU. CPU usage in top is split between two cores, one at ~80% interrupt and the other at ~80% system. Most of the time all four cores are at least 10% idle (hyper-threading is disabled in BIOS). netstat -i shows no errors for ix0 and sysctl net.inet.ip.ifq.drops is at 0 on both systems.

What did surprise me is that netstat -ss (output below) shows that all received packets were hardware-checksummed, but this value is 0 for sent packets. Does this mean that ix supports checksum offloading, but only for inbound packets? This should be a bit of good news for me once I start testing forwarding performance. I assume that as long as pf doesn't modify the packet (no nat/rdr, modulate state, scrubbing, etc.), then there shouldn't be any need to recompute the checksum. Correct?

ip:
	39827125 total packets received
	39820936 packets for this host
	40 packets for unknown/unsupported protocol
	77150033 packets sent from this host
	39826536 input datagrams checksum-processed by hardware
icmp:
	147 calls to icmp_error
	Output packet histogram:
		destination unreachable: 48
	Input packet histogram:
		echo reply: 2
		destination unreachable: 40
igmp:
ipencap:
tcp:
	77147020 packets sent
		77145183 data packets (111695427326 bytes)
		2 data packets (2836 bytes) retransmitted
		1763 ack-only packets (4427 delayed)
		6 window update packets
		66 control packets
	39817607 packets received
		38983910 acks (for 111695426667 bytes)
		18 duplicate acks
		5814 packets (560082 bytes) received in-sequence
		38 out-of-order packets (872 bytes)
		830155 window update packets
		1 packet received after close
		39817153 packets hardware-checksummed
	41 connection requests
	10 connection accepts
	49 connections established (including accepts)
	43 connections closed (including 1 drop)
	38983035 segments updated rtt (of 1217192 attempts)
	4 retransmit timeouts
	2 keepalive timeouts
	2 keepalive probes sent
	601 correct ACK header predictions
	3276 correct data packet header predictions
	20 PCB cache misses
	cwr by timeout: 4
	10 SYN cache entries added
		10 completed
	3 SACK options received
	1 SACK option sent
udp:
	3327 datagrams received
	39 with no checksum
	3193 input packets hardware-checksummed
	47 dropped due to no socket
	3280 delivered
	2958 datagrams output
	708 missed PCB cache
esp:
ah:
etherip:
ipcomp:
carp:
pfsync:
divert:
pflow:
ip6:
	2 total packets received
	4 packets sent from this host
	Input packet histogram:
		ICMP6: 2
	Mbuf statistics:
		2 one ext mbufs
divert6:
icmp6:
	Output packet histogram:
		multicast listener report: 4
	Histogram of error messages to be generated:
pim6:
rip6:
Re: 10GbE (Intel X540) performance on OpenBSD 5.3
On Wed, Aug 7, 2013 at 11:44 AM, Florian Obser <flor...@narrans.de> wrote:
> On Wed, Aug 07, 2013 at 10:26:22AM -0400, Maxim Khitrov wrote:
>> Hi all, I'm looking for performance measuring and tuning advice for 10
>> gigabit Ethernet. I have a pair of Lanner FW-8865 systems that will be
>> used as firewalls for the local network.
> [...]
>> The initial iperf runs couldn't go beyond ~3.2 Gbps:
>
> you expect a lot of locally generated traffic on your firewall? (if the
> answer is no, why are you testing that?)

No :) But it was the first step until I have a third system with a 10GbE port. I have 15 Intel X540-T2 cards waiting to be installed. Once I have another server that can generate the traffic, I'll test the forwarding performance with pf enabled.

> [...]
>> Increasing the MTU on both ix0 interfaces to 9000 gives me ~7.2 Gbps:
>
> you expect a lot of jumbo frames in front of / behind your firewall?
> (if the answer is no, why are you testing that?)

It's a possibility. What this tells me, however, is that the throughput isn't the (main) problem. The per-packet processing overhead appears to be the limiting factor, which is why I asked about checksum offloading.

> anyway, I was testing an Intel 82599 system in July which will become a
> border router. All of this is forwarding rate; it took me 2 days to
> beg, borrow and steal enough hw to actually generate the traffic. (I
> had 4 systems in front of and 4 systems behind the router, all doing
> 1Gb/s)

What tools were you using to generate the traffic and to calculate bytes/packets per second? I assume interrupts per second came from systat?
Outdated documentation for scrub (no-df) in pf.conf(5)?
Hi,

The no-df flag can be specified in the "set reassemble" option or in a scrub rule. From looking at the source, I don't think scrub (no-df) does what the man page says it does. To reassemble fragmented packets with the DF flag set, one has to use the "set reassemble yes no-df" option. By the time any scrub rules are applied, the packet is already reassembled, so scrub (no-df) simply clears the DF flag for all _complete_ packets (pf_scrub in sys/net/pf_norm.c). I don't see how this fixes problems with fragmented NFS packets, and I suspect that this breaks legitimate uses of DF, such as MTU discovery. Is the documentation wrong (possibly from before OpenBSD 4.6, when scrub was a separate option) or am I misinterpreting the code?

- Max
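For readers comparing the two, here is where each form lives in pf.conf. A sketch of the syntax only; it takes no position on which behavior the man page intends, and the interface name is an example:

  # global option: reassemble fragments, and clear DF on packets
  # reassembled from fragments so they can be refragmented later
  set reassemble yes no-df

  # per-rule scrub: by the time this runs, packets are already
  # reassembled, so this clears DF on whole packets that match
  match in on $int_if scrub (no-df)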
pf scrub options in OpenBSD 5.3
Hi all,

A few questions about the operation of pf scrub options in OpenBSD 5.3:

1. In 2010 Henning advised against the use of reassemble tcp (link below). Is this advice still applicable, and what are the known issues that this option may cause in the current implementation?

http://marc.info/?l=openbsd-misc&m=126343406308201&w=2

2. Am I correct in assuming that the following example ruleset would be more efficient (and work the same way) if the 'match on LAN' rule was removed, or if scrubbing was only done for inbound packets (match in ...)?

match on WAN scrub (no-df random-id)
match on LAN scrub (no-df random-id)
pass

I'm trying to figure out exactly when options like random-id and reassemble tcp are applied. My current understanding is that a packet passing from LAN to WAN with the above ruleset will have its id randomized twice, and the same thing will happen for any returning packet that matches the two state entries. If I change both match rules to 'match in ...', then packets in both directions are scrubbed just once, but the returning packets are scrubbed as they leave the firewall instead of when they are first received. Is all of that right?

If so, does it actually matter that the returning packets are not scrubbed when they are first received? For example, if the reassemble tcp or min-ttl options are used and the other side lowers its TTL value to the point where the response packet expires upon reaching the firewall, then the TTL check will have no effect, since the OS wouldn't forward the packet to the outbound interface or run the second state check.

- Max
pf: inline anchor rules are not enough to keep tables in memory?
Hello,

I was a bit surprised by the following behavior when configuring pf on OpenBSD 5.2. Non-persistent tables that are only referenced by inline anchor rules, as in the following example, are removed from memory when pf.conf is loaded.

# Doesn't work (ssh connections are blocked):
table <admins> {10.0.0.2}
block
pass out
anchor in on ix1 {
	pass proto tcp from <admins> to ix1 port ssh
}

# Works as expected:
table <admins> persist {10.0.0.2}
block
pass out
anchor in on ix1 {
	pass proto tcp from <admins> to ix1 port ssh
}

After loading the first configuration, 'pfctl -t admins -T show' gives me:

pfctl: Table does not exist.

Referencing the table in the main ruleset, or making it persistent as in the second example, fixes the problem. Is this by design?

- Max
Re: pf: inline anchor rules are not enough to keep tables in memory?
On Wed, Mar 13, 2013 at 1:59 PM, Michel Blais <mic...@targointernet.com> wrote:
> I think you must specify the anchor first. Something like:
>
> pfctl -a ix1 -t admins -T show

That doesn't work. First, it's an unnamed anchor, so I don't think you can specify it with the -a option. Second, inbound connections to port 22 are rejected in the first case, but not in the second. The table is removed as though it was unreferenced, so the pass rule in the anchor doesn't match any source IPs.

- Max
Re: Request improvement for faq 15.2
On Thu, Dec 27, 2012 at 10:10 AM, Live user <nots...@live.com> wrote:
> I think 15.2.2 should go before 15.1.1, since there's no point in
> running pkg_* when PKG_PATH is empty, which is the case after
> installing using the interactive method. Furthermore, using 'export
> PKG_PATH=' sets a volatile variable, which is blank again after
> restarting. I think the FAQ may include a guideline on making it
> persistent as well.

I went through most of the FAQ this weekend and didn't see any mention of /etc/pkg.conf as an alternative to PKG_PATH. Might be better to document the use of this configuration file, which I think is created automatically if you install the system from an ftp or http mirror.

- Max
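For reference, a minimal /etc/pkg.conf along the lines discussed above. The mirror path is an example; substitute your own release and architecture:

  # /etc/pkg.conf -- read by pkg_add when PKG_PATH is not set
  installpath = http://ftp.openbsd.org/pub/OpenBSD/5.2/packages/amd64/

With this in place, pkg_add and friends keep working across reboots without any PKG_PATH in the environment.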