Failed sysupgrade from 6.6 to 6.7 amd64
After all these years of trouble-free upgrades, I ran into my first problem. I used sysupgrade to go from 6.6/amd64 to 6.7. The upgrade process was successful, but after bsd.upgrade did its thing and rebooted the system, the new kernel would not boot. It got to the "boot>" prompt and started loading the kernel, but the system rebooted right after showing the "booting hd0a:bsd: 12957+2753552..." line. I tried booting bsd.sp, bsd.rd, and bsd.booted with identical results. I was able to boot from cd67.iso. I tried downloading the original kernel, but that didn't work either. Re-running the upgrade didn't help. Finally, I decided to upgrade to 6.8, so I did that from cd68.iso, which fixed the problem. I also replaced the bootx64.efi file on the EFI partition after this upgrade, but I'm not actually sure whether it differed from the old one. Obviously I'm curious what the issue may have been, but mostly I'm wondering whether any upgrade steps may have been missed as a result of never fully booting the 6.7 OS and running the post-upgrade steps there. Thanks!
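For anyone in the same spot: the pieces that normally run automatically on the first boot after an upgrade can be re-run by hand once the system is up. A rough sketch only; the exact set varies by release, so treat this as an outline rather than the authoritative list:

  # as root, after booting the upgraded system
  sysmerge         # merge configuration file changes into /etc
  fw_update        # fetch firmware updates for the new release
  pkg_add -u       # update installed packages against the new release

Running these manually should cover most of what was skipped by never completing a normal 6.7 first boot.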
Re: Disk I/O performance of OpenBSD 5.9 on Xen
On Sat, Jul 16, 2016 at 6:37 AM, Mike Belopuhov <m...@belopuhov.com> wrote:
> On 14 July 2016 at 14:54, Maxim Khitrov <m...@mxcrypt.com> wrote:
>> On Wed, Jul 13, 2016 at 11:47 PM, Tinker <ti...@openmailbox.org> wrote:
>>> On 2016-07-14 07:27, Maxim Khitrov wrote:
>>> [...]
>>>> No, the tests are run sequentially. Write performance is measured
>>>> first (20 MB/s), then rewrite (12 MB/s), then read (37 MB/s), then
>>>> seeks (95 IOPS).
>>>
>>> Okay, you are on a totally weird platform. Or, on an OK platform with a
>>> totally weird configuration.
>>>
>>> Or on an OK platform and configuration with a totally weird underlying
>>> storage device.
>>>
>>> Are you on a magnet disk, are you using a virtual block device or virtual
>>> SATA connection, or some legacy interface like IDE?
>>>
>>> I get some feeling that your hardware + platform + configuration crappiness
>>> factor is fairly much through the ceiling.
>>
>> Dell R720 and R620 servers, 10 gigabit Ethernet SAN, Dell MD3660i
>> storage array, 1.2 TB 10K RPM SAS disks in RAID6. I don't think there
>> is anything crappy or weird about the configuration. Test results for
>> CentOS on the same system: 170 MB/s write, 112 MB/s rewrite, 341 MB/s
>> read, 746 IOPS.
>>
>> I'm assuming that there are others running OpenBSD on Xen, so I was
>> hoping that someone else could share either bonnie++ or even just dd
>> performance numbers. That would help us figure out if there really is
>> an anomaly in our setup.
>
> Hi,
>
> Since you have already discovered that we don't provide a driver
> for the paravirtualized disk interface (blkfront), I'd say that most likely
> your setup is just fine, but emulated pciide performance is subpar.
>
> I plan to implement it, but right now the focus is on making networking
> and specifically interrupt delivery reliable and efficient.
>
> Regards,
> Mike

Hi Mike,

Revisiting this issue with OpenBSD 6.1-RELEASE and the new xbf driver on XenServer 7.0. The write performance is much better at 74 MB/s (still slower than other OSs, but good enough). IOPS also improved from 95 to 167. However, the read performance actually got worse and is now at 16 MB/s. Here are the full bonnie++ results:

Version 1.97          ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1       -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine          Size           K/sec %CP K/sec %CP           K/sec %CP  /sec %CP
web4.dhcp.bhsai.   8G           76191  43 10052  17           16044  25 167.3  43
Latency                         168ms     118ms               416ms     488ms

Here are two dd runs for writing and reading:

$ dd if=/dev/zero of=test bs=1M count=2048
2147483648 bytes transferred in 25.944 secs (82771861 bytes/sec)
$ dd if=test of=/dev/null bs=1M
2147483648 bytes transferred in 123.505 secs (17387767 bytes/sec)

Here's the dmesg output:

pvbus0 at mainbus0: Xen 4.6
xen0 at pvbus0: features 0x2705, 32 grant table frames, event channel 3
xbf0 at xen0 backend 0 channel 8: disk
scsibus1 at xbf0: 2 targets
sd0 at scsibus1 targ 0 lun 0: <Xen, phy xvda 768, > SCSI3 0/direct fixed
sd0: 73728MB, 512 bytes/sector, 150994944 sectors
xbf1 at xen0 backend 0 channel 9: cdrom
xbf1: timed out waiting for backend to connect

Any ideas on why the read performance is so poor?

Thanks,
Max
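A follow-up thought on narrowing down a slow read path: take the filesystem out of the picture and vary the block size against the raw device. This is a sketch, not from the original thread; adjust the disk name to match your dmesg, and make sure you only read:

  # sequential read from the raw device, bypassing the filesystem
  dd if=/dev/rsd0c of=/dev/null bs=64k count=16384
  # same amount of data, much larger requests
  dd if=/dev/rsd0c of=/dev/null bs=1m count=1024

If throughput scales with the block size, the per-request overhead of the virtual disk interface is a more likely bottleneck than the storage behind it.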
Re: Disk I/O performance of OpenBSD 5.9 on Xen
On Wed, Jul 13, 2016 at 11:47 PM, Tinker <ti...@openmailbox.org> wrote: > On 2016-07-14 07:27, Maxim Khitrov wrote: > [...] >> >> No, the tests are run sequentially. Write performance is measured >> first (20 MB/s), then rewrite (12 MB/s), then read (37 MB/s), then >> seeks (95 IOPS). > > > Okay, you are on a totally weird platform. Or, on an OK platform with a > totally weird configuration. > > Or on an OK platform and configuration with a totally weird underlying > storage device. > > Are you on a magnet disk, are you using a virtual block device or virtual > SATA connection, or some legacy interface like IDE? > > I get some feeling that your hardware + platform + configuration crappiness > factor is fairly much through the ceiling. Dell R720 and R620 servers, 10 gigabit Ethernet SAN, Dell MD3660i storage array, 1.2 TB 10K RPM SAS disks in RAID6. I don't think there is anything crappy or weird about the configuration. Test results for CentOS on the same system: 170 MB/s write, 112 MB/s rewrite, 341 MB/s read, 746 IOPS. I'm assuming that there are others running OpenBSD on Xen, so I was hoping that someone else could share either bonnie++ or even just dd performance numbers. That would help us figure out if there really is an anomaly in our setup.
Re: Disk I/O performance of OpenBSD 5.9 on Xen
On Wed, Jul 13, 2016 at 11:10 AM, Tinker <ti...@openmailbox.org> wrote:
> On 2016-07-13 22:57, Maxim Khitrov wrote:
>> On Wed, Jul 13, 2016 at 10:53 AM, Tinker <ti...@openmailbox.org> wrote:
>>> On 2016-07-13 20:01, Maxim Khitrov wrote:
>>>> We're seeing about 20 MB/s write, 35 MB/s read, and 70 IOPS
>>>
>>> What do you mean 70, you mean 70 000 IOPS?
>>
>> Sadly, no. It was actually 95, I looked at the wrong column before:
>>
>> Write (K/sec), %cpu, Rewrite (K/sec), %cpu, Read (K/sec), %cpu, Seeks (/sec), %cpu
>> 20075, 22, 12482, 42, 37690, 47, 95.5, 68
>
> So that is.. 20075 + 12482 + 37690 = 70247 IOPS?
>
> or 70MB/sec total throughput?

No, the tests are run sequentially. Write performance is measured first (20 MB/s), then rewrite (12 MB/s), then read (37 MB/s), then seeks (95 IOPS).
Re: Disk I/O performance of OpenBSD 5.9 on Xen
On Wed, Jul 13, 2016 at 10:53 AM, Tinker <ti...@openmailbox.org> wrote:
> On 2016-07-13 20:01, Maxim Khitrov wrote:
>> We're seeing about 20 MB/s write, 35 MB/s read, and 70 IOPS
>
> What do you mean 70, you mean 70 000 IOPS?

Sadly, no. It was actually 95, I looked at the wrong column before:

Write (K/sec), %cpu, Rewrite (K/sec), %cpu, Read (K/sec), %cpu, Seeks (/sec), %cpu
20075, 22, 12482, 42, 37690, 47, 95.5, 68
Disk I/O performance of OpenBSD 5.9 on Xen
Hi all, We're seeing about 20 MB/s write, 35 MB/s read, and 70 IOPS with OpenBSD 5.9 amd64 on XenServer 7.0 (tested using bonnie++). The virtual disks are LVM over iSCSI. Linux hosts get well over 100 MB/s in both directions. I'm assuming that this is because there is no disk driver for Xen yet, but I wanted to see if others are getting similar numbers. Any suggestions for improving this performance? -Max
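For anyone who wants to post comparable numbers: the figures above came from bonnie++, but even a plain dd pair is useful. A sketch (sizes are examples; use a file at least twice the guest's RAM so caching doesn't inflate the results):

  # sequential write, then read back
  dd if=/dev/zero of=testfile bs=1m count=4096
  dd if=testfile of=/dev/null bs=1m
  # fuller picture with bonnie++ (from packages); -n 0 skips the
  # small-file tests, -s is the working set size in MB
  bonnie++ -d /var/tmp -s 16384 -n 0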
Re: APC UPS & sensorsd - how?
On Wed, Feb 24, 2016 at 3:38 AM, lilit-aibolit wrote:
> On 03/22/2015 05:44 PM, T. Ribbrock wrote:
>> Then, I re-applied power, but that, too, was never flagged by sensorsd.
>> For some reason, it looks like sensorsd only ever detects a status change
>> (for these rules) when it gets started - but not afterwards. Regards, Thomas
>
> Have you succeeded in getting status changes while sensorsd is running?

low=0:high=0 has been working well for me:
https://marc.info/?l=openbsd-misc&m=144529176814155&w=2
Re: sensorsd, upd, and state changes
On Mon, Dec 8, 2014 at 3:45 PM, David Higgs wrote:
> On Mon, Dec 8, 2014 at 3:37 PM, trondd wrote:
>> On Mon, Dec 8, 2014 at 3:23 PM, trondd wrote:
>>> On Mon, Dec 8, 2014 at 11:47 AM, David Higgs wrote:
>>>> sysctl(8) will display Off if the value is zero, and On for nonzero.
>>>> So, using the "closed interval" rule above, you should use "high=0"
>>>> for indicators that you consider in "good" state when Off (i.e.
>>>> ShutdownImminent), and "low=1" for indicators that you consider in
>>>> "good" state when On (i.e. ACPresent).
>>>
>>> Isn't saying high=0 kind of the same thing as saying low=1?
>>
>> Oh, I think I get this. Since the sensor doesn't trigger if it is on the
>> limit, only outside the limit, you have to set up which is the OK state.
>>
>> Still a little confusing, but I guess there is no way to automatically
>> know whether an indicator is supposed to be Off or On in its good state?
>
> Kind of. The high/low difference is what values you consider "within"
> normal operating parameters (and the output of %l). The upd(4) code
> hasn't yet been taught how to map specific indicator values to OK /
> WARN / CRITICAL status. Currently any value successfully read is
> marked OK.
>
> I'm working with tech@ and slowly writing diffs to improve these things.
>
> --david

Resurrecting an old thread since I just ran into the same problem in 5.8. To summarize, upd(4) exposes some SENSOR_INDICATOR-type sensors for attached UPSes, such as ACPresent = On/Off, and it's not clear how to configure sensorsd(8) to execute a command when this value changes. Also, upd always sets the sensor status to "OK," so sensorsd never triggers commands for status changes; we have to use low/high limits until this is fixed. One proposed hack was to use "low=1:high=2" in sensorsd.conf, but this doesn't seem to work for everybody.

Has anyone tried using "low=0:high=0"? I'm pretty sure that should solve the problem in all cases.

The low/high range is inclusive at both ends. Off is 0, but On can be any other int64 value, including negative. For my UPS, ACPresent = On is actually a value of -1. I know this because when I set "low=-1:high=-1" sensorsd reports "upd0.indicator2: within limits: On". That being the case, "low=1:high=2" would not work because the value changes from -1 (On) to 0 (Off), and is always below the lower limit.

Using "low=0:high=0" should always work for On -> Off -> On transitions, but it will show On as outside (below or above) the limits. If you want On to be within limits, then play with the values until you figure out whether On is 1, -1, or something else entirely. That may not be as reliable; I'm not actually sure whether this value is UPS-specific or something that upd determines.

-Max
Re: sensorsd, upd, and state changes
On Mon, Oct 19, 2015 at 2:31 PM, David Higgs <hig...@gmail.com> wrote: > On Mon, Oct 19, 2015 at 11:11 AM, Maxim Khitrov <m...@mxcrypt.com> wrote: >> >> On Mon, Dec 8, 2014 at 3:45 PM, David Higgs <hig...@gmail.com> wrote: >> > On Mon, Dec 8, 2014 at 3:37 PM, trondd <tro...@gmail.com> wrote: >> >> On Mon, Dec 8, 2014 at 3:23 PM, trondd <tro...@gmail.com> wrote: >> >>> On Mon, Dec 8, 2014 at 11:47 AM, David Higgs <hig...@gmail.com> wrote: >> >>>> >> >>>> >> >>>> sysctl(8) will display Off if the value is zero, and On for nonzero. >> >>>> So, using the "closed interval" rule above, you should use "high=0" >> >>>> for indicators that you consider in "good" state when Off (i.e. >> >>>> ShutdownImminent), and "low=1" for indicators that you consider in >> >>>> "good" state when On (i.e. ACPresent). >> >>>> >> >>> >> >>> Isn't saying high=0 kind of the same thing as saying low=1? >> >> >> >> >> >> Oh, I think I get this. Since the sensor doesn't trigger if it is on >> >> the >> >> limit, only outside the limit, you have to set up which is the OK >> >> state. >> >> >> >> Still a little confusing but I guess there is no way to automatically >> >> know >> >> if an indicator is supposed to be Off or On when it's in it's good >> >> state? >> >> >> > >> > Kind of. The high/low difference is what values you consider "within" >> > normal operating parameters (and the output of %l). The upd(4) code >> > hasn't yet been taught how to map specific indicator values to OK / >> > WARN / CRITICAL status. Currently any value successfully read is >> > marked OK. >> > >> > I'm working with tech@ and slowly writing diffs to improve these things. >> > >> > --david >> >> Resurrecting an old thread since I just ran into the same problem in >> 5.8. To summarize, upd(4) exposes some SENSOR_INDICATOR-type sensors >> for attached UPSes, such as ACPresent = On/Off, and it's not clear how >> to configure sensorsd(8) to execute a command when this value changes. >> Also, upd always sets sensor status to "OK," so sensorsd never >> triggers commands for status changes; we have to use low/high limits >> until this is fixed. One proposed hack was to use "low=1:high=2" in >> sensorsd.conf, but this doesn't seem to work for everybody. >> >> Has anyone tried using "low=0:high=0"? I'm pretty sure that should >> solve the problem in all cases. >> >> The low/high range is inclusive at both ends. Off is 0, but On can be >> any other int64 value, including negative. For my UPS, ACPresent = On >> is actually a value of -1. I know this because when I set >> "low=-1:high=-1" sensorsd reports "upd0.indicator2: within limits: >> On". That being the case, "low=1:high=2" would not work because the >> value changes from -1 (On) to 0 (Off), and is always below the lower >> limit. >> >> Using "low=0:high=0" should always work for On -> Off -> On >> transitions, but it will show On as outside (below or above) the >> limits. If you want On to be within limits, then just play with the >> values until you figure out whether On is 1, -1, or something else >> entirely. That may not be as reliable. I'm not actually sure whether >> this value is UPS-specific or something that upd determines. > > > Yes, the values reported are UPS-specific. You may need to adjust the > ranges, but (as previously discussed) you can just use either high or low > (not both) to detect transition between good and bad indicator states. Why not both? The low limit is initialized to LLONG_MIN and high to LLONG_MAX. 
For "indicator" sensors, the logic we are trying to express is either value == 0 or value != 0. For the former (i.e. a sensor that should be "Off" normally), "low=0:high=0" is exactly what you want. For the latter, sensorsd.conf doesn't give you a way of negating the range (possible feature request?), but if you know that ACPresent = On is really -1 for your UPS, then "high=-1" is sufficient. This is, of course, assuming that the On value will never be positive in the future. I just tested all of this, and it works perfectly. For UPSes that use 1 to indicate On, instead of "low=1:high=2" you can simplify that to "low=1". Alternatively, use "low=0:high=0" everywhere, which will be the most reliable method, and provide an extra parameter to your script to indicate which value to consider "normal." The downside is that sensorsd will complain when the value is On and stay silent when it's Off. -Max
Re: Firewall question: is using a NIC with multiple jacks considered insecure?
On Mon, Jul 27, 2015 at 7:37 AM, Christian Weisgerber <na...@mips.inka.de> wrote:
> On 2015-07-27, Quartz <qua...@sneakertech.com> wrote:
>> Some years ago I remember reading that when using OpenBSD (or any OS,
>> really) as a router+firewall it was considered inadvisable from a
>> security standpoint to have the different networks all attached to a
>> single network card with multiple ethernet ports. The thinking being
>> that it was theoretically possible for an attacker to exploit bugs in
>> the card's chip to short circuit the path and route packets directly
>> across the card in a way pf can't control. It was also suggested that
>> in addition to using different physical cards, the cards should really
>> use different chipsets too, in case an unknown driver bug allows a
>> short circuit.
>
> Those are not realistic concerns.

The Intel 82574L "packet of death" comes to mind as one example of a bug in the EEPROM that allowed an attacker to bring down an interface: http://blog.krisk.org/2013/02/packets-of-death.html

These days you have bypass features in hardware that allow packets to flow from one interface to another even if the firewall is turned off. Who knows what other bugs in such functionality will be discovered in the future?

Having said that, just throwing random chipsets into the mix is probably not the right solution. You may actually be increasing your attack surface. If this is a real concern for you, I think multiple firewalls, one behind the other (and using different chipsets, if you really want to), is a better way to go.
Re: Firewall question: is using a NIC with multiple jacks considered insecure?
On Mon, Jul 27, 2015 at 11:10 AM, Quartz <qua...@sneakertech.com> wrote:
>> These days you have bypass features in hardware that allow packets to
>> flow from one interface to another even if the firewall is turned off.
>
> Can you elaborate on this?

Search for "intel nic bypass mode" and you'll find lots of details. It's an increasingly common feature in server network adapters. If the host OS is down, the NIC continues forwarding packets between two ports without any processing. Some older implementations used a physical jumper to enable or disable this feature. Now it's all done in software and can even be configured remotely. For example: http://www.lannerinc.com/applications/product-features/lan-bypass
Re: OpenBSD 5.7 Released
On Fri, May 1, 2015 at 4:00 AM, OpenBSD Store Misc <m...@openbsdstore.com> wrote:
> one of the master CD's was damaged in transit to the production facility

The NSA agent needed more time to record an alternate version of the song.
Re: pf to read protocol information from /etc/services ?
On Fri, Feb 27, 2015 at 3:40 PM, Research <resea...@nativemethods.com> wrote:
> UDP is meaningless in the context of HTTP.

Well, actually... https://en.wikipedia.org/wiki/QUIC

Not really standard, but still. I now allow UDP on ports 80 and 443 to make Google Chrome happy.
Preserving unbound cache across reboots
Hi all,

I wrote two simple functions for rc.shutdown and rc.local that save/restore the unbound cache when the system is restarted. Since each record has a relative TTL field, the cache can only be restored within a short time window to avoid serving stale data to clients. I set this window to 10 minutes; enough to survive a reboot, but not for any extended downtime. Is there any interest in including this functionality in the base OS (moved to /etc/rc)?

- Max

--- /var/backups/etc_rc.shutdown.current	Mon Aug  4 21:03:16 2014
+++ /etc/rc.shutdown	Fri Jan 30 10:06:11 2015
@@ -8,3 +8,17 @@
 powerdown=NO	# set to YES for powerdown
 
 # Add your local shutdown actions here.
+
+save_unbound_cache() {
+	local db=/var/db/unbound.cache
+	/etc/rc.d/unbound check || return
+	echo -n 'saving unbound cache: '
+	if unbound-control dump_cache > $db; then
+		chmod 0600 $db
+		echo 'done.'
+	else
+		rm -f $db
+	fi
+}
+
+save_unbound_cache
--- /var/backups/etc_rc.local.current	Mon Aug  4 21:03:16 2014
+++ /etc/rc.local	Fri Jan 30 10:07:00 2015
@@ -4,3 +4,17 @@
 # can be done AFTER your system goes into securemode.  For actions
 # which should be done BEFORE your system has gone into securemode
 # please see /etc/rc.securelevel.
+
+restore_unbound_cache() {
+	local db=/var/db/unbound.cache
+	test -s $db && /etc/rc.d/unbound check || return
+	echo -n 'restoring unbound cache: '
+	if [ $(($(date '+%s') - $(stat -qf '%m' $db))) -lt 600 ]; then
+		unbound-control load_cache < $db
+	else
+		echo 'failed (cache expired).'
+	fi
+	rm -f $db
+}
+
+restore_unbound_cache
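To try the round trip by hand outside the rc scripts (assuming unbound-control is already set up for your unbound instance):

  # dump the running cache to a file, then load it back
  unbound-control dump_cache > /var/db/unbound.cache
  unbound-control load_cache < /var/db/unbound.cache

dump_cache writes the cache as text to stdout and load_cache reads the same format from stdin, which is why the functions above use plain redirection.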
Re: Preserving unbound cache across reboots
On Fri, Jan 30, 2015 at 12:54 PM, Ingo Schwarze <schwa...@usta.de> wrote:
> Hi,
>
> Maxim Khitrov wrote on Fri, Jan 30, 2015 at 10:22:23AM -0500:
>
>> I wrote two simple functions for rc.shutdown and rc.local that
>> save/restore the unbound cache when the system is restarted. Since
>> each record has a relative TTL field, the cache can only be restored
>> within a short time window to avoid serving stale data to clients. I
>> set this window to 10 minutes; enough to survive a reboot, but not for
>> any extended downtime. Is there any interest in including this
>> functionality in the base OS (moved to /etc/rc)?
>
> The purpose of rebooting is to reset the system to a clean state, so
> clearing caches looks like a feature rather than a bug. Given that even
> "unbound-control reload" flushes the cache, a reboot should certainly
> do that, too. So i wouldn't even recommend showing this to people as
> something they might add to their local scripts if they want to. It
> just seems wrong.
>
> Also note that the unbound-control(8) manual explicitly marks
> load_cache as a debugging feature and warns that it may cause wrong
> data to be served. On top of that, the version of unbound(8) running
> after the reboot might be newer than the version running before, so
> compatibility is questionable as well, so your proposal is very fragile
> at best.
>
> Besides, even if the goal would be desirable, which it is not, my
> feeling is that this code is too specialized for adding to the boot
> scripts.

Fair enough, though I would note that this feature is available in pfSense, which also uses unbound. Some resolvers persist their cache to disk automatically, so it's not that strange an idea. I wanted to share the code anyway for others who might be interested in doing the same thing.

My thinking on this is that if the cache was valid before the reboot, there is no good reason to clear it two minutes later just because the kernel was upgraded. It creates a traffic spike and a noticeable performance hit for the clients, especially with DNSSEC enabled. An explicit reload is different because you do it when you change the unbound configuration.

Version upgrades are easily handled and I've now added that to my scripts, so thanks for the suggestion.
Re: pf: question about tables derived from interface group
On Sun, Dec 28, 2014 at 6:38 AM, Harald Dunkel <ha...@afaics.de> wrote:
> Hi folks,
>
> pfctl can give me an extended list of tables showing interface group
> names, self, etc. Sample:
>
> # pfctl -g -sT
> egress
> egress:0
> extern
> extern:network
> intern:network
> nospamd
> self
> spamd-white
> unroutable
>
> How can I query the value of the special tables?

These tables are under the hidden _pf anchor:

pfctl -a _pf -t extern -T show
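The same pattern works for any of the other special tables in the -g listing; only the table name changes. For example, using the self table from the sample above:

  pfctl -a _pf -t self -T show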
Re: pf: question about tables derived from interface group
On Sun, Dec 28, 2014 at 9:35 AM, Harald Dunkel <ha...@afaics.de> wrote:
> On 12/28/14 13:51, Maxim Khitrov wrote:
>> These tables are under the hidden _pf anchor:
>>
>> pfctl -a _pf -t extern -T show
>
> That's cool. Where did you find this? Searching on openbsd.org for _pf
> revealed only http://www.openbsd.org/papers/ven05-henning/mgp00011.txt .
>
> This is surely something that should go to the man page or to the FAQs
> for pf.

I read the source code when I wanted to know how (and if) this was implemented and whether there is any performance penalty associated with this construct.
Re: OT: Does OpenBSD run on SuperMicro MicroCloud models, and may be on 5037MC-H12TRF
On Thu, May 15, 2014 at 8:51 PM, Daniel Ouellet <dan...@presscom.net> wrote:
> I was also looking at these two if the above one wasn't supported. But
> if I remember the Atom SoC one is not working on OpenBSD yet, but I
> could be wrong.
>
> SuperServer 5038MA-H24TRF
> http://www.supermicro.com/products/system/3U/5038/SYS-5038MA-H24TRF.cfm

I have the Supermicro A1SRi-2758F motherboard (Atom C2758 Rangeley). No issues running OpenBSD 5.5 amd64.
Support for Intel QuickAssist on Atom Rangeley CPUs?
I'm about to purchase a new Supermicro Atom board for a firewall. The decision is between Atom C2750 (Avoton) and C2758 (Rangeley) CPUs. The latter is marketed as a communications processor and exchanges Turbo Boost for QuickAssist, which seems to be an FPGA-type thing for accelerating certain cryptographic and data compression functions. Is there support for this in OpenBSD and does anyone have any practical experience with using this hardware for VPN, SSL/TLS, or anything else of that sort? I'm not sure whether the 2.4 - 2.6 GHz Turbo Boost on C2750 will make any significant performance difference for a firewall, but I'd rather have that if QuickAssist is not supported. - Max
Re: When are default 'set prio' priorities set?
On Fri, Dec 20, 2013 at 4:11 PM, Maxim Khitrov <m...@mxcrypt.com> wrote:
> I was under the impression that the packet priority was always set to 3
> prior to the pf ruleset evaluation (ignoring VLAN and CARP for a
> moment), and that 'set prio' on an inbound rule only affected returning
> traffic that matched the state entry. Here's an artificial example:
>
> pass out on $wan
> pass in on $lan set prio 7
>
> What will be the priority of outbound packets on the $wan interface, 3
> or 7? Looking at the code in pf.c, the priority is copied to
> m->m_pkthdr.pf.prio, but I'm not sure where this value is initialized
> or reset.

I think I figured this out, but I would appreciate a confirmation. The m_pkthdr.pf.prio value is set to IFQ_DEFPRIO (3) in sys/kern/uipc_mbuf.c when a new mbuf is allocated. It is not modified after that except by pf rules. Therefore, packets going out on $wan in my example will have their priority set to 7.

Essentially, priorities behave the same as tags. The difference is that priorities are saved in the state entries, so all subsequent packets coming in on $lan and matching an existing state will have a priority of 7 when going out on $wan. Returning packets will keep a default priority of 3 after crossing $wan, but this will be changed to 7 when they match the state outbound on $lan. Correct?
Re: How to segregate forwarded and firewall-generated traffic in pf?
On Thu, Dec 19, 2013 at 8:33 AM, Camiel Dobbelaar <c...@sentia.nl> wrote:
> On 18/12/13 22:32, Camiel Dobbelaar wrote:
>> On 18/12/13 14:50, Maxim Khitrov wrote:
>>> On Wed, Dec 18, 2013 at 8:42 AM, Camiel Dobbelaar <c...@sentia.nl> wrote:
>>>> On 18/12/13 13:53, Maxim Khitrov wrote:
>>>>> When writing outbound rules in pf, is there an accepted best practice
>>>>> for only matching packets that are either forwarded or
>>>>> firewall-generated? The best that I could come up with is
>>>>> 'received-on all' as a way of identifying forwarded packets, but that
>>>>> option can't be negated to match packets that were not received on
>>>>> any inbound interface (i.e. those generated by the firewall itself).
>>>>> Another option is 'from (self)', but then you have to be careful with
>>>>> any preceding nat rules. Ideally, I want a solution that doesn't
>>>>> depend on the context. I also tried to use tags in combination with
>>>>> 'received-on', but it became rather messy and created conflicts with
>>>>> other tag usage. What is everyone else using to solve this problem?
>>>>
>>>> Check the user option in pf.conf(5):
>>>>
>>>> user user
>>>>     This rule only applies to packets of sockets owned by the
>>>>     specified user. For outgoing connections initiated from the
>>>>     firewall, this is the user that opened the connection. For
>>>>     incoming connections to the firewall itself, this is the user
>>>>     that listens on the destination port. For forwarded connections,
>>>>     where the firewall is not a connection endpoint, the user and
>>>>     group are unknown.
>>>
>>> I tried that a while ago and it doesn't work as documented:
>>>
>>> http://marc.info/?l=openbsd-bugs&m=137650531124231&w=2
>>> http://marc.info/?l=openbsd-bugs&m=137658379014570&w=2
>>
>> Nice of you to lure me in like this, and spend a few hours looking at
>> the code. :-)
>>
>> I'd say the feature is indeed broken, and probably has been for more
>> than 10 years. in_pcblookup_listen() in pf.c is the culprit. The
>> destination IP does not seem to matter for the socket lookup and will
>> match anything. As you noticed, this makes forwarded traffic match too.
>>
>> So I guess the only way to make this work at all is to match the source
>> and destination IP's yourself first in pf.conf like this:
>>
>> pass in from any to self port 22 user root
>> pass out from self to any user camield
>
> I think a documentation fix for pf.conf(5) is all that can be done.
>
> The diff adds the following paragraph:
>
>     When listening sockets are bound to the wildcard address, pf(4)
>     cannot determine if a connection is destined for the firewall
>     itself. To avoid false matches on just the destination port,
>     combine a user rule with source or destination address self.
>
> Also, it deletes all mentions of the unknown user since it's useless.
> And the example is updated. Better?

Not sure if you were asking me or other developers, but I think an update to the man page is fine. However, are you certain that pf cannot determine where the packet is going? It should be possible to perform a routing check to find out whether the destination IP belongs to the firewall, and thus may be accepted by a wildcard address, or if it's going to be forwarded to some other destination and should only match 'user unknown'. I think something similar is already being done by the urpf-failed check, only in reverse.
When are default 'set prio' priorities set?
I was under the impression that the packet priority was always set to 3 prior to the pf ruleset evaluation (ignoring VLAN and CARP for a moment), and that 'set prio' on an inbound rule only affected returning traffic that matched the state entry. Here's an artificial example:

pass out on $wan
pass in on $lan set prio 7

What will be the priority of outbound packets on the $wan interface, 3 or 7? Looking at the code in pf.c, the priority is copied to m->m_pkthdr.pf.prio, but I'm not sure where this value is initialized or reset.
Re: How to segregate forwarded and firewall-generated traffic in pf?
On Thu, Dec 19, 2013 at 7:57 AM, Giancarlo Razzolini <grazzol...@gmail.com> wrote:
> On 18-12-2013 21:33, Andy Lemin wrote:
>> Fantastic! Thanks Camiel :)
>>
>> Sent from my iPhone
>>
>> On 18 Dec 2013, at 21:32, Camiel Dobbelaar <c...@sentia.nl> wrote:
>>> On 18/12/13 14:50, Maxim Khitrov wrote:
>>>> On Wed, Dec 18, 2013 at 8:42 AM, Camiel Dobbelaar <c...@sentia.nl> wrote:
>>>>> On 18/12/13 13:53, Maxim Khitrov wrote:
>>>>>> When writing outbound rules in pf, is there an accepted best
>>>>>> practice for only matching packets that are either forwarded or
>>>>>> firewall-generated? The best that I could come up with is
>>>>>> 'received-on all' as a way of identifying forwarded packets, but
>>>>>> that option can't be negated to match packets that were not
>>>>>> received on any inbound interface (i.e. those generated by the
>>>>>> firewall itself). Another option is 'from (self)', but then you
>>>>>> have to be careful with any preceding nat rules. Ideally, I want a
>>>>>> solution that doesn't depend on the context. I also tried to use
>>>>>> tags in combination with 'received-on', but it became rather messy
>>>>>> and created conflicts with other tag usage. What is everyone else
>>>>>> using to solve this problem?
>>>>>
>>>>> Check the user option in pf.conf(5):
>>>>>
>>>>> user user
>>>>>     This rule only applies to packets of sockets owned by the
>>>>>     specified user. For outgoing connections initiated from the
>>>>>     firewall, this is the user that opened the connection. For
>>>>>     incoming connections to the firewall itself, this is the user
>>>>>     that listens on the destination port. For forwarded connections,
>>>>>     where the firewall is not a connection endpoint, the user and
>>>>>     group are unknown.
>>>>
>>>> I tried that a while ago and it doesn't work as documented:
>>>>
>>>> http://marc.info/?l=openbsd-bugs&m=137650531124231&w=2
>>>> http://marc.info/?l=openbsd-bugs&m=137658379014570&w=2
>>>
>>> Nice of you to lure me in like this, and spend a few hours looking at
>>> the code. :-)
>>>
>>> I'd say the feature is indeed broken, and probably has been for more
>>> than 10 years. in_pcblookup_listen() in pf.c is the culprit. The
>>> destination IP does not seem to matter for the socket lookup and will
>>> match anything. As you noticed, this makes forwarded traffic match too.
>>>
>>> So I guess the only way to make this work at all is to match the
>>> source and destination IP's yourself first in pf.conf like this:
>>>
>>> pass in from any to self port 22 user root
>>> pass out from self to any user camield
>>>
>>> Regards,
>>> Cam
>
> There are so many ways to do this. self rules, user, etc. But I'd say
> that you could also use tags to do policy based matching of packets
> that are firewall generated or firewall forwarded. Tags can be assigned
> before any nat matching rules take place, so you do not need to worry
> with them messing up your packet flow.

That's pretty much what I managed to come up with yesterday. I have the following two rules at the top:

match out from (self) tag SELF
block out log quick received-on all tagged SELF

The second rule is mostly a sanity check. It ensures that you can't accidentally add a SELF tag to an inbound packet and have it processed as a firewall-generated packet. These are followed by a few rules common to forwarded and firewall-generated packets. Finally, I split the ruleset like so:

anchor out quick tagged SELF {
	block return log
	# Rules for firewall-generated traffic
	...
}

# Rules for forwarded traffic
...

This seems like a good enough solution, but it would be cleaner if we could do '!received-on all'. There is also a risk here that one of the preceding rules could overwrite the SELF tag.
How to segregate forwarded and firewall-generated traffic in pf?
When writing outbound rules in pf, is there an accepted best practice for only matching packets that are either forwarded or firewall-generated? The best that I could come up with is 'received-on all' as a way of identifying forwarded packets, but that option can't be negated to match packets that were not received on any inbound interface (i.e. those generated by the firewall itself). Another option is 'from (self)', but then you have to be careful with any preceding nat rules. Ideally, I want a solution that doesn't depend on the context. I also tried to use tags in combination with 'received-on', but it became rather messy and created conflicts with other tag usage. What is everyone else using to solve this problem?
Re: How to segregate forwarded and firewall-generated traffic in pf?
On Wed, Dec 18, 2013 at 8:42 AM, Camiel Dobbelaar <c...@sentia.nl> wrote:
> On 18/12/13 13:53, Maxim Khitrov wrote:
>> When writing outbound rules in pf, is there an accepted best practice
>> for only matching packets that are either forwarded or
>> firewall-generated? The best that I could come up with is
>> 'received-on all' as a way of identifying forwarded packets, but that
>> option can't be negated to match packets that were not received on any
>> inbound interface (i.e. those generated by the firewall itself).
>> Another option is 'from (self)', but then you have to be careful with
>> any preceding nat rules. Ideally, I want a solution that doesn't
>> depend on the context. I also tried to use tags in combination with
>> 'received-on', but it became rather messy and created conflicts with
>> other tag usage. What is everyone else using to solve this problem?
>
> Check the user option in pf.conf(5):
>
> user user
>     This rule only applies to packets of sockets owned by the specified
>     user. For outgoing connections initiated from the firewall, this is
>     the user that opened the connection. For incoming connections to
>     the firewall itself, this is the user that listens on the
>     destination port. For forwarded connections, where the firewall is
>     not a connection endpoint, the user and group are unknown.

I tried that a while ago and it doesn't work as documented:

http://marc.info/?l=openbsd-bugs&m=137650531124231&w=2
http://marc.info/?l=openbsd-bugs&m=137658379014570&w=2
Re: How to control set prio
On Wed, Aug 7, 2013 at 12:10 PM, Henning Brauer <lists-open...@bsws.de> wrote:
> * Михаил Швецов <mishve...@rambler.ru> [2013-08-07 14:55]:
>> How can i see that set prio works?
>
> it just does.

Sometimes it doesn't: http://www.openbsd.org/cgi-bin/cvsweb/src/sys/net/pf.c#rev1.862

I got into a habit of separating prioritization from filtering with a bunch of match ... set prio ... rules at the start of the ruleset. Seemed like a good idea at the time.

I agree that some visualization of set prio operation is needed. Perhaps systat could show the number of packets assigned to each priority level for each interface over the last N seconds? I know that the design goal was to keep this as simple as possible, but some stats would be helpful in understanding what is happening and catching config errors.
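Until something like that exists, per-rule counters are the closest approximation I know of. Not the per-priority histogram wished for above, but enough to confirm that the prio rules are actually matching (a sketch using standard tools):

  # one-shot: each rule's Evaluations/Packets counters
  pfctl -v -s rules
  # live view of the same counters
  systat rules

If a 'match ... set prio ...' rule shows its packet counter increasing, the priority is being applied to that traffic.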
Re: 10G NIC recommendation
On Wed, Aug 14, 2013 at 7:09 PM, Diana Eichert <deich...@wrench.com> wrote:
> What I want to do: create a netflow collector using OpenBSD by looking
> at data fed from a tap.
>
> I know which 10G NICs are supported by OpenBSD; what I'd like to hear
> is a recommendation on which one of the following to use.
>
> $ apropos 10G
> che, cheg (4) - Chelsio Communications 10Gb Ethernet device
> ix (4) - Intel 82598/82599/X540 PCI Express 10Gb Ethernet device
> ixgb (4) - Intel PRO/10GbE 10Gb Ethernet device
> myx (4) - Myricom Myri-10G PCI Express 10Gb Ethernet device
> oce (4) - Emulex OneConnect 10Gb Ethernet device
> tht, thtc (4) - Tehuti Networks 10Gb Ethernet device
> xge (4) - Neterion Xframe/Xframe II 10Gb Ethernet device
>
> I do have a few Myricom 10G-PCIE2-8B2-2S available already. However I
> have funds available to get something else if one of the other cards
> performs better.

My only experience is with the X540, but I have no complaints. Here's a discussion of some testing that I did last week: http://marc.info/?l=openbsd-misc&m=137588569703330&w=2
Re: 10GbE (Intel X540) performance on OpenBSD 5.3
On Thu, Aug 8, 2013 at 9:35 PM, John Jasen <jja...@realityfailure.org> wrote:
> You may want to test jumbo frames, just to see what would happen. I
> would expect you to see closer to 10 Gb/s with the same number of
> interrupts.

Results for jumbo frames are below (spoiler: 10 Gbps, same number of interrupts, 40% CPU0 usage).

> On 08/08/2013 08:26 PM, Maxim Khitrov wrote:
>> Active Processor Cores: All
>
> I would turn that off, or at least make it only dual core.

No effect, results are also below.

>> That's... a bit faster. The CPU in the desktops is Intel i7-3770,
>> which is very similar to the Xeon E3-1275v2. Is this a FreeBSD vs
>> OpenBSD difference?
>
> Could be. It might be worth testing FreeBSD on your packet forwarding
> boxes, just to see if you get similar results.

I installed FreeBSD on a USB flash drive, booted the backup firewall from that, and ran iperf -c 127.0.0.1 -t 60:

[ 3] 0.0-60.0 sec 373 GBytes 53.4 Gbits/sec

Almost the same as the desktops, so this performance boost is due to FreeBSD (which keeps all cores at 70% load) and not the hardware. Now for jumbo frames:

# s1: iperf -s
# c1: iperf -c s1 -t 60 -m
[ 3] 0.0-60.0 sec 69.1 GBytes 9.89 Gbits/sec
[ 3] MSS size 8192 bytes (MTU 8232 bytes, unknown interface)

With MTU set to 9000 along the entire path, a single client can max out the 10 gigabit link through the firewall. This also addresses the question of PCIe bandwidth - not an issue. I just had to double kern.ipc.nmbjumbo9 to 12800 on all FreeBSD hosts before I could enable jumbo frames (got "ix0: Could not setup receive structures" otherwise). Both clients together:

# s1: iperf -s
# s2: iperf -s
# c1: nc gw 1234 ; iperf -c s1 -t 60
# c2: nc gw 1234 ; iperf -c s2 -t 60
[ 3] 0.0-60.0 sec 34.6 GBytes 4.95 Gbits/sec
[ 3] 0.0-60.0 sec 34.5 GBytes 4.94 Gbits/sec

During all of these tests, systat shows 8k interrupts on each interface, and CPU0 usage is 40% interrupt, 60% idle. Going back to 1500 MTU, disabling Hardware Prefetcher and Adjacent Cache Line Prefetch in BIOS has no effect:

# c1->s1
[ 3] 0.0-60.0 sec 29.5 GBytes 4.22 Gbits/sec
# c1->s1, c2->s2
[ 3] 0.0-60.0 sec 14.8 GBytes 2.12 Gbits/sec
[ 3] 0.0-60.0 sec 15.7 GBytes 2.25 Gbits/sec

Same goes for disabling two of the cores:

# c1->s1
[ 3] 0.0-60.0 sec 30.7 GBytes 4.39 Gbits/sec
# c1->s1, c2->s2
[ 3] 0.0-60.0 sec 15.2 GBytes 2.18 Gbits/sec
[ 3] 0.0-60.0 sec 15.2 GBytes 2.17 Gbits/sec

Same with the bsd.sp kernel and all but one of the cores disabled:

# c1->s1
[ 3] 0.0-60.0 sec 31.3 GBytes 4.48 Gbits/sec
# c1->s1, c2->s2
[ 3] 0.0-60.0 sec 15.0 GBytes 2.15 Gbits/sec
[ 3] 0.0-60.0 sec 16.1 GBytes 2.30 Gbits/sec

Finally, I went back to all cores enabled, the bsd.mp kernel, Hardware Prefetcher and Adjacent Cache Line Prefetch enabled:

# c1->s1
[ 3] 0.0-60.0 sec 30.9 GBytes 4.43 Gbits/sec
# c1->s2, c2->s2
[ 3] 0.0-60.0 sec 16.8 GBytes 2.40 Gbits/sec
[ 3] 0.0-60.0 sec 14.0 GBytes 2.00 Gbits/sec

As you can see, none of these tweaks had any measurable impact. The firewall can only handle so many packets per second. To push more packets through, I need to reduce the per-packet processing overhead.
Here's a simple illustration of this fact using just the c1->s1 test:

# pf disabled (set skip on {ix0, ix1}):
[ 3] 0.0-60.0 sec 37.4 GBytes 5.35 Gbits/sec
# pf enabled, no state on ix0:
[ 3] 0.0-60.1 sec 8.28 GBytes 1.18 Gbits/sec
# pf enabled, keep state:
[ 3] 0.0-60.0 sec 30.8 GBytes 4.41 Gbits/sec
# pf enabled, keep state (sloppy):
[ 3] 0.0-60.0 sec 31.2 GBytes 4.46 Gbits/sec
# pf enabled, modulate state:
[ 3] 0.0-60.0 sec 28.3 GBytes 4.05 Gbits/sec
# pf enabled, modulate state & scrub (random-id reassemble tcp):
[ 3] 0.0-60.0 sec 25.8 GBytes 3.69 Gbits/sec

The interesting thing about the last test is that systat shows double the number of interrupts (32k total, 16k per interface) and CPU0 is about 5% idle instead of the usual 10%. The rest is self-evident. More work per packet = lower throughput. This is also another confirmation that the sloppy state tracker has no performance benefits.

Unless someone has any other ideas on how to reduce the per-packet processing time, I think ~4.5 Gbps is the most that my hardware can handle at the default MTU. A bit disappointing, but it was the fastest CPU that I could get from Lanner and also my first step beyond 1 gigabit. If OpenBSD starts using multiple cores for interrupt processing in the future, 10+ Gbps should be easy to achieve. FreeBSD is an option if performance is critical, but for now I'd rather have all the 4.6+ pf improvements.
Re: 10GbE (Intel X540) performance on OpenBSD 5.3
On Fri, Aug 9, 2013 at 11:52 AM, Henning Brauer <lists-open...@bsws.de> wrote:
> * Maxim Khitrov <m...@mxcrypt.com> [2013-08-09 17:47]:
>> and ran iperf
>> # s1: iperf -s
>> # c1: iperf -c s1 -t 60 -m
>> # s1: iperf -s
>> # s2: iperf -s
>> # c1: nc gw 1234 ; iperf -c s1 -t 60
>> # c2: nc gw 1234 ; iperf -c s2 -t 60
>
> your tests are flawed. you are testing iperf ('s lack of) performance.
> use tcpbench. or an ixia.

These aren't available from FreeBSD packages. What about nuttcp?

# c1: nuttcp -t -T60 s1
5442.6100 MB / 10.10 sec = 4521.6131 Mbps 34 %TX 60 %RX 1233 host-retrans 0.19 msRTT

# c1: nuttcp -t -T60 s1
# c2: nuttcp -t -T60 s2
15960.2372 MB / 60.10 sec = 2227.8129 Mbps 15 %TX 32 %RX 10532 host-retrans 0.19 msRTT
17349.9260 MB / 60.10 sec = 2421.8063 Mbps 19 %TX 33 %RX 10932 host-retrans 0.20 msRTT

TCP tests don't look any different. UDP is slightly better:

# c1: nuttcp -t -u -R 10g -T 60 s1
36592.9785 MB / 60.00 sec = 5116.0419 Mbps 96 %TX 48 %RX 21725 / 37492935 drop/pkt 0.05794 %loss

# c1: nuttcp -t -u -R 10g -T 60 s1
# c2: nuttcp -t -u -R 10g -T 60 s2
22217.3467 MB / 60.00 sec = 3105.9963 Mbps 96 %TX 38 %RX 14801348 / 37551911 drop/pkt 39.42 %loss
22270.5674 MB / 60.01 sec = 3113.3326 Mbps 96 %TX 40 %RX 14875602 / 37680663 drop/pkt 39.48 %loss
Re: 10GbE (Intel X540) performance on OpenBSD 5.3
Thanks to everyone for your advice! I'll try to respond to all the questions at once and provide some more information about the testing that I did today.

The BIOS on these firewalls is current. For power-saving options, when I first configured these systems I tried turning Intel EIST (SpeedStep) off, but this caused OpenBSD to panic during boot. The panic text is copied at the end of this message, but the keyboard didn't work at the ddb prompt (not even Ctrl-Alt-Del), so I couldn't run any commands. Here's what my performance-related BIOS settings look like:

Hyper-threading: Disabled
Active Processor Cores: All
Limit CPUID Maximum: Disabled
Execute Disable Bit: Enabled
Intel Virtualization Technology: Disabled
Hardware Prefetcher: Enabled
Adjacent Cache Line Prefetch: Enabled
EIST: Enabled
Turbo Mode: Enabled
CPU C3 Report: Disabled
CPU C6 Report: Disabled
CPU C7 Report: Disabled
VT-d: Disabled

I doubt that disabling EIST would have a significant performance advantage. Latency may suffer a bit while the CPU raises its frequency when the traffic hits, but I don't think this would affect throughput testing. Tomorrow, I'll try disabling other cores and using the bsd.sp kernel to see if that performs any better. Might also play with the hardware prefetcher settings.

Today, I started testing forwarding performance with pf enabled. I put the second firewall aside and installed the X540-T2 cards into four identical Dell OptiPlex 9010 desktops. Two servers (s1 & s2) and two clients (c1 & c2). Each pair was connected through a Dell PowerConnect 8164 10GbE switch to a separate port on the firewall. The two switches had no other connections.

I installed FreeBSD 9.1-RELEASE amd64 on the desktops. As a side note, iperf doesn't crash on FreeBSD when running in UDP mode, so I think it's a problem with the OpenBSD package. For these tests I stuck with TCP and 1500 MTU. Also, I noticed that a 10 second test is not always sufficient to get consistent results, so I'm now running all tests for 60 seconds.

First test is iperf on 127.0.0.1 to compare these desktops with the 11.6 Gbps that I got on the firewall:

# c1: iperf -s
# c1: iperf -c 127.0.0.1 -t 60
[ 3] 0.0-59.9 sec 402 GBytes 57.7 Gbits/sec

That's... a bit faster. The CPU in the desktops is Intel i7-3770, which is very similar to the Xeon E3-1275v2. Is this a FreeBSD vs OpenBSD difference?

Second test is c1 -> c2 via the 8164 switch (not involving the firewall yet):

# c2: iperf -s
# c1: iperf -c c2 -t 60
[ 4] 0.0-60.1 sec 40.2 GBytes 5.74 Gbits/sec

A single desktop can't saturate the link, at least with the default settings, but two on each side should be plenty to test the firewall to its limit.

Third test is c1 -> s1 through the firewall with pf stateful filtering:

# s1: iperf -s
# c1: iperf -c s1 -t 60
[ 3] 0.0-60.0 sec 30.0 GBytes 4.29 Gbits/sec

I watched systat and top on the firewall while this test was running. 16k interrupts evenly split between ix0 and ix1, and ~90% interrupt usage on CPU0.

Fourth test is c1 -> s1 and c2 -> s2. I used a netcat server on the firewall (nc -l 1234) to synchronize both clients. They started iperf as soon as I killed the server with Ctrl-C:

# s1: iperf -s
# s2: iperf -s
# c1: nc gw 1234; iperf -c s1 -t 60
# c2: nc gw 1234; iperf -c s2 -t 60
[ 3] 0.0-60.0 sec 14.4 GBytes 2.07 Gbits/sec
[ 3] 0.0-60.0 sec 15.8 GBytes 2.26 Gbits/sec

An even split of the single client performance, indicating that the firewall is the bottleneck. No changes in systat and top, so it does look like the CPU is the limiting factor.
Finally, I used "set skip on {ix0, ix1}" to disable pf on these two interfaces and re-ran the same test:

[ 3] 0.0-60.0 sec 18.1 GBytes 2.59 Gbits/sec
[ 3] 0.0-60.0 sec 16.3 GBytes 2.34 Gbits/sec

A small improvement, but I think it's fair to say that pf isn't the problem. Will do some more testing tomorrow. Here's the boot panic when I disable SpeedStep in BIOS:

acpiec0 at acpi0: Failed to read resource settings
acpicpu0 at acpi0
Store to default type! 100 01a4
Called: \_PR_.CPU0._PDC
arg0: 0x801af588 cnt:01 stk:00 buffer: 0c {01, 00, 00, 00, 01, 00, 00, 00, 3b, 03, 00, 00}
panic: aml_die aml_store:2621
Stopped at Debugger+0x5: leave
Debugger() at Debugger+0x5
panic() at panic+0xe4
_aml_die() at _aml_die+0x183
aml_store() at aml_store+0xbb
aml_parse() at aml_parse+0xcd7
aml_eval() at aml_eval+0x1c8
aml_evalnode() at aml_evalnode+0x63
acpicpu_set_pdc() at acpicpu_set_pdc+0x8c
acpicpu_attach() at acpicpu_attach+0x9e
config_attach() at config_attach+0x1d4
end trace frame: 0x80e6da90, count: 0
10GbE (Intel X540) performance on OpenBSD 5.3
Hi all,

I'm looking for performance measuring and tuning advice for 10 gigabit Ethernet. I have a pair of Lanner FW-8865 systems that will be used as firewalls for the local network. Each one has a Xeon E3-1270v2 CPU, Intel X540 10GbE NIC (PCIe 3.0 8x), and 8GB DDR3-1600 ECC RAM. Before putting them into production I wanted to do some throughput testing, so I connected one directly to the other (via ix0 interfaces) and used iperf to see how much data I can push through. I also disabled pf for now, but will do some additional testing with it enabled later on. The kernel is 5.3 amd64 GENERIC.MP.

The initial iperf runs couldn't go beyond ~3.2 Gbps:

# server: iperf -s
# client: iperf -c 192.168.1.3
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.2 sec 3.84 GBytes 3.22 Gbits/sec

Increasing the TCP window size to 256 KB (seems to be the upper limit) brings this up to ~4.2 Gbps:

# server: iperf -s -w 256k
# client: iperf -c 192.168.1.3 -w 256k
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.1 sec 4.96 GBytes 4.22 Gbits/sec

Increasing the MTU on both ix0 interfaces to 9000 gives me ~7.2 Gbps:

# server: ifconfig ix0 mtu 9000 && iperf -s -w 256k
# client: ifconfig ix0 mtu 9000 && iperf -c 192.168.1.3 -w 256k -m
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 8.39 GBytes 7.21 Gbits/sec
[ 3] MSS size 8948 bytes (MTU 8988 bytes, unknown interface)

This is where I'm stuck at the moment. When running iperf on 127.0.0.1, which should only test CPU and memory, I get 11.6 Gbps. I've read the Network Tuning and Performance Guide @ calomel.org, but none of the tips there help me in getting beyond 7 Gbps on the physical interfaces. I'm also slightly concerned about the performance at the default MTU of 1500.

Looking at `ifconfig ix0 hwfeatures` output (below), it seems that the ix driver does not support any checksum offloading for the X540. I wonder if that could be a reason for the poor performance?

ix0: flags=28843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,NOINET6> mtu 9000
	hwfeatures=30<VLAN_MTU,VLAN_HWTAGGING> hardmtu 16110
	lladdr 00:90:0b:56:12:0c
	priority: 0
	groups: LAN SVR
	media: Ethernet autoselect (10GbaseT full-duplex)
	status: active
	inet 192.168.1.2 netmask 0xffffff00 broadcast 192.168.1.255

Are there any sysctl parameters that I should play with? Any other system stats that I should monitor? I did a few runs while watching `top` and `systat vmstat`, but didn't see any problem indications there.

I should also note that I couldn't run iperf in UDP mode - the client segfaults any time I increase the bandwidth beyond 300 Mbps. No idea why, but I'm more interested in TCP performance anyway.

- Max
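One generic knob worth checking while benchmarking, since the iperf window appeared to cap out at 256k: the default TCP socket buffer sizes. A sketch; the values are examples to experiment with, not tuned recommendations:

  # OpenBSD: the defaults are conservative; larger buffers can lift
  # single-stream TCP throughput at 10G
  sysctl net.inet.tcp.recvspace=262144
  sysctl net.inet.tcp.sendspace=262144

Whether this helps depends on where the bottleneck really is; if interrupt processing on one core is the limit, bigger buffers won't change much.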
Re: 10GbE (Intel X540) performance on OpenBSD 5.3
On Wed, Aug 7, 2013 at 10:31 AM, Martin Schröder <mar...@oneiros.de> wrote:
> 2013/8/7 Maxim Khitrov <m...@mxcrypt.com>:
>> I've read the Network Tuning and Performance Guide @ calomel.org,
>
> Ignore that site and search the list archives.

Understood :) I found a number of recommendations for the things to keep an eye on, but nothing that gave me any ideas on what else to try for improving the performance.

Specifically, I looked at netstat -m on both systems, where everything was well below the limits (500 mbufs in use during the test). I see about 8200/8700 (ix0/total) interrupts in systat with 1500 MTU. CPU usage in top is split between two cores, one at ~80% interrupt and the other at ~80% system. Most of the time all four cores are at least 10% idle (hyper-threading is disabled in BIOS). netstat -i shows no errors for ix0 and sysctl net.inet.ip.ifq.drops is at 0 on both systems.

What did surprise me is that netstat -ss (output below) shows that all received packets were hardware-checksummed, but this value is 0 for sent packets. Does this mean that ix supports checksum offloading, but only for inbound packets? This should be a bit of good news for me once I start testing forwarding performance. I assume that as long as pf doesn't modify the packet (no nat/rdr, modulate state, scrubbing, etc.), then there shouldn't be any need to recompute the checksum. Correct?

ip:
	39827125 total packets received
	39820936 packets for this host
	40 packets for unknown/unsupported protocol
	77150033 packets sent from this host
	39826536 input datagrams checksum-processed by hardware
icmp:
	147 calls to icmp_error
	Output packet histogram:
		destination unreachable: 48
	Input packet histogram:
		echo reply: 2
		destination unreachable: 40
igmp:
ipencap:
tcp:
	77147020 packets sent
		77145183 data packets (111695427326 bytes)
		2 data packets (2836 bytes) retransmitted
		1763 ack-only packets (4427 delayed)
		6 window update packets
		66 control packets
	39817607 packets received
		38983910 acks (for 111695426667 bytes)
		18 duplicate acks
		5814 packets (560082 bytes) received in-sequence
		38 out-of-order packets (872 bytes)
		830155 window update packets
		1 packet received after close
		39817153 packets hardware-checksummed
	41 connection requests
	10 connection accepts
	49 connections established (including accepts)
	43 connections closed (including 1 drop)
	38983035 segments updated rtt (of 1217192 attempts)
	4 retransmit timeouts
	2 keepalive timeouts
	2 keepalive probes sent
	601 correct ACK header predictions
	3276 correct data packet header predictions
	20 PCB cache misses
	cwr by timeout: 4
	10 SYN cache entries added
		10 completed
	3 SACK options received
	1 SACK option sent
udp:
	3327 datagrams received
	39 with no checksum
	3193 input packets hardware-checksummed
	47 dropped due to no socket
	3280 delivered
	2958 datagrams output
	708 missed PCB cache
esp:
ah:
etherip:
ipcomp:
carp:
pfsync:
divert:
pflow:
ip6:
	2 total packets received
	4 packets sent from this host
	Input packet histogram:
		ICMP6: 2
	Mbuf statistics:
		2 one ext mbufs
divert6:
icmp6:
	Output packet histogram:
		multicast listener report: 4
	Histogram of error messages to be generated:
pim6:
rip6:
Re: 10GbE (Intel X540) performance on OpenBSD 5.3
On Wed, Aug 7, 2013 at 11:44 AM, Florian Obser <flor...@narrans.de> wrote:
> On Wed, Aug 07, 2013 at 10:26:22AM -0400, Maxim Khitrov wrote:
>> Hi all, I'm looking for performance measuring and tuning advice for 10
>> gigabit Ethernet. I have a pair of Lanner FW-8865 systems that will be
>> used as firewalls for the local network.
> [...]
>> The initial iperf runs couldn't go beyond ~3.2 Gbps:
>
> you expect a lot of locally generated traffic on your firewall? (if the
> answer is no, why are you testing that?)

No :) But it was the first step until I have a third system with a 10GbE port. I have 15 Intel X540-T2 cards waiting to be installed. Once I have another server that can generate the traffic, I'll test the forwarding performance with pf enabled.

> [...]
>> Increasing the MTU on both ix0 interfaces to 9000 gives me ~7.2 Gbps:
>
> you expect a lot of jumbo frames in front of / behind your firewall?
> (if the answer is no, why are you testing that?)

It's a possibility. What this tells me, however, is that the throughput isn't the (main) problem. The per-packet processing overhead appears to be the limiting factor, which is why I asked about checksum offloading.

> anyway, I was testing an Intel 82599 system in July which will become a
> border router. All of this is forwarding rate; it took me 2 days to
> beg, borrow and steal enough hw to actually generate the traffic. (I
> had 4 systems in front of and 4 systems behind the router, all doing
> 1Gb/s)

What tools were you using to generate the traffic and to calculate bytes/packets per second? I assume interrupts per second came from systat?
Outdated documentation for scrub (no-df) in pf.conf(5)?
Hi,

The no-df flag can be specified in the "set reassemble" option or in a scrub rule. From looking at the source, I don't think scrub (no-df) does what the man page says it does. To reassemble fragmented packets with the DF flag set, one has to use the "set reassemble yes no-df" option. By the time any scrub rules are applied, the packet is already reassembled, so scrub (no-df) simply clears the DF flag for all _complete_ packets (pf_scrub in sys/net/pf_norm.c). I don't see how this fixes problems with fragmented NFS packets, and I suspect that this breaks legitimate uses of DF, such as MTU discovery. Is the documentation wrong (possibly from before OpenBSD 4.6, when scrub was a separate option) or am I misinterpreting the code?

- Max
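For readers comparing the two, here is where each form lives in pf.conf. A sketch of the syntax only; it takes no position on which behavior the man page intends, and the interface name is an example:

  # global option: reassemble fragments, and clear DF on packets
  # reassembled from fragments so they can be refragmented later
  set reassemble yes no-df

  # per-rule scrub: by the time this runs, packets are already
  # reassembled, so this clears DF on whole packets that match
  match in on $int_if scrub (no-df)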
pf scrub options in OpenBSD 5.3
Hi all,

A few questions about the operation of pf scrub options in OpenBSD 5.3:

1. In 2010 Henning advised against the use of reassemble tcp (link below). Is this advice still applicable, and what are the known issues that this option may cause in the current implementation?

http://marc.info/?l=openbsd-misc&m=126343406308201&w=2

2. Am I correct in assuming that the following example ruleset would be more efficient (and work the same way) if the 'match on LAN' rule was removed, or if scrubbing was only done for inbound packets (match in ...)?

match on WAN scrub (no-df random-id)
match on LAN scrub (no-df random-id)
pass

I'm trying to figure out exactly when options like random-id and reassemble tcp are applied. My current understanding is that a packet passing from LAN to WAN with the above ruleset will have its id randomized twice, and the same thing will happen for any returning packet that matches the two state entries. If I change both match rules to 'match in ...', then packets in both directions are scrubbed just once, but the returning packets are scrubbed as they leave the firewall instead of when they are first received. Is all of that right?

If so, does it actually matter that the returning packets are not scrubbed when they are first received? For example, if the reassemble tcp or min-ttl options are used and the other side lowers its TTL value to the point where the response packet expires upon reaching the firewall, then the TTL check will have no effect, since the OS wouldn't forward the packet to the outbound interface or run the second state check.

- Max
pf: inline anchor rules are not enough to keep tables in memory?
Hello,

I was a bit surprised by the following behavior when configuring pf on OpenBSD 5.2. Non-persistent tables that are only referenced by inline anchor rules, as in the following example, are removed from memory when pf.conf is loaded.

# Doesn't work (ssh connections are blocked):
table <admins> {10.0.0.2}
block
pass out
anchor in on ix1 {
	pass proto tcp from <admins> to ix1 port ssh
}

# Works as expected:
table <admins> persist {10.0.0.2}
block
pass out
anchor in on ix1 {
	pass proto tcp from <admins> to ix1 port ssh
}

After loading the first configuration, 'pfctl -t admins -T show' gives me:

pfctl: Table does not exist.

Referencing the table in the main ruleset, or making it persistent as in the second example, fixes the problem. Is this by design?

- Max
Re: pf: inline anchor rules are not enough to keep tables in memory?
On Wed, Mar 13, 2013 at 1:59 PM, Michel Blais <mic...@targointernet.com> wrote:
> I think you must specify the anchor first. Something like:
>
> pfctl -a ix1 -t admins -T show

That doesn't work. First, it's an unnamed anchor, so I don't think you can specify it with the -a option. Second, inbound connections to port 22 are rejected in the first case, but not in the second. The table is removed as though it was unreferenced, so the pass rule in the anchor doesn't match any source IPs.

- Max
Re: Request improvement for faq 15.2
On Thu, Dec 27, 2012 at 10:10 AM, Live user <nots...@live.com> wrote:
> I think 15.2.2 should go before 15.1.1, since there's no point in
> running pkg_* when PKG_PATH is empty, which is the case after
> installing using the interactive method. Furthermore, using 'export
> PKG_PATH=' sets a volatile variable, which is blank again after
> restarting. I think the FAQ may include a guideline on making it
> persistent as well.

I went through most of the FAQ this weekend and didn't see any mention of /etc/pkg.conf as an alternative to PKG_PATH. Might be better to document the use of this configuration file, which I think is created automatically if you install the system from an ftp or http mirror.

- Max
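For reference, a minimal /etc/pkg.conf along the lines discussed above. The mirror path is an example; substitute your own release and architecture:

  # /etc/pkg.conf -- read by pkg_add when PKG_PATH is not set
  installpath = http://ftp.openbsd.org/pub/OpenBSD/5.2/packages/amd64/

With this in place, pkg_add and friends keep working across reboots without any PKG_PATH in the environment.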