Re: pflow on PE router

2021-06-06 Thread Stuart Henderson
On 2021-06-06, Patrick Dohman wrote:
> Perhaps it has something to do with Citrix being a dinosaur.
> God forbid the powers that be choose on premise unix.
> Regards
> Patrick

Your message doesn't appear to relate in any way to the message to which you're 
replying.


>> On Jun 4, 2021, at 6:43 AM, Stuart Henderson  wrote:
>> 
>> On 2021/06/03 15:04, Chris Cappuccio wrote:
>>> Stuart Henderson [s...@spacehopper.org] wrote:
>>>> 
>>>> Oh watch out with sloppy. Keep an eye on your state table size.
>>> 
>>> Really? Wouldn't sloppy keep the state table smaller if anything since it's 
>>> tracking less specifically?
>>> 
>>> Anyways I use sloppy across four boxes that run in parallel with pfsync. 
>>> There could easily be 10,000 devices behind it at any given time. I keep my 
>>> state table limit at 1,000,000. It's around 300,000 during this lighter 
>>> traffic period today. I had to do sloppy after moving to several boxes in 
>>> parallel, I didn't notice sloppy making any significant difference?
>>> 
>>> Chris
>> 
>> The problem I had was in conjunction with synfloods. I didn't get
>> captures for everything to figure it out (it was in 2018 and my
>> network was in flames, with the full state table bgp sessions were
>> getting dropped / not reestablishing) but I think what happened was
>> this,
>> 
>> spoofed SYN to real server behind PF
>> SYN+ACK from server
>> 
>> and the state entry ended up as ESTABLISHED:ESTABLISHED where it
>> remained until the tcp.established timer expired (24h default
>> or 5h with "set optimization aggressive").
>> 
>> My "fix" was to move as much as possible to "pass XX flags any no state"
>> but that's clearly not going to help with what Denis would like to do.
>> (fwiw - I'm not doing flow monitoring regularly, but when I do it's
>> usually via sflow on switches instead, which solves some problems,
>> though it's only possible in some situations).
>> 
>
>



Re: pflow on PE router

2021-06-06 Thread Patrick Dohman
Perhaps it has something to do with Citrix being a dinosaur.
God forbid the powers that be choose on premise unix.
Regards
Patrick

> On Jun 4, 2021, at 6:43 AM, Stuart Henderson  wrote:
> 
> On 2021/06/03 15:04, Chris Cappuccio wrote:
>> Stuart Henderson [s...@spacehopper.org] wrote:
>>> 
>>> Oh watch out with sloppy. Keep an eye on your state table size.
>> 
>> Really? Wouldn't sloppy keep the state table smaller if anything since it's 
>> tracking less specifically?
>> 
>> Anyways I use sloppy across four boxes that run in parallel with pfsync. 
>> There could easily be 10,000 devices behind it at any given time. I keep my 
>> state table limit at 1,000,000. It's around 300,000 during this lighter 
>> traffic period today. I had to do sloppy after moving to several boxes in 
>> parallel, I didn't notice sloppy making any significant difference?
>> 
>> Chris
> 
> The problem I had was in conjunction with synfloods. I didn't get
> captures for everything to figure it out (it was in 2018 and my
> network was in flames, with the full state table bgp sessions were
> getting dropped / not reestablishing) but I think what happened was
> this,
> 
> spoofed SYN to real server behind PF
> SYN+ACK from server
> 
> and the state entry ended up as ESTABLISHED:ESTABLISHED where it
> remained until the tcp.established timer expired (24h default
> or 5h with "set optimization aggressive").
> 
> My "fix" was to move as much as possible to "pass XX flags any no state"
> but that's clearly not going to help with what Denis would like to do.
> (fwiw - I'm not doing flow monitoring regularly, but when I do it's
> usually via sflow on switches instead, which solves some problems,
> though it's only possible in some situations).
> 



Re: pflow on PE router

2021-06-04 Thread Stuart Henderson
On 2021/06/03 15:04, Chris Cappuccio wrote:
> Stuart Henderson [s...@spacehopper.org] wrote:
> > 
> > Oh watch out with sloppy. Keep an eye on your state table size.
> 
> Really? Wouldn't sloppy keep the state table smaller if anything since it's 
> tracking less specifically?
> 
> Anyways I use sloppy across four boxes that run in parallel with pfsync. 
> There could easily be 10,000 devices behind it at any given time. I keep my 
> state table limit at 1,000,000. It's around 300,000 during this lighter 
> traffic period today. I had to do sloppy after moving to several boxes in 
> parallel, I didn't notice sloppy making any significant difference?
> 
> Chris

The problem I had was in conjunction with synfloods. I didn't get
captures for everything to figure it out (it was in 2018 and my
network was in flames; with the state table full, BGP sessions were
getting dropped / not reestablishing), but I think what happened was
this:

 spoofed SYN to real server behind PF
 SYN+ACK from server

and the state entry ended up as ESTABLISHED:ESTABLISHED where it
remained until the tcp.established timer expired (24h default
or 5h with "set optimization aggressive").

My "fix" was to move as much as possible to "pass XX flags any no state"
but that's clearly not going to help with what Denis would like to do.
(fwiw - I'm not doing flow monitoring regularly, but when I do it's
usually via sflow on switches instead, which solves some problems,
though it's only possible in some situations).
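
For illustration, the "flags any no state" workaround described above might
look roughly like the fragment below. This is only a sketch under stated
assumptions: the macro name, the addresses (documentation range) and the port
are hypothetical and do not come from the thread. The rules carry no direction
or interface so that they match both when a packet enters the router and when
it leaves, and a second rule covers the return traffic, since without state
nothing lets replies through implicitly.

--- /etc/pf.conf ---
# Hypothetical macro: servers that attract the synfloods.
servers = "{ 192.0.2.10, 192.0.2.11 }"

set state-defaults pflow

# Stateless handling for the flood-prone service. No state entries are
# created for this traffic, so it cannot fill the state table, but it is
# also not exported via pflow(4).
pass quick proto tcp to $servers port 443 flags any no state
pass quick proto tcp from $servers port 443 flags any no state

# Everything else keeps state and is exported.
pass
--- /etc/pf.conf ---

The trade-off is the one already noted: traffic matched by the stateless
rules produces no flow records.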



Re: pflow on PE router

2021-06-03 Thread Chris Cappuccio
Stuart Henderson [s...@spacehopper.org] wrote:
> 
> Oh watch out with sloppy. Keep an eye on your state table size.

Really? Wouldn't sloppy keep the state table smaller if anything since it's 
tracking less specifically?

Anyways I use sloppy across four boxes that run in parallel with pfsync. There 
could easily be 10,000 devices behind it at any given time. I keep my state 
table limit at 1,000,000. It's around 300,000 during this lighter traffic 
period today. I had to use sloppy after moving to several boxes in parallel,
and I didn't notice sloppy making any significant difference.

Chris



Re: pflow on PE router

2021-06-03 Thread Patrick Dohman
I suspect that you’ll be out of luck until TLSv1.3 is implemented. 
I’ve found the same to be true with the new 10 Gb SFP switches in our 
infrastructure, which surprisingly still implement TLSv1.0 and a broken CGI web 
server.
Regards
Patrick

> On Jun 1, 2021, at 3:44 PM, Stuart Henderson wrote:
> 
> On 2021-05-30, Denis Fondras wrote:
>> On Fri, May 28, 2021 at 03:30:58PM -0700, Chris Cappuccio wrote:
>>> You might try "set state-defaults pflow, sloppy", also in some scenarios
>>> you might need "set state-policy floating"
>>> 
>>> If "sloppy" fixes it, there may be some bugs to hunt.
>>> 
>> 
>> "sloppy" seems to fix the issue. I will do more tests this week before
>> declaring victory :)
>> 
>> Thank you Chris.
>> 
>> 
> 
> Oh watch out with sloppy. Keep an eye on your state table size.
> 



Re: pflow on PE router

2021-06-01 Thread Stuart Henderson
On 2021-05-30, Denis Fondras wrote:
> On Fri, May 28, 2021 at 03:30:58PM -0700, Chris Cappuccio wrote:
>> You might try "set state-defaults pflow, sloppy", also in some scenarios you 
>> might need "set state-policy floating"
>> 
>> If "sloppy" fixes it, there may be some bugs to hunt.
>>
>
> "sloppy" seems to fix the issue. I will do more tests this week before
> declaring victory :)
>
> Thank you Chris.
>
>

Oh watch out with sloppy. Keep an eye on your state table size.



Re: pflow on PE router

2021-06-01 Thread Chris Cappuccio
Denis Fondras [open...@ledeuns.net] wrote:
> 
> "sloppy" seems to fix the issue. I will do more tests this week before
> declaring victory :)
> 

If that really works, then there could be a problem with PF sequence number 
tracking. Can you develop a specific sequence of events to reproduce the 
failures?



Re: pflow on PE router

2021-05-30 Thread Patrick Dohman


> "sloppy" seems to fix the issue. I will do more tests this week before
> declaring victory :)
> 
> Thank you Chris.
> 

Get some ;)
Regards
Patrick



Re: pflow on PE router

2021-05-30 Thread Denis Fondras
On Fri, May 28, 2021 at 03:30:58PM -0700, Chris Cappuccio wrote:
> You might try "set state-defaults pflow, sloppy", also in some scenarios you 
> might need "set state-policy floating"
> 
> If "sloppy" fixes it, there may be some bugs to hunt.
>

"sloppy" seems to fix the issue. I will do more tests this week before declaring
victory :)

Thank you Chris.



Re: pflow on PE router

2021-05-28 Thread Chris Cappuccio
Denis Fondras [open...@ledeuns.net] wrote:
> Hello,
> 
> I used OpenBSD as a PE router on my network. The router is connected to an
> IX, a transit and multiple peers with OpenBGPd.
> 
> Earlier this week, I enabled pflow(4) to track traffic usage.
> Unfortunately enabling pf(4) on an edge router does not seem like a good idea.
> Some peers called in to tell me they noticed multiple problems (ranging from
> what seem to be MTU problems to cuts in lengthy TCP sessions). Deactivating
> pf(4) instantaneously fixed the problems on their side; reactivating pf(4)
> brought them back.
> 
> I tried to push up the state table (I reached 300k states), to no avail.
> 
> Do you know what the "right settings" are to have pflow(4) enabled on a PE
> router?

Pflow requires pf to be enabled to create states; otherwise there is nothing
to export. You could use a different flow generator tool (there is at least one
in ports) that will watch the traffic over bpf and generate flow data.

You might try "set state-defaults pflow, sloppy", also in some scenarios you 
might need "set state-policy floating"

If "sloppy" fixes it, there may be some bugs to hunt.
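
Put together, the settings suggested above could be tried with something like
the fragment below. It is a minimal sketch, not a tested configuration: the
state limit of 1000000 is an assumed value to be sized for the actual traffic,
and "floating" is already the default state policy in pf.conf(5), so that line
only makes the choice explicit.

--- /etc/pf.conf ---
# Assumed value; size this for your own traffic.
set limit states 1000000
# States may match packets on any interface (the pf default).
set state-policy floating
# Export every state via pflow(4) and relax TCP sequence tracking.
set state-defaults pflow, sloppy

pass
--- /etc/pf.conf ---

The pflow(4) interface itself also needs a source and a collector. The
addresses and port below are placeholders from the documentation range, not
values from this thread:

--- /etc/hostname.pflow0 ---
flowsrc 192.0.2.1 flowdst 192.0.2.2:9995
--- /etc/hostname.pflow0 ---

After reloading the ruleset with "pfctl -f /etc/pf.conf", "pfctl -s info"
shows the current number of state entries.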



Re: pflow on PE router

2021-05-16 Thread Denis Fondras
Here is some more info:

>- does running pf(4) without pflow(4) cause issue? 

Yes, the issue is linked to pf(4) being enabled.

>- can you confirm you were running with pf(4) disabled prior to enabling 
> pflow(4)?

I do confirm. I never enable pf(4) on edge routers; it bit me in the past with
asymmetric routing :)

>- are you able to provide or indicate your pf.conf? 

--- /etc/pf.conf ---
set state-defaults pflow
set limit states 100

pass
--- /etc/pf.conf ---

>- how many pf(4) states are you seeing in # pfctl -s info ? what is the 
> removal rate?

depending on the period of the day, it ranges from 300 to 30.
The removal rate was 112761228.5/s when I disabled pf(4) again.

>- was traffic to the pflow sink machine transiting MPLS?  

No, there is no MPLS involved at all. (I guess PE was not the right word, but
edge router might have triggered Ubiquiti fans...)

>- can you provide a dmesg

I upgraded this morning, problem is still the same :

OpenBSD 6.9-current (GENERIC.MP) #20: Sun May 16 00:32:45 MDT 2021
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 34228760576 (32643MB)
avail mem = 33175949312 (31639MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xdab19000 (51 entries)
bios0: vendor American Megatrends Inc. version "1.0c" date 06/30/2020
bios0: Supermicro AS -5019D-FTN4
acpi0 at bios0: ACPI 6.1
acpi0: sleep states S0 S5
acpi0: tables DSDT FACP APIC FPDT FIDT SSDT SPMI SSDT MCFG SSDT CRAT CDIT BERT EINJ HEST HPET SSDT UEFI SSDT WSMT
acpi0: wakeup devices S0D0(S3) S0D1(S3) S0D2(S3) S0D3(S3) S1D0(S3) S1D1(S3) S1D2(S3) S1D3(S3)
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD EPYC 3251 8-Core Processor, 2500.55 MHz, 17-01-02
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu0: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 8-way L2 cache
cpu0: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu0: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: AMD EPYC 3251 8-Core Processor, 2500.01 MHz, 17-01-02
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu1: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 8-way L2 cache
cpu1: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu1: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: AMD EPYC 3251 8-Core Processor, 2500.01 MHz, 17-01-02
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu2: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 8-way L2 cache
cpu2: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu2: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 6 (application processor)
cpu3: AMD EPYC 3251 8-Core Processor, 2500.01 MHz, 17-01-02
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu3: 64KB 64b/line 4-way I-cache, 

pflow on PE router

2021-05-14 Thread Denis Fondras
Hello,

I used OpenBSD as a PE router on my network. The router is connected to an IX, a
transit and multiple peers with OpenBGPd.

Earlier this week, I enabled pflow(4) to track traffic usage.
Unfortunately enabling pf(4) on an edge router does not seem like a good idea.
Some peers called in to tell me they noticed multiple problems (ranging from what
seem to be MTU problems to cuts in lengthy TCP sessions). Deactivating pf(4)
instantaneously fixed the problems on their side; reactivating pf(4) brought
them back.

I tried to push up the state table (I reached 300k states), to no avail.

Do you know what the "right settings" are to have pflow(4) enabled on a PE
router?

Thank you in advance,
Denis