On 2020-03-11 12:41, Anders Andersson wrote:
On Tue, Mar 10, 2020 at 10:53 PM Jordan Geoghegan wrote:
pf-badhost and unbound-adblock are both now at version 0.3, released
earlier today.
Links to the scripts can be found here:
www.geoghegan.ca/pfbadhost.html
www.geoghegan.ca/unbound-adblock.html
Thanks, this looks very interesting! But maybe you can help answer
a question that popped up when I read your page about pf-badhost.
You mention that "Subnet aggregation is used to take the address list
and "aggregate" the addresses into the smallest possible
representation using CIDR blocks.", but I was under the assumption
that pf already did this for its tables to speed up lookups.
Is there anything preventing the aggregation code from running on every
pf table modification? Assuming an already sorted list, it shouldn't take
long to merge a new entry. Perhaps I've missed some use of pf tables
that makes this impossible or not applicable in the general case.
Hi Anders,
I am by no means an expert on the nuts and bolts of pf, but I do know
that pf stores table data in a radix tree / radix table. By their
nature, radix trees ignore exact duplicates, but I'm not exactly sure
how they handle partially overlapping ranges. This article gives an
easy-to-follow, cursory overview of radix trees if you're interested:
https://blog.sqreen.com/demystifying-radix-trees/
As far as I understand, pf makes no modifications to the contents of
your tables, all it does is parse the list to confirm the addresses
and/or CIDR blocks are valid. When it's looking for matches within
ranges, it will look for the most specific match available. For example,
if you have a list containing an overlap:
...
192.168.0.0/16
192.168.0.0/22
...
When a packet from 192.168.1.5 arrives and is processed by a rule
referencing this table, it will match 192.168.0.0/22. Even though
both entries are valid and contain the address, the /22 is more
specific, and is thus the closest match.
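
As a rough illustration of the "most specific match" idea, here is a
small Python sketch using only the standard ipaddress module. It is not
how pf implements lookups (pf walks a radix tree in the kernel); the
function name most_specific_match and the linear scan are assumptions
made purely for illustration, using the /16 and /22 entries from the
example above.

```python
import ipaddress

def most_specific_match(addr, table):
    """Return the longest-prefix network in `table` containing `addr`, or None.

    Illustrative linear scan only; a radix tree reaches the same answer
    without inspecting every entry.
    """
    ip = ipaddress.ip_address(addr)
    matches = [net for net in table if ip in net]
    # The entry with the largest prefix length is the most specific one.
    return max(matches, key=lambda net: net.prefixlen, default=None)

# strict=False masks off any host bits, so an entry written as
# 192.168.1.0/22 would be read as the block 192.168.0.0/22.
table = [
    ipaddress.ip_network(entry, strict=False)
    for entry in ("192.168.0.0/16", "192.168.0.0/22")
]

print(most_specific_match("192.168.1.5", table))    # 192.168.0.0/22
print(most_specific_match("192.168.200.1", table))  # 192.168.0.0/16
```

An address outside both ranges, such as 10.0.0.1, returns None.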
pf may do some magic optimizations under the hood that I'm unaware of,
but at the end of the day, it does not modify the actual contents of
your table.
The use I've found for the subnet aggregation function has been mostly
for the purpose of keeping the list clean and tidy. I have a few
installations where I have all the lists enabled, including the use of
the GeoIP country blacklisting function. On these installations, subnet
aggregation can reduce the /etc/pf-badhost.txt file from ~60,000 lines
down to ~40,000 lines. For example, when blocking China's netblocks
(which pulls an aggregated list of all addresses assigned to China by
APNIC, and thus uses massive CIDR blocks of /10's etc), if any addresses
from any of the other blocklists come from China, they will be removed
from the list as they are already covered by the CIDR block info from
APNIC. I run pf-badhost on a bunch of Edgerouter Lites, and I've found
them to run better when the lists are tidy.
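
To make the effect concrete, here is a minimal Python sketch of subnet
aggregation using the standard library's ipaddress.collapse_addresses.
This is not the aggregator pf-badhost uses; the helper name aggregate
and the sample addresses (reserved and documentation ranges) are
made up for illustration.

```python
import ipaddress

def aggregate(entries):
    """Collapse IPv4 addresses/CIDR blocks into the fewest covering blocks."""
    nets = [ipaddress.ip_network(e, strict=False) for e in entries]
    # collapse_addresses drops duplicates, removes entries already covered
    # by a larger block, and merges adjacent blocks into their supernet.
    return [str(n) for n in ipaddress.collapse_addresses(nets)]

# Hypothetical blocklist: one large block plus overlapping and adjacent entries.
blocklist = [
    "100.64.0.0/10",     # stands in for a large country-level block
    "100.64.12.34",      # single host already covered by the /10
    "100.64.12.34",      # exact duplicate
    "203.0.113.0/25",    # two adjacent /25s that merge into one /24
    "203.0.113.128/25",
    "198.51.100.7",      # unrelated single host, kept as a /32
]

print(aggregate(blocklist))
# ['100.64.0.0/10', '198.51.100.7/32', '203.0.113.0/24']
```

Six input lines shrink to three, which is the same kind of reduction
described above, just on a toy scale.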
With regards to pf performing aggregation on all tables automatically,
it wouldn't make sense to run the full subnet aggregation calculations
for every table load or insertion/removal, as it can be quite CPU
intensive. It takes less than a second to load the table on a $5 Vultr
VPS, but 20-70 seconds to run the subnet aggregation (depending on
which lists are enabled). On my Edgerouter Pro with all the lists
enabled, it takes ~6 minutes. On my Edgerouter Lite it takes ~15 minutes
to run (over 2 hours when using the built-in Perl-based aggregator). I
just run the aggregation function with nice and let it do its thing.
It's called by cron in the wee hours, so I'm fine just letting it chug
along.
Regards,
Jordan