Re: unveil(2) tcpdump(8)

2018-09-27 Thread Ricardo Mestre
Hi tech@

I've commited this, please test it as much as possible by applying the diff
right now or just wait for the next snapshot.

Let it run for a long time and let me know of any problems as soon as you get
any. For this your ktrace, pcap and coredump files will be VERY important for
further analysis of the issue. As noted in sysctl(8)'s man in order to generate
the dumps in a safe place please do the following:

# mkdir -m 700 /var/crash/tcpdump
# sysctl kern.nosuidcoredump=3

If someone has weird networks then even better for testing it since mine is
pretty much vanilla and might not have been enough to see flaws with the diff.

I can't stress this enough, please test it and report it ASAP either here or in
private.

On 18:08 Wed 26 Sep , Ricardo Mestre wrote:
> I'm not too worried about the crash, I was playing with removing rpath
> from pledge in the case that /etc/ethers wasn't need which sure enough
> it is. pledge was most likely the reason that made it crash.
> 
> But brynet@ pointed out that while he was working on tcpdump last year
> he saw that it also needs to open /etc/rpc, and yes it calls
> getrpcbynumber(3) so an updated diff is below.
> 
> Index: privsep.c
> ===
> RCS file: /cvs/src/usr.sbin/tcpdump/privsep.c,v
> retrieving revision 1.48
> diff -u -p -u -r1.48 privsep.c
> --- privsep.c 8 Aug 2018 22:57:12 -   1.48
> +++ privsep.c 26 Sep 2018 17:04:25 -
> @@ -207,7 +207,7 @@ __dead void
>  priv_exec(int argc, char *argv[])
>  {
>   int bpfd = -1;
> - int i, sock, cmd, nflag = 0, Pflag = 0;
> + int i, sock, cmd, nflag = 0, oflag = 0, Pflag = 0;
>   char *cmdbuf, *infile = NULL;
>   char *RFileName = NULL;
>   char *WFileName = NULL;
> @@ -229,6 +229,10 @@ priv_exec(int argc, char *argv[])
>   nflag++;
>   break;
>  
> + case 'o':
> + oflag = 1;
> + break;
> +
>   case 'r':
>   RFileName = optarg;
>   break;
> @@ -305,6 +309,14 @@ priv_exec(int argc, char *argv[])
>   test_state(cmd, STATE_RUN);
>   impl_init_done(sock, &bpfd);
>  
> + if (oflag) {
> + if (unveil("/etc/pf.os", "r") == -1)
> + err(1, "unveil");
> + }
> + if (unveil("/etc/ethers", "r") == -1)
> + err(1, "unveil");
> + if (unveil("/etc/rpc", "r") == -1)
> + err(1, "unveil");
>   if (pledge("stdio rpath inet dns recvfd bpf", NULL) == 
> -1)
>   err(1, "pledge");
>  
> 
> On 08:58 Wed 26 Sep , Theo de Raadt wrote:
> > Ricardo Mestre  wrote:
> > 
> > > Hi,
> > > 
> > > This has been shown internally for some time, but deraadt@ asked me to 
> > > show it
> > > to a bigger audience now so here it is!
> > > 
> > > If we want OS fingerprinting by using -o flag then we can unveil 
> > > /etc/pf.os in
> > > read mode, nevertheless in order to do this we need to inform the privsep 
> > > proc
> > > that we are using -o so I added it to priv_exec().
> > 
> > looks right
> > 
> > > The other file needed to be unveiled is /etc/ethers in read mode, which I 
> > > tried
> > > to make it conditional but after several successful tests I bumped into a
> > > packet which made tcpdump crash after some time. Unfortunately I don't 
> > > have the
> > > core nor the pcap files to investigate what happen so for now the unveil 
> > > of
> > > this file will be kept unconditional regardless of the flags or expression
> > > used.
> > 
> > It is very likely that a protocol parser will find a MAC address nested
> > inside, and try to parse it via /etc/ethers.  So makes sense, and there
> > is little risk in exposing ethers
> > 
> > There is far more risk that some other file is required, and will fail to
> > open silently, and tcpdump willbehave differently
> > 
> > You know well: unveil requires a different audit approach than pledge
> > 
> > But it is rare for a missing file via unveil to crash a program, so 
> > something
> > is fishy.  A crash isn't good, perhaps try to reproduce and collect a 
> > corefile
> > using the sysctl kern.nosuidcoredump=3 method.
> > 
> 



Re: add vlan and trunk to arm64 RAMDISK

2018-09-27 Thread Theo de Raadt
That is exactly how I like to see this displayed, all the details
are there.

i find it odd the rd0a didn't change that much in size.  Compression
is really helping it.

ok deraadt


> Howdy.
> 
> Attached is a patch to add vlan and trunk to arm64's RAMDISK (parity
> with amd64).
> 
> make release'ed and tested, size increase as follows:
> 
> -rw---   1 los   los12940598 Sep 27 21:01 bsd.rd-patch
>   
> -rw---   1 los   los12864682 Sep 27 05:41 bsd.rd-snap 
>   
> 
> ~74KB difference
> 
> Snap - OpenBSD 6.4-beta (RAMDISK) #475: Wed Sep 26 13:15:25 MDT 2018
> /dev/vnd1c 7.7M7.3M382K95%/mnt
> /dev/vnd1c   7907  7526   38195% 1721874 8%  /mnt
> 
> /dev/rd0a15815 15055   76095%/
> /dev/rd0a15815 15055   76095% 1751871 9%  /
> 
> Patch 
>   
> /dev/vnd0c 7.7M7.3M382K95%/mnt
> /dev/vnd0c   7907  7526   38195% 1721874 8%  /mnt
> 
> /dev/rd0a15815 15055   76095%/
> /dev/rd0a15815 15055   76095% 1751871 9%  /
> 
> Ok?
> 
> +--+
> Carlos
> Index: RAMDISK
> ===
> RCS file: /cvs/src/sys/arch/arm64/conf/RAMDISK,v
> retrieving revision 1.74
> diff -u -p -r1.74 RAMDISK
> --- RAMDISK   18 Sep 2018 13:45:09 -  1.74
> +++ RAMDISK   28 Sep 2018 04:48:57 -
> @@ -251,6 +251,8 @@ islrtc*   at iic? # ISL1208 RTC
>  rkpmic*  at iic? # RK808 PMIC
>  
>  pseudo-deviceloop 1
> +pseudo-device   vlan
> +pseudo-device   trunk
>  pseudo-devicebpfilter 1
>  pseudo-devicerd 1
>  pseudo-devicebio 1



add vlan and trunk to arm64 RAMDISK

2018-09-27 Thread Carlos Cardenas
Howdy.

Attached is a patch to add vlan and trunk to arm64's RAMDISK (parity
with amd64).

make release'ed and tested, size increase as follows:

-rw---   1 los   los12940598 Sep 27 21:01 bsd.rd-patch  
-rw---   1 los   los12864682 Sep 27 05:41 bsd.rd-snap   

~74KB difference

Snap - OpenBSD 6.4-beta (RAMDISK) #475: Wed Sep 26 13:15:25 MDT 2018
/dev/vnd1c 7.7M7.3M382K95%/mnt
/dev/vnd1c   7907  7526   38195% 1721874 8%  /mnt

/dev/rd0a15815 15055   76095%/
/dev/rd0a15815 15055   76095% 1751871 9%  /

Patch   
/dev/vnd0c 7.7M7.3M382K95%/mnt
/dev/vnd0c   7907  7526   38195% 1721874 8%  /mnt

/dev/rd0a15815 15055   76095%/
/dev/rd0a15815 15055   76095% 1751871 9%  /

Ok?

+--+
Carlos
Index: RAMDISK
===
RCS file: /cvs/src/sys/arch/arm64/conf/RAMDISK,v
retrieving revision 1.74
diff -u -p -r1.74 RAMDISK
--- RAMDISK 18 Sep 2018 13:45:09 -  1.74
+++ RAMDISK 28 Sep 2018 04:48:57 -
@@ -251,6 +251,8 @@ islrtc* at iic? # ISL1208 RTC
 rkpmic*at iic? # RK808 PMIC
 
 pseudo-device  loop 1
+pseudo-device   vlan
+pseudo-device   trunk
 pseudo-device  bpfilter 1
 pseudo-device  rd 1
 pseudo-device  bio 1


Re: pfsync: avoid a recursion on PF_LOCK

2018-09-27 Thread Alexandr Nedvedicky
On Thu, Sep 27, 2018 at 11:30:09PM +0200, Hrvoje Popovski wrote:
> On 27.9.2018. 18:34, Alexandr Nedvedicky wrote:
> > Mentioning parallelism: there is yet another change you need to perform
> > in order to get more pf_test() instances running. Currently there
> > is only single input task, which processes inbound packets. In order
> > to allow more input tasks one has to change NET_TASKQ constant found
> > in net/if.c. Patch below tells kernel to use two input tasks:
> 
> shouldn't this be - Patch below tells kernel to use four input tasks?

absolutely correct, it should be four. The patch below makes kernel
to use 4 input tasks.

regards
sashan
> 
> 
> > 8<---8<---8<--8<
> > diff --git a/sys/net/if.c b/sys/net/if.c
> > index 78d7d3f3af6..87fca3921f1 100644
> > --- a/sys/net/if.c
> > +++ b/sys/net/if.c
> > @@ -232,7 +232,7 @@ int ifq_congestion;
> > 
> >  int netisr;
> > 
> > -#defineNET_TASKQ   1
> > +#defineNET_TASKQ   4
> >  struct taskq   *nettqmp[NET_TASKQ];
> > 
> >  struct task if_input_task_locked = TASK_INITIALIZER(if_netisr, NULL);
> > 8<---8<---8<--8<
> 



Re: ftp -w is not dying

2018-09-27 Thread sven falempin
On Thu, Sep 27, 2018 at 11:59 AM sven falempin 
wrote:

>
>
> On Thu, Sep 27, 2018 at 10:47 AM sven falempin 
> wrote:
>
>> I m not sure how this is possible but here s the data :
>>
>> i used the ENV to push -w 5 in my pkg_add process :
>> # date
>> Thu Sep 27 10:40:28 EDT 2018
>> # ps auxww | grep pkgfet
>> _pkgfetc 60348 0.0 0.1 1728 5456 ?? INp Wed05PM 0:00.09 /usr/bin/ftp -w 5
>> -S session -o - https://
>> myportal.com/tar/6.3/packages/amd64/p5-LWP-MediaTypes-6.02.tgz
>> # fstat -p  60348
>> _pkgfetc ftp 60348 5* internet stream tcp 0x806e8548
>> 172.16.1.35:5512 --> 92.222.70.241:443
>>
>> I can see the PF state on the natting device:
>>
>> tcp 206.180.254.190:52251 (172.16.1.35:5512) -> 92.222.70.241:443
>> ESTABLISHED:ESTABLISHED
>> age: 16:55:28 expires: 07:07:25 id: 5b7ce30b0094854a
>> rule: pass out quick on em5 all flags S/SA label "READY_TO_NAT" tagged
>> READY_TO_NAT
>>
>> this is stock 6.3 ftp which is still the same now (
>> https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/ftp/ )
>>
>> # ktrace -p 60348
>> generate an empty 64 bit file.
>>
>> Any other test i could do to understand ?
>>
>>
>
Reading NC it seems that the 'poll' syscall is indeed required to
timeout correctly.

I m trying to find a way to recreate the problem and i am running a
modified version in a loop

My understanding of the tls_read(SSL_read then ssl_read) .. is that the
retry is for SSL protocol error
and retry which would only occur on non TCP layer. That's  irrelevant in
fetch.c
I m just guessing the  ( at the top of ssl_read )
call will write and fix an POLLIN when write is done.
This explaining trying to read again when a write is requested >.<

Nevertheless each read/write code ( and even close ) shall be precede
by a poll call.

Bellow is the patch i m testing at the moment , i do not quite like it,
it s not solving the problem, and show how the factorization of read is
making the code hard to read/modify
IMHO url_get should be just calling

url_get_https
url_get_http
url_get_ftp

and each of this function do only that. The _fin_ FILE* may be NULL
fd may be -1 randomly in the function, it s kinda messy.
And this make it difficult to nicely call poll before reading
to avoid being blocked in a read indefinitely

RCS file: /cvs/src/usr.bin/ftp/fetch.c,v
retrieving revision 1.167
diff -u -p -r1.167 fetch.c
--- fetch.c 10 Feb 2018 06:25:16 -  1.167
+++ fetch.c 27 Sep 2018 20:10:52 -
@@ -59,6 +59,7 @@
 #include 

 #ifndef NOSSL
+#include 
 #include 
 #else /* !NOSSL */
 struct tls;
@@ -73,12 +74,12 @@ voidabortfile(int);
 char   hextochar(const char *);
 char   *urldecode(const char *);
 char   *recode_credentials(const char *_userinfo);
-intftp_printf(FILE *, struct tls *, const char *, ...)
__attribute__((format(printf, 3, 4)));
+intftp_printf(FILE *, int fd, struct tls *, const char *, ...)
__attribute__((format(printf, 4, 5)));
 char   *ftp_readline(FILE *, struct tls *, size_t *);
 size_t ftp_read(FILE *, struct tls *, char *, size_t);
 #ifndef NOSSL
 intproxy_connect(int, char *, char *);
-intSSL_vprintf(struct tls *, const char *, va_list);
+intSSL_vprintf(int fd, struct tls *, const char *, va_list);
 char   *SSL_readline(struct tls *, size_t *);
 #endif /* !NOSSL */

@@ -98,6 +99,90 @@ jmp_buf  httpabort;

 static int redirect_loop;

+#ifndef NOSSL
+/*from netcat.c correct way to timeout a tls_call*/
+int
+timeout_tls(int s, struct tls *tls_ctx, int (*func)(struct tls *))
+{
+   int timeout = connect_timeout; // glu
+   struct pollfd pfd;
+   int ret;
+
+   while ((ret = (*func)(tls_ctx)) != 0) {
+   if (ret == TLS_WANT_POLLIN)
+   pfd.events = POLLIN;
+   else if (ret == TLS_WANT_POLLOUT)
+   pfd.events = POLLOUT;
+   else
+   break;
+   pfd.fd = s;
+   if ((ret = poll(&pfd, 1, timeout)) == 1)
+   continue;
+   else if (ret == 0) {
+   errno = ETIMEDOUT;
+   ret = -1;
+   break;
+   } else
+   err(1, "poll failed");
+   }
+
+   return ret;
+}
+
+/*must realloc / offset*/
+int
+timeout_tls_write(int s, struct tls *tls_ctx,
+   const void *buf, size_t buflen)
+{
+   int timeout = connect_timeout; // glu
+   struct pollfd pfd;
+   int ret;
+
+   // should check ready first => do poll write then write
+   while ((ret = tls_write(tls_ctx,buf,buflen)) != 0) {
+   if (ret == TLS_WANT_POLLIN)
+   pfd.events = POLLIN;
+   else if (ret == TLS_WANT_POLLOUT)
+   pfd.events = POLLOUT;
+   else
+   break;
+   pfd.fd = s;
+   if ((ret = p

Re: pfctl(8) and securelevel(7)

2018-09-27 Thread Klemens Nanni
On Thu, Sep 27, 2018 at 03:35:30PM +0200, Zbyszek Żółkiewski wrote:
> sorry, forgot mention: 6.3 -stable
Same on -CURRENT and probably older releases as well.

> to reproduce:
> - at securelevel=1
> - load pf.conf - file whitelist is populated with IP addresses
> - try to list table: pfctl -t whitelist -T show
> - will all work as expected
> - set securelevel=2 (sysctl kern.securelevel=2)
> - repeat command: pfctl -t whitelist -T show
> - this result in "Operation not permitted”
> - now try: pfctl -t whitelist -v -T show
> - this will result with printed table contents as well as some stats
`-v' uses a different ioctl, see sbin/pfctl/pfctl_table.c:306.

# ktrace pfctl -t whitelist -T show
# kdump | grep ioctl
 61090 pfctlCALL  ioctl(3,DIOCRGETADDRS,0x7f7eb8b0)
 61090 pfctlRET   ioctl -1 errno 1 Operation not permitted
# ktrace pfctl -t whitelist -T show -v
# kdump |grep ioctl
 99126 pfctlCALL  ioctl(3,DIOCRGETASTATS,0x7f7d5aa0)
 99126 pfctlRET   ioctl 0

The securelevel check for PF ioctls simply doesn't account for the first
one.

Grepping for DIOCRGETADDRS consumers in base only shows pfctl and I
currently don't have any concerns in allowing this ioctl when
securelevel is >1, but review/confirmation is welcome here.

Feedback? OK?

Index: net/pf_ioctl.c
===
RCS file: /cvs/src/sys/net/pf_ioctl.c,v
retrieving revision 1.337
diff -u -p -r1.337 pf_ioctl.c
--- net/pf_ioctl.c  11 Sep 2018 07:53:38 -  1.337
+++ net/pf_ioctl.c  27 Sep 2018 18:29:25 -
@@ -951,6 +951,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t a
case DIOCRADDADDRS:
case DIOCRDELADDRS:
case DIOCRSETADDRS:
+   case DIOCRGETADDRS:
case DIOCRGETASTATS:
case DIOCRCLRASTATS:
case DIOCRTSTADDRS:



vmd.c copy-pasto

2018-09-27 Thread Greg Steuck
diff --git a/usr.sbin/vmd/vmd.c b/usr.sbin/vmd/vmd.c
index 25d19dc7a7f..2694c3111ac 100644
--- a/usr.sbin/vmd/vmd.c
+++ b/usr.sbin/vmd/vmd.c
@@ -1861,7 +1861,7 @@ user_checklimit(struct vmd_user *usr, struct
vm_create_params *vcp)
limit = "cpu ";
goto fail;
}
-   if (usr->usr_maxcpu > VM_DEFAULT_USER_MAXMEM) {
+   if (usr->usr_maxmem > VM_DEFAULT_USER_MAXMEM) {
limit = "memory ";
goto fail;
}
-- 
nest.cx is Gmail hosted, use PGP for anything private. Key:
http://goo.gl/6dMsr
Fingerprint: 5E2B 2D0E 1E03 2046 BEC3  4D50 0B15 42BD 8DF5 A1B0


nsd 4.1.25

2018-09-27 Thread Florian Obser
unexciting update to 4.1.25, running in production in front of a
powerdns signer without issues.
OK?

diff --git config.h.in config.h.in
index eded09dd6b3..4d47f603062 100644
--- config.h.in
+++ config.h.in
@@ -1,5 +1,8 @@
 /* config.h.in.  Generated from configure.ac by autoheader.  */
 
+/* apply the noreturn attribute to a function that exits the program */
+#undef ATTR_NORETURN
+
 /* Define this to enable BIND8 like NSTATS & XSTATS. */
 #undef BIND8_STATS
 
@@ -43,6 +46,9 @@
 /* Whether the C compiler accepts the "format" attribute */
 #undef HAVE_ATTR_FORMAT
 
+/* Whether the C compiler accepts the "noreturn" attribute */
+#undef HAVE_ATTR_NORETURN
+
 /* Whether the C compiler accepts the "unused" attribute */
 #undef HAVE_ATTR_UNUSED
 
diff --git configparser.y configparser.y
index 547518f88c6..567641ce706 100644
--- configparser.y
+++ configparser.y
@@ -154,9 +154,6 @@ server_debug_mode: VAR_DEBUG_MODE STRING
 server_use_systemd: VAR_USE_SYSTEMD STRING 
{ 
OUTYY(("P(server_use_systemd:%s)\n", $2)); 
-   if(strcmp($2, "yes") != 0 && strcmp($2, "no") != 0)
-   yyerror("expected yes or no.");
-   else cfg_parser->opt->use_systemd = (strcmp($2, "yes")==0);
}
;
 server_verbosity: VAR_VERBOSITY STRING 
diff --git configure configure
index 13da401ab9b..bedd3930dbf 100644
--- configure
+++ configure
@@ -1,6 +1,6 @@
 #! /bin/sh
 # Guess values for system-dependent variables and create Makefiles.
-# Generated by GNU Autoconf 2.69 for NSD 4.1.24.
+# Generated by GNU Autoconf 2.69 for NSD 4.1.25.
 #
 # Report bugs to .
 #
@@ -580,8 +580,8 @@ MAKEFLAGS=
 # Identity of this package.
 PACKAGE_NAME='NSD'
 PACKAGE_TARNAME='nsd'
-PACKAGE_VERSION='4.1.24'
-PACKAGE_STRING='NSD 4.1.24'
+PACKAGE_VERSION='4.1.25'
+PACKAGE_STRING='NSD 4.1.25'
 PACKAGE_BUGREPORT='nsd-b...@nlnetlabs.nl'
 PACKAGE_URL=''
 
@@ -734,7 +734,6 @@ enable_minimal_responses
 enable_mmap
 enable_radix_tree
 enable_packed
-enable_systemd
 '
   ac_precious_vars='build_alias
 host_alias
@@ -1287,7 +1286,7 @@ if test "$ac_init_help" = "long"; then
   # Omit some internal or obsolete options to make the list less imposing.
   # This message is too long to be a string in the A/UX 3.1 sh.
   cat <<_ACEOF
-\`configure' configures NSD 4.1.24 to adapt to many kinds of systems.
+\`configure' configures NSD 4.1.25 to adapt to many kinds of systems.
 
 Usage: $0 [OPTION]... [VAR=VALUE]...
 
@@ -1348,7 +1347,7 @@ fi
 
 if test -n "$ac_init_help"; then
   case $ac_init_help in
- short | recursive ) echo "Configuration of NSD 4.1.24:";;
+ short | recursive ) echo "Configuration of NSD 4.1.25:";;
esac
   cat <<\_ACEOF
 
@@ -1387,7 +1386,6 @@ Optional Features:
   less memory, but uses some more CPU.
   --enable-packed Enable packed structure alignment, uses less memory,
   but unaligned reads.
-  --enable-systemdcompile with systemd support
 
 Optional Packages:
   --with-PACKAGE[=ARG]use PACKAGE [ARG=yes]
@@ -1498,7 +1496,7 @@ fi
 test -n "$ac_init_help" && exit $ac_status
 if $ac_init_version; then
   cat <<\_ACEOF
-NSD configure 4.1.24
+NSD configure 4.1.25
 generated by GNU Autoconf 2.69
 
 Copyright (C) 2012 Free Software Foundation, Inc.
@@ -2207,7 +2205,7 @@ cat >config.log <<_ACEOF
 This file contains any messages produced by compilers while
 running configure, to aid debugging if configure makes a mistake.
 
-It was created by NSD $as_me 4.1.24, which was
+It was created by NSD $as_me 4.1.25, which was
 generated by GNU Autoconf 2.69.  Invocation command line was
 
   $ $0 $@
@@ -4987,6 +4985,7 @@ fi
 
 
 
+
 # Checks for typedefs, structures, and compiler characteristics.
 # allow user to override the -g -O2 flags.
 if test "x$CFLAGS" = "x" ; then
@@ -5456,6 +5455,49 @@ $as_echo "#define HAVE_ATTR_UNUSED 1" >>confdefs.h
 
 fi
 
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether the C compiler 
(${CC-cc}) accepts the \"noreturn\" attribute" >&5
+$as_echo_n "checking whether the C compiler (${CC-cc}) accepts the 
\"noreturn\" attribute... " >&6; }
+if ${ac_cv_c_noreturn_attribute+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_cv_c_noreturn_attribute=no
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+ #include 
+__attribute__((noreturn)) void f(int x) { printf("%d", x); }
+
+int
+main ()
+{
+
+   f(1);
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  ac_cv_c_noreturn_attribute="yes"
+else
+  ac_cv_c_noreturn_attribute="no"
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+
+fi
+
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_c_noreturn_attribute" 
>&5
+$as_echo "$ac_cv_c_noreturn_attribute" >&6; }
+if test $ac_cv_c_noreturn_attribute = yes; then
+
+$as_echo "#define HAVE_ATTR_NORETURN 1" >>confdefs.h
+
+
+$as_echo "#define ATTR_NORETURN __attribute__((__noreturn__))" >>confdefs.h
+
+fi
+
 { $as_ech

pfsync: avoid a recursion on PF_LOCK

2018-09-27 Thread Alexandr Nedvedicky
Hello,

patch below is missing piece to stuff, which I commit on n2k18 [1].
Fairly quickly people, who have PF deployed with pfsync (and are
willing to experiment), discovered a panic on PF_LOCK recursion.
The recursion is identified by stack as follows:

login: panic: rw_enter: pf_lock locking against myself
Stopped at  db_enter+0x12:  popq%r11
TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
 147548  82409  00x12  03  ld
*192136   5192  0 0x14000  0x2000  softnet
db_enter() at db_enter+0x12
panic() at panic+0x120
_rw_enter() at _rw_enter+0x2a3
pfsync_update_state() at pfsync_update_state+0x90
pf_test() at pf_test+0x608
ip_output() at ip_output+0x6cd
pfsync_sendout() at pfsync_sendout+0x645
pfsync_bulk_start() at pfsync_bulk_start+0x127
pfsync_in_ureq() at pfsync_in_ureq+0x71 
  
pfsync_input() at pfsync_input+0x3ab
ip_deliver() at ip_deliver+0x203
ipintr() at ipintr+0x6a
if_netisr(813abf00) at if_netisr+0x5a
taskq_thread(0) at taskq_thread+0x6d

Big thanks goes to hrvoje@, who provided me with proper set up so
I could debug the patch. And also a fair amount of credit goes
to Christiano Haesbaert for idea to dispatch outbound psync update
packet to task. 

The change in patch is conservative. The pfsync behavior does not
change unless PF gets compiled with '-DWITH_PF_LOCK' option. '-DWITH_PF_LOCK'
tells pfsync to use a softnet task to transmit packet with
update (see net/if.c [2])

Also let me clarify my earlier commit message [1] as I've received
few questions about it. At the moment my changes, which unlock PF
are experimental. The code is disabled by default. In order to
test my code you need to compile kernel with '-DWITH_PF_LOCK'. The
easiest thing is to patch CONFIG.MP with change below:
8<---8<---8<--8<
diff --git a/sys/arch/amd64/conf/GENERIC.MP b/sys/arch/amd64/conf/GENERIC.MP
index 7f195a53092..69c62d23cf8 100644
--- a/sys/arch/amd64/conf/GENERIC.MP
+++ b/sys/arch/amd64/conf/GENERIC.MP
@@ -2,6 +2,7 @@
 
 include "arch/amd64/conf/GENERIC"
 
+option WITH_PF_LOCK
 option MULTIPROCESSOR
 #optionMP_LOCKDEBUG
 #optionWITNESS
8<---8<---8<--8<
Compiling kernel with -DWITH_PF_LOCK will get you PF with two rw-locks:
pf_lock, which serializes packets with ioctls. Currently pf_lock,
is always grabbed exclusively.

pf_state_lock, which packet grabs as a reader to look up a state,
if no state is found, then packet drops the pf_state_lock and proceeds
to rules. At that point packet grabs pf_lock as a writer. If the packet
matches a pass rule, then it must grab pf_state_lock as a writer in
order to insert a new state to table.

The pf_state_lock allows two pf_test() function to run in parallel for
packets, which match existing state. Packets, which don't match the state,
are serialized on pf_lock.

Mentioning parallelism: there is yet another change you need to perform
in order to get more pf_test() instances running. Currently there
is only single input task, which processes inbound packets. In order
to allow more input tasks one has to change NET_TASKQ constant found
in net/if.c. Patch below tells kernel to use two input tasks:
8<---8<---8<--8<
diff --git a/sys/net/if.c b/sys/net/if.c
index 78d7d3f3af6..87fca3921f1 100644
--- a/sys/net/if.c
+++ b/sys/net/if.c
@@ -232,7 +232,7 @@ int ifq_congestion;

 int netisr;

-#defineNET_TASKQ   1
+#defineNET_TASKQ   4
 struct taskq   *nettqmp[NET_TASKQ];

 struct task if_input_task_locked = TASK_INITIALIZER(if_netisr, NULL);
8<---8<---8<--8<

Finally let me explain few details found in my patch:

@@ -288,6 +296,9 @@ pfsyncattach(int npfsync)
 {
if_clone_attach(&pfsync_cloner);
pfsynccounters = counters_alloc(pfsyncs_ncounters);
+#ifdef WITH_PF_LOCK
+   mq_init(&pfsync_mq, 4096, IPL_SOFTNET);
+#endif /* WITH_PF_LOCK */
 }

The queue limit of 4k is empirically determined by testing on
set up provided hrvoje@. The tests were running on intel 10Gb NICs.

Using 4k provides sufficient buffer to limit drops of pfsync
packets to <5%. The drops, which I could see when using 4k limit,
were coming from ip_output() here:

 680 for (off = hlen + len; off < ntohs(ip->ip_len); off += len) {
 681 MGETHDR(m, M_DONTWAIT, MT_HEADER);
 682 if (m == NULL) {
 683 ipstat_inc(ips_odropped);
 684 error = ENOBUFS;
 685 goto sendorfree;
 686 }

clearly the syste

Re: bgpd, withdraws and stuck routes

2018-09-27 Thread Claudio Jeker
On Thu, Sep 27, 2018 at 04:41:22PM +0200, Claudio Jeker wrote:
> Some people noticed that routes get stuck or to be more precise that
> withdraws are not sent to peers in some cases. Until now bgpd did not
> really track what was announced and what not (there is no real
> Adj-RIB-Out) and instead tried to figure out if it should or should not
> send the withdraw. With the increased filtering we try to push this fails
> more often and the result is rather terrible.
> Instead now introduce minimal tracking of announced prefixes. This adds a
> per peer RB tree that tracks what was sent out as UPDATE. This can be
> considered a minimal Adj-RIB-Out.

As usual once you send it out you realize that something is missing.
I first only looked at callers of up_generate_updates() but then I double
checked the sofftreconfig out handler and noticed that we need the same
magic there too (else the accouting is getting off).
The following diff does this now as well. Double checked all other
up_generate() users as well.

-- 
:wq Claudio

Index: rde.c
===
RCS file: /cvs/src/usr.sbin/bgpd/rde.c,v
retrieving revision 1.427
diff -u -p -r1.427 rde.c
--- rde.c   25 Sep 2018 08:08:38 -  1.427
+++ rde.c   27 Sep 2018 15:49:42 -
@@ -3147,9 +3147,11 @@ rde_softreconfig_out_peer(struct rib_ent
/* nothing todo */
if (oa == ACTION_DENY && na == ACTION_ALLOW) {
/* send update */
+   up_rib_add(peer, re);
up_generate(peer, &nstate, &addr, pt->prefixlen);
} else if (oa == ACTION_ALLOW && na == ACTION_DENY) {
/* send withdraw */
+   up_rib_remove(peer, re);
up_generate(peer, NULL, &addr, pt->prefixlen);
} else if (oa == ACTION_ALLOW && na == ACTION_ALLOW) {
/* send update if anything changed */
@@ -3197,6 +3199,7 @@ rde_softreconfig_unload_peer(struct rib_
prefix_nhflags(p));
if (rde_filter(out_rules_tmp, peer, p, &ostate) != ACTION_DENY) {
/* send withdraw */
+   up_rib_remove(peer, re);
up_generate(peer, NULL, &addr, pt->prefixlen);
}
rde_filterstate_clean(&ostate);
Index: rde.h
===
RCS file: /cvs/src/usr.sbin/bgpd/rde.h,v
retrieving revision 1.193
diff -u -p -r1.193 rde.h
--- rde.h   20 Sep 2018 11:45:59 -  1.193
+++ rde.h   27 Sep 2018 15:48:02 -
@@ -55,6 +55,7 @@ LIST_HEAD(aspath_head, rde_aspath);
 TAILQ_HEAD(aspath_queue, rde_aspath);
 RB_HEAD(uptree_prefix, update_prefix);
 RB_HEAD(uptree_attr, update_attr);
+RB_HEAD(uptree_rib, update_rib);
 
 struct rib_desc;
 struct rib;
@@ -70,6 +71,7 @@ struct rde_peer {
struct bgpd_addr remote_addr;
struct bgpd_addr local_v4_addr;
struct bgpd_addr local_v6_addr;
+   struct uptree_ribup_rib;
struct uptree_prefix up_prefix;
struct uptree_attr   up_attrs;
struct uplist_attr   updates[AID_MAX];
@@ -566,6 +568,8 @@ int  nexthop_compare(struct nexthop *, 
 /* rde_update.c */
 voidup_init(struct rde_peer *);
 voidup_down(struct rde_peer *);
+int up_rib_remove(struct rde_peer *, struct rib_entry *);
+voidup_rib_add(struct rde_peer *, struct rib_entry *);
 int up_test_update(struct rde_peer *, struct prefix *);
 int up_generate(struct rde_peer *, struct filterstate *,
 struct bgpd_addr *, u_int8_t);
Index: rde_update.c
===
RCS file: /cvs/src/usr.sbin/bgpd/rde_update.c,v
retrieving revision 1.99
diff -u -p -r1.99 rde_update.c
--- rde_update.c18 Sep 2018 16:54:01 -  1.99
+++ rde_update.c27 Sep 2018 15:51:22 -
@@ -53,9 +53,15 @@ struct update_attr {
u_int16_tmpattr_len;
 };
 
+struct update_rib {
+   RB_ENTRY(update_rib) entry;
+   struct rib_entry*re;
+};
+
 void   up_clear(struct uplist_attr *, struct uplist_prefix *);
 intup_prefix_cmp(struct update_prefix *, struct update_prefix *);
 intup_attr_cmp(struct update_attr *, struct update_attr *);
+intup_rib_cmp(struct update_rib *, struct update_rib *);
 intup_add(struct rde_peer *, struct update_prefix *, struct update_attr *);
 
 RB_PROTOTYPE(uptree_prefix, update_prefix, entry, up_prefix_cmp)
@@ -64,6 +70,9 @@ RB_GENERATE(uptree_prefix, update_prefix
 RB_PROTOTYPE(uptree_attr, update_attr, entry, up_attr_cmp)
 RB_GENERATE(uptree_attr, update_attr, entry, up_attr_cmp)
 
+RB_PROTOTYPE(uptree_rib, update_rib, entry, up_rib_cmp)
+RB_GENERATE(uptree_rib, update_rib, entry, up_rib_cmp)
+
 SIPHASH_KEY uptree_key;
 
 void
@@ -77,6 +86,7 @@ up_init(struct r

Re: ftp -w is not dying

2018-09-27 Thread sven falempin
On Thu, Sep 27, 2018 at 10:47 AM sven falempin 
wrote:

> I m not sure how this is possible but here s the data :
>
> i used the ENV to push -w 5 in my pkg_add process :
> # date
> Thu Sep 27 10:40:28 EDT 2018
> # ps auxww | grep pkgfet
> _pkgfetc 60348 0.0 0.1 1728 5456 ?? INp Wed05PM 0:00.09 /usr/bin/ftp -w 5
> -S session -o - https://
> myportal.com/tar/6.3/packages/amd64/p5-LWP-MediaTypes-6.02.tgz
> # fstat -p  60348
> _pkgfetc ftp 60348 5* internet stream tcp 0x806e8548
> 172.16.1.35:5512 --> 92.222.70.241:443
>
> I can see the PF state on the natting device:
>
> tcp 206.180.254.190:52251 (172.16.1.35:5512) -> 92.222.70.241:443
> ESTABLISHED:ESTABLISHED
> age: 16:55:28 expires: 07:07:25 id: 5b7ce30b0094854a
> rule: pass out quick on em5 all flags S/SA label "READY_TO_NAT" tagged
> READY_TO_NAT
>
> this is stock 6.3 ftp which is still the same now (
> https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/ftp/ )
>
> # ktrace -p 60348
> generate an empty 64 bit file.
>
> Any other test i could do to understand ?
>
>

My understanding of the code is that the -w cannot work .
I have been using a lot of socket and c code, and
always ended using non blocking fd because : life

the code is not using non blocking fd , and rely on the POLLIN of tls in a
strange way

The syscall in fetch.c are ( geee the get_url is crazy long and full o
ifdef :s )

error = getaddrinfo(host, port, &hints, &res0);
fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);

(void)signal(SIGALRM, tooslow);
alarmtimer(connect_timeout);

connect ( and tls_init and stuff

and then

directly tls_read like that :

do {
  tret = tls_read(tls, buf, len);
} while (tret == TLS_WANT_POLLIN || tret == TLS_WANT_POLLOUT);

i did not code much with tls_read
but calling tls_read on TLS_WANT_POLLOUT

seems
far FETCH

Cheers.

-- 
--
-
Knowing is not enough; we must apply. Willing is not enough; we must do


ftp -w is not dying

2018-09-27 Thread sven falempin
I m not sure how this is possible but here s the data :

i used the ENV to push -w 5 in my pkg_add process :
# date
Thu Sep 27 10:40:28 EDT 2018
# ps auxww | grep pkgfet
_pkgfetc 60348 0.0 0.1 1728 5456 ?? INp Wed05PM 0:00.09 /usr/bin/ftp -w 5
-S session -o - https://
myportal.com/tar/6.3/packages/amd64/p5-LWP-MediaTypes-6.02.tgz
# fstat -p  60348
_pkgfetc ftp 60348 5* internet stream tcp 0x806e8548
172.16.1.35:5512 --> 92.222.70.241:443

I can see the PF state on the natting device:

tcp 206.180.254.190:52251 (172.16.1.35:5512) -> 92.222.70.241:443
ESTABLISHED:ESTABLISHED
age: 16:55:28 expires: 07:07:25 id: 5b7ce30b0094854a
rule: pass out quick on em5 all flags S/SA label "READY_TO_NAT" tagged
READY_TO_NAT

this is stock 6.3 ftp which is still the same now (
https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/ftp/ )

# ktrace -p 60348
generate an empty 64 bit file.

Any other test i could do to understand ?

--
-
Knowing is not enough; we must apply. Willing is not enough; we must do


bgpd, withdraws and stuck routes

2018-09-27 Thread Claudio Jeker
Some people noticed that routes get stuck or to be more precise that
withdraws are not sent to peers in some cases. Until now bgpd did not
really track what was announced and what not (there is no real
Adj-RIB-Out) and instead tried to figure out if it should or should not
send the withdraw. With the increased filtering we try to push this fails
more often and the result is rather terrible.
Instead now introduce minimal tracking of announced prefixes. This adds a
per peer RB tree that tracks what was sent out as UPDATE. This can be
considered a minimal Adj-RIB-Out.

This is an important fix and needs to be tested.
-- 
:wq Claudio


Index: rde.h
===
RCS file: /cvs/src/usr.sbin/bgpd/rde.h,v
retrieving revision 1.193
diff -u -p -r1.193 rde.h
--- rde.h   20 Sep 2018 11:45:59 -  1.193
+++ rde.h   27 Sep 2018 11:56:43 -
@@ -55,6 +55,7 @@ LIST_HEAD(aspath_head, rde_aspath);
 TAILQ_HEAD(aspath_queue, rde_aspath);
 RB_HEAD(uptree_prefix, update_prefix);
 RB_HEAD(uptree_attr, update_attr);
+RB_HEAD(uptree_rib, update_rib);
 
 struct rib_desc;
 struct rib;
@@ -70,6 +71,7 @@ struct rde_peer {
struct bgpd_addr remote_addr;
struct bgpd_addr local_v4_addr;
struct bgpd_addr local_v6_addr;
+   struct uptree_ribup_rib;
struct uptree_prefix up_prefix;
struct uptree_attr   up_attrs;
struct uplist_attr   updates[AID_MAX];
Index: rde_decide.c
===
RCS file: /cvs/src/usr.sbin/bgpd/rde_decide.c,v
retrieving revision 1.71
diff -u -p -r1.71 rde_decide.c
--- rde_decide.c29 Aug 2018 08:51:49 -  1.71
+++ rde_decide.c27 Sep 2018 09:25:52 -
@@ -264,7 +264,7 @@ prefix_evaluate(struct prefix *p, struct
if (LIST_EMPTY(&re->prefix_h))
LIST_INSERT_HEAD(&re->prefix_h, p, rib_l);
else {
-   LIST_FOREACH(xp, &re->prefix_h, rib_l)
+   LIST_FOREACH(xp, &re->prefix_h, rib_l) {
if (prefix_cmp(p, xp) > 0) {
LIST_INSERT_BEFORE(xp, p, rib_l);
break;
@@ -273,6 +273,7 @@ prefix_evaluate(struct prefix *p, struct
LIST_INSERT_AFTER(xp, p, rib_l);
break;
}
+   }
}
}
 
Index: rde_update.c
===
RCS file: /cvs/src/usr.sbin/bgpd/rde_update.c,v
retrieving revision 1.99
diff -u -p -r1.99 rde_update.c
--- rde_update.c18 Sep 2018 16:54:01 -  1.99
+++ rde_update.c27 Sep 2018 13:34:56 -
@@ -53,9 +53,17 @@ struct update_attr {
u_int16_tmpattr_len;
 };
 
+struct update_rib {
+   RB_ENTRY(update_rib) entry;
+   struct rib_entry*re;
+};
+
 void   up_clear(struct uplist_attr *, struct uplist_prefix *);
 intup_prefix_cmp(struct update_prefix *, struct update_prefix *);
 intup_attr_cmp(struct update_attr *, struct update_attr *);
+intup_rib_cmp(struct update_rib *, struct update_rib *);
+intup_rib_remove(struct rde_peer *, struct rib_entry *);
+void   up_rib_add(struct rde_peer *, struct rib_entry *);
 intup_add(struct rde_peer *, struct update_prefix *, struct update_attr *);
 
 RB_PROTOTYPE(uptree_prefix, update_prefix, entry, up_prefix_cmp)
@@ -64,6 +72,9 @@ RB_GENERATE(uptree_prefix, update_prefix
 RB_PROTOTYPE(uptree_attr, update_attr, entry, up_attr_cmp)
 RB_GENERATE(uptree_attr, update_attr, entry, up_attr_cmp)
 
+RB_PROTOTYPE(uptree_rib, update_rib, entry, up_rib_cmp)
+RB_GENERATE(uptree_rib, update_rib, entry, up_rib_cmp)
+
 SIPHASH_KEY uptree_key;
 
 void
@@ -77,6 +88,7 @@ up_init(struct rde_peer *peer)
}
RB_INIT(&peer->up_prefix);
RB_INIT(&peer->up_attrs);
+   RB_INIT(&peer->up_rib);
peer->up_pcnt = 0;
peer->up_acnt = 0;
peer->up_nlricnt = 0;
@@ -110,13 +122,18 @@ up_clear(struct uplist_attr *updates, st
 void
 up_down(struct rde_peer *peer)
 {
-   u_int8_ti;
+   struct update_rib   *ur, *nur;
+   u_int8_ti;
 
for (i = 0; i < AID_MAX; i++)
up_clear(&peer->updates[i], &peer->withdraws[i]);
 
RB_INIT(&peer->up_prefix);
RB_INIT(&peer->up_attrs);
+   RB_FOREACH_SAFE(ur, uptree_rib, &peer->up_rib, nur) {
+   RB_REMOVE(uptree_rib, &peer->up_rib, ur);
+   free(ur);
+   }
 
peer->up_pcnt = 0;
peer->up_acnt = 0;
@@ -195,6 +212,42 @@ up_attr_cmp(struct update_attr *a, struc
 }
 
 int
+up_rib_cmp(struct update_rib *a, struct update_rib *b)
+{
+

Re: pfctl(8) and securelevel(7)

2018-09-27 Thread Zbyszek Żółkiewski
Here:
> Wiadomość napisana przez Klemens Nanni  w dniu 27.09.2018, 
> o godz. 15:19:
> 
> What version are you running?

sorry, forgot mention: 6.3 -stable

> 
> On Thu, Sep 27, 2018 at 02:06:44PM +0200, Zbyszek Żółkiewski wrote:
>> At securelevel(7) set to 2, NAT rules and filter cannot be altered, however 
>> as stated in pfctl.conf(5) manual - it is possible to modify tables by 
>> adding/deleting entries
>> (https://man.openbsd.org/pf.conf.5#TABLES)
>> 
>> and this works fine. Question: why it is not possible to list content of 
>> tables?:
>> n 
>> kern.securelevel=2
>> pfctl -t whitelist -T show
>> pfctl: Operation not permitted.
>> 
>> while:
>> kern.securelevel=1
>> pfctl -t whitelist -T show
>>   192.168.1.7
>>   192.168.1.20
>>   192.168.1.25
>> 
>> and more odd, adding -v flag allow list it anyway:
>> 
>> pfctl -t whitelist -v -T show
>>   192.168.1.7
>>Cleared: Thu Sep 27 13:47:58 2018
>>   192.168.1.20
>>Cleared: Thu Sep 27 13:47:58 2018 
>> 
>> I am bit confused, this is bug or i am missing something ?
> So am I. Did you add `-v' while securelevel was set to 2 or 1?

it was set 2

> 
> Please provide a clear way to reproduce your scenario, possibly
> including the table definitions from your pf.conf.

pf.conf snippet:
table  persist file "/etc/pf/whitelist” counters

whitelist contains list of IPs

to reproduce:
- at securelevel=1
- load pf.conf - file whitelist is populated with IP addresses
- try to list table: pfctl -t whitelist -T show
- will all work as expected
- set securelevel=2 (sysctl kern.securelevel=2)
- repeat command: pfctl -t whitelist -T show
- this result in "Operation not permitted”
- now try: pfctl -t whitelist -v -T show
- this will result with printed table contents as well as some stats

_
Zbyszek Żółkiewski



Re: pfctl(8) and securelevel(7)

2018-09-27 Thread Klemens Nanni
What version are you running?

On Thu, Sep 27, 2018 at 02:06:44PM +0200, Zbyszek Żółkiewski wrote:
> At securelevel(7) set to 2, NAT rules and filter cannot be altered, however 
> as stated in pfctl.conf(5) manual - it is possible to modify tables by 
> adding/deleting entries
> (https://man.openbsd.org/pf.conf.5#TABLES)
> 
> and this works fine. Question: why it is not possible to list content of 
> tables?:
>n 
> kern.securelevel=2
> pfctl -t whitelist -T show
> pfctl: Operation not permitted.
> 
> while:
> kern.securelevel=1
> pfctl -t whitelist -T show
>192.168.1.7
>192.168.1.20
>192.168.1.25
> 
> and more odd, adding -v flag allow list it anyway:
> 
> pfctl -t whitelist -v -T show
>192.168.1.7
> Cleared: Thu Sep 27 13:47:58 2018
>192.168.1.20
> Cleared: Thu Sep 27 13:47:58 2018 
> 
> I am bit confused, this is bug or i am missing something ?
So am I. Did you add `-v' while securelevel was set to 2 or 1?

Please provide a clear way to reproduce your scenario, possibly
including the table definitions from your pf.conf.



pfctl(8) and securelevel(7)

2018-09-27 Thread Zbyszek Żółkiewski
Hi list,

At securelevel(7) set to 2, NAT rules and filter cannot be altered, however as 
stated in pfctl.conf(5) manual - it is possible to modify tables by 
adding/deleting entries
(https://man.openbsd.org/pf.conf.5#TABLES)

and this works fine. Question: why it is not possible to list content of 
tables?:

kern.securelevel=2
pfctl -t whitelist -T show
pfctl: Operation not permitted.

while:
kern.securelevel=1
pfctl -t whitelist -T show
   192.168.1.7
   192.168.1.20
   192.168.1.25

and more odd, adding -v flag allow list it anyway:

pfctl -t whitelist -v -T show
   192.168.1.7
Cleared: Thu Sep 27 13:47:58 2018
   192.168.1.20
Cleared: Thu Sep 27 13:47:58 2018 

I am bit confused, this is bug or i am missing something ?

_
Zbyszek Żółkiewski



Re: bgpd ROA validation

2018-09-27 Thread Claudio Jeker
On Thu, Sep 27, 2018 at 09:39:36AM +0200, Claudio Jeker wrote:
> On Wed, Sep 26, 2018 at 06:24:36PM +0200, Claudio Jeker wrote:
> > On Tue, Sep 25, 2018 at 12:23:48PM +0200, Claudio Jeker wrote:
> > > On Sat, Sep 22, 2018 at 09:48:24PM +, Job Snijders wrote:
> > > > Hi claudio,
> > > > 
> > > > Seems we are getting very close. Some suggestions to simplify the
> > > > experience for the end user.
> > > > 
> > > > Let's start with supporting just one (unnamed) roa-set, so far I've
> > > > really not come across a use case where multiple ROA tables are useful.
> > > > I say this having implemented origin validation on both ISPs and IXPs.
> > > > 
> > > > The semantics of the Origin Validation procedure that apply to
> > > > "Validated ROA Payloads" are not compatible with how the ARIN WHOIS
> > > > OriginAS semantics, so roa-set will never be used for ARIN WHOIS data.
> > > 
> > > Please explain further, because honestly both the ruleset that
> > > arouteserver generates for ARIN WHOIS OriginAS and ROA are doing
> > > the same. They connect a source-as with a prefix. This is what a
> > > roa-set is giving you. It implements a way to quickly lookup a 2 key
> > > element (AS + prefix). Depending on how this table is used it can be used
> > > for both cases. This is the technical view based on looking at rulesets 
> > > and
> > > thinking on how to write them in a way that is fast and understandable.
> > > I understand that from an administration point of view RPKI ROA is much
> > > stronger and can therefor be used more strictly (e.g. with the simple rule
> > > deny quick from ebgp roa-set RPKI invalid). The generic origin 
> > > validatiation
> > > as a supporting measure to IRR based prefixset filtering is only allowing
> > > additional prefixes based on the roa-set (e.g. match from ebgp \
> > > large-community $INTCOMM_ORIGIN_OK roa-set ARINDB valid \
> > > set large-community $INTCOMM_PREF_OK). They are two different things but
> > > can use the same filter building blocks.
> > > Maybe naming it roa-set is wrong (since it is too much connected with
> > > RPKI) but the lookup table is usable for more than just RPKI. This is why
> > > I think supporting multiple tables is benefitial. I also agree that a
> > > simplified user experience for RPKI is a good thing, it should be simple
> > > to use.
> > > 
> > > > There currently is 1 RPKI, and therefor we should have just 1 roa-set.
> > > 
> > > I would fully agree here if ARIN would make their trust anchor freely
> > > distributable. But yes in general only one RPKI table is needed.
> > > My point is that roa-set is more generic than RPKI. It is not an RPKI only
> > > thing. It can be used for more than that and we should support this as
> > > well since it makes for much better and efficient configs.
> > >  
> > > > The advantage of having only one roa-set is that it does not need to be
> > > > referenced in the policies.
> > > > 
> > > > I propose the patch is amended to accomodate the following:
> > > > 
> > > > roa-set {
> > > > 165.254.255.0/24 source-as 15562
> > > > 193.0.0.0/21 source-as 
> > > > }
> > > > 
> > > > deny from any ovs invalid
> > > > match from any ovs valid set community local-as:42
> > > > match from any ovs unknown set community local-as:43
> > > > 
> > > > I suggest the filter keyword 'ovs' (origin validation state) is
> > > > introduced to allow easy matching. I think this looks better. If the
> > > > roa-set is not defined or empty, all route announcements are 'unknown'.
> > > 
> > > If we want to use ovs then the naming scheme should be the same as for the
> > > extended community. At least if we reuse this keyword it should work the
> > > same. Also the side-effect is that the two configs look fairly similar
> > > which can be good or bad:
> > > match from any ovs valid set community local-as:42 
> > > match from any ext-community ovs valid set community local-as:42 
> > > 
> > > I'm not against this, just wanted to bring it up.
> > > 
> > > Also I think unknown should be renamed not-found. I will do that since
> > > not-found is what the RFC is using.
> > > 
> > > I will rework the diff considering the input.
> > 
> > Here we go. This changes the diff so there is only one roa-set like
> > mentioned above:
> > 
> > roa-set {
> > 165.254.255.0/24 source-as 15562
> > 193.0.0.0/21 source-as 
> > }
> >  
> > deny from any ovs invalid
> > match from any ovs valid set community local-as:42
> > match from any ovs unknown set community local-as:43
> > 
> > Additionally I left the old sets around but used a bit different:
> > 
> > origin-set FOO {
> > ...
> > }
> > 
> > match from any community $ORIGIN_AS_OK origin-set FOO valid \
> > set community $ORIGIN_PREF_OK
> > 
> > I may even consider to drop the 'valid' in the above rule since that is
> > the only check that kind of matt

Re: bgpd ROA validation

2018-09-27 Thread Claudio Jeker
On Wed, Sep 26, 2018 at 06:24:36PM +0200, Claudio Jeker wrote:
> On Tue, Sep 25, 2018 at 12:23:48PM +0200, Claudio Jeker wrote:
> > On Sat, Sep 22, 2018 at 09:48:24PM +, Job Snijders wrote:
> > > Hi claudio,
> > > 
> > > Seems we are getting very close. Some suggestions to simplify the
> > > experience for the end user.
> > > 
> > > Let's start with supporting just one (unnamed) roa-set, so far I've
> > > really not come across a use case where multiple ROA tables are useful.
> > > I say this having implemented origin validation on both ISPs and IXPs.
> > > 
> > > The semantics of the Origin Validation procedure that apply to
> > > "Validated ROA Payloads" are not compatible with how the ARIN WHOIS
> > > OriginAS semantics, so roa-set will never be used for ARIN WHOIS data.
> > 
> > Please explain further, because honestly both the ruleset that
> > arouteserver generates for ARIN WHOIS OriginAS and ROA are doing
> > the same. They connect a source-as with a prefix. This is what a
> > roa-set is giving you. It implements a way to quickly lookup a 2 key
> > element (AS + prefix). Depending on how this table is used it can be used
> > for both cases. This is the technical view based on looking at rulesets and
> > thinking on how to write them in a way that is fast and understandable.
> > I understand that from an administration point of view RPKI ROA is much
> > stronger and can therefor be used more strictly (e.g. with the simple rule
> > deny quick from ebgp roa-set RPKI invalid). The generic origin validatiation
> > as a supporting measure to IRR based prefixset filtering is only allowing
> > additional prefixes based on the roa-set (e.g. match from ebgp \
> > large-community $INTCOMM_ORIGIN_OK roa-set ARINDB valid \
> > set large-community $INTCOMM_PREF_OK). They are two different things but
> > can use the same filter building blocks.
> > Maybe naming it roa-set is wrong (since it is too much connected with
> > RPKI) but the lookup table is usable for more than just RPKI. This is why
> > I think supporting multiple tables is benefitial. I also agree that a
> > simplified user experience for RPKI is a good thing, it should be simple
> > to use.
> > 
> > > There currently is 1 RPKI, and therefor we should have just 1 roa-set.
> > 
> > I would fully agree here if ARIN would make their trust anchor freely
> > distributable. But yes in general only one RPKI table is needed.
> > My point is that roa-set is more generic than RPKI. It is not an RPKI only
> > thing. It can be used for more than that and we should support this as
> > well since it makes for much better and efficient configs.
> >  
> > > The advantage of having only one roa-set is that it does not need to be
> > > referenced in the policies.
> > > 
> > > I propose the patch is amended to accomodate the following:
> > > 
> > >   roa-set {
> > >   165.254.255.0/24 source-as 15562
> > >   193.0.0.0/21 source-as 
> > >   }
> > > 
> > >   deny from any ovs invalid
> > >   match from any ovs valid set community local-as:42
> > >   match from any ovs unknown set community local-as:43
> > > 
> > > I suggest the filter keyword 'ovs' (origin validation state) is
> > > introduced to allow easy matching. I think this looks better. If the
> > > roa-set is not defined or empty, all route announcements are 'unknown'.
> > 
> > If we want to use ovs then the naming scheme should be the same as for the
> > extended community. At least if we reuse this keyword it should work the
> > same. Also the side-effect is that the two configs look fairly similar
> > which can be good or bad:
> > match from any ovs valid set community local-as:42 
> > match from any ext-community ovs valid set community local-as:42 
> > 
> > I'm not against this, just wanted to bring it up.
> > 
> > Also I think unknown should be renamed not-found. I will do that since
> > not-found is what the RFC is using.
> > 
> > I will rework the diff considering the input.
> 
> Here we go. This changes the diff so there is only one roa-set like
> mentioned above:
> 
>   roa-set {
>   165.254.255.0/24 source-as 15562
>   193.0.0.0/21 source-as 
>   }
>  
>   deny from any ovs invalid
>   match from any ovs valid set community local-as:42
>   match from any ovs unknown set community local-as:43
> 
> Additionally I left the old sets around but used a bit different:
> 
>   origin-set FOO {
>   ...
>   }
> 
>   match from any community $ORIGIN_AS_OK origin-set FOO valid \
>   set community $ORIGIN_PREF_OK
> 
> I may even consider to drop the 'valid' in the above rule since that is
> the only check that kind of matters in this case.
> 

New version removing the validity argument from origin-set FOO filter
rules as mentioned above. Apart from that it should be the same.

-- 
:wq Claudio

Index: bgpd.c
===
RCS file: /cvs/src/usr.sbin/bgpd/bgpd.