Re: Linker changes between 5.7 and 5.8

2015-12-04 Thread Stefan Kempf
Tati Chevron wrote:
> This assembled and linked without problems on 5.7-release, but now when
> I try it on 5.8-release, I get an error:
> 
> $ as -o charset.o charset.S
> $ ld -Bstatic charset.o
 
> ld: charset.o: relocation R_X86_64_32S against `a local symbol' can not
> be used when making a shared object; recompile with -fPIC
> charset.o: could not read symbols: Bad value

Try it with ld -Bstatic -nopie charset.o



[trivial] strings.h -> string.h in tcpdump

2015-12-04 Thread Michael McConville
Otherwise it warns about memcpy being implicitly declared.

ok?


Index: print-l2tp.c
===
RCS file: /cvs/src/usr.sbin/tcpdump/print-l2tp.c,v
retrieving revision 1.8
diff -u -p -r1.8 print-l2tp.c
--- print-l2tp.c16 Jan 2015 06:40:21 -  1.8
+++ print-l2tp.c5 Dec 2015 02:20:31 -
@@ -25,7 +25,7 @@
 
 #include 
 #include 
-#include 
+#include 
 
 #include "l2tp.h"
 #include "interface.h"



Re: __progname in base

2015-12-04 Thread Michael McConville
Tobias Stoeckmann wrote:
> Opinions, thoughts?
> 
> > [...]
> > 
> > Index: sbin/newfs/newfs.c
> > ===
> > RCS file: /cvs/src/sbin/newfs/newfs.c,v
> > retrieving revision 1.100
> > diff -u -p -u -p -r1.100 newfs.c
> > --- sbin/newfs/newfs.c  29 Sep 2015 03:19:24 -  1.100
> > +++ sbin/newfs/newfs.c  7 Nov 2015 11:16:26 -
> > @@ -410,7 +410,7 @@ main(int argc, char *argv[])
> > special);
> > }
> > cp = strchr(argv[0], '\0') - 1;
> > -   if (cp == NULL ||
> > +   if (cp == NULL || cp < argv[0] ||
> > ((*cp < 'a' || *cp > ('a' + maxpartitions - 1))
> > && !isdigit((unsigned char)*cp)))
> > fatal("%s: can't figure out file system partition",

Why not just use *argv[0] == '\0' in the condition? Seems clearer to me.

ok mmcc@ for that hunk either way.

It could also be nicer to break that out into its own condition and make
the error message something like "Empty partition name supplied".



Re: libc: getusershell, new implementation

2015-12-04 Thread Philip Guenther
On Fri, Dec 4, 2015 at 2:54 PM, Tobias Stoeckmann  wrote:
> On Fri, Dec 04, 2015 at 03:47:07PM -0700, Theo de Raadt wrote:
>> > Is it worth it, knowing that it's a deprecated BSD-specific function?
>>
>> Careful there :-) It isn't deprecated in the BSD's, and it isn't
>> deprecated in any other system which is still doing maintainance and
>> improvements to it...
>
> $ grep usershell /usr/src/lib/libc/hidden/unistd.h
> PROTO_DEPRECATED(endusershell);
> PROTO_DEPRECATED(getusershell);
> PROTO_DEPRECATED(setusershell);
>
> Formally, it is deprecated.

  That's not what that macro means, sorry.


> Which is why I had to introduce the
> usershell_reset function.  We are not able to call endusershell from
> within setusershell due to these entries, which Mike's code does.

You should read /usr/srclib/libc/include/README and follow the logic
there for each symbol you add *and* possibly adjust existing symbols
per those steps.  For example, if the new code wants to call
endusershell(), then change PROTO_DEPRECATED(endusershell) to
PROTO_NORMAL(endusershell) and add a DEF_WEAK(endusershell) in the .c
file which defines it.

(Why "PROTO_DEPRECATED"?  Because it uses gcc's
__attribute__((deprecated)) to detect attempts to use external symbols
when an internal symbol should be used.  I should have stuck with my
intent to rename the macro, but I was encouraged to leave it as is.
Sigh)


Philip Guenther



Re: Linker changes between 5.7 and 5.8

2015-12-04 Thread Tati Chevron

ld ignores the -fno-pie option and produces the same error.  Presumably
it's because I'm trying to build a static executable.

I've tried passing every option I can think of to ld, but I can't work
it out.  The output from the assembler is fine, can be linked on a machine
running 5.7, and the resulting binary executes on a 5.8 box.

I'm not sure where else to look...

On Fri, Dec 04, 2015 at 03:55:09PM -0700, Theo de Raadt wrote:

Most architectures went to 'static PIE' by default in 5.8.

You will probably need something like -fno-pie.

The entire exercise of writing asm which bootstraps without
the c runtime means you are very much on your own.


This assembled and linked without problems on 5.7-release, but now when
I try it on 5.8-release, I get an error:

$ as -o charset.o charset.S
$ ld -Bstatic charset.o

ld: charset.o: relocation R_X86_64_32S against `a local symbol' can not
be used when making a shared object; recompile with -fPIC
charset.o: could not read symbols: Bad value

What's changed?

charset.S is fairly trivial:

.section ".note", "a"
.p2align 2
.long 0x08
.long 0x04
.long 0x01
.ascii "OpenBSD\0"
.long 0x00
.p2align 2

.section .data
bytestore: .quad 0x20
welcome: .ascii "Characterset generator by Tati Chevron,  27/4/2015."
.byte 0x0A
newline: .quad 0x0A

.section .text
.globl _start

_start:

mov $welcome, %rsi
mov $1, %rdi
mov $4, %rax
mov $53, %rdx
syscall

_loop:

push bytestore
pop %rax
cmp $0xFF, %rax
jz _exit
cmp $0x80, %rax
jnz _skip
call _nl
push bytestore
pop %rax
_skip:

inc %rax
mov %rax, bytestore

mov $4, %rax
mov $1, %rdi
mov $bytestore, %rsi
mov $1, %rdx
syscall

jmp _loop

_exit:
call _nl
mov $1, %rdi
mov $1, %rax
syscall

_nl:
mov $4, %rax
mov $1, %rdi
mov $newline, %rsi
mov $1, %rdx
syscall
ret

--
Tati Chevron
Perl and FORTRAN specialist.
SWABSIT development and migration department.
http://www.swabsit.com





--
Tati Chevron
Perl and FORTRAN specialist.
SWABSIT development and migration department.
http://www.swabsit.com



Re: libc: getusershell, new implementation

2015-12-04 Thread Tobias Stoeckmann
On Fri, Dec 04, 2015 at 03:47:07PM -0700, Theo de Raadt wrote:
> > Is it worth it, knowing that it's a deprecated BSD-specific function?
> 
> Careful there :-) It isn't deprecated in the BSD's, and it isn't
> deprecated in any other system which is still doing maintainance and
> improvements to it...

$ grep usershell /usr/src/lib/libc/hidden/unistd.h
PROTO_DEPRECATED(endusershell);
PROTO_DEPRECATED(getusershell);
PROTO_DEPRECATED(setusershell);

Formally, it is deprecated. Which is why I had to introduce the
usershell_reset function. We are not able to call endusershell from
within setusershell due to these entries, which Mike's code does.



Re: Linker changes between 5.7 and 5.8

2015-12-04 Thread Theo de Raadt
Most architectures went to 'static PIE' by default in 5.8.

You will probably need something like -fno-pie.

The entire exercise of writing asm which bootstraps without
the c runtime means you are very much on your own.

> This assembled and linked without problems on 5.7-release, but now when
> I try it on 5.8-release, I get an error:
> 
> $ as -o charset.o charset.S
> $ ld -Bstatic charset.o
> 
> ld: charset.o: relocation R_X86_64_32S against `a local symbol' can not
> be used when making a shared object; recompile with -fPIC
> charset.o: could not read symbols: Bad value
> 
> What's changed?
> 
> charset.S is fairly trivial:
> 
> .section ".note", "a"
> .p2align 2
> .long 0x08
> .long 0x04
> .long 0x01
> .ascii "OpenBSD\0"
> .long 0x00
> .p2align 2
> 
> .section .data
> bytestore: .quad 0x20
> welcome: .ascii "Characterset generator by Tati Chevron,  27/4/2015."
> .byte 0x0A
> newline: .quad 0x0A
> 
> .section .text
> .globl _start
> 
> _start:
> 
> mov $welcome, %rsi
> mov $1, %rdi
> mov $4, %rax
> mov $53, %rdx
> syscall
> 
> _loop:
> 
> push bytestore
> pop %rax
> cmp $0xFF, %rax
> jz _exit
> cmp $0x80, %rax
> jnz _skip
> call _nl
> push bytestore
> pop %rax
> _skip:
> 
> inc %rax
> mov %rax, bytestore
> 
> mov $4, %rax
> mov $1, %rdi
> mov $bytestore, %rsi
> mov $1, %rdx
> syscall
> 
> jmp _loop
> 
> _exit:
> call _nl
> mov $1, %rdi
> mov $1, %rax
> syscall
> 
> _nl:
> mov $4, %rax
> mov $1, %rdi
> mov $newline, %rsi
> mov $1, %rdx
> syscall
> ret
> 
> -- 
> Tati Chevron
> Perl and FORTRAN specialist.
> SWABSIT development and migration department.
> http://www.swabsit.com
> 



Re: libc: getusershell, new implementation

2015-12-04 Thread Theo de Raadt
> That is not true with the new code, and should be avoided IMHO. If this
> implementation encounters a line that does not start with a slash '/',
> it will be discarded. Maybe we should adjust the manual page to be more
> precise?

I'm happy to see the new strictness.  Maybe tigthen the manual page a
tiny bit.  Maybe another manual page contains more precise wording
about starting with / ?

> Is it worth it, knowing that it's a deprecated BSD-specific function?

Careful there :-) It isn't deprecated in the BSD's, and it isn't
deprecated in any other system which is still doing maintainance and
improvements to it...



Re: __progname in base

2015-12-04 Thread Tobias Stoeckmann
Opinions, thoughts?

Joerg Sonnenberger mentioned that getprogname() could be used.

On Sat, Nov 07, 2015 at 12:20:42PM +0100, Tobias Stoeckmann wrote:
> Based on Todd's patch for at and cron, I did a grep through our base
> tree to see if there are more occurrences of self-made __progname
> handling.
> 
> Here's the patch that fixes these cases.
> 
> In newfs and newfs_ext2fs, I prevent an out-of-boundary access in case
> someone calls them with argv[0] set to an empty string. And gomoku
> get a little style while at it.
> 
> 
> Index: bin/mt/mt.c
> ===
> RCS file: /cvs/src/bin/mt/mt.c,v
> retrieving revision 1.36
> diff -u -p -u -p -r1.36 mt.c
> --- bin/mt/mt.c   12 Nov 2013 04:36:02 -  1.36
> +++ bin/mt/mt.c   7 Nov 2015 11:16:25 -
> @@ -88,6 +88,8 @@ int _rmtmtioctop(int fd, struct mtop *c
>  struct mtget *_rmtstatus(int fd);
>  void _rmtclose(void);
>  
> +extern char  *__progname;
> +
>  char *host = NULL;   /* remote host (if any) */
>  
>  int
> @@ -133,7 +135,6 @@ _rmtclose(void)
>  #endif
>  }
>  
> -char *progname;
>  int  eject = 0;
>  
>  int
> @@ -145,12 +146,7 @@ main(int argc, char *argv[])
>   char *p, *tape, *realtape, *opts;
>   size_t len;
>  
> - if ((progname = strrchr(argv[0], '/')))
> - progname++;
> - else
> - progname = argv[0];
> -
> - if (strcmp(progname, "eject") == 0) {
> + if (strcmp(__progname, "eject") == 0) {
>   opts = "t";
>   eject = 1;
>   tape = NULL;
> @@ -320,9 +316,9 @@ void
>  usage(void)
>  {
>   if (eject)
> - (void)fprintf(stderr, "usage: %s [-t] device\n", progname);
> + (void)fprintf(stderr, "usage: %s [-t] device\n", __progname);
>   else
>   (void)fprintf(stderr,
> - "usage: %s [-f device] command [count]\n", progname);
> + "usage: %s [-f device] command [count]\n", __progname);
>   exit(X_USAGE);
>  }
> Index: bin/pax/options.c
> ===
> RCS file: /cvs/src/bin/pax/options.c,v
> retrieving revision 1.91
> diff -u -p -u -p -r1.91 options.c
> --- bin/pax/options.c 18 May 2015 20:26:16 -  1.91
> +++ bin/pax/options.c 7 Nov 2015 11:16:25 -
> @@ -184,14 +184,12 @@ char *chdname = NULL;
>  void
>  options(int argc, char **argv)
>  {
> + extern char *__progname;
>  
>   /*
>* Are we acting like pax, tar or cpio (based on argv[0])
>*/
> - if ((argv0 = strrchr(argv[0], '/')) != NULL)
> - argv0++;
> - else
> - argv0 = argv[0];
> + argv0 = __progname;
>  
>   if (strcmp(NM_TAR, argv0) == 0) {
>   tar_options(argc, argv);
> Index: games/gomoku/main.c
> ===
> RCS file: /cvs/src/games/gomoku/main.c,v
> retrieving revision 1.28
> diff -u -p -u -p -r1.28 main.c
> --- games/gomoku/main.c   4 Nov 2015 21:22:10 -   1.28
> +++ games/gomoku/main.c   7 Nov 2015 11:16:26 -
> @@ -45,10 +45,11 @@
>  #define PROGRAM  1   /* get input from program */
>  #define INPUTF   2   /* get input from a file */
>  
> +extern char *__progname;
> +
>  int  interactive = 1;/* true if interactive */
>  int  debug;  /* true if debugging */
>  int  test;   /* both moves come from 1: input, 2: computer */
> -char *prog;  /* name of program */
>  FILE *debugfp;   /* file for debug output */
>  FILE *inputfp;   /* file for debug input */
>  
> @@ -67,9 +68,7 @@ char*plyr[2];   /* who's who */
>  static char you[LOGIN_NAME_MAX]; /* username */
>  
>  int
> -main(argc, argv)
> - int argc;
> - char **argv;
> +main(int argc, char *argv[])
>  {
>   char buf[128];
>   char fname[PATH_MAX];
> @@ -81,12 +80,6 @@ main(argc, argv)
>   };
>   char *tmpname;
>  
> - prog = strrchr(argv[0], '/');
> - if (prog)
> - prog++;
> - else
> - prog = argv[0];
> -
>   if ((tmpname = getlogin()) != NULL)
>   strlcpy(you, tmpname, sizeof(you));
>   else
> @@ -114,7 +107,7 @@ main(argc, argv)
>   default:
>   fprintf(stderr,
>   "usage: %s [-bcdu] [-D debugfile] [inputfile]\n",
> - prog);
> + __progname);
>   exit(1);
>   }
>   }
> @@ -191,8 +184,8 @@ again:
>   }
>   }
>   if (interactive) {
> - plyr[BLACK] = input[BLACK] == USER ? you : prog;
> - plyr[WHITE] = input[WHITE] == USER ? you : prog;
> + plyr[BLACK] = input[BLACK] == USER ? you : __progname;
> + plyr[WHITE] = input[WHITE] == USER ? you : __progname;
>  

Linker changes between 5.7 and 5.8

2015-12-04 Thread Tati Chevron

This assembled and linked without problems on 5.7-release, but now when
I try it on 5.8-release, I get an error:

$ as -o charset.o charset.S
$ ld -Bstatic charset.o

ld: charset.o: relocation R_X86_64_32S against `a local symbol' can not
be used when making a shared object; recompile with -fPIC
charset.o: could not read symbols: Bad value

What's changed?

charset.S is fairly trivial:

.section ".note", "a"
.p2align 2
.long 0x08
.long 0x04
.long 0x01
.ascii "OpenBSD\0"
.long 0x00
.p2align 2

.section .data
bytestore: .quad 0x20
welcome: .ascii "Characterset generator by Tati Chevron,  27/4/2015."
.byte 0x0A
newline: .quad 0x0A

.section .text
.globl _start

_start:

mov $welcome, %rsi
mov $1, %rdi
mov $4, %rax
mov $53, %rdx
syscall

_loop:

push bytestore
pop %rax
cmp $0xFF, %rax
jz _exit
cmp $0x80, %rax
jnz _skip
call _nl
push bytestore
pop %rax
_skip:

inc %rax
mov %rax, bytestore

mov $4, %rax
mov $1, %rdi
mov $bytestore, %rsi
mov $1, %rdx
syscall

jmp _loop

_exit:
call _nl
mov $1, %rdi
mov $1, %rax
syscall

_nl:
mov $4, %rax
mov $1, %rdi
mov $newline, %rsi
mov $1, %rdx
syscall
ret

--
Tati Chevron
Perl and FORTRAN specialist.
SWABSIT development and migration department.
http://www.swabsit.com



libc: getusershell, new implementation

2015-12-04 Thread Tobias Stoeckmann
There's still a possible overflow in getusershell.c. We could increase
the buffer allocation yet again, but I have to agree with the glibc
developers here: enough is enough. The code is ugly and has proven to be
difficult to review.

The overflow has been spotted by Paul Pluzhnikov, after I submitted
a bug report of our getusershell fixes to glibc, as seen here:
https://sourceware.org/bugzilla/show_bug.cgi?id=18660

It led to a discussion of the glibc developers. Eventually, Mike Frysinger
came up with a new implementation, which is not in glibc yet. I like it
a lot, and Mike was kind enough to allow an MIT-licensed version for us.
If in doubt, Mike is in CC. It's basically his implementation, plus KNF
and using strcspn for shorter code.

The new code differs from old behavior, but it should be sufficient. We
support /etc/shells as seen in our system and as it is described in
shells(5). But the wording of shells(5) is a bit odd. It makes me believe
that relative paths would be okay, because:

 For each shell a single line should be present, consisting of
 the shell's path, relative to root.

That is not true with the new code, and should be avoided IMHO. If this
implementation encounters a line that does not start with a slash '/',
it will be discarded. Maybe we should adjust the manual page to be more
precise?

If you want to test this code, chpass is the easiest way. Change your
login shell to something that is not in /etc/shells and you should get
a warning. If it's in /etc/shells, no warning shall be seen.

Is it worth it, knowing that it's a deprecated BSD-specific function?
Or maybe just increase buffer allocation (hopefully just) one more time?

Tobias


Index: lib/libc/gen/getusershell.c
===
RCS file: /cvs/src/lib/libc/gen/getusershell.c,v
retrieving revision 1.16
diff -u -p -u -p -r1.16 getusershell.c
--- gen/getusershell.c  14 Sep 2015 16:09:13 -  1.16
+++ gen/getusershell.c  4 Dec 2015 21:39:01 -
@@ -1,131 +1,106 @@
-/* $OpenBSD: getusershell.c,v 1.16 2015/09/14 16:09:13 tedu Exp $ */
+/* $OpenBSD$ */
 /*
- * Copyright (c) 1985, 1993
- * The Regents of the University of California.  All rights reserved.
+ * Copyright (c) 2015 Mike Frysinger 
  *
- * Redistribution and use in source and binary forms, with or without
- * modification, are permitted provided that the following conditions
- * are met:
- * 1. Redistributions of source code must retain the above copyright
- *notice, this list of conditions and the following disclaimer.
- * 2. Redistributions in binary form must reproduce the above copyright
- *notice, this list of conditions and the following disclaimer in the
- *documentation and/or other materials provided with the distribution.
- * 3. Neither the name of the University nor the names of its contributors
- *may be used to endorse or promote products derived from this software
- *without specific prior written permission.
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
  *
- * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
- * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
- * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
- * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
- * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
- * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
- * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
- * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
- * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
- * SUCH DAMAGE.
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
  */
 
-#include 
-
-#include 
-#include 
+#include 
 #include 
-#include 
 #include 
 #include 
+#include 
 #include 
 
-/*
- * Local shells should NOT be added here.  They should be added in
- * /etc/shells.
- */
+/* These functions are not thread safe (by design), so globals are OK. */
+static FILE*shellfp;
+static int  okshell_idx;
+static char*shellbuf;
+static size_t   buf_len;
 
-static char *okshells[] = { _PATH_BSHELL, _PATH_CSHELL, _PATH_KS

Re: libc: locale/rune.c input validation

2015-12-04 Thread Tobias Stoeckmann
Hi Ingo,

On Fri, Dec 04, 2015 at 12:27:39PM +0100, Ingo Schwarze wrote:
> uint32_t should be preferred because that's the POSIX type, while
> u_int32_t is not standardized.  If you are working on the file
> anyway, i'd recommend to unify all uses to uint32_t.

done.

> __LP64__ that is overly specific and not standardized.  Something
> like this maybe:
> 
>   #if SIZE_MAX <= UINT32_MAX

That sounds nice and gives a better clue why that #if is in place.
I've applied it.

> > I still cast frl_variable_len to u_int32_t for ntohl, but will cast it
> > back to int32_t for a "less than 0" check. This should properly handle
> > error cases when it's been negative to begin with.
> 
> True.  Slightly ugly, but i don't see a better way either.

Looking at UINT32_MAX usage now, I just test for var_len > INT32_MAX.


Tobias

Index: locale/rune.c
===
RCS file: /cvs/src/lib/libc/locale/rune.c,v
retrieving revision 1.4
diff -u -p -u -p -r1.4 rune.c
--- locale/rune.c   25 May 2014 17:47:04 -  1.4
+++ locale/rune.c   4 Dec 2015 20:48:52 -
@@ -59,23 +59,31 @@
  * SUCH DAMAGE.
  */
 
+#include 
+#include 
 #include 
+#include 
+#include 
 #include 
-#include 
 #include 
-#include 
+#include 
 #include 
-#include 
-#include 
 #include "rune.h"
 #include "rune_local.h"
 
-static int readrange(_RuneLocale *, _RuneRange *, _FileRuneRange *, void *, 
FILE *);
+#define SAFE_ADD(x, y) \
+do {   \
+   if ((x) > SIZE_MAX - (y))   \
+   return NULL;\
+   (x) += (y); \
+} while (0);
+
+static int readrange(_RuneLocale *, _RuneRange *, uint32_t, void *, FILE *);
 static void _freeentry(_RuneRange *);
 static void _wctype_init(_RuneLocale *rl);
 
 static int
-readrange(_RuneLocale *rl, _RuneRange *rr, _FileRuneRange *frr, void *lastp,
+readrange(_RuneLocale *rl, _RuneRange *rr, uint32_t nranges, void *lastp,
FILE *fp)
 {
uint32_t i;
@@ -84,7 +92,7 @@ readrange(_RuneLocale *rl, _RuneRange *r
 
re = (_RuneEntry *)rl->rl_variable;
 
-   rr->rr_nranges = ntohl(frr->frr_nranges);
+   rr->rr_nranges = nranges;
if (rr->rr_nranges == 0) {
rr->rr_rune_ranges = NULL;
return 0;
@@ -92,16 +100,16 @@ readrange(_RuneLocale *rl, _RuneRange *r
 
rr->rr_rune_ranges = re;
for (i = 0; i < rr->rr_nranges; i++) {
+   if ((void *)re >= lastp)
+   return -1;
+
if (fread(&fre, sizeof(fre), 1, fp) != 1)
return -1;
 
-   re->re_min = ntohl((u_int32_t)fre.fre_min);
-   re->re_max = ntohl((u_int32_t)fre.fre_max);
-   re->re_map = ntohl((u_int32_t)fre.fre_map);
+   re->re_min = ntohl((uint32_t)fre.fre_min);
+   re->re_max = ntohl((uint32_t)fre.fre_max);
+   re->re_map = ntohl((uint32_t)fre.fre_map);
re++;
-
-   if ((void *)re > lastp)
-   return -1;
}
rl->rl_variable = re;
return 0;
@@ -121,6 +129,9 @@ readentry(_RuneRange *rr, FILE *fp)
continue;
}
 
+   if (re[i].re_max < re[i].re_min)
+   goto fail;
+
l = re[i].re_max - re[i].re_min + 1;
re[i].re_rune_types = calloc(l, sizeof(_RuneType));
if (!re[i].re_rune_types) {
@@ -151,17 +162,20 @@ fail2:
 }
 
 /* XXX: temporary implementation */
-static void
+static int
 find_codeset(_RuneLocale *rl)
 {
char *top, *codeset, *tail, *ep;
 
+   if (rl->rl_variable == NULL)
+   return 0;
+
/* end of rl_variable region */
ep = (char *)rl->rl_variable;
ep += rl->rl_variable_len;
rl->rl_codeset = NULL;
if (!(top = strstr(rl->rl_variable, _RUNE_CODESET)))
-   return;
+   return 0;
tail = strpbrk(top, " \t");
codeset = top + sizeof(_RUNE_CODESET) - 1;
if (tail) {
@@ -173,6 +187,7 @@ find_codeset(_RuneLocale *rl)
*top = '\0';
rl->rl_codeset = strdup(codeset);
}
+   return (rl->rl_codeset == NULL);
 }
 
 void
@@ -183,8 +198,7 @@ _freeentry(_RuneRange *rr)
 
re = rr->rr_rune_ranges;
for (i = 0; i < rr->rr_nranges; i++) {
-   if (re[i].re_rune_types)
-   free(re[i].re_rune_types);
+   free(re[i].re_rune_types);
re[i].re_rune_types = NULL;
}
 }
@@ -209,6 +223,7 @@ _Read_RuneMagi(FILE *fp)
_RuneLocale *rl;
struct stat sb;
int x;
+   uint32_t runetype_nranges, maplower_nranges, mapupper_nranges, var_len;
 
if (fstat(fileno(fp), &sb) < 0)
return NULL;
@@ -225,10 +240,26 @@ _Read_RuneMagi(FILE *fp)
if (memcmp(frl.frl_magic, _RUNE_MAGIC_1, sizeof

Re: npppd: simplify and lock down priv_open()

2015-12-04 Thread YASUOKA Masahiko

On Fri, 4 Dec 2015 09:47:46 -0800
Philip Guenther  wrote:
> On Fri, Dec 4, 2015 at 3:41 AM, YASUOKA Masahiko  wrote:
>> On Sat, 10 Oct 2015 21:32:29 -0700
>> Philip Guenther  wrote:
>>> So:
>>>  * kill the 'mode' argument to PRIVSEP_OPEN and priv_open()
>>>  * fail a PRIVSEP_OPEN call if it's passed any flags other than
>>>O_ACCMODE or O_NONBLOCK
>>>  * paranoia: mask O_CREAT when calling open() with only two arguments
>>>  * instead of using ioctl(FIONBIO) after the fact, pass O_NONBLOCK to
>>>priv_open()
>>
>> As for "open with O_NONBLOCK and ioctl(FIONBIO)", see the diff
>> following for example,
>>
>>> Index: usr.sbin/npppd/npppd/npppd_iface.c
>>> ===
>>> RCS file: /cvs/src/usr.sbin/npppd/npppd/npppd_iface.c,v
>>> retrieving revision 1.11
>>> diff -u -p -r1.11 npppd_iface.c
>>> --- usr.sbin/npppd/npppd/npppd_iface.c20 Jul 2015 18:58:30 -
>>>   1.11
>>> +++ usr.sbin/npppd/npppd/npppd_iface.c11 Oct 2015 04:17:40 -
>>> @@ -283,15 +283,8 @@ npppd_iface_start(npppd_iface *_this)
>>>
>>>   /* open device file */
>>>   snprintf(buf, sizeof(buf), "/dev/%s", _this->ifname);
>>> - if ((_this->devf = priv_open(buf, O_RDWR, 0600)) < 0) {
>>> + if ((_this->devf = priv_open(buf, O_RDWR | O_NONBLOCK)) < 0) {
>>>   npppd_iface_log(_this, LOG_ERR, "open(%s) failed: %m", buf);
>>> - goto fail;
>>> - }
>>> -
>>> - x = 1;
>>> - if (ioctl(_this->devf, FIONBIO, &x) != 0) {
>>> - npppd_iface_log(_this, LOG_ERR,
>>> - "ioctl(FIONBIO) failed in %s(): %m", __func__);
>>>   goto fail;
>>>   }
>>
>> At least on tun, bpf or pppx they behave differently.  Since
>> open(,O_NONBLOCK) doesn't call ioctl(FIONBIO) internally and those
>> device have/use their own flag.
>>
>> How should we fix this?  Revert ioctl(FIONBIO), change the open(2)
>> behavior or the devices behavior?
> 
> We should fix open(2); please try the diff below.  Are you sure pppx
> is affected?  pppxioctl()'s FIONBIO case appears to be a no-op.  I
> certainly agree that bpf and tun are affected.

You are right, sorry for my miunderstanding.  Actually pppx is not
affected.  npppd with tun (+ pipex.enable=0) was affected.

The diff fixes the problem.

Thanks,

--yasuoka



Re: npppd: simplify and lock down priv_open()

2015-12-04 Thread Todd C. Miller
On Fri, 04 Dec 2015 09:47:46 -0800, Philip Guenther wrote:

> We should fix open(2); please try the diff below.  Are you sure pppx
> is affected?  pppxioctl()'s FIONBIO case appears to be a no-op.  I
> certainly agree that bpf and tun are affected.

Shouldn't the device open function check for FNONBLOCK and handle
it as needed?

 - todd



Re: npppd: simplify and lock down priv_open()

2015-12-04 Thread Philip Guenther
On Fri, Dec 4, 2015 at 3:41 AM, YASUOKA Masahiko  wrote:
> On Sat, 10 Oct 2015 21:32:29 -0700
> Philip Guenther  wrote:
>> So:
>>  * kill the 'mode' argument to PRIVSEP_OPEN and priv_open()
>>  * fail a PRIVSEP_OPEN call if it's passed any flags other than
>>O_ACCMODE or O_NONBLOCK
>>  * paranoia: mask O_CREAT when calling open() with only two arguments
>>  * instead of using ioctl(FIONBIO) after the fact, pass O_NONBLOCK to
>>priv_open()
>
> As for "open with O_NONBLOCK and ioctl(FIONBIO)", see the diff
> following for example,
>
>> Index: usr.sbin/npppd/npppd/npppd_iface.c
>> ===
>> RCS file: /cvs/src/usr.sbin/npppd/npppd/npppd_iface.c,v
>> retrieving revision 1.11
>> diff -u -p -r1.11 npppd_iface.c
>> --- usr.sbin/npppd/npppd/npppd_iface.c20 Jul 2015 18:58:30 - 
>>  1.11
>> +++ usr.sbin/npppd/npppd/npppd_iface.c11 Oct 2015 04:17:40 -
>> @@ -283,15 +283,8 @@ npppd_iface_start(npppd_iface *_this)
>>
>>   /* open device file */
>>   snprintf(buf, sizeof(buf), "/dev/%s", _this->ifname);
>> - if ((_this->devf = priv_open(buf, O_RDWR, 0600)) < 0) {
>> + if ((_this->devf = priv_open(buf, O_RDWR | O_NONBLOCK)) < 0) {
>>   npppd_iface_log(_this, LOG_ERR, "open(%s) failed: %m", buf);
>> - goto fail;
>> - }
>> -
>> - x = 1;
>> - if (ioctl(_this->devf, FIONBIO, &x) != 0) {
>> - npppd_iface_log(_this, LOG_ERR,
>> - "ioctl(FIONBIO) failed in %s(): %m", __func__);
>>   goto fail;
>>   }
>
> At least on tun, bpf or pppx they behave differently.  Since
> open(,O_NONBLOCK) doesn't call ioctl(FIONBIO) internally and those
> device have/use their own flag.
>
> How should we fix this?  Revert ioctl(FIONBIO), change the open(2)
> behavior or the devices behavior?

We should fix open(2); please try the diff below.  Are you sure pppx
is affected?  pppxioctl()'s FIONBIO case appears to be a no-op.  I
certainly agree that bpf and tun are affected.

Philip Guenther


--- kern/vfs_syscalls.c 20 Nov 2015 07:11:52 -  1.245
+++ kern/vfs_syscalls.c 4 Dec 2015 17:46:34 -
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -951,6 +952,8 @@ doopenat(struct proc *p, int fd, const c
}
}
VOP_UNLOCK(vp, 0, p);
+   if (flags & O_NONBLOCK)
+   (*fp->f_ops->fo_ioctl)(fp, FIONBIO, (caddr_t)&flags, p);
*retval = indx;
FILE_SET_MATURE(fp, p);
 out:



Re: Make ix(4) mpsafe: take 2

2015-12-04 Thread Hrvoje Popovski
On 4.12.2015. 12:47, Mark Kettenis wrote:
> Here is a new diff to make ix(4) mpsafe.  Should now longer get stuck
> in the OACTIVE state.  Tests more than welcome.
> 


Hi,

i have tested this patch with 82599 and x540 while sending 6Mpps for cca
3 hours and ifconfig down/up and everything is working fine. Will test
it more and if I find something will send mail.

Thank you.



Do not grab the KERNEL_LOCK() in rtalloc(9)

2015-12-04 Thread Martin Pieuchot
To reduce contention in the fast path we can use a mutex(9) to serialize
read/write operations on the radix tree instead of the KERNEL_LOCK.

Note that the KERNEL_LOCK is still needed to serialize walk/delete/insert
because a call to rtable_delete() can be made inside rtable_walk(), and
we cannot use a mutex for such recursive problem.

I'd be interested to know if this helps MPLS fowarding.

Index: net/rtable.c
===
RCS file: /cvs/src/sys/net/rtable.c,v
retrieving revision 1.33
diff -u -p -r1.33 rtable.c
--- net/rtable.c4 Dec 2015 13:42:48 -   1.33
+++ net/rtable.c4 Dec 2015 13:45:53 -
@@ -292,6 +292,10 @@ rtable_l2set(unsigned int rtableid, unsi
 }
 
 #ifndef ART
+#include 
+
+struct mutex rn_mtx = MUTEX_INITIALIZER(IPL_NONE);
+
 void
 rtable_init_backend(unsigned int keylen)
 {
@@ -319,15 +323,16 @@ rtable_lookup(unsigned int rtableid, str
 {
struct radix_node_head  *rnh;
struct radix_node   *rn;
-   struct rtentry  *rt;
+   struct rtentry  *rt = NULL;
 
rnh = rtable_get(rtableid, dst->sa_family);
if (rnh == NULL)
return (NULL);
 
+   mtx_enter(&rn_mtx);
rn = rn_lookup(dst, mask, rnh);
if (rn == NULL || (rn->rn_flags & RNF_ROOT) != 0)
-   return (NULL);
+   goto out;
 
rt = ((struct rtentry *)rn);
 
@@ -335,11 +340,13 @@ rtable_lookup(unsigned int rtableid, str
if (rnh->rnh_multipath) {
rt = rt_mpath_matchgate(rt, gateway, prio);
if (rt == NULL)
-   return (NULL);
+   goto out;
}
 #endif /* !SMALL_KERNEL */
 
rtref(rt);
+out:
+   mtx_leave(&rn_mtx);
return (rt);
 }
 
@@ -354,7 +361,7 @@ rtable_match(unsigned int rtableid, stru
if (rnh == NULL)
return (NULL);
 
-   KERNEL_LOCK();
+   mtx_enter(&rn_mtx);
rn = rn_match(dst, rnh);
if (rn == NULL || (rn->rn_flags & RNF_ROOT) != 0)
goto out;
@@ -366,7 +373,7 @@ rtable_match(unsigned int rtableid, stru
rt = rtable_mpath_select(rt, src);
 #endif /* SMALL_KERNEL */
 out:
-   KERNEL_UNLOCK();
+   mtx_leave(&rn_mtx);
return (rt);
 }
 
@@ -377,29 +384,37 @@ rtable_insert(unsigned int rtableid, str
 {
struct radix_node_head  *rnh;
struct radix_node   *rn = (struct radix_node *)rt;
+   int  error = 0;
+
+   KERNEL_ASSERT_LOCKED();
 
rnh = rtable_get(rtableid, dst->sa_family);
if (rnh == NULL)
return (EAFNOSUPPORT);
 
+   mtx_enter(&rn_mtx);
 #ifndef SMALL_KERNEL
if (rnh->rnh_multipath) {
/* Do not permit exactly the same dst/mask/gw pair. */
if (rt_mpath_conflict(rnh, dst, mask, gateway, prio,
ISSET(rt->rt_flags, RTF_MPATH))) {
-   return (EEXIST);
+   error = EEXIST;
+   goto out;
}
}
 #endif /* SMALL_KERNEL */
 
rn = rn_addroute(dst, mask, rnh, rn, prio);
-   if (rn == NULL)
-   return (ESRCH);
+   if (rn == NULL) {
+   error = ESRCH;
+   goto out;
+   }
 
rt = ((struct rtentry *)rn);
rtref(rt);
-
-   return (0);
+out:
+   mtx_leave(&rn_mtx);
+   return (error);
 }
 
 int
@@ -408,22 +423,31 @@ rtable_delete(unsigned int rtableid, str
 {
struct radix_node_head  *rnh;
struct radix_node   *rn = (struct radix_node *)rt;
+   int  error = 0;
+
+   KERNEL_ASSERT_LOCKED();
 
rnh = rtable_get(rtableid, dst->sa_family);
if (rnh == NULL)
return (EAFNOSUPPORT);
 
+   mtx_enter(&rn_mtx);
rn = rn_delete(dst, mask, rnh, rn);
-   if (rn == NULL)
-   return (ESRCH);
+   if (rn == NULL) {
+   error = ESRCH;
+   goto out;
+   }
 
+#ifdef DIAGNOSTIC
if (rn->rn_flags & (RNF_ACTIVE | RNF_ROOT))
panic("active node flags=%x", rn->rn_flags);
+#endif /* DIAGNOSTIC */
 
rt = ((struct rtentry *)rn);
rtfree(rt);
-
-   return (0);
+out:
+   mtx_leave(&rn_mtx);
+   return (error);
 }
 
 int
@@ -432,6 +456,8 @@ rtable_walk(unsigned int rtableid, sa_fa
 {
struct radix_node_head  *rnh;
int (*f)(struct radix_node *, void *, unsigned int) = (void *)func;
+
+   KERNEL_ASSERT_LOCKED();
 
rnh = rtable_get(rtableid, af);
if (rnh == NULL)



Re: rc.d/rebound: check for rebound.conf

2015-12-04 Thread Antoine Jacoutot
On Fri, Dec 04, 2015 at 01:41:55PM +0100, Christian Weisgerber wrote:
> Require /etc/rebound.conf to exist before starting rebound.
> 
> Otherwise, if the configuration file doesn't exist, rebound will
> error out, but only after daemonizing, so the rc system won't notice
> the failure.
> 
> ok?

I did the same for httpd but was told this is not the way we should do things 
and that the daemon should return a proper error code.

> Index: rebound
> ===
> RCS file: /cvs/src/etc/rc.d/rebound,v
> retrieving revision 1.1
> diff -u -p -r1.1 rebound
> --- rebound   30 Nov 2015 23:35:30 -  1.1
> +++ rebound   4 Dec 2015 12:37:04 -
> @@ -6,4 +6,8 @@ daemon="/usr/sbin/rebound"
>  
>  . /etc/rc.d/rc.subr
>  
> +rc_pre() {
> + [ -s /etc/rebound.conf ]
> +}
> +
>  rc_cmd $1
> -- 
> Christian "naddy" Weisgerber  na...@mips.inka.de
> 

-- 
Antoine



Re: UTF-8 support for wc(1)

2015-12-04 Thread Ingo Schwarze
Hi Todd,

Todd C. Miller wrote on Thu, Dec 03, 2015 at 11:40:55AM -0700:
> On Sun, 29 Nov 2015 17:45:55 +0100, Ingo Schwarze wrote:

>> our wc(1) utility currently violates POSIX in two ways:
>> 
>>  1. The -m option counts bytes instead of characters.
>> The patch given below fixes that.
>> 
>>  2. Word counting with -w only treats ASCII whitespace as word
>> boundaries and regards two words joined by non-ASCII whitespace
>> as one single word.
>> 
>> The second issue is not related to UTF-8, but a matter of full
>> Unicode support.  It would not be hard to fix that by using
>> mbtowc(3) and iswblank(3) instead of mblen(3).  However, i don't
>> think we want to pollute our base system tools with functions
>> requiring full Unicode support, not even to the extent available
>> in our own C library.  So i consider iswblank(3) taboo for now.

> I'm a little surprised by this.  It doesn't seem like it would be
> any more complicated to use mbtowc(3) and iswblank(3) for the
> multibyte case.

Reconsidering, your argument makes sense to me.  Even if we implement
a simplified lookup table in the future, it doesn't complicate matters.
We already include data for iswprint(3) and wcwidth(3); iswspace(3)
is not more expensive and probably about as often needed.

So let's include iswblank(3) and iswspace(3) into the list of
function that we are willing to use.  Of course, that still doesn't
mean that we can do full Unicode support (think of collations etc.).

So, here is a patch for wc(1) getting both character and word
counting right.  I also improved the manual in various respects.

OK?
  Ingo


Index: wc.1
===
RCS file: /cvs/src/usr.bin/wc/wc.1,v
retrieving revision 1.25
diff -u -p -r1.25 wc.1
--- wc.121 Apr 2015 10:46:48 -  1.25
+++ wc.14 Dec 2015 12:54:26 -
@@ -72,9 +72,10 @@ using powers of 2 for sizes (K=1024, M=1
 The number of lines in each input file
 is written to the standard output.
 .It Fl m
-Intended to count characters instead of bytes;
-currently an alias for
-.Fl c .
+Count characters instead of bytes, and use
+.Xr iswspace 3
+instead of
+.Xr isspace 3 .
 .It Fl w
 The number of words in each input file
 is written to the standard output.
@@ -102,6 +103,20 @@ lines   words  bytes   file_name
 The counts for lines, words, and bytes
 .Pq or characters
 are integers separated by spaces.
+.Sh ENVIRONMENT
+.Bl -tag -width LC_CTYPE
+.It Ev LC_CTYPE
+The character set
+.Xr locale 1 ,
+defining which byte sequences form characters.
+If unset or set to
+.Qq C ,
+.Qq POSIX ,
+or an unsupported value,
+.Fl m
+has the same effect as
+.Fl c .
+.El
 .Sh EXIT STATUS
 .Ex -std wc
 .Sh SEE ALSO
@@ -111,7 +126,7 @@ The
 .Nm
 utility is compliant with the
 .St -p1003.1-2008
-specification, except that it ignores the locale.
+specification.
 .Pp
 The flag
 .Op Fl h
@@ -121,7 +136,3 @@ A
 .Nm
 utility appeared in
 .At v1 .
-.Sh BUGS
-The
-.Fl m
-option counts bytes instead of characters.
Index: wc.c
===
RCS file: /cvs/src/usr.bin/wc/wc.c,v
retrieving revision 1.19
diff -u -p -r1.19 wc.c
--- wc.c9 Oct 2015 01:37:09 -   1.19
+++ wc.c4 Dec 2015 12:54:26 -
@@ -40,9 +40,11 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 int64_ttlinect, twordct, tcharct;
-intdoline, doword, dochar, humanchar;
+intdoline, doword, dochar, humanchar, multibyte;
 intrval;
 extern char *__progname;
 
@@ -55,7 +57,7 @@ main(int argc, char *argv[])
 {
int ch;
 
-   setlocale(LC_ALL, "");
+   setlocale(LC_CTYPE, "");
 
if (pledge("stdio rpath", NULL) == -1)
err(1, "pledge");
@@ -68,8 +70,11 @@ main(int argc, char *argv[])
case 'w':
doword = 1;
break;
-   case 'c':
case 'm':
+   if (MB_CUR_MAX > 1)
+   multibyte = 1;
+   /* FALLTHROUGH */
+   case 'c':
dochar = 1;
break;
case 'h':
@@ -112,15 +117,20 @@ main(int argc, char *argv[])
 void
 cnt(char *file)
 {
-   u_char *C;
+   static char *buf;
+   static ssize_t bufsz;
+
+   FILE *stream;
+   char *C;
+   wchar_t wc;
short gotsp;
-   int len;
+   ssize_t len;
int64_t linect, wordct, charct;
struct stat sbuf;
int fd;
-   u_char buf[MAXBSIZE];
 
linect = wordct = charct = 0;
+   stream = NULL;
if (file) {
if ((fd = open(file, O_RDONLY, 0)) < 0) {
warn("%s", file);
@@ -131,7 +141,10 @@ cnt(char *file)
fd = STDIN_FILENO;
}
 
-   if (!doword) {
+   if (!doword && !multibyte) {
+   if (bufsz < MAXBSIZE &&
+   (buf = realloc(b

rc.d/rebound: check for rebound.conf

2015-12-04 Thread Christian Weisgerber
Require /etc/rebound.conf to exist before starting rebound.

Otherwise, if the configuration file doesn't exist, rebound will
error out, but only after daemonizing, so the rc system won't notice
the failure.

ok?

Index: rebound
===
RCS file: /cvs/src/etc/rc.d/rebound,v
retrieving revision 1.1
diff -u -p -r1.1 rebound
--- rebound 30 Nov 2015 23:35:30 -  1.1
+++ rebound 4 Dec 2015 12:37:04 -
@@ -6,4 +6,8 @@ daemon="/usr/sbin/rebound"
 
 . /etc/rc.d/rc.subr
 
+rc_pre() {
+   [ -s /etc/rebound.conf ]
+}
+
 rc_cmd $1
-- 
Christian "naddy" Weisgerber  na...@mips.inka.de



Re: introducing ip_send()/ip6_send() to OpenBSD kernel

2015-12-04 Thread Alexandr Nedvedicky
On Thu, Dec 03, 2015 at 11:00:09PM +0100, Thierry Deval wrote:
> Hello Sasha,
> 
> You kept the static prototypes for ip_send_dispatch and 
> ip6_send_dispatch.
> 
> You'd better avoid mixing static and non-static declarations for the same 
> functions. ;-)
> 
bluhm sitting next to me spot it too. I've fixed it in follow up
commit and forgot to send email.

regards
sasha



Make ix(4) mpsafe: take 2

2015-12-04 Thread Mark Kettenis
Here is a new diff to make ix(4) mpsafe.  Should now longer get stuck
in the OACTIVE state.  Tests more than welcome.


Index: if_ix.c
===
RCS file: /cvs/src/sys/dev/pci/if_ix.c,v
retrieving revision 1.129
diff -u -p -r1.129 if_ix.c
--- if_ix.c 25 Nov 2015 03:09:59 -  1.129
+++ if_ix.c 3 Dec 2015 16:50:26 -
@@ -210,8 +210,6 @@ ixgbe_attach(struct device *parent, stru
sc->osdep.os_sc = sc;
sc->osdep.os_pa = *pa;
 
-   mtx_init(&sc->rx_mtx, IPL_NET);
-
/* Set up the timer callout */
timeout_set(&sc->timer, ixgbe_local_timer, sc);
timeout_set(&sc->rx_refill, ixgbe_rxrefill, sc);
@@ -385,6 +383,16 @@ ixgbe_start(struct ifnet * ifp)
if (ixgbe_encap(txr, m_head)) {
ifq_deq_rollback(&ifp->if_snd, m_head);
ifq_set_oactive(&ifp->if_snd);
+   /*
+*  Make sure there are still packets on the
+*  ring.  The interrupt handler may have
+*  cleaned up the ring before we were able to
+*  set the IF_OACTIVE flag.
+*/
+   if (txr->tx_avail == sc->num_tx_desc) {
+   ifq_clr_oactive(&ifp->if_snd);
+   continue;
+   }
break;
}
 
@@ -527,10 +535,7 @@ ixgbe_watchdog(struct ifnet * ifp)
 
/*
 * The timer is set to 5 every time ixgbe_start() queues a packet.
-* Then ixgbe_txeof() keeps resetting to 5 as long as it cleans at
-* least one descriptor.
-* Finally, anytime all descriptors are clean the timer is
-* set to 0.
+* Anytime all descriptors are clean the timer is set to 0.
 */
for (i = 0; i < sc->num_queues; i++, txr++) {
if (txr->watchdog_timer == 0 || --txr->watchdog_timer)
@@ -862,7 +867,7 @@ ixgbe_intr(void *arg)
struct tx_ring  *txr = sc->tx_rings;
struct ixgbe_hw *hw = &sc->hw;
uint32_t reg_eicr;
-   int  i, refill = 0, was_active = 0;
+   int  i, refill = 0;
 
reg_eicr = IXGBE_READ_REG(&sc->hw, IXGBE_EICR);
if (reg_eicr == 0) {
@@ -872,11 +877,8 @@ ixgbe_intr(void *arg)
 
if (ISSET(ifp->if_flags, IFF_RUNNING)) {
ixgbe_rxeof(que);
-   refill = 1;
-
-   if (ifq_is_oactive(&ifp->if_snd))
-   was_active = 1;
ixgbe_txeof(txr);
+   refill = 1;
}
 
if (refill) {
@@ -892,6 +894,8 @@ ixgbe_intr(void *arg)
if (reg_eicr & IXGBE_EICR_LSC) {
KERNEL_LOCK();
ixgbe_update_link_status(sc);
+   if (!IFQ_IS_EMPTY(&ifp->if_snd))
+   ixgbe_start(ifp);
KERNEL_UNLOCK();
}
 
@@ -913,12 +917,6 @@ ixgbe_intr(void *arg)
}
}
 
-   if (was_active) {
-   KERNEL_LOCK();
-   ixgbe_start(ifp);
-   KERNEL_UNLOCK();
-   }
-
/* Check for fan failure */
if ((hw->device_id == IXGBE_DEV_ID_82598AT) &&
(reg_eicr & IXGBE_EICR_GPI_SDP1)) {
@@ -1061,17 +1059,9 @@ ixgbe_encap(struct tx_ring *txr, struct 
cmd_type_len |= IXGBE_ADVTXD_DCMD_VLE;
 #endif
 
-   /*
-* Force a cleanup if number of TX descriptors
-* available is below the threshold. If it fails
-* to get above, then abort transmit.
-*/
-   if (txr->tx_avail <= IXGBE_TX_CLEANUP_THRESHOLD) {
-   ixgbe_txeof(txr);
-   /* Make sure things have improved */
-   if (txr->tx_avail <= IXGBE_TX_OP_THRESHOLD)
-   return (ENOBUFS);
-   }
+   /* Check that we have least the minimal number of TX descriptors. */
+   if (txr->tx_avail <= IXGBE_TX_OP_THRESHOLD)
+   return (ENOBUFS);
 
/*
 * Important to capture the first descriptor
@@ -1135,8 +1125,6 @@ ixgbe_encap(struct tx_ring *txr, struct 
 
txd->read.cmd_type_len |=
htole32(IXGBE_TXD_CMD_EOP | IXGBE_TXD_CMD_RS);
-   txr->tx_avail -= map->dm_nsegs;
-   txr->next_avail_desc = i;
 
txbuf->m_head = m_head;
/*
@@ -1154,6 +1142,11 @@ ixgbe_encap(struct tx_ring *txr, struct 
txbuf = &txr->tx_buffers[first];
txbuf->eop_index = last;
 
+   membar_producer();
+
+   atomic_sub_int(&txr->tx_avail, map->dm_nsegs);
+   txr->next_avail_desc = i;
+
++txr->tx_packets;
return (0);
 
@@ -1322,6 +1315,10 @@ ixgbe_stop(void *arg)
/* reprogram the RAR[0] in case user changed it. */
ixgbe_set_rar(&sc->hw, 0, sc->hw.mac.addr, 0, IXGBE_RAH_AV);
 
+   intr_barrier(sc->tag);
+
+   KASSERT((ifp->if_flags & IFF_RU

Re: npppd: simplify and lock down priv_open()

2015-12-04 Thread YASUOKA Masahiko
Hi,

On Sat, 10 Oct 2015 21:32:29 -0700
Philip Guenther  wrote:
> So:
>  * kill the 'mode' argument to PRIVSEP_OPEN and priv_open()
>  * fail a PRIVSEP_OPEN call if it's passed any flags other than 
>O_ACCMODE or O_NONBLOCK
>  * paranoia: mask O_CREAT when calling open() with only two arguments
>  * instead of using ioctl(FIONBIO) after the fact, pass O_NONBLOCK to 
>priv_open()

As for "open with O_NONBLOCK and ioctl(FIONBIO)", see the diff
following for example,

> Index: usr.sbin/npppd/npppd/npppd_iface.c
> ===
> RCS file: /cvs/src/usr.sbin/npppd/npppd/npppd_iface.c,v
> retrieving revision 1.11
> diff -u -p -r1.11 npppd_iface.c
> --- usr.sbin/npppd/npppd/npppd_iface.c20 Jul 2015 18:58:30 -  
> 1.11
> +++ usr.sbin/npppd/npppd/npppd_iface.c11 Oct 2015 04:17:40 -
> @@ -283,15 +283,8 @@ npppd_iface_start(npppd_iface *_this)
>  
>   /* open device file */
>   snprintf(buf, sizeof(buf), "/dev/%s", _this->ifname);
> - if ((_this->devf = priv_open(buf, O_RDWR, 0600)) < 0) {
> + if ((_this->devf = priv_open(buf, O_RDWR | O_NONBLOCK)) < 0) {
>   npppd_iface_log(_this, LOG_ERR, "open(%s) failed: %m", buf);
> - goto fail;
> - }
> -
> - x = 1;
> - if (ioctl(_this->devf, FIONBIO, &x) != 0) {
> - npppd_iface_log(_this, LOG_ERR,
> - "ioctl(FIONBIO) failed in %s(): %m", __func__);
>   goto fail;
>   }
>  

At least on tun, bpf or pppx they behave differently.  Since
open(,O_NONBLOCK) doesn't call ioctl(FIONBIO) internally and those
device have/use their own flag.

How should we fix this?  Revert ioctl(FIONBIO), change the open(2)
behavior or the devices behavior?

--yasuoka



Re: libc: locale/rune.c input validation

2015-12-04 Thread Ingo Schwarze
Hi Tobias,

Tobias Stoeckmann wrote on Thu, Dec 03, 2015 at 11:08:28PM +0100:
> Ingo Schwarze wrote:

>> I would suggest to use uint32_t.

> Just while applying this, I noticed that the file has a mix of the
> types u_int32_t and uint32_t. I took u_int32_t for now because it was
> in there more often than uint32_t. What do you think?

uint32_t should be preferred because that's the POSIX type, while
u_int32_t is not standardized.  If you are working on the file
anyway, i'd recommend to unify all uses to uint32_t.

>> Tobias Stoeckmann wrote:
>>> +#ifndef __LP64__
>>> +   if (runetype_nranges > SIZE_MAX / sizeof(_RuneEntry) ||
>>> +   maplower_nranges > SIZE_MAX / sizeof(_RuneEntry) ||
>>> +   mapupper_nranges > SIZE_MAX / sizeof(_RuneEntry))
>>> +   return NULL;
>>> +#endif

>> I dislike using #ifdef, it seems error-prone to me.  The checks
>> are correct in any case.  Sure, if size_t is much larger than uint32_t,
>> these checks can never trigger.  But why do micro-optimization here?
>> Why not just delete the #ifndef/#endif?

> If I skip this __LP64__ check, it triggers a gcc warning on amd64,
> because the compiler tells me that the test is useless. So it's a
> cosmetical thing. It's removed for now.

Oh.  Frankly, i'm unsure which style is best then.  Can someone
with more experience chime in to explain whether it's better
practice to ignore or avoid that warning?

If people think a preprocessor test makes sense, i'd prefer to test
for the actual property in question rather than for a macro like
__LP64__ that is overly specific and not standardized.  Something
like this maybe:

  #if SIZE_MAX <= UINT32_MAX

> I still cast frl_variable_len to u_int32_t for ntohl, but will cast it
> back to int32_t for a "less than 0" check. This should properly handle
> error cases when it's been negative to begin with.

True.  Slightly ugly, but i don't see a better way either.

Either way, this is OK schwarze@.

Thanks again,
  Ingo


> Index: lib/libc/locale/rune.c
> ===
> RCS file: /cvs/src/lib/libc/locale/rune.c,v
> retrieving revision 1.4
> diff -u -p -u -p -r1.4 rune.c
> --- locale/rune.c 25 May 2014 17:47:04 -  1.4
> +++ locale/rune.c 3 Dec 2015 21:55:54 -
> @@ -59,32 +59,40 @@
>   * SUCH DAMAGE.
>   */
>  
> +#include 
> +#include 
>  #include 
> +#include 
> +#include 
>  #include 
> -#include 
>  #include 
> -#include 
> +#include 
>  #include 
> -#include 
> -#include 
>  #include "rune.h"
>  #include "rune_local.h"
>  
> -static int readrange(_RuneLocale *, _RuneRange *, _FileRuneRange *, void *, 
> FILE *);
> +#define SAFE_ADD(x, y)   \
> +do { \
> + if ((x) > SIZE_MAX - (y))   \
> + return NULL;\
> + (x) += (y); \
> +} while (0);
> +
> +static int readrange(_RuneLocale *, _RuneRange *, u_int32_t, void *, FILE *);
>  static void _freeentry(_RuneRange *);
>  static void _wctype_init(_RuneLocale *rl);
>  
>  static int
> -readrange(_RuneLocale *rl, _RuneRange *rr, _FileRuneRange *frr, void *lastp,
> +readrange(_RuneLocale *rl, _RuneRange *rr, u_int32_t nranges, void *lastp,
>   FILE *fp)
>  {
> - uint32_t i;
> + u_int32_t i;
>   _RuneEntry *re;
>   _FileRuneEntry fre;
>  
>   re = (_RuneEntry *)rl->rl_variable;
>  
> - rr->rr_nranges = ntohl(frr->frr_nranges);
> + rr->rr_nranges = nranges;
>   if (rr->rr_nranges == 0) {
>   rr->rr_rune_ranges = NULL;
>   return 0;
> @@ -92,6 +100,9 @@ readrange(_RuneLocale *rl, _RuneRange *r
>  
>   rr->rr_rune_ranges = re;
>   for (i = 0; i < rr->rr_nranges; i++) {
> + if ((void *)re >= lastp)
> + return -1;
> +
>   if (fread(&fre, sizeof(fre), 1, fp) != 1)
>   return -1;
>  
> @@ -99,9 +110,6 @@ readrange(_RuneLocale *rl, _RuneRange *r
>   re->re_max = ntohl((u_int32_t)fre.fre_max);
>   re->re_map = ntohl((u_int32_t)fre.fre_map);
>   re++;
> -
> - if ((void *)re > lastp)
> - return -1;
>   }
>   rl->rl_variable = re;
>   return 0;
> @@ -121,6 +129,9 @@ readentry(_RuneRange *rr, FILE *fp)
>   continue;
>   }
>  
> + if (re[i].re_max < re[i].re_min)
> + goto fail;
> +
>   l = re[i].re_max - re[i].re_min + 1;
>   re[i].re_rune_types = calloc(l, sizeof(_RuneType));
>   if (!re[i].re_rune_types) {
> @@ -151,17 +162,20 @@ fail2:
>  }
>  
>  /* XXX: temporary implementation */
> -static void
> +static int
>  find_codeset(_RuneLocale *rl)
>  {
>   char *top, *codeset, *tail, *ep;
>  
> + if (rl->rl_variable == NULL)
> + return 0;
> +
>   /* end of rl_variable region */
>   ep = (char *)rl->rl_variable;
>   ep += rl->rl_variable_len;
> 

Re: introducing ip_send()/ip6_send() to OpenBSD kernel

2015-12-04 Thread Martin Pieuchot
On 04/12/15(Fri) 20:45, David Gwynne wrote:
> > On 4 Dec 2015, at 7:36 PM, Martin Pieuchot  wrote:
> > On 04/12/15(Fri) 12:38, David Gwynne wrote:
> >>> On 4 Dec 2015, at 06:44, Alexandr Nedvedicky 
> >>>  wrote:
> >>  [...]
> >> it might be better to add the task to systq instead of &softnettq. if 
> >> you're always going to take the kernel lock to send packets then any 
> >> pending ip_input_process work will end implicitly waiting for the kernel 
> >> lock too. ideally only mpsafe work should be pushed into the softnettq.
> > 
> > I insisted to keep most of the packets processing in the same context.
> > 
> > I'd rather see efforts to remove those locks and go towards multiples
> > softnettq.
> > 
> > It this stage I'm more afraid of two ip_output() running in parallel
> > than the possible latency due to reporting an error.
> 
> the context you want is KERNEL_LOCKed at splsoftnet(). you're insisting on 
> forcing the softnettq to wait to assume that context and blocking work that 
> doesnt need it.

I don't understand.

With fast network cards the network stack is currently bound to the
amount of processing a CPU can do holding the kernel lock.

As soon as you remove the KERNEL_LOCK from the forwarding path you
remove it from this task too.

So where's the problem?



Re: bridge_output() and KERNEL_LOCK

2015-12-04 Thread Mark Kettenis
> Date: Fri, 4 Dec 2015 11:48:57 +0100
> From: Martin Pieuchot 
> 
> Now that if_start() will take the KERNEL_LOCK itself if the driver is
> not marked as IFXF_MPSAFE if_enqueue() is almost mpsafe. 
> 
> The missing piece is addressed by the diff below.  We want to ensure
> bridge_output() is called with the KERNEL_LOCK() held.
> 
> ok?

Absolutely.

> Index: net/if.c
> ===
> RCS file: /cvs/src/sys/net/if.c,v
> retrieving revision 1.420
> diff -u -p -r1.420 if.c
> --- net/if.c  3 Dec 2015 16:27:32 -   1.420
> +++ net/if.c  4 Dec 2015 10:28:30 -
> @@ -626,8 +626,12 @@ if_enqueue(struct ifnet *ifp, struct mbu
>   unsigned short mflags;
>  
>  #if NBRIDGE > 0
> - if (ifp->if_bridgeport && (m->m_flags & M_PROTO1) == 0)
> - return (bridge_output(ifp, m, NULL, NULL));
> + if (ifp->if_bridgeport && (m->m_flags & M_PROTO1) == 0) {
> + KERNEL_LOCK();
> + error = bridge_output(ifp, m, NULL, NULL);
> + KERNEL_UNLOCK();
> + return (error);
> + }
>  #endif
>  
>   length = m->m_pkthdr.len;
> 
> 



Re: bridge_output() and KERNEL_LOCK

2015-12-04 Thread David Gwynne
ok

> On 4 Dec 2015, at 8:48 PM, Martin Pieuchot  wrote:
> 
> Now that if_start() will take the KERNEL_LOCK itself if the driver is
> not marked as IFXF_MPSAFE if_enqueue() is almost mpsafe. 
> 
> The missing piece is addressed by the diff below.  We want to ensure
> bridge_output() is called with the KERNEL_LOCK() held.
> 
> ok?
> 
> Index: net/if.c
> ===
> RCS file: /cvs/src/sys/net/if.c,v
> retrieving revision 1.420
> diff -u -p -r1.420 if.c
> --- net/if.c  3 Dec 2015 16:27:32 -   1.420
> +++ net/if.c  4 Dec 2015 10:28:30 -
> @@ -626,8 +626,12 @@ if_enqueue(struct ifnet *ifp, struct mbu
>   unsigned short mflags;
> 
> #if NBRIDGE > 0
> - if (ifp->if_bridgeport && (m->m_flags & M_PROTO1) == 0)
> - return (bridge_output(ifp, m, NULL, NULL));
> + if (ifp->if_bridgeport && (m->m_flags & M_PROTO1) == 0) {
> + KERNEL_LOCK();
> + error = bridge_output(ifp, m, NULL, NULL);
> + KERNEL_UNLOCK();
> + return (error);
> + }
> #endif
> 
>   length = m->m_pkthdr.len;
> 



ARP input path without KERNEL_LOCK

2015-12-04 Thread Martin Pieuchot
Now that in_arpinput() only uses the routing table, if_get()/if_put()
and carp_iamatch being already mpsafe we can kill the ARP input queue.

This moves the ARP input path processing from the softnet interrupt
context (under KERNEL_LOCK) to the sofnettask (without KERNEL_LOCK).

ok?

Index: net/if_ethersubr.c
===
RCS file: /cvs/src/sys/net/if_ethersubr.c,v
retrieving revision 1.231
diff -u -p -r1.231 if_ethersubr.c
--- net/if_ethersubr.c  2 Dec 2015 08:47:00 -   1.231
+++ net/if_ethersubr.c  4 Dec 2015 10:42:14 -
@@ -372,14 +372,14 @@ decapsulate:
case ETHERTYPE_ARP:
if (ifp->if_flags & IFF_NOARP)
goto dropanyway;
-   inq = &arpintrq;
-   break;
+   arpinput(m);
+   return (1);
 
case ETHERTYPE_REVARP:
if (ifp->if_flags & IFF_NOARP)
goto dropanyway;
-   inq = &rarpintrq;
-   break;
+   revarpinput(m);
+   return (1);
 
 #ifdef INET6
/*
Index: net/netisr.c
===
RCS file: /cvs/src/sys/net/netisr.c,v
retrieving revision 1.8
diff -u -p -r1.8 netisr.c
--- net/netisr.c3 Dec 2015 12:22:51 -   1.8
+++ net/netisr.c4 Dec 2015 10:43:35 -
@@ -20,7 +20,6 @@
 
 #include 
 
-#include "ether.h"
 #include "ppp.h"
 #include "bridge.h"
 #include "pppoe.h"
@@ -39,10 +38,6 @@ netintr(void *unused) /* ARGSUSED */
while ((n = netisr) != 0) {
atomic_clearbits_int(&netisr, n);
 
-#if NETHER > 0
-   if (n & (1 << NETISR_ARP))
-   arpintr();
-#endif
if (n & (1 << NETISR_IP))
ipintr();
 #ifdef INET6
Index: net/netisr.h
===
RCS file: /cvs/src/sys/net/netisr.h,v
retrieving revision 1.43
diff -u -p -r1.43 netisr.h
--- net/netisr.h3 Dec 2015 12:27:33 -   1.43
+++ net/netisr.h4 Dec 2015 10:44:01 -
@@ -53,7 +53,6 @@
 #defineNETISR_IP   2   /* same as AF_INET */
 #defineNETISR_TX   3   /* for if_snd processing */
 #defineNETISR_PFSYNC   5   /* for pfsync "immediate" tx */
-#defineNETISR_ARP  18  /* same as AF_LINK */
 #defineNETISR_IPV6 24  /* same as AF_INET6 */
 #defineNETISR_ISDN 26  /* same as AF_E164 */
 #defineNETISR_PPP  28  /* for PPP processing */
@@ -64,7 +63,6 @@
 #ifdef _KERNEL
 extern int netisr; /* scheduling bits for network */
 
-void   arpintr(void);
 void   ipintr(void);
 void   ip6intr(void);
 void   pppintr(void);
Index: netinet/if_ether.c
===
RCS file: /cvs/src/sys/netinet/if_ether.c,v
retrieving revision 1.197
diff -u -p -r1.197 if_ether.c
--- netinet/if_ether.c  2 Dec 2015 22:02:18 -   1.197
+++ netinet/if_ether.c  4 Dec 2015 10:44:17 -
@@ -88,14 +88,10 @@ void arptfree(struct rtentry *);
 void arptimer(void *);
 struct rtentry *arplookup(u_int32_t, int, int, u_int);
 void in_arpinput(struct mbuf *);
-void revarpinput(struct mbuf *);
 void in_revarpinput(struct mbuf *);
 
 LIST_HEAD(, llinfo_arp) arp_list;
 struct pool arp_pool;  /* pool for llinfo_arp structures */
-/* XXX hate magic numbers */
-struct niqueue arpintrq = NIQUEUE_INITIALIZER(50, NETISR_ARP);
-struct niqueue rarpintrq = NIQUEUE_INITIALIZER(50, NETISR_ARP);
 intarp_inuse, arp_allocated;
 intarp_maxtries = 5;
 intarpinit_done;
@@ -426,43 +422,40 @@ bad:
  * then the protocol-specific routine is called.
  */
 void
-arpintr(void)
+arpinput(struct mbuf *m)
 {
-   struct mbuf *m;
struct arphdr *ar;
int len;
 
-   while ((m = niq_dequeue(&arpintrq)) != NULL) {
 #ifdef DIAGNOSTIC
-   if ((m->m_flags & M_PKTHDR) == 0)
-   panic("arpintr");
+   if ((m->m_flags & M_PKTHDR) == 0)
+   panic("%s", __func__);
 #endif
 
-   len = sizeof(struct arphdr);
-   if (m->m_len < len && (m = m_pullup(m, len)) == NULL)
-   continue;
-
-   ar = mtod(m, struct arphdr *);
-   if (ntohs(ar->ar_hrd) != ARPHRD_ETHER) {
-   m_freem(m);
-   continue;
-   }
+   len = sizeof(struct arphdr);
+   if (m->m_len < len && (m = m_pullup(m, len)) == NULL)
+   return;
 
-   len += 2 * (ar->ar_hln + ar->ar_pln);
-   if (m->m_len < len && (m = m_pullup(m, len)) == NULL)
-   continue;
-
-   switch (ntohs(ar->ar_pro)) {
-   case ETHERTYPE_IP:
-   case ETHERTYPE_IPTRAILERS:
- 

Re: introducing ip_send()/ip6_send() to OpenBSD kernel

2015-12-04 Thread David Gwynne

> On 4 Dec 2015, at 7:36 PM, Martin Pieuchot  wrote:
> 
> On 04/12/15(Fri) 12:38, David Gwynne wrote:
>> 
>>> On 4 Dec 2015, at 06:44, Alexandr Nedvedicky 
>>>  wrote:
>>> 
>>> Hello,
>>> 
>>> below is final patch I'm going to commit. Summary of changes:
>>> - softnettq declaration moved to net/if_var.h (by bluhm@)
>>> - lock order swapped: KERNEL_LOCK() goes first folllowed
>>>   by spl (by bluhm@)
>>> - long line got fixed (by bluhm@)
>>> - ip_insertoptions() prototype deleted in ip_output.c (by bluhm@)
>>> - avoiding mix of tab/space at netinet6/ip6_var.h (by bluhm@)
>>> - static at ip*_send_dispatch() got removed (by mpi@)
>>> - ip*_send_dispatch() functions look more like if_input_process()
>>>   now (by mpi@)
>> 
>> it might be better to add the task to systq instead of &softnettq. if you're 
>> always going to take the kernel lock to send packets then any pending 
>> ip_input_process work will end implicitly waiting for the kernel lock too. 
>> ideally only mpsafe work should be pushed into the softnettq.
> 
> I insisted to keep most of the packets processing in the same context.
> 
> I'd rather see efforts to remove those locks and go towards multiples
> softnettq.
> 
> It this stage I'm more afraid of two ip_output() running in parallel
> than the possible latency due to reporting an error.

the context you want is KERNEL_LOCKed at splsoftnet(). you're insisting on 
forcing the softnettq to wait to assume that context and blocking work that 
doesnt need it.


bridge_output() and KERNEL_LOCK

2015-12-04 Thread Martin Pieuchot
Now that if_start() will take the KERNEL_LOCK itself if the driver is
not marked as IFXF_MPSAFE if_enqueue() is almost mpsafe. 

The missing piece is addressed by the diff below.  We want to ensure
bridge_output() is called with the KERNEL_LOCK() held.

ok?

Index: net/if.c
===
RCS file: /cvs/src/sys/net/if.c,v
retrieving revision 1.420
diff -u -p -r1.420 if.c
--- net/if.c3 Dec 2015 16:27:32 -   1.420
+++ net/if.c4 Dec 2015 10:28:30 -
@@ -626,8 +626,12 @@ if_enqueue(struct ifnet *ifp, struct mbu
unsigned short mflags;
 
 #if NBRIDGE > 0
-   if (ifp->if_bridgeport && (m->m_flags & M_PROTO1) == 0)
-   return (bridge_output(ifp, m, NULL, NULL));
+   if (ifp->if_bridgeport && (m->m_flags & M_PROTO1) == 0) {
+   KERNEL_LOCK();
+   error = bridge_output(ifp, m, NULL, NULL);
+   KERNEL_UNLOCK();
+   return (error);
+   }
 #endif
 
length = m->m_pkthdr.len;



Re: relayd patch - delayed failover

2015-12-04 Thread Brian S. Vangsgaard


i believe i committed the correct one, i just replied to the wrong mail 
here

on the list. Here is what i put in:

http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.sbin/relayd/pfe.c.diff?r1=1.82&r2=1.83&sortby=date

/Benno



Correct, thank you.



Sebastian Benoit skrev den 2015-12-03 17:43:
>thanks, commited
>
>Brian S. Vangsgaard(b...@avalanic.dk) on 2015.10.01 13:27:12 +0200:
>>Hi,
>>
>>Problem:
>>If a client have a state entry in the relayd anchor, and the target
>>server goes down, the client will be unable to "failover" for 10 sec +
>>(10 sec - elapsed time since last SLA check).
>>
>>There are two issues here, this patch only fix the problem about
>>delayed
>>(10 seconds) failover.
>>
>>When the host fails the SLA check, it will be marked as being down.
>>However it will not be removed from the achor before the next SLA
>>check.
>>
>>Reproduce:
>>Start relayd with -dvvv, let it run for 10-20 seconds, then make a
>>host
>>fail its SLA check. Relayd will mark the host as being down when it
>>reach next SLA check, but the sync_table() will not be called until 10
>>sec. later (at the next SLA check).
>>
>>Solution:
>>The logic is already in the code, but right now it only handle the
>>statistics and set the host as being down.
>>
>>Call sync_table() when a host goes from UP to DOWN.
>>
>>
>>Index: pfe.c
>>===
>>RCS file: /cvs/src/usr.sbin/relayd/pfe.c,v
>>retrieving revision 1.79.2.1
>>diff -u -p -u -p -r1.79.2.1 pfe.c
>>--- pfe.c   20 Sep 2015 11:20:16 -  1.79.2.1
>>+++ pfe.c   1 Oct 2015 10:48:59 -
>>@@ -152,6 +152,7 @@ pfe_dispatch_hce(int fd, struct privsep_
>>table->conf.flags |= F_CHANGED;
>>host->flags |= F_DEL;
>>host->flags &= ~(F_ADD);
>>+   pfe_sync();
>>}
>>
>>host->up = st.up;
>>
>>
>>If you need more details or want to fix the scheduler issue, please
>>contact me :)
>>
>>
>>--
>>bsv
>>





Re: introducing ip_send()/ip6_send() to OpenBSD kernel

2015-12-04 Thread Martin Pieuchot
On 04/12/15(Fri) 12:38, David Gwynne wrote:
> 
> > On 4 Dec 2015, at 06:44, Alexandr Nedvedicky 
> >  wrote:
> > 
> > Hello,
> > 
> > below is final patch I'm going to commit. Summary of changes:
> > - softnettq declaration moved to net/if_var.h (by bluhm@)
> > - lock order swapped: KERNEL_LOCK() goes first folllowed
> >   by spl (by bluhm@)
> > - long line got fixed (by bluhm@)
> > - ip_insertoptions() prototype deleted in ip_output.c (by bluhm@)
> > - avoiding mix of tab/space at netinet6/ip6_var.h (by bluhm@)
> > - static at ip*_send_dispatch() got removed (by mpi@)
> > - ip*_send_dispatch() functions look more like if_input_process()
> >   now (by mpi@)
> 
> it might be better to add the task to systq instead of &softnettq. if you're 
> always going to take the kernel lock to send packets then any pending 
> ip_input_process work will end implicitly waiting for the kernel lock too. 
> ideally only mpsafe work should be pushed into the softnettq.

I insisted to keep most of the packets processing in the same context.

I'd rather see efforts to remove those locks and go towards multiples
softnettq.

It this stage I'm more afraid of two ip_output() running in parallel
than the possible latency due to reporting an error.



Re: relayd patch - delayed failover

2015-12-04 Thread Sebastian Benoit
Brian S. Vangsgaard(b...@avalanic.dk) on 2015.12.04 09:04:19 +0100:
> Hi Sebastian
> 
> You commited the wrong patch.
> 
> Please see http://marc.info/?l=openbsd-tech&m=144378086813524&w=2
> 
> The patch below, results in a relayd panic if more than one host is
> available in the group.

i believe i committed the correct one, i just replied to the wrong mail here
on the list. Here is what i put in:

http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.sbin/relayd/pfe.c.diff?r1=1.82&r2=1.83&sortby=date

/Benno

> Sebastian Benoit skrev den 2015-12-03 17:43:
> >thanks, commited
> >
> >Brian S. Vangsgaard(b...@avalanic.dk) on 2015.10.01 13:27:12 +0200:
> >>Hi,
> >>
> >>Problem:
> >>If a client have a state entry in the relayd anchor, and the target
> >>server goes down, the client will be unable to "failover" for 10 sec +
> >>(10 sec - elapsed time since last SLA check).
> >>
> >>There are two issues here, this patch only fix the problem about 
> >>delayed
> >>(10 seconds) failover.
> >>
> >>When the host fails the SLA check, it will be marked as being down.
> >>However it will not be removed from the achor before the next SLA 
> >>check.
> >>
> >>Reproduce:
> >>Start relayd with -dvvv, let it run for 10-20 seconds, then make a 
> >>host
> >>fail its SLA check. Relayd will mark the host as being down when it
> >>reach next SLA check, but the sync_table() will not be called until 10
> >>sec. later (at the next SLA check).
> >>
> >>Solution:
> >>The logic is already in the code, but right now it only handle the
> >>statistics and set the host as being down.
> >>
> >>Call sync_table() when a host goes from UP to DOWN.
> >>
> >>
> >>Index: pfe.c
> >>===
> >>RCS file: /cvs/src/usr.sbin/relayd/pfe.c,v
> >>retrieving revision 1.79.2.1
> >>diff -u -p -u -p -r1.79.2.1 pfe.c
> >>--- pfe.c   20 Sep 2015 11:20:16 -  1.79.2.1
> >>+++ pfe.c   1 Oct 2015 10:48:59 -
> >>@@ -152,6 +152,7 @@ pfe_dispatch_hce(int fd, struct privsep_
> >>table->conf.flags |= F_CHANGED;
> >>host->flags |= F_DEL;
> >>host->flags &= ~(F_ADD);
> >>+   pfe_sync();
> >>}
> >>
> >>host->up = st.up;
> >>
> >>
> >>If you need more details or want to fix the scheduler issue, please
> >>contact me :)
> >>
> >>
> >>--
> >>bsv
> >>
> 

-- 



Re: relayd patch - delayed failover

2015-12-04 Thread Brian S. Vangsgaard

Hi Sebastian

You commited the wrong patch.

Please see http://marc.info/?l=openbsd-tech&m=144378086813524&w=2

The patch below, results in a relayd panic if more than one host is
available in the group.

Sebastian Benoit skrev den 2015-12-03 17:43:

thanks, commited

Brian S. Vangsgaard(b...@avalanic.dk) on 2015.10.01 13:27:12 +0200:

Hi,

Problem:
If a client have a state entry in the relayd anchor, and the target
server goes down, the client will be unable to "failover" for 10 sec +
(10 sec - elapsed time since last SLA check).

There are two issues here, this patch only fix the problem about 
delayed

(10 seconds) failover.

When the host fails the SLA check, it will be marked as being down.
However it will not be removed from the achor before the next SLA 
check.


Reproduce:
Start relayd with -dvvv, let it run for 10-20 seconds, then make a 
host

fail its SLA check. Relayd will mark the host as being down when it
reach next SLA check, but the sync_table() will not be called until 10
sec. later (at the next SLA check).

Solution:
The logic is already in the code, but right now it only handle the
statistics and set the host as being down.

Call sync_table() when a host goes from UP to DOWN.


Index: pfe.c
===
RCS file: /cvs/src/usr.sbin/relayd/pfe.c,v
retrieving revision 1.79.2.1
diff -u -p -u -p -r1.79.2.1 pfe.c
--- pfe.c   20 Sep 2015 11:20:16 -  1.79.2.1
+++ pfe.c   1 Oct 2015 10:48:59 -
@@ -152,6 +152,7 @@ pfe_dispatch_hce(int fd, struct privsep_
table->conf.flags |= F_CHANGED;
host->flags |= F_DEL;
host->flags &= ~(F_ADD);
+   pfe_sync();
}

host->up = st.up;


If you need more details or want to fix the scheduler issue, please
contact me :)


--
bsv