Re: FIFO fd not marked readable after EOF

2014-10-06 Thread Matthew Dempsky
On Mon, Oct 6, 2014 at 5:08 AM, Dimitris Papastamos s...@2f30.org wrote:
 I am using select(2) on a FIFO fd and monitoring
 for readability.  select(2) doesn't return after the writer
 exits.

 The same piece of code marks the fd as readable on Linux.
 Not sure which behaviour is correct though.

The Linux behavior is correct:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/pselect.html:
A descriptor shall be considered ready for reading when a call to an
input function with O_NONBLOCK clear would not block, whether or not
the function would transfer data successfully. (The function might
return data, an end-of-file indication, or an error other than one
indicating that it is blocked, and in each of these cases the
descriptor shall be considered ready for reading.)

http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html:
When attempting to read from an empty pipe or FIFO: [...] If no
process has the pipe open for writing, read() shall return 0 to
indicate end-of-file.



[patch] dhclient: support for RFC 3442 Local Subnet Routes

2014-09-30 Thread Matthew Dempsky
From RFC 3442:

   Local Subnet Routes

   In some cases more than one IP subnet may be configured on a link.
   In such cases, a host whose IP address is in one IP subnet in the
   link could communicate directly with a host whose IP address is in a
   different IP subnet on the same link.  In cases where a client is
   being assigned an IP address on an IP subnet on such a link, for each
   IP subnet in the link other than the IP subnet on which the client
   has been assigned the DHCP server MAY be configured to specify a
   router IP address of 0.0.0.0.

   For example, consider the case where there are three IP subnets
   configured on a link: 10.0.0/24, 192.168.0/24, 10.0.21/24.  If the
   client is assigned an IP address of 10.0.21.17, then the server could
   include a route with a destination of 10.0.0/24 and a router address
   of 0.0.0.0, and also a route with a destination of 192.168.0/24 and a
   router address of 0.0.0.0.

   A DHCP client whose underlying TCP/IP stack does not provide this
   capability MUST ignore routes in the Classless Static Routes option
   whose router IP address is 0.0.0.0.  Please note that the behavior
   described here only applies to the Classless Static Routes option,
   not to the Static Routes option nor the Router option.

The patch below adds support for these routes to dhclient.  It
refactors the existing magic direct-route incantation I added in
r1.272, and reuses it in add_classless_static-routes().

I've experimentally verified that this makes OpenBSD work on Google
Compute Engine again.  E.g., a DHCP lease with:

fixed-address 10.240.142.165;
option subnet-mask 255.255.255.255;
option classless-static-routes 10.240.0.1/32 0.0.0.0, 0.0.0.0/0 10.240.0.1;

now results in a routing table like:

Internet:
DestinationGatewayFlags   Refs  Use   Mtu  Prio 
Iface
default10.240.0.1 UGS00 - 8 vio0
10.240.0.1 link#1 UHLc   10 - 8 vio0
10.240.0.1/32  link#1 UCS10 - 8 vio0
10.240.142.165 42:01:0a:f0:8e:a5  UHLl   00 - 1 lo0
10.240.142.165/32  link#1 UC 00 - 4 vio0
127/8  127.0.0.1  UGRS   00 32768 8 lo0
127.0.0.1  127.0.0.1  UH 10 32768 4 lo0
224/4  127.0.0.1  URS00 32768 8 lo0

ok?

Index: dhclient.c
===
RCS file: /cvs/src/sbin/dhclient/dhclient.c,v
retrieving revision 1.319
diff -u -p -r1.319 dhclient.c
--- dhclient.c  11 Aug 2014 18:41:13 -  1.319
+++ dhclient.c  1 Oct 2014 00:47:22 -
@@ -108,9 +108,10 @@ struct client_lease *clone_lease(struct 
 voidsocket_nonblockmode(int);
 voidapply_ignore_list(char *);
 
+void add_direct_route(int, struct in_addr, struct in_addr, struct in_addr);
 void add_default_route(int, struct in_addr, struct in_addr);
 void add_static_routes(int, struct option_data *);
-void add_classless_static_routes(int, struct option_data *);
+void add_classless_static_routes(int, struct option_data *, struct in_addr);
 
 int compare_lease(struct client_lease *, struct client_lease *);
 void set_lease_times(struct client_lease *);
@@ -872,10 +873,12 @@ bind_lease(void)
add_address(ifi-name, ifi-rdomain, client-active-address, mask);
if (options[DHO_CLASSLESS_STATIC_ROUTES].len) {
add_classless_static_routes(ifi-rdomain,
-   options[DHO_CLASSLESS_STATIC_ROUTES]);
+   options[DHO_CLASSLESS_STATIC_ROUTES],
+   client-active-address);
} else if (options[DHO_CLASSLESS_MS_STATIC_ROUTES].len) {
add_classless_static_routes(ifi-rdomain,
-   options[DHO_CLASSLESS_MS_STATIC_ROUTES]);
+   options[DHO_CLASSLESS_MS_STATIC_ROUTES],
+   client-active-address);
} else {
opt = options[DHO_ROUTERS];
if (opt-len = sizeof(gateway)) {
@@ -883,17 +886,13 @@ bind_lease(void)
gateway.s_addr = ((struct in_addr *)opt-data)-s_addr;
 
/*
-* If we were given a /32 IP assignment, then make sure
-* the gateway address is routable with equivalent of
-*
-* route add -net $gw -netmask 255.255.255.255 \
-* -cloning -iface $addr
+* To be compatible with ISC DHCP behavior on Linux, if
+* we were given a /32 IP assignment, then add a /32
+* direct route for the gateway to make it routable.
 

Enable DWARF line decoding in ddb

2014-09-29 Thread Matthew Dempsky
Patch below enables DWARF line decoding in ddb; e.g.,

  ddb{0} trace
  Debugger() at Debugger+0x9 [../../../../arch/amd64/amd64/db_interface.c:405]
  ddb_sysctl() at ddb_sysctl+0x1b4 [../../../../ddb/db_usrreq.c:104]
  sys___sysctl() at sys___sysctl+0x216 [../../../../kern/kern_sysctl.c:229]
  syscall() at syscall+0x297 [../../../../sys/syscall_mi.h:84]
  --- syscall (number 202) ---
  end of kernel
  end trace frame: 0x7f7cf1d7, count: -4
  acpi_pdirpa+0x4117aa:

Full set of instructions to apply and test (assuming amd64 MP):

  # Apply patch
  cd /usr/src/sys
  patch  /path/to/diff

  # Enable DEBUG=-g
  $EDITOR conf/GENERIC
  # uncomment ``makeoptions DEBUG=-g'' line and save

  # Build new kernel
  cd arch/amd64/conf
  config GENERIC.MP
  cd ../compile/GENERIC.MP
  make clean
  make

  # Build new boot(8)
  cd ../../stand/boot
  make install

  # Install new boot(8)
  cp /usr/mdec/boot /boot
  installboot -v /boot /usr/mdec/biosboot sd0

Now reboot and boot the nbsd kernel.  Stack traces in ddb should
include file and line numbers.

For file/line numbers to appear, all of the following are necessary:

  1. New boot(8)
  2. New kernel
  3. Kernel built with DEBUG=-g
  4. Boot bsd.gdb instead of bsd

Under any other conditions, ddb should gracefully fall back to bare
stack traces.  I've tested most combinations, but additional testing
is appreciated.


Implementation details:

  - boot(8) now loads the ELF header string table so it can identify
section names. It also now loads the .debug_line section into
memory, and marks loaded sections with SHF_ALLOC.

  - ddb when generating stack traces now checks if there's a section
named .debug_line that's marked SHF_ALLOC; and if so, ddb parses
the section contents to translate program counter values into
file/line pairs.

  - New boot(8) should be backwards compatible with old kernels, because
they won't care if .debug_line is loaded; new bsd.gdb should be
compatible with old boot(8) because .debug_line isn't normally
marked SHF_ALLOC.

Index: conf/files
===
RCS file: /home/matthew/anoncvs/cvs/src/sys/conf/files,v
retrieving revision 1.577
diff -u -p -r1.577 files
--- conf/files  8 Sep 2014 04:40:30 -   1.577
+++ conf/files  29 Sep 2014 02:46:16 -
@@ -611,6 +611,7 @@ filenet/if_pppoe.c  pppoe   
needs-flag
 file ddb/db_access.c   ddb | kgdb
 file ddb/db_break.cddb
 file ddb/db_command.c  ddb
+file ddb/db_dwarf.cddb
 file ddb/db_elf.c  ddb
 file ddb/db_examine.c  ddb
 file ddb/db_expr.c ddb
Index: ddb/db_elf.c
===
RCS file: /home/matthew/anoncvs/cvs/src/sys/ddb/db_elf.c,v
retrieving revision 1.11
diff -u -p -r1.11 db_elf.c
--- ddb/db_elf.c14 Sep 2014 14:17:24 -  1.11
+++ ddb/db_elf.c29 Sep 2014 02:46:16 -
@@ -32,12 +32,14 @@
  */
 
 #include sys/types.h
+#include sys/stdint.h
 #include sys/param.h
 #include sys/systm.h  
 #include sys/exec.h
 
 #include machine/db_machdep.h
 
+#include ddb/db_dwarf.h
 #include ddb/db_sym.h
 #include ddb/db_output.h
 #include ddb/db_extern.h
@@ -45,6 +47,7 @@
 #include sys/exec_elf.h
 
 static char *db_elf_find_strtab(db_symtab_t *);
+static char *db_elf_find_linetab(db_symtab_t *, size_t *);
 
 #defineSTAB_TO_SYMSTART(stab)  ((Elf_Sym *)((stab)-start))
 #defineSTAB_TO_SYMEND(stab)((Elf_Sym *)((stab)-end))
@@ -110,10 +113,10 @@ db_elf_sym_init(int symsize, void *symta
 *  . . .
 *  . . .
 *  last section header
-*  first symbol or string table section
+*  first symbol, string, or line table section
 *  . . .
 *  . . .
-*  last symbol or string table section
+*  last symbol, string, or line table section
 */
 
/*
@@ -232,6 +235,30 @@ db_elf_find_strtab(db_symtab_t *stab)
 }
 
 /*
+ * Internal helper function - return a pointer to the line table
+ * for the current symbol table.
+ */
+static char *
+db_elf_find_linetab(db_symtab_t *stab, size_t *size)
+{
+   Elf_Ehdr *elf = STAB_TO_EHDR(stab);
+   Elf_Shdr *shp = STAB_TO_SHDR(stab, elf);
+   char *shstrtab;
+   int i;
+
+   shstrtab = (char *)elf + shp[elf-e_shstrndx].sh_offset;
+   for (i = 0; i  elf-e_shnum; i++) {
+   if ((shp[i].sh_flags  SHF_ALLOC) != 0 
+   strcmp(.debug_line, shstrtab+shp[i].sh_name) == 0) {
+   *size = shp[i].sh_size;
+   return ((char *)elf + shp[i].sh_offset);
+   }
+   }
+
+   return (NULL);
+}
+
+/*
  * Lookup the symbol with the given name.
  */
 db_sym_t
@@ -350,11 +377,25 @@ boolean_t
 db_elf_line_at_pc(db_symtab_t *symtab, db_sym_t cursym, char 

Re: lsearch.c

2014-07-17 Thread Matthew Dempsky
Hm, is there a practical consequence of this?  Seems like it would
only affect applications trying to store a reference to lsearch in a
function pointer of type void (*)(const void *, void *, size_t *,
size_t, int (*)(const void *, const void *)); do those exist?

On Thu, Jul 17, 2014 at 5:38 PM, enh e...@google.com wrote:
 POSIX says lsearch's 'base' is void*, not const void*.

 http://pubs.opengroup.org/onlinepubs/9699919799/functions/lsearch.html

 openbsd gets this wrong in search.h and lsearch.c. (but lfind is
 correct --- that should be const void*.)

  -e





Re: lsearch.c

2014-07-17 Thread Matthew Dempsky
Hrm.  It seems silly to me to change it to require a non-const pointer
argument, but POSIX, Linux, Solaris, FreeBSD, and NetBSD all use void
*base so it doesn't seem like we should have any ports tree issues
from changing it to be compatible either.

On Thu, Jul 17, 2014 at 6:04 PM, enh e...@google.com wrote:
 For me (Android) the practical consequence is compiler warnings unless I
 change search.h to be wrong.

 I'm going through the non-Open BSD stuff I have working out why. In this
 case, this seems to be the reason we have the NetBSD search.h
 implementation. The majority of our BSD code is OpenBSD so my preference
 would be to switch over as much as possible.

 On Jul 17, 2014 5:59 PM, Matthew Dempsky matt...@dempsky.org wrote:

 Hm, is there a practical consequence of this?  Seems like it would
 only affect applications trying to store a reference to lsearch in a
 function pointer of type void (*)(const void *, void *, size_t *,
 size_t, int (*)(const void *, const void *)); do those exist?

 On Thu, Jul 17, 2014 at 5:38 PM, enh e...@google.com wrote:
  POSIX says lsearch's 'base' is void*, not const void*.
 
  http://pubs.opengroup.org/onlinepubs/9699919799/functions/lsearch.html
 
  openbsd gets this wrong in search.h and lsearch.c. (but lfind is
  correct --- that should be const void*.)
 
   -e
 



Re: lsearch.c

2014-07-17 Thread Matthew Dempsky
On Thu, Jul 17, 2014 at 8:05 PM, patrick keshishian pkesh...@gmail.com wrote:
 Silly even though, the description of lsearch says it will modify
 (it shall be added at the end of) the table, for which base
 argument points to the first element?

Ah, I didn't pay close attention to the definition and assumed it was
simply a search function like memchr() or strstr().  If it might
modify memory, then yes, making it non-const absolutely makes sense.



Possible bug in cpu_chooseproc?

2014-07-13 Thread Matthew Dempsky
As the name suggests, remrunqueue(p) removes p from its run queue, and
I believe that makes TAILQ_FOREACH() here unsafe.  Instead of actually
removing all threads from the processor, we'll only remove the first
from each of its run queues.

Diff below replaces TAILQ_FOREACH with the safe/idiomatic pattern for
draining a queue.

ok?

(This hasn't bit me and I don't know any practical consequences of it.
Just spotted it while tracking down a bug in a kern_synch.c diff I'm
working on.)

Index: kern_sched.c
===
RCS file: /cvs/src/sys/kern/kern_sched.c,v
retrieving revision 1.32
diff -u -p -r1.32 kern_sched.c
--- kern_sched.c4 May 2014 05:03:26 -   1.32
+++ kern_sched.c13 Jul 2014 20:18:38 -
@@ -272,7 +272,7 @@ sched_chooseproc(void)
if (spc-spc_schedflags  SPCF_SHOULDHALT) {
if (spc-spc_whichqs) {
for (queue = 0; queue  SCHED_NQS; queue++) {
-   TAILQ_FOREACH(p, spc-spc_qs[queue], p_runq) {
+   while ((p = TAILQ_FIRST(spc-spc_qs[queue]))) {
remrunqueue(p);
p-p_cpu = sched_choosecpu(p);
setrunqueue(p);



Re: CVS: cvs.openbsd.org: src

2014-07-11 Thread Matthew Dempsky
On Fri, Jul 11, 2014 at 3:41 PM, Bob Beck b...@obtuse.com wrote:
 The OPENSSL_VERSION number is a guarantee for a certain version of the
 ABI. As we dont' provide that (in fact much
 of the ABI in LIbreSSL is beyond 1.0.1g, it is not accurate to use
 the old OPENSSL_VERSION. Essnentially this OPENSSL_VERSION
 is bigger than 1.0.1g's.

By that argument, we won't be ABI compatible with OpenSSL 2.0 either,
so we shouldn't provide OPENSSL_VERSION at all.

My 2c is for keeping OPENSSL_VERSION_NUMBER as the most recent OpenSSL
version that we're *mostly* API/feature compatible with, and using
LIBRESSL_VERSION_NUMBER to identify the exact LibreSSL version.  By
polluting the OPENSSL_VERSION_NUMBER namespace we just make things
more difficult for downstream users that want to be compatible with
both OpenSSL and LibreSSL.

E.g., to check for a feature that was added in OpenSSL 1.2 but isn't
present in LibreSSL, that code now needs to be

#if OPENSSL_VERSION_NUMBER = 1.2  !defined(LIBRESSL_VERSION_NUMBER)

rather than simply

#if OPENSSL_VERSION_NUMBER = 1.2

Breaking the latter just seems like making it more difficult to get
people to port their software from OpenSSL to LibreSSL.



Re: CVS: cvs.openbsd.org: src

2014-07-11 Thread Matthew Dempsky
On Fri, Jul 11, 2014 at 4:37 PM, Bob Beck b...@obtuse.com wrote:
 The fundamental probelm with this Matthew - is that next time, if we
 do this, by the next release we will
 be chasing what features we have imported from 1.0.2g  and 10.2.z, and
 1.0.2.qq - where does it end?

It ends whenever it stops helping portability for apps that are
currently written for OpenSSL.  We've expressly decided to ignore any
API/ABI compatibility guarantees with OpenSSL, so an OpenSSL version
number is inherently just a best effort to make things easier on
applications to transition from OpenSSL to LibreSSL.

Clang went through this same process with code that did GCC version
checks.  Today Clang still claims it's GCC 4.2, but in a separate
version it reveals it's Clang 3.5.

Existing code that only knows to check for older versions of GCC
(e.g., OpenBSD's sys/cdefs.h) continues work just fine with Clang,
because it picks up all of the definitions targeted towards GCC 4.2.
New code that wants to make use of features in GCC 4.7 and Clang 3.5
though needs to check for both; but even if it doesn't, if it includes
fallback for older versions of GCC it should still work okay with
Clang.

Concrete analogy: suppose LibreSSL 2.1 and OpenSSL 1.1 both add some
new feature, and an application that wants to be compatible with both
wants to make use of that feature.  How do they version check for its
availability?

Naively, it would be

#if LibreSSL = 2.1 || OpenSSL = 1.1

but that's going to cause the application to break when compiled with
older versions of LibreSSL.  It would actually needs to be

#if LibreSSL = 2.1 || (!defined(LibreSSL)  OpenSSL = 1.1)

We don't gain anything by making people need to write the latter, IMO.



Re: ddb: ELF line decoding in stack traces

2014-07-10 Thread Matthew Dempsky
On Mon, Jun 23, 2014 at 06:46:27AM -0700, Matthew Dempsky wrote:
 I did this mostly for fun / personal enlightenment, so I haven't put
 too much effort into polishing it yet.

I got some positive feedback about the direction of this diff, so
tonight I finally got around to polishing it some more.

In particular, db_dwarf.c now includes a stand-alone addr2line clone
that can be compiled separately with cc -o ddb-addr2line db_dwarf.c,
and the output should match addr2line(1)'s.  I've extensively tested
it against amd64's bsd.gdb, but I'd love testing feedback on
big-endian and LP32 platforms.

Unless anyone objects, I plan to commit the db_dwarf.{c,h} additions
tomorrow (i.e., not wired into the build) so they can be
developed/tested further in tree.  The other bits I'll split out and
post for separate review as appropriate.  (I at least need to first
ensure new bsd.gdb kernels will gracefully handle being loaded by old
boot loaders.)


Index: ddb/db_dwarf.h
===
RCS file: ddb/db_dwarf.h
diff -N ddb/db_dwarf.h
--- /dev/null   1 Jan 1970 00:00:00 -
+++ ddb/db_dwarf.h  10 Jul 2014 07:14:26 -
@@ -0,0 +1,24 @@
+/* $OpenBSD$*/
+/*
+ * Copyright (c) 2014 Matthew Dempsky matt...@dempsky.org
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#ifndef _DDB_DB_DWARF_H_
+#define _DDB_DB_DWARF_H_
+
+bool db_dwarf_line_at_pc(const char *, size_t, uintptr_t,
+const char **, const char **, int *);
+
+#endif /* _DDB_DB_DWARF_H_ */
Index: ddb/db_dwarf.c
===
RCS file: ddb/db_dwarf.c
diff -N ddb/db_dwarf.c
--- /dev/null   1 Jan 1970 00:00:00 -
+++ ddb/db_dwarf.c  10 Jul 2014 07:49:07 -
@@ -0,0 +1,550 @@
+/* $OpenBSD$*/
+/*
+ * Copyright (c) 2014 Matthew Dempsky matt...@dempsky.org
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#ifdef _KERNEL
+#include sys/types.h
+#include sys/stdint.h
+#include sys/param.h
+#include sys/systm.h  
+#include ddb/db_dwarf.h
+#ifdef DIAGNOSTIC
+#define DWARN(fmt, ...) printf(ddb:  fmt \n, __VA_ARGS__)
+#else
+#define DWARN(fmt, ...) ((void)0)
+#endif
+#else /* _KERNEL */
+#include err.h
+#include stdbool.h
+#include stdint.h
+#include string.h
+#define DWARN warnx
+#endif /* _KERNEL */
+
+enum {
+   DW_LNS_copy = 1,
+   DW_LNS_advance_pc   = 2,
+   DW_LNS_advance_line = 3,
+   DW_LNS_set_file = 4,
+   DW_LNS_set_column   = 5,
+   DW_LNS_negate_stmt  = 6,
+   DW_LNS_set_basic_block  = 7,
+   DW_LNS_const_add_pc = 8,
+   DW_LNS_fixed_advance_pc = 9,
+   DW_LNS_set_prologue_end = 10,
+   DW_LNS_set_epilogue_begin   = 11,
+};
+
+enum {
+   DW_LNE_end_sequence = 1,
+   DW_LNE_set_address  = 2,
+   DW_LNE_define_file  = 3,
+};
+
+struct dwbuf {
+   const char *buf;
+   size_t len;
+};
+
+static bool
+read_bytes(struct dwbuf *d, void *v, size_t n)
+{
+   if (d-len  n)
+   return (false);
+   memcpy(v, d-buf, n);
+   d-buf += n;
+   d-len -= n;
+   return (true);
+}
+
+static bool
+read_s8(struct dwbuf *d, int8_t *v)
+{
+   return (read_bytes(d, v, sizeof(*v)));
+}
+
+static bool
+read_u8(struct dwbuf *d, uint8_t *v)
+{
+   return (read_bytes(d, v, sizeof(*v)));
+}
+
+static bool
+read_u16(struct dwbuf *d, uint16_t *v)
+{
+   return (read_bytes(d, v, sizeof(*v)));
+}
+
+static bool
+read_u32

mallocarray(9)

2014-07-10 Thread Matthew Dempsky
We've found a bunch of uses for reallocarray() in userland, and I
think the idiom is worth reusing in the kernel.  There are enough
places where we do malloc(x * y) that I think it makes sense to add
mallocarray(x, y).

ok?

Index: share/man/man9/Makefile
===
RCS file: /home/matthew/cvs-mirror/cvs/src/share/man/man9/Makefile,v
retrieving revision 1.210
diff -u -p -r1.210 Makefile
--- share/man/man9/Makefile 30 Jun 2014 21:48:09 -  1.210
+++ share/man/man9/Makefile 10 Jul 2014 17:15:45 -
@@ -232,7 +232,7 @@ MLINKS+=ktrace.9 ktrcsw.9 ktrace.9 ktrem
ktrace.9 ktrsysret.9 ktrace.9 KTRPOINT.9
 MLINKS+=lock.9 lockinit.9 lock.9 lockmgr.9 lock.9 lockstatus.9
 MLINKS+=log.9 addlog.9
-MLINKS+=malloc.9 free.9
+MLINKS+=malloc.9 mallocarray.9 malloc.9 free.9
 MLINKS+=membar_sync.9 membar_enter.9 membar_sync.9 membar_exit.9 \
membar_sync.9 membar_producer.9 membar_sync.9 membar_consumer.9
 MLINKS+=mbuf.9 m_copym2.9 mbuf.9 m_copym.9 mbuf.9 m_free.9 mbuf.9 MFREE.9 \
Index: share/man/man9/malloc.9
===
RCS file: /home/matthew/cvs-mirror/cvs/src/share/man/man9/malloc.9,v
retrieving revision 1.52
diff -u -p -r1.52 malloc.9
--- share/man/man9/malloc.9 3 Apr 2014 04:10:34 -   1.52
+++ share/man/man9/malloc.9 10 Jul 2014 17:13:39 -
@@ -33,6 +33,7 @@
 .Os
 .Sh NAME
 .Nm malloc ,
+.Nm mallocarray ,
 .Nm free
 .Nd kernel memory allocator
 .Sh SYNOPSIS
@@ -40,6 +41,13 @@
 .In sys/malloc.h
 .Ft void *
 .Fn malloc unsigned long size int type int flags
+.Ft void *
+.Fo mallocarray
+.Fa unsigned long nmemb
+.Fa unsigned long size
+.Fa int type
+.Fa int flags
+.Fc
 .Ft void
 .Fn free void *addr int type
 .Sh DESCRIPTION
@@ -48,8 +56,17 @@ The
 function allocates uninitialized memory in kernel address space for an
 object whose size is specified by
 .Fa size .
+The
+.Fn mallocarray
+function is the same as
+.Fn malloc ,
+except it allocates space for an array of
+.Fa nmemb
+objects and checks for arithmetic overflow.
+.Pp
+The
 .Fn free
-releases memory at address
+function releases memory at address
 .Fa addr
 that was previously allocated by
 .Fn malloc
@@ -60,7 +77,8 @@ is a null pointer, no action occurs.
 .Pp
 The
 .Fa flags
-argument further qualifies malloc's
+argument further qualifies
+.Fn malloc Ns 's
 operational characteristics as follows:
 .Bl -tag -width xxx -offset indent
 .It Dv M_WAITOK
Index: sys/conf/files
===
RCS file: /home/matthew/cvs-mirror/cvs/src/sys/conf/files,v
retrieving revision 1.572
diff -u -p -r1.572 files
--- sys/conf/files  19 Apr 2014 12:27:06 -  1.572
+++ sys/conf/files  10 Jul 2014 17:09:03 -
@@ -679,6 +679,7 @@ file kern/kern_ktrace.c ktrace
 file kern/kern_lock.c
 file kern/kern_lkm.c   lkm
 file kern/kern_malloc.c
+file kern/kern_mallocarray.c
 file kern/kern_malloc_debug.c  malloc_debug
 file kern/kern_rwlock.c
 file kern/kern_physio.c
Index: sys/sys/malloc.h
===
RCS file: /home/matthew/cvs-mirror/cvs/src/sys/sys/malloc.h,v
retrieving revision 1.108
diff -u -p -r1.108 malloc.h
--- sys/sys/malloc.h19 May 2014 14:30:03 -  1.108
+++ sys/sys/malloc.h10 Jul 2014 17:08:05 -
@@ -391,10 +391,12 @@ extern struct kmemusage *kmemusage;
 extern char *kmembase;
 extern struct kmembuckets bucket[];
 
-extern void *malloc(unsigned long size, int type, int flags);
-extern void free(void *addr, int type);
-extern int sysctl_malloc(int *, u_int, void *, size_t *, void *, size_t,
- struct proc *);
+void   *malloc(unsigned long size, int type, int flags);
+void   *mallocarray(unsigned long nmemb, unsigned long size,
+   int type, int flags);
+void   free(void *addr, int type);
+intsysctl_malloc(int *, u_int, void *, size_t *, void *, size_t,
+   struct proc *);
 
 size_t malloc_roundup(size_t);
 void   malloc_printit(int (*)(const char *, ...));
Index: sys/kern/kern_mallocarray.c
===
RCS file: sys/kern/kern_mallocarray.c
diff -N sys/kern/kern_mallocarray.c
--- /dev/null   1 Jan 1970 00:00:00 -
+++ sys/kern/kern_mallocarray.c 10 Jul 2014 17:10:06 -
@@ -0,0 +1,38 @@
+/* $OpenBSD: reallocarray.c,v 1.1 2014/05/08 21:43:49 deraadt Exp $
*/
+/*
+ * Copyright (c) 2008 Otto Moerbeek o...@drijf.net
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT 

Re: mallocarray(9)

2014-07-10 Thread Matthew Dempsky
On Thu, Jul 10, 2014 at 10:28:35AM -0700, Matthew Dempsky wrote:
 +/*
 + * This is sqrt(SIZE_MAX+1), as s1*s2 = SIZE_MAX
 + * if both s1  MUL_NO_OVERFLOW and s2  MUL_NO_OVERFLOW
 + */
 +#define MUL_NO_OVERFLOW  (1UL  (sizeof(size_t) * 4))
 +
 +void *
 +mallocarray(unsigned long nmemb, unsigned long size, int type, int flags)
 +{
 + if ((nmemb = MUL_NO_OVERFLOW || size = MUL_NO_OVERFLOW) 
 + nmemb  0  SIZE_MAX / nmemb  size) {
 + if (flags  M_CANFAIL)
 + return (NULL);
 + panic(overflow);
 + }
 + return (malloc(size * nmemb, type, flags));
 +}

Oops, didn't fully convert size_t - unsigned long to match malloc(9).
(size_t *is* unsigned long currently on all of our architectures, but
technically that's supposed to be opaque.)

Fixed kern_mallocarray.c below.


/*  $OpenBSD$   */
/*
 * Copyright (c) 2008 Otto Moerbeek o...@drijf.net
 *
 * Permission to use, copy, modify, and distribute this software for any
 * purpose with or without fee is hereby granted, provided that the above
 * copyright notice and this permission notice appear in all copies.
 *
 * THE SOFTWARE IS PROVIDED AS IS AND THE AUTHOR DISCLAIMS ALL WARRANTIES
 * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
 * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
 * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
 * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
 * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
 * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
 */

#include sys/param.h
#include sys/limits.h
#include sys/malloc.h
#include sys/systm.h

/*
 * This is sqrt(ULONG_MAX+1), as s1*s2 = ULONG_MAX
 * if both s1  MUL_NO_OVERFLOW and s2  MUL_NO_OVERFLOW
 */
#define MUL_NO_OVERFLOW (1UL  (sizeof(unsigned long) * 4))

void *
mallocarray(unsigned long nmemb, unsigned long size, int type, int flags)
{
if ((nmemb = MUL_NO_OVERFLOW || size = MUL_NO_OVERFLOW) 
nmemb  0  ULONG_MAX / nmemb  size) {
if (flags  M_CANFAIL)
return (NULL);
panic(overflow);
}
return (malloc(size * nmemb, type, flags));
}



Add MAP_ANONYMOUS as a synonym for MAP_ANON

2014-07-10 Thread Matthew Dempsky
The Austin Group this morning accepted proposed wording to standardize
both MAP_ANON and MAP_ANONYMOUS, recognizing that neither was clearly
more popular than the other among applications, and that there's
precedent in POSIX for simply standardizing multiple spellings for a
feature when both are common and trivial for implementations to
provide (e.g., PAGE_SIZE and PAGESIZE).

Diff below defines MAP_ANONYMOUS as a synonym for MAP_ANON, keeping
MAP_ANON as canonical in following BSD heritage.

ok?

Index: lib/libc/sys/mmap.2
===
RCS file: /cvs/src/lib/libc/sys/mmap.2,v
retrieving revision 1.50
diff -u -p -r1.50 mmap.2
--- lib/libc/sys/mmap.2 2 Jul 2014 22:22:35 -   1.50
+++ lib/libc/sys/mmap.2 10 Jul 2014 18:32:59 -
@@ -121,7 +121,7 @@ Sharing, mapping type, and options are s
 .Fa flags
 argument by OR'ing the following values.
 Exactly one of the first two values must be specified:
-.Bl -tag -width MAP_PRIVATE -offset indent
+.Bl -tag -width MAP_ANONYMOUS -offset indent
 .It Dv MAP_PRIVATE
 Modifications are private.
 .It Dv MAP_SHARED
@@ -129,13 +129,16 @@ Modifications are shared.
 .El
 .Pp
 Any combination of the following flags may additionally be used:
-.Bl -tag -width MAP_PRIVATE -offset indent
+.Bl -tag -width MAP_ANONYMOUS -offset indent
 .It Dv MAP_ANON
 Map anonymous memory not associated with any specific file.
 The file descriptor used for creating
 .Dv MAP_ANON
 must currently be \-1 indicating no name is associated with the
 region.
+.It Dv MAP_ANONYMOUS
+Synonym for
+.Dv MAP_ANON .
 .It Dv MAP_FIXED
 Demand that the mapping is placed at
 .Fa addr ,
@@ -157,7 +160,7 @@ source compatibility with code written f
 but are not recommended for use in new
 .Ox
 code:
-.Bl -tag -width MAP_PRIVATE -offset indent
+.Bl -tag -width MAP_ANONYMOUS -offset indent
 .It Dv MAP_COPY
 Modifications are private and, unlike
 .Dv MAP_PRIVATE ,
Index: sys/sys/mman.h
===
RCS file: /cvs/src/sys/sys/mman.h,v
retrieving revision 1.25
diff -u -p -r1.25 mman.h
--- sys/sys/mman.h  27 Jun 2014 20:50:43 -  1.25
+++ sys/sys/mman.h  10 Jul 2014 18:32:59 -
@@ -57,6 +57,7 @@
 #defineMAP_FIXED   0x0010  /* map addr must be exactly as 
requested */
 #define__MAP_NOREPLACE 0x0800  /* fail if address not available */
 #defineMAP_ANON0x1000  /* allocated from memory, swap space */
+#defineMAP_ANONYMOUS   MAP_ANON/* alternate POSIX spelling */
 
 #defineMAP_FLAGMASK0x1ff7
 



Re: mallocarray(9)

2014-07-10 Thread Matthew Dempsky
On Thu, Jul 10, 2014 at 02:42:34PM -0400, Ted Unangst wrote:
 On Thu, Jul 10, 2014 at 10:28, Matthew Dempsky wrote:
  We've found a bunch of uses for reallocarray() in userland, and I
  think the idiom is worth reusing in the kernel.  There are enough
  places where we do malloc(x * y) that I think it makes sense to add
  mallocarray(x, y).
 
  +void   *malloc(unsigned long size, int type, int flags);
  +void   *mallocarray(unsigned long nmemb, unsigned long size,
  +   int type, int flags);
 
 The unsigned long here is ancient history. It should be size_t I think.
 
  Index: sys/kern/kern_mallocarray.c
 
 Please just add it to kern_malloc.c without creating a new file.

ok?

Index: share/man/man9/Makefile
===
RCS file: /home/matthew/cvs-mirror/cvs/src/share/man/man9/Makefile,v
retrieving revision 1.210
diff -u -p -r1.210 Makefile
--- share/man/man9/Makefile 30 Jun 2014 21:48:09 -  1.210
+++ share/man/man9/Makefile 10 Jul 2014 17:15:45 -
@@ -232,7 +232,7 @@ MLINKS+=ktrace.9 ktrcsw.9 ktrace.9 ktrem
ktrace.9 ktrsysret.9 ktrace.9 KTRPOINT.9
 MLINKS+=lock.9 lockinit.9 lock.9 lockmgr.9 lock.9 lockstatus.9
 MLINKS+=log.9 addlog.9
-MLINKS+=malloc.9 free.9
+MLINKS+=malloc.9 mallocarray.9 malloc.9 free.9
 MLINKS+=membar_sync.9 membar_enter.9 membar_sync.9 membar_exit.9 \
membar_sync.9 membar_producer.9 membar_sync.9 membar_consumer.9
 MLINKS+=mbuf.9 m_copym2.9 mbuf.9 m_copym.9 mbuf.9 m_free.9 mbuf.9 MFREE.9 \
Index: share/man/man9/malloc.9
===
RCS file: /home/matthew/cvs-mirror/cvs/src/share/man/man9/malloc.9,v
retrieving revision 1.52
diff -u -p -r1.52 malloc.9
--- share/man/man9/malloc.9 3 Apr 2014 04:10:34 -   1.52
+++ share/man/man9/malloc.9 10 Jul 2014 18:52:54 -
@@ -33,13 +33,16 @@
 .Os
 .Sh NAME
 .Nm malloc ,
+.Nm mallocarray ,
 .Nm free
 .Nd kernel memory allocator
 .Sh SYNOPSIS
 .In sys/types.h
 .In sys/malloc.h
 .Ft void *
-.Fn malloc unsigned long size int type int flags
+.Fn malloc size_t size int type int flags
+.Ft void *
+.Fn malloc size_t nmemb size_t size int type int flags
 .Ft void
 .Fn free void *addr int type
 .Sh DESCRIPTION
@@ -48,8 +51,17 @@ The
 function allocates uninitialized memory in kernel address space for an
 object whose size is specified by
 .Fa size .
+The
+.Fn mallocarray
+function is the same as
+.Fn malloc ,
+except it allocates space for an array of
+.Fa nmemb
+objects and checks for arithmetic overflow.
+.Pp
+The
 .Fn free
-releases memory at address
+function releases memory at address
 .Fa addr
 that was previously allocated by
 .Fn malloc
@@ -60,7 +72,8 @@ is a null pointer, no action occurs.
 .Pp
 The
 .Fa flags
-argument further qualifies malloc's
+argument further qualifies
+.Fn malloc Ns 's
 operational characteristics as follows:
 .Bl -tag -width xxx -offset indent
 .It Dv M_WAITOK
Index: sys/sys/malloc.h
===
RCS file: /home/matthew/cvs-mirror/cvs/src/sys/sys/malloc.h,v
retrieving revision 1.108
diff -u -p -r1.108 malloc.h
--- sys/sys/malloc.h19 May 2014 14:30:03 -  1.108
+++ sys/sys/malloc.h10 Jul 2014 18:49:44 -
@@ -391,10 +391,11 @@ extern struct kmemusage *kmemusage;
 extern char *kmembase;
 extern struct kmembuckets bucket[];
 
-extern void *malloc(unsigned long size, int type, int flags);
-extern void free(void *addr, int type);
-extern int sysctl_malloc(int *, u_int, void *, size_t *, void *, size_t,
- struct proc *);
+void   *malloc(size_t, int, int);
+void   *mallocarray(size_t, size_t, int, int);
+void   free(void *, int);
+intsysctl_malloc(int *, u_int, void *, size_t *, void *, size_t,
+   struct proc *);
 
 size_t malloc_roundup(size_t);
 void   malloc_printit(int (*)(const char *, ...));
Index: sys/kern/kern_malloc.c
===
RCS file: /home/matthew/cvs-mirror/cvs/src/sys/kern/kern_malloc.c,v
retrieving revision 1.107
diff -u -p -r1.107 kern_malloc.c
--- sys/kern/kern_malloc.c  19 May 2014 14:30:03 -  1.107
+++ sys/kern/kern_malloc.c  10 Jul 2014 18:50:15 -
@@ -159,7 +159,7 @@ struct timeval malloc_lasterr;
  * Allocate a block of memory
  */
 void *
-malloc(unsigned long size, int type, int flags)
+malloc(size_t size, int type, int flags)
 {
struct kmembuckets *kbp;
struct kmemusage *kup;
@@ -701,3 +701,37 @@ malloc_printit(
 #endif
 }
 #endif /* DDB */
+
+/*
+ * Copyright (c) 2008 Otto Moerbeek o...@drijf.net
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE

Re: mallocarray(9)

2014-07-10 Thread Matthew Dempsky
On Thu, Jul 10, 2014 at 12:34 PM, Jean-Philippe Ouellet
jean-phili...@ouellet.biz wrote:
 2nd one should be mallocarray.

Doh, fixed.  Thanks!



Re: Add MAP_ANONYMOUS as a synonym for MAP_ANON

2014-07-10 Thread Matthew Dempsky
On Thu, Jul 10, 2014 at 1:17 PM, Kent R. Spillner kspill...@acm.org wrote:
 If MAP_ANON is going to be the canonical value then shouldn't you
 change MAP_PRIVATE to MAP_ANON instead of MAP_ANONYMOUS in mmap.2?

No, Bl's -width option is used to specify how much padding should be
used to align the list entry text when mandoc, and MAP_ANONYMOUS is
the widest map flag (other than MAP_HASSEMAPHORE, which I'm hoping
goes away soon anyway).



Re: tun TUNDOIOVEC ioctl

2014-07-10 Thread Matthew Dempsky
On Thu, Jul 10, 2014 at 1:20 PM, Ted Unangst t...@tedunangst.com wrote:
 Thoughts?

Seems kind of hacky to me, but if it results in significant
performance improvements in real world uses, then I could be swayed
since it's not very intrusive either.



Re: Use atomics for mutating p_sigmask

2014-07-08 Thread Matthew Dempsky
On Tue, Jul 8, 2014 at 1:21 AM, Martin Pieuchot mpieuc...@nolizard.org wrote:
 There was a previous attempt to mark sigpromask(2) as nolock but if I
 recall properly setting p_sigmask atomically is not enough and there's
 still a possible race in ptsignal().

 Have you looked into this function, is it safe now?

I haven't looked very closely yet, but I'll be sure to look again
before proposing NOLOCK for sys_sigprocmask().  However, this diff
only replaces the SPLs with atomic operations; it leaves
sys_sigprocmask() under kernel lock.

As long as sys_sigprocmask() is still kernel locked, ptsignal() can
only execute concurrently with sys_sigprocmask() if its an interrupt
on the same CPU.  Without this diff, that interrupt can only occur
before or after the SPL-protected code section; while after this diff,
the interrupt can only occur before or after the atomic operation
(since by definition atomic operations are atomic).



Re: Use atomics for mutating p_sigmask

2014-07-08 Thread Matthew Dempsky
On Tue, Jul 8, 2014 at 1:29 AM, Mark Kettenis mark.kette...@xs4all.nl wrote:
 Date: Mon, 7 Jul 2014 13:46:03 -0700
 From: Matthew Dempsky matt...@dempsky.org

 p_sigmask is only modified by the owning thread from process context
 (e.g., sys_sigprocmask(), sys_sigreturn(), userret(), or postsig()),
 but it can be accessed anywhere (e.g., interrupts or threads on other
 CPUs).  Currently sys_sigprocmask() protects p_sigmask with splhigh(),
 but that's not MP-safe.

 The simpler solution is to take advantage of our atomics APIs.
 Unfortunately for implementing SIG_SETMASK, we don't have an
 atomic_store_int() function, and I can't bring myself to abuse
 volatile for this purpose, so I've resorted to atomic_swap_uint().

 Sory, but I think that's wrong.  You need volatile to make sure the
 mask is read from memory when you're checking bits.  Or you need to
 insert the proper memory barriers I think.

To be clear: I meant I don't want to abuse volatile as though it's a
magic make-these-operations-**atomic** flag, as that's not what it
really means (even if in practice it often has that effect).

Also, as long as the (always current thread) mutators and cross-thread
accessors are still serialized by the kernel lock, marking p_sigmask
as volatile shouldn't be necessary.  However, I do agree that once we
start unlocking any of them (which is a future goal of this work), we
need some sort of atomic guarantee on the read side too, and marking
p_sigmask as volatile seems like a reasonable first step.  (I'd
probably go further and also decorate the accesses with C11-style
atomic_load()s.)

So I'm happy to mark p_sigmask as volatile with this diff if you'd
prefer.  I just don't think it's strictly necessary yet, but I'm not
opposed to it either.



Allow tsleep() with ident==NULL

2014-07-08 Thread Matthew Dempsky
It's rare, but there *are* cases where a thread wants to sleep and
doesn't expect any wakeup() calls at all; e.g., nanosleep() and
sigsuspend().  In these cases, there's no point in requiring a valid
wait channel identifier or adding the thread to the sleep queue.

Diff below explicitly allows tsleep() with ident==NULL, but skips
adding the thread to a sleep queue in that case.

As an implementation detail, we currently treat p_wchan == NULL as
an indication that the thread has been woken up.  To avoid breaking
that assumption this diff uses p_wchan == NOSLPQUE to identify
threads that are still sleeping, but aren't associated with a sleep
queue.

This also lets us tighten the the kernel lock KASSERT in tsleep():
rather than weakly only checking non-timed sleeps, we can precisely
check all sleeps that use a wait channel.

ok?


Index: share/man/man9/tsleep.9
===
RCS file: /home/matthew/cvs-mirror/cvs/src/share/man/man9/tsleep.9,v
retrieving revision 1.9
diff -u -p -r1.9 tsleep.9
--- share/man/man9/tsleep.9 22 Jan 2014 07:32:47 -  1.9
+++ share/man/man9/tsleep.9 8 Jul 2014 16:25:53 -
@@ -96,8 +96,9 @@ The same identifier must be used in a ca
 .Fn wakeup
 to get the process going again.
 .Fa ident
-should not be
-.Dv NULL .
+may be
+.Dv NULL
+if the process is not waiting on a wait channel.
 .It Fa priority
 The process priority to be used when the process is awakened and put on
 the queue of runnable processes.
Index: sys/kern/kern_synch.c
===
RCS file: /home/matthew/cvs-mirror/cvs/src/sys/kern/kern_synch.c,v
retrieving revision 1.115
diff -u -p -r1.115 kern_synch.c
--- sys/kern/kern_synch.c   22 Mar 2014 06:05:45 -  1.115
+++ sys/kern/kern_synch.c   8 Jul 2014 16:32:50 -
@@ -78,6 +78,12 @@ sleep_queue_init(void)
TAILQ_INIT(slpque[i]);
 }
 
+/*
+ * Dummy wait channel identifier for threads that don't expect a wakeup().
+ * Threads sleeping on this wait channel aren't queued on slpque.
+ */
+#defineNOSLPQUE((volatile void *)(noslpque))
+static char noslpque;
 
 /*
  * During autoconfiguration or after a panic, a sleep will simply
@@ -109,8 +115,17 @@ tsleep(const volatile void *ident, int p
 
KASSERT((priority  ~(PRIMASK | PCATCH)) == 0);
 
+   /* Make sure caller gave us at least one reason to stop sleeping. */
+   KASSERT(ident != NULL || (priority  PCATCH) != 0 || timo != 0);
+
 #ifdef MULTIPROCESSOR
-   KASSERT(timo || __mp_lock_held(kernel_lock));
+   if (ident != NULL) {
+   /*
+* Waiting on a condition variable requires a lock, otherwise
+* we might miss our wakeup notification.
+*/
+   KASSERT(__mp_lock_held(kernel_lock));
+   }
 #endif
 
if (cold || panicstr) {
@@ -191,12 +206,11 @@ sleep_setup(struct sleep_state *sls, con
 {
struct proc *p = curproc;
 
-#ifdef DIAGNOSTIC
+   KASSERT(ident != NOSLPQUE);
+   KASSERT(p-p_stat == SONPROC);
+
if (ident == NULL)
-   panic(tsleep: no ident);
-   if (p-p_stat != SONPROC)
-   panic(tsleep: not SONPROC);
-#endif
+   ident = NOSLPQUE;
 
 #ifdef KTRACE
if (KTRPOINT(p, KTR_CSW))
@@ -213,7 +227,8 @@ sleep_setup(struct sleep_state *sls, con
p-p_wmesg = wmesg;
p-p_slptime = 0;
p-p_priority = prio  PRIMASK;
-   TAILQ_INSERT_TAIL(slpque[LOOKUP(ident)], p, p_runq);
+   if (ident != NOSLPQUE)
+   TAILQ_INSERT_TAIL(slpque[LOOKUP(ident)], p, p_runq);
 }
 
 void
@@ -350,7 +365,8 @@ void
 unsleep(struct proc *p)
 {
if (p-p_wchan) {
-   TAILQ_REMOVE(slpque[LOOKUP(p-p_wchan)], p, p_runq);
+   if (p-p_wchan != NOSLPQUE)
+   TAILQ_REMOVE(slpque[LOOKUP(p-p_wchan)], p, p_runq);
p-p_wchan = NULL;
}
 }
@@ -365,6 +381,9 @@ wakeup_n(const volatile void *ident, int
struct proc *p;
struct proc *pnext;
int s;
+
+   KASSERT(ident != NULL);
+   KASSERT(ident != NOSLPQUE);
 
SCHED_LOCK(s);
qp = slpque[LOOKUP(ident)];



Compile kernel with -std=gnu99

2014-07-08 Thread Matthew Dempsky
Diff below converts the kernel to build with -std=gnu99.  (For
simplicity, I've only included amd64 for now, but I'll make the same
change to all kernel Makefiles if this is ok.)

The only incompatibility (that I'm aware of) is that ISO C99's inline
semantics differ slightly from GNU C89's historical (but non-standard)
inline semantics, but I believe the diff below keeps us consistent
with the semantics the kernel currently assumes.  (More details
below.)

I've tested on amd64 and I get the exact same .o files with or without
this change (except vers.o, but only because of timestamping).  It's
probably worth conducting the same test on one of our GCC 3
architectures.

ok?

Boring standards details: GNU89 and C99 define static inline to have
the same semantics, but they differ for non-static definitions.  In
particular, C99 introduces the concept of inline definitions so that
it can allow non-static inline functions to be defined in headers and
used across multiple translation units without causing the function to
be receive an external definition in each object file.  GNU89 requires
static inline for inline functions in header files, but then you can
end up with multiple static definitions of the function if it's not
inlined everywhere.

That might seem scary for silently introducing incompatibilities, but
fortunately GCC 3.3 and 4.2 don't support C99 inline semantics
anyway (they always use GNU89 inline), and GCC 4.2 unconditionally
emits a warning if it sees code that would be compiled differently
under C99 semantics.  For newer compilers that do support C99 inline,
-fgnu89-inline forces the legacy GNU89 semantics while also silencing
the GCC 4.2 warning.

Index: Makefile.amd64
===
RCS file: /home/matthew/cvs-mirror/cvs/src/sys/arch/amd64/conf/Makefile.amd64,v
retrieving revision 1.59
diff -u -p -r1.59 Makefile.amd64
--- Makefile.amd64  8 May 2014 17:59:28 -   1.59
+++ Makefile.amd64  8 Jul 2014 17:41:49 -
@@ -39,8 +39,13 @@ CMACHFLAGS+= -fno-stack-protector
 CMACHFLAGS+=   -Wa,-n
 .endif
 
+CSTD=  -std=gnu99
+.if ${COMPILER_VERSION} != gcc3
+CSTD+= -fgnu89-inline
+.endif
+
 COPTS?=-O2
-CFLAGS=${DEBUG} ${CWARNFLAGS} ${CMACHFLAGS} ${COPTS} ${PIPE}
+CFLAGS=${CSTD} ${DEBUG} ${CWARNFLAGS} ${CMACHFLAGS} ${COPTS} 
${PIPE}
 AFLAGS=-D_LOCORE -x assembler-with-cpp ${CWARNFLAGS} 
${CMACHFLAGS}
 LINKFLAGS= -Ttext 0x810001e0 -e start --warn-common -nopie
 



Re: Compile kernel with -std=gnu99

2014-07-08 Thread Matthew Dempsky
On Tue, Jul 8, 2014 at 12:03 PM, Mark Kettenis mark.kette...@xs4all.nl wrote:
 I disagree with this diff.  We should discourage the use of GNU
 extensions in our kernel.  Therefore I think std=gnu99 would give the
 wrong signal.

Can you clarify your concern?  Currently we're (implicitly) compiling
with -std=gnu89, which is ISO C90 + GNU extensions.  This diff changes
us to (explicitly) compile with -std=gnu99 -fgnu89-inline, which is
ISO C99 + GNU extensions + GNU C89 inline.

I think moving from C90 to C99 is a good idea.

I'm indifferent to GNU extensions.  If we could make do without them,
then great; but technically inline asm is a GNU extension (even if
it's available in C99 mode) and I doubt we'll benefit from eliminating
that.

Using GNU89 inline is an intermediary step, because the kernel isn't
ready for C99 inline.  See below.

 As for the inline issue.  IMHO, given the incompatbility problems
 between ISO C and GNU C, only static inline is usable.  The
 semantics of the other inline forms is just too confusing.  The
 occasional extra copy of the code is as far as I'm concerned
 acceptable.  The functions should be small enough for it not to
 matter.

Converting the existing inline functions to be C99 compatible (either
by adding static or removing inline) is a next step I have planned,
but I want to allow other C99 features first.

Also, there are inline functions in MD code, so I'd rather have all
kernels on -std=gnu99 -fgnu89-inline.  Then we can start cleaning up
GNU89 inlines and remove -fgnu89-inline on arches once they're clean.
E.g., first clean up all MI and x86 inlines, then the x86 kernels can
start compiling without -fgnu89-inline, which will make sure we don't
regress in MI code while we start tackling other MD files.



Re: Compile kernel with -std=gnu99

2014-07-08 Thread Matthew Dempsky
On Tue, Jul 8, 2014 at 12:28 PM, Mark Kettenis mark.kette...@xs4all.nl wrote:
 With that explanation, this sounds a lot more reasonable.

Ah, good. :)  Sorry, I should have mentioned up front the followup
steps I had planned and the rationale for taking this path.



Re: Use atomics for mutating p_sigmask

2014-07-08 Thread Matthew Dempsky
Updated diff below with the following changes:

- p_sigmask is now volatile, per kettenis's request
- kern_fork.c's memcpy() for p_startcopy needs to cast away the
  volatile now
- kettenis pointed out atomic_swap_uint() isn't safe to use in MI code
  yet, so for now proc_sigsetmask() just relies on volatile assignment
  being atomic

ok?

Index: sys/proc.h
===
RCS file: /home/matthew/cvs-mirror/cvs/src/sys/sys/proc.h,v
retrieving revision 1.186
diff -u -p -r1.186 proc.h
--- sys/proc.h  15 May 2014 03:52:25 -  1.186
+++ sys/proc.h  8 Jul 2014 19:19:17 -
@@ -310,7 +310,7 @@ struct proc {
 
 /* The following fields are all copied upon creation in fork. */
 #definep_startcopy p_sigmask
-   sigset_t p_sigmask; /* Current signal mask. */
+   volatile sigset_t p_sigmask;/* Current signal mask. */
 
u_char  p_priority; /* Process priority. */
u_char  p_usrpri;   /* User-priority based on p_cpu and ps_nice. */
@@ -520,6 +520,8 @@ int proc_cansugid(struct proc *);
 void   proc_finish_wait(struct proc *, struct proc *);
 void   process_zap(struct process *);
 void   proc_free(struct proc *);
+void   proc_sigsetmask(struct proc *, sigset_t);
+void   proc_sigsuspend(struct proc *, sigset_t);
 
 struct sleep_state {
int sls_s;
@@ -559,4 +561,3 @@ struct cpu_info *cpuset_first(struct cpu
 
 #endif /* _KERNEL */
 #endif /* !_SYS_PROC_H_ */
-
Index: kern/kern_sig.c
===
RCS file: /home/matthew/cvs-mirror/cvs/src/sys/kern/kern_sig.c,v
retrieving revision 1.166
diff -u -p -r1.166 kern_sig.c
--- kern/kern_sig.c 4 May 2014 05:03:26 -   1.166
+++ kern/kern_sig.c 8 Jul 2014 20:29:50 -
@@ -62,6 +62,7 @@
 #include sys/ptrace.h
 #include sys/sched.h
 #include sys/user.h
+#include sys/atomic.h
 
 #include sys/mount.h
 #include sys/syscallargs.h
@@ -439,37 +440,55 @@ sys_sigprocmask(struct proc *p, void *v,
syscallarg(int) how;
syscallarg(sigset_t) mask;
} */ *uap = v;
-   int error = 0;
-   int s;
-   sigset_t mask;
+   sigset_t mask = SCARG(uap, mask);
 
*retval = p-p_sigmask;
-   mask = SCARG(uap, mask);
-   s = splhigh();
 
switch (SCARG(uap, how)) {
case SIG_BLOCK:
-   p-p_sigmask |= mask ~ sigcantmask;
+   atomic_setbits_int(p-p_sigmask, mask ~ sigcantmask);
break;
case SIG_UNBLOCK:
-   p-p_sigmask = ~mask;
+   atomic_clearbits_int(p-p_sigmask, mask);
break;
case SIG_SETMASK:
-   p-p_sigmask = mask ~ sigcantmask;
+   proc_sigsetmask(p, mask ~ sigcantmask);
break;
default:
-   error = EINVAL;
-   break;
+   return (EINVAL);
}
-   splx(s);
-   return (error);
+
+   return (0);
+}
+
+void
+proc_sigsetmask(struct proc *p, sigset_t mask)
+{
+   KASSERT(p == curproc);
+
+   /* XXX: Abusing volatile assignment as atomic store operation. */
+   p-p_sigmask = mask;
+}
+
+/*
+ * Temporarily replace calling proc's signal mask for the duration of a
+ * system call.  Original signal mask will be restored by userret().
+ */
+void
+proc_sigsuspend(struct proc *p, sigset_t mask)
+{
+   KASSERT(p == curproc);
+   KASSERT(!(p-p_flag  P_SIGSUSPEND));
+
+   p-p_oldmask = p-p_sigmask;
+   atomic_setbits_int(p-p_flag, P_SIGSUSPEND);
+   proc_sigsetmask(p, mask);
 }
 
 /* ARGSUSED */
 int
 sys_sigpending(struct proc *p, void *v, register_t *retval)
 {
-
*retval = p-p_siglist;
return (0);
 }
@@ -489,16 +508,7 @@ sys_sigsuspend(struct proc *p, void *v, 
struct process *pr = p-p_p;
struct sigacts *ps = pr-ps_sigacts;
 
-   /*
-* When returning from sigpause, we want
-* the old mask to be restored after the
-* signal handler has finished.  Thus, we
-* save it here and mark the sigacts structure
-* to indicate this.
-*/
-   p-p_oldmask = p-p_sigmask;
-   atomic_setbits_int(p-p_flag, P_SIGSUSPEND);
-   p-p_sigmask = SCARG(uap, mask) ~ sigcantmask;
+   proc_sigsuspend(p, SCARG(uap, mask) ~ sigcantmask);
while (tsleep(ps, PPAUSE|PCATCH, pause, 0) == 0)
/* void */;
/* always return EINTR rather than ERESTART... */
@@ -749,7 +759,7 @@ trapsignal(struct proc *p, int signum, u
p-p_ru.ru_nsignals++;
(*pr-ps_emul-e_sendsig)(ps-ps_sigact[signum], signum,
p-p_sigmask, trapno, code, sigval);
-   p-p_sigmask |= ps-ps_catchmask[signum];
+   atomic_setbits_int(p-p_sigmask, ps-ps_catchmask[signum]);
if ((ps-ps_sigreset  mask) != 0) {
ps-ps_sigcatch = ~mask;
if (signum != 

Re: Use atomics for mutating p_sigmask

2014-07-08 Thread Matthew Dempsky
On Tue, Jul 8, 2014 at 3:51 PM, Philip Guenther guent...@gmail.com wrote:
  * using atomics when we don't even have correct process-pending signals
 risks a solution that
doesn't work when more than one pending set has to be examined.

I'll hold off on this until we have process-pending signals then.



Mark get*[ug]id() as NOLOCK

2014-07-07 Thread Matthew Dempsky
Recently guenther changed user credentials to be a per-process
resource, but kept a per-thread cache of credentials that get
refreshed at each system call entry.  All of the get*[ug]id() system
calls simply access the thread cached credentials, and possibly
copyout() them if necessary, so they're safe to mark as NOLOCK.

ok?


Index: syscalls.master
===
RCS file: /cvs/src/sys/kern/syscalls.master,v
retrieving revision 1.141
diff -u -p -r1.141 syscalls.master
--- syscalls.master 6 Jul 2014 20:55:58 -   1.141
+++ syscalls.master 7 Jul 2014 18:06:34 -
@@ -80,8 +80,8 @@
int flags, void *data); }
 22 STD { int sys_unmount(const char *path, int flags); }
 23 STD { int sys_setuid(uid_t uid); }
-24 STD { uid_t sys_getuid(void); }
-25 STD { uid_t sys_geteuid(void); }
+24 STD NOLOCK  { uid_t sys_getuid(void); }
+25 STD NOLOCK  { uid_t sys_geteuid(void); }
 #ifdef PTRACE
 26 STD { int sys_ptrace(int req, pid_t pid, caddr_t addr, \
int data); }
@@ -112,7 +112,7 @@
 41 STD { int sys_dup(int fd); }
 42 STD { int sys_fstatat(int fd, const char *path, \
struct stat *buf, int flag); }
-43 STD { gid_t sys_getegid(void); }
+43 STD NOLOCK  { gid_t sys_getegid(void); }
 44 STD { int sys_profil(caddr_t samples, size_t size, \
u_long offset, u_int scale); }
 #ifdef KTRACE
@@ -124,7 +124,7 @@
 46 STD { int sys_sigaction(int signum, \
const struct sigaction *nsa, \
struct sigaction *osa); }
-47 STD { gid_t sys_getgid(void); }
+47 STD NOLOCK  { gid_t sys_getgid(void); }
 48 STD { int sys_sigprocmask(int how, sigset_t mask); }
 49 STD { int sys_getlogin(char *namebuf, u_int namelen); }
 50 STD { int sys_setlogin(const char *namebuf); }
@@ -181,7 +181,7 @@
const struct timeval *tptr); }
 78 STD { int sys_mincore(void *addr, size_t len, \
char *vec); }
-79 STD { int sys_getgroups(int gidsetsize, \
+79 STD NOLOCK  { int sys_getgroups(int gidsetsize, \
gid_t *gidset); }
 80 STD { int sys_setgroups(int gidsetsize, \
const gid_t *gidset); }
@@ -476,11 +476,11 @@
 278UNIMPL  sys_extattr_set_fd
 279UNIMPL  sys_extattr_get_fd
 280UNIMPL  sys_extattr_delete_fd
-281STD { int sys_getresuid(uid_t *ruid, uid_t *euid, \
+281STD NOLOCK  { int sys_getresuid(uid_t *ruid, uid_t *euid, \
uid_t *suid); }
 282STD { int sys_setresuid(uid_t ruid, uid_t euid, \
uid_t suid); }
-283STD { int sys_getresgid(gid_t *rgid, gid_t *egid, \
+283STD NOLOCK  { int sys_getresgid(gid_t *rgid, gid_t *egid, \
gid_t *sgid); }
 284STD { int sys_setresgid(gid_t rgid, gid_t egid, \
gid_t sgid); }



Use atomics for mutating p_sigmask

2014-07-07 Thread Matthew Dempsky
p_sigmask is only modified by the owning thread from process context
(e.g., sys_sigprocmask(), sys_sigreturn(), userret(), or postsig()),
but it can be accessed anywhere (e.g., interrupts or threads on other
CPUs).  Currently sys_sigprocmask() protects p_sigmask with splhigh(),
but that's not MP-safe.

The simpler solution is to take advantage of our atomics APIs.
Unfortunately for implementing SIG_SETMASK, we don't have an
atomic_store_int() function, and I can't bring myself to abuse
volatile for this purpose, so I've resorted to atomic_swap_uint().

While here, I also refactored the P_SIGSUSPEND code a bit so there's
less code duplication.

I've included just amd64 and i386's machdep.c for now, because those
were the only ones I'm readily able to test.  The others should be
easy to similarly fix though, and I can send a followup diff for them.

ok?


Index: sys/proc.h
===
RCS file: /home/matthew/cvs-mirror/cvs/src/sys/sys/proc.h,v
retrieving revision 1.186
diff -u -p -r1.186 proc.h
--- sys/proc.h  15 May 2014 03:52:25 -  1.186
+++ sys/proc.h  7 Jul 2014 19:23:06 -
@@ -520,6 +520,8 @@ int proc_cansugid(struct proc *);
 void   proc_finish_wait(struct proc *, struct proc *);
 void   process_zap(struct process *);
 void   proc_free(struct proc *);
+void   proc_sigsetmask(struct proc *, sigset_t);
+void   proc_sigsuspend(struct proc *, sigset_t);
 
 struct sleep_state {
int sls_s;
@@ -559,4 +561,3 @@ struct cpu_info *cpuset_first(struct cpu
 
 #endif /* _KERNEL */
 #endif /* !_SYS_PROC_H_ */
-
Index: kern/kern_sig.c
===
RCS file: /home/matthew/cvs-mirror/cvs/src/sys/kern/kern_sig.c,v
retrieving revision 1.166
diff -u -p -r1.166 kern_sig.c
--- kern/kern_sig.c 4 May 2014 05:03:26 -   1.166
+++ kern/kern_sig.c 7 Jul 2014 19:24:07 -
@@ -62,6 +62,7 @@
 #include sys/ptrace.h
 #include sys/sched.h
 #include sys/user.h
+#include sys/atomic.h
 
 #include sys/mount.h
 #include sys/syscallargs.h
@@ -439,37 +440,55 @@ sys_sigprocmask(struct proc *p, void *v,
syscallarg(int) how;
syscallarg(sigset_t) mask;
} */ *uap = v;
-   int error = 0;
-   int s;
-   sigset_t mask;
+   sigset_t mask = SCARG(uap, mask);
 
*retval = p-p_sigmask;
-   mask = SCARG(uap, mask);
-   s = splhigh();
 
switch (SCARG(uap, how)) {
case SIG_BLOCK:
-   p-p_sigmask |= mask ~ sigcantmask;
+   atomic_setbits_int(p-p_sigmask, mask ~ sigcantmask);
break;
case SIG_UNBLOCK:
-   p-p_sigmask = ~mask;
+   atomic_clearbits_int(p-p_sigmask, mask);
break;
case SIG_SETMASK:
-   p-p_sigmask = mask ~ sigcantmask;
+   proc_sigsetmask(p, mask ~ sigcantmask);
break;
default:
-   error = EINVAL;
-   break;
+   return (EINVAL);
}
-   splx(s);
-   return (error);
+
+   return (0);
+}
+
+void
+proc_sigsetmask(struct proc *p, sigset_t mask)
+{
+   KASSERT(p == curproc);
+
+   /* XXX: An atomic store would suffice here. */
+   (void)atomic_swap_uint(p-p_sigmask, mask);
+}
+
+/*
+ * Temporarily replace calling proc's signal mask for the duration of a
+ * system call.  Original signal mask will be restored by userret().
+ */
+void
+proc_sigsuspend(struct proc *p, sigset_t mask)
+{
+   KASSERT(p == curproc);
+   KASSERT(!(p-p_flag  P_SIGSUSPEND));
+
+   p-p_oldmask = p-p_sigmask;
+   atomic_setbits_int(p-p_flag, P_SIGSUSPEND);
+   proc_sigsetmask(p, mask);
 }
 
 /* ARGSUSED */
 int
 sys_sigpending(struct proc *p, void *v, register_t *retval)
 {
-
*retval = p-p_siglist;
return (0);
 }
@@ -489,16 +508,7 @@ sys_sigsuspend(struct proc *p, void *v, 
struct process *pr = p-p_p;
struct sigacts *ps = pr-ps_sigacts;
 
-   /*
-* When returning from sigpause, we want
-* the old mask to be restored after the
-* signal handler has finished.  Thus, we
-* save it here and mark the sigacts structure
-* to indicate this.
-*/
-   p-p_oldmask = p-p_sigmask;
-   atomic_setbits_int(p-p_flag, P_SIGSUSPEND);
-   p-p_sigmask = SCARG(uap, mask) ~ sigcantmask;
+   proc_sigsuspend(p, SCARG(uap, mask) ~ sigcantmask);
while (tsleep(ps, PPAUSE|PCATCH, pause, 0) == 0)
/* void */;
/* always return EINTR rather than ERESTART... */
Index: kern/sys_generic.c
===
RCS file: /home/matthew/cvs-mirror/cvs/src/sys/kern/sys_generic.c,v
retrieving revision 1.86
diff -u -p -r1.86 sys_generic.c
--- kern/sys_generic.c  30 Mar 2014 21:54:48 -  1.86
+++ kern/sys_generic.c  7 Jul 2014 19:24:07 -
@@ -658,9 +620,7 @@ dopselect(struct proc 

Reclaim some wasted pages on amd64

2014-07-03 Thread Matthew Dempsky
I spent yesterday trying to really grok early pmap initialization on
amd64, and I noticed what I believe to be wasted physical pages of
memory:

1. In locore.S, we setup both an identity and an actual mapping
for the kernel, to help us bootstrap to executing in high memory.
Also, we reserve pages as though these mappings are independent; but
actually the mappings cleverly (accidentally??) reuse the same L2 and
L3 tables, so we can safely reserve just one set of pages.  (Just the
same, I've added checks to make sure this is safe, and also to sanity
check that we don't overflow the page tables.)

2. In pmap_bootstrap(), we setup the rest of the direct physical
memory mapping.  locore.S has already mapped the first 4GB of memory,
and we don't remove or replace those mappings; so we can save another
couple pages by only allocating page directories for the rest of
physical memory.


In total, this saves a whopping 5 pages (20kB) of physical memory on
my amd64 laptop, as measured by dmesg's avail mem.

So not terribly important, but I have a patch to enable 512GB of
physical address space that somewhat builds on top of this, and I
think it will be easier to review if these changes are already in
place.

This could probably use some testing under memory-intensive workloads,
especially on systems with 4GB of RAM.


Index: amd64/locore.S
===
RCS file: /home/matthew/anoncvs/cvs/src/sys/arch/amd64/amd64/locore.S,v
retrieving revision 1.54
diff -u -p -r1.54 locore.S
--- amd64/locore.S  10 Nov 2012 09:45:05 -  1.54
+++ amd64/locore.S  3 Jul 2014 23:33:25 -
@@ -358,29 +358,48 @@ bi_size_ok:
  *   0  1   2  3
  */
 
-#if L2_SLOT_KERNBASE  0
-#define TABLE_L2_ENTRIES (2 * (NKL2_KIMG_ENTRIES + 1))
-#else
-#define TABLE_L2_ENTRIES (NKL2_KIMG_ENTRIES + 1)
-#endif
+/*
+ * We now want to enable paging, move the kernel to high virtual memory, and
+ * jump there; but unfortunately, we can't do that atomically.  Instead, we
+ * setup a page table that maps the kernel pages to their eventual virtual
+ * address ranges, but also includes an identity map that keeps those pages
+ * available via their physical addresses even once paging is enabled.  Later,
+ * once we're executing out of the the eventual VA range, we remove the
+ * identity map.
+ *
+ * As an additional trick to conserve a few pages, the actual map and the
+ * identity map share L2 and L3 tables.  This works as long as they don't
+ * require inexactly overlapping table entries, which we check for below.
+ * (It also means the kernel may end up mapped multiple times in virtual
+ * memory, but only until we tear down the identity map.)
+ */
+/* XXX(matthew): Why do we always add 1 to NKL2_KIMG_ENTRIES? */
 
-#if L3_SLOT_KERNBASE  0
-#define TABLE_L3_ENTRIES (2 * NKL3_KIMG_ENTRIES)
-#else
-#define TABLE_L3_ENTRIES NKL3_KIMG_ENTRIES
+#if L2_SLOT_KERNBASE  0  L2_SLOT_KERNBASE  NKL2_KIMG_ENTRIES + 1
+#error L2 table collision between identity and actual map entries
+#endif
+#if L3_SLOT_KERNBASE  0  L3_SLOT_KERNBASE  NKL3_KIMG_ENTRIES
+#error L3 table collision between identity and actual map entries
 #endif
 
+/* Sanity check. */
+#if L2_SLOT_KERNBASE + NKL2_KIMG_ENTRIES + 1  NPDPG
+#error L2 table overflow
+#endif
+#if L3_SLOT_KERNBASE + NKL3_KIMG_ENTRIES  NPDPG
+#error L3 table overflow
+#endif
 
 #define PROC0_PML4_OFF 0
 #define PROC0_STK_OFF  (PROC0_PML4_OFF + NBPG)
 #define PROC0_PTP3_OFF (PROC0_STK_OFF + UPAGES * NBPG)
 #define PROC0_PTP2_OFF (PROC0_PTP3_OFF + NKL4_KIMG_ENTRIES * NBPG)
-#define PROC0_PTP1_OFF (PROC0_PTP2_OFF + TABLE_L3_ENTRIES * NBPG)
-#definePROC0_DMP3_OFF  (PROC0_PTP1_OFF + TABLE_L2_ENTRIES * NBPG)
+#define PROC0_PTP1_OFF (PROC0_PTP2_OFF + NKL3_KIMG_ENTRIES * NBPG)
+#definePROC0_DMP3_OFF  (PROC0_PTP1_OFF + (NKL2_KIMG_ENTRIES + 1) * 
NBPG)
 #define PROC0_DMP2_OFF (PROC0_DMP3_OFF + NDML3_ENTRIES * NBPG)
 #define TABLESIZE \
-((NKL4_KIMG_ENTRIES + TABLE_L3_ENTRIES + TABLE_L2_ENTRIES + 1 + UPAGES + \
-   NDML3_ENTRIES + NDML2_ENTRIES) * NBPG)
+((NKL4_KIMG_ENTRIES + NKL3_KIMG_ENTRIES + (NKL2_KIMG_ENTRIES + 1) + 1 + \
+   UPAGES + NDML3_ENTRIES + NDML2_ENTRIES) * NBPG)
 
 #define fillkpt\
 1: movl%eax,(%ebx) ;   /* store phys addr */ \
Index: amd64/pmap.c
===
RCS file: /home/matthew/anoncvs/cvs/src/sys/arch/amd64/amd64/pmap.c,v
retrieving revision 1.70
diff -u -p -r1.70 pmap.c
--- amd64/pmap.c15 Jun 2014 11:43:24 -  1.70
+++ amd64/pmap.c3 Jul 2014 23:41:34 -
@@ -510,10 +510,9 @@ pmap_bootstrap(paddr_t first_avail, padd
 {
vaddr_t kva, kva_end, kva_start = VM_MIN_KERNEL_ADDRESS;
struct pmap *kpm;
-   int i;
+   size_t i, ndmpdp;
unsigned long p1i;
pt_entry_t pg_nx = (cpu_feature  CPUID_NXE? PG_NX : 0);
-   long ndmpdp;
paddr_t dmpd, 

clang compat for Makefile.amd64

2014-07-02 Thread Matthew Dempsky
Diff below allows building an amd64 kernel with CC=clang make.

Some random notes:

1. A bunch of the -Wno-foo warnings are probably worth fixing, but
it's not a high priority for me at the moment.

2. -Wframe-larger-than= is the name modern GCC and Clang have adopted,
and it probably wouldn't hurt to add it as an alias for base GCC's
-Wstack-larger-than- flag.

3. I think -ffreestanding instead of -fno-builtin-{foo,bar,quux,...}
may work with GCC too, but I haven't tested.  At least according to
the GCC 3.3.6 and 4.2.4 manuals, the only functions GCC will assume
exist are mem{cmp,cpy,move,set}().

4. The HOSTCC and HOSTED_XXX variables are present in all of our
Makefile.foo files, but they don't seem to be used for anything.  Is
there any point to keeping them around?  (If not, I can remove them
from all Makefile.foo's.)

5. amd64/conf/kern.ldscript is confusing.  It's not used, and I spent
a while the other day (while working on my ddb line numbers diff)
trying to figure out why changing it wasn't affecting my build output.
I'd like to remove it until someone proposes a real use for it.


Index: Makefile.amd64
===
RCS file: /home/matthew/anoncvs/cvs/src/sys/arch/amd64/conf/Makefile.amd64,v
retrieving revision 1.59
diff -u -p -r1.59 Makefile.amd64
--- Makefile.amd64  8 May 2014 17:59:28 -   1.59
+++ Makefile.amd64  2 Jul 2014 23:53:07 -
@@ -24,14 +24,24 @@ _archdir?=  $S/arch/${_arch}
 INCLUDES=  -nostdinc -I$S -I. -I$S/arch
 CPPFLAGS=  ${INCLUDES} ${IDENT} ${PARAM} -D_KERNEL -MD -MP
 CWARNFLAGS=-Werror -Wall -Wstrict-prototypes -Wmissing-prototypes \
-   -Wno-main -Wno-uninitialized \
-   -Wstack-larger-than-2047
+   -Wno-main -Wno-uninitialized
+.if ${CC} == clang
+CWARNFLAGS+=   -Wframe-larger-than=2047 -Wno-enum-conversion -Wno-unused \
+   -Wno-tautological-compare -Wno-pointer-sign \
+   -Wno-shift-overflow
+.else
+CWARNFLAGS+=   -Wstack-larger-than-2047
+.endif
 
 CMACHFLAGS=-mcmodel=kernel -mno-red-zone -mno-sse2 -mno-sse -mno-3dnow \
-   -mno-mmx -msoft-float -fno-omit-frame-pointer
+   -mno-mmx -msoft-float -fno-omit-frame-pointer ${NOPIE_FLAGS}
+.if ${CC} == clang
+CMACHFLAGS+=   -ffreestanding
+.else
 CMACHFLAGS+=   -fno-builtin-printf -fno-builtin-snprintf \
-fno-builtin-vsnprintf -fno-builtin-log \
-   -fno-builtin-log2 -fno-builtin-malloc ${NOPIE_FLAGS}
+   -fno-builtin-log2 -fno-builtin-malloc
+.endif
 .if ${IDENT:M-DNO_PROPOLICE}
 CMACHFLAGS+=   -fno-stack-protector
 .endif
@@ -49,11 +59,6 @@ DB_STRUCTINFO=   db_structinfo.h
 .else
 DB_STRUCTINFO=
 .endif
-
-HOSTCC?=   ${CC}
-HOSTED_CPPFLAGS=${CPPFLAGS:S/^-nostdinc$//}
-HOSTED_CFLAGS= ${CFLAGS}
-HOSTED_C=  ${HOSTCC} ${HOSTED_CFLAGS} ${HOSTED_CPPFLAGS} -c $
 
 NORMAL_C_NOP=  ${CC} ${CFLAGS} ${CPPFLAGS} -c $
 NORMAL_C=  ${CC} ${CFLAGS} ${CPPFLAGS} ${PROF} -c $



Re: clang compat for Makefile.amd64

2014-07-02 Thread Matthew Dempsky
On Wed, Jul 2, 2014 at 5:16 PM, Matthew Dempsky matt...@dempsky.org wrote:
 +.if ${CC} == clang
 +CMACHFLAGS+=   -ffreestanding

Sorry, Clang actually needs -fno-integrated-as too, otherwise the
integrated assembler complains about the bogus XYZZY instructions in
kern/genassym.sh.



identcpu for 1-GByte pages

2014-07-02 Thread Matthew Dempsky
According to the Intel 64 and IA-32 Architectures Software
Developer's Manual, CPUID.8001H:EDX.Page1GB [bit 26] indicates
whether 1-GByte pages are supported with IA-32e paging.

I think the diff below adds support for identifying this feature in
dmesg, but my X201s is seemingly to old to support it.

ok?

Index: amd64/include/specialreg.h
===
RCS file: /home/matthew/anoncvs/cvs/src/sys/arch/amd64/include/specialreg.h,v
retrieving revision 1.27
diff -w -u -p -r1.27 specialreg.h
--- amd64/include/specialreg.h  24 Aug 2013 04:26:16 -  1.27
+++ amd64/include/specialreg.h  3 Jul 2014 05:17:18 -
@@ -207,6 +207,7 @@
 #defineCPUID_NXE   0x0010  /* No-Execute Extension */
 #defineCPUID_MMXX  0x0040  /* AMD MMX Extensions */
 #defineCPUID_FFXSR 0x0200  /* fast FP/MMX save/restore */
+#defineCPUID_PAGE1GB   0x0400  /* 1-GByte pages */
 #defineCPUID_LONG  0x2000  /* long mode */
 #defineCPUID_3DNOW20x4000  /* 3DNow! Instruction Extension 
*/
 #defineCPUID_3DNOW 0x8000  /* 3DNow! Instructions */
Index: amd64/amd64/identcpu.c
===
RCS file: /home/matthew/anoncvs/cvs/src/sys/arch/amd64/amd64/identcpu.c,v
retrieving revision 1.52
diff -w -u -p -r1.52 identcpu.c
--- amd64/amd64/identcpu.c  19 Nov 2013 04:12:17 -  1.52
+++ amd64/amd64/identcpu.c  3 Jul 2014 05:19:54 -
@@ -96,6 +96,7 @@ const struct {
{ CPUID_NXE,NXE },
{ CPUID_MMXX,   MMXX },
{ CPUID_FFXSR,  FFXSR },
+   { CPUID_PAGE1GB,PAGE1GB },
{ CPUID_LONG,   LONG },
{ CPUID_3DNOW2, 3DNOW2 },
{ CPUID_3DNOW,  3DNOW }
Index: i386/include/specialreg.h
===
RCS file: /home/matthew/anoncvs/cvs/src/sys/arch/i386/include/specialreg.h,v
retrieving revision 1.46
diff -w -u -p -r1.46 specialreg.h
--- i386/include/specialreg.h   24 Aug 2013 04:26:16 -  1.46
+++ i386/include/specialreg.h   3 Jul 2014 05:20:23 -
@@ -206,6 +206,7 @@
 #defineCPUID_NXE   0x0010  /* No-Execute Extension */
 #defineCPUID_MMXX  0x0040  /* AMD MMX Extensions */
 #defineCPUID_FFXSR 0x0200  /* fast FP/MMX save/restore */
+#defineCPUID_PAGE1GB   0x0400  /* 1-GByte pages */
 #defineCPUID_LONG  0x2000  /* long mode */
 #defineCPUID_3DNOW20x4000  /* 3DNow! Instruction Extension 
*/
 #defineCPUID_3DNOW 0x8000  /* 3DNow! Instructions */
Index: i386/i386/machdep.c
===
RCS file: /home/matthew/anoncvs/cvs/src/sys/arch/i386/i386/machdep.c,v
retrieving revision 1.539
diff -w -u -p -r1.539 machdep.c
--- i386/i386/machdep.c 15 Jun 2014 11:43:24 -  1.539
+++ i386/i386/machdep.c 3 Jul 2014 05:20:56 -
@@ -1001,6 +1001,7 @@ const struct cpu_cpuid_feature i386_ecpu
{ CPUID_NXE,NXE },
{ CPUID_MMXX,   MMXX },
{ CPUID_FFXSR,  FFXSR },
+   { CPUID_PAGE1GB,PAGE1GB },
{ CPUID_LONG,   LONG },
{ CPUID_3DNOW2, 3DNOW2 },
{ CPUID_3DNOW,  3DNOW }



Replace void * in u{dv,vn}_attach()

2014-07-01 Thread Matthew Dempsky
udv_attach() and uvn_attach() are called directly, not via any generic
dispatch table mechanism, so there's no point in specifying that they
accept void * instead of the underlying types they actually expect.

ok?

Index: share/man/man9/uvm.9
===
RCS file: /home/matthew/cvs-mirror/cvs/src/share/man/man9/uvm.9,v
retrieving revision 1.55
diff -u -p -r1.55 uvm.9
--- share/man/man9/uvm.930 Jun 2014 21:48:09 -  1.55
+++ share/man/man9/uvm.91 Jul 2014 19:57:54 -
@@ -439,7 +439,7 @@ returns a standard errno.
 .Sh MEMORY MAPPING FILES AND DEVICES
 .nr nS 1
 .Ft struct uvm_object *
-.Fn uvn_attach void *arg vm_prot_t accessprot
+.Fn uvn_attach struct vnode *vp vm_prot_t accessprot
 .Ft void
 .Fn uvm_vnp_setsize struct vnode *vp voff_t newsize
 .Ft void
@@ -453,7 +453,7 @@ returns a standard errno.
 The
 .Fn uvn_attach
 function attaches a UVM object to vnode
-.Fa arg ,
+.Fa vp ,
 creating the object if necessary.
 The object is returned.
 .Pp
Index: sys/kern/exec_subr.c
===
RCS file: /home/matthew/cvs-mirror/cvs/src/sys/kern/exec_subr.c,v
retrieving revision 1.33
diff -u -p -r1.33 exec_subr.c
--- sys/kern/exec_subr.c29 May 2014 05:05:34 -  1.33
+++ sys/kern/exec_subr.c1 Jul 2014 19:54:15 -
@@ -187,7 +187,7 @@ vmcmd_map_pagedvn(struct proc *p, struct
 * first, attach to the object
 */
 
-   uobj = uvn_attach((void *)cmd-ev_vp, VM_PROT_READ|VM_PROT_EXECUTE);
+   uobj = uvn_attach(cmd-ev_vp, VM_PROT_READ|VM_PROT_EXECUTE);
if (uobj == NULL)
return (ENOMEM);
 
Index: sys/uvm/uvm_vnode.c
===
RCS file: /home/matthew/cvs-mirror/cvs/src/sys/uvm/uvm_vnode.c,v
retrieving revision 1.82
diff -u -p -r1.82 uvm_vnode.c
--- sys/uvm/uvm_vnode.c 8 May 2014 20:08:50 -   1.82
+++ sys/uvm/uvm_vnode.c 1 Jul 2014 19:53:41 -
@@ -139,9 +139,8 @@ uvn_init(void)
  *pointers are equiv.
  */
 struct uvm_object *
-uvn_attach(void *arg, vm_prot_t accessprot)
+uvn_attach(struct vnode *vp, vm_prot_t accessprot)
 {
-   struct vnode *vp = arg;
struct uvm_vnode *uvn = vp-v_uvm;
struct vattr vattr;
int oldflags, result;
Index: sys/uvm/uvm_vnode.h
===
RCS file: /home/matthew/cvs-mirror/cvs/src/sys/uvm/uvm_vnode.h,v
retrieving revision 1.12
diff -u -p -r1.12 uvm_vnode.h
--- sys/uvm/uvm_vnode.h 14 Mar 2002 01:27:19 -  1.12
+++ sys/uvm/uvm_vnode.h 1 Jul 2014 19:53:54 -
@@ -102,7 +102,7 @@ struct uvm_vnode {
  * include sys/vnode.h, and files that include sys/vnode.h don't know
  * what a vm_prot_t is.
  */
-struct uvm_object  *uvn_attach(void *, vm_prot_t);
+struct uvm_object  *uvn_attach(struct vnode *, vm_prot_t);
 #endif
 
 #endif /* _KERNEL */
Index: sys/uvm/uvm_device.c
===
RCS file: /home/matthew/cvs-mirror/cvs/src/sys/uvm/uvm_device.c,v
retrieving revision 1.45
diff -u -p -r1.45 uvm_device.c
--- sys/uvm/uvm_device.c13 Apr 2014 23:14:15 -  1.45
+++ sys/uvm/uvm_device.c1 Jul 2014 19:56:09 -
@@ -97,9 +97,8 @@ struct uvm_pagerops uvm_deviceops = {
  * The last two arguments (off and size) are only used for access checking.
  */
 struct uvm_object *
-udv_attach(void *arg, vm_prot_t accessprot, voff_t off, vsize_t size)
+udv_attach(dev_t device, vm_prot_t accessprot, voff_t off, vsize_t size)
 {
-   dev_t device = *((dev_t *)arg);
struct uvm_device *udv, *lcv;
paddr_t (*mapfn)(dev_t, off_t, int);
 #if NDRM  0
@@ -118,7 +117,7 @@ udv_attach(void *arg, vm_prot_t accesspr
return(NULL);
 
 #if NDRM  0
-   obj = udv_attach_drm(arg, accessprot, off, size);
+   obj = udv_attach_drm(device, accessprot, off, size);
if (obj)
return(obj);
 #endif
Index: sys/uvm/uvm_device.h
===
RCS file: /home/matthew/cvs-mirror/cvs/src/sys/uvm/uvm_device.h,v
retrieving revision 1.9
diff -u -p -r1.9 uvm_device.h
--- sys/uvm/uvm_device.h7 Jun 2013 20:46:14 -   1.9
+++ sys/uvm/uvm_device.h1 Jul 2014 19:55:42 -
@@ -70,8 +70,8 @@ struct uvm_device {
  * prototypes
  */
 
-struct uvm_object *udv_attach(void *, vm_prot_t, voff_t, vsize_t);
-struct uvm_object *udv_attach_drm(void *, vm_prot_t, voff_t, vsize_t);
+struct uvm_object *udv_attach(dev_t, vm_prot_t, voff_t, vsize_t);
+struct uvm_object *udv_attach_drm(dev_t, vm_prot_t, voff_t, vsize_t);
 
 #endif /* _KERNEL */
 
Index: sys/uvm/uvm_extern.h
===
RCS file: /home/matthew/cvs-mirror/cvs/src/sys/uvm/uvm_extern.h,v
retrieving revision 1.116
diff -u -p -r1.116 uvm_extern.h
--- sys/uvm/uvm_extern.h13 Jun 2014 

Re: Rename MAP_ANON to MAP_ANONYMOUS

2014-07-01 Thread Matthew Dempsky
On Mon, Jun 30, 2014 at 2:47 PM, Ingo Schwarze schwa...@usta.de wrote:
 so in a nutshell, mmap(2) was originally a BSD idea and first implemented
 in SunOS?  And there is no doubt that *BSD always had MAP_ANON and never
 MAP_ANONYMOUS and that SunOS primarily defines MAP_ANON and MAP_ANONYMOUS
 only for /* (source compatibility) */, right?  And that the earliest
 occurence of MAP_ANONYMOUS we found so far is Linux (1994)?  And that
 4.4BSD was released with MAP_ANON before that (1993)...

I've reaffirmed that OpenBSD's stance is that we'd prefer that
MAP_ANON be standardized, explained the commit history of MAP_ANON and
MAP_ANONYMOUS (thanks Ingo!), and also pointed out that Advanced
Programming in the UNIX Environment recommends MAP_ANON:
http://austingroupbugs.net/view.php?id=850#c2299

I'll update the thread if anything new develops.  Next Austin Group
teleconference is July 10.



Rename MAP_ANON to MAP_ANONYMOUS

2014-06-30 Thread Matthew Dempsky
I filed an enhancement request with the Austin Group to standardize an
mmap() flag for mapping anonymous memory.  I proposed standardizing
MAP_ANON, but the current proposal is to standardize MAP_ANONYMOUS
instead, as that seems to be the more common definition and usage:

http://austingroupbugs.net/view.php?id=850

All else equal, I prefer MAP_ANON as it's shorter and it's also the
original BSD name; but I prefer having a standardized name even more.
If anyone objects and has convincing arguments for why POSIX should
standardize MAP_ANON instead, feel free to either comment on the above
bug, or just express them here and I'll proxy them appropriately.

Otherwise, diff below renames MAP_ANON to MAP_ANONYMOUS, and redefines
MAP_ANON as an alias for MAP_ANONYMOUS.


Index: sys/sys/mman.h
===
RCS file: /home/matthew/cvs-mirror/cvs/src/sys/sys/mman.h,v
retrieving revision 1.25
diff -u -p -r1.25 mman.h
--- sys/sys/mman.h  27 Jun 2014 20:50:43 -  1.25
+++ sys/sys/mman.h  27 Jun 2014 21:35:47 -
@@ -56,7 +56,8 @@
  */
 #defineMAP_FIXED   0x0010  /* map addr must be exactly as 
requested */
 #define__MAP_NOREPLACE 0x0800  /* fail if address not available */
-#defineMAP_ANON0x1000  /* allocated from memory, swap space */
+#defineMAP_ANONYMOUS   0x1000  /* allocated from memory, swap space */
+#defineMAP_ANONMAP_ANONYMOUS   /* original BSD spelling */
 
 #defineMAP_FLAGMASK0x1ff7
 
Index: lib/libc/sys/mmap.2
===
RCS file: /home/matthew/cvs-mirror/cvs/src/lib/libc/sys/mmap.2,v
retrieving revision 1.48
diff -u -p -r1.48 mmap.2
--- lib/libc/sys/mmap.2 27 Jun 2014 22:51:20 -  1.48
+++ lib/libc/sys/mmap.2 28 Jun 2014 00:01:25 -
@@ -120,7 +120,7 @@ Sharing, mapping type, and options are s
 .Fa flags
 argument by OR'ing the following values.
 Exactly one of the first two values must be specified:
-.Bl -tag -width MAP_PRIVATE -offset indent
+.Bl -tag -width MAP_ANONYMOUS -offset indent
 .It Dv MAP_PRIVATE
 Modifications are private.
 .It Dv MAP_SHARED
@@ -128,11 +128,11 @@ Modifications are shared.
 .El
 .Pp
 Any combination of the following flags may additionally be used:
-.Bl -tag -width MAP_PRIVATE -offset indent
-.It Dv MAP_ANON
+.Bl -tag -width MAP_ANONYMOUS -offset indent
+.It Dv MAP_ANONYMOUS
 Map anonymous memory not associated with any specific file.
 The file descriptor used for creating
-.Dv MAP_ANON
+.Dv MAP_ANONYMOUS
 must currently be \-1 indicating no name is associated with the
 region.
 .It Dv MAP_FIXED
@@ -156,7 +156,16 @@ source compatibility with code written f
 but are not recommended for use in new
 .Ox
 code:
-.Bl -tag -width MAP_PRIVATE -offset indent
+.Bl -tag -width MAP_ANONYMOUS -offset indent
+.It Dv MAP_ANON
+Original name for
+.Dv MAP_ANONYMOUS ,
+as introduced by
+.Bx .
+On
+.Ox
+this is an alias for
+.Dv MAP_ANONYMOUS .
 .It Dv MAP_COPY
 Modifications are private and, unlike
 .Dv MAP_PRIVATE ,



Re: Rename MAP_ANON to MAP_ANONYMOUS

2014-06-30 Thread Matthew Dempsky
On Mon, Jun 30, 2014 at 10:42 AM, Mark Kettenis mark.kette...@xs4all.nl wrote:
 Solaris documents MAP_ANON in its man page, and defines MAP_ANONYMOUS
 as MAP_ANON for source compatibility.

Yep, but what about it?  Are you suggesting that should affect POSIX's
standardization, or that we should do the same thing?  I suspect if
POSIX standardizes MAP_ANONYMOUS, that Solaris would switch to
documenting MAP_ANONYMOUS and providing MAP_ANON for source compat,
no?



Re: Rename MAP_ANON to MAP_ANONYMOUS

2014-06-30 Thread Matthew Dempsky
On Mon, Jun 30, 2014 at 11:31 AM, Mark Kettenis mark.kette...@xs4all.nl wrote:
 Yes, I'm saying that this should affect POSIX's standardization.
 Solaris is where mmap(2) came from.

The full history is a bit more complicated though.  From what I've
managed to uncover over the past few days so far:

mmap() first appeared in 4.1cBSD [mmap.2] and was scheduled for
inclusion in 4.2BSD (1983) [UVM thesis, p36], but didn't actually
appear until the 1993 4.4BSD release [UVM thesis, p36].

In the mean time, it was independently reimplemented for SunOS 4 [UVM
thesis, p36], and released in 1988 [Wikipedia].  However, SunOS didn't
implement MAP_ANON, and instead required users to pass a file
descriptor for /dev/zero.

In 1994 (prior to Linux's 1.0 release), Linus added support for
MAP_ANONYMOUS 
[https://kernel.googlesource.com/pub/scm/linux/kernel/git/nico/archive/+/0b5e8609bf7e6899c1fea30aa467812d488b6c11%5E%21/#F3].

SunOS 5.8 (Feb 2000) added support for MAP_ANON
[http://www.freebsd.org/cgi/man.cgi?query=mmapapropos=0sektion=0manpath=SunOS+5.8arch=defaultformat=html].
I'm not sure when it added MAP_ANONYMOUS support.

Also according to Wikipedia, HP-UX is an SVR2 derivative and AIX is an
SVR3 derivative.  SVR2 and SVR3 were released in 1984 and 1986,
respectively, so neither should have inherited MAP_ANON/MAP_ANONYMOUS
from SVR/SunOS.  I'm not sure when they picked them up.

 Also, look at:

   https://github.com/sgminer-dev/sgminer/blob/master/m4/mmap-anon.m4

Yeah, I've seen that, though despite the comment pointing out MAP_ANON
is more common, it instead adds a #define MAP_ANONYMOUS MAP_ANON if
it notices MAP_ANONYMOUS isn't provided.  So ironically, that macro
package's widespread use is indicative of code that uses MAP_ANONYMOUS
instead of MAP_ANON.  :-/



Re: Rename MAP_ANON to MAP_ANONYMOUS

2014-06-30 Thread Matthew Dempsky
On Mon, Jun 30, 2014 at 2:47 PM, Ingo Schwarze schwa...@usta.de wrote:
 mmap() first appeared in 4.1cBSD [mmap.2] and was scheduled for
 inclusion in 4.2BSD (1983) [UVM thesis, p36], but didn't actually
 appear until the 1993 4.4BSD release [UVM thesis, p36].

 At least MAP_ANON is definitely a lot older than 1993.

 The original CSRG commit introducing it looks like this:

   sys/sys/SCCS/s.mman.h:
   As 00038/00013/00010
   Ad D 7.2 90/12/05 15:28:56 mckusick 6 5
   Ac update for new VM

 And that commit did make it into the 4.3BSD Net/2 release,
 published on August 20, 1991 (as seen on Kirk's CSRG archive CD).

Thanks.  I think that's generally consistent with the years I quoted
above, though I suppose the UVM thesis's actually appear wording is
arguable.

Maybe CDC was referring to how 4.3BSD was a development release,
whereas 4.4BSD was a production release?  (This release represents an
intermediate point in the development of 4.4BSD; [...] This
distribution is NOT intended to be used on production systems
http://ftp.netbsd.org/pub/NetBSD/misc/release/BSD/BSD-Net1)

 So i'd challenge anybody arguing for MAP_ANONYMOUS to show
 a published sys/mman.h header file containing that spelling
 where it was introduced before December 5, 1990.

I don't think MAP_ANONYMOUS is being proposed for standardization
because it's perceived to be of older origin than MAP_ANON or
anything.  I'm pretty sure the focus is instead because it's perceived
to have greater 'market share' among present day systems and
applications.



Re: Unnecessary mmap flags?

2014-06-27 Thread Matthew Dempsky
On Fri, Jun 27, 2014 at 02:46:01PM +0200, Mark Kettenis wrote:
 Losing the descriptions of the no-op flags is a bit unfortunate.
 Can you add those back?

Okay, restored them below.  Also tested that kdump can handle this
change gracefully.

In this diff I've also moved MAP_FILE down to the legacy flags
section.  I don't expect we'll want to remove it, but strictly
speaking it *is* just a legacy flag: POSIX doesn't define MAP_FILE and
instead just mandates the map from file semantics even without any
flag.

ok?

Index: sys/sys/mman.h
===
RCS file: /home/matthew/cvs-mirror/cvs/src/sys/sys/mman.h,v
retrieving revision 1.24
diff -u -p -r1.24 mman.h
--- sys/sys/mman.h  13 Jun 2014 01:48:52 -  1.24
+++ sys/sys/mman.h  27 Jun 2014 17:26:32 -
@@ -50,32 +50,49 @@
  */
 #defineMAP_SHARED  0x0001  /* share changes */
 #defineMAP_PRIVATE 0x0002  /* changes are private */
-#defineMAP_COPY0x0004  /* copy region at mmap time */
 
 /*
  * Other flags
  */
-#defineMAP_FIXED0x0010 /* map addr must be exactly as 
requested */
-#defineMAP_RENAME   0x0020 /* Sun: rename private pages to file */
-#defineMAP_NORESERVE0x0040 /* Sun: don't reserve needed swap area 
*/
-#defineMAP_INHERIT  0x0080 /* region is retained after exec */
-#defineMAP_NOEXTEND 0x0100 /* for MAP_FILE, don't change file size 
*/
-#defineMAP_HASSEMAPHORE 0x0200 /* region may contain semaphores */
-#defineMAP_TRYFIXED 0x0400 /* attempt hint address, even within 
heap */
+#defineMAP_FIXED   0x0010  /* map addr must be exactly as 
requested */
+#define__MAP_NOREPLACE 0x0800  /* fail if address not available */
+#defineMAP_ANON0x1000  /* allocated from memory, swap space */
 
-#define__MAP_NOREPLACE  0x0800 /* fail if address not available */
+#defineMAP_FLAGMASK0x1ff7
 
+#ifdef _KERNEL
 /*
- * Error return from mmap()
+ * Backwards compat for OpenBSD 5.5.
+ * TODO: Remove after OpenBSD 5.7 release.
  */
-#define MAP_FAILED ((void *)-1)
+#defineMAP_OLDCOPY 0x0004  /* alias for MAP_PRIVATE */
+#defineMAP_OLDRENAME   0x0020
+#defineMAP_OLDNORESERVE0x0040
+#defineMAP_OLDINHERIT  0x0080
+#defineMAP_OLDNOEXTEND 0x0100
+#defineMAP_OLDHASSEMAPHORE 0x0200
+#defineMAP_OLDTRYFIXED 0x0400
+#endif
 
+#ifndef _KERNEL
 /*
- * Mapping type
+ * Legacy defines for userland source compatibility.
+ * Can be removed once no longer needed in base and ports.
  */
-#defineMAP_FILE0x  /* map from file (default) */
-#defineMAP_ANON0x1000  /* allocated from memory, swap space */
-#defineMAP_FLAGMASK0x1ff7
+#defineMAP_COPYMAP_PRIVATE /* copy region at 
mmap time */
+#defineMAP_FILE0   /* map from file (default) */
+#defineMAP_HASSEMAPHORE0   /* region may contain 
semaphores */
+#defineMAP_INHERIT 0   /* region is retained after 
exec */
+#defineMAP_NOEXTEND0   /* for MAP_FILE, don't change 
file size */
+#defineMAP_NORESERVE   0   /* Sun: don't reserve needed 
swap area */
+#defineMAP_RENAME  0   /* Sun: rename private pages to 
file */
+#defineMAP_TRYFIXED0   /* attempt hint address, even 
within heap */
+#endif
+
+/*
+ * Error return from mmap()
+ */
+#define MAP_FAILED ((void *)-1)
 
 /*
  * POSIX memory advisory values.
Index: usr.bin/kdump/mksubr
===
RCS file: /home/matthew/cvs-mirror/cvs/src/usr.bin/kdump/mksubr,v
retrieving revision 1.18
diff -u -p -r1.18 mksubr
--- usr.bin/kdump/mksubr21 Dec 2013 07:32:35 -  1.18
+++ usr.bin/kdump/mksubr27 Jun 2014 17:08:46 -
@@ -241,7 +241,9 @@ cat _EOF_
 #include sys/fcntl.h
 #include sys/stat.h
 #include sys/unistd.h
+#define _KERNEL
 #include sys/mman.h
+#undef _KERNEL
 #include sys/wait.h
 #include sys/proc.h
 #define _KERNEL



Unnecessary mmap flags?

2014-06-26 Thread Matthew Dempsky
I just reviewed our mmap(2) flags to compare them against Linux,
FreeBSD, Solaris, and Darwin's flags.  Of the flags listed below, none
of them are specified by POSIX, and none of them do anything
interesting on OpenBSD: MAP_COPY just gets rewritten to MAP_PRIVATE,
and the rest are silently ignored by UVM.

Linux   FreeBSD Solaris Darwin
MAP_COPYno  YES*no  YES*
MAP_RENAME  no  YES*YES*YES*
MAP_NORESERVE   YES YES*YES YES*
MAP_INHERIT no  YES**   no  no
MAP_NOEXTENDno  no  no  YES*
MAP_HASSEMAPHOREno  YES***  no  YES***
MAP_TRYFIXEDno  no  no  no

* These are defined in the OS's sys/mman.h, but are undocumented in
their mmap(2) manual, and behave the same as on OpenBSD (i.e.,
MAP_COPY is an alias for MAP_PRIVATE; the others have no effect).

** MAP_INHERIT is documented in FreeBSD's mmap(2) manual (as This
flag never operated as advertised, which is true on OpenBSD too), but
not defined in their sys/mman.h.

*** MAP_HASSEMAPHORE is documented in FreeBSD and Darwin's mmap(2)
manuals and defined in their sys/mman.h, but has no effects on
either's VM behavior.


So MAP_NORESERVE is perhaps necessary/worth keeping around, but the
others seem like candidates for removal if nothing in ports needs
them.

MAP_HASSEMAPHORE is used in rthread_sem.c, but it doesn't do anything,
so I suspect it's just cargo culting based on man page misinformation?
Are there architectures that actually have restrictions on semaphore
memory?



Re: Unnecessary mmap flags?

2014-06-26 Thread Matthew Dempsky
On Thu, Jun 26, 2014 at 12:28:18PM -0700, Matthew Dempsky wrote:
 I just reviewed our mmap(2) flags to compare them against Linux,
 FreeBSD, Solaris, and Darwin's flags.  Of the flags listed below, none
 of them are specified by POSIX, and none of them do anything
 interesting on OpenBSD: MAP_COPY just gets rewritten to MAP_PRIVATE,
 and the rest are silently ignored by UVM.

Feedback so far is the useless flags should go away.  Diff below is a
first step towards this:

1. MAP_COPY is redefined as an alias for MAP_PRIVATE, and the other
useless MAP_* flags are redefined to 0.  They're also hidden from the
kernel to make sure no kernel code accidentally depends on them still.

2. Adds COMPAT_O55_MAP_COPY so we can stay binary compatible with any
OpenBSD 5.5 programs that still use MAP_COPY (probably none, but it's
trivial to do), and COMPAT_O55_MAP_NOOPMASK just to keep track of
which bits were previously reserved for do-nothing flags.

3. Reshuffles the defines a little bit so the order makes more sense.

Followup patch will add a deprecation warning for the MAP_* flags, but
I think that'll need some ports testing, whereas this should be safe
to commit now.

ok?

Index: sys/mman.h
===
RCS file: /home/matthew/cvs-mirror/cvs/src/sys/sys/mman.h,v
retrieving revision 1.24
diff -u -p -r1.24 mman.h
--- sys/mman.h  13 Jun 2014 01:48:52 -  1.24
+++ sys/mman.h  26 Jun 2014 23:54:28 -
@@ -50,32 +50,43 @@
  */
 #defineMAP_SHARED  0x0001  /* share changes */
 #defineMAP_PRIVATE 0x0002  /* changes are private */
-#defineMAP_COPY0x0004  /* copy region at mmap time */
+
+/*
+ * Mapping type
+ */
+#defineMAP_FILE0x  /* map from file (default) */
+#defineMAP_ANON0x1000  /* allocated from memory, swap space */
 
 /*
  * Other flags
  */
-#defineMAP_FIXED0x0010 /* map addr must be exactly as 
requested */
-#defineMAP_RENAME   0x0020 /* Sun: rename private pages to file */
-#defineMAP_NORESERVE0x0040 /* Sun: don't reserve needed swap area 
*/
-#defineMAP_INHERIT  0x0080 /* region is retained after exec */
-#defineMAP_NOEXTEND 0x0100 /* for MAP_FILE, don't change file size 
*/
-#defineMAP_HASSEMAPHORE 0x0200 /* region may contain semaphores */
-#defineMAP_TRYFIXED 0x0400 /* attempt hint address, even within 
heap */
+#defineMAP_FIXED   0x0010  /* map addr must be exactly as 
requested */
+#define__MAP_NOREPLACE 0x0800  /* fail if address not available */
 
-#define__MAP_NOREPLACE  0x0800 /* fail if address not available */
+#ifdef _KERNEL
+#define COMPAT_O55_MAP_COPY0x0004  /* alias for MAP_PRIVATE */
+#define COMPAT_O55_MAP_NOOPMASK0x07e0  /* formerly reserved flag bits 
*/
+#endif
+
+#defineMAP_FLAGMASK0x1ff7
 
+#ifndef _KERNEL
 /*
- * Error return from mmap()
+ * Deprecated flags with no significant meaning on OpenBSD.
  */
-#define MAP_FAILED ((void *)-1)
+#defineMAP_COPYMAP_PRIVATE
+#defineMAP_HASSEMAPHORE0
+#defineMAP_INHERIT 0
+#defineMAP_NOEXTEND0
+#defineMAP_NORESERVE   0
+#defineMAP_RENAME  0
+#defineMAP_TRYFIXED0
+#endif
 
 /*
- * Mapping type
+ * Error return from mmap()
  */
-#defineMAP_FILE0x  /* map from file (default) */
-#defineMAP_ANON0x1000  /* allocated from memory, swap space */
-#defineMAP_FLAGMASK0x1ff7
+#define MAP_FAILED ((void *)-1)
 
 /*
  * POSIX memory advisory values.
Index: uvm/uvm_mmap.c
===
RCS file: /home/matthew/cvs-mirror/cvs/src/sys/uvm/uvm_mmap.c,v
retrieving revision 1.94
diff -u -p -r1.94 uvm_mmap.c
--- uvm/uvm_mmap.c  13 Apr 2014 23:14:15 -  1.94
+++ uvm/uvm_mmap.c  26 Jun 2014 23:49:39 -
@@ -345,8 +345,8 @@ sys_mmap(struct proc *p, void *v, regist
return (EINVAL);
if ((flags  MAP_FLAGMASK) != flags)
return (EINVAL);
-   if (flags  MAP_COPY)
-   flags = (flags  ~MAP_COPY) | MAP_PRIVATE;
+   if (flags  COMPAT_O55_MAP_COPY)
+   flags = (flags  ~COMPAT_O55_MAP_COPY) | MAP_PRIVATE;
if ((flags  (MAP_SHARED|MAP_PRIVATE)) == (MAP_SHARED|MAP_PRIVATE))
return (EINVAL);
if ((flags  (MAP_FIXED|__MAP_NOREPLACE)) == __MAP_NOREPLACE)



POSIX-compliant page fault error codes

2014-06-24 Thread Matthew Dempsky
POSIX specifies these error cases for memory faults:

  SIGSEGV/SEGV_MAPERR: Accessing an unmapped page.

  SIGSEGV/SEGV_ACCERR: Reading from a non-readable or writing to a
  non-writable page. 

  SIGBUS/BUS_ADRERR: Accessing a mapped page that exceeds the end of
  the underlying mapped file.

I added a regress test at regress/sys/kern/siginfo-fault to cover
these cases, but unfortunately we're non-compliant in a few ways, and
fixing it is somewhat MD.  With the diff below, the tests pass on
amd64, but other platforms will need similar changes.

Currently VM_PAGER_BAD is only returned by pgo_get() in the case of
uvn_get() trying to access a page beyond the end of the file, so this
diff changes uvm_fault() to recognize this and return ENOSPC
(arbitrary unused error code) and then the MD trap() code needs to
know to map this error to BUS_ADRERR.

Additionally, some MD trap()s already know to map EACCES to
SEGV_ACCERR instead of SEGV_MAPERR, but amd64 wasn't one of them.  So
this diff fixes that too.


Index: uvm/uvm_fault.c
===
RCS file: /home/matthew/cvs-mirror/cvs/src/sys/uvm/uvm_fault.c,v
retrieving revision 1.73
diff -u -p -r1.73 uvm_fault.c
--- uvm/uvm_fault.c 8 May 2014 20:08:50 -   1.73
+++ uvm/uvm_fault.c 23 Jun 2014 21:29:24 -
@@ -1125,7 +1125,8 @@ Case2:
goto ReFault;
}
 
-   return (EACCES); /* XXX i/o error */
+   /* XXX i/o error */
+   return (result == VM_PAGER_BAD ? ENOSPC : EACCES);
}
 
/* re-verify the state of the world.  */
Index: arch/amd64/amd64/trap.c
===
RCS file: /home/matthew/cvs-mirror/cvs/src/sys/arch/amd64/amd64/trap.c,v
retrieving revision 1.40
diff -u -p -r1.40 trap.c
--- arch/amd64/amd64/trap.c 15 Jun 2014 11:43:24 -  1.40
+++ arch/amd64/amd64/trap.c 23 Jun 2014 21:38:31 -
@@ -387,9 +387,6 @@ faultcommon:
KERNEL_UNLOCK();
goto out;
}
-   if (error == EACCES) {
-   error = EFAULT;
-   }
 
if (type == T_PAGEFLT) {
if (pcb-pcb_onfault != 0) {
@@ -407,13 +404,23 @@ faultcommon:
sv.sival_ptr = (void *)fa;
trapsignal(p, SIGKILL, T_PAGEFLT, SEGV_MAPERR, sv);
} else {
+   int signo, code;
+   if (error == ENOSPC) {
+   signo = SIGBUS;
+   code = BUS_ADRERR;
+   } else {
+   signo = SIGSEGV;
+   code = (error == EACCES) ? SEGV_ACCERR :
+   SEGV_MAPERR;
+   }
 #ifdef TRAP_SIGDEBUG
-   printf(pid %d (%s): SEGV at rip %lx addr %lx\n,
-   p-p_pid, p-p_comm, frame-tf_rip, fa);
+   printf(pid %d (%s): %s at rip %lx addr %lx\n,
+   p-p_pid, p-p_comm, (signo == SIGBUS) ?
+   BUS : SEGV, frame-tf_rip, fa);
frame_dump(frame);
 #endif
sv.sival_ptr = (void *)fa;
-   trapsignal(p, SIGSEGV, T_PAGEFLT, SEGV_MAPERR, sv);
+   trapsignal(p, signo, T_PAGEFLT, code, sv);
}
KERNEL_UNLOCK();
break;



Re: POSIX-compliant page fault error codes

2014-06-24 Thread Matthew Dempsky
On Tue, Jun 24, 2014 at 11:04:10AM -0700, Matthew Dempsky wrote:
   SIGBUS/BUS_ADRERR: Accessing a mapped page that exceeds the end of
   the underlying mapped file.

Generating SIGBUS for this case has proven controversial due to
concern that this is Linux invented behavior and not compatible with
Solaris, so I decided to collect some more background information on
the subject.

- SunOS 4.1.3's mmap() manual specifies: Any reference to addresses
beyond the end of the object, however, will result in the delivery of
a SIGBUS signal. This wording was relaxed to SIGBUS or SIGSEGV in
SunOS 5.6 and remains in current manuals. (I'm not sure, but I suspect
this may be to simply reflect that memory protection violations take
priority over bounds checking.)

  SunOS 4.1.3: 
http://www.freebsd.org/cgi/man.cgi?query=mmapsektion=2manpath=SunOS+4.1.3
  SunOS 5.6: 
http://www.freebsd.org/cgi/man.cgi?query=mmapsektion=2manpath=SunOS+5.6
  Solaris 11: http://docs.oracle.com/cd/E23824_01/html/821-1463/mmap-2.html

- Many other SVR-derived OSes similarly document SIGBUS in their
mmap() manuals too:

  AIX: 
http://www-01.ibm.com/support/knowledgecenter/ssw_aix_53/com.ibm.aix.basetechref/doc/basetrf1/mmap.htm?lang=en
  HPUX: 
http://h20566.www2.hp.com/portal/site/hpsc/template.BINARYPORTLET/public/kb/docDisplay/resource.process/?spf_p.tpst=kbDocDisplay_ws_BIspf_p.rid_kbDocDisplay=docDisplayResURLjavax.portlet.begCacheTok=com.vignette.cachetokenspf_p.rst_kbDocDisplay=wsrp-resourceState%3DdocId%253Demr_na-c02261243-2%257CdocLocale%253Djavax.portlet.endCacheTok=com.vignette.cachetoken
  UnixWare: http://uw714doc.sco.com/en/man/html.2/mmap.2.html

- This behavior has been (awkwardly) specified for mmap() since SUSv2:
References within the address range starting at pa and continuing for
len bytes to whole pages following the end of an object shall result
in delivery of a SIGBUS signal. Later versions of POSIX have the same
wording.

  SUSv2: http://pubs.opengroup.org/onlinepubs/007908799/xsh/mmap.html
  POSIX.2001: http://pubs.opengroup.org/onlinepubs/009695399/functions/mmap.html
  POSIX.2008: 
http://pubs.opengroup.org/onlinepubs/9699919799/functions/mmap.html

- More generally, POSIX explains the SIGBUS/SIGSEGV distinction
thusly: When an object is mapped, various application accesses to the
mapped region may result in signals. In this context, SIGBUS is used
to indicate an error using the mapped object, and SIGSEGV is used to
indicate a protection violation or misuse of an address. Specific
examples are provided too:

  Memory Protection: 
http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_08_03_03



ddb: ELF line decoding in stack traces

2014-06-23 Thread Matthew Dempsky
Diff below implements file/line number decoding in ddb for ELF; e.g.,

ddb{2} trace
Debugger() at Debugger+0x9 [../../../../arch/amd64/amd64/db_interface.c:406]
ddb_sysctl() at ddb_sysctl+0x1b4 [../../../../ddb/db_usrreq.c:104]
sys___sysctl() at sys___sysctl+0x214 [../../../../kern/kern_sysctl.c:229]
syscall() at syscall+0x297 [../../../../sys/syscall_mi.h:84]
--- syscall (number 202) ---
end of kernel
end trace frame: 0x7f7e1def, count: -4
acpi_pdirpa+0x41195a:
ddb{2} 

It works by modifying boot(8) to copy the .debug_line DWARF section
into memory when present (same as it does currently for symbol and
string table sections), and then ddb's ELF symbol format handler code
makes use of it to translate program counters into file/line pairs
just like addr2line would.

For line decoding to work, the kernel must have been compiled with -g.
Normal and/or stripped kernels continue to work as they do now, just
won't produce file/line information.

Only lightly tested on amd64 so far, but might work on other platforms
too.  If you're brave, you can try it out thusly:

  1. Apply patch.
  2. Build boot(8), copy to /boot, and run installboot(8).
  3. Configure kernel with 'makeoptions DEBUG=-g'.
  4. Build kernel and install bsd.gdb (not bsd!).
  5. Reboot.

I did this mostly for fun / personal enlightenment, so I haven't put
too much effort into polishing it yet.


Index: ddb/db_elf.c
===
RCS file: /home/matthew/anoncvs/cvs/src/sys/ddb/db_elf.c,v
retrieving revision 1.10
diff -u -p -r1.10 db_elf.c
--- ddb/db_elf.c16 Mar 2014 20:31:46 -  1.10
+++ ddb/db_elf.c23 Jun 2014 12:27:26 -
@@ -46,6 +46,7 @@
 #include sys/exec_elf.h
 
 static char *db_elf_find_strtab(db_symtab_t *);
+static char *db_elf_find_linetab(db_symtab_t *, size_t *);
 
 #defineSTAB_TO_SYMSTART(stab)  ((Elf_Sym *)((stab)-start))
 #defineSTAB_TO_SYMEND(stab)((Elf_Sym *)((stab)-end))
@@ -111,10 +112,10 @@ db_elf_sym_init(int symsize, void *symta
 *  . . .
 *  . . .
 *  last section header
-*  first symbol or string table section
+*  first symbol, string, or line table section
 *  . . .
 *  . . .
-*  last symbol or string table section
+*  last symbol, string, or line table section
 */
 
/*
@@ -233,6 +234,29 @@ db_elf_find_strtab(db_symtab_t *stab)
 }
 
 /*
+ * Internal helper function - return a pointer to the line table
+ * for the current symbol table.
+ */
+static char *
+db_elf_find_linetab(db_symtab_t *stab, size_t *size)
+{
+   Elf_Ehdr *elf = STAB_TO_EHDR(stab);
+   Elf_Shdr *shp = STAB_TO_SHDR(stab, elf);
+   char *shstrtab;
+   int i;
+
+   shstrtab = (char *)elf + shp[elf-e_shstrndx].sh_offset;
+   for (i = 0; i  elf-e_shnum; i++) {
+   if (strcmp(.debug_line, shstrtab+shp[i].sh_name) == 0) {
+   *size = shp[i].sh_size;
+   return ((char *)elf + shp[i].sh_offset);
+   }
+   }
+
+   return (NULL);
+}
+
+/*
  * Lookup the symbol with the given name.
  */
 db_sym_t
@@ -343,6 +367,8 @@ db_elf_symbol_values(db_symtab_t *symtab
*valuep = symp-st_value;
 }
 
+#include db_elf_dwarf.c
+
 /*
  * Return the file and line number of the current program counter
  * if we can find the appropriate debugging symbol.
@@ -351,11 +377,15 @@ boolean_t
 db_elf_line_at_pc(db_symtab_t *symtab, db_sym_t cursym, char **filename,
 int *linenum, db_expr_t off)
 {
+   const char *linetab;
+   size_t linetab_size;
 
-   /*
-* XXX We don't support this (yet).
-*/
-   return (FALSE);
+   linetab = db_elf_find_linetab(symtab, linetab_size);
+   if (linetab == NULL)
+   return (FALSE);
+
+   return (db_elf_dwarf_line_at_pc(linetab, linetab_size, off,
+   filename, linenum));
 }
 
 /*
Index: ddb/db_elf_dwarf.c
===
RCS file: ddb/db_elf_dwarf.c
diff -N ddb/db_elf_dwarf.c
--- /dev/null   1 Jan 1970 00:00:00 -
+++ ddb/db_elf_dwarf.c  23 Jun 2014 13:37:34 -
@@ -0,0 +1,404 @@
+/* $OpenBSD$*/
+/*
+ * Copyright (c) 2014 Matthew Dempsky matt...@dempsky.org
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS

Re: gcc: warn about overly aligned stack variables

2014-06-23 Thread Matthew Dempsky
On Mon, Jun 23, 2014 at 9:59 AM, Miod Vallat m...@online.fr wrote:
 I love this. I have verified that kernels for all our gcc4 platforms
 still build with this diff.

Awesome!  I also heard back from guenther@ that he completed an i386
base build without warnings too.

Do we want to test this any further (e.g., ports), or should I go
ahead and commit?  If the latter, anyone want to step up to ok it? :)



Re: procfs to die [was: CVS: cvs.openbsd.org: src]

2014-06-22 Thread Matthew Dempsky
I suspect procfs is only enabled on i386 because that's the only arch
with compat_linux support?  If so, anyone who relies on compat_linux
support should be sure to test and report back if they have problems.

On Sun, Jun 22, 2014 at 2:22 PM, Philip Guenther guent...@gmail.com wrote:
 If you're currently using procfs, please respond with exactly what parts of
 it are actually needed by the programs you run.  Without a good reason to
 keep it, it'll be deleted, as I broke listing of /proc back in January and
 no one has noticed, suggesting that no one is actually using it...

 Philip Guenther


 -- Forwarded message --
 From: Philip Guenther guent...@cvs.openbsd.org
 Date: Sun, Jun 22, 2014 at 2:15 PM
 Subject: CVS: cvs.openbsd.org: src
 To: source-chan...@cvs.openbsd.org


 CVSROOT:/cvs
 Module name:src
 Changes by: guent...@cvs.openbsd.org2014/06/22 15:15:51

 Modified files:
 sys/arch/i386/conf: GENERIC

 Log message:
 PROCFS has been broken for months without complaints, so stop building it

 suggested by sthen@



gcc: warn about overly aligned stack variables

2014-06-20 Thread Matthew Dempsky
GCC supports an aligned attribute to specify a minimum alignment for
types/objects.  However, if an object is allocated on the stack and
its alignment exceeds the preferred stack boundary, then GCC 4.2
silently ignores the alignment.

This bit us 4 years ago when the SCSI stack started allocating mutexes
on the stack: HPPA mutexes need to be 16-byte aligned, but HPPA's
stack is naturally only 8-byte aligned.

Since newer GCC properly support overly aligned stack allocations, it
seems prudent to at least make base GCC emit a warning if its going to
ignore an alignment request.

With the diff below, compiling a source file like this:

typedef int __attribute__((aligned(512))) aligned_int;

aligned_int good;

void foo() {
aligned_int bad;
}

now produces a warning like this:

$ cc -c test.c
test.c: In function 'foo':
test.c:6: warning: ignoring alignment for stack allocated 'bad'

I verified this doesn't break an amd64 kernel build, but I haven't had
time to look beyond that.  Sharing in case anyone's interested and/or
wants to test further themselves.

Index: gcc/cfgexpand.c
===
RCS file: /home/matthew/cvs-mirror/cvs/src/gnu/gcc/gcc/cfgexpand.c,v
retrieving revision 1.4
diff -u -p -r1.4 cfgexpand.c
--- gcc/cfgexpand.c 6 May 2014 23:32:34 -   1.4
+++ gcc/cfgexpand.c 20 Jun 2014 22:55:53 -
@@ -159,8 +159,10 @@ get_decl_align_unit (tree decl)
 
   align = DECL_ALIGN (decl);
   align = LOCAL_ALIGNMENT (TREE_TYPE (decl), align);
-  if (align  PREFERRED_STACK_BOUNDARY)
+  if (align  PREFERRED_STACK_BOUNDARY) {
+warning (0, ignoring alignment for stack allocated %q+D, decl);
 align = PREFERRED_STACK_BOUNDARY;
+  }
   if (cfun-stack_alignment_needed  align)
 cfun-stack_alignment_needed = align;
 



Re: exec_elf.c: mistake ?

2014-06-16 Thread Matthew Dempsky
Reading through exec_elf.c, I just noticed the uninitialized bdiff
variable myself, and remembered this thread.

Tangentially, the code for worrying about zero-filling the last page
is overzealous.  We only need to zero-fill the page if memsz  filesz
(accounting for alignment and page boundaries).  In the common case
(at least on amd64), we always have either memsz==filesz (most
segments) or filesz==0 (segment for .bss), so we shouldn't need
zero-filling.

And actually, I think the logic for only doing zero-fill for writable
segments is wrong.  It looks to me like the ELF gABI specifies
zero-fill semantics for memsz  filesz even for read-only segments.

On Sat, Oct 5, 2013 at 4:27 PM, Philip Guenther guent...@gmail.com wrote:
 On Thu, 15 Aug 2013, Maxime Villard wrote:
 Hum hum, after investigating a bit, and from what I understood - if I'm
 not mistaken, I think it would make sense if bdiff was set to

   bdiff = ph-p_vaddr - trunc_page(ph-p_vaddr);

 Yep.

 Since we are supposed to align only on 2^n boundaries, if we get 0 or 1
 we do not align on p_align. But p_vaddr still has to be aligned on
 PAGE_MASK (with trunc_page()); I read somewhere that ELF loaders are
 smart enough to adjust the address when it does not exactly fit a page
 boundary. So bdiff should be the difference with the original p_vaddr.
 Actually, bdiff is already set to this value in the other conditions.

 There's another problem, 'base' should also be initialized here. I would
 say that is should be set to the truncated p_vaddr plus the address at
 which we want to load:

   base = *addr + trunc_page(ph-p_vaddr);

 but I'm not sure at all.

 By the logic of the ph-p_align  1 case, it should be
 base = *addr + trunc_page(ph-p_vaddr) - ph-p_vaddr;

 (I think the only reason the 'if' is necessary is that p_align==0 must be
 treated the same as p_align==1.  The ELF_TRUNC() macro handles an
 alignment of 1 correctly, but it'll barf on 0.)


 Philip Guenther




Re: wrong semantic type in kern_descrip.c

2014-06-15 Thread Matthew Dempsky
On Sun, Jun 15, 2014 at 1:20 PM, Jean-Philippe Ouellet
jean-phili...@ouellet.biz wrote:
 Those 4 memcpy()s are copying the things referenced by the old
 filedesc to the new one.  The things being copied are file*s,
 not file**s. They're the same size anyway, but still...

Technically, C99 grants implementations leeway to use different
representations for pointers to structure type and pointers to
other types (C99 6.2.5 P26), though I can't imagine implementations
where this would actually come up, and OpenBSD certainly doesn't
support any.

Committed, thanks!



Re: 9p

2014-06-04 Thread Matthew Dempsky
On Wed, Jun 4, 2014 at 2:00 PM, M Farkas-Dyck strake...@gmail.com wrote:
 Would a rwlock do? The sender and recver operate asynchronously, so
 the sender needs to hold a lock while sending and release it when
 asleep, but it can't be a mutex as the send operation may sleep, so I
 used requ.ready as the lock, but a rwlock seems appropriate.

msleep() unlocks the mutex after the process is placed on the sleep
queue, and then relocks the mutex after being woken up.  I would
expect that should be sufficient for your use case, but I haven't
looked close enough.

Style nits: Please keep source files ASCII; i.e., Copyright (C)
instead of the Unicode copyright character.  Also, look at man 9
style; in particular, OpenBSD doesn't put spaces around - and we
keep lines under 80 characters long.

Oh, and no static functions in the kernel.  They cause problems with ddb.



Re: recvmsg, fd passing and soreceive

2014-05-14 Thread Matthew Dempsky
On Wed, May 14, 2014 at 4:58 AM, Jérémie Courrèges-Anglas
j...@wxcvbn.org wrote:
 if (cmsg-cmsg_len == CMSG_LEN(sizeof(int)) 
 cmsg-cmsg_level == SOL_SOCKET 
 cmsg-cmsg_type == SCM_RIGHTS) {
 passed_fd = *(int *)CMSG_DATA(cmsg);

In your test program you only ever send one FD, so this is fine; but
in general multiple FDs can be sent at a time, and you'll receive a
single SOL_SOCKET/SCM_RIGHTS control message with an array of N ints.

E.g., see 
https://code.google.com/p/chromium/codesearch#chromium/src/base/posix/unix_domain_socket_linux.ccl=130.

 So the two issues show up when you want to call sendmsg and recvmsg
 with no data at all, but only control messages.  sendmsg allocates an
 empty mbuf for data (m_len == 0), this mbuf is not discarded and freed
 by soreceive for SOCK_STREAM sockets, and stays in the socket buffer.

 I understand that this API was probably not designed to pass only
 control messages, and no data at all.  But it is easily fixable, and the
 manpage could provide portability information.

Blah, I'm not really a fan of allowing sending zero-length data
messages.  It makes it tricky for receivers to tell if they actually
received an EOF or not.

But if we're going to allow it (which we do currently it seems), then
your diff seems like a reasonable fix to me.  mbufs aren't my
specialty though. =/



Re: malloc in libssl/src/apps

2014-05-05 Thread Matthew Dempsky
On Mon, May 5, 2014 at 3:56 PM, Alexander Hall alexan...@beard.se wrote:
 I believe a similar situation could appear with not explicitly initialized
 global or static declarations, e.g. in
 sbin/fsirand/fsirand.c:

 fsirand(char *device)
 {
 ...
 static char *inodebuf;

This is safe too: C requires that pointer variables with static
storage duration like this be initialized to a null pointer (C99
section 6.7.8 paragraph 10), not to a sequence of 0 bytes.



Re: Question and regression test for strftime adn wcsftime

2014-04-22 Thread Matthew Dempsky
On Tue, Apr 22, 2014 at 2:43 PM, Stefan Sperling s...@openbsd.org wrote:
 Your regression test has at least one bug ('bad' is never initialised).

It's perhaps bad *style* to not explicitly initialize it, but C99
6.7.8p10 says If an object that has static storage duration is not
initialized explicitly, then: [...] if it has arithmetic type, it is
initialized to (positive or unsigned) zero.  So technically not a
bug. :)



Re: dhclient support for /32 assignments

2013-12-09 Thread Matthew Dempsky
On Tue, Dec 3, 2013 at 4:15 PM, Matthew Dempsky matt...@dempsky.org wrote:
 The patch below extends dhclient to mimic this logic from ISC DHCP's
 linux script:

 if [ x$new_subnet_mask = x255.255.255.255 ] ; then
   route add -host $router dev $interface
 fi
 route add default gw $router $metric_arg dev $interface

Just for the archive's benefit, I was able to track this down to this
RELNOTES entry in ISC DHCP:

- The dhclient-script was updated to create a host route for the default
  gateway if the supplied subnet mask for an IPv4 address was a /32.  This
  allows the client to work in 'captive' network environments, where the
  operator does not want clients to crosstalk directly.

listed under Changes since 4.0.0 (new features), which gives a bit
more context for the change.  I haven't found anything else
interesting though, and I've given up hope of finding anything.



Re: dhclient support for /32 assignments

2013-12-04 Thread Matthew Dempsky
On Tue, Dec 03, 2013 at 11:48:05PM -0500, Kenneth Westerback wrote:
 Rfc 3442 is what I referred to.

I don't think RFC 3442 discusses what to do with /32 IP address
assignments though?

Anyway, below is a revised diff that does the same direct-route magic
for all gateway IPs, not just the default gateway IP.  I'll try this
out later today to make sure it still works with Compute Engine.

Assuming it works as intended, ok?

Index: dhclient.c
===
RCS file: /cvs/src/sbin/dhclient/dhclient.c,v
retrieving revision 1.268
diff -u -p -r1.268 dhclient.c
--- dhclient.c  20 Nov 2013 17:22:46 -  1.268
+++ dhclient.c  4 Dec 2013 18:45:21 -
@@ -107,9 +107,11 @@ struct client_lease *clone_lease(struct 
 voidsocket_nonblockmode(int);
 voidapply_ignore_list(char *);
 
-void add_default_route(int, struct in_addr, struct in_addr);
-void add_static_routes(int, struct option_data *);
-void add_classless_static_routes(int, struct option_data *);
+void add_default_route(int, struct in_addr, struct in_addr, struct in_addr);
+void add_static_routes(int, struct in_addr, struct in_addr,
+struct option_data *);
+void add_classless_static_routes(int, struct in_addr, struct in_addr,
+struct option_data *);
 
 int compare_lease(struct client_lease *, struct client_lease *);
 
@@ -871,8 +873,8 @@ bind_lease(void)
 */
add_address(ifi-name, ifi-rdomain, client-new-address, mask);
if (options[DHO_CLASSLESS_STATIC_ROUTES].len) {
-   add_classless_static_routes(ifi-rdomain,
-   options[DHO_CLASSLESS_STATIC_ROUTES]);
+   add_classless_static_routes(ifi-rdomain, client-new-address,
+   mask, options[DHO_CLASSLESS_STATIC_ROUTES]);
} else {
if (options[DHO_ROUTERS].len) {
memset(gateway, 0, sizeof(gateway));
@@ -880,11 +882,11 @@ bind_lease(void)
memcpy(gateway.s_addr, options[DHO_ROUTERS].data,
options[DHO_ROUTERS].len);
add_default_route(ifi-rdomain, client-new-address,
-   gateway);
+   mask, gateway);
}
if (options[DHO_STATIC_ROUTES].len)
-   add_static_routes(ifi-rdomain,
-   options[DHO_STATIC_ROUTES]);
+   add_static_routes(ifi-rdomain, client-new-address,
+   mask, options[DHO_STATIC_ROUTES]);
}
 
client-new-resolv_conf = resolv_conf_contents(
@@ -2379,6 +2381,23 @@ priv_write_file(struct imsg_write_file *
 }
 
 /*
+ * If we were given a /32 IP address assignment, then make sure the
+ * gateway address is routable with the equivalent of
+ *
+ * route add -net $gw -netmask 255.255.255.255 -cloning -iface $addr
+ */
+void
+ensure_direct_route(int rdomain, struct in_addr addr, struct in_addr addrmask,
+struct in_addr gateway)
+{
+   if (addrmask.s_addr == INADDR_BROADCAST) {
+   add_route(rdomain, gateway, addrmask, addr,
+   RTA_DST | RTA_NETMASK | RTA_GATEWAY,
+   RTF_CLONING | RTF_STATIC);
+   }
+}
+
+/*
  * add_default_route is the equivalent of
  *
  * route -q $rdomain add default -iface $router
@@ -2388,11 +2407,14 @@ priv_write_file(struct imsg_write_file *
  * route -q $rdomain add default $router
  */
 void
-add_default_route(int rdomain, struct in_addr addr, struct in_addr gateway)
+add_default_route(int rdomain, struct in_addr addr, struct in_addr addrmask,
+struct in_addr gateway)
 {
struct in_addr netmask, dest;
int addrs, flags;
 
+   ensure_direct_route(rdomain, addr, addrmask, gateway);
+
memset(netmask, 0, sizeof(netmask));
memset(dest, 0, sizeof(dest));
addrs = RTA_DST | RTA_NETMASK;
@@ -2412,23 +2434,26 @@ add_default_route(int rdomain, struct in
 }
 
 void
-add_static_routes(int rdomain, struct option_data *static_routes)
+add_static_routes(int rdomain, struct in_addr addr, struct in_addr addrmask,
+struct option_data *static_routes)
 {
struct in_addr   dest, netmask, gateway;
-   u_int8_t *addr;
+   u_int8_t *data;
int  i;
 
memset(netmask, 0, sizeof(netmask));   /* Not used for CLASSFULL! */
 
for (i = 0; (i + 7)  static_routes-len; i += 8) {
-   addr = static_routes-data[i];
+   data = static_routes-data[i];
memset(dest, 0, sizeof(dest));
memset(gateway, 0, sizeof(gateway));
 
-   memcpy(dest.s_addr, addr, 4);
+   memcpy(dest.s_addr, data, 4);
if (dest.s_addr == INADDR_ANY)
continue; /* RFC 2132 says 0.0.0.0 is not allowed. */
-   memcpy(gateway.s_addr, addr+4, 4);
+   

Re: dhclient support for /32 assignments

2013-12-04 Thread Matthew Dempsky
On Wed, Dec 04, 2013 at 02:10:21PM -0500, Kenneth R Westerback wrote:
 No, that was my point. i.e. don't avoid adding the route when given
 a /32 address just because class static routes are also present.

I think there might be a misunderstanding, so let me back up and try
to clarify. :)

Compute Engine gives out DHCP leases like ip: 10.240.77.88, netmask:
255.255.255.255, gateway: 10.240.0.1.  The problem is 10.240.0.1
isn't routable given a 10.240.77.88/32 IP address.

ISC DHCP on Linux handles this by special casing netmask ==
255.255.255.255, and adding an extra direct route for the gateway IP.
E.g., for the above assignment, ISC DHCP would run route add
10.240.0.1 dev eth0 which tells Linux's network stack 10.240.0.1 is
directly accessible via eth0, even though it's not part of the
10.240.77.88/32 subnet.  (It would then run route add default
10.240.0.1 as per normal.)

As far as I can tell, there's no authoritative/standard document that
describes this special case, and ISC DHCP only applies this special
case to the default gateway IP specified by DHO_ROUTERS.  Notably, it
does *not* apply this direct routing logic for gateway IPs specified
by DHO_STATIC_ROUTES and/or DHO_CLASSLESS_STATIC_ROUTES, but it's
unclear whether that's intentional or accidental.

 Not from me.  This is way bigger that the last three line diff, and
 I have insufficient routing foo to comment on the usefulness of
 all these routes.

The original three line diff was so short because it mimicked ISC DHCP
and only special cased the default gateway (i.e., DHO_ROUTERS), and
only did so when DHO_ROUTERS was actually used (i.e., when
DHO_CLASSLESS_STATIC_ROUTES is not specified).

You suggested we should unconditionally add the route, but it was
unclear to me whether you meant:

  1. We should always apply the special case logic for adding a
 direct route for the default gateway specified by DHO_ROUTERS, even
 if DHO_CLASSLESS_STATIC_ROUTES is specified; or

  2. We should apply the special case logic to add direct routes for
 all gateways (i.e., those specified by DHO_CLASSLESS_STATIC_ROUTES
 and DHO_STATIC_ROUTES too), not just the default gateway.

Interpretation #1 didn't seem right, because if
DHO_CLASSLESS_STATIC_ROUTES is specified, then DHO_ROUTERS isn't used
at all, so it might not be meaningful to add a direct route for
DHO_ROUTERS's gateway.

Interpretation #2 seems reasonable (though deviating from ISC DHCP's
undocumented behavior), so that's what the second patch implemented.
But that's necessarily more code than the original diff, because we
need to repeat the special case logic for each gateway IP as they're
extracted from the DHCP options.

The change is somewhat bigger and more invasive, but I think not that
much more complex:

  1. Add a function ensure_direct_route() function that applies ISC
 DHCP's /32 logic for a given gateway IP address (same code as from
 first patch).

  2. Call ensure_direct_route() in each of add_default_route(),
 add_static_routes(), and add_classless_static_routes() for each
 gateway IP address.

  3. Add extra 'addr' and 'addrmask' parameters to each of the
 add_*_routes() functions to pass the extra data needed for
 ensure_direct_route(), and fix call sites appropriately.

  4. In add_static_routes(), rename the local 'addr' variable that
 points into option data to 'data' to avoid conflicting with the
 new 'addr' parameter.



Re: select.2: update includes

2013-12-03 Thread Matthew Dempsky
On Tue, Dec 3, 2013 at 12:08 PM, Christian Weisgerber
na...@mips.inka.de wrote:
 ok?

ok matthew

 +.Fd #include sys/select.h

Worth using .In instead while you're at it?



Re: select.2: update includes

2013-12-03 Thread Matthew Dempsky
On Tue, Dec 3, 2013 at 12:37 PM, Philip Guenther guent...@gmail.com wrote:
 string.h needs to stay until FD_ZERO() and FD_COPY() are changed to
 not use memset()/memcpy().

Good point.

Would something like this work?

#define FD_COPY(f, t)   (*(fd_set *)(t) = *(const fd_set *)(f))
static const fd_set __fd_zero_set;
#define FD_ZERO(p)  FD_COPY(__fd_zero_set, p)

Downside is we lose const warnings.  We could get those back if we
change FD_COPY into an inline function instead of a macro, or do some
GNU block-expression trickery.

Or we can just declare memset/memcpy in sys/select.h too?



Re: select.2: update includes

2013-12-03 Thread Matthew Dempsky
On Tue, Dec 3, 2013 at 1:55 PM, Christian Weisgerber na...@mips.inka.de wrote:
 +#if __BSD_VISIBLE
 +#defineFD_COPY(f, t)   (void)(*(t) = *(f))
 +#endif
 +#defineFD_ZERO(p) do   \
 +   fd_set *_p = (p);   \
 +   __size_t _n = __howmany(FD_SETSIZE, __NFDBITS); \
 +   \
 +   while (_n  0)  \
 +   _p-fds_bits[--_n] = 0; \
 +} while (0)

I think you're missing a { after the do. :)

But otherwise looks ok to me.



Re: select.2: update includes

2013-12-03 Thread Matthew Dempsky
On Tue, Dec 3, 2013 at 1:39 PM, Philip Guenther guent...@gmail.com wrote:
 What problem does the casts solve?

I wasn't sure if people might be calling FD_COPY()/FD_ZERO() with
void* or char* typed arguments (or other custom sized types).  If we
can assume they'll only pass fd_set* arguments, then they could be
done away with.

 (And actually, as noted in naddy's cite from FreeBSD, FD_COPY() is a
 non-standard BSDism, so it can just continue to assume the caller
 pulled in string.h)

Sounds good to me.

 That'll put a __fd_zero_set in each .o that references FD_ZERO(), ick.

Oops, yeah, I was thinking they'd be collapsed into a single const
value in the output object, but I forgot they'd need to be kept
separate for object identity.



dhclient support for /32 assignments

2013-12-03 Thread Matthew Dempsky
The patch below extends dhclient to mimic this logic from ISC DHCP's
linux script:

if [ x$new_subnet_mask = x255.255.255.255 ] ; then
  route add -host $router dev $interface
fi
route add default gw $router $metric_arg dev $interface

With this change, dhclient is able to successfully configure the
network via DHCP on Compute Engine:

$ ifconfig vio0
vio0: flags=8b43UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST mtu 
1500
lladdr 42:01:0a:f0:8f:56
priority: 0
groups: egress
media: Ethernet autoselect
status: active
inet6 fe80::4001:aff:fef0:8f56%vio0 prefixlen 64 scopeid 0x1
inet 10.240.143.86 netmask 0x
$ netstat -r -n -f inet
Routing tables

Internet:
DestinationGatewayFlags   Refs  Use   Mtu  Prio Iface
default10.240.0.1 UGS1  183 - 8 vio0 
10.240.0.1 42:01:0a:f0:00:01  UHLc   10 -56 vio0 
10.240.0.1/32  link#1 UC 10 -56 vio0 
10.240.143.86/32   link#1 UC 00 - 4 vio0 
127/8  127.0.0.1  UGRS   00 33144 8 lo0  
127.0.0.1  127.0.0.1  UH 10 33144 4 lo0  
224/4  127.0.0.1  URS00 33144 8 lo0  

ok?

Index: dhclient.c
===
RCS file: /cvs/src/sbin/dhclient/dhclient.c,v
retrieving revision 1.268
diff -u -p -r1.268 dhclient.c
--- dhclient.c  20 Nov 2013 17:22:46 -  1.268
+++ dhclient.c  4 Dec 2013 00:06:08 -
@@ -879,6 +879,21 @@ bind_lease(void)
/* XXX Only use FIRST router address for now. */
memcpy(gateway.s_addr, options[DHO_ROUTERS].data,
options[DHO_ROUTERS].len);
+
+   /*
+* If we were given a /32 IP assignment, then make sure
+* the gateway address is routable with equivalent of
+*
+* route add -net $gw -netmask 255.255.255.255 \
+* -cloning -iface $addr
+*/
+   if (mask.s_addr == INADDR_BROADCAST) {
+   add_route(ifi-rdomain, gateway, mask,
+   client-new-address,
+   RTA_DST | RTA_NETMASK | RTA_GATEWAY,
+   RTF_CLONING | RTF_STATIC);
+   }
+
add_default_route(ifi-rdomain, client-new-address,
gateway);
}



Re: dhclient support for /32 assignments

2013-12-03 Thread Matthew Dempsky
On Tue, Dec 3, 2013 at 5:55 PM, Kenneth R Westerback
kwesterb...@rogers.com wrote:
 Located here, the addition of the 255.255.255.255 route is not done in the
 presence of DHO_CLASSLESS_STATIC_ROUTES. As I recall only DHO_ROUTERS and
 DHO_STATIC_ROUTES are incompatible with DHO_CLASSLESS_STATIC_ROUTES. So
 we may want this chunk to occur unconditionally.

I'm not opposed to that per se and I don't imagine it should cause any
problems, but it would diverge somewhat from ISC DHCP's behavior.  For
reference, ISC DHCP 4.2.5-P1 doesn't support
DHO_CLASSLESS_STATIC_ROUTES, the Linux client script doesn't support
DHO_STATIC_ROUTES, and the FreeBSD script only does this routing magic
for DHO_ROUTERS not DHO_STATIC_ROUTES.

I'll try to find something authoritative on how we're 'supposed' to behave here.



Re: [PATCH] ELF: ensure PT_INTERP strings are NULL-terminated

2013-08-28 Thread Matthew Dempsky
On Wed, Aug 28, 2013 at 5:54 AM, Maxime Villard m...@m00nbsd.net wrote:
 +   /* Ensure interp is a valid, NUL-terminated string */
 +   for (n = 0; n  pp-p_filesz; n++) {
 +   if (interp[n] == '\0')
 +   break;
 +   }
 +   if (n != pp-p_filesz - 1) {
 +   error = ENOEXEC;
 goto bad;
 }

A more concise test would be:

if (strnlen(interp, pp-p_filesz) != pp-p_filesz - 1) {
error = ENOEXEC;
goto bad;
}



Re: Update elf(5)

2013-07-09 Thread Matthew Dempsky
Might be nice to mention that 5.4 'was' the first release that used
ELF on all platforms?

On Mon, Jul 8, 2013 at 11:16 PM, Simon Kuhnle si...@blarzwurst.de wrote:
 With VAX being an ELF platform, this is no longer true, is it?

 Regards
 Simon

 Index: elf.5
 ===
 RCS file: /cvs/src/share/man/man5/elf.5,v
 retrieving revision 1.20
 diff -u -r1.20 elf.5
 --- elf.5   17 Jan 2013 21:54:18 -  1.20
 +++ elf.5   9 Jul 2013 06:14:54 -
 @@ -1355,9 +1355,7 @@
  .Sh HISTORY
  .Ox
  ELF support first appeared in
 -.Ox 1.2 ,
 -although not all supported platforms use it as the native
 -binary file format.
 +.Ox 1.2 .
  ELF in itself first appeared in
  .At V .
  The ELF format is an adopted standard.




Re: Update elf(5)

2013-07-09 Thread Matthew Dempsky
On Tue, Jul 9, 2013 at 12:22 AM, Marc Espie es...@nerim.net wrote:
 Starting
 .Ox 5.4 ,
 all supported platforms use it as the native binary file format.

I think Starting *with* .Ox 5.4, ... but otherwise ok matthew.



Re: Update elf(5)

2013-07-09 Thread Matthew Dempsky
Committed.  Thanks Simon!

On Tue, Jul 9, 2013 at 10:21 AM, Jason McIntyre j...@kerhand.co.uk wrote:
 On Tue, Jul 09, 2013 at 07:07:03PM +0200, Marc Espie wrote:
 On Tue, Jul 09, 2013 at 06:09:02PM +0200, Simon Kuhnle wrote:
  On Tue, Jul 09, 2013 at 12:29:10AM -0700, Matthew Dempsky wrote:
   On Tue, Jul 9, 2013 at 12:22 AM, Marc Espie es...@nerim.net wrote:
Starting
.Ox 5.4 ,
all supported platforms use it as the native binary file format.
  
   I think Starting *with* .Ox 5.4, ... but otherwise ok matthew.
 
  Updated with your suggestions.
 
  Regards
  Simon
 
  Index: elf.5
  ===
  RCS file: /cvs/src/share/man/man5/elf.5,v
  retrieving revision 1.20
  diff -u -r1.20 elf.5
  --- elf.5   17 Jan 2013 21:54:18 -  1.20
  +++ elf.5   9 Jul 2013 16:07:27 -
  @@ -1355,9 +1355,10 @@
   .Sh HISTORY
   .Ox
   ELF support first appeared in
  -.Ox 1.2 ,
  -although not all supported platforms use it as the native
  -binary file format.
  +.Ox 1.2 .
  +Starting with
  +.Ox 5.4 ,
  +all supported platforms use it as the native binary file format.
   ELF in itself first appeared in
   .At V .
   The ELF format is an adopted standard.
 Okay


 ok.
 jmc




Re: Static variables

2013-07-08 Thread Matthew Dempsky
On Mon, Jul 8, 2013 at 2:06 AM, Maxime Villard rusty...@gmx.fr wrote:
 Ah, yes. I didn't know.

For what it's worth, this is specified in C99 §6.7.8 (Initializaton)
paragraph 10:

If an object that has static storage duration is not initialized
explicitly, then:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
— if it is an aggregate, every member is initialized (recursively)
according to these rules;
— if it is a union, the first named member is initialized (recursively)
according to these rules.

On OpenBSD (and most, if not all, ELF platforms), this is implemented
by assigning these objects into the .bss section, which is initialized
to all zero bytes at program startup, taking advantage of the fact
that all of our platforms represent null pointers and zero values as
sequences of zero bytes.



Re: fsck vs fsck_ffs WHAT'S THE DIFFERENCE?

2013-06-18 Thread Matthew Dempsky
On Tue, Jun 18, 2013 at 11:04 AM,  laborat...@cpnetserver.net wrote:
 Please, What is the difference between 'fsck' and 'fsck_ffs' command?

fsck detects the filesystem on disk and runs the appropriate
fsck_${FILESYSTEM} utility automatically.



Re: catopen(3) improvements

2013-05-31 Thread Matthew Dempsky
On Fri, May 31, 2013 at 5:19 PM, Stefan Sperling s...@openbsd.org wrote:
 Existing lib/libc/nls/*.msg files are renamed to the names shown in
 the libc/Makefile part of the diff, and new ones are added to support
 the UTF-8 locale (converted from the existing .msg files with iconv).
 I'm not including this change in the diff since that would mix four
 different character sets in a single email.

Is it possible for us to use UTF-8 for all of the source files, but
convert them to KOI8 or ISO-8859-1 as appropriate at build/install
time?  Or is that what you mean by we don't have iconv in base, so
there's no functionality in base for converting between two charsets?



Re: catopen(3) improvements

2013-05-31 Thread Matthew Dempsky
On Fri, May 31, 2013 at 5:43 PM, Stefan Sperling s...@openbsd.org wrote:
 Yes. Conversion currently depends on the GNU iconv port.
 If iconv existed in base we could use just the UTF-8 source files.

Hmm, I see.  In that case, I guess committing converted files makes sense.

In your man page diff, you should make use of .Dv and .Ev to identify
the defines and environment variables.

Instead of setting FD_CLOEXEC with fcntl(), just use O_CLOEXEC when opening.

It looks like parse_lang() won't handle locales like C.UTF-8 because
there's no '_'.

I think you should be able to implement the %l/%t/%c stuff without
needing to call strdup() for each one (e.g., by just doing a one time
pass over the locale string up front and noting where the '_' and '.'
chars are).  Otherwise, you should report strdup() failures as ENOMEM
to the catopen() caller, I think.



Re: check for device_lookup result in vscsi

2013-05-10 Thread Matthew Dempsky
On Fri, May 10, 2013 at 6:02 AM, Mike Belopuhov m...@belopuhov.com wrote:
 my intention here is very simple: there's a way you should call
 device_lookup and everyone has to fulfill it's part of the contract.
 all our devices do, vscsi doesn't.  what's the reason for it to be
 one of a kind?

I'm ok with adding a check, though I don't think it should really be
necessary either.

Just a thought: What about adding them as KASSERT()s?

Why doesn't your diff include vscsiclose()?



Re: strcasecmp and multibyte encodings

2013-04-09 Thread Matthew Dempsky
On Tue, Apr 9, 2013 at 11:10 AM, Stefan Sperling s...@openbsd.org wrote:
 + size_t lus1 = strlen(us1);
 + size_t lus2 = strlen(us2);

These strlen() calls are also wrong, because they could read past the
n bytes allowed for strncasecmp().

 It seems strncasecmp() cannot return errors according to POSIX.
 Only the _l variants have errors defined.

The convention for methods like these to return errors is the caller
is required to set errno = 0; before the call, and then check for
errno != 0 after the call. :(

 While I recognize that POSIX mandates the current locale to be used for
 case conversion, I'm not sure this change is worth the extra complexity.
 As you point out, the standard seems to throw its hands up in the air
 when the question about conversion errors with strcasecmp() and strncasecmp()
 is raised, possibly because these functions are older than the locale concept.

The definition of strncasecmp() has actually been a pretty contentious
topic on the POSIX mailing list for the past few weeks, albeit mostly
focused on whether LC_CTYPE=POSIX allows for UTF-8 or not.

E.g., http://austingroupbugs.net/view.php?id=663

I'm inclined to leave the functions as is for now.



Re: rthreads are always enabled

2013-04-05 Thread Matthew Dempsky
On Fri, Apr 5, 2013 at 3:26 PM, Ted Unangst t...@tedunangst.com wrote:
 Index: sys/sysctl.h
 ===
 RCS file: /cvs/src/sys/sys/sysctl.h,v
 retrieving revision 1.131
 diff -u -p -r1.131 sysctl.h
 --- sys/sysctl.h24 Mar 2013 00:09:31 -  1.131
 +++ sys/sysctl.h5 Apr 2013 22:18:44 -
 @@ -174,7 +174,6 @@ struct ctlname {
  #defineKERN_CPTIME271  /* array: cp_time2 */
  #defineKERN_CACHEPCT   72  /* buffer cache % of physmem 
 */
  #defineKERN_FILE2  73  /* struct: file entries */
 -#defineKERN_RTHREADS   74  /* kernel rthreads support 
 enabled */
  #defineKERN_CONSDEV75  /* dev_t: console terminal 
 device */
  #defineKERN_NETLIVELOCKS   76  /* int: number of network 
 livelocks */
  #defineKERN_POOL_DEBUG 77  /* int: enable pool_debug */

Maybe leave a comment that 74 was KERN_RTHREADS like for KERN_PROC?

 @@ -256,7 +255,7 @@ struct ctlname {
 { cp_time2, CTLTYPE_STRUCT }, \
 { bufcachepercent, CTLTYPE_INT }, \
 { file2, CTLTYPE_STRUCT }, \
 -   { rthreads, CTLTYPE_INT }, \
 +   { rthreads, 0 }, \
 { consdev, CTLTYPE_STRUCT }, \
 { netlivelocks, CTLTYPE_INT }, \
 { pool_debug, CTLTYPE_INT }, \

Maybe change rthreads to gap like the other removed sysctl entries?

Otherwise seems ok to me.



Re: ksh SIGFPE

2013-03-27 Thread Matthew Dempsky
On Wed, Mar 27, 2013 at 3:56 PM, Nicholas Marriott
nicholas.marri...@gmail.com wrote:
 case O_DIV:
 case O_DIVASN:
 +   if (vl-val.i == LONG_MIN  vr-val.i == -1)
 +   evalerr(es, ET_STR, can't represent result);
 res = vl-val.i / vr-val.i;
 break;

I think LONG_MIN / -1 should still be LONG_MIN.  We don't give errors
for other overflow conditions.



Re: ksh SIGFPE

2013-03-27 Thread Matthew Dempsky
On Wed, Mar 27, 2013 at 4:15 PM, Nicholas Marriott
nicholas.marri...@gmail.com wrote:
 Sure, that actually looks to be what other shells do anyhow.

That looks ok to me.

Which shells did you check, out of curiosity?  On Goobuntu, both bash
and dash give SIGFPE too actually.

Checking POSIX, I notice that it requires that shell semantics match
the C language semantics, but then C semantics for LONG_MIN / -1 and
LONG_MIN % -1 appear to be undefined*... so there's nothing to match
really.

(* The glibc manual claims C requires LONG_MIN % -1 to evaluate to
0, but my reading of C11 6.5.5.6 says since LONG_MIN / -1 is not
representable, that the behavior of LONG_MIN % -1 is undefined.)



Re: vfs_syscall: doreadlinkat()

2013-01-30 Thread Matthew Dempsky
Committed, thanks!


Re: vfs_syscall: doreadlinkat()

2013-01-24 Thread Matthew Dempsky
On Thu, Jan 24, 2013 at 9:57 AM, Maxime Villard rusty...@gmx.fr wrote:

 Hum here, if vp-v_type != VLNK, auio is untouched, but before returning
 we use auio.uio_resid, which is not initialized. Is it?


Nice catch.  We should probably move the *retval assignment up just after
the VOP_READLINK() call, since this can technically result in undefined
behavior if you try to readlink on a non-symlink file.

I don't think it should leak any information moment though, since the
EINVAL errno will take precedence instead of *retval when we return to
userspace.



Re: tcp ping

2012-09-13 Thread Matthew Dempsky
On Thu, Sep 13, 2012 at 12:02 PM, Ted Unangst t...@tedunangst.com wrote:
 This adds a -T portnum option to ping.  I haven't polished the output
 because I'm not sure if this is desirable or not, but I found it
 useful.  If it's not a hell no, never in base I can finish it up some.

Is there precedent for having TCP-based ping in any other OS's ping(1)
utility?  It seems like adding a separate tcpping(1) utility and
leaving ping(1) for ICMP would be more appropriate.

FWIW, nmap has a lot of TCP ping options; e.g., sending just SYN and
ACK packets for pings instead of a complete TCP handshake.



Switch to -fstack-protector-all by default

2012-09-12 Thread Matthew Dempsky
The diff below changes GCC's default behavior to -fstack-protector-all
(i.e., add stack protection code to every function instead of just
some based on heuristics), but you can still revert to the heuristic
behavior by passing -fstack-protector on the command line.

Beware this diff has only received very light testing so far, so
expect things to break and to need to rollback or reinstall.  Please
report back to the list what breaks so we can take a look and figure
out how to address it.

For most things that break, it should be possible to just add
CFLAGS+=-fstack-protector to an appropriate Makefile to at least get
the old behavior.  Expect at least ld.so on sparc, sparc64, and
powerpc to need -fstack-protector.  The kernel is likely to need this
too.

To apply the patch:

$ cd /usr/src/gnu/gcc/gcc
$ patch  /path/to/diff
$ cd /usr/src/gnu/usr.bin/cc
$ make depend
$ make
$ make install

You can now try building things as usual.

Thanks!


Index: toplev.c
===
RCS file: /cvs/src/gnu/gcc/gcc/toplev.c,v
retrieving revision 1.2
diff -u -p -r1.2 toplev.c
--- toplev.c30 May 2010 22:04:10 -  1.2
+++ toplev.c12 Sep 2012 17:03:45 -
@@ -1834,7 +1834,7 @@ process_options (void)
   /* Targets must be able to place spill slots at lower addresses.  If the
  target already uses a soft frame pointer, the transition is trivial.  */
   if (flag_stack_protect == -1)
-flag_stack_protect = FRAME_GROWS_DOWNWARD ? 1 : 0;
+flag_stack_protect = FRAME_GROWS_DOWNWARD ? 2 : 0;
   if (!FRAME_GROWS_DOWNWARD  flag_stack_protect)
 {
   warning (0, -fstack-protector not supported for this target);



ld.so pre-install sanity check

2012-09-12 Thread Matthew Dempsky
This diff is super hacky and just mailed out for proof-of-concept, but
it's something that's been handy for me while hacking on ld.so.
Basically before installing a new version of ld.so, it just runs a
simple test program to double check that ld.so isn't completely hosed.

There's plenty of issues with this diff as is (e.g., doesn't work for
cross builds, assumes make obj, make clean doesn't work), but I'm
curious what people think of the concept and whether anyone has
suggestions how to improve it.


Index: Makefile
===
RCS file: /cvs/src/libexec/ld.so/Makefile,v
retrieving revision 1.37
diff -u -p -r1.37 Makefile
--- Makefile3 Feb 2010 20:49:00 -   1.37
+++ Makefile12 Sep 2012 17:26:32 -
@@ -35,4 +35,13 @@ ELF_LDFLAGS+=--shared -Bsymbolic --no-un
 $(PROG):
$(LD) -x -e _dl_start $(ELF_LDFLAGS) -o $(PROG) $(OBJS) $(LDADD)
 
+# XXX: Only for native builds
+realinstall: sanitycheck
+
+.PHONY: sanitycheck
+sanitycheck: sanitycheckprog $(PROG)
+   ./sanitycheckprog
+sanitycheckprog: sanitycheckprog.o
+   $(CC) -o sanitycheckprog sanitycheckprog.o 
-Wl,--dynamic-linker,${.CURDIR}/obj/$(PROG)
+
 .include bsd.prog.mk
Index: sanitycheckprog.c
===
RCS file: sanitycheckprog.c
diff -N sanitycheckprog.c
--- /dev/null   1 Jan 1970 00:00:00 -
+++ sanitycheckprog.c   12 Sep 2012 17:26:32 -
@@ -0,0 +1,7 @@
+#include stdio.h
+
+int main()
+{
+   puts(ld.so works);
+   return 0;
+}



Re: GCC diff needs testing on multiple arches

2012-08-30 Thread Matthew Dempsky
Just pinging this again since I've only received one test report so
far, and this really needs more testing on architectures other than
just amd64.  alpha and hppa don't have -fstack-protector, so really
that means arm, i386, mips, powerpc, sh, and sparc64.  Please please
please test if you have one of these machines!

Making sure this works is an important step towards enabling
-fstack-protector-all.  It also makes -fstack-protector emit slightly
smaller, slightly faster, and slightly more secure code for shared
libraries, all of which are nice perks.


On Wed, Aug 29, 2012 at 11:16:16AM -0700, Matthew Dempsky wrote:
 First, make sure you're using a reasonably up to date snapshot.
 You're up to date if nm /usr/lib/crtbegin.o mentions __guard_local.
 
 Next, apply the patch below to GCC, recompile, and install:
 
   $ cd /usr/src/gnu/gcc/gcc
   $ patch  /path/to/diff
   $ cd /usr/src/gnu/usr.bin/cc
   $ make depend  make  make install
 
 Finally, test that stuff still works after you build it. :) In
 particular, make sure that you can still do a full make build by
 following release(8).
 
 Please email me back with test reports.  I'm interested in getting a
 variety of (ELF) architectures before committing this.
 
 Thanks!
 
 
 Index: targhooks.c
 ===
 RCS file: /cvs/src/gnu/gcc/gcc/targhooks.c,v
 retrieving revision 1.2
 diff -u -p -r1.2 targhooks.c
 --- targhooks.c   9 May 2010 11:48:50 -   1.2
 +++ targhooks.c   29 Aug 2012 18:05:48 -
 @@ -372,7 +372,7 @@ default_stack_protect_guard (void)
  
if (t == NULL)
  {
 -  t = build_decl (VAR_DECL, get_identifier (__guard), ptr_type_node);
 +  t = build_decl (VAR_DECL, get_identifier (__guard_local), 
 ptr_type_node);
TREE_STATIC (t) = 1;
TREE_PUBLIC (t) = 1;
DECL_EXTERNAL (t) = 1;
 @@ -380,6 +380,8 @@ default_stack_protect_guard (void)
TREE_THIS_VOLATILE (t) = 1;
DECL_ARTIFICIAL (t) = 1;
DECL_IGNORED_P (t) = 1;
 +  DECL_VISIBILITY (t) = VISIBILITY_HIDDEN;
 +  DECL_VISIBILITY_SPECIFIED (t) = 1;
  
stack_chk_guard_decl = t;
  }



Re: GCC diff needs testing on multiple arches

2012-08-30 Thread Matthew Dempsky
On Wed, Aug 29, 2012 at 11:16 AM, Matthew Dempsky matt...@dempsky.org wrote:
 First, make sure you're using a reasonably up to date snapshot.
 You're up to date if nm /usr/lib/crtbegin.o mentions __guard_local.

At least two people have commented that the snapshots they downloaded
don't have __guard_local yet.  Here's are some extra preliminary steps
to help users with old snapshots:

First, binutils needs to be up to date (i.e., less than 9 days old).
If it's up to date, then ld --verbose | grep -c randomdata will
print a non-zero value.  If it prints zero, follow the steps at
http://www.openbsd.org/faq/current.html#20120823 to update it.

Next, your C runtime files need to be up to date.  If it's up to date,
then nm /usr/lib/crtbegin.o | grep -c __guard_local will print a
non-zero value.  If it prints zero, follow these steps to update them:

$ cd /usr/src/lib/csu
$ make depend  make  make install

You can now follow my original steps for patching GCC.



GCC diff needs testing on multiple arches

2012-08-29 Thread Matthew Dempsky
First, make sure you're using a reasonably up to date snapshot.
You're up to date if nm /usr/lib/crtbegin.o mentions __guard_local.

Next, apply the patch below to GCC, recompile, and install:

  $ cd /usr/src/gnu/gcc/gcc
  $ patch  /path/to/diff
  $ cd /usr/src/gnu/usr.bin/cc
  $ make depend  make  make install

Finally, test that stuff still works after you build it. :) In
particular, make sure that you can still do a full make build by
following release(8).

Please email me back with test reports.  I'm interested in getting a
variety of (ELF) architectures before committing this.

Thanks!


Index: targhooks.c
===
RCS file: /cvs/src/gnu/gcc/gcc/targhooks.c,v
retrieving revision 1.2
diff -u -p -r1.2 targhooks.c
--- targhooks.c 9 May 2010 11:48:50 -   1.2
+++ targhooks.c 29 Aug 2012 18:05:48 -
@@ -372,7 +372,7 @@ default_stack_protect_guard (void)
 
   if (t == NULL)
 {
-  t = build_decl (VAR_DECL, get_identifier (__guard), ptr_type_node);
+  t = build_decl (VAR_DECL, get_identifier (__guard_local), 
ptr_type_node);
   TREE_STATIC (t) = 1;
   TREE_PUBLIC (t) = 1;
   DECL_EXTERNAL (t) = 1;
@@ -380,6 +380,8 @@ default_stack_protect_guard (void)
   TREE_THIS_VOLATILE (t) = 1;
   DECL_ARTIFICIAL (t) = 1;
   DECL_IGNORED_P (t) = 1;
+  DECL_VISIBILITY (t) = VISIBILITY_HIDDEN;
+  DECL_VISIBILITY_SPECIFIED (t) = 1;
 
   stack_chk_guard_decl = t;
 }



Re: acpihpet quality

2012-08-15 Thread Matthew Dempsky
This reminds me...

On Wed, Aug 15, 2012 at 8:02 AM, Ted Unangst t...@tedunangst.com wrote:
 0x, /* counter_mask (24 bits) */

... this comment has been bugging me for a while.  I'm pretty sure it
should be 32 bits.  Seems like a bad copy/paste from acpitimer.c.



Re: add WARNINGS infrastructure to ftp

2012-08-13 Thread Matthew Dempsky
While I recognize there's some precedence in the tree for it already,
this seems ugly to me.  Is there a reason different programs/libraries
need different diagnostic flags?  Why doesn't this belong in
bsd.own.mk or mk.conf?



Re: make acpiec _GLK aware

2012-07-12 Thread Matthew Dempsky
On Thu, Jul 12, 2012 at 9:48 AM, Paul Irofti p...@irofti.net wrote:
 +   if (aml_evalname(sc-sc_acpi, sc-sc_devnode, _GLK, 0, NULL, res))
 +   sc-sc_glk = 0;
 +   if (res.type != AML_OBJTYPE_INTEGER)
 +   sc-sc_glk = 0;
 +   else
 +   sc-sc_glk = res.v_integer ? 1 : 0;

The second if should be an else if, no?  Or is res.type guaranteed
to be initialized even if aml_evalname() fails?



Re: Virtio drivers for OpenBSD

2012-07-11 Thread Matthew Dempsky
On Wed, Jul 11, 2012 at 8:28 AM, Stefan Fritsch s...@sfritsch.de wrote:
 There is a virtio-scsi device, too, but this is only supported in very
 recent versions of qemu. To attach the simpler virtio-block device as scsi,
 the driver would have to emulate the scsi commands. Is there some generic
 infrastructure to do that? Or which driver would you recommend to take the
 emulation code from? sys/dev/ata/atascsi.c seems rather complex (it is twice
 the size of sys/dev/pci/viod.c).

There are quite a few pretendy-SCSI drivers in the tree, but most of
them deal with fairly complicated hardware so they're not good
reference points.

sparc64's vdsk(4) driver is probably the best reference point, since
it's also for a virtual disk driver (see sys/arch/sparc64/dev/vdsk.c).



Re: Virtio drivers for OpenBSD

2012-07-11 Thread Matthew Dempsky
On Wed, Jul 11, 2012 at 8:20 AM, Kenneth R Westerback
kwesterb...@rogers.com wrote:
 And we dream of the day when wd and vnd attach as scsi too!

Stop peaking at my dev box's src tree!!



Re: Virtio drivers for OpenBSD

2012-07-11 Thread Matthew Dempsky
On Wed, Jul 11, 2012 at 6:01 AM, Jiri B ji...@devio.us wrote:
 Or even better, it would be much important to make OpenBSD to see added
 disks without reboot (it means rescanning the bus).

I don't understand what you're referring to here.  OpenBSD already
handles hotplugged disks in quite a few cases; e.g., USB and some
other SCSI drivers.  (Notably ahci(4) does not yet though.)



Re: Virtio drivers for OpenBSD

2012-07-11 Thread Matthew Dempsky
On Wed, Jul 11, 2012 at 12:40 PM, Jiri B ji...@devio.us wrote:
 Like this one...

 mpi0 at pci0 dev 16 function 0 Symbios Logic 53c1030 rev 0x01: apic 2 int 17
 scsibus1 at mpi0: 16 targets, initiator 7
 sd0 at scsibus1 targ 0 lun 0: VMware, Virtual disk, 1.0 SCSI2 0/direct fixed
 sd0: 61440MB, 512 bytes/sector, 125829120 sectors

 For example on Linux you can add new disk and rescan, the disk is visible.
 I can test on one ESXi 5.

I'm not terribly familiar with mpi(4), but I see it has code to call
scsi_req_probe() if it receives a MPI_EVT_SASCH_REASON_ADDED or
MPI_EVT_SASCH_REASON_NO_PERSIST_ADDED event from the adapter
controller.  However, it appears to only watch for events from Serial
Attached SCSI and Fibre Channel controllers, whereas 53c1030 is a
Parallel SCSI controller.

Are you able to configure VMWare to emulate a different mpi(4) device?
 E.g., one that uses SAS or FC instead?  Do you use the 53c1030 with
Linux too?



Re: Typo st_mtim in stat manpage.

2012-07-10 Thread Matthew Dempsky
On Tue, Jul 10, 2012 at 11:49 AM, Han Boetes h...@boetes.org wrote:
 I found this mistake in the stat manpage:

Not a mistake; see /usr/include/sys/stat.h.



Re: perl 5.16 update / macppc / signal dispatch test failure

2012-06-28 Thread Matthew Dempsky
It looks like arm and sh are broken too.  I've added a regress test to
/usr/src/regress/sys/kern/sigpending if anyone wants to try tackling
one or more of these broken implementations.

On Thu, Jun 28, 2012 at 5:10 PM, Matthew Dempsky matt...@dempsky.org wrote:
 On Thu, Jun 28, 2012 at 5:09 PM, Matthew Dempsky matt...@dempsky.org
wrote:
 So the sigpending() system call returns the pending signals as an int
 value via register, but then the libc stub is supposed to take care of
 storing this as appropriate.

 If you look at libc/arch/amd64/sys/sigreturn.S, you'll notice there's
 some followup assembly code for this.  It looks like the analogous
 code is missing from libc/arch/powerpc/sys/sigreturn.S.

 Er, sigpending.S of course.

 On Thu, Jun 28, 2012 at 4:12 PM, Stuart Henderson s...@spacehopper.org
wrote:
 I'm doing some test runs with Perl 5.16 for Andrew Fresh and seeing
 a signal dispatch-related test failure on macppc

 kern.version=OpenBSD 5.2-beta (GENERIC.MP) #198: Sat Jun 23 11:30:52 MDT
2012
    dera...@macppc.openbsd.org:/usr/src/sys/arch/macppc/compile/GENERIC.MP

 t/op/sigdispatch .. # Failed
test 8
 - sigpending at op/sigdispatch.t line 60
 #      got 536870912
 # expected 0 but true
 # Failed test 9 - SIGUSR1 is pending at op/sigdispatch.t line 61
 #      got 0
 # expected 1
 FAILED at test 8

 This is a standard (not threaded) build. I can give files to reproduce
etc
 if wanted but wondered if anyone has ideas.  It's not happening on amd64:

 kern.version=OpenBSD 5.2-beta (GENERIC.MP) #325: Thu Jun 21 10:08:05 MDT
2012
    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

 Tests are around
http://perl5.git.perl.org/perl.git/blame/67288365cab33e76a48b697c001c11d4dc5b
1912:/t/op/sigdispatch.t#l60



Re: Support for UCD-SNMP-MIB in snmpd (part one)

2012-06-19 Thread Matthew Dempsky
On Tue, Jun 19, 2012 at 1:35 PM, Vincent Bernat ber...@luffy.cx wrote:
 Maybe it would be better to implement standard MIB like
 HOST-RESOURCES-MIB.

snmpd already supports HOST-RESOURCES-MIB.

Seth, does your diff expose any new information that's not already
available via snmpd?  If not, I think we'd prefer to stick with
standard MIBs where possible.  I implemented UCD-DISKIO-MIB only
because I couldn't find any existing MIB that was a good fit for disk
IO stats.  (Arguably it might go under the interface statistics since
you can think of SCSI as just a specialized network protocol, but I
wasn't sure if that would fly and UCD-DISKIO-MIB was less work.)



Re: Support for UCD-SNMP-MIB in snmpd (part one)

2012-06-19 Thread Matthew Dempsky
[+joel]

On Tue, Jun 19, 2012 at 2:11 PM, Seth Wright s...@crosse.org wrote:
 I kind of thought the UCD-SNMP-MIB was fairly standard and/or
 widespread, but perhaps not?

HOST-RESOURCES-MIB is an RFC standard, which I assume gives it more
stature than UCD-SNMP-MIB, but I honestly haven't used SNMP much in
the last few years, so I'm not the best person to make a decision
here.

 * HR-MIB includes hrProcessorLoad which is a percentage-based value
 for each CPU (if I'm reading the MIB right) for the one-minute
 timeframe.  UCD-SNMP presents 1-, 5-, and 15-minute aggregate values,
 which may be more useful.

I think you're referring to how UCD-SNMP-MIB exposes 1, 5, and 15
minute load averages, which are different from CPU utilization.  In
particular, it tends to be a common source of confusion among users
because OpenBSD calculates load averages very differently than other
operating systems (e.g.,
http://www.undeadly.org/cgi?action=articlesid=20090715034920mode=expanded).

You should be able to calculate 5 and 15 minute rolling averages in
your SNMP monitoring software too.

 * HR-MIB, under the hrStorage* attributes, includes free and used
 values for memory, but doesn't get any more granular than that.
 UCD-SNMP tries to break it down a bit further.

That's valid, though I believe we could expose much more fine grained
memory via HOST-RESOURCES-MIB too.  E.g., buffer cache, kernel pool(9)
and malloc(9) buckets, and so on.

I'm not sure off hand if the memory groups in UCD-SNMP-MIB maps well
to our VM system either, but I guess they all say they can be left out
if the OS doesn't support the category.

 * UCD-SNMP includes a bunch of CPU stats that I'm only starting to
 look at (the third blank graph in my monitoring templates) that I
 can't find in a walk against base snmpd.  Those were going to be in
 diff number next.

These might be interesting, and it's data not exposed otherwise
currently.  The closest HOST-RESOURCES-MIB has is hrSWRunPerfCPU and
hrSWRunPerfMem, but those don't track interrupt CPU usage.

 So yes, I think it does provide some value over what's already in
 there, but that's just the opinion of a guy scratching a proverbial
 itch.

Understood, I'm just trying to make sure the diff makes sense to be
committed.



Support for UCD-DISKIO-MIB in snmpd

2012-06-13 Thread Matthew Dempsky
I'd like to monitor disk IO on my workstation, but snmpd doesn't
currently export hw.diskstats as far as I can tell.  I found net-snmp
supports UCD-DISKIO-MIB and it's not too complex, so I went ahead and
coded up an implementation.

It's been a while since I've touched SNMP and/or snmpd, but it seems
to work correctly:

$ snmpwalk -m UCD-DISKIO-MIB -v1 -c public 127.0.0.1 diskIOTable 
UCD-DISKIO-MIB::diskIOIndex.1 = INTEGER: 1
UCD-DISKIO-MIB::diskIOIndex.2 = INTEGER: 2
UCD-DISKIO-MIB::diskIOIndex.3 = INTEGER: 3
UCD-DISKIO-MIB::diskIODevice.1 = STRING: sd0
UCD-DISKIO-MIB::diskIODevice.2 = STRING: sd1
UCD-DISKIO-MIB::diskIODevice.3 = STRING: cd0
UCD-DISKIO-MIB::diskIONRead.1 = Counter32: 3584
UCD-DISKIO-MIB::diskIONRead.2 = Counter32: 3128056320
UCD-DISKIO-MIB::diskIONRead.3 = Counter32: 0
UCD-DISKIO-MIB::diskIONWritten.1 = Counter32: 0
UCD-DISKIO-MIB::diskIONWritten.2 = Counter32: 1418602496
UCD-DISKIO-MIB::diskIONWritten.3 = Counter32: 0
UCD-DISKIO-MIB::diskIOReads.1 = Counter32: 7
UCD-DISKIO-MIB::diskIOReads.2 = Counter32: 18550342
UCD-DISKIO-MIB::diskIOReads.3 = Counter32: 0
UCD-DISKIO-MIB::diskIOWrites.1 = Counter32: 0
UCD-DISKIO-MIB::diskIOWrites.2 = Counter32: 19485959
UCD-DISKIO-MIB::diskIOWrites.3 = Counter32: 0
UCD-DISKIO-MIB::diskIONReadX.1 = Counter64: 3584
UCD-DISKIO-MIB::diskIONReadX.2 = Counter64: 539998968320
UCD-DISKIO-MIB::diskIONReadX.3 = Counter64: 0
UCD-DISKIO-MIB::diskIONWrittenX.1 = Counter64: 0
UCD-DISKIO-MIB::diskIONWrittenX.2 = Counter64: 357900888064
UCD-DISKIO-MIB::diskIONWrittenX.3 = Counter64: 0

Test reports from regular SNMP users would be great.

ok?


Index: mib.h
===
RCS file: /home/mdempsky/anoncvs/cvs/src/usr.sbin/snmpd/mib.h,v
retrieving revision 1.25
diff -u -p -r1.25 mib.h
--- mib.h   20 Mar 2012 03:01:26 -  1.25
+++ mib.h   13 Jun 2012 19:19:23 -
@@ -397,6 +397,22 @@
 #define MIB_vantronix  MIB_enterprises, 26766
 #define MIB_openBSDMIB_enterprises, 30155
 
+/* UCD-DISKIO-MIB */
+#define MIB_ucdExperimentalMIB_ucDavis, 13
+#define MIB_ucdDiskIOMIB   MIB_ucdExperimental, 15
+#define MIB_diskIOTableMIB_ucdDiskIOMIB, 1
+#define MIB_diskIOEntryMIB_diskIOTable, 1
+#define OIDIDX_diskIO  11
+#define OIDIDX_diskIOEntry 12
+#define MIB_diskIOIndexMIB_diskIOEntry, 1
+#define MIB_diskIODevice   MIB_diskIOEntry, 2
+#define MIB_diskIONReadMIB_diskIOEntry, 3
+#define MIB_diskIONWritten MIB_diskIOEntry, 4
+#define MIB_diskIOReadsMIB_diskIOEntry, 5
+#define MIB_diskIOWrites   MIB_diskIOEntry, 6
+#define MIB_diskIONReadX   MIB_diskIOEntry, 12
+#define MIB_diskIONWrittenXMIB_diskIOEntry, 13
+
 /* OPENBSD-MIB */
 #define MIB_pfMIBObjects   MIB_openBSD, 1
 #define MIB_pfInfo MIB_pfMIBObjects, 1
@@ -892,6 +908,19 @@
{ MIBDECL(microSystems) },  \
{ MIBDECL(vantronix) }, \
{ MIBDECL(openBSD) },   \
+   \
+   { MIBDECL(ucdExperimental) },   \
+   { MIBDECL(ucdDiskIOMIB) },  \
+   { MIBDECL(diskIOTable) },   \
+   { MIBDECL(diskIOEntry) },   \
+   { MIBDECL(diskIOIndex) },   \
+   { MIBDECL(diskIODevice) },  \
+   { MIBDECL(diskIONRead) },   \
+   { MIBDECL(diskIONWritten) },\
+   { MIBDECL(diskIOReads) },   \
+   { MIBDECL(diskIOWrites) },  \
+   { MIBDECL(diskIONReadX) },  \
+   { MIBDECL(diskIONWrittenX) },   \
\
{ MIBDECL(pfMIBObjects) },  \
{ MIBDECL(pfInfo) },\
Index: mib.c
===
RCS file: /home/mdempsky/anoncvs/cvs/src/usr.sbin/snmpd/mib.c,v
retrieving revision 1.52
diff -u -p -r1.52 mib.c
--- mib.c   20 Mar 2012 03:01:26 -  1.52
+++ mib.c   13 Jun 2012 19:19:09 -
@@ -32,6 +32,7 @@
 #include sys/socket.h
 #include sys/mount.h
 #include sys/ioctl.h
+#include sys/disk.h
 
 #include net/if.h
 #include net/if_types.h
@@ -3362,6 +3363,99 @@ mib_ipfroute(struct oid *oid, struct ber
 }
 
 /*
+ * Defined in UCD-DISKIO-MIB.txt.
+ */
+
+intmib_diskio(struct oid *oid, struct ber_oid *o, struct ber_element 
**elm);
+
+static struct oid diskio_mib[] = {
+   { MIB(ucdDiskIOMIB),OID_MIB },
+   { MIB(diskIOIndex), OID_TRD, mib_diskio },
+   { 

Re: add statfs64 to compat/linux

2012-05-22 Thread Matthew Dempsky
Are you missing the header bits for linux_statfs64?  I don't see where
it's defined.

 +   /*
 +* Convert BSD filesystem names to Linux filesystem type numbers
 +* where possible.  Linux statfs uses a value of -1 to indicate
 +* an unsupported field.
 +*/
 +   if (!strcmp(bsp-f_fstypename, MOUNT_FFS) ||
 +   !strcmp(bsp-f_fstypename, MOUNT_MFS))
 +   lsp-l_ftype = 0x11954;
 +   else if (!strcmp(bsp-f_fstypename, MOUNT_NFS))
 +   lsp-l_ftype = 0x6969;
 +   else if (!strcmp(bsp-f_fstypename, MOUNT_MSDOS))
 +   lsp-l_ftype = 0x4d44;
 +   else if (!strcmp(bsp-f_fstypename, MOUNT_PROCFS))
 +   lsp-l_ftype = 0x9fa0;
 +   else if (!strcmp(bsp-f_fstypename, MOUNT_EXT2FS))
 +   lsp-l_ftype = 0xef53;
 +   else if (!strcmp(bsp-f_fstypename, MOUNT_CD9660))
 +   lsp-l_ftype = 0x9660;
 +   else if (!strcmp(bsp-f_fstypename, MOUNT_NCPFS))
 +   lsp-l_ftype = 0x6969;
 +   else
 +   lsp-l_ftype = -1;

Can this code go into a separate function so we don't have to
duplicate it for both statfs and statfs64?

 +int
 +linux_sys_statfs64(struct proc *p, void *v, register_t *retval)
 +{
 +   struct linux_sys_statfs64_args /* {
 +   syscallarg(char *) path;
 +   syscallarg(struct linux_statfs *) sp;
 +   } */ *uap = v;

The comment is wrong: sp should be linux_statfs64.

 -268UNIMPL  linux_sys_statfs64
 +268STD { int linux_sys_statfs64(char *path, \
 +   struct linux_statfs64 *sp); }
  269UNIMPL  linux_sys_fstatfs64

It looks like linux_sys_fstatfs64 should be easy to implement too now
if you wanted to do that in this diff as well.



Re: add statfs64 to compat/linux

2012-05-22 Thread Matthew Dempsky
On Tue, May 22, 2012 at 9:42 AM, Paul Irofti p...@irofti.net wrote:
  +   /*
  +* Convert BSD filesystem names to Linux filesystem type numbers
  +* where possible.  Linux statfs uses a value of -1 to indicate
  +* an unsupported field.
  +*/
  +   if (!strcmp(bsp-f_fstypename, MOUNT_FFS) ||
  +   !strcmp(bsp-f_fstypename, MOUNT_MFS))
  +   lsp-l_ftype = 0x11954;
  +   else if (!strcmp(bsp-f_fstypename, MOUNT_NFS))
  +   lsp-l_ftype = 0x6969;
  +   else if (!strcmp(bsp-f_fstypename, MOUNT_MSDOS))
  +   lsp-l_ftype = 0x4d44;
  +   else if (!strcmp(bsp-f_fstypename, MOUNT_PROCFS))
  +   lsp-l_ftype = 0x9fa0;
  +   else if (!strcmp(bsp-f_fstypename, MOUNT_EXT2FS))
  +   lsp-l_ftype = 0xef53;
  +   else if (!strcmp(bsp-f_fstypename, MOUNT_CD9660))
  +   lsp-l_ftype = 0x9660;
  +   else if (!strcmp(bsp-f_fstypename, MOUNT_NCPFS))
  +   lsp-l_ftype = 0x6969;
  +   else
  +   lsp-l_ftype = -1;

 Can this code go into a separate function so we don't have to
 duplicate it for both statfs and statfs64?

 I thought about it. Best would be a macro, but I hate C macro's.

What about a function like this:

long
bsd_to_linux_filesystem_map(const char *fstypename)
{
if (!strcmp(fstypename, MOUNT_FFS) || !strcmp(fstypename, MOUNT_MFS))
return (0x11954);
else if (!strcmp(fstypename, MOUNT_NFS))
return (0x6969);
[...]
else
return (-1);
}

(I'm assuming linux_statfs64's l_ftype is also a long.)



Re: Question about wscons/wsksymdef.h

2012-05-18 Thread Matthew Dempsky
On Fri, May 18, 2012 at 4:27 PM, Juan Francisco Cantero Hurtado
i...@juanfra.info wrote:
 In short. Can someone say me what are these values?. I don't want a
 master class, just a link or a bit of documentation is sufficient.

Are you just trying to swap caps and ctrl at the console?  If so, just
put us.swapctrlcaps (or whatever) in /etc/kbdtype and reboot.  (Or
run kbd us.swapctrlcaps as root to do it instantly.)



Re: c++ headers w/ -pedantic, overflow in implicit constant conversion

2012-05-10 Thread Matthew Dempsky
On Thu, Mar 15, 2012 at 3:19 AM, Marc Espie es...@nerim.net wrote:
  #define __glibcxx_max(T) \
 -  (__glibcxx_signed (T) ? ((T)1  __glibcxx_digits (T)) - 1 : ~(T)0)
 +  (__glibcxx_signed (T) ? \
 +  (T)1  (__glibcxx_digits (T) - 1)) - 1)  1) + 1) : ~(T)0)
 +

How about (T)(((unsigned T)1  __glibc_digits(T)) - 1)?

Also, we should use (T)-1  __glibc_digits(T) in __glibc_min to avoid
relying on overflow behavior for signed types.



Re: c++ headers w/ -pedantic, overflow in implicit constant conversion

2012-05-10 Thread Matthew Dempsky
I'm pretty sure unsigned int is never a signed type.



Re: c++ headers w/ -pedantic, overflow in implicit constant conversion

2012-05-10 Thread Matthew Dempsky
Oh even if it's not signed that ternary branch will still be in code. I
see. Hm.
On May 10, 2012 9:23 AM, Matthew Dempsky matt...@dempsky.org wrote:



  1   2   3   >