Re: armv7 unaligned access in libcrypto

2017-12-30 Thread Philip Guenther
On Fri, 29 Dec 2017, Mark Kettenis wrote:

> > Date: Fri, 29 Dec 2017 21:21:04 +1100
> > From: Jonathan Gray 
> > 
> > On Fri, Dec 29, 2017 at 10:47:06AM +0100, Mark Kettenis wrote:
> > > The Aarch32 assembly code in libcrypto assumes that armv7 supports
> > > unaligned access.  It does, but only if you don't enable the bit that
> > > makes it trap on unaligned access.  And we enable that bit on OpenBSD.
> > > So doing a SHA256 of an unaligned buffer (something ftp(1) ends up
> > > doing) you SIGBUS.
> > > 
> > > This currently isn't an issue since our base GCC does not advertise
> > > that we're compiling for armv7 and up only.  It barely knows about
> > > armv7 at all.  But with clang that is no longer true.  And we really
> > > want to build for armv7 and up only because that gives us proper
> > > atomic operations and such.
> > > 
> > > So here is a diff that avoids the unaligned access bits that matter
> > > when compiling on OpenBSD.
> > > 
> > > ok?

The affected code blocks appear to just be about the loading or storing of 
words from/to memory and not the crypto algorithms working on the 
registers, so these changes all look correct.  You're testing with 
regress/lib/libcrypto/, right?

(The second change in sha256-armv4.pl looked fishy, but it's just that 
they're delaying the 'rev' a bit to allow better scheduling and it's still 
about the loads being done.)


> > Any reason to not use __STRICT_ALIGNMENT for this?
> 
> The assembly code doesn't include the header file that defines
> __STRICT_ALIGNMENT.  But I could have arm_arch.h define it and use it
> instead of __OpenBSD__.  Makes things a little bit more obvious...
> 
> Any of the LibreSSL folks have an opinion about that?

I'm in favor of that, as the tests as is are opaque in intent.


Philip Guenther



Re: paste(1): default to stdin if no files given

2017-12-30 Thread Philip Guenther
On Sat, 30 Dec 2017, Klemens Nanni wrote:
> On Fri, Dec 29, 2017 at 03:55:10PM +0200, Lauri Tirkkonen wrote:
> > Currently paste(1) silently does nothing if it's given no file
> > arguments:
> > 
> > % printf 'hello\nworld\n'|paste
> > %
> That's a bug, FWIW, FreeBSD has it fixed.
> 
> > I often do things like 'ps -p $(pgrep sh | paste -sd,)' and forget the 
> > "-" argument. Some other systems (gnu coreutils, illumos, perhaps 
> > more) default to reading from stdin if no files are given; I think 
> > that makes sense, so diff to do that below.
>
> This would invalidate the STANDARDS section as it breaks POSIX.1-2008
> compliance:
...

I'm with you: the proposed non-compliance isn't an increase in power and 
the arguable usability change directly decreases the portability of 
anyone/anything doing that.



> Here's a diff syncing with FreeBSD to error out on missing files. 
> Feedback?

Looks good to me.  Unless there are any objections, I'll commit your diff.


Philip Guenther


> diff --git a/usr.bin/paste/paste.c b/usr.bin/paste/paste.c
> index 4b00413e5bb..e64ac882510 100644
> --- a/usr.bin/paste/paste.c
> +++ b/usr.bin/paste/paste.c
> @@ -77,6 +77,9 @@ main(int argc, char *argv[])
>   argc -= optind;
>   argv += optind;
>  
> + if (argc == 0)
> + usage();
> +
>   if (!delim) {
>   delimcnt = 1;
>   delim = "\t";
> 
> 



Re: wsconsctl: minor cleanup

2017-12-30 Thread Philip Guenther
On Sat, Dec 30, 2017 at 8:51 AM, Anton Lindqvist  wrote:
>
> Get rid of a unused variable and define the noinput lex option in order
> to suppress the following warning:
>
>   map_scan.c:1235:16: warning: function 'input' is not needed and will not
> be emitted
>   static int input  (void)
>
> Comments? OK?
>

ok guenther@


Re: paste(1): default to stdin if no files given

2017-12-30 Thread Lauri Tirkkonen
On Sat, Dec 30 2017 19:41:24 +0100, Klemens Nanni wrote:
> On Fri, Dec 29, 2017 at 03:55:10PM +0200, Lauri Tirkkonen wrote:
> > Currently paste(1) silently does nothing if it's given no file
> > arguments:
> > 
> > % printf 'hello\nworld\n'|paste
> > %
> That's a bug, FWIW, FreeBSD has it fixed.
> 
> > I often do things like 'ps -p $(pgrep sh | paste -sd,)' and forget the
> > "-" argument. Some other systems (gnu coreutils, illumos, perhaps more)
> > default to reading from stdin if no files are given; I think that makes
> > sense, so diff to do that below.
> This would invalidate the STANDARDS section as it breaks POSIX.1-2008
> compliance:
> 
>   STDIN
>   The standard input shall be used only if one or more file
>   operands is '-'. See the INPUT FILES section.
>   
>   INPUT FILES
>   The input files shall be text files, except that line
>   lengths shall be unlimited.

Perhaps. I did read what POSIX said before submitting the patch, but I
feel it's less useful to error out than to use stdin here. But I'm ok
with either, the real problem is silent failure.

> While your diff works, it's missing the respective usage() and manual
> bits.

Ok, here you go.

Index: paste.1
===
RCS file: /cvs/src/usr.bin/paste/paste.1,v
retrieving revision 1.15
diff -u -p -r1.15 paste.1
--- paste.1 28 Jun 2017 14:49:26 -  1.15
+++ paste.1 30 Dec 2017 20:22:29 -
@@ -43,7 +43,7 @@
 .Nm paste
 .Op Fl s
 .Op Fl d Ar list
-.Ar
+.Op Ar
 .Sh DESCRIPTION
 The
 .Nm paste
@@ -107,6 +107,8 @@ is specified for one or more of the inpu
 input is used; standard input is read one line at a time, circularly,
 for each instance of
 .Dq - .
+.Pp
+If no files are specified, only the standard input is used.
 .Sh EXIT STATUS
 .Ex -std paste
 .Sh EXAMPLES
Index: paste.c
===
RCS file: /cvs/src/usr.bin/paste/paste.c,v
retrieving revision 1.22
diff -u -p -r1.22 paste.c
--- paste.c 9 Dec 2015 19:39:10 -   1.22
+++ paste.c 30 Dec 2017 20:22:29 -
@@ -56,6 +56,7 @@ main(int argc, char *argv[])
extern char *optarg;
extern int optind;
int ch, seq;
+   char **files = (char *[]){ "-", NULL };
 
if (pledge("stdio rpath", NULL) == -1)
err(1, "pledge");
@@ -77,15 +78,18 @@ main(int argc, char *argv[])
argc -= optind;
argv += optind;
 
+   if (argc)
+   files = argv;
+
if (!delim) {
delimcnt = 1;
delim = "\t";
}
 
if (seq)
-   sequential(argv);
+   sequential(files);
else
-   parallel(argv);
+   parallel(files);
exit(0);
 }
 
@@ -251,7 +255,7 @@ void
 usage(void)
 {
extern char *__progname;
-   (void)fprintf(stderr, "usage: %s [-s] [-d list] file ...\n",
+   (void)fprintf(stderr, "usage: %s [-s] [-d list] [file ...]\n",
__progname);
exit(1);
 }
-- 
Lauri Tirkkonen | lotheac @ IRCnet



Re: paste(1): default to stdin if no files given

2017-12-30 Thread Klemens Nanni
On Fri, Dec 29, 2017 at 03:55:10PM +0200, Lauri Tirkkonen wrote:
> Currently paste(1) silently does nothing if it's given no file
> arguments:
> 
> % printf 'hello\nworld\n'|paste
> %
That's a bug, FWIW, FreeBSD has it fixed.

> I often do things like 'ps -p $(pgrep sh | paste -sd,)' and forget the
> "-" argument. Some other systems (gnu coreutils, illumos, perhaps more)
> default to reading from stdin if no files are given; I think that makes
> sense, so diff to do that below.
This would invalidate the STANDARDS section as it breaks POSIX.1-2008
compliance:

STDIN
The standard input shall be used only if one or more file
operands is '-'. See the INPUT FILES section.

INPUT FILES
The input files shall be text files, except that line
lengths shall be unlimited.

> Index: paste.c
> ===
> RCS file: /cvs/src/usr.bin/paste/paste.c,v
> retrieving revision 1.22
> diff -u -p -r1.22 paste.c
> --- paste.c   9 Dec 2015 19:39:10 -   1.22
> +++ paste.c   29 Dec 2017 13:47:50 -
> @@ -56,6 +56,7 @@ main(int argc, char *argv[])
>   extern char *optarg;
>   extern int optind;
>   int ch, seq;
> + char **files = (char *[]){ "-", NULL };
>  
>   if (pledge("stdio rpath", NULL) == -1)
>   err(1, "pledge");
> @@ -77,15 +78,18 @@ main(int argc, char *argv[])
>   argc -= optind;
>   argv += optind;
>  
> + if (argc)
> + files = argv;
> +
>   if (!delim) {
>   delimcnt = 1;
>   delim = "\t";
>   }
>  
>   if (seq)
> - sequential(argv);
> + sequential(files);
>   else
> - parallel(argv);
> + parallel(files);
>   exit(0);
>  }
>  
While your diff works, it's missing the respective usage() and manual
bits.

Here's a diff syncing with FreeBSD to error out on missing files.
Feedback?

diff --git a/usr.bin/paste/paste.c b/usr.bin/paste/paste.c
index 4b00413e5bb..e64ac882510 100644
--- a/usr.bin/paste/paste.c
+++ b/usr.bin/paste/paste.c
@@ -77,6 +77,9 @@ main(int argc, char *argv[])
argc -= optind;
argv += optind;
 
+   if (argc == 0)
+   usage();
+
if (!delim) {
delimcnt = 1;
delim = "\t";



sdhc: support ext dma

2017-12-30 Thread Artturi Alm
Hi,

porting sdhc driver(bcm2835_emmc.c) from NetBSD does lack something
like this, among possibly few other (small?) changes, but consider this
as a step towards.

Comments? to keep changes minimal/safe/easy to review like below, or to
backport in bigger chunks?

-Artturi


diff --git a/sys/dev/sdmmc/sdhc.c b/sys/dev/sdmmc/sdhc.c
index c242a600d4a..45789ac15b0 100644
--- a/sys/dev/sdmmc/sdhc.c
+++ b/sys/dev/sdmmc/sdhc.c
@@ -187,6 +187,10 @@ sdhc_host_found(struct sdhc_softc *sc, bus_space_tag_t iot,
/* Use DMA if the host system and the controller support it. */
if (usedma && ISSET(caps, SDHC_ADMA2_SUPP))
SET(hp->flags, SHF_USE_DMA);
+   if (!usedma || ISSET(caps, SDHC_ADMA2_SUPP))
+   hp->sc_transfer_dma = NULL;
+   else if (usedma && sc->sc_transfer_dma)
+   SET(hp->flags, SHF_USE_DMA);
 
/*
 * Determine the base clock frequency. (2.2.24)
@@ -250,7 +254,7 @@ sdhc_host_found(struct sdhc_softc *sc, bus_space_tag_t iot,
break;
}
 
-   if (ISSET(hp->flags, SHF_USE_DMA)) {
+   if (ISSET(hp->flags, SHF_USE_DMA) && sc->sc_transfer_dma == NULL) {
int rseg;
 
/* Allocate ADMA2 descriptor memory */
@@ -939,6 +943,16 @@ sdhc_transfer_data(struct sdhc_host *hp, struct 
sdmmc_command *cmd)
if (cmd->c_dmamap) {
int status;
 
+   if (sc->sc_transfer_dma != NULL) {
+   error = sc->sc_transfer_dma(sc, cmd);
+   if (error == 0 && !sdhc_wait_intr(hp,
+   SDHC_TRANSFER_COMPLETE, SDHC_DMA_TIMEOUT)) {
+   DPRINTF(1,("%s: timeout\n", __func__));
+   error = ETIMEDOUT;
+   }
+   goto done;
+   }
+
error = 0;
for (;;) {
status = sdhc_wait_intr(hp,
diff --git a/sys/dev/sdmmc/sdhcvar.h b/sys/dev/sdmmc/sdhcvar.h
index 9d4d759a2bc..aa5977a4c7d 100644
--- a/sys/dev/sdmmc/sdhcvar.h
+++ b/sys/dev/sdmmc/sdhcvar.h
@@ -22,6 +22,7 @@
 #include 
 
 struct sdhc_host;
+struct sdmmc_command;
 
 struct sdhc_softc {
struct device sc_dev;
@@ -33,6 +34,7 @@ struct sdhc_softc {
 
int (*sc_card_detect)(struct sdhc_softc *);
int (*sc_signal_voltage)(struct sdhc_softc *, int);
+   int (*sc_transfer_dma)(struct sdhc_softc *, struct sdmmc_command *);
 };
 
 /* Host controller functions called by the attachment driver. */



Re: malloc cleanup and small optimization (step 2)

2017-12-30 Thread Otto Moerbeek
On Sat, Dec 30, 2017 at 12:11:50PM -0500, Daniel Micay wrote:

> On 30 December 2017 at 06:44, Otto Moerbeek  wrote:
> > On Sat, Dec 30, 2017 at 06:53:44AM +, kshe wrote:
> >
> >> Hi,
> >>
> >> Looking at this diff and the previous one, I found some more possible
> >> cleanups for malloc.c (the patch below is to be applied after both of
> >> them, even if the second one has not been committed yet):
> >>
> >> 1.  In malloc_bytes(), use ffs(3) instead of manual loops, which on many
> >> architectures boils down to merely one or two instructions (for example,
> >> see src/lib/libc/arch/amd64/string/ffs.S).
> >
> > I remember doing some measurements using ffs a long time ago, and it
> > turned out that is was slower in some cases. Likely due to the
> > function call overhead or maybe a non-optimal implementation. So I'm
> > only willing to consider this after seeing benchmarks on a handfull of
> > architectures.
> 
> There's __builtin_ffs but Clang / GCC might not be smart enough to use
> it for ffs(3) automatically.

A quick test on amd64 (clang) and armv7 (gcc) shows that on both cases
the compiler generates inline code and no function call. So that is
promising.

-Otto



Re: malloc cleanup and small optimization (step 2)

2017-12-30 Thread Daniel Micay
On 30 December 2017 at 06:44, Otto Moerbeek  wrote:
> On Sat, Dec 30, 2017 at 06:53:44AM +, kshe wrote:
>
>> Hi,
>>
>> Looking at this diff and the previous one, I found some more possible
>> cleanups for malloc.c (the patch below is to be applied after both of
>> them, even if the second one has not been committed yet):
>>
>> 1.  In malloc_bytes(), use ffs(3) instead of manual loops, which on many
>> architectures boils down to merely one or two instructions (for example,
>> see src/lib/libc/arch/amd64/string/ffs.S).
>
> I remember doing some measurements using ffs a long time ago, and it
> turned out that is was slower in some cases. Likely due to the
> function call overhead or maybe a non-optimal implementation. So I'm
> only willing to consider this after seeing benchmarks on a handfull of
> architectures.

There's __builtin_ffs but Clang / GCC might not be smart enough to use
it for ffs(3) automatically.



wsconsctl: minor cleanup

2017-12-30 Thread Anton Lindqvist
Hi,
Get rid of a unused variable and define the noinput lex option in order
to suppress the following warning:

  map_scan.c:1235:16: warning: function 'input' is not needed and will not be 
emitted
  static int input  (void)

Comments? OK?

Index: map_scan.l
===
RCS file: /cvs/src/sbin/wsconsctl/map_scan.l,v
retrieving revision 1.7
diff -u -p -r1.7 map_scan.l
--- map_scan.l  9 Jul 2017 14:04:50 -   1.7
+++ map_scan.l  30 Dec 2017 16:50:02 -
@@ -30,7 +30,7 @@
  * POSSIBILITY OF SUCH DAMAGE.
  */
 
-%option noyywrap
+%option noinput noyywrap
 
 %{
 
Index: mousecfg.c
===
RCS file: /cvs/src/sbin/wsconsctl/mousecfg.c,v
retrieving revision 1.1
diff -u -p -r1.1 mousecfg.c
--- mousecfg.c  21 Jul 2017 20:38:20 -  1.1
+++ mousecfg.c  30 Dec 2017 16:50:02 -
@@ -303,7 +303,6 @@ mousecfg_pr_field(struct wsmouse_paramet
 void
 mousecfg_rd_field(struct wsmouse_parameters *field, char *val)
 {
-   enum wsmousecfg first = field->params[0].key;
int i, n;
const char *s;
float f;



Re: Make systat(1) list ordering

2017-12-30 Thread Martijn van Duren
Anyone willing to comment on this?

On 12/18/17 23:00, Martijn van Duren wrote:
> Hello tech@,
> 
> I got a bit annoyed by the fact that it isn't clear what the current
> ordering is for states. I'm not very familiar with systat, so I might
> have missed something obvious.
> 
> Since there was no obvious room available for always printing the
> active ordering and very few fields do support ordering I added a new
> keyword to the "global" command interpreter: "order".
> This is similar to "help", but shows the available orderings for
> the current view. It also adds the corresponding hotkey for every
> ordering and highlights the current active ordering. I also added
> a caret to the active ordering name if the sortdir is reverse.
> 
> I also noticed via this diff that pcache has ordering options. These
> are not documented and don't seem to do much. For now I decided to
> document them, but maybe it's better to just remove the option
> altogether?
> 
> Feedback?
> 
> martijn@
> 
> Index: engine.c
> ===
> RCS file: /cvs/src/usr.bin/systat/engine.c,v
> retrieving revision 1.21
> diff -u -p -u -r1.21 engine.c
> --- engine.c  5 Apr 2017 15:57:11 -   1.21
> +++ engine.c  18 Dec 2017 21:49:22 -
> @@ -890,6 +890,21 @@ print_fld_float(field_def *fld, double f
>  
>  /* ordering */
>  
> +int
> +foreach_order(void (*callback)(order_type *))
> +{
> + order_type *o;
> +
> + if (curr_view == NULL || curr_view->mgr == NULL ||
> + curr_view->mgr->order_list == NULL)
> + return -1;
> + o = curr_view->mgr->order_list;
> + do {
> + callback(o++);
> + } while (o->name != NULL);
> + return 0;
> +}
> +
>  void
>  set_order(const char *opt)
>  {
> Index: engine.h
> ===
> RCS file: /cvs/src/usr.bin/systat/engine.h,v
> retrieving revision 1.8
> diff -u -p -u -r1.8 engine.h
> --- engine.h  7 Sep 2013 11:43:49 -   1.8
> +++ engine.h  18 Dec 2017 21:49:22 -
> @@ -130,6 +130,7 @@ int set_view(const char *opt);
>  void next_view(void);
>  void prev_view(void);
>  
> +int foreach_order(void (*callback)(order_type *));
>  void set_order(const char *opt);
>  void next_order(void);
>  
> Index: main.c
> ===
> RCS file: /cvs/src/usr.bin/systat/main.c,v
> retrieving revision 1.66
> diff -u -p -u -r1.66 main.c
> --- main.c13 Oct 2016 11:22:46 -  1.66
> +++ main.c18 Dec 2017 21:49:22 -
> @@ -251,6 +251,31 @@ show_help(void)
>  }
>  
>  void
> +add_order_tb(order_type *o)
> +{
> + if (curr_view->mgr->order_curr == o)
> + tbprintf("[%s%s(%c)] ", o->name,
> + o->func != NULL && sortdir == -1 ? "^" : "",
> + (char) o->hotkey);
> + else
> + tbprintf("%s(%c) ", o->name, (char) o->hotkey);
> +}
> +
> +void
> +show_order(void)
> +{
> + if (rawmode)
> + return;
> +
> + tb_start();
> + if (foreach_order(add_order_tb) == -1) {
> + tbprintf("No orders available");
> + }
> + tb_end();
> + message_set(tmp_buf);
> +}
> +
> +void
>  cmd_compat(const char *buf)
>  {
>   const char *s;
> @@ -273,6 +298,11 @@ cmd_compat(const char *buf)
>   paused = 0;
>   gotsig_alarm = 1;
>   cmd_delay(buf + 5);
> + return;
> + }
> + if (strncasecmp(buf, "order", 5) == 0) {
> + show_order();
> + need_update = 1;
>   return;
>   }
>  
> Index: pftop.c
> ===
> RCS file: /cvs/src/usr.bin/systat/pftop.c,v
> retrieving revision 1.40
> diff -u -p -u -r1.40 pftop.c
> --- pftop.c   19 Jul 2017 12:58:31 -  1.40
> +++ pftop.c   18 Dec 2017 21:49:22 -
> @@ -269,7 +269,7 @@ order_type order_list[] = {
>  /* Define view managers */
>  struct view_manager state_mgr = {
>   "States", select_states, read_states, sort_states, print_header,
> - print_states, keyboard_callback, order_list, NULL
> + print_states, keyboard_callback, order_list, order_list
>  };
>  
>  struct view_manager rule_mgr = {
> Index: systat.1
> ===
> RCS file: /cvs/src/usr.bin/systat/systat.1,v
> retrieving revision 1.102
> diff -u -p -u -r1.102 systat.1
> --- systat.1  15 Jun 2017 03:47:07 -  1.102
> +++ systat.1  18 Dec 2017 21:49:22 -
> @@ -219,6 +219,8 @@ command interpreter.
>  .Bl -tag -width Fl
>  .It Ic help
>  Print the names of the available views on the command line.
> +.It Ic order
> +Print the names of the available orderings on the command line.
>  .It Ic quit
>  Quit
>  .Nm .
> @@ -384,6 +386,10 @@ changes the view to show all of them.
>  Display kernel
>  .Xr pool 9
>  per CPU cache statistics.
> +Available orderins are:
> +.Ic name ,
> +.Ic 

Re: malloc cleanup and small optimization (step 2)

2017-12-30 Thread Otto Moerbeek
On Wed, Dec 27, 2017 at 03:20:06PM +0100, Otto Moerbeek wrote:

> Hi,
> 
> second step: only init chunk_info on creation. When it is recycled all
> values already have proper values to start using it again, e.g. the
> free bitmap already has all slots marked free.  Plus some moving of
> code to get a better grouping of related functions.

Slightly different diff: instead of initing all chunk_info structs in
page, do it on demand at first use. Measurements show that a lot of
programs only use a few chunk_info structs, so it is a bit wasteful to
always initialize all of them. Also de-inline the init code for
readability.

-Otto

Index: malloc.c
===
RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
retrieving revision 1.237
diff -u -p -r1.237 malloc.c
--- malloc.c27 Dec 2017 10:05:23 -  1.237
+++ malloc.c30 Dec 2017 11:46:30 -
@@ -203,10 +203,8 @@ static union {
 
 char   *malloc_options;/* compile-time options */
 
-static u_char getrbyte(struct dir_info *d);
 static __dead void wrterror(struct dir_info *d, char *msg, ...)
 __attribute__((__format__ (printf, 2, 3)));
-static void fill_canary(char *ptr, size_t sz, size_t allocated);
 
 #ifdef MALLOC_STATS
 void malloc_dump(int, int, struct dir_info *);
@@ -312,178 +310,6 @@ getrbyte(struct dir_info *d)
return x;
 }
 
-/*
- * Cache maintenance. We keep at most malloc_cache pages cached.
- * If the cache is becoming full, unmap pages in the cache for real,
- * and then add the region to the cache
- * Opposed to the regular region data structure, the sizes in the
- * cache are in MALLOC_PAGESIZE units.
- */
-static void
-unmap(struct dir_info *d, void *p, size_t sz, int clear)
-{
-   size_t psz = sz >> MALLOC_PAGESHIFT;
-   size_t rsz, tounmap;
-   struct region_info *r;
-   u_int i, offset;
-
-   if (sz != PAGEROUND(sz))
-   wrterror(d, "munmap round");
-
-   rsz = mopts.malloc_cache - d->free_regions_size;
-
-   /*
-* normally the cache holds recently freed regions, but if the region
-* to unmap is larger than the cache size or we're clearing and the
-* cache is full, just munmap
-*/
-   if (psz > mopts.malloc_cache || (clear && rsz == 0)) {
-   i = munmap(p, sz);
-   if (i)
-   wrterror(d, "munmap %p", p);
-   STATS_SUB(d->malloc_used, sz);
-   return;
-   }
-   tounmap = 0;
-   if (psz > rsz)
-   tounmap = psz - rsz;
-   offset = getrbyte(d);
-   for (i = 0; tounmap > 0 && i < mopts.malloc_cache; i++) {
-   r = >free_regions[(i + offset) & (mopts.malloc_cache - 1)];
-   if (r->p != NULL) {
-   rsz = r->size << MALLOC_PAGESHIFT;
-   if (munmap(r->p, rsz))
-   wrterror(d, "munmap %p", r->p);
-   r->p = NULL;
-   if (tounmap > r->size)
-   tounmap -= r->size;
-   else
-   tounmap = 0;
-   d->free_regions_size -= r->size;
-   r->size = 0;
-   STATS_SUB(d->malloc_used, rsz);
-   }
-   }
-   if (tounmap > 0)
-   wrterror(d, "malloc cache underflow");
-   for (i = 0; i < mopts.malloc_cache; i++) {
-   r = >free_regions[(i + offset) & (mopts.malloc_cache - 1)];
-   if (r->p == NULL) {
-   if (clear)
-   memset(p, 0, sz - mopts.malloc_guard);
-   if (mopts.malloc_junk && !mopts.malloc_freeunmap) {
-   size_t amt = mopts.malloc_junk == 1 ?
-   MALLOC_MAXCHUNK : sz;
-   memset(p, SOME_FREEJUNK, amt);
-   }
-   if (mopts.malloc_freeunmap)
-   mprotect(p, sz, PROT_NONE);
-   r->p = p;
-   r->size = psz;
-   d->free_regions_size += psz;
-   break;
-   }
-   }
-   if (i == mopts.malloc_cache)
-   wrterror(d, "malloc free slot lost");
-   if (d->free_regions_size > mopts.malloc_cache)
-   wrterror(d, "malloc cache overflow");
-}
-
-static void
-zapcacheregion(struct dir_info *d, void *p, size_t len)
-{
-   u_int i;
-   struct region_info *r;
-   size_t rsz;
-
-   for (i = 0; i < mopts.malloc_cache; i++) {
-   r = >free_regions[i];
-   if (r->p >= p && r->p <= (void *)((char *)p + len)) {
-   rsz = r->size << MALLOC_PAGESHIFT;
-   if (munmap(r->p, rsz))
-   wrterror(d, "munmap %p", r->p);
-   

Re: malloc cleanup and small optimization (step 2)

2017-12-30 Thread Otto Moerbeek
On Sat, Dec 30, 2017 at 06:53:44AM +, kshe wrote:

> Hi,
> 
> Looking at this diff and the previous one, I found some more possible
> cleanups for malloc.c (the patch below is to be applied after both of
> them, even if the second one has not been committed yet):
> 
> 1.  In malloc_bytes(), use ffs(3) instead of manual loops, which on many
> architectures boils down to merely one or two instructions (for example,
> see src/lib/libc/arch/amd64/string/ffs.S).

I remember doing some measurements using ffs a long time ago, and it
turned out that is was slower in some cases. Likely due to the
function call overhead or maybe a non-optimal implementation. So I'm
only willing to consider this after seeing benchmarks on a handfull of
architectures.

> 
> 2.  Slightly reorder find_chunksize() to avoid checking twice for the
> same condition.
> 
> 3.  Remove the outdated comment above omalloc_poolinit().
> 
> 4.  Remove extra braces enclosing a switch case, made unnecessary by
> revision 1.233.
> 
> 5.  Some miscellaneous style and whitespace fixes.

The other chunks I'll consider after the diff below (or a succcesor)
has been committed.

-Otto

> 
> --- malloc.c.orig Fri Dec 29 00:06:24 2017
> +++ malloc.c  Fri Dec 29 06:02:04 2017
> @@ -376,12 +376,11 @@ omalloc_parseopt(char opt)
>   case 'X':
>   mopts.malloc_xmalloc = 1;
>   break;
> - default: {
> + default:
>   dprintf(STDERR_FILENO, "malloc() warning: "
> -"unknown char in MALLOC_OPTIONS\n");
> + "unknown char in MALLOC_OPTIONS\n");
>   break;
>   }
> - }
>  }
>  
>  static void
> @@ -448,9 +447,6 @@ omalloc_init(void)
>   ;
>  }
>  
> -/*
> - * Initialize a dir_info, which should have been cleared by caller
> - */
>  static void
>  omalloc_poolinit(struct dir_info **dp)
>  {
> @@ -502,7 +498,7 @@ omalloc_grow(struct dir_info *d)
>   size_t i;
>   struct region_info *p;
>  
> - if (d->regions_total > SIZE_MAX / sizeof(struct region_info) / 2 )
> + if (d->regions_total > SIZE_MAX / sizeof(struct region_info) / 2)
>   return 1;
>  
>   newtotal = d->regions_total * 2;
> @@ -828,7 +824,7 @@ alloc_chunk_info(struct dir_info *d, int bits)
>   return NULL;
>   STATS_ADD(d->malloc_used, MALLOC_PAGESIZE);
>   count = MALLOC_PAGESIZE / size;
> - 
> +
>   if (bits == 0) {
>   shift = 1;
>   i = MALLOC_MINSIZE - 1;
> @@ -903,23 +899,21 @@ err:
>  static int
>  find_chunksize(size_t size)
>  {
> - int i, j;
> + int r;
>  
> - /* Don't bother with anything less than this */
> - /* unless we have a malloc(0) requests */
> - if (size != 0 && size < MALLOC_MINSIZE)
> + /* malloc(0) is special */
> + if (size == 0)
> + return 0;
> +
> + if (size < MALLOC_MINSIZE)
>   size = MALLOC_MINSIZE;
> + size--;
>  
> - /* Find the right bucket */
> - if (size == 0)
> - j = 0;
> - else {
> - j = MALLOC_MINSHIFT;
> - i = (size - 1) >> (MALLOC_MINSHIFT - 1);
> - while (i >>= 1)
> - j++;
> - }
> - return j;
> + r = MALLOC_MINSHIFT;
> + while (size >> r != 0)
> + r++;
> +
> + return r;
>  }
>  
>  static void
> @@ -941,7 +935,7 @@ malloc_bytes(struct dir_info *d, size_t size, void *f)
>   u_int i, r;
>   int j, listnum;
>   size_t k;
> - u_short u, b, *lp;
> + u_short *lp;
>   struct chunk_info *bp;
>   void *p;
>  
> @@ -970,15 +964,12 @@ malloc_bytes(struct dir_info *d, size_t size, void *f)
>   /* start somewhere in a short */
>   lp = >bits[i / MALLOC_BITS];
>   if (*lp) {
> - b = *lp;
> - k = i % MALLOC_BITS;
> - u = 1 << k;
> - while (k < MALLOC_BITS) {
> - if (b & u)
> - goto found;
> - k++;
> - u <<= 1;
> - } 
> + j = i % MALLOC_BITS;
> + k = ffs(*lp >> j);
> + if (k != 0) {
> + k += j - 1;
> + goto found;
> + }
>   }
>   /* no bit halfway, go to next full short */
>   i /= MALLOC_BITS;
> @@ -986,16 +977,10 @@ malloc_bytes(struct dir_info *d, size_t size, void *f)
>   if (++i >= bp->total / MALLOC_BITS)
>   i = 0;
>   lp = >bits[i];
> - if (*lp) {
> - b = *lp;
> - k = 0;
> - u = 1;
> - for (;;) {
> - if (b & u)
> - goto found;
> - k++;
> - u <<= 1;
> - }
> + k =