Re: request for testing: malloc bitmap scanning

2018-01-18 Thread Otto Moerbeek
On Sat, Jan 13, 2018 at 03:36:33PM +0100, Otto Moerbeek wrote:

> On Sat, Jan 13, 2018 at 11:14:27AM +0100, Otto Moerbeek wrote:
> 
> > On Sat, Jan 13, 2018 at 09:39:35AM +0100, Otto Moerbeek wrote:
> > 
> > > Hi,
> > > 
> > > This diff is based upon kshe's diff, but there's one differene: I am
> > > using the __builtin_ffs instead of ffs(3). Looking at the assembly
> > > generated by calling ffs(3) produces a function call, while the
> > > __builtin_ffs produces just a few machine instructions on all the
> > > platforms I've checked. __builtin_ffs also makes it more easy to port
> > > over this diff to the ld.so version of malloc.
> > > 
> > > Using a standalone test program both gcc and clang generate
> > > __builtin_ffs.  I do not know why __builtin_ffs is not generated by
> > > either gcc or clang when compiling libc.  I suppose the flags given to
> > > the compiler force it to generate a function call, but I did not find
> > > the actual flag that's to blame.
> > > 
> > > Please test on as much platforms as possible,
> > 
> > 
> > Hmmm, I now see an obvious way to make this diff better. Updated diff
> > will follow later,
> 
> Here it is,

guenther@ committed a diff in libc to make the compiler behave better when
using ffs(3). So I committed the diff below but with regular ffs(3) calls.

-Otto

> 
> Index: malloc.c
> ===
> RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
> retrieving revision 1.239
> diff -u -p -r1.239 malloc.c
> --- malloc.c  8 Jan 2018 12:20:23 -   1.239
> +++ malloc.c  13 Jan 2018 14:30:45 -
> @@ -931,7 +931,7 @@ malloc_bytes(struct dir_info *d, size_t 
>   u_int i, r;
>   int j, listnum;
>   size_t k;
> - u_short u, b, *lp;
> + u_short *lp;
>   struct chunk_info *bp;
>   void *p;
>  
> @@ -960,15 +960,12 @@ malloc_bytes(struct dir_info *d, size_t 
>   /* start somewhere in a short */
>   lp = >bits[i / MALLOC_BITS];
>   if (*lp) {
> - b = *lp;
> - k = i % MALLOC_BITS;
> - u = 1 << k;
> - while (k < MALLOC_BITS) {
> - if (b & u)
> - goto found;
> - k++;
> - u <<= 1;
> - } 
> + j = i % MALLOC_BITS;
> + k = __builtin_ffs(*lp >> j);
> + if (k != 0) {
> + k += j - 1;
> + goto found;
> + }
>   }
>   /* no bit halfway, go to next full short */
>   i /= MALLOC_BITS;
> @@ -977,15 +974,8 @@ malloc_bytes(struct dir_info *d, size_t 
>   i = 0;
>   lp = >bits[i];
>   if (*lp) {
> - b = *lp;
> - k = 0;
> - u = 1;
> - for (;;) {
> - if (b & u)
> - goto found;
> - k++;
> - u <<= 1;
> - }
> + k = __builtin_ffs(*lp) - 1;
> + break;
>   }
>   }
>  found:
> @@ -996,7 +986,7 @@ found:
>   }
>  #endif
>  
> - *lp ^= u;
> + *lp ^= 1 << k;
>  
>   /* If there are no more free, remove from free-list */
>   if (--bp->free == 0)



Re: request for testing: malloc bitmap scanning

2018-01-13 Thread Otto Moerbeek
On Sat, Jan 13, 2018 at 11:14:27AM +0100, Otto Moerbeek wrote:

> On Sat, Jan 13, 2018 at 09:39:35AM +0100, Otto Moerbeek wrote:
> 
> > Hi,
> > 
> > This diff is based upon kshe's diff, but there's one differene: I am
> > using the __builtin_ffs instead of ffs(3). Looking at the assembly
> > generated by calling ffs(3) produces a function call, while the
> > __builtin_ffs produces just a few machine instructions on all the
> > platforms I've checked. __builtin_ffs also makes it more easy to port
> > over this diff to the ld.so version of malloc.
> > 
> > Using a standalone test program both gcc and clang generate
> > __builtin_ffs.  I do not know why __builtin_ffs is not generated by
> > either gcc or clang when compiling libc.  I suppose the flags given to
> > the compiler force it to generate a function call, but I did not find
> > the actual flag that's to blame.
> > 
> > Please test on as much platforms as possible,
> 
> 
> Hmmm, I now see an obvious way to make this diff better. Updated diff
> will follow later,

Here it is,

-Otto

Index: malloc.c
===
RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
retrieving revision 1.239
diff -u -p -r1.239 malloc.c
--- malloc.c8 Jan 2018 12:20:23 -   1.239
+++ malloc.c13 Jan 2018 14:30:45 -
@@ -931,7 +931,7 @@ malloc_bytes(struct dir_info *d, size_t 
u_int i, r;
int j, listnum;
size_t k;
-   u_short u, b, *lp;
+   u_short *lp;
struct chunk_info *bp;
void *p;
 
@@ -960,15 +960,12 @@ malloc_bytes(struct dir_info *d, size_t 
/* start somewhere in a short */
lp = >bits[i / MALLOC_BITS];
if (*lp) {
-   b = *lp;
-   k = i % MALLOC_BITS;
-   u = 1 << k;
-   while (k < MALLOC_BITS) {
-   if (b & u)
-   goto found;
-   k++;
-   u <<= 1;
-   } 
+   j = i % MALLOC_BITS;
+   k = __builtin_ffs(*lp >> j);
+   if (k != 0) {
+   k += j - 1;
+   goto found;
+   }
}
/* no bit halfway, go to next full short */
i /= MALLOC_BITS;
@@ -977,15 +974,8 @@ malloc_bytes(struct dir_info *d, size_t 
i = 0;
lp = >bits[i];
if (*lp) {
-   b = *lp;
-   k = 0;
-   u = 1;
-   for (;;) {
-   if (b & u)
-   goto found;
-   k++;
-   u <<= 1;
-   }
+   k = __builtin_ffs(*lp) - 1;
+   break;
}
}
 found:
@@ -996,7 +986,7 @@ found:
}
 #endif
 
-   *lp ^= u;
+   *lp ^= 1 << k;
 
/* If there are no more free, remove from free-list */
if (--bp->free == 0)



Re: request for testing: malloc bitmap scanning

2018-01-13 Thread Otto Moerbeek
On Sat, Jan 13, 2018 at 09:39:35AM +0100, Otto Moerbeek wrote:

> Hi,
> 
> This diff is based upon kshe's diff, but there's one differene: I am
> using the __builtin_ffs instead of ffs(3). Looking at the assembly
> generated by calling ffs(3) produces a function call, while the
> __builtin_ffs produces just a few machine instructions on all the
> platforms I've checked. __builtin_ffs also makes it more easy to port
> over this diff to the ld.so version of malloc.
> 
> Using a standalone test program both gcc and clang generate
> __builtin_ffs.  I do not know why __builtin_ffs is not generated by
> either gcc or clang when compiling libc.  I suppose the flags given to
> the compiler force it to generate a function call, but I did not find
> the actual flag that's to blame.
> 
> Please test on as much platforms as possible,


Hmmm, I now see an obvious way to make this diff better. Updated diff
will follow later,

-Otto
> 
> Index: malloc.c
> ===
> RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
> retrieving revision 1.239
> diff -u -p -r1.239 malloc.c
> --- malloc.c  8 Jan 2018 12:20:23 -   1.239
> +++ malloc.c  9 Jan 2018 11:44:40 -
> @@ -931,7 +931,7 @@ malloc_bytes(struct dir_info *d, size_t 
>   u_int i, r;
>   int j, listnum;
>   size_t k;
> - u_short u, b, *lp;
> + u_short *lp;
>   struct chunk_info *bp;
>   void *p;
>  
> @@ -960,15 +960,12 @@ malloc_bytes(struct dir_info *d, size_t 
>   /* start somewhere in a short */
>   lp = >bits[i / MALLOC_BITS];
>   if (*lp) {
> - b = *lp;
> - k = i % MALLOC_BITS;
> - u = 1 << k;
> - while (k < MALLOC_BITS) {
> - if (b & u)
> - goto found;
> - k++;
> - u <<= 1;
> - } 
> + j = i % MALLOC_BITS;
> + k = __builtin_ffs(*lp >> j);
> + if (k != 0) {
> + k += j - 1;
> + goto found;
> + }
>   }
>   /* no bit halfway, go to next full short */
>   i /= MALLOC_BITS;
> @@ -977,14 +974,10 @@ malloc_bytes(struct dir_info *d, size_t 
>   i = 0;
>   lp = >bits[i];
>   if (*lp) {
> - b = *lp;
> - k = 0;
> - u = 1;
> - for (;;) {
> - if (b & u)
> - goto found;
> - k++;
> - u <<= 1;
> + k = __builtin_ffs(*lp);
> + if (k != 0) {
> + k--;
> + break;
>   }
>   }
>   }
> @@ -996,7 +989,7 @@ found:
>   }
>  #endif
>  
> - *lp ^= u;
> + *lp ^= 1 << k;
>  
>   /* If there are no more free, remove from free-list */
>   if (--bp->free == 0)



request for testing: malloc bitmap scanning

2018-01-13 Thread Otto Moerbeek
Hi,

This diff is based upon kshe's diff, but there's one differene: I am
using the __builtin_ffs instead of ffs(3). Looking at the assembly
generated by calling ffs(3) produces a function call, while the
__builtin_ffs produces just a few machine instructions on all the
platforms I've checked. __builtin_ffs also makes it more easy to port
over this diff to the ld.so version of malloc.

Using a standalone test program both gcc and clang generate
__builtin_ffs.  I do not know why __builtin_ffs is not generated by
either gcc or clang when compiling libc.  I suppose the flags given to
the compiler force it to generate a function call, but I did not find
the actual flag that's to blame.

Please test on as much platforms as possible,

-Otto

Index: malloc.c
===
RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
retrieving revision 1.239
diff -u -p -r1.239 malloc.c
--- malloc.c8 Jan 2018 12:20:23 -   1.239
+++ malloc.c9 Jan 2018 11:44:40 -
@@ -931,7 +931,7 @@ malloc_bytes(struct dir_info *d, size_t 
u_int i, r;
int j, listnum;
size_t k;
-   u_short u, b, *lp;
+   u_short *lp;
struct chunk_info *bp;
void *p;
 
@@ -960,15 +960,12 @@ malloc_bytes(struct dir_info *d, size_t 
/* start somewhere in a short */
lp = >bits[i / MALLOC_BITS];
if (*lp) {
-   b = *lp;
-   k = i % MALLOC_BITS;
-   u = 1 << k;
-   while (k < MALLOC_BITS) {
-   if (b & u)
-   goto found;
-   k++;
-   u <<= 1;
-   } 
+   j = i % MALLOC_BITS;
+   k = __builtin_ffs(*lp >> j);
+   if (k != 0) {
+   k += j - 1;
+   goto found;
+   }
}
/* no bit halfway, go to next full short */
i /= MALLOC_BITS;
@@ -977,14 +974,10 @@ malloc_bytes(struct dir_info *d, size_t 
i = 0;
lp = >bits[i];
if (*lp) {
-   b = *lp;
-   k = 0;
-   u = 1;
-   for (;;) {
-   if (b & u)
-   goto found;
-   k++;
-   u <<= 1;
+   k = __builtin_ffs(*lp);
+   if (k != 0) {
+   k--;
+   break;
}
}
}
@@ -996,7 +989,7 @@ found:
}
 #endif
 
-   *lp ^= u;
+   *lp ^= 1 << k;
 
/* If there are no more free, remove from free-list */
if (--bp->free == 0)