Re: svn commit: r334702 - head/sys/sys

2018-06-07 Thread Ravi Pokala
-Original Message-
From: Brooks Davis 
Date: 2018-06-07, Thursday at 09:27
To: Ravi Pokala 
Cc: "Jonathan T. Looney" , Mateusz Guzik , 
Mateusz Guzik , src-committers , 
, 
Subject: Re: svn commit: r334702 - head/sys/sys

> On Thu, Jun 07, 2018 at 12:01:00AM -0400, Ravi Pokala wrote:
>>> I believe the theory is that the compiler (remember, this is 
>>> __builtin_memset) can optimize away portions of the zeroing, or can 
>>> optimize zeroing for small sizes.
>> 
>> Ahhh! I didn't consider that the compiler would be doing analysis of the 
>> larger context, and potentially skipping zeroing parts that are set 
>> immediately after the call.
> 
> Clang does this.  It does make for some quite interesting object code,
> but the result is that zeroing with __builtin_memset() is basically free
> for mostly-initialized structures.
> 
> -- Brooks

Yeah, it's a subtle but obvious optimization in hindsight. Thanks for the info.

-Ravi (rpokala@)


___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r334702 - head/sys/sys

2018-06-07 Thread Brooks Davis
On Thu, Jun 07, 2018 at 12:01:00AM -0400, Ravi Pokala wrote:
> > I believe the theory is that the compiler (remember, this is 
> > __builtin_memset) can optimize away portions of the zeroing, or can 
> > optimize zeroing for small sizes.
> 
> Ahhh! I didn't consider that the compiler would be doing analysis of the 
> larger context, and potentially skipping zeroing parts that are set 
> immediately after the call.

Clang does this.  It does make for some quite interesting object code,
but the result is that zeroing with __builtin_memset() is basically free
for mostly-initialized structures.

-- Brooks


signature.asc
Description: PGP signature


Re: svn commit: r334702 - head/sys/sys

2018-06-06 Thread Ravi Pokala
> I believe the theory is that the compiler (remember, this is 
> __builtin_memset) can optimize away portions of the zeroing, or can optimize 
> zeroing for small sizes.

Ahhh! I didn't consider that the compiler would be doing analysis of the larger 
context, and potentially skipping zeroing parts that are set immediately after 
the call.

Thanks!

-Ravi (rpokala@)

-Original Message-
From: "Jonathan T. Looney" 
Date: 2018-06-06, Wednesday at 22:58
To: Ravi Pokala 
Cc: Mateusz Guzik , Mateusz Guzik , 
src-committers , , 

Subject: Re: svn commit: r334702 - head/sys/sys

> On Wed, Jun 6, 2018 at 10:14 PM, Ravi Pokala  wrote:
>>
>> -Original Message-
>> From:  on behalf of Mateusz Guzik 
>> 
>> Date: 2018-06-06, Wednesday at 09:01
>> To: Ravi Pokala 
>> Cc: Mateusz Guzik , src-committers 
>> , , 
>> 
>> Subject: Re: svn commit: r334702 - head/sys/sys
>>
>>> On Wed, Jun 6, 2018 at 1:35 PM, Ravi Pokala  wrote:
>>>
>>>>> + * Passing the flag down requires malloc to blindly zero the entire 
>>>>> object.
>>>>> + * In practice a lot of the zeroing can be avoided if most of the object
>>>>> + * gets explicitly initialized after the allocation. Letting the compiler
>>>>> + * zero in place gives it the opportunity to take advantage of this 
>>>>> state.
>>>>
>>>> This part, I still don't understand. :-(
>>>
>>> The call to bzero() is still for the full length passed in, so how does 
>>> this help?
>>>
>>> bzero is:
>>> #define bzero(buf, len) __builtin_memset((buf), 0, (len))
>> 
>> I'm afraid that doesn't answer my question; you're passing the full length 
>> to __builtin_memset() too.
> 
> I believe the theory is that the compiler (remember, this is 
> __builtin_memset) can optimize away portions of the zeroing, or can optimize 
> zeroing for small sizes.
> 
> For example, imagine you do this:
> 
> struct foo {
> uint32_t a;
> uint32_t b;
> };
> 
> struct foo *
> alloc_foo(void)
> {
> struct foo *rv;
> 
> rv = malloc(sizeof(*rv), M_TMP, M_WAITOK|M_ZERO);
> rv->a = 1;
> rv->b = 2;
> return (rv);
> }
> 
> In theory, the compiler can be smart enough to know that the entire structure 
> is initialized, so it is not necessary to zero it.
> 
> (I personally have not tested how well this works in practice. However, this 
> change theoretically lets the compiler be smarter and optimize away unneeded 
> work.)
> 
> At minimum, it should let the compiler replace calls to memset() (and the 
> loops there) with optimal instructions to zero the exact amount of memory 
> that needs to be initialized. (Again, I haven't personally tested how smart 
> the compilers we use are about producing optimal code in this situation.)
> 
> Jonathan


___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r334702 - head/sys/sys

2018-06-06 Thread Jonathan T. Looney
On Wed, Jun 6, 2018 at 10:14 PM, Ravi Pokala  wrote:
>
> -Original Message-
> From:  on behalf of Mateusz Guzik <
mjgu...@gmail.com>
> Date: 2018-06-06, Wednesday at 09:01
> To: Ravi Pokala 
> Cc: Mateusz Guzik , src-committers <
src-committ...@freebsd.org>, , <
svn-src-h...@freebsd.org>
> Subject: Re: svn commit: r334702 - head/sys/sys
>
> > On Wed, Jun 6, 2018 at 1:35 PM, Ravi Pokala  wrote:
> >
> >>> + * Passing the flag down requires malloc to blindly zero the entire
object.
> >>> + * In practice a lot of the zeroing can be avoided if most of the
object
> >>> + * gets explicitly initialized after the allocation. Letting the
compiler
> >>> + * zero in place gives it the opportunity to take advantage of this
state.
> >>
> >> This part, I still don't understand. :-(
> >
> > The call to bzero() is still for the full length passed in, so how does
this help?
> >
> > bzero is:
> > #define bzero(buf, len) __builtin_memset((buf), 0, (len))
>
> I'm afraid that doesn't answer my question; you're passing the full
length to __builtin_memset() too.


I believe the theory is that the compiler (remember, this is
__builtin_memset) can optimize away portions of the zeroing, or can
optimize zeroing for small sizes.

For example, imagine you do this:

struct foo {
uint32_t a;
uint32_t b;
};

struct foo *
alloc_foo(void)
{
struct foo *rv;

rv = malloc(sizeof(*rv), M_TMP, M_WAITOK|M_ZERO);
rv->a = 1;
rv->b = 2;
return (rv);
}

In theory, the compiler can be smart enough to know that the entire
structure is initialized, so it is not necessary to zero it.

(I personally have not tested how well this works in practice. However,
this change theoretically lets the compiler be smarter and optimize away
unneeded work.)

At minimum, it should let the compiler replace calls to memset() (and the
loops there) with optimal instructions to zero the exact amount of memory
that needs to be initialized. (Again, I haven't personally tested how smart
the compilers we use are about producing optimal code in this situation.)

Jonathan
___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r334702 - head/sys/sys

2018-06-06 Thread Ravi Pokala
-Original Message-
From:  on behalf of Mateusz Guzik 

Date: 2018-06-06, Wednesday at 09:01
To: Ravi Pokala 
Cc: Mateusz Guzik , src-committers 
, , 

Subject: Re: svn commit: r334702 - head/sys/sys

> On Wed, Jun 6, 2018 at 1:35 PM, Ravi Pokala  wrote:
> 
>>> + * Passing the flag down requires malloc to blindly zero the entire object.
>>> + * In practice a lot of the zeroing can be avoided if most of the object
>>> + * gets explicitly initialized after the allocation. Letting the compiler
>>> + * zero in place gives it the opportunity to take advantage of this state.
>>
>> This part, I still don't understand. :-(
> 
> The call to bzero() is still for the full length passed in, so how does this 
> help?
> 
> bzero is:
> #define bzero(buf, len) __builtin_memset((buf), 0, (len))

I'm afraid that doesn't answer my question; you're passing the full length to 
__builtin_memset() too.

>>> ...
>>> + *   _malloc_item = malloc(_size, type, (flags) &~ M_ZERO);
>>> + *   if (((flags) & M_WAITOK) != 0 || _malloc_item != NULL)
>>> + *   bzero(_malloc_item, _size);
>>> + *
>>> + * If the flag is set, the compiler knows the left side is always true,
>>> + * therefore the entire statement is true and the callsite is:
>> 
>> I think you mean "... the *right* side is always true ...", since the left 
>> side is the check for the flag being set. "If the flag is set, compiler 
>> knows (the check for the flag being set) is always true" is tautological.
> 
> It explains how __builtin_constant_p(flags) being true allows the compiler
> to optimize out the flags-based check.

The __builtin_constant_p()s are in the conditional *before* the one I'm asking 
about. The test for M_WAITOK on the left being true, will cause the NULL-check 
on the right to be skipped because of the short-circuit semantics of binary ||. 
But because M_WAITOK is set, the NULL-check will pass anyway... Ah, and that's 
what you meant by "therefore the entire statement is true".

I think the wording was throwing me; it might be clearer English to say 
something like "If the flag is set -- and we only get here if that can be 
determined at compile-time, because of __builtin_constant_p() -- then the 
entire statement is true. This skips the NULL-check, but it will always pass if 
the flag is set anyway."

> I don't understand why this particular use runs into so much confusion.
> Just above it there is a M_ZERO check relying on the same property and
> receiving no attention.

In that context, it's clearer what's the condition is:
- "size" must be constant at compile-time
- "flags" must be constant at compile-time
- "flags" must have M_ZERO set

>>> ...
>>> + * If the flag is not set, the compiler knows the left size is always false
>>> + * and the NULL check is needed, therefore the callsite is:
>> 
>> Same issue here.

And same answer here too.

>>> ...
>>>  #ifdef _KERNEL
>>>  #define  malloc(size, type, flags) ({  
>>>   \
>> 
>> Now that I'm taking another look at this, I'm confused as to why the entire 
>> macro expansion is inside parentheses? (The braces make sense, since this is 
>> a block with local variables which need to be contained.)
> 
> It is to return the value (the last expression).

Yeah, Ben / Bruce / Conrad clarified that.

>>>   void *_malloc_item; \
>>> @@ -193,7 +228,8 @@ void  *malloc(size_t size, struct malloc_type 
>>> *type, in
>>>   if (__builtin_constant_p(size) && __builtin_constant_p(flags) &&\
>>>   ((flags) & M_ZERO) != 0) {  \
>>>   _malloc_item = malloc(_size, type, (flags) &~ M_ZERO);  \
>>> - if (((flags) & M_WAITOK) != 0 || _malloc_item != NULL)  \
>>> + if (((flags) & M_WAITOK) != 0 ||\
>>> + __predict_true(_malloc_item != NULL))   \
>>>   bzero(_malloc_item, _size); \
>>>   } else {\
>>>   _malloc_item = malloc(_size, type, flags);  \
>> 
>> This confuses me too. If the constant-size/constant-flags/M_ZERO-is-set test 
>> fails, then it falls down to calling malloc(). Which we are in the middle of 
>> defining. So what does that expand to?
> 
> Expansion is not recursive, so this is an actual call to malloc. 

Ah, right. I swear I knew that at some point. :-)

> -- 
> Mateusz Guzik http://gmail.com>>

Thanks Mateusz,

-Ravi (rpokala@)


___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r334702 - head/sys/sys

2018-06-06 Thread Conrad Meyer
Hi Ben, Ravi,

GCC/Clang are fine with the just curly braces, or just do/while(0).

The core benefit of the ({ }) syntax is to allow multiple-statement
macros to have expression syntax externally (i.e., yield a value) —
which is not possible in standard C due to concession to simplistic
parsers.  Ben's link is a good one and does cover this in more detail.
The URL ("Statement-Exprs") is a hint ;-).

Best,
Conrad

On Wed, Jun 6, 2018 at 4:40 AM, Benjamin Kaduk  wrote:
> On Wed, Jun 6, 2018 at 6:35 AM, Ravi Pokala  wrote:
>>
>> Hi Mateusz,
>>
>> -Original Message-
>> From:  on behalf of Mateusz Guzik
>> 
>> Date: 2018-06-06, Wednesday at 01:08
>> To: , ,
>> 
>> Subject: svn commit: r334702 - head/sys/sys
>>
>> > ...
>> >  #ifdef _KERNEL
>> >  #define  malloc(size, type, flags) ({
>> > \
>>
>> Now that I'm taking another look at this, I'm confused as to why the
>> entire macro expansion is inside parentheses? (The braces make sense, since
>> this is a block with local variables which need to be contained.)
>
>
> This is a gcc (and clang) extension to allow the macro body to be a code
> block -- standard C gets unhappy with just the curly braces.
> https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html is a maybe-relevant
> page that google found me.
>
> -Ben
___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r334702 - head/sys/sys

2018-06-06 Thread Ravi Pokala
-Original Message-
From:  on behalf of Benjamin Kaduk 

Date: 2018-06-06, Wednesday at 07:40
To: Ravi Pokala 
Cc: Mateusz Guzik , src-committers 
, , 

Subject: Re: svn commit: r334702 - head/sys/sys

> On Wed, Jun 6, 2018 at 6:35 AM, Ravi Pokala  wrote:
>> 
>> Hi Mateusz,
>> 
>> -Original Message-
>> From:  on behalf of Mateusz Guzik 
>> 
>> Date: 2018-06-06, Wednesday at 01:08
>> To: , , 
>> 
>> Subject: svn commit: r334702 - head/sys/sys
>> 
>>> ...
>>>  #ifdef _KERNEL
>>>  #define  malloc(size, type, flags) ({  
>>>   \
>> 
>> Now that I'm taking another look at this, I'm confused as to why the entire 
>> macro expansion is inside parentheses? (The braces make sense, since this is 
>> a block with local variables which need to be contained.)
> 
> This is a gcc (and clang) extension to allow the macro body to be a code 
> block -- standard C gets unhappy with just the curly braces.  
> https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html is a maybe-relevant 
> page that google found me.

"Neat." Thanks Ben.

-Ravi

> -Ben 





___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r334702 - head/sys/sys

2018-06-06 Thread Mateusz Guzik
On Wed, Jun 6, 2018 at 1:35 PM, Ravi Pokala  wrote:

> > + * Passing the flag down requires malloc to blindly zero the entire
> object.
> > + * In practice a lot of the zeroing can be avoided if most of the object
> > + * gets explicitly initialized after the allocation. Letting the
> compiler
> > + * zero in place gives it the opportunity to take advantage of this
> state.
>
> This part, I still don't understand. :-(
>
> The call to bzero() is still for the full length passed in, so how does
> this help?
>
>
bzero is:
#define bzero(buf, len) __builtin_memset((buf), 0, (len))

> ...
> > + *   _malloc_item = malloc(_size, type, (flags) &~ M_ZERO);
> > + *   if (((flags) & M_WAITOK) != 0 || _malloc_item != NULL)
> > + *   bzero(_malloc_item, _size);
> > + *
> > + * If the flag is set, the compiler knows the left side is always true,
> > + * therefore the entire statement is true and the callsite is:
>
> I think you mean "... the *right* side is always true ...", since the left
> side is the check for the flag being set. "If the flag is set, compiler
> knows (the check for the flag being set) is always true" is tautological.
>

It explains how __builtin_constant_p(flags) being true allows the compiler
to optimize out the flags-based check.

I don't understand why this particular use runs into so much confusion.
Just above it there is a M_ZERO check relying on the same property and
receiving no attention.

> ...
> > + * If the flag is not set, the compiler knows the left size is always
> false
> > + * and the NULL check is needed, therefore the callsite is:
>
> Same issue here.
>
> > ...
> >  #ifdef _KERNEL
> >  #define  malloc(size, type, flags) ({
>   \
>
> Now that I'm taking another look at this, I'm confused as to why the
> entire macro expansion is inside parentheses? (The braces make sense, since
> this is a block with local variables which need to be contained.)
>
>
It is to return the value (the last expression).


> >   void *_malloc_item; \
> > @@ -193,7 +228,8 @@ void  *malloc(size_t size, struct malloc_type
> *type, in
> >   if (__builtin_constant_p(size) && __builtin_constant_p(flags) &&\
> >   ((flags) & M_ZERO) != 0) {  \
> >   _malloc_item = malloc(_size, type, (flags) &~ M_ZERO);  \
> > - if (((flags) & M_WAITOK) != 0 || _malloc_item != NULL)  \
> > + if (((flags) & M_WAITOK) != 0 ||\
> > + __predict_true(_malloc_item != NULL))   \
> >   bzero(_malloc_item, _size); \
> >   } else {\
> >   _malloc_item = malloc(_size, type, flags);  \
>
> This confuses me too. If the constant-size/constant-flags/M_ZERO-is-set
> test fails, then it falls down to calling malloc(). Which we are in the
> middle of defining. So what does that expand to?
>
>
Expansion is not recursive, so this is an actual call to malloc.

-- 
Mateusz Guzik 
___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r334702 - head/sys/sys

2018-06-06 Thread Bruce Evans

On Wed, 6 Jun 2018, Benjamin Kaduk wrote:


On Wed, Jun 6, 2018 at 6:35 AM, Ravi Pokala  wrote:


Hi Mateusz,
...

...
 #ifdef _KERNEL
 #define  malloc(size, type, flags) ({

  \

Now that I'm taking another look at this, I'm confused as to why the
entire macro expansion is inside parentheses? (The braces make sense, since
this is a block with local variables which need to be contained.)


This is a gcc (and clang) extension to allow the macro body to be a code
block -- standard C gets unhappy with just the curly braces.
https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html is a maybe-relevant
page that google found me.


Not really.

This is a syntax error which is only accepted by broken compilers,
since the language standard is doubly mis-specified by hard-coding it
to a wrong value using CSTD=c89 in kern.mk.  This asks for c99 with
no extensions, but many extensions are needed.  These bugs are missing
in the corresponding userland makefile bsd.sys.mk.  That uses CSTD?=gnu99.
This asks for c99 with gnu extensions, which is what is needed, but allows
the user to ask for another standard by setting CSTD.

To allow the user to ask for c99, or just to not depend on compiler bugs
when kern.mk asks for c99, use of this and other extensions should be
marked with __extension__, as is most often done for this particular
extension.

Braces in macros are perfectly standard.  The extension is to allow a
compound statement (delimited by braces) to return a value.  This is done
by enclosing the statement in parentheses.  The value of the statement is
the value of the last expression in it.  This is an extension of c99, since
in c99 about the only things that can be enclosed in parentheses are
expressions, but general statements are not expressions.  Especially
compound statements.

This used to be properly documented (in installed documentation) in
/usr/share/info/gcc.info.gz.

Bruce
___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r334702 - head/sys/sys

2018-06-06 Thread Benjamin Kaduk
On Wed, Jun 6, 2018 at 6:35 AM, Ravi Pokala  wrote:

> Hi Mateusz,
>
> -Original Message-
> From:  on behalf of Mateusz Guzik
> 
> Date: 2018-06-06, Wednesday at 01:08
> To: , , <
> svn-src-h...@freebsd.org>
> Subject: svn commit: r334702 - head/sys/sys
>
> > ...
> >  #ifdef _KERNEL
> >  #define  malloc(size, type, flags) ({
>   \
>
> Now that I'm taking another look at this, I'm confused as to why the
> entire macro expansion is inside parentheses? (The braces make sense, since
> this is a block with local variables which need to be contained.)
>

This is a gcc (and clang) extension to allow the macro body to be a code
block -- standard C gets unhappy with just the curly braces.
https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html is a maybe-relevant
page that google found me.

-Ben
___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r334702 - head/sys/sys

2018-06-06 Thread Ravi Pokala
Hi Mateusz,

-Original Message-
From:  on behalf of Mateusz Guzik 

Date: 2018-06-06, Wednesday at 01:08
To: , , 

Subject: svn commit: r334702 - head/sys/sys

> Author: mjg
> Date: Wed Jun  6 05:08:05 2018
> New Revision: 334702
> URL: https://svnweb.freebsd.org/changeset/base/334702
> 
> Log:
>   malloc: elaborate on r334545 due to frequent questions

I was one of the questioners. :-) Thank you for explaining the conditional 
logic.

> ...
> + * Passing the flag down requires malloc to blindly zero the entire object.
> + * In practice a lot of the zeroing can be avoided if most of the object
> + * gets explicitly initialized after the allocation. Letting the compiler
> + * zero in place gives it the opportunity to take advantage of this state.

This part, I still don't understand. :-(

The call to bzero() is still for the full length passed in, so how does this 
help?

> ...
> + *   _malloc_item = malloc(_size, type, (flags) &~ M_ZERO);
> + *   if (((flags) & M_WAITOK) != 0 || _malloc_item != NULL)
> + *   bzero(_malloc_item, _size);
> + *
> + * If the flag is set, the compiler knows the left side is always true,
> + * therefore the entire statement is true and the callsite is:

I think you mean "... the *right* side is always true ...", since the left side 
is the check for the flag being set. "If the flag is set, compiler knows (the 
check for the flag being set) is always true" is tautological.

> ...
> + * If the flag is not set, the compiler knows the left size is always false
> + * and the NULL check is needed, therefore the callsite is:

Same issue here.

> ...
>  #ifdef _KERNEL
>  #define  malloc(size, type, flags) ({
> \

Now that I'm taking another look at this, I'm confused as to why the entire 
macro expansion is inside parentheses? (The braces make sense, since this is a 
block with local variables which need to be contained.)

>   void *_malloc_item; \
> @@ -193,7 +228,8 @@ void  *malloc(size_t size, struct malloc_type *type, 
> in
>   if (__builtin_constant_p(size) && __builtin_constant_p(flags) &&\
>   ((flags) & M_ZERO) != 0) {  \
>   _malloc_item = malloc(_size, type, (flags) &~ M_ZERO);  \
> - if (((flags) & M_WAITOK) != 0 || _malloc_item != NULL)  \
> + if (((flags) & M_WAITOK) != 0 ||\
> + __predict_true(_malloc_item != NULL))   \
>   bzero(_malloc_item, _size); \
>   } else {\
>   _malloc_item = malloc(_size, type, flags);  \

This confuses me too. If the constant-size/constant-flags/M_ZERO-is-set test 
fails, then it falls down to calling malloc(). Which we are in the middle of 
defining. So what does that expand to?

Thanks,

Ravi (rpokala@)


___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


svn commit: r334702 - head/sys/sys

2018-06-05 Thread Mateusz Guzik
Author: mjg
Date: Wed Jun  6 05:08:05 2018
New Revision: 334702
URL: https://svnweb.freebsd.org/changeset/base/334702

Log:
  malloc: elaborate on r334545 due to frequent questions
  
  While here annotate the NULL check as probably true.

Modified:
  head/sys/sys/malloc.h

Modified: head/sys/sys/malloc.h
==
--- head/sys/sys/malloc.h   Wed Jun  6 02:48:09 2018(r334701)
+++ head/sys/sys/malloc.h   Wed Jun  6 05:08:05 2018(r334702)
@@ -186,6 +186,41 @@ void   free(void *addr, struct malloc_type *type);
 void   free_domain(void *addr, struct malloc_type *type);
 void   *malloc(size_t size, struct malloc_type *type, int flags) __malloc_like
__result_use_check __alloc_size(1);
+/*
+ * Try to optimize malloc(..., ..., M_ZERO) allocations by doing zeroing in
+ * place if the size is known at compilation time.
+ *
+ * Passing the flag down requires malloc to blindly zero the entire object.
+ * In practice a lot of the zeroing can be avoided if most of the object
+ * gets explicitly initialized after the allocation. Letting the compiler
+ * zero in place gives it the opportunity to take advantage of this state.
+ *
+ * Note that the operation is only applicable if both flags and size are
+ * known at compilation time. If M_ZERO is passed but M_WAITOK is not, the
+ * allocation can fail and a NULL check is needed. However, if M_WAITOK is
+ * passed we know the allocation must succeed and the check can be elided.
+ *
+ * _malloc_item = malloc(_size, type, (flags) &~ M_ZERO);
+ * if (((flags) & M_WAITOK) != 0 || _malloc_item != NULL)
+ * bzero(_malloc_item, _size);
+ *
+ * If the flag is set, the compiler knows the left side is always true,
+ * therefore the entire statement is true and the callsite is:
+ *
+ * _malloc_item = malloc(_size, type, (flags) &~ M_ZERO);
+ * bzero(_malloc_item, _size);
+ *
+ * If the flag is not set, the compiler knows the left size is always false
+ * and the NULL check is needed, therefore the callsite is:
+ *
+ * _malloc_item = malloc(_size, type, (flags) &~ M_ZERO);
+ * if (_malloc_item != NULL)
+ * bzero(_malloc_item, _size); 
+ *
+ * The implementation is a macro because of what appears to be a clang 6 bug:
+ * an inline function variant ended up being compiled to a mere malloc call
+ * regardless of argument. gcc generates expected code (like the above).
+ */
 #ifdef _KERNEL
 #definemalloc(size, type, flags) ({
\
void *_malloc_item; \
@@ -193,7 +228,8 @@ void*malloc(size_t size, struct malloc_type *type, 
in
if (__builtin_constant_p(size) && __builtin_constant_p(flags) &&\
((flags) & M_ZERO) != 0) {  \
_malloc_item = malloc(_size, type, (flags) &~ M_ZERO);  \
-   if (((flags) & M_WAITOK) != 0 || _malloc_item != NULL)  \
+   if (((flags) & M_WAITOK) != 0 ||\
+   __predict_true(_malloc_item != NULL))   \
bzero(_malloc_item, _size); \
} else {\
_malloc_item = malloc(_size, type, flags);  \
___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"