Hello, Ben.
Thanks for looking at this.

On 28.11.2017 01:54, Ben Pfaff wrote:
> On Mon, Nov 27, 2017 at 02:59:29PM +0300, Ilya Maximets wrote:
>>> I also verified the other case when posix_memalign isn't available and
>>> even in that case it returns the address aligned on CACHE_LINE_SIZE
>>> boundary. I will send out a patch to use xzalloc_cacheline for
>>> allocating the memory.
>>
>> I don't know how you tested this, because it is impossible:
>>
>>      1. OVS allocates some memory:
>>              base = xmalloc(...);
>>
>>      2. Rounds it up to the cache line start:
>>              payload = (void **) ROUND_UP((uintptr_t) base, CACHE_LINE_SIZE);
>>
>>      3. Returns the pointer increased by 8 bytes:
>>              return (char *) payload + MEM_ALIGN;
>>
>> So, unless you redefined MEM_ALIGN to zero, you will never get an aligned
>> memory address when allocating with xmalloc_cacheline() on a system
>> without posix_memalign().
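
To illustrate, a minimal sketch (the arithmetic is what lib/util.c does
today; the assert is only for demonstration):

        void *p = xmalloc_cacheline(42);

        /* Without posix_memalign() this assert fails: the address
         * returned is <start of a cache line> + MEM_ALIGN, i.e. 8 bytes
         * past the boundary. */
        ovs_assert((uintptr_t) p % CACHE_LINE_SIZE == 0);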
> 
> Maybe we should make the non-HAVE_POSIX_MEMALIGN case better.  Something
> like this (compile tested only):

That's interesting, but the question here is a bit different.
We could certainly apply 5-10 more patches to fix this unaligned memory
allocation, the paddings, the holes under different cache line sizes or
mutex sizes, and whatever else we forgot to mention, but *do we need this*?
Why are we making such a big effort to make this work as intended?

Padding/alignment of struct dp_netdev_pmd_thread:

- Cons:
        * Code complication.
        * Worse grouping of structure members.
          (Compared with what we could have without padding/alignment
          restrictions.)
        * Poor extensibility: Even small struct modifications may require
          regrouping and rearranging of many other structure members
          (see the sketch after this list). For example:
          https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341230.html
        * No measurable performance impact. (At least, all the results I have
          seen show close performance between the 'with' and 'without'
          padding/alignment cases.)
        * Broken now: A few additional patches are required to make it work
          as intended (aligned memory allocation).
        * Targeted at optimizing x86 only: It doesn't account for different
          cache line sizes or for the sizes of system-/libc-defined
          structures on other platforms. (A few more patches would be
          required to fix this.)

- Pros:
        * I didn't find any.
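
As a simplified illustration of the extensibility problem, a made-up
struct (not the real dp_netdev_pmd_thread, whose members differ), grouped
per cache line with the PADDED_MEMBERS macro:

        struct example {
                PADDED_MEMBERS(CACHE_LINE_SIZE,
                        struct ovs_mutex mutex; /* Size differs per libc. */
                        int counter;
                );
                PADDED_MEMBERS(CACHE_LINE_SIZE,
                        int stats[8];
                        /* Adding even one member here may overflow the
                         * cache line and force regrouping of the rest of
                         * the structure. */
                );
        };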

I consider the previously applied structure refactoring over-engineering
that has no positive impact.

---

Besides that, looking at the code below:
Do we care about false sharing on alloc/free operations? That is the
difference between the old and new implementations. On alloc/free, the
new one will read and write bytes that may sit on a shared cache line.
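
In the proposed version (names as in the patch below) the back-pointer
lives in the header padding, so its cache line may be shared with a
neighbouring heap block:

        /* In xmalloc_cacheline(): */
        void **q = (void **) r - 1; /* Last sizeof(void *) bytes before the
                                     * user data, possibly on a cache line
                                     * shared with another allocation. */
        *q = p;                     /* Write to that line on alloc. */

        /* In free_cacheline(p), where 'p' is the user pointer: */
        void **q = (void **) p - 1;
        free(*q);                   /* Read from that line again on free. */

The old code stored the base pointer at 'payload', i.e. at the start of
the first cache line owned exclusively by the block.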

Best regards, Ilya Maximets.

> 
> diff --git a/lib/util.c b/lib/util.c
> index 9e6edd27ae4c..137091a3cd4f 100644
> --- a/lib/util.c
> +++ b/lib/util.c
> @@ -196,15 +196,9 @@ x2nrealloc(void *p, size_t *n, size_t s)
>      return xrealloc(p, *n * s);
>  }
>  
> -/* The desired minimum alignment for an allocated block of memory. */
> -#define MEM_ALIGN MAX(sizeof(void *), 8)
> -BUILD_ASSERT_DECL(IS_POW2(MEM_ALIGN));
> -BUILD_ASSERT_DECL(CACHE_LINE_SIZE >= MEM_ALIGN);
> -
> -/* Allocates and returns 'size' bytes of memory in dedicated cache lines.  That
> - * is, the memory block returned will not share a cache line with other data,
> - * avoiding "false sharing".  (The memory returned will not be at the start of
> - * a cache line, though, so don't assume such alignment.)
> +/* Allocates and returns 'size' bytes of memory aligned to a cache line and in
> + * dedicated cache lines.  That is, the memory block returned will not share a
> + * cache line with other data, avoiding "false sharing".
>   *
>   * Use free_cacheline() to free the returned memory block. */
>  void *
> @@ -221,28 +215,37 @@ xmalloc_cacheline(size_t size)
>      }
>      return p;
>  #else
> -    void **payload;
> -    void *base;
> -
>      /* Allocate room for:
>       *
> -     *     - Up to CACHE_LINE_SIZE - 1 bytes before the payload, so that the
> -     *       start of the payload doesn't potentially share a cache line.
> +     *     - Header padding: Up to CACHE_LINE_SIZE - 1 bytes, to allow the
> +     *       pointer to be aligned exactly sizeof(void *) bytes before the
> +     *       beginning of a cache line.
>       *
> -     *     - A payload consisting of a void *, followed by padding out to
> -     *       MEM_ALIGN bytes, followed by 'size' bytes of user data.
> +     *     - Pointer: A pointer to the start of the header padding, to allow us
> +     *       to free() the block later.
>       *
> -     *     - Space following the payload up to the end of the cache line, so
> -     *       that the end of the payload doesn't potentially share a cache line
> -     *       with some following block. */
> -    base = xmalloc((CACHE_LINE_SIZE - 1)
> -                   + ROUND_UP(MEM_ALIGN + size, CACHE_LINE_SIZE));
> -
> -    /* Locate the payload and store a pointer to the base at the beginning. */
> -    payload = (void **) ROUND_UP((uintptr_t) base, CACHE_LINE_SIZE);
> -    *payload = base;
> -
> -    return (char *) payload + MEM_ALIGN;
> +     *     - User data: 'size' bytes.
> +     *
> +     *     - Trailer padding: Enough to bring the user data up to a cache line
> +     *       multiple.
> +     *
> +     * +---------------+---------+------------------------+---------+
> +     * | header        | pointer | user data              | trailer |
> +     * +---------------+---------+------------------------+---------+
> +     * ^               ^         ^
> +     * |               |         |
> +     * p               q         r
> +     *
> +     */
> +    void *p = xmalloc((CACHE_LINE_SIZE - 1)
> +                      + sizeof(void *)
> +                      + ROUND_UP(size, CACHE_LINE_SIZE));
> +    bool runt = PAD_SIZE((uintptr_t) p, CACHE_LINE_SIZE) < sizeof(void *);
> +    void *r = (void *) ROUND_UP((uintptr_t) p + (runt ? CACHE_LINE_SIZE : 0),
> +                                CACHE_LINE_SIZE);
> +    void **q = (void **) r - 1;
> +    *q = p;
> +    return r;
>  #endif
>  }
>  
> @@ -265,7 +268,8 @@ free_cacheline(void *p)
>      free(p);
>  #else
>      if (p) {
> -        free(*(void **) ((uintptr_t) p - MEM_ALIGN));
> +        void **q = (void **) p - 1;
> +        free(*q);
>      }
>  #endif
>  }
> 
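For reference, a worked example of the 'runt' case above, assuming
CACHE_LINE_SIZE == 64 and sizeof(void *) == 8 (addresses made up):

        p = 0x103c                        /* Returned by xmalloc(). */
        PAD_SIZE(0x103c, 64) == 4 < 8     /* So 'runt' is true. */
        r = ROUND_UP(0x103c + 64, 64) == 0x1080
        q = (void **) r - 1 == 0x1078     /* Still inside the block. */

Without the 'runt' correction, r would be 0x1040 and q == 0x1038 < p,
i.e. the back-pointer would land before the allocated block.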