Re: new module 'aligned-malloc'

2020-07-22 Thread Paul Eggert

On 7/22/20 3:55 PM, Bruno Haible wrote:


Of course this performance issue is mostly just for MS-Windows, as other major
current platforms already have aligned_alloc or rough equivalent.


No, it's not only Windows. It's
   - macOS
   - Minix
   - native Windows


macOS has had posix_memalign since Mac OS X 10.6 (2009) - at least, that's 
what's in Apple's documentation.
We needn't worry about older OS releases, as Apple doesn't support them.


Minix is moribund (its last release was 2014) and is no longer a significant 
porting target.


So the only significant issue for aligned_alloc is performance on native 
Windows. And my guess is that we can make it perform well enough there, for many 
applications (including your application, I think).




Re: new module 'aligned-malloc'

2020-07-22 Thread Paul Eggert

On 7/22/20 4:04 PM, Bruno Haible wrote:


Probably most posix_memalign / memalign implementations will round up the
request to a 16-byte allocation. But if some implementation can give me just
8 bytes, properly aligned, without wasting the next 8 bytes, why should I not
make use of it?


I suspect the scenario you're suggesting isn't worth the hassle of doing the 
optimization. And quite possibly it wouldn't be an optimization at all, as 
"wasting" the next 8 bytes could improve CPU performance significantly for some 
apps on hardware platforms with split i and d caches where writing to the d 
cache invalidates the i cache of the same cache line.


Anyway, thanks for letting me know the application you had in mind.



Re: new module 'aligned-malloc'

2020-07-22 Thread Jeffrey Walton
On Wed, Jul 22, 2020 at 6:56 PM Bruno Haible wrote:
>
> Hi Paul,
>
> > Of course this performance issue is mostly just for MS-Windows, as other major
> > current platforms already have aligned_alloc or rough equivalent.
>
> No, it's not only Windows. It's
>   - macOS
>   - Minix
>   - native Windows
>
> Here's the complete matrix:
>
>                  posix_memalign   aligned_alloc   memalign
>
> glibc                  Y                Y            Y
> musl                   Y                Y            Y
> macOS                  -                -            -
> FreeBSD                Y                Y            -
> NetBSD                 Y                -            -
> OpenBSD                Y                Y            -
> AIX                    Y                -            -
> HP-UX                  -                -            Y
> IRIX                   -                -            Y
> Solaris 10             -                -            Y
> Solaris 11             Y                Y            -
> Minix                  -                -            -
> Haiku                  Y                -            Y
> Android                Y                -            Y
> Cygwin                 Y                -            Y
> native Windows         -                -            -

This may be helpful for OS X... all pointers returned from malloc(),
calloc() and friends are 16-byte aligned. From the OS X man page:

MALLOC(3)              BSD Library Functions Manual              MALLOC(3)

NAME
 calloc, free, malloc, realloc, reallocf, valloc -- memory allocation

SYNOPSIS
 #include <stdlib.h>

 void *
 calloc(size_t count, size_t size);

 void
 free(void *ptr);

 void *
 malloc(size_t size);

 void *
 realloc(void *ptr, size_t size);

 void *
 reallocf(void *ptr, size_t size);

 void *
 valloc(size_t size);

DESCRIPTION
 The malloc(), calloc(), valloc(), realloc(), and reallocf() functions
 allocate memory.  The allocated memory is aligned such that it can be
 used for any data type, including AltiVec- and SSE-related types.  The
 free() function frees allocations that were created via the preceding
 allocation functions.
 ...
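
The guarantee quoted from the man page can be checked at run time. A minimal
sketch, assuming a C11 compiler; the helper names is_aligned and
malloc_is_max_aligned are made up for illustration, and the check uses the
alignment of max_align_t rather than a hard-coded 16:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: true if P is a multiple of ALIGNMENT,
   which must be a power of two.  */
static bool
is_aligned (const void *p, size_t alignment)
{
  assert (alignment != 0 && (alignment & (alignment - 1)) == 0);
  return ((uintptr_t) p & (alignment - 1)) == 0;
}

/* Hypothetical helper: true if one malloc call of SIZE bytes succeeds and
   yields storage aligned for max_align_t, the most strictly aligned
   standard type.  */
static bool
malloc_is_max_aligned (size_t size)
{
  void *p = malloc (size);
  bool ok = p != NULL && is_aligned (p, _Alignof (max_align_t));
  free (p);
  return ok;
}
```

A single successful check proves nothing about the allocator in general; it
only shows how one would test a given pointer.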

Jeff



Re: new module 'aligned-malloc'

2020-07-22 Thread Bruno Haible
Paul Eggert wrote:
> What uses of posix_memalign have a SIZE that is not a multiple of ALIGNMENT?

My current use-case is to generate machine code in memory (like a just-in-time
compiler). For x86, you can easily have a basic block of 7 bytes, which ought to
be aligned on a 16-byte boundary.

Probably most posix_memalign / memalign implementations will round up the
request to a 16-byte allocation. But if some implementation can give me just
8 bytes, properly aligned, without wasting the next 8 bytes, why should I not
make use of it?
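
A minimal sketch of that request, assuming a platform that provides
posix_memalign. The helper name copy_code_aligned is hypothetical, and the
sketch only copies bytes: making the block executable (mmap, mprotect or
similar) is left out.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy LEN bytes of generated code into a fresh block
   aligned to ALIGNMENT, which must be a power of two and a multiple of
   sizeof (void *).  Returns NULL on failure.  Release the result with
   plain free.  */
static void *
copy_code_aligned (const unsigned char *code, size_t len, size_t alignment)
{
  void *block = NULL;
  assert (code != NULL && len > 0);
  /* Unlike aligned_alloc, posix_memalign does not require LEN to be a
     multiple of ALIGNMENT, so a 7-byte request is fine.  */
  if (posix_memalign (&block, alignment, len) != 0)
    return NULL;
  memcpy (block, code, len);
  return block;
}
```

Whether the allocator rounds the 7 bytes up internally is not visible through
this interface.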

Bruno




Re: new module 'aligned-malloc'

2020-07-22 Thread Bruno Haible
Hi Paul,

> Of course this performance issue is mostly just for MS-Windows, as other major
> current platforms already have aligned_alloc or rough equivalent.

No, it's not only Windows. It's
  - macOS
  - Minix
  - native Windows

Here's the complete matrix:

                 posix_memalign   aligned_alloc   memalign

glibc                  Y                Y            Y
musl                   Y                Y            Y
macOS                  -                -            -
FreeBSD                Y                Y            -
NetBSD                 Y                -            -
OpenBSD                Y                Y            -
AIX                    Y                -            -
HP-UX                  -                -            Y
IRIX                   -                -            Y
Solaris 10             -                -            Y
Solaris 11             Y                Y            -
Minix                  -                -            -
Haiku                  Y                -            Y
Android                Y                -            Y
Cygwin                 Y                -            Y
native Windows         -                -            -

Bruno




Re: new module 'aligned-malloc'

2020-07-22 Thread Paul Eggert

On 7/21/20 9:21 AM, Bruno Haible wrote:

I prefer posix_memalign to aligned_alloc, because aligned_alloc requires
additionally that the SIZE is a multiple of the ALIGNMENT.


What uses of posix_memalign have a SIZE that is not a multiple of ALIGNMENT? I'm 
asking partly because every time I've done that in the past (admittedly not that 
often), I've ended up regretting it.
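
For callers who do want aligned_alloc, the size restriction can be met by
padding the request. A minimal sketch, assuming a C11 library with
aligned_alloc; the names round_up_size and aligned_alloc_padded are
hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: round SIZE up to the next multiple of ALIGNMENT,
   which must be a power of two.  Returns 0 if SIZE is 0 or if the
   result would overflow.  */
static size_t
round_up_size (size_t size, size_t alignment)
{
  assert (alignment != 0 && (alignment & (alignment - 1)) == 0);
  if (size > SIZE_MAX - (alignment - 1))
    return 0;
  return (size + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical wrapper: call aligned_alloc with a size that has been
   padded to a multiple of ALIGNMENT.  Returns NULL for a zero size,
   on overflow, or if aligned_alloc fails.  */
static void *
aligned_alloc_padded (size_t alignment, size_t size)
{
  size_t padded = round_up_size (size, alignment);
  if (padded == 0)
    return NULL;
  return aligned_alloc (alignment, padded);
}
```

With this wrapper a 7-byte request at alignment 16 becomes a 16-byte request,
which is exactly the padding that posix_memalign lets the caller avoid asking
for explicitly.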




Re: new module 'aligned-malloc'

2020-07-22 Thread Paul Eggert

On 7/21/20 4:29 PM, Bruno Haible wrote:


However, the algorithm above would be _grossly_ inefficient - especially for
bigger alignments such as 128 or 512.


I don't think performance would be that bad, for many applications at least. I 
suppose we could measure overall performance if the topic ever comes up. Of 
course this performance issue is mostly just for MS-Windows, as other major 
current platforms already have aligned_alloc or rough equivalent.



In a lot of code I've seen, allocation and
deallocation of some data type is done nearby in the code; then it's not
a big burden to use xyz_free for the results of xyz_alloc.


True, and for these, aligned_malloc is OK. But I would rather not have "we 
should be compatible with aligned_malloc" leak into a bunch of other modules. As 
I see it, aligned_malloc is a crutch for MS-Windows (plus some less-important 
nonstandard platforms) and it shouldn't be driving our code or our APIs.




Re: new module 'aligned-malloc'

2020-07-22 Thread Paul Eggert

On 7/22/20 12:13 AM, Florian Weimer wrote:

I don't think it will work.  Try to get an allocation of 4096 bytes with
4096 bytes alignment using glibc malloc this way.


That's just a mental exercise since glibc malloc already has aligned_alloc, but 
I took the challenge anyway and found that it's eminently doable with glibc 
malloc alone. Run the attached program, and it should exit with status 0. At 
least, it works for me.


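There is of course some runtime overhead, but the performance should be good 
enough for many, perhaps even most, Gnulib purposes; and I expect we can tweak 
it to run faster on Microsoft platforms (which seem to be the only major 
platforms that still don't support aligned allocation natively) if this is a 
significant concern.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int
main (void)
{
  void *p;
  /* Request ever larger blocks until malloc returns a pointer that is a
     multiple of 4096.  The unaligned blocks are deliberately not freed,
     so that malloc cannot hand the same address out again.  */
  for (ptrdiff_t i = 4096; (intptr_t) (p = malloc (i)) & 4095; i++)
    printf ("allocation of %td bytes yielded unaligned pointer %p\n",
            i, p);
  /* A null pointer also ends the loop, since its low bits are zero.  */
  if (!p)
    {
      printf ("allocation failed\n");
      return 1;
    }
  return 0;
}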


Re: new module 'aligned-malloc'

2020-07-22 Thread Paul Eggert

On 7/21/20 2:43 PM, Jeffrey Walton wrote:

Eventually I hope even Microsoft will figure out how to do aligned allocation,
and even if my hope is dashed that's OK, our code will still work.

I believe Redmond has _aligned_malloc.


Unfortunately it requires that the storage be freed via _aligned_free (which is 
basically what the newly-introduced Gnulib module does). It's better to return a 
pointer that you can pass to plain 'free'.
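
The reason a dedicated free function is needed can be seen from the usual
portable technique: over-allocate with malloc and return a pointer into the
middle of the block. The following is a minimal sketch of that technique, not
the code of the Gnulib module or of Microsoft's runtime. The names
sketch_aligned_malloc and sketch_aligned_free are hypothetical, and the
argument order merely mirrors _aligned_malloc (size, alignment).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: allocate SIZE bytes aligned to ALIGNMENT (a power
   of two) on top of plain malloc.  The result must be released with
   sketch_aligned_free, never with free.  */
static void *
sketch_aligned_malloc (size_t size, size_t alignment)
{
  assert (alignment != 0 && (alignment & (alignment - 1)) == 0);
  /* Room for the stashed malloc pointer plus worst-case padding.  */
  size_t extra = sizeof (void *) + alignment - 1;
  if (size > SIZE_MAX - extra)
    return NULL;
  void *raw = malloc (size + extra);
  if (raw == NULL)
    return NULL;
  uintptr_t start = (uintptr_t) raw + sizeof (void *);
  uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t) (alignment - 1);
  /* Remember the pointer malloc returned just below the aligned block.  */
  ((void **) aligned)[-1] = raw;
  return (void *) aligned;
}

/* Hypothetical counterpart: recover the stashed pointer and free it.  */
static void
sketch_aligned_free (void *p)
{
  if (p != NULL)
    free (((void **) p)[-1]);
}
```

Passing the returned pointer to plain free would hand the allocator an address
it never produced, which is the incompatibility discussed above.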




Re: stack module

2020-07-22 Thread Marc Nieper-Wißkirchen
On Sat, May 23, 2020 at 19:19, Bruno Haible wrote:
>
> Marc Nieper-Wißkirchen wrote:
> > > I was expecting that you write
> > >
> > >   struct
> > >   {
> > > void *base; ...
> > >   }
> >
> > This removes type safety. The benefit of the current approach is that
> > stack types of different types are not compatible.
>
> Indeed. Yes, it's a difficult trade-off between debuggability, binary code
> size, and type safety...

The alternative with the same type safety would be a source file with
stack code procedures meant for inclusion (without include guards).
The source file would expect the preprocessor defines GL_STACK_NAME,
GL_STACK_TYPE, and GL_STACK_EXTERN.

The file itself would contain code like the following:

#define _GL_STACK_PREFIX(name) _GL_CONCAT(GL_STACK_NAME, _GL_CONCAT(_, name))

typedef struct
{
  GL_STACK_TYPE *base;
  size_t size;
  size_t allocated;
}
_GL_STACK_PREFIX(type);

GL_STACK_EXTERN void _GL_STACK_PREFIX(init) (_GL_STACK_PREFIX(type) *stack)
{
  stack->base = NULL;
  stack->size = 0;
  stack->allocated = 0;
}

...

The advantage of this model is that it generalizes to other data
structures, for which a sole (or at least simple) macro implementation
is not possible.
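
To make the model concrete, here is a single-file sketch of one instantiation.
It is only an illustration under stated assumptions: the template body is
pasted inline instead of being pulled in with #include, the helpers
STACK_CONCAT and STACK_PREFIX are hypothetical stand-ins, and the push
function is an addition that is not part of the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the concatenation helpers.  */
#define STACK_CONCAT0(a, b) a##b
#define STACK_CONCAT(a, b) STACK_CONCAT0 (a, b)

/* What the including file would define before the #include.  */
#define GL_STACK_NAME int_stack
#define GL_STACK_TYPE int
#define GL_STACK_EXTERN static

/* From here on: what the included template file would contain.  */
#define STACK_PREFIX(name) STACK_CONCAT (GL_STACK_NAME, STACK_CONCAT (_, name))

typedef struct
{
  GL_STACK_TYPE *base;
  size_t size;
  size_t allocated;
}
STACK_PREFIX (type);

GL_STACK_EXTERN void
STACK_PREFIX (init) (STACK_PREFIX (type) *stack)
{
  assert (stack != NULL);
  stack->base = NULL;
  stack->size = 0;
  stack->allocated = 0;
}

/* Illustrative addition: push ITEM, growing the array as needed.
   Returns 0 on success, -1 if memory cannot be allocated.  */
GL_STACK_EXTERN int
STACK_PREFIX (push) (STACK_PREFIX (type) *stack, GL_STACK_TYPE item)
{
  if (stack->size == stack->allocated)
    {
      size_t n = stack->allocated ? 2 * stack->allocated : 8;
      if (n < stack->allocated || n > SIZE_MAX / sizeof *stack->base)
        return -1;
      GL_STACK_TYPE *base = realloc (stack->base, n * sizeof *base);
      if (base == NULL)
        return -1;
      stack->base = base;
      stack->allocated = n;
    }
  stack->base[stack->size++] = item;
  return 0;
}

/* Clean up so the template can be included again for another type.  */
#undef STACK_PREFIX
#undef GL_STACK_NAME
#undef GL_STACK_TYPE
#undef GL_STACK_EXTERN
```

This instantiation produces the distinct names int_stack_type, int_stack_init
and int_stack_push, so stacks of different element types remain incompatible
at compile time.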

What do you think?

Marc



Re: Handling of runpaths

2020-07-22 Thread Jeffrey Walton
On Fri, Jul 17, 2020 at 4:16 PM Bruno Haible wrote:
>
> Hi Jeffrey,
>
> > I noticed my runpaths are re-ordered in libraries like
> > libgettextsrc.so, libtextstyle.so, libgettextpo.so, libgettextlib.so,
> > libgettextlib.so, libintl.so and libunistring.so. For example, I use
> > LDFLAGS of:
> >
> > -Wl,-runpath,'$ORIGIN/../lib' -Wl,-runpath,$(prefix)/lib
> > -Wl,--enable-new-dtags
> >
> > Later, when I audit the runpaths, I see the following (when building
> > OpenSSH and dependencies for /opt/ssh):
> >
> > RUNPATH: /opt/ssh/lib:$ORIGIN/../lib
> >
> > Notice the ordering has been changed.
> >
> > Is the re-ordering of runpaths expected?
>
> I don't know. You would need
>   1) to find the specification(s) of -runpath. It must be recent: a
>  GNU ld from 2015 does not support it,
>   2) to determine (using "gcc -v") whether it's libtool, gcc, or the
>  linker which does the reordering.
>
> The specification of DT_RUNPATH appears to be in [1].

Thanks Bruno.

I was talking to the OpenLDAP folks. Their module has 3 problems, so
it was the one I looked at first.

They knew about the reordering problem and said it was caused by libtool. In
the past they provided a hacked libtool to work around it, but they stopped
because the hack had to be redone too frequently (each update?).

Jeff



Re: new module 'aligned-malloc'

2020-07-22 Thread Florian Weimer
* Paul Eggert:

> On 7/21/20 8:51 AM, Florian Weimer wrote:
>> The official aligned_alloc produces pointers compatible with free.
>> This module cannot do that.
>
> I don't see why not, at least on platforms of interest to Gnulib. On
> systems that provide no native way to do an aligned allocation, we
> merely keep calling malloc with suitable arguments until we get a
> pointer that is suitably aligned. We then free all the unsuitable
> storage we allocated along the way, and return the good pointer.

I don't think it will work.  Try to get an allocation of 4096 bytes with
4096 bytes alignment using glibc malloc this way.
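
For reference, the retry approach under discussion can be sketched as follows.
This is an illustration only: the name retry_aligned_alloc, the growing-size
strategy and the MAX_TRIES bound are assumptions, and whether an aligned
pointer ever turns up depends entirely on the malloc implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: call malloc with slowly growing sizes until a
   result happens to be aligned to ALIGNMENT (a power of two).  Rejected
   blocks are kept on a list so that malloc cannot return them again,
   and are freed before returning.  Returns NULL if no aligned block
   turned up within MAX_TRIES calls.  The result can be passed to
   plain free.  */
static void *
retry_aligned_alloc (size_t alignment, size_t size, size_t max_tries)
{
  assert (alignment != 0 && (alignment & (alignment - 1)) == 0);
  void *rejected = NULL;  /* list threaded through the rejected blocks */
  void *result = NULL;

  /* Each rejected block must be able to hold the list link.  */
  if (size < sizeof (void *))
    size = sizeof (void *);

  for (size_t i = 0; i < max_tries && size <= SIZE_MAX - i; i++)
    {
      void *p = malloc (size + i);
      if (p == NULL)
        break;
      if (((uintptr_t) p & (alignment - 1)) == 0)
        {
          result = p;
          break;
        }
      *(void **) p = rejected;
      rejected = p;
    }

  /* Free all the unsuitable storage allocated along the way.  */
  while (rejected != NULL)
    {
      void *next = *(void **) rejected;
      free (rejected);
      rejected = next;
    }
  return result;
}
```

The cost grows with the alignment, since more candidates are rejected on
average before one fits, and the function may give up entirely.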

Thanks,
Florian