Re: strlcpy version speed tests?

2020-07-04 Thread Stuart Longland
On 1/7/20 10:05 pm, Luke Small wrote:
> Are you clinging to traditions for some purpose?

Are you posting random pieces of code and asking for critique on them
for no apparent reason for some purpose?

To be clear, this was the sum total of your first message in this
thread (excluding attachment for brevity):

> I made a couple different versions if anybody is interested!
> -Luke

Why?  Why strlcpy?  Why not strcpy?  Or memcpy?  Why not the whole libc?
Zero context.  The email headers and the C source code attachment are
99% of the whole email.

None of those headers starts with 'References:' or 'In-Reply-To:'; it was
a completely detached email with no link to any existing discussion,
either declared explicitly or implied by its content.

Your single line message seemed like it was asking: "Am I allowed to
bench-test this?"  As if we have the power to stop you.  Go ahead,
bench-test away!

As to why the stock OpenBSD implementation is written a particular way?
Well, likely a big part of it is wanting the code to behave the same
way in multiple scenarios, e.g. gcc vs clang, AMD64 vs ARM64 vs i386 vs
mips64 vs sparc vs … you get the picture.

Assembly is the "fastest" option, but requires one "implementation" for
each processor architecture, and receives no benefit from improvements
in optimising compilers.

C means it's written *once* and ideally will perform identically for all
systems, whilst also being easier to understand and maintain.  If a
problem is found on AMD64, for example, the fix is committed once and
then merely tested on the other architectures to ensure they don't break.
Versus fixing it about 6 or 7 times, each time figuring out how to
express the same "fix" in _that_ processor's assembly dialect.

I think it naïve to assume that an implementation written to run faster
on one processor architecture and compiled with one compiler will
universally run faster on all other processor+compiler combinations.

Anyway, I've spent more words on this than I care to.  So if you don't
mind, I'll be instructing my email client to ignore this thread from
here on in.

Regards,
-- 
Stuart Longland (aka Redhatter, VK4MSL)

I haven't lost my mind...
  ...it's backed up on a tape somewhere.



Re: strlcpy version speed tests?

2020-07-04 Thread Otto Moerbeek
On Sat, Jul 04, 2020 at 09:07:35AM -0400, Brian Brombacher wrote:

> 
> >> On Jul 1, 2020, at 1:14 PM, gwes wrote:
> >> 
> >> On 7/1/20 8:05 AM, Luke Small wrote:
> >> I spoke to my favorite university computer science professor who said
> >> ++n is faster than n++ because the function needs to store the initial
> >> value, increment, then return the stored value in the latter case,
> >> while the former merely increments and returns the value. Apparently,
> >> he is still correct on modern hardware.
> > For decades the ++ and *p could be out of order, in different
> > execution units, writes speculatively queued, assigned to aliased registers,
> > etc, etc, etc.
> > 
> > Geoff Steckel
> 
> Hey Luke,
> 
> I love the passion but try to focus your attention on the fact that there are 
> multiple architectures supported and compiler optimizations are key here.  Go 
> with Marc’s approach using arch/ asm.  Implementations can be made over time 
> for the various arch’s, if such an approach is desirable by the project.  You 
> can pull a well-optimized version based on your code, for your arch, and then 
> slim it down a bunch.
> 
> Cheers,
> Brian
> 
> [Not a project developer.  Just an observer.]
> 
> 

Another data point for consideration: the pdp11 instruction set had
post-increment and pre-decrement indirect memory reference
instructions. If I'm not mistaken, using pre-increment or
post-decrement on this architecture would impose a penalty. So your
university computer science professor making such sweeping statements
maybe doesn't deserve to be your favorite.

-Otto



Re: strlcpy version speed tests?

2020-07-04 Thread Brian Brombacher


>> On Jul 1, 2020, at 1:14 PM, gwes wrote:
>> 
>> On 7/1/20 8:05 AM, Luke Small wrote:
>> I spoke to my favorite university computer science professor who said
>> ++n is faster than n++ because the function needs to store the initial
>> value, increment, then return the stored value in the latter case,
>> while the former merely increments and returns the value. Apparently,
>> he is still correct on modern hardware.
> For decades the ++ and *p could be out of order, in different
> execution units, writes speculatively queued, assigned to aliased registers,
> etc, etc, etc.
> 
> Geoff Steckel

Hey Luke,

I love the passion but try to focus your attention on the fact that there are 
multiple architectures supported and compiler optimizations are key here.  Go 
with Marc’s approach using arch/ asm.  Implementations can be made over time 
for the various arch’s, if such an approach is desirable by the project.  You 
can pull a well-optimized version based on your code, for your arch, and then 
slim it down a bunch.

Cheers,
Brian

[Not a project developer.  Just an observer.]




Re: strlcpy version speed tests?

2020-07-01 Thread gwes

On 7/1/20 8:05 AM, Luke Small wrote:

> I spoke to my favorite university computer science professor who said
> ++n is faster than n++ because the function needs to store the initial
> value, increment, then return the stored value in the latter case,
> while the former merely increments and returns the value. Apparently,
> he is still correct on modern hardware.

For decades the ++ and *p could be out of order, in different
execution units, writes speculatively queued, assigned to aliased registers,
etc, etc, etc.

Geoff Steckel



Re: strlcpy version speed tests?

2020-07-01 Thread Marc Espie
On Wed, Jul 01, 2020 at 07:05:02AM -0500, Luke Small wrote:
> Are you clinging to traditions for some purpose? I gave two different
> versions. strlcpy3 is clearly more easily understood and even slightly
> faster, and strlcpy4 sets up the following workhorse lines, which
> timing the functions shows are hands down faster on my Xeon chips:
> 
> 
> strlcpy4:
> while (--nleft != 0)
>  if ((*++dst = *++src) == '\0')
> ...
> 
> the others:
> 
> while (--nleft != 0)
>   if ((*dst++ = *src++) == '\0')
> 
> ...
> 
> 
> I spoke to my favorite university computer science professor who said
> ++n is faster than n++ because the function needs to store the initial
> value, increment, then return the stored value in the latter case,
> while the former merely increments and returns the value. Apparently,
> he is still correct on modern hardware.

If you really care about speed, you should probably look into an
arch/ asm version instead



Re: strlcpy version speed tests?

2020-07-01 Thread Luke Small
Are you clinging to traditions for some purpose? I gave two different
versions. strlcpy3 is clearly more easily understood and even slightly
faster, and strlcpy4 sets up the following workhorse lines, which
timing the functions shows are hands down faster on my Xeon chips:


strlcpy4:
while (--nleft != 0)
 if ((*++dst = *++src) == '\0')
...

the others:

while (--nleft != 0)
  if ((*dst++ = *src++) == '\0')

...


I spoke to my favorite university computer science professor who said
++n is faster than n++ because the function needs to store the initial
value, increment, then return the stored value in the latter case,
while the former merely increments and returns the value. Apparently,
he is still correct on modern hardware.

-- 
-Luke


Re: strlcpy version speed tests?

2020-06-30 Thread Luke Small
I suppose this strlcpy4 without a goto is more elegant
-Luke


On Tue, Jun 30, 2020 at 10:07 PM Luke Small wrote:

> I made it SUPER easy to test my assertion. The code is there. No
> configuration needed.
>
> On Tue, Jun 30, 2020 at 9:59 PM Theo de Raadt wrote:
>
>> Luke Small wrote:
>>
>> > So did you run the program on one of those?
>>
>> Why would I?
>>
>> i see a sales pitch
>>
>> and i go BULLSHIT
>>
>> and I'm done
>>
>> --
> -Luke
>
#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* cc strlcpy_test.c -pipe -O2 -o strlcpy_test && ./strlcpy_test */

/*
 * Copy string src to buffer dst of size dsize.  At most dsize-1
 * chars will be copied.  Always NUL terminates (unless dsize == 0).
 * Returns strlen(src); if retval >= dsize, truncation occurred.
 */
static size_t
strlcpy0(char *dst, const char *src, size_t dsize)
{
	const char *osrc = src;
	size_t nleft = dsize;

	/* Copy as many bytes as will fit. */
	if (nleft != 0) {
		while (--nleft != 0) {
			if ((*dst++ = *src++) == '\0')
				break;
		}
	}

	/* Not enough room in dst, add NUL and traverse rest of src. */
	if (nleft == 0) {
		if (dsize != 0)
			*dst = '\0';		/* NUL-terminate dst */
		while (*src++)
			;
	}

	return(src - osrc - 1);	/* count does not include NUL */
}

static size_t
strlcpy3(char *dst, const char *src, size_t dsize)
{
	const char *osrc = src;
	size_t nleft = dsize;

	if (nleft != 0) {
		/* Copy as many bytes as will fit. */
		while (--nleft != 0)
			if ((*dst++ = *src++) == '\0')
				return(src - osrc - 1);
		*dst = '\0';
	}
	
	/* Not enough room in dst, traverse rest of src. */
	while (*src++)
			;

	return(src - osrc - 1);	/* count does not include NUL */
}

static size_t
strlcpy4(char dst[], const char src[], size_t dsize)
{
	const char *osrc = src;
	size_t nleft = dsize;

	if (nleft != 0) {
		if (--nleft == 0) {
			*dst = '\0';	/* NUL-terminate dst */
			if (*src == '\0')
				return 0;
		} else {
			/* Copy as many bytes as will fit. */
			if ((*dst = *src) == '\0')
				return 0;
			while (--nleft != 0)
				if ((*++dst = *++src) == '\0')
					return(src - osrc);
			dst[1] = '\0';	/* NUL-terminate dst */
		}
	} else if (*src == '\0')
		return 0;
	
	/* Not enough room in dst, traverse rest of src. */
	while (*++src)
			;
	

	return(src - osrc);	/* count does not include NUL */
}


int main()
{

	long double cpu_time_used;
	size_t y;
	struct timespec tv_start, tv_end;
	char *buffer, *buffer2;
	
	size_t n = 5;
	size_t m = n + 50;
	
	buffer = malloc(m);
	if (buffer == NULL) err(1, "malloc");
	buffer2 = malloc(n);
	if (buffer2 == NULL) err(1, "malloc");
	
	
	/* no intermediate '\0' */
	for (y = 0; y < m; ++y)
		buffer[y] = arc4random_uniform(255) + 1;
	buffer[m - 1] = '\0';




	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tv_start);
	strlcpy(buffer2, buffer, n);
	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tv_end);

	cpu_time_used =
	(long double) (tv_end.tv_sec - tv_start.tv_sec) +
	(long double) (tv_end.tv_nsec - tv_start.tv_nsec) /
	(long double) 1000000000;	/* nanoseconds per second */

	printf("\n\nstrlcpy\n");
	printf("time = %.9Lf\n\n\n", cpu_time_used);




	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tv_start);
	strlcpy0(buffer2, buffer, n);
	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tv_end);

	cpu_time_used =
	(long double) (tv_end.tv_sec - tv_start.tv_sec) +
	(long double) (tv_end.tv_nsec - tv_start.tv_nsec) /
	(long double) 1000000000;	/* nanoseconds per second */

	printf("\n\nstrlcpy0\n");
	printf("time = %.9Lf\n\n\n", cpu_time_used);





	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tv_start);
	strlcpy3(buffer2, buffer, n);
	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tv_end);

	cpu_time_used =
	(long double) (tv_end.tv_sec - tv_start.tv_sec) +
	(long double) (tv_end.tv_nsec - tv_start.tv_nsec) /
	(long double) 1000000000;	/* nanoseconds per second */

	printf("\n\nstrlcpy3\n");
	printf("time = %.9Lf\n\n\n", cpu_time_used);





	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tv_start);
	strlcpy4(buffer2, buffer, n);
	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tv_end);

	cpu_time_used =
	(long double) (tv_end.tv_sec - tv_start.tv_sec) +
	(long double) (tv_end.tv_nsec - tv_start.tv_nsec) /
	(long double) 1000000000;	/* nanoseconds per second */

	printf("\n\nstrlcpy4\n");
	printf("time = %.9Lf\n\n\n", cpu_time_used);



	return 0;
}


Re: strlcpy version speed tests?

2020-06-30 Thread Stuart Longland
On 1/7/20 11:18 am, Luke Small wrote:
> I made a couple different versions if anybody is interested!

You don't need our permission…
-- 
Stuart Longland (aka Redhatter, VK4MSL)

I haven't lost my mind...
  ...it's backed up on a tape somewhere.



strlcpy version speed tests?

2020-06-30 Thread Luke Small
I made a couple different versions if anybody is interested!
-Luke
#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* cc strlcpy_test.c -pipe -O2 -o strlcpy_test && ./strlcpy_test */

/*
 * Copy string src to buffer dst of size dsize.  At most dsize-1
 * chars will be copied.  Always NUL terminates (unless dsize == 0).
 * Returns strlen(src); if retval >= dsize, truncation occurred.
 */
static size_t
strlcpy0(char *dst, const char *src, size_t dsize)
{
	const char *osrc = src;
	size_t nleft = dsize;

	/* Copy as many bytes as will fit. */
	if (nleft != 0) {
		while (--nleft != 0) {
			if ((*dst++ = *src++) == '\0')
				break;
		}
	}

	/* Not enough room in dst, add NUL and traverse rest of src. */
	if (nleft == 0) {
		if (dsize != 0)
			*dst = '\0';		/* NUL-terminate dst */
		while (*src++)
			;
	}

	return(src - osrc - 1);	/* count does not include NUL */
}

static size_t
strlcpy3(char *dst, const char *src, size_t dsize)
{
	const char *osrc = src;
	size_t nleft = dsize;

	if (nleft != 0) {
		/* Copy as many bytes as will fit. */
		while (--nleft != 0)
			if ((*dst++ = *src++) == '\0')
				return(src - osrc - 1);
		*dst = '\0';
	}
	
	/* Not enough room in dst, traverse rest of src. */
	while (*src++)
			;

	return(src - osrc - 1);	/* count does not include NUL */
}

static size_t
strlcpy4(char dst[], const char src[], size_t dsize)
{
	const char *osrc = src;
	size_t nleft = dsize;

	if (nleft != 0) {
		if (--nleft == 0)
		{
			*dst = '\0';
			if (*src == '\0')
				return 0;
			goto strlcpy_jump;
		}
		/* Copy as many bytes as will fit. */
		if ((*dst = *src) == '\0')
			return 0;
		while (--nleft != 0)
			if ((*++dst = *++src) == '\0')
				return(src - osrc);
		dst[1] = '\0';	/* NUL-terminate dst */
	} else if (*src == '\0')
		return 0;
	
	strlcpy_jump:
	
	/* Not enough room in dst, traverse rest of src. */
	while (*++src)
			;
	

	return(src - osrc);	/* count does not include NUL */
}


int main()
{

	long double cpu_time_used;
	size_t y;
	struct timespec tv_start, tv_end;
	char *buffer, *buffer2;
	
	size_t n = 5;
	size_t m = n + 500;
	
	buffer = malloc(m);
	if (buffer == NULL) err(1, "malloc");
	buffer2 = malloc(n);
	if (buffer2 == NULL) err(1, "malloc");
	
	
	/* no intermediate '\0' */
	for (y = 0; y < m; ++y)
		buffer[y] = arc4random_uniform(255) + 1;
	buffer[m - 1] = '\0';




	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tv_start);
	strlcpy(buffer2, buffer, n);
	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tv_end);

	cpu_time_used =
	(long double) (tv_end.tv_sec - tv_start.tv_sec) +
	(long double) (tv_end.tv_nsec - tv_start.tv_nsec) /
	(long double) 1000000000;	/* nanoseconds per second */

	printf("\n\nstrlcpy\n");
	printf("time = %.9Lf\n\n\n", cpu_time_used);




	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tv_start);
	strlcpy0(buffer2, buffer, n);
	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tv_end);

	cpu_time_used =
	(long double) (tv_end.tv_sec - tv_start.tv_sec) +
	(long double) (tv_end.tv_nsec - tv_start.tv_nsec) /
	(long double) 1000000000;	/* nanoseconds per second */

	printf("\n\nstrlcpy0\n");
	printf("time = %.9Lf\n\n\n", cpu_time_used);





	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tv_start);
	strlcpy3(buffer2, buffer, n);
	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tv_end);

	cpu_time_used =
	(long double) (tv_end.tv_sec - tv_start.tv_sec) +
	(long double) (tv_end.tv_nsec - tv_start.tv_nsec) /
	(long double) 1000000000;	/* nanoseconds per second */

	printf("\n\nstrlcpy3 \n");
	printf("time = %.9Lf\n\n\n", cpu_time_used);





	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tv_start);
	strlcpy4(buffer2, buffer, n);
	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tv_end);

	cpu_time_used =
	(long double) (tv_end.tv_sec - tv_start.tv_sec) +
	(long double) (tv_end.tv_nsec - tv_start.tv_nsec) /
	(long double) 1000000000;	/* nanoseconds per second */

	printf("\n\nstrlcpy4 \n");
	printf("time = %.9Lf\n\n\n", cpu_time_used);



	return 0;
}