Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

Robert Haas Thu, 12 Jun 2014 09:26:18 -0700

On Thu, Jun 5, 2014 at 8:37 PM, Peter Geoghegan <p...@heroku.com> wrote:
> * A configure AC_TRY_RUN tests the suitability of the system strxfrm()
> implementation for the purposes of this optimization. There can be no
> "header bytes" (people at pgCon reported redundant bytes on the Mac
> OSX implementation at pgCon, to my great surprise), ...


I'm attaching a small test program demonstrating the Mac behavior.
Here's some sample output:

[rhaas ~]$ ./strxfrm en_US "" a aa ab abc abcd abcde peter Geoghegan _ %
"" -> ""
"a" -> "001S0000001S"
"aa" -> "001S001S0000001S001S"
"ab" -> "001S001T0000001S001T"
"abc" -> "001S001T001U0000001S001T001U"
"abcd" -> "001S001T001U001V0000001S001T001U001V"
"abcde" -> "001S001T001U001V001W0000001S001T001U001V001W"
"peter" -> "001b001W001f001W001d0000001b001W001f001W001d"
"Geoghegan" -> 
"0019001W001a001Y001Z001W001Y001S001`00000019001W001a001Y001Z001W001Y001S001`"
"_" -> "001Q0000001Q"
"%" -> "000W0000000W"

It appears that any string starting with the letter "a" will create
output that begins with 001S00 and the seventh character always
appears to be 0 or 1:

[rhaas ~]$ ./strxfrm en_US ab ac ad ae af a% a0 "a "
"ab" -> "001S001T0000001S001T"
"ac" -> "001S001U0000001S001U"
"ad" -> "001S001V0000001S001V"
"ae" -> "001S001W0000001S001W"
"af" -> "001S001X0000001S001X"
"a%" -> "001S000W0000001S000W"
"a0" -> "001S000b0000001S000b"
"a " -> "001S000R0000001S000R"

Also, the total number of bytes produced is apparently 8N+4, where N
is the length of the input string.  On a Linux system I tested, the
output included non-printable characters, so I adapted the test
program to print the results in hex.  Attaching that version, too.
Here, the length was 3N+2, except for 0-length strings which produce a
0-length result:

[rhaas@hydra ~]$ ./strxfrm-binary en_US "" a aa ab abc abcd abcde
peter Geoghegan _ %
"" ->  (0 bytes)
"a" -> 0c01020102 (5 bytes)
"aa" -> 0c0c010202010202 (8 bytes)
"ab" -> 0c0d010202010202 (8 bytes)
"abc" -> 0c0d0e0102020201020202 (11 bytes)
"abcd" -> 0c0d0e0f01020202020102020202 (14 bytes)
"abcde" -> 0c0d0e0f10010202020202010202020202 (17 bytes)
"peter" -> 1b101f101d010202020202010202020202 (17 bytes)
"Geoghegan" -> 12101a121310120c190102020202020202020201040202020202020202
(29 bytes)
"_" -> 0101010112 (5 bytes)
"%" -> 0101010139 (5 bytes)
[rhaas@hydra ~]$ ./strxfrm-binary en_US ab ac ad ae af a% a0 "a "
"ab" -> 0c0d010202010202 (8 bytes)
"ac" -> 0c0e010202010202 (8 bytes)
"ad" -> 0c0f010202010202 (8 bytes)
"ae" -> 0c10010202010202 (8 bytes)
"af" -> 0c11010202010202 (8 bytes)
"a%" -> 0c01020102010239 (8 bytes)
"a0" -> 0c02010202010202 (8 bytes)
"a " -> 0c01020102010211 (8 bytes)

Even though each input bytes is generating 3 output bytes, it's not
generating a group of output bytes for each input byte as appears to
be happening on MacOS X.  If it were doing that, then truncating the
blob to 8 bytes would only compare the first 2-3 bytes of the input
string.  In fact we do better.  If we change even the 8th letter in
the string to some other letter, the 8th output byte changes:

[rhaas@hydra ~]$ ./strxfrm-binary en_US aaaaaaaa aaaaaaab
"aaaaaaaa" -> 0c0c0c0c0c0c0c0c010202020202020202010202020202020202 (26 bytes)
"aaaaaaab" -> 0c0c0c0c0c0c0c0d010202020202020202010202020202020202 (26 bytes)

If we change the capitalization of the eighth byte, then the change
happens further out:

[rhaas@hydra ~]$ ./strxfrm-binary en_US aaaaaaaa aaaaaaaA
"aaaaaaaa" -> 0c0c0c0c0c0c0c0c010202020202020202010202020202020202 (26 bytes)
"aaaaaaaA" -> 0c0c0c0c0c0c0c0c010202020202020202010202020202020204 (26 bytes)

Still, it's fair to say that on this Linux system, the first 8 bytes
capture a significant portion of the entropy of the first 8 bytes of
the string, whereas on MacOS X you only get entropy from the first 2
bytes of the string.  It would be interesting to see results from
other platforms people might care about also.

> The cost of adding all of these ameliorating measures appears to be
> very low. We're so bottlenecked on memory bandwidth that the
> fixed-size overhead of maintaining poor man cardinality, and the extra
> cycles from hashing n keys for the benefit of HyperLogLog just don't
> seem to matter next to the big savings for n log n comparisons. The
> best case performance is seemingly just as good as before (about a
> 200% improvement in transaction throughput for one client, but closer
> to a 250% improvement with many clients and/or expensive collations),
> while the worst case is much much better.

I haven't looked at the patch yet, but this sounds promising.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <locale.h>
#include <string.h>

int
main(int argc, char **argv)
{
	char *buffer = NULL;
	size_t buflen = 0;

	if (argc < 3)
	{
		fprintf(stderr, "Usage: strxfrm {collation} {string}...\n");
		exit(1);
	}
	if (setlocale(LC_COLLATE, argv[1]) == NULL)
	{
		fprintf(stderr, "setlocale: %s\n", strerror(errno));
		exit(1);
	}
	argv += 2;

	while (*argv)
	{
		size_t r;

		r = strxfrm(buffer, *argv, buflen);
		if (r > buflen)
		{
			if (buffer != NULL)
				free(buffer);
			buflen = r + 1;
			buffer = malloc(buflen);
			if (buffer == NULL)
			{
				fprintf(stderr, "out of memory\n");
				exit(1);
			}
			continue;
		}
		printf("\"%s\" -> \"%s\"\n", *argv, buffer);
		++argv;
	}
	exit(0);
}

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <locale.h>
#include <string.h>

int
main(int argc, char **argv)
{
	char *buffer = NULL;
	size_t buflen = 0;
	char *extrabuf = NULL;
	size_t extralen = 0;
	const char *hexdigit = "0123456789abcdef";

	if (argc < 3)
	{
		fprintf(stderr, "Usage: strxfrm {collation} {string}...\n");
		exit(1);
	}
	if (setlocale(LC_COLLATE, argv[1]) == NULL)
	{
		fprintf(stderr, "setlocale: %s\n", strerror(errno));
		exit(1);
	}
	argv += 2;

	while (*argv)
	{
		size_t r;
		size_t k;

		r = strxfrm(buffer, *argv, buflen);
		if (r > buflen)
		{
			if (buffer != NULL)
				free(buffer);
			buflen = r + 1;
			buffer = malloc(buflen);
			if (buffer == NULL)
			{
				fprintf(stderr, "out of memory\n");
				exit(1);
			}
			continue;
		}
		if (extralen < 2 * r + 1)
		{
			if (extrabuf != NULL)
				free(extrabuf);
			extralen = 2 * r + 1;
			extrabuf = malloc(extralen);
			if (extrabuf == NULL)
			{
				fprintf(stderr, "out of memory\n");
				exit(1);
			}
		}
		for (k = 0; k < r; ++k)
		{
			unsigned char	c = buffer[k];
			extrabuf[2 * k] = hexdigit[c >> 4];
			extrabuf[2 * k + 1] = hexdigit[c & 15];
		}
		extrabuf[2 * k] = '\0';
		printf("\"%s\" -> %s (%d bytes)\n", *argv, extrabuf, r);
		++argv;
	}
	exit(0);
}

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

Reply via email to