Re: [HACKERS] KNN-GiST with recheck

Alexander Korotkov Tue, 17 Feb 2015 05:23:46 -0800

On Sun, Feb 15, 2015 at 2:08 PM, Alexander Korotkov <[email protected]>
wrote:


> Following changes has been made in attached patch:
>
>  * Get sort operators from pathkeys.
>  * Recheck argument of distance function has been reverted.
>

Few comments were added and pairing heap comparison function was fixed in
attached version of patch (knn-gist-recheck-6.patch).
Also I expected that reordering in executor would be slower than reordering
in GiST because of maintaining two heaps instead of one. I've revised
version of patch with reordering in GiST to use pairing heap. I compare two
types of reordering on 10^7 random points and polygons. Results are below.
Test shows that overhead of reordering in executor is insignificant (less
than statistical error).

             Reorder in GiST   Reorder in executor
points
limit=10     0.10615           0.0880125
limit=100    0.23666875        0.2292375
limit=1000   1.51486875        1.5208375
polygons
limit=10     0.11650625        0.1347
limit=100    0.46279375        0.45294375
limit=1000   3.5170125         3.54868125

Revised patch with reordering in GiST is attached
(knn-gist-recheck-in-gist.patch) as well as testing script (test.py).

------
With best regards,
Alexander Korotkov.

knn-gist-recheck-6.patch
Description: Binary data

knn-gist-recheck-in-gist.patch
Description: Binary data

#!/usr/bin/env python
import psycopg2
import json
import sys

# Create test tables with following statements.
#
# create table p as (select point(random(), random()) from generate_series(1,10000000));
# create index p_idx on p using gist(v);
# create table g as (select polygon(3 + (random()*5)::int, circle(point(random(), random()), 0.001)) v from generate_series(1,10000000));
# create index g_idx on g using gist(v);

dbconn = psycopg2.connect("dbname='postgres' user='smagen' host='/tmp' password='' port=5431")

points = []
pointsCount = 16
tableName = None

def generatePoints(n):
	global points
	m = 1
	d = 0.5
	points.append((0, 0))
	while m <= n:
		for i in range(0, m):
			points.append((points[i][0] + d, points[i][1]    ))
			points.append((points[i][0]    , points[i][1] + d))
			points.append((points[i][0] + d, points[i][1] + d))
		d /= 2.0
		m *= 4

generatePoints(pointsCount)

def runKnn(point, limit):
	cursor = dbconn.cursor()
	cursor.execute("EXPLAIN (ANALYZE, FORMAT JSON) SELECT * FROM " + tableName + " ORDER BY v <-> %s::point LIMIT %s;", (point, limit))
	plan = cursor.fetchone()[0][0]
	cursor.close()
	return (plan['Planning Time'], plan['Execution Time'])

def makeTests(n, limit):
	planningTime = 0
	executionTime = 0
	for i in range(0, n):
		for j in range(0, pointsCount):
			point = '(' + str(points[j][0]) + ',' + str(points[j][1]) + ')'
			result = runKnn(point, limit)
			planningTime += result[0]
			executionTime += result[1]
	planningTime /= n * pointsCount
	executionTime /= n * pointsCount
	return (planningTime, executionTime)

if (len(sys.argv) < 2):
	print "Usage: %s table_name" % sys.argv[0]
	sys.exit(2)

tableName = sys.argv[1]

for limit in [10, 100, 1000]:
	result = makeTests(10, limit)
	print "limit: %s\nplanning:  %s\nexecution: %s" % (limit, result[0], result[1])

dbconn.close()

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] KNN-GiST with recheck

Reply via email to