On Sun, Feb 15, 2015 at 2:08 PM, Alexander Korotkov <[email protected]> wrote:
> Following changes has been made in attached patch:
>
> * Get sort operators from pathkeys.
> * Recheck argument of distance function has been reverted.
>
Few comments were added and pairing heap comparison function was fixed in
attached version of patch (knn-gist-recheck-6.patch).
Also I expected that reordering in executor would be slower than reordering
in GiST because of maintaining two heaps instead of one. I've revised
version of patch with reordering in GiST to use pairing heap. I compare two
types of reordering on 10^7 random points and polygons. Results are below.
Test shows that overhead of reordering in executor is insignificant (less
than statistical error).
Reorder in GiST Reorder in executor
points
limit=10 0.10615 0.0880125
limit=100 0.23666875 0.2292375
limit=1000 1.51486875 1.5208375
polygons
limit=10 0.11650625 0.1347
limit=100 0.46279375 0.45294375
limit=1000 3.5170125 3.54868125
Revised patch with reordering in GiST is attached
(knn-gist-recheck-in-gist.patch) as well as testing script (test.py).
------
With best regards,
Alexander Korotkov.
knn-gist-recheck-6.patch
Description: Binary data
knn-gist-recheck-in-gist.patch
Description: Binary data
#!/usr/bin/env python
import psycopg2
import json
import sys
# Create test tables with following statements.
#
# create table p as (select point(random(), random()) from generate_series(1,10000000));
# create index p_idx on p using gist(v);
# create table g as (select polygon(3 + (random()*5)::int, circle(point(random(), random()), 0.001)) v from generate_series(1,10000000));
# create index g_idx on g using gist(v);
dbconn = psycopg2.connect("dbname='postgres' user='smagen' host='/tmp' password='' port=5431")
points = []
pointsCount = 16
tableName = None
def generatePoints(n):
global points
m = 1
d = 0.5
points.append((0, 0))
while m <= n:
for i in range(0, m):
points.append((points[i][0] + d, points[i][1] ))
points.append((points[i][0] , points[i][1] + d))
points.append((points[i][0] + d, points[i][1] + d))
d /= 2.0
m *= 4
generatePoints(pointsCount)
def runKnn(point, limit):
cursor = dbconn.cursor()
cursor.execute("EXPLAIN (ANALYZE, FORMAT JSON) SELECT * FROM " + tableName + " ORDER BY v <-> %s::point LIMIT %s;", (point, limit))
plan = cursor.fetchone()[0][0]
cursor.close()
return (plan['Planning Time'], plan['Execution Time'])
def makeTests(n, limit):
planningTime = 0
executionTime = 0
for i in range(0, n):
for j in range(0, pointsCount):
point = '(' + str(points[j][0]) + ',' + str(points[j][1]) + ')'
result = runKnn(point, limit)
planningTime += result[0]
executionTime += result[1]
planningTime /= n * pointsCount
executionTime /= n * pointsCount
return (planningTime, executionTime)
if (len(sys.argv) < 2):
print "Usage: %s table_name" % sys.argv[0]
sys.exit(2)
tableName = sys.argv[1]
for limit in [10, 100, 1000]:
result = makeTests(10, limit)
print "limit: %s\nplanning: %s\nexecution: %s" % (limit, result[0], result[1])
dbconn.close()
-- Sent via pgsql-hackers mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
