Re: [HACKERS] pg_stat_statements with query tree based normalization

2011-12-10 Thread Peter Geoghegan
On 10 December 2011 13:56, Greg Smith g...@2ndquadrant.com wrote:
 I heard about some bitrot creeping in here too, but it seems gone now; I had
 no problem merging Peter's development branch against master.  I've attached
 a newer patch of the main code, which fixes most of the earlier issues there
 were disclaimers about.

I'm aware of some further bugs, concerning the synchronisation of the
executor and planner plugins, in the patch that Greg posted, so please
bear with me while I squash them.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] pg_stat_statements with query tree based normalization

2011-12-07 Thread Marti Raudsepp
On Wed, Dec 7, 2011 at 03:19, Peter Geoghegan pe...@2ndquadrant.com wrote:
 The results are...taking the median value of each set of runs as
 representative, my patch appears to run marginally faster than head.
 Of course, there is no reason to believe that it should, and I'm
 certain that the difference can be explained by noise, even though
 I've naturally strived to minimise noise.

You should use the t-test to distinguish whether two data sets show a
consistent difference or whether it's just noise. Excel/OpenOffice
have the TTEST() macro for this purpose. For statistics doofuses like
me, just pick mode=2 and type=3 as that's the most conservative.

If the TTEST result is less than 0.05 then you have 95% certainty that
the two data sets are consistently different. If not, you need more
consistent data.

More information here:
http://www.gifted.uconn.edu/siegle/research/t-test/t-test.html
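
To do the same check without leaving Python, a minimal sketch might
look like the following (this assumes SciPy is installed; the timing
numbers are made up for illustration):

from scipy import stats

# Per-run timings in seconds for head and the patched build
# (hypothetical values - substitute your own measurements).
head_times = [0.0812, 0.0798, 0.0805, 0.0821, 0.0789]
patched_times = [0.0801, 0.0795, 0.0810, 0.0799, 0.0793]

# Welch's two-sided t-test: equal_var=False corresponds to TTEST()
# type=3, and the two-sided default corresponds to mode=2.
t_stat, p_value = stats.ttest_ind(head_times, patched_times, equal_var=False)

if p_value < 0.05:
    print("difference is unlikely to be noise (p = %.4f)" % p_value)
else:
    print("no consistent difference shown; could be noise (p = %.4f)" % p_value)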

Regards,
Marti



Re: [HACKERS] pg_stat_statements with query tree based normalization

2011-12-06 Thread Peter Geoghegan
On 14 November 2011 04:42, Greg Smith g...@2ndquadrant.com wrote:
 The approach Peter used adds a single integer to the Const structure in
 order to have enough information to substitute '?' in place of those.
 Adding and maintaining that is the only change outside of the extension
 made here, and that overhead is paid by everyone--not just consumers of
 this new code.

I've attempted to isolate that overhead, so far unsuccessfully. Attached are:

1. A simple Python + psycopg2 script for repeatedly running a
succession of similar queries that explain shows as containing a
single Result node. Each query contains 300 integer Const nodes by
default, which are simply selected (an example of the generated query
shape follows this list).

2. The results of running the script on Greg's server, which has CPU
frequency scaling disabled. That's an ODS spreadsheet. Out of
consideration for file size, I've deleted the query column in each
sheet, which wasn't actually useful information.
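
To illustrate the shape of the generated statements (the values here
are made up; the real ones come from the seeded pseudo-random
generator), a query with --num_const_nodes=5 would look like:

select 42, 118, 366, 902, 7;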

The results are...taking the median value of each set of runs as
representative, my patch appears to run marginally faster than head.
Of course, there is no reason to believe that it should, and I'm
certain that the difference can be explained by noise, even though
I've naturally strived to minimise noise.

If someone could suggest a more telling test case, or even a
worst-case, that would be useful. This was just my first run at this.
I know that the overhead will also exist in code not well-exercised by
these queries, but I imagine that any real-world query that attempts
to exercise them all is going to add other costs that dwarf the
additional overhead and further muddy the waters.

I intend to work through the known issues with this patch in the next
couple of days.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
#!/usr/bin/env python
# Performance test for pg_stat_statements normalization
# Peter Geoghegan

# This test is intended to isolate the overhead of additional
# infrastructure in the grammar and plan tree, which is used to 
# add a field to Const nodes so that the corresponding lexeme's
# length is known from the query tree.

# It is intended to be run without pg_stat_statements, because
# the really pertinent issue is whether these changes impose any
# noticeable overhead on Postgres users who don't use
# pg_stat_statements, and the degree thereof.

# For best results, run with CPU frequency scaling disabled.

import psycopg2
import random 
import time
import csv
from optparse import OptionParser

# store results of a given run in a dedicated csv file
def serialize_to_file(times, filename):
	wrt = csv.writer(open(filename, 'wb'), delimiter=',')

	# mark the median run of this set of query runs
	# (or, if there is an even number of elements, near enough)
	median_i = (len(times) + 1) // 2 - 1

	for i, v in enumerate(times):
		wrt.writerow([v[0], time.ctime(v[1]), str(v[2]) + " seconds", '*' if i == median_i else 'n'])

def run_test(conn, num_its, num_const_nodes):
	# A very unsympathetic query here is one with lots of
	# Const nodes that explain shows as a single Result node.

	# This is because parsing has a large overhead
	# relative to planning and execution, and there is an unusually 
	# high number of Const nodes.

	# Use pseudo-random numbers with a consistent seed value - the
	# numbers used are deterministic for absolute consistency, though I
	# don't believe that to be significant, at least for now.
	random.seed(55)

	cur = conn.cursor()

	times = []
	for i in range(0, num_its):
		# Generate a new query with pseudo-random integer Const nodes
		qry = "select "
		for j in range(0, num_const_nodes):
			n = random.randint(0, 1000)
			qry += str(n) + (", " if j != num_const_nodes - 1 else ";")
		begin = time.time()
		cur.execute(qry)
		end = time.time()

		elapsed = end - begin
		times.append((qry, begin, elapsed))
	
	
	# Sort values for reference, and to locate the median value	
	sort_vals = sorted(times, key=lambda tot_time: tot_time[2])
	serialize_to_file(sort_vals, "test_results.csv")

def main():
	parser = OptionParser(description="")
	parser.add_option('-c', '--conninfo', type=str, help="libpq-style connection info string of database to connect to. "
		"Can be omitted, in which case we get details from our environment. "
		"You'll probably want to put this in double-quotes, like this: "
		"--conninfo \"hostaddr=127.0.0.1 port=5432 dbname=postgres user=postgres\".", default="")
	parser.add_option('-n', '--num_its', type=int, help="Number of iterations (times a query is executed)", default=5000)
	parser.add_option('-s', '--num_const_nodes', type=int, help="Number of Const nodes that each query contains", default=300)

	args = parser.parse_args()[0]
	conn_str = args.conninfo
	num_its = args.num_its 
	num_const_nodes = args.num_const_nodes

	conn = psycopg2.connect(conn_str)
	run_test(conn, num_its, num_const_nodes)

if __name__ == "__main__":
	main()
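
For reference, a typical invocation of the script (saved here,
hypothetically, as const_overhead_test.py) would be something like:

python const_overhead_test.py --conninfo "hostaddr=127.0.0.1 port=5432 dbname=postgres user=postgres" -n 5000 -s 300

The timings are then written to test_results.csv, with the median run
marked by an asterisk.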



field_addition_results.ods
Description: application/vnd.oasis.opendocument.spreadsheet
