[PERFORM] What about utility to calculate planner cost constants?

2005-03-21 Thread Tambet Matiisen

I was following the cpu_tuple_cost thread and wondering if it would be
possible to make a PQA-style utility to calculate configuration-specific
values for the planner cost constants. It could make use of the output of
log_(statement|parser|planner|executor)_stats, though I'm not sure if that
output contains anything useful for this purpose.

Otherwise it could just collect statements, run EXPLAIN ANALYZE for all
of them, and then play with the planner cost constants to get the estimated
values as close as possible to the actual values. Something like Goal Seek
in Excel, if you pardon my reference to MS :).
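
For illustration, a minimal sketch of such a goal-seek loop in Python
(hypothetical, not a real tool: the measurement function is faked here, and
a real version would SET the constant and run EXPLAIN ANALYZE over the
logged statements):

# Golden-section search for the cost-constant value that minimizes the
# mismatch between estimated and actual query times.

def goal_seek(error_for, lo, hi, iterations=40):
    # Minimize error_for over [lo, hi] without testing a full grid.
    phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    for _ in range(iterations):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if error_for(c) < error_for(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2

def workload_error(random_page_cost):
    # Placeholder for the real measurement: SET the constant, run
    # EXPLAIN ANALYZE for each logged statement, and return the summed
    # squared gap between estimated and actual times.  Faked here with
    # a curve whose minimum sits at 4.0.
    return (random_page_cost - 4.0) ** 2

print(goal_seek(workload_error, lo=1.0, hi=10.0))  # ~4.0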

A somewhat similar project seems to be pgautotune from pgFoundry, but it
only considers shared_buffers, sort_mem and vacuum_mem. It also seems to
use synthetic data instead of the actual database and actual statements
from the log. And it has been inactive for a while.

  Tambet



Re: [PERFORM] column name is LIMIT

2005-03-21 Thread Jan Wieck
On 3/14/2005 4:26 AM, Qingqing Zhou wrote:
So is the idea to make the SQL parser context-sensitive, so that in the
statement SELECT * FROM LIMIT the parser will understand LIMIT as just a
table name instead of a keyword?
More or less, yes. To use a reserved keyword as an identifier (table or 
column name), it must be enclosed in double quotes. Double quotes are 
also used to make identifiers case sensitive. So

select someval, "SOMEVAL", "someVAL" from "user";

is a valid query retrieving 3 distinct columns from the table user.
There is a built-in function quote_ident() in PostgreSQL that is supposed
to return a properly quoted string, usable as an identifier, for whatever
name is passed in. But it fails to do so for all-lower-case names that
are reserved keywords.
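
To illustrate the failure mode, here is a rough Python imitation of that
quoting logic (a sketch only, not PostgreSQL's actual implementation; the
keyword list is abbreviated):

RESERVED = {"limit", "user", "order", "select", "table"}  # truncated list

def quote_ident_buggy(name):
    # Quotes only names that aren't plain lower-case words, so a
    # lower-case reserved keyword like "limit" slips through unquoted.
    if name.islower() and name.isidentifier():
        return name
    return '"' + name.replace('"', '""') + '"'

def quote_ident_fixed(name):
    # Also quotes lower-case names that are reserved keywords.
    if name.islower() and name.isidentifier() and name not in RESERVED:
        return name
    return '"' + name.replace('"', '""') + '"'

print(quote_ident_buggy("limit"))    # limit    -- breaks generated SQL
print(quote_ident_fixed("limit"))    # "limit"  -- valid identifier
print(quote_ident_fixed("someVAL"))  # "someVAL"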

The queries Slony executes on the replicas are constructed using that
quoting function, and therefore Slony fails to build valid SQL for
replicated tables containing reserved-keyword identifiers. One solution
would be to brute-force quote all identifiers in Slony ... not sure what
the performance side effects would be.

Jan
--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #


Re: [PERFORM] column name is LIMIT

2005-03-21 Thread Tom Lane
Jan Wieck [EMAIL PROTECTED] writes:
 There is a built-in function quote_ident() in PostgreSQL that is supposed
 to return a properly quoted string, usable as an identifier, for whatever
 name is passed in. But it fails to do so for all-lower-case names that
 are reserved keywords.

Not any more ...

regards, tom lane



Re: [PERFORM] What about utility to calculate planner cost constants?

2005-03-21 Thread Josh Berkus
Tambet,

 I was following the cpu_tuple_cost thread and wondering if it would be
 possible to make a PQA-style utility to calculate configuration-specific
 values for the planner cost constants. It could make use of the output of
 log_(statement|parser|planner|executor)_stats, though I'm not sure if that
 output contains anything useful for this purpose.

Yeah, that's something I need to look at.

 Otherwise it could just collect statements, run EXPLAIN ANALYZE for all
 of them, and then play with the planner cost constants to get the estimated
 values as close as possible to the actual values. Something like Goal Seek
 in Excel, if you pardon my reference to MS :).

That's not really practical. There are currently 5 major query tuning
parameters, not counting the memory adjustments, which really can't be left
out. You can't realistically test all combinations of 6 variables.

 A somewhat similar project seems to be pgautotune from pgFoundry, but it
 only considers shared_buffers, sort_mem and vacuum_mem. It also seems to
 use synthetic data instead of the actual database and actual statements
 from the log. And it has been inactive for a while.

Yeah, pg_autotune is a dead project. Once we got OSDL able to run tests, we
came up with some rules of thumb which are more useful than autotune's
output. More importantly, the approach doesn't scale to the 15-20 GUCs which
we'd realistically want to test.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] What needs to be done for real Partitioning?

2005-03-21 Thread Josh Berkus
Stacy,

 Luckily they had the chance to work with a truly fantastic DBA (the
 author of an Oracle Press performance tuning book, even) before they could
 switch back.  He convinced them to make some of their indexes global.
 Performance dramatically improved (compared with both the unpartitioned
 schema and the partitioned-and-locally-indexed schema), and they've since
 stayed with partitioned tables and a mix of local and global indexes.

Hmmm.  Wouldn't Greg's suggestion of a bitmap index which holds information on
what values are found in what partition also solve this? And without half of
the overhead imposed by global indexes?

I can actually see such a bitmap as being universally useful to the
partitioning concept ... for one, it would resolve the whole 'partition on
{value}' issue.
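
A toy sketch of that idea in Python (illustrative only, not a real access
method): one bit per partition for each indexed value, so a lookup can skip
partitions that provably don't contain the value.

from collections import defaultdict

N_PARTITIONS = 12
bitmaps = defaultdict(int)  # value -> bitmask over partitions

def note_value(value, partition_no):
    bitmaps[value] |= 1 << partition_no

def partitions_for(value):
    # Partitions that might contain the value; all others are skipped.
    bits = bitmaps.get(value, 0)
    return [p for p in range(N_PARTITIONS) if bits >> p & 1]

for p in (0, 3, 7):           # customer 1 appears in three partitions
    note_value(("customer_id", 1), p)

print(partitions_for(("customer_id", 1)))  # [0, 3, 7] -> scan only these
print(partitions_for(("customer_id", 2)))  # []        -> nothing to scan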

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] What about utility to calculate planner cost constants?

2005-03-21 Thread Thomas F. O'Connell
If by "not practical" you mean that no one has implemented a multivariable
testing approach, I'll agree with you. But multivariable testing is
definitely a valid statistical approach to solving just such problems.
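
As an illustration of the statistics (a sketch assuming NumPy; the parameter
names and ranges are examples, not recommendations), Latin hypercube
sampling covers a 6-dimensional space with a few dozen runs instead of the
full grid:

import numpy as np

params = {  # hypothetical ranges to explore
    "random_page_cost": (1.0, 10.0),
    "cpu_tuple_cost": (0.001, 0.1),
    "cpu_index_tuple_cost": (0.0005, 0.05),
    "cpu_operator_cost": (0.0001, 0.01),
    "effective_cache_size_mb": (64, 4096),
    "work_mem_mb": (1, 256),
}

def latin_hypercube(n, dims, rng):
    # One stratum per sample in every dimension: n points that jointly
    # cover each axis, instead of levels**dims grid points.
    strata = np.tile(np.arange(n), (dims, 1))
    return (rng.permuted(strata, axis=1).T + rng.random((n, dims))) / n

rng = np.random.default_rng(0)
unit = latin_hypercube(30, len(params), rng)   # 30 runs, not 5**6 = 15625
lows = np.array([lo for lo, hi in params.values()])
highs = np.array([hi for lo, hi in params.values()])
candidates = lows + unit * (highs - lows)

for row in candidates[:3]:                     # benchmark each candidate
    print(dict(zip(params, np.round(row, 4))))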

-tfo
--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
On Mar 21, 2005, at 11:51 AM, Josh Berkus wrote:
That's not really practical. There are currently 5 major query tuning
parameters, not counting the memory adjustments, which really can't be left
out. You can't realistically test all combinations of 6 variables.



Re: [PERFORM] What needs to be done for real Partitioning?

2005-03-21 Thread Jim C. Nasby
On Mon, Mar 21, 2005 at 09:55:03AM -0800, Josh Berkus wrote:
 Stacy,
 
  Luckily they had the chance to work with a truly fantastic DBA (the
  author of an Oracle Press performance tuning book, even) before they could
  switch back.  He convinced them to make some of their indexes global.
  Performance dramatically improved (compared with both the unpartitioned
  schema and the partitioned-and-locally-indexed schema), and they've since
  stayed with partitioned tables and a mix of local and global indexes.
 
 Hmmm.  Wouldn't Greg's suggestion of a bitmap index which holds information
 on what values are found in what partition also solve this? And without half
 of the overhead imposed by global indexes?
 
 I can actually see such a bitmap as being universally useful to the
 partitioning concept ... for one, it would resolve the whole 'partition on
 {value}' issue.

I suspect both will have their uses. I've read quite a bit about global
vs. local indexes in Oracle, and there are definitely cases where global
is much better than local. Granted, there are some things with how Oracle
handles their catalog, etc. that might make local indexes more expensive
for them than they would be for PostgreSQL. It's also not clear how much
a 'partition bitmap' index would help.

As for the 'seqscan individual partitions' argument, that's not going to
work well at all for a case where you need to hit a relatively small
percentage of rows in a relatively large number of partitions. SELECT
... WHERE customer_id = 1 would be a good example of such a query
(assuming the table is partitioned on something like invoice_date).
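
To make that concrete, a toy pruning model in Python (illustrative only,
not PostgreSQL's planner): with monthly range partitions on invoice_date,
a date predicate prunes to a few partitions, but a customer_id predicate
gives the pruner nothing to work with.

from datetime import date

# One partition per month of 2004, as [lo, hi) date ranges.
partitions = [(date(2004, m, 1),
               date(2004, m + 1, 1) if m < 12 else date(2005, 1, 1))
              for m in range(1, 13)]

def prune(date_lo=None, date_hi=None):
    # Partitions a query must scan, given optional invoice_date bounds;
    # a predicate on a non-partitioning column yields no bounds at all.
    return [(lo, hi) for lo, hi in partitions
            if (date_hi is None or lo < date_hi)
            and (date_lo is None or hi > date_lo)]

# WHERE invoice_date >= '2004-06-01' AND invoice_date < '2004-08-01'
print(len(prune(date(2004, 6, 1), date(2004, 8, 1))))  # 2 partitions
# WHERE customer_id = 1 -> every partition must be visited
print(len(prune()))                                    # 12 partitions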
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?



Re: [PERFORM] What needs to be done for real Partitioning?

2005-03-21 Thread Jim C. Nasby
On Sat, Mar 19, 2005 at 07:05:53PM -0500, Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  We probably also need multi-table indexes.
 
 As Josh says, that seems antithetical to the main point of partitioning,
 which is to be able to rapidly remove (and add) partitions of a table.
 If you have to do index cleaning before you can drop a partition, what's
 the point of partitioning?

Why would you need to do index cleaning first? Presumably the code that
checks a heap tuple an index entry points at (to ensure it is visible to
the current transaction) could recognize that the partition containing that
tuple had been removed, and simply ignore the index entry. Granted, you'd
need to clean the index up at some point (presumably via vacuum), but that
doesn't need to happen at partition drop time.
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?



Re: [PERFORM] What about utility to calculate planner cost constants?

2005-03-21 Thread Josh Berkus
Thomas,

 If by not practical you mean, no one has implemented a multivariable
 testing approach, I'll agree with you. But multivariable testing is
 definitely a valid statistical approach to solving just such problems.

Well, not practical as in: it would take either $10 million in equipment or
10,000 hours, or both.

--Josh

-- 

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] What about utility to calculate planner cost constants?

2005-03-21 Thread Greg Stark
Josh Berkus josh@agliodbs.com writes:

  Otherwise it could just collect statements, run EXPLAIN ANALYZE for all
  of them, and then play with the planner cost constants to get the estimated
  values as close as possible to the actual values. Something like Goal Seek
  in Excel, if you pardon my reference to MS :).
 
 That's not really practical. There are currently 5 major query tuning
 parameters, not counting the memory adjustments, which really can't be left
 out. You can't realistically test all combinations of 6 variables.

I don't think it would be very hard at all actually.

It's just a linear algebra problem with a bunch of independent variables and a
system of equations. Solving for values for all of them is a straightforward
problem.

Of course in reality these variables aren't actually independent because the
costing model isn't perfect. But that wouldn't be a problem, it would just
reduce the accuracy of the results.

What's needed is for the explain plan to total up the costing penalties
independently. So the result would be something like

1000 * random_page_cost + 101 * sequential_page_cost + 2000 * index_tuple_cost + ...

In other words, a tuple like (1000, 101, 2000, ...).

And explain analyze would produce the above tuple along with the resulting
time.

Some program would have to collect these values from the log or stats data,
assemble them into a large linear system, and solve for values that minimize
the divergence from the observed times.
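
A back-of-the-envelope version of that solver, assuming NumPy (the
component counts and timings below are invented; real ones would come from
the instrumented EXPLAIN ANALYZE output described above):

import numpy as np

# Rows: queries. Columns: how often each query paid each cost component
# (random page reads, sequential page reads, index tuples).
A = np.array([
    [1000.0,  101.0, 2000.0],
    [  10.0, 5000.0,  300.0],
    [ 400.0,  900.0, 8000.0],
    [2500.0,   50.0,  100.0],
])
b = np.array([4210.0, 5330.0, 9800.0, 10150.0])  # observed runtimes, ms

# Least squares: per-component costs (in ms) minimizing the divergence
# between predicted and observed times.
costs, _, _, _ = np.linalg.lstsq(A, b, rcond=None)

names = ["random_page_cost", "sequential_page_cost", "index_tuple_cost"]
for name, c in zip(names, costs):
    print(f"{name} ~ {c:.4f} ms")

# Normalized so sequential_page_cost = 1, matching the current convention:
print(dict(zip(names, np.round(costs / costs[1], 3))))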



(Cost penalties are currently normalized to sequential_page_cost being 1.
That could be maintained, or it could be changed to be normalized to an
expected 1 ms.)

(Also, explain analyze currently has overhead that makes this impractical.
Ideally it could subtract out its own overhead so the solutions would be
accurate enough to be useful.)

-- 
greg

