Re: [PERFORM] Using LIMIT 1 in plpgsql PERFORM statements

2005-10-23 Thread Karl O. Pinc


On 10/23/2005 04:02:35 PM, Josh Berkus wrote:


I'm wondering in what context it makes sense to call PERFORM on a
constant.


I like to write PERFORMs that return a constant when
selecting from a table.  It emphasizes that the
selection is being done for its side effects.
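
For instance, a sketch (the table, column, and message here are
purely hypothetical):

PERFORM 1 FROM foo WHERE flagged;
IF FOUND THEN
    RAISE NOTICE 'foo has flagged rows';
END IF;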

(Programs should be written for people to read
and only incidentally for computers to execute.
Programs that people can't read quickly
become useless, whereas programs that can't run
can quickly be fixed.  Computers are easy.
People are difficult.)

Karl <[EMAIL PROTECTED]>
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: [PERFORM] Using LIMIT 1 in plpgsql PERFORM statements

2005-10-23 Thread Karl O. Pinc


On 10/23/2005 04:02:35 PM, Josh Berkus wrote:

Karl,

> PERFORM 1 FROM foo;
> IF FOUND THEN ...
>
> is any slower than:
>
> PERFORM 1 FROM foo LIMIT 1;
> IF FOUND THEN ...

I'm wondering in what context it makes sense to call PERFORM on a
constant.


If you want to find out whether the table has any rows.
What I'm really interested in is what happens when
there's a WHERE qualification: I want to find
out whether there are any rows of some particular sort.
I figured it wasn't worth putting that into
the example because I didn't have anything
specific to put in the WHERE clause, but I suppose
I should have put one in anyway.
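
Something like this, say, where the column and value are purely
for illustration:

PERFORM 1 FROM foo WHERE bar = 'some value';
IF FOUND THEN ...

versus

PERFORM 1 FROM foo WHERE bar = 'some value' LIMIT 1;
IF FOUND THEN ...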

Karl <[EMAIL PROTECTED]>
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: [PERFORM] Need help in setting optimal configuration for a huge

2005-10-22 Thread Karl O. Pinc


On 10/22/2005 04:15:25 PM, Kishore B wrote:


Can any of you please suggest the best configuration to satisfy the above
requirements?


You've increased shared memory buffers, told the kernel
to allow more shared memory (lots), and otherwise increased
the parameters associated with memory?

If so you might want to post the relevant configs
here.
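
To be concrete, the sort of settings I mean are sketched below.
The numbers are placeholders only, not recommendations; the right
values depend entirely on your RAM and PostgreSQL version:

# postgresql.conf -- units are 8kB pages for the first two,
# kB for work_mem (called sort_mem before 8.0)
shared_buffers = 20000            # ~160MB of shared buffers
effective_cache_size = 100000     # ~800MB; what the OS is expected to cache
work_mem = 16384                  # 16MB per sort/hash step

# The kernel's shared memory limit (e.g. kernel.shmmax on Linux)
# must be raised to at least the shared_buffers size.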

If the basic tuning does not help enough you may
want to upgrade to 8.0 as it has significant
performance improvements.

Karl <[EMAIL PROTECTED]>
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




[PERFORM] Using LIMIT 1 in plpgsql PERFORM statements

2005-10-21 Thread Karl O. Pinc

Hi,

I'm wondering if the plpgsql code:

PERFORM 1 FROM foo;
IF FOUND THEN ...

is any slower than:

PERFORM 1 FROM foo LIMIT 1;
IF FOUND THEN ...

Seems like it _could_ be smart enough to know that

1) It's selecting from a real table and not a function

2) GET DIAGNOSTICS is not used

and therefore it does not have to do more than set
FOUND, and so need only find one row (and plan the query
to find only one row).  I'm particularly interested
in the query plan optimization aspect.
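
I suppose I could get at the planner half of this myself by
comparing the plans of the equivalent SELECTs, since PERFORM just
runs the query and discards the result:

EXPLAIN ANALYZE SELECT 1 FROM foo;
EXPLAIN ANALYZE SELECT 1 FROM foo LIMIT 1;

but that wouldn't tell me what plpgsql itself does with the rows.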

Would it be considered poor practice to rely on
such an optimization?

Thanks.

Karl <[EMAIL PROTECTED]>
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: [PERFORM] Poor index choice -- multiple indexes of the same columns

2005-06-28 Thread Karl O. Pinc


On 06/28/2005 01:40:56 AM, Tom Lane wrote:

"Karl O. Pinc" <[EMAIL PROTECTED]> writes:
> I have a query

> select 1
>   from census
>   where date < '1975-9-21' and sname = 'RAD' and status != 'A'
>   limit 1;

> Explain analyze says it always uses the index made by:

>CREATE INDEX census_date_sname ON census (date, sname);

> this is even after I made the index:

>CREATE INDEX census_sname_date ON census (sname, date);

I don't believe that any existing release can tell the difference
between these two indexes as far as costs go.  I just recently
added some code to btcostestimate that would cause it to prefer
the index on (sname, date) but of course that's not released yet.

However: isn't the above query pretty seriously underspecified?
With a LIMIT and no ORDER BY, you are asking for a random one
of the rows matching the condition.  I realize that with
"select 1" you may not care much, but adding a suitable ORDER BY
would help push the planner towards using the right index.  In
this case "ORDER BY sname DESC, date DESC" would probably do the
trick.


Yes, that works.  I'd already tried "ORDER BY date DESC" before
I first wrote, and that did not work.  (I started with no LIMIT
either, and tried adding specifications until I gave up.  It's
very good that the new planner will figure out things by itself.)
"ORDER BY sname DESC" works as well.  This is a
bit odd, as with the constant in the = comparison "ORDER BY date
DESC" is the same as "ORDER BY sname DESC, date DESC".
I guess that's why I gave up on my attempts to get the planner
to use the (sname, date) index before I got to your solution.
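
For the archives, what I'm running now is essentially:

select 1
  from census
  where date < '1975-9-21' and sname = 'RAD' and status != 'A'
  order by sname desc, date desc
  limit 1;

and that picks the census_sname_date index.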

Thanks everybody for the help.


Karl <[EMAIL PROTECTED]>
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: [PERFORM] Poor index choice -- multiple indexes of the same columns

2005-06-27 Thread Karl O. Pinc


On 06/27/2005 09:36:51 PM, Karl O. Pinc wrote:


I'm doing this in a torture test script, loading data.
Every Fibonacci number of rows * 100 I VACUUM ANALYZE.
So, 100, 200, 300, 500, 800, etc.


(And of course I disconnect my client and re-connect so
as to use the new statistics.  Sure would be nice if
I didn't have to do this.)

Karl <[EMAIL PROTECTED]>
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: [PERFORM] Performance analysis of plpgsql code

2005-06-27 Thread Karl O. Pinc


On 06/27/2005 08:34:19 PM, Michael Fuhr wrote:

On Tue, Jun 28, 2005 at 01:54:08AM +, Karl O. Pinc wrote:
> On 06/27/2005 06:33:03 PM, Michael Fuhr wrote:
>
> >See timeofday().
>
> That only gives you the time at the start of the transaction,
> so you get no indication of how long anything in the
> transaction takes.

Did you read the documentation or try it?  Perhaps you're thinking
of now(), current_timestamp, and friends, which don't advance during
a transaction; but as the documentation states, "timeofday() returns
the wall-clock time and does advance during transactions."


Very sorry.  I did not read through the complete documentation.


I just ran tests on versions of PostgreSQL going back to 7.2.8 and
in all of them timeofday() advanced during a transaction.


In return for all your work, a documentation patch is appended
that I think is easier to read and might avoid this confusion
in the future.  If you don't read all the way through the
current CVS version you might think, as I did,
that timeofday() is a CURRENT_TIMESTAMP-related function.
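
For the archives, the timing idiom I was after ends up looking
something like this sketch; the function name and the statement
being timed are only placeholders:

CREATE OR REPLACE FUNCTION timing_demo() RETURNS void AS $$
DECLARE
    t0      timestamptz;
    elapsed interval;
BEGIN
    t0 := timeofday()::timestamptz;
    PERFORM 1 FROM census;             -- the statement being timed
    elapsed := timeofday()::timestamptz - t0;
    RAISE NOTICE 'elapsed: %', elapsed;
END;
$$ LANGUAGE plpgsql;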

(Apologies in advance if my email client wraps any long lines
in the patch.)


Karl <[EMAIL PROTECTED]>
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein


--- func.sgml   2005-06-26 17:05:35.0 -0500
+++ func.sgml.new   2005-06-27 21:51:05.301097896 -0500
@@ -5787,15 +5787,6 @@



-There is also the function timeofday(), which for historical
-reasons returns a text string rather than a timestamp value:

-
-SELECT timeofday();
-Result: Sat Feb 17 19:07:32.000126 2001 EST

-
-   
-
-   
 It is important to know that
 CURRENT_TIMESTAMP and related functions return
 the start time of the current transaction; their values do not
@@ -5803,8 +5794,7 @@
 the intent is to allow a single transaction to have a consistent
 notion of the current time, so that multiple
 modifications within the same transaction bear the same
-time stamp. timeofday()
-returns the wall-clock time and does advance during transactions.
+time stamp.



@@ -5815,6 +5805,18 @@



+There is also the function timeofday() which
+returns the wall-clock time and advances during transactions.  For
+historical reasons timeofday() returns a
+text string rather than a timestamp
+value:
+
+SELECT timeofday();
+Result: Sat Feb 17 19:07:32.000126 2001 EST
+
+   
+
+   
 All the date/time data types also accept the special literal value
 now to specify the current date and time.  Thus,
 the following three all return the same result:




Re: [PERFORM] Poor index choice -- multiple indexes of the same columns

2005-06-27 Thread Karl O. Pinc


On 06/27/2005 05:37:41 PM, Josh Berkus wrote:

Karl,

> Seems to me that when there's a constant value in the query
> and an = comparison it will always be faster to use the (b-tree)
> index that's ordered first by the constant value, as then all further
> blocks are guaranteed to have a higher relevant information
> density.  At least when compared with another index that has the
> same columns in it.

That really depends on the stats.  Such a choice would *not* be
appropriate if the < comparison was expected to return 1- rows while
the = condition applied to 15% of the table.


We're talking internals here so I don't know what I'm talking
about, but when the = comparison returns 15% of the table
you can find your way straight to the 1- (sic) relevant rows
because that 15% is further sorted by the second column of the
index.  So that's one disk read, and after that, when you scan
the rest of the blocks, every datum is relevant/returned,
so your scan will pass through fewer disk blocks.  The only
case where it would make sense to consider using the other
index is if the planner knew it could
get the answer in one disk read, in which case it should be
able to get the answer out of either index in one disk read,
as both indexes are on the same columns.


What is your STATISTICS_TARGET for the relevant columns set to?


STATISTICS_TARGET is the default, which I read as 10 in the docs.
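
If a bigger sample would help, I gather I could raise it with
something like the following (100 is an arbitrary number):

ALTER TABLE census ALTER COLUMN sname SET STATISTICS 100;
ALTER TABLE census ALTER COLUMN date SET STATISTICS 100;
ANALYZE census;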


When's the last time you ran analyze?


I'm doing this in a torture test script, loading data.
Every Fibonacci number of rows * 100 I VACUUM ANALYZE.
So, 100, 200, 300, 500, 800, etc.

Just for grins I've created the index I'd like it to use
and run VACUUM ANALYZE and shown the EXPLAIN ANALYZE below.


If this is all updated, you want to post the pg_stats rows for the
relevant columns?


Pg_stats rows below.  (I've laid them out one column per line
so as not to mess up anybody's mailer.)

# create index census_sname_date on census (sname, date);
CREATE INDEX
# vacuum analyze census;
VACUUM
# explain analyze select 1 from census where date < '1975-9-21'
 and sname = 'RAD' and status != 'A' ;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Index Scan using census_date_sname on census  (cost=0.00..2169.51 rows=1437 width=0) (actual time=40.610..40.610 rows=0 loops=1)
   Index Cond: ((date < '1975-09-21'::date) AND (sname = 'RAD'::bpchar))
   Filter: (status <> 'A'::bpchar)
 Total runtime: 40.652 ms
(4 rows)

Compare with:

# drop index census_date_sname;
DROP INDEX
# explain analyze select date from census where sname = 'RAD'
 and date < '1975-9-21' and status != 'A' limit 1;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..3.37 rows=1 width=4) (actual time=0.097..0.097 rows=0 loops=1)
   ->  Index Scan using census_sname_date on census  (cost=0.00..5203.95 rows=1544 width=4) (actual time=0.094..0.094 rows=0 loops=1)
         Index Cond: ((sname = 'RAD'::bpchar) AND (date < '1975-09-21'::date))
         Filter: (status <> 'A'::bpchar)
 Total runtime: 0.133 ms
(5 rows)




# select * from pg_stats where tablename = 'census' and (attname =
'sname' or attname = 'date');
-[ RECORD 1 ]-----+---------------------------------------------------------
schemaname        | babase
tablename         | census
attname           | date
null_frac         | 0
avg_width         | 4
n_distinct        | 4687
most_common_vals  | {1979-02-01,1976-06-16,1977-03-23,1978-08-25,1979-09-20,1971-06-28,1972-04-28,1972-08-27,1974-04-06,1975-03-19}
most_common_freqs | {0.002,0.0017,0.0017,0.0017,0.0017,0.0013,0.0013,0.0013,0.0013,0.0013}
histogram_bounds  | {1959-07-15,1966-02-18,1969-02-22,1971-01-10,1972-07-26,1974-02-09,1975-05-27,1976-07-28,1977-08-19,1978-08-07,1979-10-02}
correlation       | 1
-[ RECORD 2 ]-----+---------------------------------------------------------
schemaname        | babase
tablename         | census
attname           | sname
null_frac         | 0
avg_width         | 7
n_distinct        | 177
most_common_vals  | {MAX,ALT,PRE,COW,EST,JAN,RIN,ZUM,DUT,LUL}
most_common_freqs | {0.017,0.015,0.015,0.0146667,0.014,0.014,0.0136667,0.0136667,0.013,0.013}
histogram_bounds  | {ALI,BUN,FAN,IBI,LER,NDO,PET,RUS,SLM,TOT,XEN}
correlation       | 0.0446897
(2 rows)

Thanks.


Karl <[EMAIL PROTECTED]>
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: [PERFORM] Performance analysis of plpgsql code

2005-06-27 Thread Karl O. Pinc


On 06/27/2005 06:33:03 PM, Michael Fuhr wrote:

On Mon, Jun 27, 2005 at 11:30:45PM +, Karl O. Pinc wrote:
>
> Short of that I think I'm going to be reduced to
> writing a C function that returns the real
> system time so I can spatter my code with
> RAISE statements that indicate actual execution
> time.

See timeofday().


That only gives you the time at the start of the transaction,
so you get no indication of how long anything in the
transaction takes.


Karl <[EMAIL PROTECTED]>
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




[PERFORM] Performance analysis of plpgsql code

2005-06-27 Thread Karl O. Pinc

Hi,

I'm having a hard time finding the poorly performing
statements in my plpgsql procedures, many of which
are triggers.  Am I missing something?

I can get the query plans by starting up a new
connection and doing:
SET DEBUG_PRINT_PLAN TO TRUE;
SET CLIENT_MIN_MESSAGES TO DEBUG1;
And then running code that exercises my functions.
Then I can find the queries that, in theory,
could have problems.  But problems remain
after this.

What I'd really like is a SET variable (or maybe
a clause in CREATE FUNCTION) that causes any
functions compiled to issue EXPLAIN ANALYZE output
and the query text itself, to be RAISEd.
Then I could watch the performance as it ran.

Short of that I think I'm going to be reduced to
writing a C function that returns the real
system time so I can spatter my code with
RAISE statements that indicate actual execution
time.

Is there a better approach?
Does anybody have such a C function handy?


Karl <[EMAIL PROTECTED]>
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




[PERFORM] Forcing use of a particular index

2005-06-27 Thread Karl O. Pinc

http://www.postgresql.org/docs/8.0/static/indexes-examine.html

Says:

"If you do not succeed in adjusting the costs to be more
appropriate, then you may have to resort to forcing index
usage explicitly."

Is there a way to force a query to use a particular index?
If not, what does this sentence mean?
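
The only planner knobs I'm aware of are the enable_* settings,
for example:

SET enable_seqscan TO off;
-- run the query, then restore the default:
SET enable_seqscan TO on;

but those turn off whole plan types rather than naming a specific
index, which is part of why I'm asking.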

Thanks.


Karl <[EMAIL PROTECTED]>
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




[PERFORM] Poor index choice -- multiple indexes of the same columns

2005-06-27 Thread Karl O. Pinc

Postgresql 8.0.3

Hi,

I have a query

select 1
 from census
 where date < '1975-9-21' and sname = 'RAD' and status != 'A'
 limit 1;

Explain analyze says it always uses the index made by:

  CREATE INDEX census_date_sname ON census (date, sname);

this is even after I made the index:

  CREATE INDEX census_sname_date ON census (sname, date);

I made census_sname_date because the query ran too slow. By deleting
census_date_sname (temporarily, because my apps don't like this)
I can force the use of census_sname_date and the query runs fine.

Seems to me that when there's a constant value in the query
and an = comparison it will always be faster to use the (b-tree)
index that's ordered first by the constant value, as then all further
blocks are guaranteed to have a higher relevant information
density.  At least when compared with another index that has the
same columns in it.

As you might imagine there are relatively few sname values and
relatively many date values in my data.  I use a query like the
above in a trigger to enforce bounds limitations.  I don't
expect (want) to find any rows returned.
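
To give an idea, the trigger is shaped roughly like the sketch
below; the actual bounds rule, names, and error text are only
placeholders:

CREATE OR REPLACE FUNCTION census_bounds_check() RETURNS trigger AS $$
BEGIN
    PERFORM 1
      FROM census
      WHERE date < NEW.date AND sname = NEW.sname AND status != 'A'
      LIMIT 1;
    IF FOUND THEN
        RAISE EXCEPTION 'census bounds violation';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER census_bounds BEFORE INSERT OR UPDATE ON census
    FOR EACH ROW EXECUTE PROCEDURE census_bounds_check();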

I've figured out how to avoid executing this code very often,
so this is not presently a serious problem for me.

Karl <[EMAIL PROTECTED]>
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein

