Re: [PERFORM] Checking = with timestamp field is slow

2004-11-05 Thread Michael Glaesemann
On Nov 5, 2004, at 4:16 PM, Antony Paul wrote:
where today::date = '2004-11-05';
This is the only condition in the query. There is a btree index on the
column today.
Is there any way to optimise it.
I'm sure others out there have better ideas, but you might want to try
where current_date = date '2004-11-05'
Might not make a difference at all, but perhaps PostgreSQL is coercing 
both values to timestamp or some other type, since you're only providing 
a string to compare to a date.

My 1 cent.
Michael Glaesemann
grzm myrealbox com
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Checking = with timestamp field is slow

2004-11-05 Thread Michael Glaesemann
On Nov 5, 2004, at 5:14 PM, Michael Glaesemann wrote:
On Nov 5, 2004, at 4:16 PM, Antony Paul wrote:
where today::date = '2004-11-05';
This is the only condition in the query. There is a btree index on the
column today.
Is there any way to optimise it.
I'm sure others out there have better ideas, but you might want to try
where current_date = date '2004-11-05'
Ach! just re-read that. today is one of your columns! Try
where today::date = date '2004-11-05'


Re: [PERFORM] Checking = with timestamp field is slow

2004-11-05 Thread Michael Fuhr
On Fri, Nov 05, 2004 at 12:46:20PM +0530, Antony Paul wrote:

>I have a table which has more than 200,000 records. I need to get
> the records which match like this
> 
> where today::date = '2004-11-05';
> 
> This is the only condition in the query. There is a btree index on the
> column today.  Is there any way to optimise it.

Is the today column a TIMESTAMP as the subject implies?  If so then
your queries probably aren't using the index because you're changing
the type to something that's not indexed.  Your queries should speed
up if you create an index on DATE(today):

CREATE INDEX foo_date_today_idx ON foo (DATE(today));

After creating the new index, use WHERE DATE(today) = '2004-11-05'
in your queries.  EXPLAIN ANALYZE should show that the index is
being used.
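
A quick way to confirm this (using the foo table from the CREATE INDEX
example above; the comment describes the expected plan shape, not actual
captured output):

```sql
EXPLAIN ANALYZE SELECT * FROM foo WHERE DATE(today) = '2004-11-05';
-- Once foo_date_today_idx exists, the plan should report
-- "Index Scan using foo_date_today_idx" rather than "Seq Scan on foo".
```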

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/



Re: [PERFORM] Restricting Postgres

2004-11-05 Thread Leeuw van der, Tim
To what extent would your problems be solved by having a 2nd server, a replication 
system (such as slony-1, but there are others), and some sort of load-balancer in 
front of it? The load-balancing could be as simple as round-robin DNS server, 
perhaps...

Then when you need to do maintenance such as a VACUUM FULL, you can temporarily take 
one server out of the load-balancer, do the maintenance, and then do the same for the other.
I don't know what that does to replication, but I would venture that replication 
systems should be designed to handle a node going offline.

Load balancing could also help to protect against server-overload and 1 server 
toppling over.

Of course, I don't know to what extent having another piece of hardware is an option 
for you.

cheers,

--Tim

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Martin Foster
Sent: Friday, November 05, 2004 3:50 AM
To: [EMAIL PROTECTED]
Subject: Re: [PERFORM] Restricting Postgres

[...]

Now is there an administrative command in PostgreSQL that will cause it 
to move into some sort of maintenance mode?   For me that could be 
exceedingly useful as it would still allow for an admin connection to be 
made and run a VACUUM FULL and such.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]



Re: [PERFORM] Checking = with timestamp field is slow

2004-11-05 Thread Andrew McMillan
On Fri, 2004-11-05 at 12:46 +0530, Antony Paul wrote:
> Hi all,
>I have a table which has more than 200,000 records. I need to get
> the records which match like this
> 
> where today::date = '2004-11-05';
> 
> This is the only condition in the query. There is a btree index on the
> column today.
> Is there any way to optimise it.

Hi Antony,

I take it your field is called "today" (seems dodgy, but these things
happen...).  Anyway, have you tried indexing on the truncated value?

  create index xyz_date on xyz( today::date );
  analyze xyz;

That's one way.  It depends on how many of those 200,000 rows are on
each date too, as to whether it will get used by your larger query.

Regards,
Andrew.

-
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/    PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201  MOB: +64(272)DEBIAN  OFFICE: +64(4)499-2267
  When in doubt, tell the truth.
-- Mark Twain
-





Re: [PERFORM] Checking = with timestamp field is slow

2004-11-05 Thread Michael Fuhr
On Fri, Nov 05, 2004 at 05:32:49PM +0900, Michael Glaesemann wrote:
> 
> On Nov 5, 2004, at 5:14 PM, Michael Glaesemann wrote:
> 
> >
> >On Nov 5, 2004, at 4:16 PM, Antony Paul wrote:
> >>where today::date = '2004-11-05';
> >>
> >>This is the only condition in the query. There is a btree index on the
> >>column today.
> >>Is there any way to optimise it.
> >
> >I'm sure others out there have better ideas, but you might want to try
> >
> >where current_date = date '2004-11-05'
> 
> Ach! just re-read that. today is one of your columns! Try
> 
> where today::date = date '2004-11-05'

Casting '2004-11-05' to DATE shouldn't be necessary, at least not
in 7.4.5.

test=> EXPLAIN ANALYZE SELECT * FROM foo WHERE today::DATE = '2004-11-05';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Seq Scan on foo  (cost=0.00..268.00 rows=50 width=16) (actual time=0.592..50.854 rows=1 loops=1)
   Filter: ((today)::date = '2004-11-05'::date)


As you can see, '2004-11-05' is already cast to DATE.  The sequential
scan is happening because there's no index on today::DATE.


test=> CREATE INDEX foo_date_idx ON foo (DATE(today));
CREATE INDEX
test=> EXPLAIN ANALYZE SELECT * FROM foo WHERE DATE(today) = '2004-11-05';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using foo_date_idx on foo  (cost=0.00..167.83 rows=50 width=16) (actual time=0.051..0.061 rows=1 loops=1)
   Index Cond: (date(today) = '2004-11-05'::date)

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/



Re: [PERFORM] Checking = with timestamp field is slow

2004-11-05 Thread Christopher Browne
After a long battle with technology, [EMAIL PROTECTED] (Antony Paul), an earthling, 
wrote:
> Hi all,
>I have a table which has more than 200,000 records. I need to get
> the records which match like this
>
> where today::date = '2004-11-05';
>
> This is the only condition in the query. There is a btree index on the
> column today.
> Is there any way to optimise it.

How about changing the criterion to:

  where today between '2004-11-05' and '2004-11-06';

That ought to make use of the index on "today".
-- 
"cbbrowne","@","ntlug.org"
http://www.ntlug.org/~cbbrowne/sgml.html
"People need to quit pretending they can invent THE interface and walk
away from it, like some Deist fantasy." -- Michael Peck



[PERFORM] Strange (?) Index behavior?

2004-11-05 Thread Allen Landsidel
Hey people, long while since I posted here, but I'm having an index
issue that looks on the surface to be a little strange.

I have a text field that I'm trying to query on in a table with
millions of rows.  Stupid I know, but a fairly common stupid thing to
try to do.

For some reason it's a requirement that partial wildcard searches are
done on this field, such as "SELECT ... WHERE field LIKE 'A%'"

I thought an interesting way to do this would be to simply create
partial indexes for each letter on that field, and it works when the
query matches the WHERE clause in the index exactly like above.  The
problem is thus:

Say I have an index.. CREATE INDEX column_idx_a ON table (column)
WHERE column LIKE 'A%'

It seems to me that a query saying "SELECT column FROM table WHERE
column LIKE 'AA%';" should be just as fast or very close to the first
case up above.  However, explain tells me that this query is not using
the index above, which is what's not making sense to me.

Does the planner not realize that 'AA%' will always fall between 'A%'
and 'B', and thus that using the index would be the best way to go, or
am I missing something else that's preventing this from working?
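
As a concrete sketch of the setup described above (table and column names
are placeholders, not taken from the original post):

```sql
-- One partial index per leading letter; the planner only uses a partial
-- index when it can prove the query's WHERE clause implies the index
-- predicate.
CREATE INDEX column_idx_a ON some_table (some_column)
    WHERE some_column LIKE 'A%';

-- Matches the index predicate exactly, so the partial index is considered:
SELECT some_column FROM some_table WHERE some_column LIKE 'A%';

-- 'AA%' logically implies 'A%', but the planner may fail to prove the
-- implication, which is the behavior being asked about:
SELECT some_column FROM some_table WHERE some_column LIKE 'AA%';
```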



Re: [PERFORM] Strange (?) Index behavior?

2004-11-05 Thread Matt Clark
> For some reason it's a requirement that partial wildcard 
> searches are done on this field, such as "SELECT ... WHERE 
> field LIKE 'A%'"
> 
> I thought an interesting way to do this would be to simply 
> create partial indexes for each letter on that field, and it 
> works when the query matches the WHERE clause in the index 
> exactly like above.  The problem is thus:

I thought PG could use an ordinary index for 'like' conditions with just a
terminating '%'?

My other thought is that like 'A%' should grab about 1/26th of the table
anyway (if the initial character distribution is random), and so a
sequential scan might be the best plan anyway...

M




Re: [PERFORM] Strange (?) Index behavior?

2004-11-05 Thread Rod Taylor
> It seems to me that a query saying "SELECT column FROM table WHERE
> column LIKE 'AA%';" should be just as fast or very close to the first
> case up above.  However, explain tells me that this query is not using
> the index above, which is what's not making sense to me.

It looks for an exact expression match, and doesn't know about values
which are equal.

You can provide both clauses.

WHERE column LIKE 'A%' and column LIKE 'AA%';





Re: [PERFORM] Strange (?) Index behavior?

2004-11-05 Thread Allen Landsidel
On Fri, 05 Nov 2004 10:07:38 -0500, Rod Taylor <[EMAIL PROTECTED]> wrote:
> > It seems to me that a query saying "SELECT column FROM table WHERE
> > column LIKE 'AA%';" should be just as fast or very close to the first
> > case up above.  However, explain tells me that this query is not using
> > the index above, which is what's not making sense to me.
> 
> It looks for an exact expression match, and doesn't know about values
> which are equal.
> 
> You can provide both clauses.
> 
> WHERE column LIKE 'A%' and column LIKE 'AA%';

I see.  That's not really optimal either however as you can probably
see already.. adding AB, AC, AD...AZ is likely to be pretty bogus and
at the least is time consuming.

Matt Clark was right that it will use a standard index, which is in
fact what it's doing right now in the "SELECT column WHERE column LIKE
'AA%';" case.. however as I said, the table has millions of rows --
currently about 76 million, so even a full index scan is fairly slow.

The machine isn't all that hot performance wise either, a simple dual
800 P3 with a single 47GB Seagate SCSI.  The only redeeming factor is
that it has 2GB of memory, which I'm trying to make the most of with
these indexes.

So assuming this partial index situation isn't going to change (it
seems like it would be a fairly simple fix for someone that knows the
pg code however) I'm wondering if a subselect may speed things up any,
so I'm going to investigate that next.

Perhaps.. SELECT column FROM (SELECT column FROM table WHERE column
LIKE 'A%') AS sq WHERE column LIKE 'AA%';

The query planner thinks this will be pretty fast indeed, and does use
the index I am after.

OS is, of course, FreeBSD.



[PERFORM] postgresql amd-64

2004-11-05 Thread Merlin Moncure
Does anybody have any experiences with postgresql 7.4+ running on amd-64
in 64 bit mode?  Specifically, does it run quicker and if so do the
performance benefits justify the extra headaches running 64 bit linux?

Right now I'm building a dual Opteron 246 with 4 gig ddr400.  

Merlin



Re: [PERFORM] Strange (?) Index behavior?

2004-11-05 Thread Allen Landsidel
On Fri, 5 Nov 2004 10:32:43 -0500, Allen Landsidel <[EMAIL PROTECTED]> wrote:
> On Fri, 05 Nov 2004 10:07:38 -0500, Rod Taylor <[EMAIL PROTECTED]> wrote:
> 
> 
> > > It seems to me that a query saying "SELECT column FROM table WHERE
> > > column LIKE 'AA%';" should be just as fast or very close to the first
> > > case up above.  However, explain tells me that this query is not using
> > > the index above, which is what's not making sense to me.
> >
> > It looks for an exact expression match, and doesn't know about values
> > which are equal.
> >
> > You can provide both clauses.
> >
> > WHERE column LIKE 'A%' and column LIKE 'AA%';
> 
> I see.  That's not really optimal either however as you can probably
> see already.. adding AB, AC, AD...AZ is likely to be pretty bogus and
> at the least is time consuming.

I see now that you mean to add that to the SELECT clause and not the
index, my mistake.

> Perhaps.. SELECT column FROM (SELECT column FROM table WHERE column
> LIKE 'A%') AS sq WHERE column LIKE 'AA%';
> 
> The query planner thinks this will be pretty fast indeed, and does use
> the index I am after.

This was indeed pretty fast.  About 7 seconds, as was modifying the
WHERE as suggested above.

-Allen



Re: [PERFORM] Strange (?) Index behavior?

2004-11-05 Thread Bruno Wolff III
On Fri, Nov 05, 2004 at 09:39:16 -0500,
  Allen Landsidel <[EMAIL PROTECTED]> wrote:
> 
> For some reason it's a requirement that partial wildcard searches are
> done on this field, such as "SELECT ... WHERE field LIKE 'A%'"
> 
> I thought an interesting way to do this would be to simply create
> partial indexes for each letter on that field, and it works when the
> query matches the WHERE clause in the index exactly like above.  The
> problem is thus:

That may not help much except for prefixes that have a below average
number of occurrences. If you are going to be selecting 1/26 of the records,
you are probably going to do about as well with a sequential scan as an
index scan.

Just having a normal index on the column will work if the database locale
is C. In 7.4 you can create an index usable by LIKE even if the database
locale isn't C, but I don't remember the exact syntax. You will be better
off having just one index rather than 26 partial indexes.
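
For the record, the syntax Bruno is thinking of uses a pattern-matching
operator class, available in 7.4 (table and column names here are
placeholders):

```sql
-- text_pattern_ops (or varchar_pattern_ops for varchar columns) makes a
-- btree index usable for LIKE 'prefix%' even in a non-C locale:
CREATE INDEX some_column_pattern_idx
    ON some_table (some_column text_pattern_ops);
```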



Re: [PERFORM] Checking = with timestamp field is slow

2004-11-05 Thread Michael Fuhr
On Fri, Nov 05, 2004 at 07:47:54AM -0500, Christopher Browne wrote:
> 
> How about changing the criterion to:
> 
>   where today between '2004-11-05' and '2004-11-06';
> 
> That ought to make use of the index on "today".

Yes it should, but it'll also return records that have a "today"
value of '2004-11-06 00:00:00' since "x BETWEEN y AND z" is equivalent
to "x >= y AND x <= z".  Try this instead:

  WHERE today >= '2004-11-05' AND today < '2004-11-06'

In another post I suggested creating an index on DATE(today).  The
above query should make that unnecessary, although in 8.0 such an
index would be used in queries like this:

  WHERE today IN ('2004-09-01', '2004-10-01', '2004-11-01');

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/



Re: [PERFORM] Strange (?) Index behavior?

2004-11-05 Thread Allen Landsidel
On Fri, 5 Nov 2004 11:51:59 -0600, Bruno Wolff III <[EMAIL PROTECTED]> wrote:
> On Fri, Nov 05, 2004 at 09:39:16 -0500,
>  Allen Landsidel <[EMAIL PROTECTED]> wrote:
> >
> > For some reason it's a requirement that partial wildcard searches are
> > done on this field, such as "SELECT ... WHERE field LIKE 'A%'"
> >
> > I thought an interesting way to do this would be to simply create
> > partial indexes for each letter on that field, and it works when the
> > query matches the WHERE clause in the index exactly like above.  The
> > problem is thus:
> 
> That may not help much except for prefixes that have a below average
> number of occurrences. If you are going to be selecting 1/26 of the records,
> you are probably going to do about as well with a sequential scan as an
> index scan.

The thing isn't that I want 1/26th of the records since the
distribution is not exactly equal among different letters, but more
importantly, there are about 76 million rows currently, and for some
reason I am being told by the people with the pointy hair that a query
like "select foo,bar from table where foo like 'abc%';" is not an
uncommon type of query to run.  I don't know why it's common and to be
honest, I'm afraid to ask. ;)

With that many rows, and a normal index on the field, postgres figures
the best option for say "I%" is not an index scan, but a sequential
scan on the table, with a filter -- quite obviously this is slow as
heck, and yes, I've run analyze several times and in fact have the
vacuum analyze automated.

With the partial index the index scan is used and the cost drops from
0..2million to 0..9000 -- a vast improvement.

So I'm going to go with the partial indexes, and have a total of 36 of
them -- A-Z and 0-9.

> Just having a normal index on the column will work if the database locale
> is C. In 7.4 you can create an index usable by LIKE even if the database
> locale isn't C, but I don't remember the exact syntax. You will be better
> off having just one index rather than 26 partial indexes.

I haven't written a line of C in years, and it was never my strong
suit, so despite all my years doing development and sysadminning, the
locale stuff is still something of a mystery to me.

The locale though is C, the default, and will for the time being at
least be storing only ascii strings -- no unicode, other character
sets, or anything funky like that.

-Allen



[PERFORM] What is the difference between these?

2004-11-05 Thread Matt Nuzum
To me, these three queries seem identical... why doesn't the first one
(simplest to understand and write) go the same speed as the third one?

I'll I'm trying to do is get statistics for one day (in this case,
today) summarized.  Table has ~25M rows.  I'm using postgres 7.3.? on
rh linux 7.3 (note that i think the difference between the first two
might just be related to the data being in memory for the second
query).


 EXPLAIN ANALYZE
 select count(distinct sessionid) from usage_access where atime >
date_trunc('day', now());
                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=933439.69..933439.69 rows=1 width=4) (actual time=580350.43..580350.43 rows=1 loops=1)
   ->  Seq Scan on usage_access  (cost=0.00..912400.11 rows=8415831 width=4) (actual time=580164.48..580342.21 rows=2964 loops=1)
         Filter: (atime > date_trunc('day'::text, now()))
 Total runtime: 580350.65 msec
(4 rows)


 EXPLAIN ANALYZE
 select count(distinct sessionid) from (select * from usage_access
where atime > date_trunc('day', now())) as temp;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=933439.69..933439.69 rows=1 width=4) (actual time=348012.85..348012.85 rows=1 loops=1)
   ->  Seq Scan on usage_access  (cost=0.00..912400.11 rows=8415831 width=4) (actual time=347960.53..348004.68 rows=2964 loops=1)
         Filter: (atime > date_trunc('day'::text, now()))
 Total runtime: 348013.10 msec
(4 rows)


 EXPLAIN ANALYZE
 select count(distinct sessionid) from usage_access where atime
between date_trunc('day', now()) and date_trunc('day', now()) + '1
day'::interval;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=89324.98..89324.98 rows=1 width=4) (actual time=27.84..27.84 rows=1 loops=1)
   ->  Index Scan using usage_access_atime on usage_access  (cost=0.00..89009.39 rows=126237 width=4) (actual time=0.51..20.37 rows=2964 loops=1)
         Index Cond: ((atime >= date_trunc('day'::text, now())) AND (atime <= (date_trunc('day'::text, now()) + '1 day'::interval)))
 Total runtime: 28.11 msec
(4 rows)

-- 
Matthew Nuzum   | Makers of "Elite Content Management System"
www.followers.net   | View samples of Elite CMS in action
[EMAIL PROTECTED]   | http://www.followers.net/portfolio/



Re: [PERFORM] postgresql amd-64

2004-11-05 Thread Daniel Ceregatti
I have two dual opteron 248's with 4g of ram each, 6x36G 15k rpm ultra
320 scsi disks in hardware raid 5, and they are by far the fastest
machines I've ever used. As far as this "headache" of using 64 bit
Linux, I've experienced no such thing. I'm using gentoo on both
machines, which are dedicated for postgres 7.4 and replicated with
slony. They're both quite fast and reliable. One machine even runs a
secondary instance of pg, pg 8 beta4 in this case, for development,
which also runs quite well.

Daniel

Merlin Moncure wrote:

>Does anybody have any experiences with postgresql 7.4+ running on amd-64
>in 64 bit mode?  Specifically, does it run quicker and if so do the
>performance benefits justify the extra headaches running 64 bit linux?
>
>Right now I'm building a dual Opteron 246 with 4 gig ddr400.  
>
>Merlin
>

-- 

Daniel Ceregatti - Programmer
Omnis Network, LLC

The forest is safe because a lion lives therein and the lion is safe because
it lives in a forest.  Likewise the friendship of persons rests on mutual help.
-- Laukikanyay.




Re: [PERFORM] vacuum analyze slows sql query

2004-11-05 Thread patrick ~
Just wanted to know if there were any insights after looking at
requested 'explain analyze select ...'?


Thanks,
--patrick







Re: [PERFORM] Strange (?) Index behavior?

2004-11-05 Thread Matt Clark
> With that many rows, and a normal index on the field, 
> postgres figures the best option for say "I%" is not an index 
> scan, but a sequential scan on the table, with a filter -- 
> quite obviously this is slow as heck, and yes, I've run 
> analyze several times and in fact have the vacuum analyze automated.

Ah, so "like 'I%'" uses a very slow seq scan, but "like 'ABC%'" uses an
ordinary index OK?  If so then...

The planner would usually assume (from what Tom usually says) that 1/26
selectivity isn't worth doing an index scan for, but in your case it's wrong
(maybe because the rows are very big?)

You may be able to get the planner to go for an index scan on "like 'I%'" by
tweaking the foo_cost variables in postgresql.conf 

Or you could have the app rewrite "like 'I%'" to "like 'IA%' or like 'IB%'
... ", or do that as a stored proc.

> With the partial index the index scan is used and the cost 
> drops from 0..2million to 0..9000 -- a vast improvement.

So there are really only 9000 rows out of 76 million starting with 'I'?  How
about combining some techniques - you could create an index on the first two
chars of the field (should be selective enough to give an index scan),
select from that, and select the actual data with the like clause.

CREATE INDEX idx_firstletters ON table (substr(field, 1, 2));
CREATE INDEX idx_all ON table (field);
SELECT field FROM (SELECT field FROM table WHERE substr(field, 1, 2) = 'DE')
AS approx WHERE field LIKE 'DE%';

Any good?




Re: [PERFORM] What is the difference between these?

2004-11-05 Thread Matthew T. O'Connor
Matt Nuzum wrote:
To me, these three queries seem identical... why doesn't the first one
(simplest to understand and write) go the same speed as the third one?
 

If you look at the explain output, you will notice that only the 3rd 
query is using an Index Scan, where as the 1st and 2nd are doing a 
sequential scan over the entire table of 25M rows.  My guess is that the 
problem is related to outdated statistics on the atime column.  If you 
notice the 1st and 2nd queries estimate 8.4M rows returned at which 
point a seq scan is the right choice, but the 3rd query using the 
between statement only estimates 127k rows which make the Index a better 
option.  All of these queries only return 2964 rows so it looks like 
your stats are out of date.  Try running an analyze command right before 
doing any of these queries and see what happens.




Re: [PERFORM] Strange (?) Index behavior?

2004-11-05 Thread Tom Lane
Allen Landsidel <[EMAIL PROTECTED]> writes:
> With that many rows, and a normal index on the field, postgres figures
> the best option for say "I%" is not an index scan, but a sequential
> scan on the table, with a filter -- quite obviously this is slow as
> heck, and yes, I've run analyze several times and in fact have the
> vacuum analyze automated.
> With the partial index the index scan is used and the cost drops from
> 0..2million to 0..9000 -- a vast improvement.

Hmm.  This suggests to me that you're using a non-C locale and so a
plain index *can't* be used for a LIKE query.  Can you force it to use
an indexscan by setting enable_seqscan = false?  If not then you've got
a locale problem.  As someone else pointed out, this can be worked
around by creating an index with the right operator class.

regards, tom lane



Re: [PERFORM] What is the difference between these?

2004-11-05 Thread Tom Lane
Matt Nuzum <[EMAIL PROTECTED]> writes:
> To me, these three queries seem identical... why doesn't the first one
> (simplest to understand and write) go the same speed as the third one?

This is the standard problem that the planner has to guess about the
selectivity of inequalities involving non-constants (like now()).
The guesses are set up so that a one-sided inequality will use a
seqscan while a range constraint will use an indexscan.

See the pgsql-performance archives for other ways of persuading it
that an indexscan is a good idea.

regards, tom lane



Re: [PERFORM] Strange (?) Index behavior?

2004-11-05 Thread Allen Landsidel
On Fri, 05 Nov 2004 14:57:40 -0500, Tom Lane <[EMAIL PROTECTED]> wrote:
> Allen Landsidel <[EMAIL PROTECTED]> writes:
> > With that many rows, and a normal index on the field, postgres figures
> > the best option for say "I%" is not an index scan, but a sequential
> > scan on the table, with a filter -- quite obviously this is slow as
> > heck, and yes, I've run analyze several times and in fact have the
> > vacuum analyze automated.
> > With the partial index the index scan is used and the cost drops from
> > 0..2million to 0..9000 -- a vast improvement.
> 
> Hmm.  This suggests to me that you're using a non-C locale and so a
> plain index *can't* be used for a LIKE query.  Can you force it to use
> an indexscan by setting enable_seqscan = false?  If not then you've got
> a locale problem.  As someone else pointed out, this can be worked
> around by creating an index with the right operator class.

Tom, disabling seqscan does cause it to use the index.

With seqscan enabled however, "AB%" will use the index, but "A%" will not.

The estimated cost for the query is much higher without the partial
indexes than it is with them, and the actual runtime of the query is
definitely longer without the partial indexes.

The locale is set in the postgresql.conf file as per default, with..

# These settings are initialized by initdb -- they may be changed
lc_messages = 'C'   # locale for system error message strings
lc_monetary = 'C'   # locale for monetary formatting
lc_numeric = 'C'# locale for number formatting
lc_time = 'C'   # locale for time formatting

-Allen



Re: [PERFORM] Strange (?) Index behavior?

2004-11-05 Thread Allen Landsidel
On Fri, 5 Nov 2004 18:34:23 -, Matt Clark <[EMAIL PROTECTED]> wrote:
> > With that many rows, and a normal index on the field,
> > postgres figures the best option for say "I%" is not an index
> > scan, but a sequential scan on the table, with a filter --
> > quite obviously this is slow as heck, and yes, I've run
> > analyze several times and in fact have the vacuum analyze automated.
> 
> Ah, so "like 'I%'" uses a very slow seq scan, but "like 'ABC%'" uses an
> ordinary index OK?  If so then...

That is correct.

> The planner would usually assume (from what Tom usually says) that 1/26
> selectivity isn't worth doing an index scan for, but in your case it's wrong
> (maybe because the rows are very big?)

The rows aren't big, it's a text field, a few ints, and a few
timestamps.  That's all.  The text field is the one we're querying on
here and lengthwise it's typically not over 32 chars.

> You may be able to get the planner to go for an index scan on "like 'I%'" by
> tweaking the foo_cost variables in postgresql.conf

That's true but I'd rather not, there are times when the seqscan will
have a faster net result (for other queries) and I'd rather not have
them suffer.

> Or you could have the app rewrite "like 'I%'" to "like 'IA%' or like 'IB%'
> ... ", or do that as a stored proc.

Holy cow.  Yeah, that seems a little outrageous.  It would look cleaner
in "\d table" than having all these indexes, but at the cost of one
very ugly query.

> > With the partial index the index scan is used and the cost
> > drops from 0..2million to 0..9000 -- a vast improvement.
> 
> So there are really only 9000 rows out of 76 million starting with 'I'?  How
> about combining some techniques - you could create an index on the first two
> chars of the field (should be selective enough to give an index scan),
> select from that, and select the actual data with the like clause.

I was talking about the cost, not the number of rows.  About 74,000
rows are returned, but the query only takes about 8 seconds to run --
with the partial index in place.

> CREATE INDEX idx_firstletters ON table (substr(field, 1, 2));
> CREATE INDEX idx_all ON table (field);
> SELECT field FROM (SELECT field FROM table WHERE substr(field, 1, 2) = 'DE')
> AS approx WHERE field LIKE 'DE%';

That looks like a pretty slick way to create an index; I didn't know
there was such a way to do it.  But it appears that this will not work
with queries where the WHERE clause wants to find substrings longer
than 2 characters.

I will give it a try and see how it goes, though I think I'm fairly
"settled" on creating all the other indexes, unless there is some
specific reason I shouldn't -- they are used in all cases where the
substring is >= 1 character, so long as I make sure the first where
clause (or inner select in a subquery) is the most ambiguous from an
index standpoint.

Going back to the initial problem -- having only one large, complete
index on the table (no partial indexes) the query "SELECT field FROM
table WHERE field LIKE 'A%';" does not use the index.  The query
"SELECT field FROM table WHERE field LIKE 'AB%';" however, does use
the single large index if it exists.

Adding the partial index "CREATE INDEX idx_table_substrfield_A ON
table (field) WHERE field LIKE 'A%';" causes all queries with
substrings of any length to do index scans, provided I issue the query
as:

SELECT field FROM table WHERE field LIKE 'A%' AND field LIKE 'AB%';
 -- or even --
SELECT field FROM table WHERE field LIKE 'A%';

The latter query, without the partial index described, does a
sequential scan on the table itself instead of an index scan.

-Allen

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


[PERFORM] Question regarding the file system

2004-11-05 Thread Gabriele Bartolini
Hi guys,
   I have been given a dual PIII with 768MB RAM and I am going to install 
PostgreSQL on it, for data warehousing reasons. I have also been given four 
Ultra160 SCSI disks (36GB each) with a RAID controller (Adaptec 2100). I 
am going to use a RAID5 architecture (this gives me approximately 103 GB of 
data) and install a Debian Linux on it: this machine will be dedicated 
exclusively to PostgreSQL.

   I was wondering which file system you suggest me: ext3 or reiserfs? 
Also, I was thinking of using the 2.6.x kernel, which offers faster thread 
support: will PostgreSQL gain anything from it or should I stick with 2.4.x?

Thank you very much,
-Gabriele
--
Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check 
maintainer
Current Location: Prato, Toscana, Italia
[EMAIL PROTECTED] | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The 
Inferno


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Strange (?) Index behavior?

2004-11-05 Thread Tom Lane
Allen Landsidel <[EMAIL PROTECTED]> writes:
> With seqscan enabled however, "AB%" will use the index, but "A%" will not.

> The estimated cost for the query is much higher without the partial
> indexes than it is with them, and the actual runtime of the query is
> definitely longer without the partial indexes.

OK.  This suggests that the planner is drastically misestimating
the selectivity of the 'A%' clause, which seems odd to me since in
principle it could get that fairly well from the ANALYZE histogram.
But it could well be that you need to increase the resolution of the
histogram --- see ALTER TABLE SET STATISTICS.
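
As a sketch, the per-column knob referred to here would be used like this (table and column names follow the EXPLAIN output later in the thread; the 7.4-era default statistics target is 10, with a maximum of 1000):

```sql
-- Raise the histogram resolution for the queried column, then
-- re-analyze so the planner picks up the finer-grained statistics.
ALTER TABLE test ALTER COLUMN test_name SET STATISTICS 100;
ANALYZE test;
```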

Did you ever show us EXPLAIN ANALYZE results for this query?

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] postgresql amd-64

2004-11-05 Thread Merlin Moncure
> I have two dual opteron 248's with 4g of ram each, 6x36G 15k rpm ultra
> 320 scsi disks in hardware raid 5, and they are by far the fastest
> machines I've user used. As far as this "headache" of using 64 bit
> Linux, I've experienced no such thing. I'm using gentoo on both
> machines, which are dedicated for postgres 7.4 and replicated with
> slony. They're both quite fast and reliable. One machine even runs a
> secondary instance of pg, pg 8 beta4 in this case, for development,
> which also runs quite well.

Good, I'll give it a shot and see what I come up with...thx.

Merlin

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Strange (?) Index behavior?

2004-11-05 Thread Allen Landsidel
On Fri, 05 Nov 2004 16:08:56 -0500, Tom Lane <[EMAIL PROTECTED]> wrote:


> Allen Landsidel <[EMAIL PROTECTED]> writes:
> > With seqscan enabled however, "AB%" will use the index, but "A%" will not.
>
> > The estimated cost for the query is much higher without the partial
> > indexes than it is with them, and the actual runtime of the query is
> > definitely longer without the partial indexes.
>
> OK.  This suggests that the planner is drastically misestimating
> the selectivity of the 'A%' clause, which seems odd to me since in
> principle it could get that fairly well from the ANALYZE histogram.
> But it could well be that you need to increase the resolution of the
> histogram --- see ALTER TABLE SET STATISTICS.

I will look into this.

>
> Did you ever show us EXPLAIN ANALYZE results for this query?

No, I didn't.  I am running it now, without the partial index, to
give you the results, but it (the 'A%' problem query) has been running
pretty much since I got this message (an hour ago) and is still not
finished.

The EXPLAIN results without ANALYZE will have to suffice for now.
Once it's done I can re-add the index and run it again, so you have
both to compare.

First two queries run where both the main index, and the 'A%' index exist:

-- QUERY 1
search=# explain
search-# SELECT test_name FROM test WHERE test_name LIKE 'A%';
   QUERY PLAN
---
Index Scan using test_name_idx_a on "test"  (cost=0.00..8605.88
rows=391208 width=20)
  Index Cond: ((test_name >= 'A'::text) AND (test_name < 'B'::text))
  Filter: (test_name ~~ 'A%'::text)
(3 rows)

Time: 16.507 ms

-- QUERY 2
search=# explain
search-# SELECT test_name FROM test WHERE test_name LIKE 'A%' AND
test_name LIKE 'AB%';
  QUERY
PLAN
-
Index Scan using test_name_idx_a on "test"  (cost=0.00..113.79
rows=28 width=20)
  Index Cond: ((test_name >= 'A'::text) AND (test_name < 'B'::text)
AND (test_name >= 'AB'::text) AND (test_name < 'AC'::text))
  Filter: ((test_name ~~ 'A%'::text) AND (test_name ~~ 'AB%'::text))
(3 rows)

Time: 3.197 ms

Ok, now the same two queries after a DROP INDEX test_name_idx_a;

search=# explain
search-# SELECT test_name FROM test WHERE test_name LIKE 'A%';
 QUERY PLAN
---
Index Scan using test_name_unique on "test"  (cost=0.00..1568918.66
rows=391208 width=20)
  Index Cond: ((test_name >= 'A'::text) AND (test_name < 'B'::text))
  Filter: (test_name ~~ 'A%'::text)
(3 rows)

Time: 2.470 ms

search=# explain
search-# SELECT test_name FROM test WHERE test_name LIKE 'AB%';
   QUERY PLAN
---
Index Scan using test_name_unique on "test"  (cost=0.00..20379.49
rows=5081 width=20)
  Index Cond: ((test_name >= 'AB'::text) AND (test_name < 'AC'::text))
  Filter: (test_name ~~ 'AB%'::text)
(3 rows)

Time: 2.489 ms

--
Copying just the costs you can see the vast difference...
Index Scan using test_name_unique on "test"  (cost=0.00..1568918.66
rows=391208 width=20)
Index Scan using test_name_unique on "test"  (cost=0.00..20379.49
rows=5081 width=20)

vs

Index Scan using test_name_idx_a on "test"  (cost=0.00..8605.88
rows=391208 width=20)
Index Scan using test_name_idx_a on "test"  (cost=0.00..113.79
rows=28 width=20)

Lastly no, neither of these row guesstimates is correct..  I'll get
back and tell you how much they're off by if it's important, once this
query is done.

The odd thing is it used the index scan here each time -- that has not
always been the case with the main unique index; it's trying to make a
liar out of me, heh.

I'm used to the estimates and plan changing from one vacuum analyze to
the next, even without any inserts or updates in between.  The index
scan is always used, however, when I have the partial indexes in place,
and something like:

CREATE TEMP TABLE t1 AS
 SELECT field FROM table
 WHERE field LIKE 'A%'
 AND field LIKE 'AA%';

runs in 6-8 seconds as well, with a bit under 100k records.

-Allen

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] ia64 results with dbt2 and 8.0beta4

2004-11-05 Thread Mark Wong
Hi everyone,

Some more data I've collected, trying to best tune dbt-2 with
8.0beta4.  Was hoping for some suggestions, explanations for what I'm
seeing, etc.

A review of hardware I've got:

4 x 1.5Ghz Itanium 2
16GB memory
84 15K RPM disks (6 controllers, 12 channels)

Physical Database table layout (using LVM2 for tables using more than 1 disk):
- warehouse 2 disks
- district 2 disks
- order_line 2 disks
- customer 4 disks
- stock 12 disks
- log 12 disks
- orders 2 disks
- new_order 2 disks

- history 1 disk
- item 1 disk
- index1 1 disk
- index2 1 disk

All these tests are using a 500 warehouse database.


Test 1: http://www.osdl.org/projects/dbt2dev/results/dev4-010/188/
Metric: 3316

DB parameter changes from default:
bgwriter_percent   | 10
checkpoint_timeout | 300
checkpoint_segments| 800
checkpoint_timeout | 1800
default_statistics_target  | 1000
max_connections| 140
stats_block_level  | on
stats_command_string   | on
stats_row_level| on
wal_buffers| 128
wal_sync_method| fsync
work_mem   | 2048


Test 2: http://www.osdl.org/projects/dbt2dev/results/dev4-010/189/
Metric: 3261  -1.7% decrease Test 1

DB parameter changes from Test 1:
shared_buffers | 60000

Noted changes:

The block read for the customer table decreases significantly according
to the database.


Test 3: http://www.osdl.org/projects/dbt2dev/results/dev4-010/190/
Metric: 3261  0% change from Test 2

DB parameter changes from Test 2:
effective_cache_size   | 22

Noted changes:

No apparent changes according to the charts.


Test 4: http://www.osdl.org/projects/dbt2dev/results/dev4-010/191/
Metric: 3323  1.9 increase from Test 3

DB parameter changes from Test 3:
checkpoint_segments| 1024
effective_cache_size   | 1000

Noted Changes:

The increased checkpoint_segments smoothed out the throughput and
other i/o related stats.


Test 5: http://www.osdl.org/projects/dbt2dev/results/dev4-010/192/
Metric: 3149  -5% decrease from Test 4

DB parameter changes from Test 4:
shared_buffers | 80000

Noted changes:

The graphs are starting to jump around a bit.  I figure 80,000
shared_buffers is too much.


Test 6: http://www.osdl.org/projects/dbt2dev/results/dev4-010/193/
Metric: 3277  4% increase from Test 5

DB parameter changes from Test 5:
random_page_cost   | 2
shared_buffers | 60000

Noted changes:

Reducing shared_buffers back to the value that gave the smoother
performance of Test 4 seems to have been undone by also decreasing
random_page_cost to 2.

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Strange (?) Index behavior?

2004-11-05 Thread Matt Clark

So there are really only 9000 rows out of 76 million starting with 'I'?  How
about combining some techniques - you could create an index on the first two
chars of the field (should be selective enough to give an index scan),
select from that, and select the actual data with the like clause.
   

I was talking about the cost, not the number of rows.  About 74,000
rows are returned but the query only takes about 8 seconds to run. --
 

Well, 74000/76000000 ~= 0.1%, way less than 1/26, so no surprise that an 
indexscan is better, and also no surprise that the planner can't know 
that I is such an uncommon initial char.

with the partial index in place.
 

CREATE INDEX idx_firstletters ON table (substr(field, 1, 2));
CREATE INDEX idx_all ON table (field);
SELECT field FROM (SELECT field FROM table WHERE substr(field, 1, 2) = 'DE')
AS approx WHERE field LIKE 'DE%';
   

That looks like a pretty slick way to create an index, I didn't know
there was such a way to do it.. but It appears that this will not work
with queries where the WHERE clause wants to find substrings longer
than 2 characters.
 

I don't see why not, it just uses the functional index to grab the 
1/(ascii_chars^2) of the rows that are of obvious interest, and then 
uses the standard index to filter that set..  Where it won't work is 
where you just want one initial char!  Which is why I suggested the 
silly query rewrite...

Going back to the initial problem -- having only one large, complete
index on the table (no partial indexes) the query "SELECT field FROM
table WHERE field LIKE 'A%';" does not use the index.  The query
"SELECT field FROM table WHERE field LIKE 'AB%';" however, does use
the single large index if it exists.
 

If you were planning the query, what would you do?  Assuming we're 
talking about A-Z as possible first chars, and assuming we don't know 
the distribution of those chars, then we have to assume 1/26 probability 
of each char, so a seq scan makes sense.  Whereas like 'JK%' should only 
pull 1/500 rows.

Adding the partial index "CREATE INDEX idx_table_substrfield_A ON
table (field) WHERE field LIKE 'A%';" causes all queries with
substrings of any length to do index scans, provided I issue the query
as:
SELECT field FROM table WHERE field LIKE 'A%' AND field LIKE 'AB%';
-- or even --
SELECT field FROM table WHERE field LIKE 'A%';
The latter query, without the partial index described, does a
sequential scan on the table itself instead of an index scan.
 

Yes, because (I assume, Tom will no doubt clarify/correct), by creating 
the partial indices you create a lot more information about the 
distribution of the first char - either that, or the planner simply 
always uses an exactly matching partial index if available.

I _think_ that creating 26 partial indexes on '?%' is essentially the 
same thing as creating one functional index on substr(field,1,1), just 
messier, unless the partial indexes cause the planner to do something 
special...
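
As a sketch of that single functional index (table and column names follow the EXPLAIN output earlier in the thread), noting that the planner will only use it when the query repeats the indexed expression verbatim:

```sql
-- One functional index in place of 26+ per-letter partial indexes.
CREATE INDEX idx_test_first_char ON test (substr(test_name, 1, 1));

-- Queries must spell out the same expression to get the index scan:
SELECT test_name
  FROM test
 WHERE substr(test_name, 1, 1) = 'A'
   AND test_name LIKE 'AB%';
```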

M
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] vacuum analyze slows sql query

2004-11-05 Thread patrick ~
Looking around at the pg_ tables and some PostgreSQL online
docs, prompted by another post/reply on this list regarding
ALTER TABLE SET STATISTICS, I found out that prior to a VACUUM
the following select (taken from the online docs) shows:

pkk=# select relname, relkind, reltuples, relpages from pg_class where relname
like 'pkk_%';
  relname  | relkind | reltuples | relpages 
---+-+---+--
 pkk_billing   | r   |  1000 |   10
 pkk_offer | r   |  1000 |   10
 pkk_offer_pkey| i   |  1000 |1
 pkk_purchase  | r   |  1000 |   10
 pkk_purchase_pkey | i   |  1000 |1
(5 rows)

Time: 1097.263 ms


and after a VACUUM:

pkk=# vacuum analyze ;
VACUUM
Time: 100543.359 ms


it shows:

pkk=# select relname, relkind, reltuples, relpages from pg_class where relname
like 'pkk_%'; 
  relname  | relkind |  reltuples  | relpages 
---+-+-+--
 pkk_billing   | r   |  714830 | 4930
 pkk_offer | r   | 618 |6
 pkk_offer_pkey| i   | 618 |4
 pkk_purchase  | r   | 1.14863e+06 | 8510
 pkk_purchase_pkey | i   | 1.14863e+06 | 8214
(5 rows)

Time: 3.868 ms



Further, I notice that if I were to delete rows from the
pg_statistic table I get the db in a state where the query
is fast again:

pkk=# explain analyze select offer_id, pkk_offer_has_pending_purch( offer_id )
from pkk_offer ;
   QUERY PLAN  
 
-
 Seq Scan on pkk_offer  (cost=0.00..13.72 rows=618 width=4) (actual
time=2415.739..1065709.092 rows=618 loops=1)
 Total runtime: 1065711.651 ms
(2 rows)

Time: 1065713.446 ms



pkk=# delete from pg_statistic where pg_statistic.starelid = pg_class.oid and
pg_class.relname like 'pkk_%';
DELETE 11
Time: 3.368 ms



pkk=# select offer_id, pkk_offer_has_pending_purch( offer_id ) from pkk_offer ;
(618 rows)

Time: 876.377 ms


pkk=# explain analyze select offer_id, pkk_offer_has_pending_purch( offer_id )
from pkk_offer ;
QUERY PLAN 
  
--
 Seq Scan on pkk_offer  (cost=0.00..13.72 rows=618 width=4) (actual
time=1.329..846.786 rows=618 loops=1)
 Total runtime: 848.170 ms
(2 rows)

Time: 849.958 ms




Now, I'm sure someone (a PostgreSQL developer most likely)
is about to shoot me for doing such a thing :-)

But, however *ugly, wrong, sacrilege* this may be, if this is
the only solution...err workaround I have that will help me,
I must resort to it.

The only two questions I have about this are:

1. Is this really the only solution left for me?
2. Am I in anyway screwing the db doing this?


Best regards,
--patrick



__ 
Do you Yahoo!? 
Check out the new Yahoo! Front Page. 
www.yahoo.com 
 


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] postgresql amd-64

2004-11-05 Thread Vishal Kashyap @ [Sai Hertz And Control Systems]
Merlin,




> Good, I'll give it a shot and see what I come up with...thx.
> 
Do share your experience with us.

-- 
With Best Regards,
Vishal Kashyap.
Did you know SaiPACS is one and only PACS
Management tool.
http://saihertz.com

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Strange (?) Index behavior?

2004-11-05 Thread Tom Lane
Matt Clark <[EMAIL PROTECTED]> writes:
> Well, 74000/76000000 ~= 0.1%, way less than 1/26, so no surprise that an 
> indexscan is better, and also no surprise that the planner can't know 
> that I is such an uncommon initial char.

But it *can* know that, at least given adequate ANALYZE statistics.
I'm pretty convinced that the basic answer to Allen's problem is to
increase the histogram size.  How large he needs to make it is not
clear --- obviously his data distribution is not uniform, but I don't
have a fix on how badly non-uniform.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] vacuum analyze slows sql query

2004-11-05 Thread Tom Lane
patrick ~ <[EMAIL PROTECTED]> writes:
> 1. Is this really the only solution left for me?

You still haven't followed the suggestions that were given to you
(ie, find out what is happening with the plan for the query inside
the problematic function).

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] vacuum analyze slows sql query

2004-11-05 Thread patrick ~
Hi Tom, -performance@,

I apologize if I didn't follow through with the PREPARE and
EXECUTE.  I assume that is what you are refering to.  After
reading the PostgreSQL docs on PREPARE statement I realized
two things: a) PREPARE is only session long and b) that I
can not (at least I haven't figured out how) PREPARE a
statement which would mimic my original select statement
which I could EXECUTE over all rows of pkk_offer table.

Best I could do is either:

 PREPARE pkk_01 ( integer ) AS select $1, pkk_offer_has_pending_purch( $1 ) from
pkk_offer ;

or

 PREPARE pkk_00 ( integer ) 

In the former case the EXPLAIN ANALYZE doesn't give enough
data (it is the same as w/o the PREPARE statement).  In the
latter case, I can only execute it with one offer_id at a
time.  Is this sufficient?

If so, here are the results before and after VACUUM ANALYZE:

pkk=# explain analyze execute pkk_00( 795 ) ;
  QUERY PLAN

 Result  (cost=8.57..8.58 rows=1 width=0) (actual time=0.095..0.096 rows=1
loops=1)
   InitPlan
 ->  Limit  (cost=0.00..8.57 rows=1 width=4) (actual time=0.083..0.084
rows=1 loops=1)
   ->  Index Scan using pur_offer_id_idx on pkk_purchase p0 
(cost=0.00..17.13 rows=2 width=4) (actual time=0.079..0.079 rows=1 loops=1)
 Index Cond: (offer_id = $1)
 Filter: ((((expire_time)::timestamp with time zone > now()) OR
(expire_time IS NULL) OR (pending = true)) AND ((cancel_date IS NULL) OR
(pending = true)))
 Total runtime: 0.238 ms
(7 rows)

pkk=# VACUUM ANALYZE ;
VACUUM
Time: 97105.589 ms

pkk=# explain analyze execute pkk_00( 795 ) ;
  QUERY PLAN

 Result  (cost=8.57..8.58 rows=1 width=0) (actual time=0.329..0.330 rows=1
loops=1)
   InitPlan
 ->  Limit  (cost=0.00..8.57 rows=1 width=4) (actual time=0.311..0.312
rows=1 loops=1)
   ->  Index Scan using pur_offer_id_idx on pkk_purchase p0 
(cost=0.00..17.13 rows=2 width=4) (actual time=0.307..0.307 rows=1 loops=1)
 Index Cond: (offer_id = $1)
 Filter: ((((expire_time)::timestamp with time zone > now()) OR
(expire_time IS NULL) OR (pending = true)) AND ((cancel_date IS NULL) OR
(pending = true)))
 Total runtime: 0.969 ms
(7 rows)

Time: 16.252 ms



Both before and after, an "Index Scan" is used on pur_offer_id_idx.
So, unless I'm missing something obvious here, I am at a loss.

I went as far as doing the EXPLAIN ANALYZE EXECUTE pkk_00( offer_id )
for each offer_id in pkk_offer table one at a time (not manually but
by scripting it).  All instances use "Index Scan".

I only noticed a couple that had quite large "actual times" like
this following:


pkk=# explain analyze execute pkk_00( 2312 ) ;
   
 QUERY PLAN


 Result  (cost=8.57..8.58 rows=1 width=0) (actual time=21.279..21.282 rows=1
loops=1)
   InitPlan
 ->  Limit  (cost=0.00..8.57 rows=1 width=4) (actual time=21.256..21.258
rows=1 loops=1)
   ->  Index Scan using pur_offer_id_idx on pkk_purchase p0 
(cost=0.00..17.13 rows=2 width=4) (actual time=21.249..21.249 rows=1 loops=1)
 Index Cond: (offer_id = $1)
 Filter: ((((expire_time)::timestamp with time zone > now()) OR
(expire_time IS NULL) OR (pending = true)) AND ((cancel_date IS NULL) OR
(pending = true)))
 Total runtime: 21.435 ms
(7 rows)

Time: 22.541 ms


Which makes sense when you look at the number of entries this
offer_id has in pkk_purchase table vs offer_id = 795:

pkk=# select offer_id, count(*) from pkk_purchase where offer_id in ( 795, 2312
) group by offer_id ;
 offer_id | count 
--+---
  795 | 4
 2312 |  1015
(2 rows)

Time: 21.118 ms


--patrick




--- Tom Lane <[EMAIL PROTECTED]> wrote:

> patrick ~ <[EMAIL PROTECTED]> writes:
> > 1. Is this really the only solution left for me?
> 
> You still haven't followed the suggestions that were given to you
> (ie, find out what is happening with the plan for the query inside
> the problematic function).
> 
>   regards, tom lane





---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] vacuum analyze slows sql query

2004-11-05 Thread Tom Lane
patrick ~ <[EMAIL PROTECTED]> writes:
>  PREPARE pkk_00 ( integer ) 

Re: [PERFORM] Strange (?) Index behavior?

2004-11-05 Thread Allen Landsidel
On Fri, 05 Nov 2004 23:04:23 -0500, Tom Lane <[EMAIL PROTECTED]> wrote:
> Matt Clark <[EMAIL PROTECTED]> writes:
> > Well, 74000/76000000 ~= 0.1%, way less than 1/26, so no surprise that an
> > indexscan is better, and also no surprise that the planner can't know
> > that I is such an uncommon initial char.
> 
> But it *can* know that, at least given adequate ANALYZE statistics.
> I'm pretty convinced that the basic answer to Allen's problem is to
> increase the histogram size.  How large he needs to make it is not
> clear --- obviously his data distribution is not uniform, but I don't
> have a fix on how badly non-uniform.
> 

Tom, just an update: it's now 2am, several hours since I started that
EXPLAIN ANALYZE, and it still hasn't finished, so I've aborted it.  I
will do the example with the more precise substring instead to
illustrate the performance differences, both with and without the
substring index and report back here.

I'm also interested in something someone else posted, namely that the
36 indexes I have ("A%" through "Z%" and "0%" through "9%") could be
replaced with a single index like:

"CREATE INDEX idx_table_field_substr ON table (substr(field, 1, 1));"

I'm wondering, histogram and other information aside, will this
function as well as (or better than) creating all the individual indexes?

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings