Re: [PERFORM] EXPLAIN detail

2008-04-09 Thread Craig Ringer
Luigi N. Puleio wrote:

 SELECT
  (a.column1)::date, MIN(b.column2) - a.column2
 FROM
  table a
  INNER JOIN table b
  ON ((a.column1)::date = (b.column1)::date AND
      b.column3 = 'b' AND
      (b.column1)::time without time zone >= (a.column1)::time without time zone)
 WHERE
  (a.column1)::date = '2008-04-09'
  AND a.column3 = 'a'
 GROUP BY a.column1
 
 and with this I have to obtain like 3-4 records out
 of all those 500,000 records, and with EXPLAIN
 ANALYZE I get almost 6 seconds:
 
 Nested Loop (cost=0.00..52140.83 rows=1 width=34)
 (actual time=4311.756..5951.271 rows=1 loops=1)

With all that casting, is it possible that appropriate indexes aren't
being used because your WHERE / ON clauses aren't an exact type match
for the index?

Can you post the full EXPLAIN ANALYZE from the query? This snippet
doesn't even show how records are being looked up.

What about a \d of the table from psql, or at least a summary of the
involved column data types and associated indexes?

--
Craig Ringer



Re: [PERFORM] EXPLAIN detail

2008-04-09 Thread Pavan Deolasee
On Wed, Apr 9, 2008 at 3:21 PM, Luigi N. Puleio [EMAIL PROTECTED] wrote:
 Hello everyone!!

  I have a table with 17 columns and it has almost
  530,000 records, and doing just a

  SELECT * FROM table

  with the EXPLAIN ANALYZE I get:

  Seq Scan on table (cost=0.00..19452.95 rows=529395
  width=170) (actual time=0.155..2194.294 rows=529395
  loops=1)
  Total runtime: 3679.039 ms

  and this table has a PK...
  Do you think that is too much time for a simple select?...


Well, the PK won't help you here because you are selecting all rows
from the table, and a seq scan is the right thing for that.
Without knowing your hardware it's difficult to judge whether
the time taken is excessive or not. Anyway, I don't think there is much
tweaking you can do for such a query except making sure that
your table is not bloated with dead tuples.
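
Just as an illustration (a rough sketch, with a placeholder table name),
something like this would show how many dead tuples the stats collector
thinks the table is carrying, and clean them up:

-- dead vs. live tuple counts and last (auto)vacuum times
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'your_table';

-- reclaim dead tuples and refresh planner statistics
VACUUM VERBOSE ANALYZE your_table;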

Thanks,
Pavan


-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com



Re: [PERFORM] EXPLAIN detail

2008-04-09 Thread Matthew

On Wed, 9 Apr 2008, Pavan Deolasee wrote:

 I have a table with 17 columns and it has almost
 530,000 records and doing just a

 SELECT * FROM table



Well, the PK won't help you here because you are selecting all rows
from the table, and a seq scan is the right thing for that.


Yes. Like he said. Basically, you're asking the database to fetch all half 
a million rows. That's going to take some time, whatever hardware you 
have. The PK is completely irrelevant, because the query doesn't refer to 
it at all. To be honest, three seconds sounds pretty reasonable for that 
sort of query.


Matthew

--
There once was a limerick .sig
that really was not very big
It was going quite fine
Till it reached the fourth line



Re: [PERFORM] EXPLAIN detail

2008-04-09 Thread Luigi N. Puleio
 SELECT
  (a.column1)::date, MIN(b.column2) - a.column2
 FROM
  table a
  INNER JOIN table b
  ON ((a.column1)::date = (b.column1)::date AND
      b.column3 = 'b' AND
      (b.column1)::time without time zone >= (a.column1)::time without time zone)
 WHERE
  (a.column1)::date = '2008-04-09'
  AND a.column3 = 'a'
 GROUP BY a.column1
 
 and with this I have to obtain like 3-4 records out
 of all those 500,000 records, and with EXPLAIN
 ANALYZE I get almost 6 seconds:
 
 Nested Loop (cost=0.00..52140.83 rows=1 width=34)
 (actual time=4311.756..5951.271 rows=1 loops=1)

 With all that casting, is it possible that appropriate indexes aren't
 being used because your WHERE / ON clauses aren't an exact type match
 for the index?

You mean to put an index on date with timestamptz datatype column?...

 Can you post the full EXPLAIN ANALYZE from the query? This snippet
 doesn't even show how records are being looked up.

HashAggregate  (cost=52236.31..52236.33 rows=1 width=34) (actual time=7004.779..7004.782 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..52236.30 rows=1 width=34) (actual time=3939.450..7004.592 rows=1 loops=1)
        Join Filter: ((inner.calldate)::time without time zone >= (outer.calldate)::time without time zone)
        ->  Seq Scan on table a  (cost=0.00..27444.03 rows=1 width=26) (actual time=2479.199..2485.266 rows=3 loops=1)
              Filter: (((calldate)::date = '2008-04-09'::date) AND ((src)::text = '410'::text) AND (substr((dst)::text, 1, 4) = '*100'::text) AND ((lastdata)::text = '/dati/ita/loginok'::text))
        ->  Seq Scan on table b  (cost=0.00..24792.22 rows=3 width=16) (actual time=1504.508..1506.374 rows=1 loops=3)
              Filter: (((lastdata)::text = '/dati/ita/logoutok'::text) AND ('410'::text = (src)::text) AND ('2008-04-09'::date = (calldate)::date))
Total runtime: 7005.706 ms

 What about a \d of the table from psql, or at least a summary of the
 involved column data types and associated indexes?

this table has an acctid column which is the PK; most of the other columns are 
varchar(80) or so

So for a 4-record result, 7 seconds seems way too long, I guess... but as I said 
before, I'm going to wait and see whether the person responsible for the server has 
run a VACUUM on the table...

What do you think?...


Thanks again to all.
Ciao,
Luigi





Re: [PERFORM] EXPLAIN detail

2008-04-09 Thread Craig Ringer
Luigi N. Puleio wrote:

 With all that casting, is it possible that appropriate indexes aren't
 being used because your WHERE / ON clauses aren't an exact type match
 for the index?
 
 You mean to put an index on date with timestamptz datatype column?...

Er ... I'm not quite sure what you mean. Do you mean an index on a cast
of the column, e.g.:

CREATE INDEX some_idx_name ON some_table ( (some_timestamp_field::date) );

then ... maybe. It's hard to be sure when there is so little information
available. It shouldn't be necessary, but there are certainly uses for
that sort of thing - for example, I use a couple of functional indexes
in the schema I'm working on at the moment. It's probably a good idea to
look at ways to avoid doing that first, though.
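
As a rough illustration (all names here are invented, and I'm assuming a
plain timestamp column - for a timestamptz column PostgreSQL won't accept
the bare cast in an index, because its result depends on the time zone
setting):

CREATE INDEX calls_callday_idx ON calls ( ((calltime)::date) );

-- this predicate uses the indexed expression, so the index can be considered:
SELECT count(*) FROM calls WHERE (calltime)::date = '2008-04-09';

-- this one doesn't match the expression, so calls_callday_idx is ignored:
SELECT count(*) FROM calls WHERE substr(calltime::text, 1, 10) = '2008-04-09';

The planner will only use such an index when the query contains exactly the
same expression, which is part of why I'd rather avoid the casts entirely.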

 Can you post the full EXPLAIN ANALYZE from the query? This snippet
 doesn't even show how records are being looked up.
 
 HashAggregate  (cost=52236.31..52236.33 rows=1 width=34) (actual time=7004.779..7004.782 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..52236.30 rows=1 width=34) (actual time=3939.450..7004.592 rows=1 loops=1)
         Join Filter: ((inner.calldate)::time without time zone >= (outer.calldate)::time without time zone)
         ->  Seq Scan on table a  (cost=0.00..27444.03 rows=1 width=26) (actual time=2479.199..2485.266 rows=3 loops=1)
               Filter: (((calldate)::date = '2008-04-09'::date) AND ((src)::text = '410'::text) AND (substr((dst)::text, 1, 4) = '*100'::text) AND ((lastdata)::text = '/dati/ita/loginok'::text))
         ->  Seq Scan on table b  (cost=0.00..24792.22 rows=3 width=16) (actual time=1504.508..1506.374 rows=1 loops=3)
               Filter: (((lastdata)::text = '/dati/ita/logoutok'::text) AND ('410'::text = (src)::text) AND ('2008-04-09'::date = (calldate)::date))
 Total runtime: 7005.706 ms

Personally, I'd want to get rid of all those casts first. Once that's
cleaned up I'd want to look at creating appropriate indexes on your
tables. If necessary, I might even create a composite index on
(lastdata, src, calldate).
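
For example (an untested sketch - 'your_table' is a placeholder, the column
names are the ones that appear in your EXPLAIN output):

CREATE INDEX your_table_lastdata_src_calldate_idx
    ON your_table (lastdata, src, calldate);

Whether that actually helps depends on how selective lastdata and src are,
so it's worth checking with EXPLAIN ANALYZE after creating it.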

 What about a \d of the table from psql, or at least a summary of the
 involved column data types and associated indexes?
 
 this table has an acctid column which is the PK; most of the other columns 
 are varchar(80) or so

Do you mean that the columns involved in your WHERE and ON clauses, the
ones you're casting to date, timestamp, etc, are stored as VARCHAR? If
so, it's no surprise that the query is slow because you're forcing
PostgreSQL to convert a string to a date, timestamp, or time datatype to
do anything with it ... and you're doing it many times in every query.
That will be VERY slow, and prevent the use of (simple) indexes on those
columns.

If you're really storing dates/times as VARCHAR, you should probably
look at some changes to your database design, starting with the use of
appropriate data types.

That's all guesswork, because you have not provided enough information.

Can you please post the output of psql's \d command on the table in
question?

If for some reason you cannot do that, please at least include the data
type of the primary key and all fields involved in the query, as well as
a list of all the indexes on both tables.

The easy way to do that is to just launch psql then run:

\d table

and paste the output to an email.

 So for a 4-record result, 7 seconds seems way too long, I guess... but as I said 
 before, I'm going to wait and see whether the person responsible for the server has 
 run a VACUUM on the table...

 What do you think?...

If you're really casting VARCHAR to DATE, TIME, TIMESTAMP, etc on demand
then personally I really doubt that dead rows are your problem.

--
Craig Ringer



Re: [PERFORM] EXPLAIN detail

2008-04-09 Thread Luigi N. Puleio
 With all that casting, is it possible that appropriate indexes aren't
 being used because your WHERE / ON clauses aren't an exact type match
 for the index?
 
 You mean to put an index on date with timestamptz datatype column?...

 Er ... I'm not quite sure what you mean. Do you mean an index on a cast
 of the column, e.g.:

 CREATE INDEX some_idx_name ON some_table ( (some_timestamp_field::date) );

 then ... maybe. It's hard to be sure when there is so little information
 available. It shouldn't be necessary, but there are certainly uses for
 that sort of thing - for example, I use a couple of functional indexes
 in the schema I'm working on at the moment. It's probably a good idea to
 look at ways to avoid doing that first, though.

 Can you post the full EXPLAIN ANALYZE from the query? This snippet
 doesn't even show how records are being looked up.
 
 HashAggregate  (cost=52236.31..52236.33 rows=1 width=34) (actual time=7004.779..7004.782 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..52236.30 rows=1 width=34) (actual time=3939.450..7004.592 rows=1 loops=1)
         Join Filter: ((inner.calldate)::time without time zone >= (outer.calldate)::time without time zone)
         ->  Seq Scan on table a  (cost=0.00..27444.03 rows=1 width=26) (actual time=2479.199..2485.266 rows=3 loops=1)
               Filter: (((calldate)::date = '2008-04-09'::date) AND ((src)::text = '410'::text) AND (substr((dst)::text, 1, 4) = '*100'::text) AND ((lastdata)::text = '/dati/ita/loginok'::text))
         ->  Seq Scan on table b  (cost=0.00..24792.22 rows=3 width=16) (actual time=1504.508..1506.374 rows=1 loops=3)
               Filter: (((lastdata)::text = '/dati/ita/logoutok'::text) AND ('410'::text = (src)::text) AND ('2008-04-09'::date = (calldate)::date))
 Total runtime: 7005.706 ms

 Personally, I'd want to get rid of all those casts first. Once that's
 cleaned up I'd want to look at creating appropriate indexes on your
 tables. If necessary, I might even create a composite index on
 (lastdata, src, calldate).

 What about a \d of the table from psql, or at least a summary of the
 involved column data types and associated indexes?
 
 this table has an acctid column which is the PK; most of the other columns 
 are varchar(80) or so

 Do you mean that the columns involved in your WHERE and ON clauses, the
 ones you're casting to date, timestamp, etc, are stored as VARCHAR? If
 so, it's no surprise that the query is slow because you're forcing
 PostgreSQL to convert a string to a date, timestamp, or time datatype to
 do anything with it ... and you're doing it many times in every query.
 That will be VERY slow, and prevent the use of (simple) indexes on those
 columns.

 If you're really storing dates/times as VARCHAR, you should probably
 look at some changes to your database design, starting with the use of
 appropriate data types.

 That's all guesswork, because you have not provided enough information.

 Can you please post the output of psql's \d command on the table in
 question?

 If for some reason you cannot do that, please at least include the data
 type of the primary key and all fields involved in the query, as well as
 a list of all the indexes on both tables.

 The easy way to do that is to just launch psql then run:

 \d table

 and paste the output to an email.

 So for a 4-record result, 7 seconds seems way too long, I guess... but as I said 
 before, I'm going to wait and see whether the person responsible for the server has 
 run a VACUUM on the table...

 What do you think?...

 If you're really casting VARCHAR to DATE, TIME, TIMESTAMP, etc on demand
 then personally I really doubt that dead rows are your problem.


Well, this table has a primary key index on the first column, called acctid, which is 
an integer; the calldate column instead is a TIMESTAMPTZ, and in fact I'm doing 
(calldate)::date in the ON clause because the time part of that column is always 
different and in the join I need to check that the dates are the same...

the other two columns (src and lastdata) are both VARCHAR(80) and the query is 
this one:

EXPLAIN ANALYZE
SELECT
  (a.calldate)::date,
  a.src,
  a.dst,
  MIN(e.calldate) - a.calldate
FROM
cdr a
INNER JOIN cdr e
ON ((e.calldate)::date = (a.calldate)::date AND e.src = a.src
AND e.lastdata = '/dati/ita/logoutok' AND e.calldate >= a.calldate)
WHERE
 (a.calldate)::date = '2008-04-09'
 AND a.src = '410'
 AND substr(a.dst, 1, 4) = '*100'
 AND a.lastdata = '/dati/ita/loginok'
GROUP BY
  a.calldate, a.src, a.dst







Re: [PERFORM] EXPLAIN detail

2008-04-09 Thread Craig Ringer

Luigi N. Puleio wrote:

If for some reason you cannot do that, please at least include the data
type of the primary key and all fields involved in the query, as well as
a list of all the indexes on both tables.
If you won't show people on the list your table definitions, or at least 
the information shown above, then it's less likely that anybody can help 
you or will spend the time trying to help you.


Personally I think you may need some functional/cast, and possibly 
composite, indexes to avoid the looping sequential scan as I said 
before. However, that's guesswork without some more information as 
repeatedly stated and requested. I'm not going to bother replying to any 
further mail just to say so again.



Try reading the documentation chapter about indexes:

http://www.postgresql.org/docs/current/static/indexes.html

and about query optimisation:

http://www.postgresql.org/docs/current/static/performance-tips.html

then experiment with various indexes to see what works best. Think about 
the data types. Remember that you can build an index on a cast of a 
field, on multiple fields, on function calls, or basically any other 
simple expression or expressions, but that complex indexes will cost 
more to build and maintain and might be bigger (and thus slower to search).



Anyway, I'm done.

--
Craig Ringer



Re: [PERFORM] EXPLAIN detail

2008-04-09 Thread PFC


Well, this table has a primary key index on the first column, called acctid,
which is an integer; the calldate column instead is a TIMESTAMPTZ, and in fact
I'm doing (calldate)::date in the ON clause because the time part of that
column is always different and in the join I need to check that the dates are
the same...


the other two columns (src and lastdata) are both VARCHAR(80) and the  
query is this one:


Tip for getting answers from this list :
	You should just post the output of \d yourtable from psql, it would be  
quicker than writing a paragraph... Be lazy, lol.


	So, basically if I understand you are doing a self-join on your table,  
you want all rows from the same day, and you're doing something with the  
dates, and...


Tip for getting answers from this list :
	Explain (in English) what your query actually does; someone might come up  
with a better idea on HOW to do it.


Snip :


EXPLAIN ANALYZE
SELECT
  (a.calldate)::date,
  a.src,
  a.dst,
  MIN(e.calldate) - a.calldate
FROM
cdr a
INNER JOIN cdr e
ON ((e.calldate)::date = (a.calldate)::date AND e.src = a.src
AND e.lastdata = '/dati/ita/logoutok' AND e.calldate >= a.calldate)

WHERE
 (a.calldate)::date = '2008-04-09'
 AND a.src = '410'
 AND substr(a.dst, 1, 4) = '*100'
 AND a.lastdata = '/dati/ita/loginok'
GROUP BY
  a.calldate, a.src, a.dst


OK, I assume you have an index on calldate, which is a TIMESTAMPTZ ?
(in that case, why is it called calldate, and not calltimestamp ?...)

Bad news, the index is useless for this condition :
(a.calldate)::date = '2008-04-09'
	There, you are asking postgres to scan the entire table, convert the  
column to date, and test. Bad.


In order to use the index, you could rewrite it as something like :
	a.calldate >= '2008-04-09' AND a.calldate < ('2008-04-09'::DATE + '1 DAY'::INTERVAL)

This is a RANGE query (just like BETWEEN) which is index-friendly.
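
	Applied to your query, the filter would become roughly this (just a
sketch, everything else unchanged) :

WHERE
 a.calldate >= '2008-04-09'::DATE
 AND a.calldate < '2008-04-09'::DATE + '1 DAY'::INTERVAL
 AND a.src = '410'
 AND substr(a.dst, 1, 4) = '*100'
 AND a.lastdata = '/dati/ita/loginok'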

	Personally, I wouldn't do it that way : since you use the date (and not  
the time, I presume you only use the time for display purposes) I would  
just store the timestamptz in calltimestamp and the date in calldate,  
with a trigger to ensure the date is set to calltimestamp::date every time  
a row is inserted/updated.
	This is better than a function index: since you use that column a lot in
your query, it will be slightly faster, and it will save a lot of
timestamptz-to-date casts, hence CPU cycles.
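
	The trigger itself would be something along these lines (an untested
sketch; the column names follow the renaming suggested above, i.e. the
timestamptz lives in calltimestamp and the derived date in calldate) :

CREATE OR REPLACE FUNCTION cdr_set_calldate() RETURNS trigger AS $$
BEGIN
    -- keep the plain date column in sync with the timestamptz column
    NEW.calldate := NEW.calltimestamp::date;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER cdr_set_calldate_trg
    BEFORE INSERT OR UPDATE ON cdr
    FOR EACH ROW EXECUTE PROCEDURE cdr_set_calldate();

CREATE INDEX cdr_calldate_idx ON cdr (calldate);

	Then the condition is simply a.calldate = '2008-04-09' and the plain
index can be used.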


	Try this last option (separate date column), and repost EXPLAIN ANALYZE  
of your query so it can be optimized further.


	Also, PLEASE don't use substr(), use a.dst LIKE '*100%', look in the  
manual. LIKE 'foo%' is indexable if you create the proper index.
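
	For example (a sketch - the _pattern_ops opclass is what makes
LIKE-prefix searches indexable when the database locale is not C) :

-- lets a.dst LIKE '*100%' (a prefix match, the * here is literal) use an index
CREATE INDEX cdr_dst_pattern_idx ON cdr (dst varchar_pattern_ops);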











Re: [PERFORM] EXPLAIN detail

2008-04-09 Thread Richard Broersma
On Wed, Apr 9, 2008 at 11:41 AM, PFC [EMAIL PROTECTED] wrote:
In order to use the index, you could rewrite it as something like :
a.calldate >= '2008-04-09' AND a.calldate < ('2008-04-09'::DATE + '1 DAY'::INTERVAL)
This is a RANGE query (just like BETWEEN) which is index-friendly.

Another option would be to create a functional index on date_trunc(
'day', cdr.calldate)

then using a where condition like:

date_trunc('day', a.calldate) = '2008-04-09'

would definitely use an index.
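
A sketch of that (index name is just an example; note that with a
timestamptz column the expression has to be pinned to an explicit time zone,
otherwise PostgreSQL rejects it in an index because the result would depend
on the session's TimeZone setting):

CREATE INDEX cdr_callday_idx
    ON cdr (date_trunc('day', calldate AT TIME ZONE 'UTC'));

-- and the query must use the very same expression:
-- WHERE date_trunc('day', a.calldate AT TIME ZONE 'UTC') = '2008-04-09'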


-- 
Regards,
Richard Broersma Jr.
