Re: [PERFORM] Tuning, configuration for 7.3.5 on a Sun E4500

2005-03-08 Thread tsarevich
Analyze has been run on the database quite frequently during the
course of us trying to figure out this performance issue.  It is also
a task that is crontabbed nightly.


On Mon, 7 Mar 2005 09:31:06 -0800, Josh Berkus  wrote:
> Tsarevich,
> 
> > When running queries we are experiencing much bigger result times than
> > anticipated.
> >
> > Attached is a copy of our postgresql.conf file and of our the table
> > definitions and row counts.
> 
> Looks like you haven't run ANALYZE on the database anytime recently.  Try that
> and re-run.
> 
> --
> Josh Berkus
> Aglio Database Solutions
> San Francisco
>

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [pgsql-hackers-win32] [PERFORM] Help with tuning this query (with

2005-03-08 Thread Magnus Hagander
> > RDTSC is a bad source of information for this kind of thing, as the 
> > CPU frequency might vary.
> 
> One thought that was bothering me was that if the CPU goes 
> idle while waiting for disk I/O, its clock might stop or slow 
> down dramatically.
> If we believed such a counter for EXPLAIN, we'd severely 
> understate the cost of disk I/O.
> 
> I dunno if that is the case on any Windows hardware or not, 
> but none of this thread is making me feel confident that we 
> know what QueryPerformanceCounter does measure.

I'm "reasonably confident" that QPC will measure actual wall-clock time
as it passes, using a chip that is external to the CPU. (Don't ask me
which chip :P).

The docs specifically say: "Note that the frequency of the
high-resolution performance counter is not the processor speed." 

It also indicates that it is possible for hardware not to support it, in
which case the frequency will be reported as zero. I don't know of any
remotely modern Wintel system that doesn't support it, though - this
seems to be referring to the old MIPS port of NT, which didn't have it.

I also find:
"Depending on the processor and exact version of NT you're using, on an
Intel you get either the Time Stamp Counter, or the 1.1... MHz timer
built into the motherboard."


So I think we're perfectly safe relying on it. And certainly not alone
in doing so :-)

//Magnus

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


[PERFORM] bad plan

2005-03-08 Thread Gaetano Mendola
I posted this on hackers, but it really belongs here.

===
Hi all,
running a 7.4.5 engine, I'm facing this bad plan:


empdb=# explain analyze SELECT 
name,url,descr,request_status,url_status,size_mb,estimated_start,request_time_stamp
empdb-#FROM v_sc_user_request
empdb-#WHERE
empdb-# login = 'babinow1'
empdb-#LIMIT 10 ;

QUERY PLAN
--
 Limit  (cost=1716.38..1716.39 rows=1 width=232) (actual 
time=52847.239..52847.322 rows=10 loops=1)
   ->  Subquery Scan v_sc_user_request  (cost=1716.38..1716.39 rows=1 
width=232) (actual time=52847.234..52847.301 rows=10 loops=1)
 ->  Sort  (cost=1716.38..1716.39 rows=1 width=201) (actual 
time=52847.219..52847.227 rows=10 loops=1)
   Sort Key: sr.id_sat_request
   ->  Nested Loop Left Join  (cost=1478.82..1716.37 rows=1 
width=201) (actual time=3254.483..52847.064 rows=31 loops=1)
 Join Filter: ("outer".id_package = "inner".id_package)
 ->  Nested Loop  (cost=493.09..691.55 rows=1 width=193) 
(actual time=347.665..940.582 rows=31 loops=1)
   ->  Nested Loop  (cost=493.09..688.49 rows=1 
width=40) (actual time=331.446..505.628 rows=31 loops=1)
 Join Filter: ("inner".id_user = 
"outer".id_user)
 ->  Index Scan using user_login_login_key on 
user_login ul  (cost=0.00..4.00 rows=2 width=16) (actual time=12.065..12.071 
rows=1 loops=1)
   Index Cond: ((login)::text = 
'babinow1'::text)
 ->  Materialize  (cost=493.09..531.37 
rows=7656 width=28) (actual time=167.654..481.813 rows=8363 loops=1)
   ->  Seq Scan on sat_request sr  
(cost=0.00..493.09 rows=7656 width=28) (actual time=167.644..467.344 rows=8363 
loops=1)
 Filter: (request_time > (now() - 
'1 mon'::interval))
   ->  Index Scan using url_pkey on url u  
(cost=0.00..3.05 rows=1 width=161) (actual time=13.994..14.000 rows=1 loops=31)
 Index Cond: ("outer".id_url = u.id_url)
 ->  Subquery Scan vsp  (cost=985.73..1016.53 rows=1103 
width=12) (actual time=25.328..1668.754 rows=493 loops=31)
   ->  Merge Join  (cost=985.73..1011.01 rows=1103 
width=130) (actual time=25.321..1666.666 rows=493 loops=31)
 Merge Cond: ("outer".id_program = 
"inner".id_program)
 ->  Sort  (cost=20.74..20.97 rows=93 width=19) 
(actual time=0.385..0.431 rows=47 loops=31)
   Sort Key: programs.id_program
   ->  Seq Scan on programs  
(cost=0.00..17.70 rows=93 width=19) (actual time=0.022..11.709 rows=48 loops=1)
 Filter: (id_program <> 0)
 ->  Sort  (cost=964.99..967.75 rows=1102 
width=115) (actual time=14.592..15.218 rows=493 loops=31)
   Sort Key: sequences.id_program
   ->  Merge Join  (cost=696.16..909.31 
rows=1102 width=115) (actual time=79.717..451.495 rows=493 loops=1)
 Merge Cond: ("outer".id_package = 
"inner".id_package)
 ->  Merge Left Join  
(cost=0.00..186.59 rows=1229 width=103) (actual time=0.101..366.854 rows=1247 
loops=1)
   Merge Cond: 
("outer".id_package = "inner".id_package)
   ->  Index Scan using 
packages_pkey on packages p  (cost=0.00..131.04 rows=1229 width=103) (actual 
time=0.048..163.503 rows=1247 loops=1)
   ->  Index Scan using 
package_security_id_package_key on package_security ps  (cost=0.00..46.83 
rows=855 width=4) (actual time=0.022..178.599 rows=879 loops=1)
 ->  Sort  (cost=696.16..705.69 
rows=3812 width=16) (actual time=79.582..79.968 rows=493 loops=1)
   Sort Key: 
sequences.id_package
   ->  Seq Scan on sequences  
(cost=0.00..469.42 rows=3812 width=16) (actual time=0.012..78.863 rows=493 
loops=1)
   

Re: [PERFORM] bad plan

2005-03-08 Thread Richard Huxton
Gaetano Mendola wrote:
running a 7.4.5 engine, I'm facing this bad plan:
empdb=# explain analyze SELECT 
name,url,descr,request_status,url_status,size_mb,estimated_start,request_time_stamp
empdb-#FROM v_sc_user_request
empdb-#WHERE
empdb-# login = 'babinow1'
empdb-#LIMIT 10 ;

 ->  Subquery Scan vsp  (cost=985.73..1016.53 rows=1103 
width=12) (actual time=25.328..1668.754 rows=493 loops=31)
   ->  Merge Join  (cost=985.73..1011.01 rows=1103 
width=130) (actual time=25.321..1666.666 rows=493 loops=31)
 Merge Cond: ("outer".id_program = 
"inner".id_program)
The problem to address is in this subquery. That's a total of 31 x
(1668.754 - 25.328) ms = about 51 seconds.

Since your query is so simple, I'm guessing v_sc_user_request is a view. 
Can you provide the definition?
--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[PERFORM] pl/pgsql faster than raw SQL?

2005-03-08 Thread Markus Bertheau ☭
Hi, I have the following strange situation:

oocms=# vacuum full analyze;
VACUUM
oocms=# \df+ class_get_number_of_objects

                                              List of functions
 Schema |            Name             | Result data type | Argument data types | Owner | Language | Source code | Description
--------+-----------------------------+------------------+---------------------+-------+----------+-------------+-------------
 oocms  | class_get_number_of_objects | integer          | text                | oocms | plpgsql  |
DECLARE
arg_class_name ALIAS FOR $1;
BEGIN
IF arg_class_name IS NULL THEN
RAISE WARNING 'class_get_number_of_objects() with NULL class 
name called';
RETURN NULL;
END IF;
RETURN
count(1)
FROM
objects
WHERE
class = arg_class_name;
END;
   | Return the number of existing or deleted objects of a class. 
Arguments: the name of the class
(1 row)

oocms=# explain analyze select count(1) from objects where class = 'Picture';
   QUERY PLAN

 Aggregate  (cost=278.16..278.16 rows=1 width=0) (actual time=44.121..44.123 
rows=1 loops=1)
   ->  Seq Scan on objects  (cost=0.00..267.65 rows=4205 width=0) (actual 
time=0.030..33.325 rows=4308 loops=1)
 Filter: ("class" = 'Picture'::text)
 Total runtime: 44.211 ms
(4 rows)

oocms=# explain analyze select class_get_number_of_objects('Picture');
  QUERY PLAN
--
 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=27.019..27.022 rows=1 
loops=1)
 Total runtime: 27.062 ms
(2 rows)


I.e. the function takes 27 ms to do what an equivalent piece of SQL takes
44 ms to do. How can this be explained?

Some more info:

oocms=# select class_get_number_of_objects('Picture');
 class_get_number_of_objects
-
4308
(1 row)

oocms=# select count(1) from objects;
 count
---
 13332
(1 row)

oocms=# \d objects
                                 Table "oocms.objects"
  Column   |           Type           |                       Modifiers
-----------+--------------------------+--------------------------------------------------------------
 object_id | integer                  | not null default nextval('oocms.objects_object_id_seq'::text)
 class     | text                     | not null
 created   | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone
Indexes:
    "objects_pkey" PRIMARY KEY, btree (object_id)
    "fo" btree ("class")
Foreign-key constraints:
    "objects_class_fkey" FOREIGN KEY ("class") REFERENCES classes(name) ON UPDATE CASCADE


-- 
Markus Bertheau ☭ <[EMAIL PROTECTED]>


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] bad plan

2005-03-08 Thread Gaetano Mendola
Richard Huxton wrote:
> Gaetano Mendola wrote:
> 
>> running a 7.4.5 engine, I'm facing this bad plan:
>>
>> empdb=# explain analyze SELECT
>> name,url,descr,request_status,url_status,size_mb,estimated_start,request_time_stamp
>>
>> empdb-#FROM v_sc_user_request
>> empdb-#WHERE
>> empdb-# login = 'babinow1'
>> empdb-#LIMIT 10 ;
> 
> 
>>  ->  Subquery Scan vsp  (cost=985.73..1016.53
>> rows=1103 width=12) (actual time=25.328..1668.754 rows=493 loops=31)
>>->  Merge Join  (cost=985.73..1011.01
>> rows=1103 width=130) (actual time=25.321..1666.666 rows=493 loops=31)
>>  Merge Cond: ("outer".id_program =
>> "inner".id_program)
> 
> 
> The problem to address is in this subquery. That's a total of 31 x
> (1668.754 - 25.328) ms = about 51 seconds.
> 
> Since your query is so simple, I'm guessing v_sc_user_request is a view.
> Can you provide the definition?

Of course:



CREATE OR REPLACE VIEW v_sc_user_request AS
  SELECT
  *
  FROM
  v_sat_request vsr LEFT OUTER JOIN v_sc_packages vsp USING ( id_package )
  WHERE
  vsr.request_time > now() - '1 month'::interval AND
  vsr.expired = FALSE
  ORDER BY id_sat_request DESC
;


CREATE OR REPLACE VIEW v_sc_packages AS
  SELECT
 *
  FROM
  v_programs   vpr,
  v_packages   vpk,
  v_sequences  vs

  WHERE
  ---------------- JOIN ----------------
  vpr.id_program = vs.id_program AND
  vpk.id_package = vs.id_package AND
  --------------------------------------
  vs.estimated_start IS NOT NULL
;

CREATE OR REPLACE VIEW v_sat_request AS
  SELECT
 *
  FROM
 sat_request sr,
 url u,
 user_login  ul
  WHERE
  ---------------- JOIN ----------------
 sr.id_url  = u.id_url AND
 sr.id_user = ul.id_user
  --------------------------------------
;


That "expired" column was added just yesterday.

Regards
Gaetano Mendola













Re: [PERFORM] pl/pgsql faster than raw SQL?

2005-03-08 Thread Richard Huxton
Markus Bertheau ☭ wrote:
oocms=# explain analyze select count(1) from objects where class = 'Picture';
   QUERY PLAN

 Aggregate  (cost=278.16..278.16 rows=1 width=0) (actual time=44.121..44.123 
rows=1 loops=1)
   ->  Seq Scan on objects  (cost=0.00..267.65 rows=4205 width=0) (actual 
time=0.030..33.325 rows=4308 loops=1)
 Filter: ("class" = 'Picture'::text)
 Total runtime: 44.211 ms
(4 rows)
oocms=# explain analyze select class_get_number_of_objects('Picture');
  QUERY PLAN
--
 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=27.019..27.022 rows=1 
loops=1)
 Total runtime: 27.062 ms
Well, you're saving planning time with the plpgsql version, but that's 
not going to come to 17ms (you'd hope). The EXPLAIN will take up time 
itself, and it can look deeper into the SQL version. Try timing two 
scripts with 100 of each and see if they really differ by that much.
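For instance, such a comparison could be scripted in psql roughly like this (a sketch: the two bench_* helper names are mine, not from the thread; the single-quoted function bodies follow the pre-8.0 quoting style used elsewhere in this digest):

```sql
-- Run the raw SQL form 100 times inside one call, so per-statement
-- overhead is amortized:
CREATE OR REPLACE FUNCTION bench_raw_sql(int) RETURNS bigint AS '
DECLARE
    i int;
    n bigint := 0;
BEGIN
    FOR i IN 1..$1 LOOP
        SELECT INTO n count(1) FROM objects WHERE class = ''Picture'';
    END LOOP;
    RETURN n;
END;
' LANGUAGE plpgsql;

-- Same loop, but calling the plpgsql function from the thread:
CREATE OR REPLACE FUNCTION bench_plpgsql_fn(int) RETURNS integer AS '
DECLARE
    i int;
    n integer := 0;
BEGIN
    FOR i IN 1..$1 LOOP
        n := class_get_number_of_objects(''Picture'');
    END LOOP;
    RETURN n;
END;
' LANGUAGE plpgsql;

\timing
SELECT bench_raw_sql(100);
SELECT bench_plpgsql_fn(100);
```

With EXPLAIN ANALYZE out of the picture, the two timings should be much closer than the 27 ms vs 44 ms shown above.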
--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] bad plan

2005-03-08 Thread Richard Huxton
Gaetano Mendola wrote:
Richard Huxton wrote:
Gaetano Mendola wrote:

running a 7.4.5 engine, I'm facing this bad plan:
empdb=# explain analyze SELECT
name,url,descr,request_status,url_status,size_mb,estimated_start,request_time_stamp
empdb-#FROM v_sc_user_request
empdb-#WHERE
empdb-# login = 'babinow1'
empdb-#LIMIT 10 ;

->  Subquery Scan vsp  (cost=985.73..1016.53
rows=1103 width=12) (actual time=25.328..1668.754 rows=493 loops=31)
  ->  Merge Join  (cost=985.73..1011.01
rows=1103 width=130) (actual time=25.321..1666.666 rows=493 loops=31)
Merge Cond: ("outer".id_program =
"inner".id_program)

The problem to address is in this subquery. That's a total of 31 x
(1668.754 - 25.328) ms = about 51 seconds.
Since your query is so simple, I'm guessing v_sc_user_request is a view.
Can you provide the definition?

Of course:

CREATE OR REPLACE VIEW v_sc_user_request AS
  SELECT
  *
  FROM
  v_sat_request vsr LEFT OUTER JOIN v_sc_packages vsp USING ( id_package )
  WHERE
  vsr.request_time > now() - '1 month'::interval AND
  vsr.expired = FALSE
  ORDER BY id_sat_request DESC
;
CREATE OR REPLACE VIEW v_sc_packages AS
  SELECT
 *
  FROM
  v_programs   vpr,
  v_packages   vpk,
  v_sequences  vs
  WHERE
  ---------------- JOIN ----------------
  vpr.id_program = vs.id_program AND
  vpk.id_package = vs.id_package AND
  --------------------------------------
  vs.estimated_start IS NOT NULL
;
CREATE OR REPLACE VIEW v_sat_request AS
  SELECT
 *
  FROM
 sat_request sr,
 url u,
 user_login  ul
  WHERE
  ---------------- JOIN ----------------
 sr.id_url  = u.id_url AND
 sr.id_user = ul.id_user
  --------------------------------------
;
OK, so looking at the original EXPLAIN the order of processing seems to be:
1. v_sat_request is evaluated and filtered on login='...' (lines 7..15)
This gives us 31 rows
2. The left-join from v_sat_request to v_sc_packages is processed (lines 
5..6)
This involves the subquery scan on vsp (from line 16) where it seems to 
think the best idea is a merge join of programs to sequences.

So - I think we need to look at the performance of your view 
"v_sc_packages" and the views that it depends on. OK - can you reply to 
this with just the definitions of v_sc_packages and what it depends on, 
and we can have a look at that.

Do you need all these tables involved in this query? I don't think PG is 
smart enough to completely discard a join if it's not needed by the 
output. Thinking about it, I'm not sure you could safely.
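If some of those joins really aren't needed for this report, one option is to query the underlying tables directly instead of the deep view stack (a sketch: the tables and join conditions come from the v_sat_request definition in this thread, but the output column list is illustrative):

```sql
-- Bypass the view stack for this particular report; only the tables
-- the WHERE clause actually needs are joined.
SELECT sr.id_sat_request, u.url, sr.request_status
  FROM sat_request sr
  JOIN url        u  ON sr.id_url  = u.id_url
  JOIN user_login ul ON sr.id_user = ul.id_user
 WHERE ul.login = 'babinow1'
   AND sr.request_time > now() - '1 month'::interval
 ORDER BY sr.id_sat_request DESC
 LIMIT 10;
```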
--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] pl/pgsql faster than raw SQL?

2005-03-08 Thread John A Meinel
Markus Bertheau ☭ wrote:
Hi, I have the following strange situation:
 

...
oocms=# explain analyze select count(1) from objects where class = 'Picture';
  QUERY PLAN

Aggregate  (cost=278.16..278.16 rows=1 width=0) (actual time=44.121..44.123 
rows=1 loops=1)
  ->  Seq Scan on objects  (cost=0.00..267.65 rows=4205 width=0) (actual 
time=0.030..33.325 rows=4308 loops=1)
Filter: ("class" = 'Picture'::text)
Total runtime: 44.211 ms
(4 rows)
oocms=# explain analyze select class_get_number_of_objects('Picture');
 QUERY PLAN
--
Result  (cost=0.00..0.01 rows=1 width=0) (actual time=27.019..27.022 rows=1 
loops=1)
Total runtime: 27.062 ms
(2 rows)
I.e. the function takes 27 ms to do what an equivalent piece of SQL takes
44 ms to do. How can this be explained?
Some more info:
 

In explain analyze, there is a per-row overhead of 2 gettimeofday() 
calls. This is usually very low and hidden in I/O, but on queries that 
go through a lot of rows while everything is cached in RAM, it can show up.
So the explain analyze is going deep into the SQL query.
With a stored procedure, explain analyze only instruments the outer 
function call, not the statements inside it. So you don't have that 
per-row overhead.

For an alternative accurate view, try:
# \timing
# explain analyze select count(1) from objects where class = 'Picture';
# explain analyze select class_get_number_of_objects('Picture');
\timing will also give you the time it takes to run the query, but it 
doesn't instrument anything.

John
=:->




Re: [PERFORM] Tuning, configuration for 7.3.5 on a Sun E4500

2005-03-08 Thread Josh Berkus
Tsarevich,

> Analyze has been run on the database quite frequently during the
> course of us trying to figure out this performance issue.  It is also
> a task that is crontabbed nightly.

Hmmm.  Then you probably need to up the STATISTICS levels on the target 
column, because PG is mis-estimating the number of rows returned 
significantly.   That's done by:

ALTER TABLE {table} ALTER COLUMN {column} SET STATISTICS {number}

Generally, I find that if mis-estimation occurs, you need to raise statistics 
to at least 250.

Here's where I see the estimation issues with your EXPLAIN:

                                                   ->  Index Scan
using component_commercial_order_id_ix on component  (cost=0.00..3.85
rows=1 width=28) (actual time=0.17..0.18 rows=1 loops=46376)
                                                         Index Cond:
(component.commercial_order_id = "outer".commercial_order_id)
                                                         Filter:
((raised_dtm >= '2003-01-01 00:00:00'::timestamp without time zone)
AND (raised_dtm <= '2005-01-01 23:59:59'::timestamp without time zone)
AND ((component_type_id = 3) OR (component_type_id = 2) OR
(component_type_id = 1)))

                 ->  Index Scan using communication_component_id_ix on
communication  (cost=0.00..20.90 rows=16 width=8) (actual
time=0.12..0.14 rows=1 loops=34638)
                       Index Cond: (component_id = $0)

So it looks like you need to raise the stats on communication.component_id and 
component.commercial_order_id,raised_dtm,component_type_id.   You also may 
want to consider a multi-column index on the last set.
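Concretely, Josh's two suggestions might look like this (the index name is my invention; the table names, column names, and the 250 target are from his message):

```sql
-- Raise the statistics targets on the mis-estimated columns, then
-- re-gather stats so the new targets take effect:
ALTER TABLE communication ALTER COLUMN component_id SET STATISTICS 250;
ALTER TABLE component ALTER COLUMN commercial_order_id SET STATISTICS 250;
ALTER TABLE component ALTER COLUMN raised_dtm SET STATISTICS 250;
ALTER TABLE component ALTER COLUMN component_type_id SET STATISTICS 250;
ANALYZE communication;
ANALYZE component;

-- The multi-column index mentioned above (name is illustrative):
CREATE INDEX component_order_raised_type_ix
    ON component (commercial_order_id, raised_dtm, component_type_id);
```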

BTW, if you have any kind of data update traffic at all, ANALYZE once a day is 
not adequate.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] bad plan

2005-03-08 Thread Gaetano Mendola
Richard Huxton wrote:

> OK, so looking at the original EXPLAIN the order of processing seems to be:
> 1. v_sat_request is evaluated and filtered on login='...' (lines 7..15)
> This gives us 31 rows
> 2. The left-join from v_sat_request to v_sc_packages is processed (lines
> 5..6)
> This involves the subquery scan on vsp (from line 16) where it seems to
> think the best idea is a merge join of programs to sequences.

Well, basically v_sc_packages depends on three other views that are just a
simple interface to a plain table.

If I execute a select against this view alone I get reasonable execution times:


=== cpu_tuple_cost = 0.07

# explain analyze select * from v_sc_packages where id_package = 19628;
  
QUERY PLAN
---
 Nested Loop  (cost=0.00..15.96 rows=1 width=131) (actual time=41.450..41.494 
rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..11.86 rows=1 width=116) (actual 
time=1.022..1.055 rows=1 loops=1)
 ->  Nested Loop Left Join  (cost=0.00..7.89 rows=1 width=104) (actual 
time=0.330..0.345 rows=1 loops=1)
   ->  Index Scan using packages_pkey on packages p  
(cost=0.00..3.90 rows=1 width=104) (actual time=0.070..0.075 rows=1 loops=1)
 Index Cond: (id_package = 19628)
   ->  Index Scan using package_security_id_package_key on 
package_security ps  (cost=0.00..3.91 rows=1 width=4) (actual time=0.232..0.237 
rows=1 loops=1)
 Index Cond: ("outer".id_package = ps.id_package)
 ->  Index Scan using idx_sequences_id_package on sequences  
(cost=0.00..3.90 rows=1 width=16) (actual time=0.670..0.685 rows=1 loops=1)
   Index Cond: (19628 = id_package)
   Filter: (estimated_start IS NOT NULL)
   ->  Index Scan using programs_pkey on programs  (cost=0.00..4.02 rows=1 
width=19) (actual time=0.078..0.086 rows=1 loops=1)
 Index Cond: (programs.id_program = "outer".id_program)
 Filter: (id_program <> 0)
 Total runtime: 42.650 ms
(14 rows)

=== cpu_tuple_cost = 0.01

# explain analyze select * from v_sc_packages where id_package = 19628;
   
QUERY PLAN

 Nested Loop  (cost=0.00..15.54 rows=1 width=131) (actual time=25.062..69.977 
rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..11.56 rows=1 width=116) (actual 
time=5.396..50.299 rows=1 loops=1)
 ->  Nested Loop Left Join  (cost=0.00..7.71 rows=1 width=104) (actual 
time=5.223..32.842 rows=1 loops=1)
   ->  Index Scan using packages_pkey on packages p  
(cost=0.00..3.84 rows=1 width=104) (actual time=0.815..7.235 rows=1 loops=1)
 Index Cond: (id_package = 19628)
   ->  Index Scan using package_security_id_package_key on 
package_security ps  (cost=0.00..3.85 rows=1 width=4) (actual 
time=4.366..25.555 rows=1 loops=1)
 Index Cond: ("outer".id_package = ps.id_package)
 ->  Index Scan using idx_sequences_id_package on sequences  
(cost=0.00..3.84 rows=1 width=16) (actual time=0.147..17.422 rows=1 loops=1)
   Index Cond: (19628 = id_package)
   Filter: (estimated_start IS NOT NULL)
   ->  Index Scan using programs_pkey on programs  (cost=0.00..3.96 rows=1 
width=19) (actual time=0.043..0.049 rows=1 loops=1)
 Index Cond: (programs.id_program = "outer".id_program)
 Filter: (id_program <> 0)
 Total runtime: 70.254 ms
(14 rows)


and I get the best with this:

=== cpu_tuple_cost = 0.001


# explain analyze select * from v_sc_packages where id_package = 19628;
   
QUERY PLAN
-
 Nested Loop  (cost=0.00..15.48 rows=1 width=131) (actual time=2.516..2.553 
rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..7.78 rows=1 width=31) (actual time=1.439..1.457 
rows=1 loops=1)
 ->  Index Scan using idx_sequences_id_package on sequences  
(cost=0.00..3.83 rows=1 width=16) (actual time=0.442..0.450 rows=1 loops=1)
   Index Cond: (19628 = id_package)
   Filter: (estimated_start IS NOT NULL)
 ->  Index Scan using programs_pkey on programs  (cost=0.00..3.95 
rows=1 width=19) (actual time=0.972..0.978 rows=1 loops=1)
   Index Cond: (programs.id_program = "outer".id_program)
   Filter: (id_program <> 0)
   ->  Nested Loop Left Join  (cost=0.00..7.68 rows=1 width=104) (actual 
time=0.110..0.125 rows=1 l

[PERFORM] index scan on =, but not < ?

2005-03-08 Thread Rick Schumeyer
I have two index questions.  The first is about an issue that has been
recently discussed, and I just wanted to be sure of my understanding.
Functions like count(), max(), etc. will use sequential scans instead of
index scans because the index doesn’t know which rows are actually
visible… is this correct?

Second:

I created an index in a table with over 10 million rows.
The index is on field x, which is a double.

The following command, as I expected, results in an index scan:

=# explain select * from data where x = 0;
                              QUERY PLAN
--------------------------------------------------------------------------
 Index Scan using data_x_ix on data  (cost=0.00..78.25 rows=19 width=34)
   Index Cond: (x = 0::double precision)
(2 rows)

But this command, in which the only difference is > instead of =, results
in a sequential scan.

=# explain select * from data where x > 0;
                            QUERY PLAN
------------------------------------------------------------------
 Seq Scan on data  (cost=0.00..1722605.20 rows=62350411 width=34)
   Filter: (x > 0::double precision)
(2 rows)

Why is this?

(This is with pg 8.0.1 on a PC running FC3 with 1GB ram… if it matters)








Re: [PERFORM] index scan on =, but not < ?

2005-03-08 Thread Thomas F . O'Connell
Your hypothesis about index usage of count() and max() is correct.
As for why you see index usage in your first example query and not your  
second: compare the number of rows in question. An index is extremely  
useful if 19 rows will be returned. But when 62350411 rows will be  
returned, you're talking about a substantial fraction of the table. A  
sequential scan will probably correctly be judged to be faster by the  
planner.

-tfo
--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
On Mar 8, 2005, at 12:35 PM, Rick Schumeyer wrote:
I have two index questions.  The first is about an issue that has been  
recently discussed,
and I just wanted to be sure of my understanding.  Functions like  
count(), max(), etc. will
use sequential scans instead of index scans because the index doesn’t  
know which rows
are actually visible…is this correct?

 
Second:
 
I created an index in a table with over 10 million rows.
The index is on field x, which is a double.
The following command, as I expected, results in an index scan:
=# explain select * from data where x = 0;
   QUERY PLAN
--- 
--
 Index Scan using data_x_ix on data  (cost=0.00..78.25 rows=19  
width=34)
   Index Cond: (x = 0::double precision)
(2 rows)
 

But this command, in which the only difference is > instead of =, is a  
sequential scan.

=# explain select * from data where x > 0;
    QUERY PLAN
--
 Seq Scan on data  (cost=0.00..1722605.20 rows=62350411 width=34)
   Filter: (x > 0::double precision)
(2 rows)
Why is this?
(This is with pg 8.0.1 on a PC running FC3 with 1GB ram…if it matters)



Re: [PERFORM] index scan on =, but not < ?

2005-03-08 Thread John Arbash Meinel
Rick Schumeyer wrote:
I have two index questions. The first is about an issue that has been
recently discussed,
and I just wanted to be sure of my understanding. Functions like
count(), max(), etc. will
use sequential scans instead of index scans because the index doesn’t
know which rows
are actually visible…is this correct?
Actually, index scans are chosen whenever the cost is expected to be
cheaper than a sequential scan. This is generally about < 10% of the
total number of rows.
Second:
I created an index in a table with over 10 million rows.
The index is on field x, which is a double.
The following command, as I expected, results in an index scan:
=# explain select * from data where x = 0;
QUERY PLAN
-
Index Scan using data_x_ix on data (cost=0.00..78.25 rows=19 width=34)
Index Cond: (x = 0::double precision)
(2 rows)
Since you have 10m rows, when it expects to get only 19 rows, it is much
faster to use an index.
But this command, in which the only difference is > instead of =, is a
sequential scan.
=# explain select * from data where x > 0;
QUERY PLAN
--
Seq Scan on data (cost=0.00..1722605.20 rows=62350411 width=34)
Filter: (x > 0::double precision)
(2 rows)
Here, pg expects to find 62M rows (you must have significantly more than
10M rows). In this case a sequential scan is much faster than an indexed
one, so that's what pg does.
Why is this?
(This is with pg 8.0.1 on a PC running FC3 with 1GB ram…if it matters)
If you think there is truly a performance problem, try attaching the
results of "explain analyze", from which we might be able to tell you that
your statistics are inaccurate (run vacuum analyze if you haven't).
John
=:->




Re: [PERFORM] index scan on =, but not < ?

2005-03-08 Thread Rick Schumeyer
That makes a lot of sense.  Sure enough, if I change the query from
WHERE x > 0  (which returns a lot of rows) to
WHERE x > 0 AND x < 1
I now get an index scan.

> As for why you see index usage in your first example query and not your
> second: compare the number of rows in question. An index is extremely
> useful if 19 rows will be returned. But when 62350411 rows will be
> returned, you're talking about a substantial fraction of the table. A
> sequential scan will probably correctly be judged to be faster by the
> planner.
> 


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] index scan on =, but not < ?

2005-03-08 Thread Dennis Bjorklund
On Tue, 8 Mar 2005, Rick Schumeyer wrote:

> =# explain select * from data where x = 0;
> -
>  Index Scan using data_x_ix on data  (cost=0.00..78.25 rows=19 width=34)
>Index Cond: (x = 0::double precision)
>  
> But this command, in which the only difference is > instead of =, is a
> sequential scan.
> 
> =# explain select * from data where x > 0;
> --
>  Seq Scan on data  (cost=0.00..1722605.20 rows=62350411 width=34)
>Filter: (x > 0::double precision)
> 
> Why is this?

That is because it's faster to execute the x>0 query with a seq. scan than 
an index scan. PostgreSQL is doing the right thing here.

Pg estimates that the first query will return 19 rows and that the second 
query will return 62350411 rows. To return 62350411 rows it's faster to 
just scan the table and not use the index.

-- 
/Dennis Björklund




Re: [PERFORM] bad plan

2005-03-08 Thread Richard Huxton
Gaetano Mendola wrote:
Richard Huxton wrote:

OK, so looking at the original EXPLAIN the order of processing seems to be:
1. v_sat_request is evaluated and filtered on login='...' (lines 7..15)
This gives us 31 rows
2. The left-join from v_sat_request to v_sc_packages is processed (lines
5..6)
This involves the subquery scan on vsp (from line 16) where it seems to
think the best idea is a merge join of programs to sequences.

Well, basically v_sc_packages depends on three other views that are just a
simple interface to a plain table.
If I execute a select against this view alone I get reasonable execution times:
=== cpu_tuple_cost = 0.07
# explain analyze select * from v_sc_packages where id_package = 19628;
  
QUERY PLAN
---
 Nested Loop  (cost=0.00..15.96 rows=1 width=131) (actual time=41.450..41.494 
rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..11.86 rows=1 width=116) (actual 
time=1.022..1.055 rows=1 loops=1)
 ->  Nested Loop Left Join  (cost=0.00..7.89 rows=1 width=104) (actual 
time=0.330..0.345 rows=1 loops=1)
   ->  Index Scan using packages_pkey on packages p  
(cost=0.00..3.90 rows=1 width=104) (actual time=0.070..0.075 rows=1 loops=1)
 Index Cond: (id_package = 19628)
   ->  Index Scan using package_security_id_package_key on 
package_security ps  (cost=0.00..3.91 rows=1 width=4) (actual time=0.232..0.237 
rows=1 loops=1)
 Index Cond: ("outer".id_package = ps.id_package)
 ->  Index Scan using idx_sequences_id_package on sequences  
(cost=0.00..3.90 rows=1 width=16) (actual time=0.670..0.685 rows=1 loops=1)
   Index Cond: (19628 = id_package)
   Filter: (estimated_start IS NOT NULL)
   ->  Index Scan using programs_pkey on programs  (cost=0.00..4.02 rows=1 
width=19) (actual time=0.078..0.086 rows=1 loops=1)
 Index Cond: (programs.id_program = "outer".id_program)
 Filter: (id_program <> 0)
 Total runtime: 42.650 ms
(14 rows)

=== cpu_tuple_cost = 0.01

=== cpu_tuple_cost = 0.001
I don't know what you think you're measuring, but it's nothing to do 
with the plans. If you look at the plans carefully, you'll see they're 
all the same. The "cost" numbers change because that's the parameter 
you're changing.

I'm not sure it makes sense to vary cpu_tuple_cost from 0.07 down to 
0.001 - that's a factor of 70 difference. I might be tempted to halve or 
double it, but even then only after some serious testing.
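When experimenting with planner cost constants like this, it may be safer to do it per-session than to edit postgresql.conf (a sketch; 0.01 is the shipped default for cpu_tuple_cost, and the query is the one from this thread):

```sql
SHOW cpu_tuple_cost;                 -- see the current value
SET cpu_tuple_cost = 0.02;           -- the "double it" experiment
EXPLAIN ANALYZE SELECT * FROM v_sc_packages WHERE id_package = 19628;
RESET cpu_tuple_cost;                -- back to the configured value
```

This way a test value never leaks into the running configuration, and each candidate value can be judged on actual EXPLAIN ANALYZE timings.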

--
  Richard Huxton
  Archonet Ltd


Re: [PERFORM] bad plan

2005-03-08 Thread Tom Lane
Gaetano Mendola <[EMAIL PROTECTED]> writes:
>> Since your query is so simple, I'm guessing v_sc_user_request is a view.
>> Can you provide the definition?

> Of course:

I don't think you've told us the whole truth about the v_sc_packages
view.  The definition as given doesn't work at all (it'll have
duplicate column names), but more to the point, if it were that simple
then the planner would fold it into the parent query.  The subquery
scan node indicates that folding did not occur.  The most likely reason
for that is that there's an ORDER BY in the view.

Putting ORDER BYs in views that you intend to use as components of other
views is a bad practice from a performance perspective...
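A minimal sketch of what Tom describes, with hypothetical table and view names:

```sql
-- An ORDER BY baked into a component view prevents the planner from
-- flattening it into the outer query (the Subquery Scan "fence"):
CREATE VIEW v_component_sorted AS
  SELECT id, payload FROM t ORDER BY id;

-- Leaving the view unsorted lets it be flattened; sort only at the top level:
CREATE VIEW v_component AS
  SELECT id, payload FROM t;

SELECT id FROM v_component WHERE payload = 'x' ORDER BY id;
```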

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


[PERFORM] Why would writes to pgsql_tmp bottleneck at 1mb/s?

2005-03-08 Thread Josh Berkus
PG Hackers,

What follows is iostat output from a TPC-H test on Solaris 10. The machine 
is creating indexes on a table which is 50G in size, so it needs to use 
pgsql_tmp for internal swapping:

      tty          md15          sd1           sd2           sd3          cpu
 tin tout kps tps serv  kps tps serv  kps tps serv  kps tps serv   us sy wt id
   0   84 22526 12111  1024   150   00  5634 3371   30  8  0 61
   0  242 24004 13371  1024   150   00  6007 3551   33  8  0 59
   0   85 22687 12771  1024   150   00  5656 3221   31  8  0 62
   0   85 20876 10991  1024   290   00  5185 2921   28  7  0 64

md15 is WAL (pg_xlog).
sd3 is PGDATA.
sd1 is pgsql_tmp.

As you can see, we're getting a nice 23mb/s peak for WAL (thanks to 
forcedirectio) and database writes peak at 6mb/s.  However, pgsql_tmp, which 
is being used heavily, hovers around 1mb/s, and never goes above 1.5mb/s.   
This seems to be throttling the whole system.

Any suggestions on why this should be?   Do we have a performance bug in the 
pg_tmp code?

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] bad plan

2005-03-08 Thread Richard Huxton
Tom Lane wrote:
Putting ORDER BYs in views that you intend to use as components of other
views is a bad practice from a performance perspective...
There are also a lot of views involved here for very few output columns. 
Tom - is the planner smart enough to optimise-out unneeded columns from 
a SELECT * view if it's part of a join/subquery and you only use one or 
two columns?

Secondly, in the original plan we have:
->  Nested Loop Left Join  (cost=1478.82..1716.37 rows=1 width=201) 
(actual time=3254.483..52847.064 rows=31 loops=1)

Now, we've got 31 rows instead of 1 here. The one side of the join ends 
up as:
->  Subquery Scan vsp  (cost=985.73..1016.53 rows=1103 width=12) (actual 
time=25.328..1668.754 rows=493 loops=31)
->  Merge Join  (cost=985.73..1011.01 rows=1103 width=130) (actual 
time=25.321..1666.666 rows=493 loops=31)

Would I be right in thinking the planner doesn't materialise the 
subquery because it's expecting 1 loop not 31? If there were 1 row the 
plan would seem OK to me.

Is there any mileage in the idea of a "lazy" planner that keeps some 
alternative paths around in case they're needed? Or a reactive one that 
can re-plan nodes when assumptions turn out to be wrong?

--
  Richard Huxton
  Archonet Ltd
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faq


Re: [PERFORM] bad plan

2005-03-08 Thread Tom Lane
Richard Huxton  writes:
> There are also a lot of views involved here for very few output columns. 
> Tom - is the planner smart enough to optimise-out unneeded columns from 
> a SELECT * view if it's part of a join/subquery and you only use one or 
> two columns?

If the view gets flattened, yes, but I believe that it's not bright
enough to do so when it can't flatten the view.  You could tell easily
enough by looking at the row-width estimates at various levels of the
plan.  (Let's see ... in Gaetano's plan the SubqueryScan is returning
12-byte rows where its input MergeJoin is returning 130-byte rows,
so sure enough the view is computing a lot of stuff that then gets
thrown away.)

> Would I be right in thinking the planner doesn't materialise the 
> subquery because it's expecting 1 loop not 31? If there were 1 row the 
> plan would seem OK to me.

Right; it doesn't see any predicted gain from the extra cost of
materializing.  But to me the main problem here is not that, it is that
the entire shape of the plan would likely be different if it weren't for
the "optimization fence" that the Subquery Scan node represents.  I
suspect too that the use of mergejoin as opposed to anything else within
the vsp subplan is driven more by the need to produce sorted output than
by what is the cheapest way to get the rows.

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


[PERFORM] Query Optimization

2005-03-08 Thread James G Wilkinson
All,
I hope that this is the right place to post.  I am relatively new to
PostgreSQL (i.e., less than a year of coding) and am just starting to
delve into the issues of query optimization.  I have hunted around
the web for the basics of query optimization, but I have not had
much success in interpreting the documents.  I have also been
trying to learn the basics of the EXPLAIN command, also without
much success, but I will keep trying.
Anyway, here is what the system reports on the following command:
EXPLAIN SELECT a.country_code, a.state_county_fips,
   icell, jcell, a.beld3_species_id, pollutant_code,
   SUM(b.ratio * d.emissions_factor * a.percent_ag *
   e.ag_fraction * 1) as normalized_emissions
FROM "globals"."biogenic_beld3_data" a, 
"spatial"."tmpgrid" b, 
"globals"."biogenic_emissions_factors" d,
"globals"."biogenic_beld3_ag_data" e
WHERE a.beld3_icell=b.b_icell AND 
 a.beld3_jcell=b.b_jcell AND
 a.country_code=e.country_code AND
 a.state_county_fips=e.state_county_fips AND
 a.beld3_species_id=d.beld3_species_id AND
 a.ag_forest_records > 0 AND
 a.percent_ag > 0 AND d.emissions_factor > 0
GROUP BY a.country_code, a.state_county_fips, icell, jcell,
 a.beld3_species_id, pollutant_code
ORDER BY a.country_code, a.state_county_fips, icell, jcell,
 a.beld3_species_id, pollutant_code;

                                 QUERY PLAN
---
GroupAggregate  (cost=65034.94..71110.50 rows=151889 width=73)
->Sort  (cost=65034.94..65414.66 rows=151889 width=73)
   Sort Key: a.country_code, a.state_county_fips, b.icell, b.jcell,
 a.beld3_species_id, d.pollutant_code
   ->Hash Join  (cost=33749.64..37412.88 rows=151889 width=73)
   Hash Cond: ("outer".beld3_species_id = "inner".beld3_species_id)
   ->Merge Join  (cost=33728.84..35303.61 rows=37972 width=56)
   Merge Cond: ((("outer".country_code)::text = 
"inner"."?column8?") AND
(("outer".state_county_fips)::text = 
"inner"."?column9?"))
   ->Index Scan using biogenic_beld3_ag_data_pk on 
biogenic_beld3_ag_data e 
   (cost=0.00..806.68 rows=20701 width=26)
   ->Sort  (cost=33728.84..33741.67 rows=5131 width=45)
   Sort Key: (a.country_code)::text, 
(a.state_county_fips)::text
   ->Nested Loop  (cost=0.00..33412.65 rows=5131 width=45)
   ->Seq Scan on biogenic_beld3_data a  
(cost=0.00..3593.02 rows=5637 width=37)
   Filter: ((ag_forest_records > 0) AND (percent_ag 
> 0::numeric))
   ->Index Scan using tmpgrid_pk on tmpgrid b  
(cost=0.00..5.27 rows=1 width=24)
   Index Cond: ((b.b_icell = "outer".beld3_icell) AND
(b.b_jcell = "outer".beld3_jcell))
   ->Hash (cost=18.50..18.50 rows=920 width=21)
   ->Seq Scan on biogenic_emissions_factors d  
(cost=0.00..18.50 rows=920 width=21)
   Filter: (emissions_factor > 0::numeric)
(18 rows)

Firstly, I  am frankly mystified on how to interpret all this.  If anyone
could point me to a document or two that will help me decipher this,
I will greatly appreciate it.
Secondly, I have figured out that SEQ SCANs are typically bad.  I am
concerned that a SEQ SCAN is being performed on 'biogenic_beld3_data'
which is the largest table in the query.  I would rather have a SEQ SCAN
be performed on 'tmpgrid' which contains the keys that subset the data
from 'biogenic_beld3_data.'  Is this naive on my part?
Thirdly, I have run EXPLAIN on other queries that report back a
GroupAggregate Cost= that runs in about 30 minutes
on my relatively highend linux machine.  But when I run this particular
query, it takes on the order of 90 minutes to complete.  Any thoughts
on why this happens will be appreciated.
Finally, if anyone can be so kind as to provide insight on how to better
optimize this query, I will, again, be deeply grateful.
Thanks in advance.
terrakit
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Why would writes to pgsql_tmp bottleneck at 1mb/s?

2005-03-08 Thread Josh Berkus
People:

> As you can see, we're getting a nice 23mb/s peak for WAL (thanks to
> forcedirectio) and database writes peak at 6mb/s.  However, pgsql_tmp,
> which is being used heavily, hovers around 1mb/s, and never goes above
> 1.5mb/s. This seems to be throttling the whole system.

Never mind, I'm a dork. I accidentally cut the "SET maintenance_work_mem = 
200" out of my config file, and it was running with the default 1024K.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Why would writes to pgsql_tmp bottleneck at 1mb/s?

2005-03-08 Thread Josh Berkus
H.

> > As you can see, we're getting a nice 23mb/s peak for WAL (thanks to
> > forcedirectio) and database writes peak at 6mb/s.  However, pgsql_tmp,
> > which is being used heavily, hovers around 1mb/s, and never goes above
> > 1.5mb/s. This seems to be throttling the whole system.
>
> Never mind, I'm a dork. I accidentally cut the "SET maintenance_work_mem
> = 200" out of my config file, and it was running with the default 1024K

Maybe I'm not an idiot (really!): even with almost 2GB of maintenance_mem, PG 
still writes to pgsql_tmp no faster than 2MB/s. I think there may be an 
artificial bottleneck there.  Question is: PostgreSQL, OS or hardware?

Suggestions?

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] bad plan

2005-03-08 Thread Gaetano Mendola
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Tom Lane wrote:
> Gaetano Mendola <[EMAIL PROTECTED]> writes:
> 
>>>Since your query is so simple, I'm guessing v_sc_user_request is a view.
>>>Can you provide the definition?
> 
> 
>>Of course:
> 
> 
> I don't think you've told us the whole truth about the v_sc_packages
> view.  The definition as given doesn't work at all (it'll have
> duplicate column names), but more to the point, if it were that simple
> then the planner would fold it into the parent query.  The subquery
> scan node indicates that folding did not occur.  The most likely reason
> for that is that there's an ORDER BY in the view.

I didn't tell the complete truth: the view definition is long, so I just 
omitted all the fields.

explain analyze SELECT 
name,url,descr,request_status,url_status,size_mb,estimated_start,request_time_stamp
FROM v_sc_user_request
WHERE login = 'babinow1'
LIMIT 10 ;

these are the complete  definitions of views involved in the query:



CREATE OR REPLACE VIEW v_sc_user_request AS
  SELECT
  vsr.id_sat_request                                               AS id_sat_request,
  vsr.id_user                                                      AS id_user,
  vsr.login                                                        AS login,
  vsr.url                                                          AS url,
  vsr.name                                                         AS name,
  vsr.descr                                                        AS descr,
  vsr.size                                                         AS size,
  trunc(vsr.size/1024.0/1024.0,2)                                  AS size_mb,
  vsr.id_sat_request_status                                        AS id_sat_request_status,
  sp_lookup_key('sat_request_status',   vsr.id_sat_request_status) AS request_status,
  sp_lookup_descr('sat_request_status', vsr.id_sat_request_status) AS request_status_descr,
  vsr.id_url_status                                                AS id_url_status,
  sp_lookup_key('url_status',   vsr.id_url_status)                 AS url_status,
  sp_lookup_descr('url_status', vsr.id_url_status)                 AS url_status_descr,
  vsr.url_time_stamp                                               AS url_time_stamp,
  date_trunc('seconds',vsr.request_time)                           AS request_time_stamp,
  vsr.id_package                                                   AS id_package,
  COALESCE(date_trunc('seconds',vsp.estimated_start)::text,'NA')   AS estimated_start

  FROM
  v_sat_request vsr LEFT OUTER JOIN v_sc_packages vsp USING ( id_package )
  WHERE
  vsr.request_time > now() - '1 month'::interval AND
  vsr.expired = FALSE
  ORDER BY id_sat_request DESC
;




CREATE OR REPLACE VIEW v_sat_request AS
  SELECT
 sr.id_user   AS id_user,
 ul.login AS login,
 sr.id_sat_requestAS id_sat_request,
 u.id_url AS id_url,
 u.urlAS url,
 u.name   AS name,
 u.descr  AS descr,
 u.size   AS size,
 u.storageAS storage,
 sr.id_packageAS id_package,
 sr.id_sat_request_status AS id_sat_request_status,
 sr.request_time  AS request_time,
 sr.work_time AS request_work_time,
 u.id_url_status  AS id_url_status,
 u.time_stamp AS url_time_stamp,
 sr.expired   AS expired
  FROM
 sat_request sr,
 url u,
 user_login  ul
  WHERE
 -- join conditions --
 sr.id_url  = u.id_url AND
 sr.id_user = ul.id_user
;




CREATE OR REPLACE VIEW v_sc_packages AS
  SELECT

 vpr.id_programAS id_program,
 vpr.name  AS program_name,

 vpk.id_packageAS id_package,
 date_trunc('seconds', vs.estimated_start) AS estimated_start,

 vpk.name  AS package_name,
 vpk.TYPE  AS TYPE,
 vpk.description   AS description,
 vpk.targetAS target,
 vpk.fec   AS fec_alg,
 vpk.output_group - vpk.input_groupAS fec_redundancy,
 vpk.priority  AS priority,
 vpk.updatable AS updatable,
 vpk.auto_listen   AS auto_listen,
 vpk.start_fileAS start_file,
 vpk.view_target_group AS view_target_group,
 vpk.target_group  AS target_group

  FROM
  v_programs   vpr,
  v_packages   vpk,
  v_sequences  vs

  WHERE

Re: [PERFORM] pl/pgsql faster than raw SQL?

2005-03-08 Thread Gaetano Mendola
Markus Bertheau wrote:
> Hi, I have the following strange situation:

that is not so strange. I have an example where:

SELECT * FROM my_view WHERE field1 = 'New'; ==> 800 seconds

SELECT * FROM my_view;  ==> 2 seconds

the only solution I had was to write a table function with
the second select in a loop, returning the row if
field1 was equal to 'New'.
It's strange, but it happens.



Regards
Gaetano Mendola

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[PERFORM] vacuum full, why multiple times ?

2005-03-08 Thread Gaetano Mendola
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi all,
this is the third email that I have posted, but I do not see it in the archives;
I believe the email was too long, so this time I will limit the rows.
Basically I'm noticing that a simple VACUUM FULL is not enough to
completely shrink the table:

# vacuum full verbose url;
INFO:  vacuuming "public.url"
INFO:  "url": found 268392 removable, 21286 nonremovable row versions in 8563 
pages
DETAIL:  22 dead row versions cannot be removed yet.
Nonremovable row versions range from 104 to 860 bytes long.
There were 13924 unused item pointers.
Total free space (including removable row versions) is 63818404 bytes.
4959 pages are or will become empty, including 7 at the end of the table.
8296 pages containing 63753840 free bytes are potential move destinations.
CPU 0.33s/0.12u sec elapsed 9.55 sec.

[SNIPPED]

INFO:  "url": moved 2 row versions, truncated 8563 to 8550 pages


and after 4 vacuum full:



empdb=# vacuum full verbose url;
INFO:  vacuuming "public.url"
INFO:  "url": found 13 removable, 21264 nonremovable row versions in 8504 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 104 to 860 bytes long.
There were 280528 unused item pointers.
Total free space (including removable row versions) is 63349188 bytes.
4913 pages are or will become empty, including 0 at the end of the table.
8234 pages containing 63340628 free bytes are potential move destinations.
CPU 0.17s/0.04u sec elapsed 0.49 sec.

[SNIPPED]

INFO:  "url": moved 5666 row versions, truncated 8504 to 621 pages




Does anyone know why? I had the same behaviour with a 46000-row table
that occupied 46000 pages! It was reduced to 3000 pages only after 7
VACUUM FULL passes.


Regards
Gaetano Mendola


-BEGIN PGP SIGNATURE-
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCLksV7UpzwH2SGd4RAoz3AKDvXSx3w/jRz/NR1pgtrxIZs8cJcwCg/0xm
zSr0sPDBkp8V1WXjREoVdLk=
=EHv2
-END PGP SIGNATURE-


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Why would writes to pgsql_tmp bottleneck at 1mb/s?

2005-03-08 Thread Aaron Birkland
> Maybe I'm not an idiot (really!): even with almost 2GB of maintenance_mem, PG
> still writes to pgsql_tmp no faster than 2MB/s. I think there may be an
> artificial bottleneck there.  Question is: PostgreSQL, OS or hardware?

I'm curious: what is your CPU usage while this is happening?  I've
noticed similar slow index creation behaviour, but I did not make any
connection to pgsql_tmp (because it was not on a separate partition).
I was indexing an oid field of a 700GB table and it took about four
days on a 1.2GHz UltraSparcIII (Solaris 9, 8GB core).  I noticed that
one CPU was pegged at near 100%, leading me to believe it
was CPU bound.  Odd thing is that the same operation on a 2GHz Pentium
IV box (Linux) on the same data took about a day. Truss showed that
a great majority of that time was in userland.

-Aaron

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] pl/pgsql faster than raw SQL?

2005-03-08 Thread John A Meinel
Gaetano Mendola wrote:
> Markus Bertheau wrote:
>> Hi, I have the following strange situation:
>
> that is not so strange. I have an example where:
> SELECT * FROM my_view WHERE field1 = 'New'; ==> 800 seconds
> SELECT * FROM my_view;  ==> 2 seconds
> the only solution I had was to write a table function with
> the second select in a loop, returning the row if
> field1 was equal to 'New'.
> It's strange, but it happens.
>
> Regards
> Gaetano Mendola

That sounds more like you had bad statistics on the field1 column, which 
caused postgres to switch from a seqscan to an index scan, only there 
were so many rows with field1='New' that it actually would have been 
faster with a seqscan.

Otherwise what you did is very similar to the "nested loop" of postgres 
which it selects when appropriate.

The other issue with views is that depending on their definition, 
sometimes postgres can flatten them out and optimize the query, and 
sometimes it can't. Order by is one of the big culprits for bad queries 
involving views.

John
=:->




Re: [PERFORM] Query Optimization

2005-03-08 Thread John A Meinel
James G Wilkinson wrote:
> All,
> ...
> Firstly, I am frankly mystified on how to interpret all this.  If anyone
> could point me to a document or two that will help me decipher this,
> I will greatly appreciate it.
I assume you have looked at:
http://www.postgresql.org/docs/8.0/static/performance-tips.html
And didn't find it helpful enough. I'm not really sure what help you are 
asking for. Are you saying that this query is performing slowly and you want 
to speed it up? Or do you just want to understand how to interpret the 
output of explain?

> Secondly, I have figured out that SEQ SCANs are typically bad.  I am
> concerned that a SEQ SCAN is being performed on 'biogenic_beld3_data'
> which is the largest table in the query.  I would rather have a SEQ SCAN
> be performed on 'tmpgrid' which contains the keys that subset the data
> from 'biogenic_beld3_data.'  Is this naive on my part?
It depends how much data is being extracted. If you have 1,000,000 rows, 
and only need 10, then an index scan is wonderful. If you need 999,999, 
then a sequential scan is much better (the break-even point is below 10%).
From the explain, it thinks it is going to need 5,637 rows from 
biogenic_beld3_data; what is that portion relative to the total?

The values at least look like you've run vacuum analyze. Have you tried 
running "explain analyze" instead of just explain? Then you can see if 
the planner's estimates are accurate.

If you want some help to force it, you could try a subselect query. 
Something like:

select * from biogenic_beld3_data b where b.beld3_icell = (select 
b_icell from tmpgrid_pk) and b.beld3_jcell = (select b_jcell from 
tmpgrid_pk);

> Thirdly, I have run EXPLAIN on other queries that report back a
> GroupAggregate Cost= that runs in about 30 minutes
> on my relatively highend linux machine.  But when I run this particular
> query, it takes on the order of 90 minutes to complete.  Any thoughts
> on why this happens will be appreciated.
Remember cost is in terms of page fetches, not in seconds.
Probably it is just an issue of postgres mis-estimating the selectivity 
of one of your queries.
Also, you have a fairly complex SUM occurring, involving 4 
multiplications on an estimated 150,000 rows. While it doesn't seem like it 
should take 90 minutes, it also isn't a trivial operation.

> Finally, if anyone can be so kind as to provide insight on how to better
> optimize this query, I will, again, be deeply grateful.
> Thanks in advance.
> terrakit
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
John
=:->




Re: [PERFORM] vacuum full, why multiple times ?

2005-03-08 Thread Michael Fuhr
On Wed, Mar 09, 2005 at 02:02:13AM +0100, Gaetano Mendola wrote:

> Basically I'm noticing that a simple vacuum full is not enough to
> shrink completelly the table:
> 
> # vacuum full verbose url;
> INFO:  vacuuming "public.url"
> INFO:  "url": found 268392 removable, 21286 nonremovable row versions in 8563 
> pages
> DETAIL:  22 dead row versions cannot be removed yet.

How busy is the database?  I'd guess that each time you run VACUUM,
there are still open transactions that have visibility to the dead
rows, so VACUUM doesn't touch them.  Those transactions eventually
complete, and eventually VACUUM FULL does what you're expecting.
I don't know if that's the only possible cause, but I get results
similar to yours if I have transactions open when I run VACUUM.
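The effect Michael describes can be reproduced with two psql sessions; a sketch using the table name from Gaetano's example:

```sql
-- Session 1: start a transaction and leave it open.
BEGIN;
SELECT count(*) FROM url;   -- any query; the transaction now holds a snapshot

-- Session 2: dead rows still visible to session 1 cannot be reclaimed.
VACUUM FULL VERBOSE url;    -- reports "N dead row versions cannot be removed yet"

-- Session 1: release the snapshot.
COMMIT;

-- Session 2: a second pass can now remove those rows and truncate fully.
VACUUM FULL VERBOSE url;
```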

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[PERFORM] 64bit Opteron multi drive raid. Help with best config settings

2005-03-08 Thread David B
Hi folks,

I'm about to start testing PG v8 on a 64-bit Opteron box with 12GB RAM
running RedHat, with a bunch of RAIDed drives in the backend.

Expecting 50GB of data per month (100GB+ initial load).

I do not see any example config settings. I have some MySQL experience,
and for it there are config settings for small or large server
operations.

Does PG have similar examples? If so, they are well hidden, at least
from a Google search.

If not, can any of you send me a typical config for such an environment?
I basically want good settings so I can get good insert
performance with low-volume selects.

DB data consists of customer info and their call history.
Lots of customers, lots of call history. Not that wide of rows.

I want to be able to insert at a rate of thousands per second. I should
think that's possible.

Any help you folks can provide to optimize this in a shootout between
it and MySQL (we hope to move from Oracle) would be appreciated.

Tx,
David

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] 64bit Opteron multi drive raid. Help with best config

2005-03-08 Thread Karim Nassar
On Tue, 2005-03-08 at 19:07 -0800, David B wrote:
> I do not see any example config settings. I have some MySQL experience,
> and for it there are config settings for small or large server
> operations.

For starters, this might be helpful:

http://www.powerpostgresql.com/PerfList

Then this:
http://www.powerpostgresql.com/Downloads/annotated_conf_80.html
http://www.powerpostgresql.com/Downloads/annotated_conf_80.pdf

Someone else might have an example config for you.
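Not an authoritative recommendation, but as a starting sketch for a dedicated 12GB box of that era (PostgreSQL 8.0 parameter names; every value here is a guess to be benchmarked against the real workload, not a tuned setting):

```
# postgresql.conf sketch for a dedicated 12GB RAM box (8.0 parameter names).
# All values are illustrative starting points only.
shared_buffers = 50000           # ~400MB; 8.0-era advice kept this well below RAM
work_mem = 16384                 # 16MB per sort/hash; multiply by concurrent ops
maintenance_work_mem = 524288    # 512MB for index builds and VACUUM
checkpoint_segments = 32         # more WAL segments help sustained bulk inserts
wal_buffers = 64
effective_cache_size = 1000000   # ~8GB; tells the planner about the OS cache
```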

HTH,

-- 
Karim Nassar
Department of Computer Science
Box 15600, College of Engineering and Natural Sciences
Northern Arizona University,  Flagstaff, Arizona 86011
Office: (928) 523-5868 -=- Mobile: (928) 699-9221


signature.asc
Description: This is a digitally signed message part


Re: [PERFORM] index scan on =, but not < ?

2005-03-08 Thread Bruno Wolff III
On Tue, Mar 08, 2005 at 13:35:53 -0500,
  Rick Schumeyer <[EMAIL PROTECTED]> wrote:
> I have two index questions.  The first is about an issue that has been
> recently discussed, and I just wanted to be sure of my understanding.
> Functions like count(), max(), etc. will use sequential scans instead of
> index scans because the index doesn't know which rows are actually
> visible. Is this correct?

Not exactly. If the number of rows to be examined is on the order of 5%
of the table, an index scan will probably be slower than a sequential
scan. The visibility issue makes index scans slower in the case that
the only columns of interest are in the index.
Another issue is that max could in theory use an index, but there isn't
a mechanism for Postgres to know how to do this in general for aggregates
where it is possible. There have been discussions in the past about
how this could be done, but no one has done it yet.
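Until such a mechanism exists, the usual workaround is rewriting the aggregate as an ORDER BY/LIMIT query, which the planner can satisfy with a backwards index scan (table and column names are illustrative; assumes an index on t(id)):

```sql
SELECT max(id) FROM t;          -- aggregate form: full sequential scan

-- Equivalent result, but can stop after reading one index entry.
-- The IS NOT NULL guard matches max()'s behaviour of ignoring NULLs.
SELECT id FROM t WHERE id IS NOT NULL ORDER BY id DESC LIMIT 1;
```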

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Why would writes to pgsql_tmp bottleneck at 1mb/s?

2005-03-08 Thread Tom Lane
Josh Berkus  writes:
> Maybe I'm not an idiot (really!): even with almost 2GB of maintenance_mem, PG
> still writes to pgsql_tmp no faster than 2MB/s. I think there may be an 
> artificial bottleneck there.  Question is: PostgreSQL, OS or hardware?

AFAIR that's just fwrite() ...

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Why would writes to pgsql_tmp bottleneck at 1mb/s?

2005-03-08 Thread Josh Berkus
Tom,

> > Maybe I'm not an idiot (really!): even with almost 2GB of
> > maintenance_mem, PG still writes to pgsql_tmp no faster than 2MB/s. I
> > think there may be an artificial bottleneck there.  Question is:
> > PostgreSQL, OS or hardware?
>
> AFAIR that's just fwrite() ...

Well, are there any hacks to speed it up?   It's about doubling the amount of 
time it takes to create an index on a very large table.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Why would writes to pgsql_tmp bottleneck at 1mb/s?

2005-03-08 Thread Tom Lane
Josh Berkus  writes:
>> AFAIR that's just fwrite() ...

> Well, are there any hacks to speed it up?   It's about doubling the amount of
> time it takes to create an index on a very large table.

Huh?  Doubled compared to what?

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] index scan on =, but not < ?

2005-03-08 Thread Jim C. Nasby
On Tue, Mar 08, 2005 at 10:38:21PM -0600, Bruno Wolff III wrote:
> Not exactly. If the number of rows to be examined is on the order of 5%
> of the table, an index scan will probably be slower than a sequential
> scan. The visibility issue makes index scans slower in the case that

Shouldn't that be 50%?
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] index scan on =, but not < ?

2005-03-08 Thread Bruno Wolff III
On Tue, Mar 08, 2005 at 22:55:19 -0600,
  "Jim C. Nasby" <[EMAIL PROTECTED]> wrote:
> On Tue, Mar 08, 2005 at 10:38:21PM -0600, Bruno Wolff III wrote:
> > Not exactly. If the number of rows to be examined is on the order of 5%
> > of the table, an index scan will probably be slower than a sequential
> > scan. The visibility issue makes index scans slower in the case that
> 
> Shouldn't that be 50%?

No. When you are doing an index scan of a significant part of the table,
you will fetch some heap pages more than once. You will also be fetching
blocks out of order, so you will lose out on read-ahead optimization
by the OS. This assumes that you don't get a lot of cache hits on the
heap pages. If a significant portion of the table is cached, then the
trade-off point will be at a higher percentage of the table.
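A back-of-envelope sketch of that trade-off, using the planner's default cost constants (random heap-page fetches costed at random_page_cost = 4.0 versus 1.0 for a sequential page read). This deliberately ignores caching and heap correlation; the numbers are illustrative, not a model of the real planner:

```python
def seq_scan_cost(pages, rows, cpu_tuple_cost=0.01):
    # Read every page once, sequentially; examine every row.
    return pages * 1.0 + rows * cpu_tuple_cost

def index_scan_cost(matched_rows, random_page_cost=4.0, cpu_tuple_cost=0.01):
    # Naive worst case: every matched row costs one random heap-page fetch.
    return matched_rows * random_page_cost + matched_rows * cpu_tuple_cost

pages, rows = 10_000, 1_000_000   # hypothetical table: 100 rows per page
for pct in (1, 5, 10, 50):
    matched = rows * pct // 100
    print(f"{pct:>2}% matched: seq={seq_scan_cost(pages, rows):>9.0f} "
          f"index={index_scan_cost(matched):>9.0f}")
```

With these pessimistic assumptions the index scan loses even at 1% selectivity; caching of heap pages is what moves the real crossover up toward the few-percent range mentioned above.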

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Why would writes to pgsql_tmp bottleneck at 1mb/s?

2005-03-08 Thread Josh Berkus
Tom,

> Huh?  Doubled compared to what?

Compared to how much data writing I can do to the database when pgsql_tmp 
isn't engaged.

In other words, when pgsql_tmp isn't being written, database writing is 9mb/s.
When pgsql_tmp gets engaged, that drops to 4mb/s.

Alternatively, the WAL drive, which is the same hardware, will write at 
10mb/s.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq