Re: [PERFORM] Tuning, configuration for 7.3.5 on a Sun E4500
Analyze has been run on the database quite frequently during the course of us trying to figure out this performance issue. It is also a task that is crontabbed nightly.

On Mon, 7 Mar 2005 09:31:06 -0800, Josh Berkus wrote:
> Tsarevich,
>
> > When running queries we are experiencing much bigger result times than
> > anticipated.
> >
> > Attached is a copy of our postgresql.conf file and of the table
> > definitions and row counts.
>
> Looks like you haven't run ANALYZE on the database anytime recently. Try that
> and re-run.
>
> --
> Josh Berkus
> Aglio Database Solutions
> San Francisco
Re: [pgsql-hackers-win32] [PERFORM] Help with tuning this query (with
> > RDTSC is a bad source of information for this kind of thing, as the
> > CPU frequency might vary.
>
> One thought that was bothering me was that if the CPU goes
> idle while waiting for disk I/O, its clock might stop or slow
> down dramatically.
> If we believed such a counter for EXPLAIN, we'd severely
> understate the cost of disk I/O.
>
> I dunno if that is the case on any Windows hardware or not,
> but none of this thread is making me feel confident that we
> know what QueryPerformanceCounter does measure.

I'm "reasonably confident" that QPC will measure actual wallclock time as passed, using a chip that is external to the CPU. (Don't ask me which chip :P). The docs specifically say: "Note that the frequency of the high-resolution performance counter is not the processor speed." It also indicates that it is possible for hardware not to support it, in which case the frequency will be reported as zero. I don't know any remotely modern wintel system that doesn't, though - it seems this may be referring to the old MIPS port of NT that didn't have it.

I also find: "Depending on the processor and exact version of NT you're using, on an Intel you get either the Time Stamp Counter, or the 1.1... MHz timer built into the motherboard."

So I think we're perfectly safe relying on it. And certainly not alone in doing so :-)

//Magnus
[PERFORM] bad plan
I posted this on hackers, but I had to post it here.

===

Hi all,
running a 7.4.5 engine, I'm facing this bad plan:

empdb=# explain analyze SELECT name,url,descr,request_status,url_status,size_mb,estimated_start,request_time_stamp
empdb-# FROM v_sc_user_request
empdb-# WHERE
empdb-#    login = 'babinow1'
empdb-# LIMIT 10 ;

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Limit (cost=1716.38..1716.39 rows=1 width=232) (actual time=52847.239..52847.322 rows=10 loops=1)
   -> Subquery Scan v_sc_user_request (cost=1716.38..1716.39 rows=1 width=232) (actual time=52847.234..52847.301 rows=10 loops=1)
        -> Sort (cost=1716.38..1716.39 rows=1 width=201) (actual time=52847.219..52847.227 rows=10 loops=1)
             Sort Key: sr.id_sat_request
             -> Nested Loop Left Join (cost=1478.82..1716.37 rows=1 width=201) (actual time=3254.483..52847.064 rows=31 loops=1)
                  Join Filter: ("outer".id_package = "inner".id_package)
                  -> Nested Loop (cost=493.09..691.55 rows=1 width=193) (actual time=347.665..940.582 rows=31 loops=1)
                       -> Nested Loop (cost=493.09..688.49 rows=1 width=40) (actual time=331.446..505.628 rows=31 loops=1)
                            Join Filter: ("inner".id_user = "outer".id_user)
                            -> Index Scan using user_login_login_key on user_login ul (cost=0.00..4.00 rows=2 width=16) (actual time=12.065..12.071 rows=1 loops=1)
                                 Index Cond: ((login)::text = 'babinow1'::text)
                            -> Materialize (cost=493.09..531.37 rows=7656 width=28) (actual time=167.654..481.813 rows=8363 loops=1)
                                 -> Seq Scan on sat_request sr (cost=0.00..493.09 rows=7656 width=28) (actual time=167.644..467.344 rows=8363 loops=1)
                                      Filter: (request_time > (now() - '1 mon'::interval))
                       -> Index Scan using url_pkey on url u (cost=0.00..3.05 rows=1 width=161) (actual time=13.994..14.000 rows=1 loops=31)
                            Index Cond: ("outer".id_url = u.id_url)
                  -> Subquery Scan vsp (cost=985.73..1016.53 rows=1103 width=12) (actual time=25.328..1668.754 rows=493 loops=31)
                       -> Merge Join (cost=985.73..1011.01 rows=1103 width=130) (actual time=25.321..1666.666 rows=493 loops=31)
                            Merge Cond: ("outer".id_program = "inner".id_program)
                            -> Sort (cost=20.74..20.97 rows=93 width=19) (actual time=0.385..0.431 rows=47 loops=31)
                                 Sort Key: programs.id_program
                                 -> Seq Scan on programs (cost=0.00..17.70 rows=93 width=19) (actual time=0.022..11.709 rows=48 loops=1)
                                      Filter: (id_program <> 0)
                            -> Sort (cost=964.99..967.75 rows=1102 width=115) (actual time=14.592..15.218 rows=493 loops=31)
                                 Sort Key: sequences.id_program
                                 -> Merge Join (cost=696.16..909.31 rows=1102 width=115) (actual time=79.717..451.495 rows=493 loops=1)
                                      Merge Cond: ("outer".id_package = "inner".id_package)
                                      -> Merge Left Join (cost=0.00..186.59 rows=1229 width=103) (actual time=0.101..366.854 rows=1247 loops=1)
                                           Merge Cond: ("outer".id_package = "inner".id_package)
                                           -> Index Scan using packages_pkey on packages p (cost=0.00..131.04 rows=1229 width=103) (actual time=0.048..163.503 rows=1247 loops=1)
                                           -> Index Scan using package_security_id_package_key on package_security ps (cost=0.00..46.83 rows=855 width=4) (actual time=0.022..178.599 rows=879 loops=1)
                                      -> Sort (cost=696.16..705.69 rows=3812 width=16) (actual time=79.582..79.968 rows=493 loops=1)
                                           Sort Key: sequences.id_package
                                           -> Seq Scan on sequences (cost=0.00..469.42 rows=3812 width=16) (actual time=0.012..78.863 rows=493 loops=1)
Re: [PERFORM] bad plan
Gaetano Mendola wrote:
> running a 7.4.5 engine, I'm facing this bad plan:
>
> empdb=# explain analyze SELECT name,url,descr,request_status,url_status,size_mb,estimated_start,request_time_stamp
> empdb-# FROM v_sc_user_request
> empdb-# WHERE
> empdb-#    login = 'babinow1'
> empdb-# LIMIT 10 ;
>
> -> Subquery Scan vsp (cost=985.73..1016.53 rows=1103 width=12) (actual time=25.328..1668.754 rows=493 loops=31)
>      -> Merge Join (cost=985.73..1011.01 rows=1103 width=130) (actual time=25.321..1666.666 rows=493 loops=31)
>           Merge Cond: ("outer".id_program = "inner".id_program)

The problem to address is in this subquery. That's a total of 31 x (1668.754 - 25.328) = 50 seconds (about).

Since your query is so simple, I'm guessing v_sc_user_request is a view. Can you provide the definition?

--
Richard Huxton
Archonet Ltd
[PERFORM] pl/pgsql faster than raw SQL?
Hi, I have the following strange situation:

oocms=# vacuum full analyze;
VACUUM
oocms=# \df+ class_get_number_of_objects
                                 List of functions
 Schema | Name                        | Result data type | Argument data types | Owner | Language | Source code | Description
--------+-----------------------------+------------------+---------------------+-------+----------+-------------+-------------
 oocms  | class_get_number_of_objects | integer          | text                | oocms | plpgsql  |
DECLARE
    arg_class_name ALIAS FOR $1;
BEGIN
    IF arg_class_name IS NULL THEN
        RAISE WARNING 'class_get_number_of_objects() with NULL class name called';
        RETURN NULL;
    END IF;
    RETURN count(1) FROM objects WHERE class = arg_class_name;
END;
 | Return the number of existing or deleted objects of a class. Arguments: the name of the class
(1 row)

oocms=# explain analyze select count(1) from objects where class = 'Picture';
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Aggregate (cost=278.16..278.16 rows=1 width=0) (actual time=44.121..44.123 rows=1 loops=1)
   -> Seq Scan on objects (cost=0.00..267.65 rows=4205 width=0) (actual time=0.030..33.325 rows=4308 loops=1)
        Filter: ("class" = 'Picture'::text)
 Total runtime: 44.211 ms
(4 rows)

oocms=# explain analyze select class_get_number_of_objects('Picture');
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Result (cost=0.00..0.01 rows=1 width=0) (actual time=27.019..27.022 rows=1 loops=1)
 Total runtime: 27.062 ms
(2 rows)

I.e. a function takes 27 ms to do what takes an equivalent piece of sql 44 ms. How can this be explained?

Some more info:

oocms=# select class_get_number_of_objects('Picture');
 class_get_number_of_objects
-----------------------------
                        4308
(1 row)

oocms=# select count(1) from objects;
 count
-------
 13332
(1 row)

oocms=# \d objects
                               Table "oocms.objects"
  Column   |           Type           |                        Modifiers
-----------+--------------------------+----------------------------------------------------------------
 object_id | integer                  | not null default nextval('oocms.objects_object_id_seq'::text)
 class     | text                     | not null
 created   | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone
Indexes:
    "objects_pkey" PRIMARY KEY, btree (object_id)
    "fo" btree ("class")
Foreign-key constraints:
    "objects_class_fkey" FOREIGN KEY ("class") REFERENCES classes(name) ON UPDATE CASCADE

--
Markus Bertheau <[EMAIL PROTECTED]>
Re: [PERFORM] bad plan
Richard Huxton wrote:
> Gaetano Mendola wrote:
>
>> running a 7.4.5 engine, I'm facing this bad plan:
>>
>> empdb=# explain analyze SELECT
>> name,url,descr,request_status,url_status,size_mb,estimated_start,request_time_stamp
>> empdb-# FROM v_sc_user_request
>> empdb-# WHERE
>> empdb-#    login = 'babinow1'
>> empdb-# LIMIT 10 ;
>
>> -> Subquery Scan vsp (cost=985.73..1016.53 rows=1103 width=12) (actual time=25.328..1668.754 rows=493 loops=31)
>>      -> Merge Join (cost=985.73..1011.01 rows=1103 width=130) (actual time=25.321..1666.666 rows=493 loops=31)
>>           Merge Cond: ("outer".id_program = "inner".id_program)
>
> The problem to address is in this subquery. That's a total of 31 x
> (1668.754 - 25.328) = 50 seconds (about).
>
> Since your query is so simple, I'm guessing v_sc_user_request is a view.
> Can you provide the definition?

Of course:

CREATE OR REPLACE VIEW v_sc_user_request AS
  SELECT *
  FROM v_sat_request vsr LEFT OUTER JOIN v_sc_packages vsp
       USING ( id_package )
  WHERE vsr.request_time > now() - '1 month'::interval AND
        vsr.expired = FALSE
  ORDER BY id_sat_request DESC;

CREATE OR REPLACE VIEW v_sc_packages AS
  SELECT *
  FROM v_programs vpr, v_packages vpk, v_sequences vs
  WHERE
     ------------ JOIN ------------
     vpr.id_program = vs.id_program AND
     vpk.id_package = vs.id_package AND
     ------------------------------
     vs.estimated_start IS NOT NULL;

CREATE OR REPLACE VIEW v_sat_request AS
  SELECT *
  FROM sat_request sr, url u, user_login ul
  WHERE
     ------------ JOIN ------------
     sr.id_url = u.id_url AND
     sr.id_user = ul.id_user
     ------------------------------
  ;

That column "expired" was added just yesterday.

Regards
Gaetano Mendola
Re: [PERFORM] pl/pgsql faster than raw SQL?
Markus Bertheau wrote:
> oocms=# explain analyze select count(1) from objects where class = 'Picture';
>  Aggregate (cost=278.16..278.16 rows=1 width=0) (actual time=44.121..44.123 rows=1 loops=1)
>    -> Seq Scan on objects (cost=0.00..267.65 rows=4205 width=0) (actual time=0.030..33.325 rows=4308 loops=1)
>         Filter: ("class" = 'Picture'::text)
>  Total runtime: 44.211 ms
> (4 rows)
>
> oocms=# explain analyze select class_get_number_of_objects('Picture');
>  Result (cost=0.00..0.01 rows=1 width=0) (actual time=27.019..27.022 rows=1 loops=1)
>  Total runtime: 27.062 ms
> (2 rows)

Well, you're saving planning time with the plpgsql version, but that's not going to come to 17ms (you'd hope). The EXPLAIN will take up time itself, and it can look deeper into the SQL version. Try timing two scripts with 100 of each and see if they really differ by that much.

--
Richard Huxton
Archonet Ltd
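For example, something along these lines (a sketch, not from the original thread - the file names are illustrative, and a plain shell loop is used so it works on any PostgreSQL version):

$ for i in `seq 1 100`; do echo "SELECT class_get_number_of_objects('Picture');"; done > func.sql
$ for i in `seq 1 100`; do echo "SELECT count(1) FROM objects WHERE class = 'Picture';"; done > raw.sql
$ time psql oocms -f func.sql > /dev/null
$ time psql oocms -f raw.sql > /dev/null

Dividing each wall-clock time by 100 gives a per-execution figure that is free of EXPLAIN ANALYZE's instrumentation overhead.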
Re: [PERFORM] bad plan
Gaetano Mendola wrote:
> Richard Huxton wrote:
>> Gaetano Mendola wrote:
>>> running a 7.4.5 engine, I'm facing this bad plan:
>>>
>>> empdb=# explain analyze SELECT name,url,descr,request_status,url_status,size_mb,estimated_start,request_time_stamp
>>> empdb-# FROM v_sc_user_request
>>> empdb-# WHERE
>>> empdb-#    login = 'babinow1'
>>> empdb-# LIMIT 10 ;
>>>
>>> -> Subquery Scan vsp (cost=985.73..1016.53 rows=1103 width=12) (actual time=25.328..1668.754 rows=493 loops=31)
>>>      -> Merge Join (cost=985.73..1011.01 rows=1103 width=130) (actual time=25.321..1666.666 rows=493 loops=31)
>>>           Merge Cond: ("outer".id_program = "inner".id_program)
>>
>> The problem to address is in this subquery. That's a total of 31 x
>> (1668.754 - 25.328) = 50 seconds (about).
>>
>> Since your query is so simple, I'm guessing v_sc_user_request is a view.
>> Can you provide the definition?
>
> Of course:
>
> CREATE OR REPLACE VIEW v_sc_user_request AS
>   SELECT *
>   FROM v_sat_request vsr LEFT OUTER JOIN v_sc_packages vsp
>        USING ( id_package )
>   WHERE vsr.request_time > now() - '1 month'::interval AND
>         vsr.expired = FALSE
>   ORDER BY id_sat_request DESC;
>
> CREATE OR REPLACE VIEW v_sc_packages AS
>   SELECT *
>   FROM v_programs vpr, v_packages vpk, v_sequences vs
>   WHERE
>      ------------ JOIN ------------
>      vpr.id_program = vs.id_program AND
>      vpk.id_package = vs.id_package AND
>      ------------------------------
>      vs.estimated_start IS NOT NULL;
>
> CREATE OR REPLACE VIEW v_sat_request AS
>   SELECT *
>   FROM sat_request sr, url u, user_login ul
>   WHERE
>      ------------ JOIN ------------
>      sr.id_url = u.id_url AND
>      sr.id_user = ul.id_user
>      ------------------------------
>   ;

OK, so looking at the original EXPLAIN the order of processing seems to be:

1. v_sat_request is evaluated and filtered on login='...' (lines 7..15).
   This gives us 31 rows.
2. The left-join from v_sat_request to v_sc_packages is processed (lines 5..6).
   This involves the subquery scan on vsp (from line 16) where it seems to
   think the best idea is a merge join of programs to sequences.

So - I think we need to look at the performance of your view "v_sc_packages" and the views that it depends on. OK - can you reply to this with just the definitions of v_sc_packages and what it depends on, and we can have a look at that.

Do you need all these tables involved in this query? I don't think PG is smart enough to completely discard a join if it's not needed by the output. Thinking about it, I'm not sure you could safely.

--
Richard Huxton
Archonet Ltd
Re: [PERFORM] pl/pgsql faster than raw SQL?
Markus Bertheau wrote:
> Hi, I have the following strange situation:
> ...
> oocms=# explain analyze select count(1) from objects where class = 'Picture';
>  Aggregate (cost=278.16..278.16 rows=1 width=0) (actual time=44.121..44.123 rows=1 loops=1)
>    -> Seq Scan on objects (cost=0.00..267.65 rows=4205 width=0) (actual time=0.030..33.325 rows=4308 loops=1)
>         Filter: ("class" = 'Picture'::text)
>  Total runtime: 44.211 ms
> (4 rows)
>
> oocms=# explain analyze select class_get_number_of_objects('Picture');
>  Result (cost=0.00..0.01 rows=1 width=0) (actual time=27.019..27.022 rows=1 loops=1)
>  Total runtime: 27.062 ms
> (2 rows)
>
> I.e. a function takes 27 ms to do what takes an equivalent piece of sql 44 ms.
> How can this be explained? Some more info:

In explain analyze, there is a per-row overhead of 2 gettimeofday() calls. This is usually very low and hidden in I/O, but on queries where you go through a lot of rows, but things are cached in ram, it can show up. So the explain analyze is going deep into the SQL query. With a stored procedure, explain analyze only runs the procedure; it doesn't instrument the actual function. So you don't have that per-row overhead.

For an alternate, accurate view, try:

# \timing
# explain analyze select count(1) from objects where class = 'Picture';
# explain analyze select class_get_number_of_objects('Picture');

\timing will also give you the time it takes to run the query, but it doesn't instrument anything.

John
=:->
Re: [PERFORM] Tuning, configuration for 7.3.5 on a Sun E4500
Tsarevich,

> Analyze has been run on the database quite frequently during the
> course of us trying to figure out this performance issue. It is also
> a task that is crontabbed nightly.

Hmmm. Then you probably need to up the STATISTICS levels on the target columns, because PG is significantly mis-estimating the number of rows returned. That's done by:

ALTER TABLE {table} ALTER COLUMN {column} SET STATISTICS {number}

Generally, I find that if mis-estimation occurs, you need to raise statistics to at least 250.

Here's where I see the estimation issues with your EXPLAIN:

-> Index Scan using component_commercial_order_id_ix on component (cost=0.00..3.85 rows=1 width=28) (actual time=0.17..0.18 rows=1 loops=46376)
     Index Cond: (component.commercial_order_id = "outer".commercial_order_id)
     Filter: ((raised_dtm >= '2003-01-01 00:00:00'::timestamp without time zone) AND (raised_dtm <= '2005-01-01 23:59:59'::timestamp without time zone) AND ((component_type_id = 3) OR (component_type_id = 2) OR (component_type_id = 1)))

-> Index Scan using communication_component_id_ix on communication (cost=0.00..20.90 rows=16 width=8) (actual time=0.12..0.14 rows=1 loops=34638)
     Index Cond: (component_id = $0)

So it looks like you need to raise the stats on communication.component_id and on component.commercial_order_id, raised_dtm, and component_type_id. You also may want to consider a multi-column index on the last set.

BTW, if you have any kind of data update traffic at all, ANALYZE once a day is not adequate.

--
Josh Berkus
Aglio Database Solutions
San Francisco
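Concretely, that would look something like this (a sketch using the columns named above; 250 is the suggested floor, and ANALYZE must be re-run for the new targets to take effect):

ALTER TABLE communication ALTER COLUMN component_id SET STATISTICS 250;
ALTER TABLE component ALTER COLUMN commercial_order_id SET STATISTICS 250;
ALTER TABLE component ALTER COLUMN raised_dtm SET STATISTICS 250;
ALTER TABLE component ALTER COLUMN component_type_id SET STATISTICS 250;
ANALYZE communication;
ANALYZE component;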
Re: [PERFORM] bad plan
Richard Huxton wrote:
> OK, so looking at the original EXPLAIN the order of processing seems to be:
> 1. v_sat_request is evaluated and filtered on login='...' (lines 7..15)
> This gives us 31 rows
> 2. The left-join from v_sat_request to v_sc_packages is processed (lines 5..6)
> This involves the subquery scan on vsp (from line 16) where it seems to
> think the best idea is a merge join of programs to sequences.

Well, basically v_sc_packages depends on three other views that are just a simple interface to a plain table. If I execute a select only on this table I get reasonable execution times:

=== cpu_tuple_cost = 0.07

# explain analyze select * from v_sc_packages where id_package = 19628;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Nested Loop (cost=0.00..15.96 rows=1 width=131) (actual time=41.450..41.494 rows=1 loops=1)
   -> Nested Loop (cost=0.00..11.86 rows=1 width=116) (actual time=1.022..1.055 rows=1 loops=1)
        -> Nested Loop Left Join (cost=0.00..7.89 rows=1 width=104) (actual time=0.330..0.345 rows=1 loops=1)
             -> Index Scan using packages_pkey on packages p (cost=0.00..3.90 rows=1 width=104) (actual time=0.070..0.075 rows=1 loops=1)
                  Index Cond: (id_package = 19628)
             -> Index Scan using package_security_id_package_key on package_security ps (cost=0.00..3.91 rows=1 width=4) (actual time=0.232..0.237 rows=1 loops=1)
                  Index Cond: ("outer".id_package = ps.id_package)
        -> Index Scan using idx_sequences_id_package on sequences (cost=0.00..3.90 rows=1 width=16) (actual time=0.670..0.685 rows=1 loops=1)
             Index Cond: (19628 = id_package)
             Filter: (estimated_start IS NOT NULL)
   -> Index Scan using programs_pkey on programs (cost=0.00..4.02 rows=1 width=19) (actual time=0.078..0.086 rows=1 loops=1)
        Index Cond: (programs.id_program = "outer".id_program)
        Filter: (id_program <> 0)
 Total runtime: 42.650 ms
(14 rows)

=== cpu_tuple_cost = 0.01

# explain analyze select * from v_sc_packages where id_package = 19628;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Nested Loop (cost=0.00..15.54 rows=1 width=131) (actual time=25.062..69.977 rows=1 loops=1)
   -> Nested Loop (cost=0.00..11.56 rows=1 width=116) (actual time=5.396..50.299 rows=1 loops=1)
        -> Nested Loop Left Join (cost=0.00..7.71 rows=1 width=104) (actual time=5.223..32.842 rows=1 loops=1)
             -> Index Scan using packages_pkey on packages p (cost=0.00..3.84 rows=1 width=104) (actual time=0.815..7.235 rows=1 loops=1)
                  Index Cond: (id_package = 19628)
             -> Index Scan using package_security_id_package_key on package_security ps (cost=0.00..3.85 rows=1 width=4) (actual time=4.366..25.555 rows=1 loops=1)
                  Index Cond: ("outer".id_package = ps.id_package)
        -> Index Scan using idx_sequences_id_package on sequences (cost=0.00..3.84 rows=1 width=16) (actual time=0.147..17.422 rows=1 loops=1)
             Index Cond: (19628 = id_package)
             Filter: (estimated_start IS NOT NULL)
   -> Index Scan using programs_pkey on programs (cost=0.00..3.96 rows=1 width=19) (actual time=0.043..0.049 rows=1 loops=1)
        Index Cond: (programs.id_program = "outer".id_program)
        Filter: (id_program <> 0)
 Total runtime: 70.254 ms
(14 rows)

and I get the best with this:

=== cpu_tuple_cost = 0.001

# explain analyze select * from v_sc_packages where id_package = 19628;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Nested Loop (cost=0.00..15.48 rows=1 width=131) (actual time=2.516..2.553 rows=1 loops=1)
   -> Nested Loop (cost=0.00..7.78 rows=1 width=31) (actual time=1.439..1.457 rows=1 loops=1)
        -> Index Scan using idx_sequences_id_package on sequences (cost=0.00..3.83 rows=1 width=16) (actual time=0.442..0.450 rows=1 loops=1)
             Index Cond: (19628 = id_package)
             Filter: (estimated_start IS NOT NULL)
        -> Index Scan using programs_pkey on programs (cost=0.00..3.95 rows=1 width=19) (actual time=0.972..0.978 rows=1 loops=1)
             Index Cond: (programs.id_program = "outer".id_program)
             Filter: (id_program <> 0)
   -> Nested Loop Left Join (cost=0.00..7.68 rows=1 width=104) (actual time=0.110..0.125 rows=1 l
[PERFORM] index scan on =, but not < ?
I have two index questions. The first is about an issue that has been recently discussed, and I just wanted to be sure of my understanding. Functions like count(), max(), etc. will use sequential scans instead of index scans because the index doesn't know which rows are actually visible... is this correct?

Second: I created an index in a table with over 10 million rows. The index is on field x, which is a double. The following command, as I expected, results in an index scan:

=# explain select * from data where x = 0;
                               QUERY PLAN
-------------------------------------------------------------------------
 Index Scan using data_x_ix on data (cost=0.00..78.25 rows=19 width=34)
   Index Cond: (x = 0::double precision)
(2 rows)

But this command, in which the only difference is > instead of =, results in a sequential scan.

=# explain select * from data where x > 0;
                            QUERY PLAN
------------------------------------------------------------------
 Seq Scan on data (cost=0.00..1722605.20 rows=62350411 width=34)
   Filter: (x > 0::double precision)
(2 rows)

Why is this?

(This is with pg 8.0.1 on a PC running FC3 with 1GB ram... if it matters)
Re: [PERFORM] index scan on =, but not < ?
Your hypothesis about index usage of count() and max() is correct.

As for why you see index usage in your first example query and not your second: compare the number of rows in question. An index is extremely useful if 19 rows will be returned. But when 62350411 rows will be returned, you're talking about a substantial fraction of the table. A sequential scan will probably correctly be judged to be faster by the planner.

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Mar 8, 2005, at 12:35 PM, Rick Schumeyer wrote:
> I have two index questions. The first is about an issue that has been
> recently discussed, and I just wanted to be sure of my understanding.
> Functions like count(), max(), etc. will use sequential scans instead of
> index scans because the index doesn't know which rows are actually
> visible... is this correct?
>
> Second: I created an index in a table with over 10 million rows. The
> index is on field x, which is a double. The following command, as I
> expected, results in an index scan:
>
> =# explain select * from data where x = 0;
>  Index Scan using data_x_ix on data (cost=0.00..78.25 rows=19 width=34)
>    Index Cond: (x = 0::double precision)
> (2 rows)
>
> But this command, in which the only difference is > instead of =, results
> in a sequential scan.
>
> =# explain select * from data where x > 0;
>  Seq Scan on data (cost=0.00..1722605.20 rows=62350411 width=34)
>    Filter: (x > 0::double precision)
> (2 rows)
>
> Why is this?
>
> (This is with pg 8.0.1 on a PC running FC3 with 1GB ram... if it matters)
Re: [PERFORM] index scan on =, but not < ?
Rick Schumeyer wrote:
> I have two index questions. The first is about an issue that has been
> recently discussed, and I just wanted to be sure of my understanding.
> Functions like count(), max(), etc. will use sequential scans instead of
> index scans because the index doesn't know which rows are actually
> visible... is this correct?

Actually, index scans are chosen whenever the cost is expected to be cheaper than a sequential scan. This is generally at about < 10% of the total number of rows.

> Second: I created an index in a table with over 10 million rows. The
> index is on field x, which is a double. The following command, as I
> expected, results in an index scan:
>
> =# explain select * from data where x = 0;
>  Index Scan using data_x_ix on data (cost=0.00..78.25 rows=19 width=34)
>    Index Cond: (x = 0::double precision)
> (2 rows)

Since you have 10M rows, when it expects to get only 19 rows, it is much faster to use an index.

> But this command, in which the only difference is > instead of =, results
> in a sequential scan.
>
> =# explain select * from data where x > 0;
>  Seq Scan on data (cost=0.00..1722605.20 rows=62350411 width=34)
>    Filter: (x > 0::double precision)
> (2 rows)

Here, pg expects to find 62M rows (you must have significantly more than 10M rows). In this case a sequential scan is much faster than an indexed one, so that's what pg does.

> Why is this?
>
> (This is with pg 8.0.1 on a PC running FC3 with 1GB ram... if it matters)

If you think there is truly a performance problem, try attaching the results of "explain analyze", from which we might be able to tell you that your statistics are inaccurate (run vacuum analyze if you haven't).

John
=:->
Re: [PERFORM] index scan on =, but not < ?
That makes a lot of sense. Sure enough, if I change the query from

WHERE x > 0   (which returns a lot of rows)

to

WHERE x > 0 AND x < 1

I now get an index scan.

> As for why you see index usage in your first example query and not your
> second: compare the number of rows in question. An index is extremely
> useful if 19 rows will be returned. But when 62350411 rows will be
> returned, you're talking about a substantial fraction of the table. A
> sequential scan will probably correctly be judged to be faster by the
> planner.
Re: [PERFORM] index scan on =, but not < ?
On Tue, 8 Mar 2005, Rick Schumeyer wrote:

> =# explain select * from data where x = 0;
>  Index Scan using data_x_ix on data (cost=0.00..78.25 rows=19 width=34)
>    Index Cond: (x = 0::double precision)
>
> But this command, in which the only difference is > instead of =, results
> in a sequential scan.
>
> =# explain select * from data where x > 0;
>  Seq Scan on data (cost=0.00..1722605.20 rows=62350411 width=34)
>    Filter: (x > 0::double precision)
>
> Why is this?

That is because it's faster to execute the x>0 query with a seq. scan than an index scan. Postgresql is doing the right thing here.

Pg estimates that the first query will return 19 rows and that the second query will return 62350411 rows. To return 62350411 rows it's faster to just scan the table and not use the index.

--
/Dennis Björklund
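You can check the planner's judgment empirically if you like. A sketch (enable_seqscan is a standard planner switch; note that EXPLAIN ANALYZE really runs the query, so this will be slow over 62M rows):

SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM data WHERE x > 0;  -- pushed toward the index
SET enable_seqscan = on;
EXPLAIN ANALYZE SELECT * FROM data WHERE x > 0;  -- planner's own choice

If the planner is right, the second run's actual total time will beat the first.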
Re: [PERFORM] bad plan
Gaetano Mendola wrote:
> Richard Huxton wrote:
>> OK, so looking at the original EXPLAIN the order of processing seems to be:
>> 1. v_sat_request is evaluated and filtered on login='...' (lines 7..15)
>> This gives us 31 rows
>> 2. The left-join from v_sat_request to v_sc_packages is processed (lines 5..6)
>> This involves the subquery scan on vsp (from line 16) where it seems to
>> think the best idea is a merge join of programs to sequences.
>
> Well, basically v_sc_packages depends on three other views that are just
> a simple interface to a plain table. If I execute a select only on this
> table I get reasonable execution times:
>
> === cpu_tuple_cost = 0.07
>
> # explain analyze select * from v_sc_packages where id_package = 19628;
>  Nested Loop (cost=0.00..15.96 rows=1 width=131) (actual time=41.450..41.494 rows=1 loops=1)
>    -> Nested Loop (cost=0.00..11.86 rows=1 width=116) (actual time=1.022..1.055 rows=1 loops=1)
>         -> Nested Loop Left Join (cost=0.00..7.89 rows=1 width=104) (actual time=0.330..0.345 rows=1 loops=1)
>              -> Index Scan using packages_pkey on packages p (cost=0.00..3.90 rows=1 width=104) (actual time=0.070..0.075 rows=1 loops=1)
>                   Index Cond: (id_package = 19628)
>              -> Index Scan using package_security_id_package_key on package_security ps (cost=0.00..3.91 rows=1 width=4) (actual time=0.232..0.237 rows=1 loops=1)
>                   Index Cond: ("outer".id_package = ps.id_package)
>         -> Index Scan using idx_sequences_id_package on sequences (cost=0.00..3.90 rows=1 width=16) (actual time=0.670..0.685 rows=1 loops=1)
>              Index Cond: (19628 = id_package)
>              Filter: (estimated_start IS NOT NULL)
>    -> Index Scan using programs_pkey on programs (cost=0.00..4.02 rows=1 width=19) (actual time=0.078..0.086 rows=1 loops=1)
>         Index Cond: (programs.id_program = "outer".id_program)
>         Filter: (id_program <> 0)
>  Total runtime: 42.650 ms
> (14 rows)
>
> === cpu_tuple_cost = 0.01
>
> === cpu_tuple_cost = 0.001

I don't know what you think you're measuring, but it's nothing to do with the plans. If you look at the plans carefully, you'll see they're all the same. The "cost" numbers change because that's the parameter you're changing.

I'm not sure it makes sense to vary cpu_tuple_cost from 0.07 down to 0.001 - that's a factor of 70 difference. I might be tempted to halve or double it, but even then only after some serious testing.

--
Richard Huxton
Archonet Ltd
Re: [PERFORM] bad plan
Gaetano Mendola <[EMAIL PROTECTED]> writes:
>> Since your query is so simple, I'm guessing v_sc_user_request is a view.
>> Can you provide the definition?

> Of course:

I don't think you've told us the whole truth about the v_sc_packages view. The definition as given doesn't work at all (it'll have duplicate column names), but more to the point, if it were that simple then the planner would fold it into the parent query. The subquery scan node indicates that folding did not occur. The most likely reason for that is that there's an ORDER BY in the view. Putting ORDER BYs in views that you intend to use as components of other views is a bad practice from a performance perspective...

regards, tom lane
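The practical fix being hinted at looks like this - a sketch only, applied to the one ORDER BY visible in the definitions posted earlier in the thread (drop the sort from the component view, and sort where the view is actually consumed):

CREATE OR REPLACE VIEW v_sc_user_request AS
  SELECT *
  FROM v_sat_request vsr LEFT OUTER JOIN v_sc_packages vsp
       USING ( id_package )
  WHERE vsr.request_time > now() - '1 month'::interval AND
        vsr.expired = FALSE;        -- note: no ORDER BY here any more

-- and order at the point of use instead:
SELECT name,url,descr,request_status,url_status,size_mb,estimated_start,request_time_stamp
FROM v_sc_user_request
WHERE login = 'babinow1'
ORDER BY id_sat_request DESC
LIMIT 10;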
[PERFORM] Why would writes to pgsql_tmp bottleneck at 1mb/s?
PG Hackers,

What follows is iostat output from a TPC-H test on Solaris 10. The machine is creating indexes on a table which is 50G in size, so it needs to use pgsql_tmp for internal swapping:

      tty         md15          sd1           sd2           sd3          cpu
 tin tout   kps tps serv   kps tps serv   kps tps serv   kps tps serv  us sy wt id
 0 84 22526 12111 1024 150 00 5634 3371 30 8 0 61
 0 242 24004 13371 1024 150 00 6007 3551 33 8 0 59
 0 85 22687 12771 1024 150 00 5656 3221 31 8 0 62
 0 85 20876 10991 1024 290 00 5185 2921 28 7 0 64

md15 is WAL (pg_xlog). sd3 is PGDATA. sd1 is pgsql_tmp.

As you can see, we're getting a nice 23mb/s peak for WAL (thanks to forcedirectio) and database writes peak at 6mb/s. However, pgsql_tmp, which is being used heavily, hovers around 1mb/s, and never goes above 1.5mb/s. This seems to be throttling the whole system.

Any suggestions on why this should be? Do we have a performance bug in the pgsql_tmp code?

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] bad plan
Tom Lane wrote:
> Putting ORDER BYs in views that you intend to use as components of other
> views is a bad practice from a performance perspective...

There are also a lot of views involved here for very few output columns. Tom - is the planner smart enough to optimise-out unneeded columns from a SELECT * view if it's part of a join/subquery and you only use one or two columns?

Secondly, in the original plan we have:

-> Nested Loop Left Join (cost=1478.82..1716.37 rows=1 width=201) (actual time=3254.483..52847.064 rows=31 loops=1)

Now, we've got 31 rows instead of 1 here. The one side of the join ends up as:

-> Subquery Scan vsp (cost=985.73..1016.53 rows=1103 width=12) (actual time=25.328..1668.754 rows=493 loops=31)
     -> Merge Join (cost=985.73..1011.01 rows=1103 width=130) (actual time=25.321..1666.666 rows=493 loops=31)

Would I be right in thinking the planner doesn't materialise the subquery because it's expecting 1 loop not 31? If there were 1 row the plan would seem OK to me.

Is there any mileage in the idea of a "lazy" planner that keeps some alternative paths around in case they're needed? Or a reactive one that can re-plan nodes when assumptions turn out to be wrong?

--
Richard Huxton
Archonet Ltd
Re: [PERFORM] bad plan
Richard Huxton writes:
> There are also a lot of views involved here for very few output columns.
> Tom - is the planner smart enough to optimise-out unneeded columns from
> a SELECT * view if it's part of a join/subquery and you only use one or
> two columns?

If the view gets flattened, yes, but I believe that it's not bright enough to do so when it can't flatten the view. You could tell easily enough by looking at the row-width estimates at various levels of the plan. (Let's see ... in Gaetano's plan the SubqueryScan is returning 12-byte rows where its input MergeJoin is returning 130-byte rows, so sure enough the view is computing a lot of stuff that then gets thrown away.)

> Would I be right in thinking the planner doesn't materialise the
> subquery because it's expecting 1 loop not 31? If there were 1 row the
> plan would seem OK to me.

Right; it doesn't see any predicted gain from the extra cost of materializing. But to me the main problem here is not that, it is that the entire shape of the plan would likely be different if it weren't for the "optimization fence" that the Subquery Scan node represents. I suspect too that the use of mergejoin as opposed to anything else within the vsp subplan is driven more by the need to produce sorted output than by what is the cheapest way to get the rows.

regards, tom lane
[PERFORM] Query Optimization
All,

I hope that this is the right place to post. I am relatively new to PostgreSQL (i.e., < 1 year in coding) and am just starting to delve into the issues of query optimization. I have hunted around the web for the basics of query optimization, but I have not had much success in interpreting the documents. I have also been trying to learn the basics of the EXPLAIN command, also without much success, but I will keep trying.

Anyway, here is what the system reports on the following command:

EXPLAIN
SELECT a.country_code, a.state_county_fips, icell, jcell, a.beld3_species_id, pollutant_code,
       SUM(b.ratio * d.emissions_factor * a.percent_ag * e.ag_fraction * 1) as normalized_emissions
FROM "globals"."biogenic_beld3_data" a,
     "spatial"."tmpgrid" b,
     "globals"."biogenic_emissions_factors" d,
     "globals"."biogenic_beld3_ag_data" e
WHERE a.beld3_icell=b.b_icell AND
      a.beld3_jcell=b.b_jcell AND
      a.country_code=e.country_code AND
      a.state_county_fips=e.state_county_fips AND
      a.beld3_species_id=d.beld3_species_id AND
      a.ag_forest_records > 0 AND
      a.percent_ag > 0 AND
      d.emissions_factor > 0
GROUP BY a.country_code, a.state_county_fips, icell, jcell, a.beld3_species_id, pollutant_code
ORDER BY a.country_code, a.state_county_fips, icell, jcell, a.beld3_species_id, pollutant_code;

                                   QUERY PLAN
--------------------------------------------------------------------------------
 GroupAggregate (cost=65034.94..71110.50 rows=151889 width=73)
   -> Sort (cost=65034.94..65414.66 rows=151889 width=73)
        Sort Key: a.country_code, a.state_county_fips, b.icell, b.jcell, a.beld3_species_id, d.pollutant_code
        -> Hash Join (cost=33749.64..37412.88 rows=151889 width=73)
             Hash Cond: ("outer".beld3_species_id = "inner".beld3_species_id)
             -> Merge Join (cost=33728.84..35303.61 rows=37972 width=56)
                  Merge Cond: ((("outer".country_code)::text = "inner"."?column8?") AND (("outer".state_county_fips)::text = "inner"."?column9?"))
                  -> Index Scan using biogenic_beld3_ag_data_pk on biogenic_beld3_ag_data e (cost=0.00..806.68 rows=20701 width=26)
                  -> Sort (cost=33728.84..33741.67 rows=5131 width=45)
                       Sort Key: (a.country_code)::text, (a.state_county_fips)::text
                       -> Nested Loop (cost=0.00..33412.65 rows=5131 width=45)
                            -> Seq Scan on biogenic_beld3_data a (cost=0.00..3593.02 rows=5637 width=37)
                                 Filter: ((ag_forest_records > 0) AND (percent_ag > 0::numeric))
                            -> Index Scan using tmpgrid_pk on tmpgrid b (cost=0.00..5.27 rows=1 width=24)
                                 Index Cond: ((b.b_icell = "outer".beld3_icell) AND (b.b_jcell = "outer".beld3_jcell))
             -> Hash (cost=18.50..18.50 rows=920 width=21)
                  -> Seq Scan on biogenic_emissions_factors d (cost=0.00..18.50 rows=920 width=21)
                       Filter: (emissions_factor > 0::numeric)
(18 rows)

Firstly, I am frankly mystified on how to interpret all this. If anyone could point me to a document or two that will help me decipher it, I will greatly appreciate it.

Secondly, I have figured out that SEQ SCANs are typically bad. I am concerned that a SEQ SCAN is being performed on 'biogenic_beld3_data', which is the largest table in the query. I would rather have a SEQ SCAN performed on 'tmpgrid', which contains the keys that subset the data from 'biogenic_beld3_data.' Is this naive on my part?

Thirdly, I have run EXPLAIN on other queries that report back a GroupAggregate Cost= that runs in about 30 minutes on my relatively high-end linux machine. But when I run this particular query, it takes on the order of 90 minutes to complete. Any thoughts on why this happens will be appreciated.

Finally, if anyone can be so kind as to provide insight on how to better optimize this query, I will, again, be deeply grateful.

Thanks in advance.
terrakit
Re: [PERFORM] Why would writes to pgsql_tmp bottleneck at 1mb/s?
People:

> As you can see, we're getting a nice 23mb/s peak for WAL (thanks to
> forcedirectio) and database writes peak at 6mb/s. However, pgsql_tmp,
> which is being used heavily, hovers around 1mb/s, and never goes above
> 1.5mb/s. This seems to be throttling the whole system.

Never mind, I'm a dork. I accidentally cut the "SET maintenance_work_mem = 200" out of my config file, and it was running with the default 1024K.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] Why would writes to pgsql_tmp bottleneck at 1mb/s?
Hmmm.

> > As you can see, we're getting a nice 23mb/s peak for WAL (thanks to
> > forcedirectio) and database writes peak at 6mb/s. However, pgsql_tmp,
> > which is being used heavily, hovers around 1mb/s, and never goes above
> > 1.5mb/s. This seems to be throttling the whole system.
>
> Never mind, I'm a dork. I accidentally cut the "SET maintenance_work_mem
> = 200" out of my config file, and it was running with the default 1024K

Maybe I'm not an idiot (really!): even with almost 2GB of maintenance_mem, PG still writes to pgsql_tmp no faster than 2MB/s. I think there may be an artificial bottleneck there. Question is: PostgreSQL, OS or hardware?

Suggestions?

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] bad plan
Tom Lane wrote:
> Gaetano Mendola <[EMAIL PROTECTED]> writes:
>
>>> Since your query is so simple, I'm guessing v_sc_user_request is a view.
>>> Can you provide the definition?
>
>> Of course:
>
> I don't think you've told us the whole truth about the v_sc_packages
> view. The definition as given doesn't work at all (it'll have
> duplicate column names), but more to the point, if it were that simple
> then the planner would fold it into the parent query. The subquery
> scan node indicates that folding did not occur. The most likely reason
> for that is that there's an ORDER BY in the view.

I didn't tell the complete truth: the view definitions are long, so I omitted all the fields.

explain analyze SELECT name,url,descr,request_status,url_status,size_mb,estimated_start,request_time_stamp
FROM v_sc_user_request
WHERE login = 'babinow1'
LIMIT 10 ;

These are the complete definitions of the views involved in the query:

CREATE OR REPLACE VIEW v_sc_user_request AS
  SELECT
     vsr.id_sat_request        AS id_sat_request,
     vsr.id_user               AS id_user,
     vsr.login                 AS login,
     vsr.url                   AS url,
     vsr.name                  AS name,
     vsr.descr                 AS descr,
     vsr.size                  AS size,
     trunc(vsr.size/1024.0/1024.0, 2) AS size_mb,
     vsr.id_sat_request_status AS id_sat_request_status,
     sp_lookup_key('sat_request_status', vsr.id_sat_request_status)   AS request_status,
     sp_lookup_descr('sat_request_status', vsr.id_sat_request_status) AS request_status_descr,
     vsr.id_url_status         AS id_url_status,
     sp_lookup_key('url_status', vsr.id_url_status)   AS url_status,
     sp_lookup_descr('url_status', vsr.id_url_status) AS url_status_descr,
     vsr.url_time_stamp        AS url_time_stamp,
     date_trunc('seconds', vsr.request_time) AS request_time_stamp,
     vsr.id_package            AS id_package,
     COALESCE(date_trunc('seconds', vsp.estimated_start)::text, 'NA') AS estimated_start
  FROM v_sat_request vsr LEFT OUTER JOIN v_sc_packages vsp
       USING ( id_package )
  WHERE vsr.request_time > now() - '1 month'::interval AND
        vsr.expired = FALSE
  ORDER BY id_sat_request DESC;

CREATE OR REPLACE VIEW v_sat_request AS
  SELECT
     sr.id_user                AS id_user,
     ul.login                  AS login,
     sr.id_sat_request         AS id_sat_request,
     u.id_url                  AS id_url,
     u.url                     AS url,
     u.name                    AS name,
     u.descr                   AS descr,
     u.size                    AS size,
     u.storage                 AS storage,
     sr.id_package             AS id_package,
     sr.id_sat_request_status  AS id_sat_request_status,
     sr.request_time           AS request_time,
     sr.work_time              AS request_work_time,
     u.id_url_status           AS id_url_status,
     u.time_stamp              AS url_time_stamp,
     sr.expired                AS expired
  FROM sat_request sr, url u, user_login ul
  WHERE
     ------------ JOIN ------------
     sr.id_url = u.id_url AND
     sr.id_user = ul.id_user
     ------------------------------
  ;

CREATE OR REPLACE VIEW v_sc_packages AS
  SELECT
     vpr.id_program            AS id_program,
     vpr.name                  AS program_name,
     vpk.id_package            AS id_package,
     date_trunc('seconds', vs.estimated_start) AS estimated_start,
     vpk.name                  AS package_name,
     vpk.TYPE                  AS TYPE,
     vpk.description           AS description,
     vpk.target                AS target,
     vpk.fec                   AS fec_alg,
     vpk.output_group - vpk.input_group AS fec_redundancy,
     vpk.priority              AS priority,
     vpk.updatable             AS updatable,
     vpk.auto_listen           AS auto_listen,
     vpk.start_file            AS start_file,
     vpk.view_target_group     AS view_target_group,
     vpk.target_group          AS target_group
  FROM v_programs vpr, v_packages vpk, v_sequences vs
  WHERE
Re: [PERFORM] pl/pgsql faster than raw SQL?
Markus Bertheau wrote:
> Hi, I have the following strange situation:

That is not so strange. I have an example where:

SELECT * FROM my_view WHERE field1 = 'New';  ==> 800 seconds
SELECT * FROM my_view;                       ==> 2 seconds

The only solution I had was to write a table function with the second select in a loop, returning the row if field1 was equal to 'New'. It's strange, but it happens.

Regards
Gaetano Mendola
[PERFORM] vacuum full, why multiple times ?
Hi all,
this is the third email that I post but I do not see it in the archives; the email was too long, I believe, so this time I will limit the rows.

Basically I'm noticing that a simple vacuum full is not enough to shrink the table completely:

# vacuum full verbose url;
INFO: vacuuming "public.url"
INFO: "url": found 268392 removable, 21286 nonremovable row versions in 8563 pages
DETAIL: 22 dead row versions cannot be removed yet.
Nonremovable row versions range from 104 to 860 bytes long.
There were 13924 unused item pointers.
Total free space (including removable row versions) is 63818404 bytes.
4959 pages are or will become empty, including 7 at the end of the table.
8296 pages containing 63753840 free bytes are potential move destinations.
CPU 0.33s/0.12u sec elapsed 9.55 sec.
[SNIPPED]
INFO: "url": moved 2 row versions, truncated 8563 to 8550 pages

and after 4 vacuum fulls:

empdb=# vacuum full verbose url;
INFO: vacuuming "public.url"
INFO: "url": found 13 removable, 21264 nonremovable row versions in 8504 pages
DETAIL: 0 dead row versions cannot be removed yet.
Nonremovable row versions range from 104 to 860 bytes long.
There were 280528 unused item pointers.
Total free space (including removable row versions) is 63349188 bytes.
4913 pages are or will become empty, including 0 at the end of the table.
8234 pages containing 63340628 free bytes are potential move destinations.
CPU 0.17s/0.04u sec elapsed 0.49 sec.
[SNIPPED]
INFO: "url": moved 5666 row versions, truncated 8504 to 621 pages

Anyone know why? I had the same behaviour with a 46000-row table with 46000 pages! It was reduced to 3000 pages after 7 vacuum fulls.

Regards
Gaetano Mendola
Re: [PERFORM] Why would writes to pgsql_tmp bottleneck at 1mb/s?
> Maybe I'm not an idiot (really!) even with almost 2GB of maintenance_mem, PG
> still writes to pgsql_tmp no faster than 2MB/s. I think there may be an
> artificial bottleneck there. Question is, PostgreSQL, OS or hardware?

I'm curious: what is your cpu usage while this is happening? I've noticed similar slow index creation behaviour, but I did not make any connection to pgsql_tmp (because it was not on a separate partition).

I was indexing an oid field of a 700GB table and it took about four days on a 1.2GHz UltraSparcIII (solaris 9, 8GB core). I noticed that one CPU was pegged at near 100%, leading me to believe it was CPU bound.

Odd thing is that the same operation on a 2GHz Pentium IV box (Linux) on the same data took about a day. Truss showed that a great majority of that time was in userland.

-Aaron
Re: [PERFORM] pl/pgsql faster than raw SQL?
Gaetano Mendola wrote:
> Markus Bertheau wrote:
>> Hi, I have the following strange situation:
>
> That is not so strange. I have an example where:
>
> SELECT * FROM my_view WHERE field1 = 'New';  ==> 800 seconds
> SELECT * FROM my_view;                       ==> 2 seconds
>
> The only solution I had was to write a table function with the second
> select in a loop, returning the row if field1 was equal to 'New'.
> It's strange, but it happens.

That sounds more like you had bad statistics on the field1 column, which caused postgres to switch from a seqscan to an index scan, only there were so many rows with field1='New' that it actually would have been faster with a seqscan. Otherwise what you did is very similar to the "nested loop" of postgres, which it selects when appropriate.

The other issue with views is that depending on their definition, sometimes postgres can flatten them out and optimize the query, and sometimes it can't. ORDER BY is one of the big culprits for bad queries involving views.

John
=:->
Re: [PERFORM] Query Optimization
James G Wilkinson wrote:
> All,
> ...
> Firstly, I am frankly mystified on how to interpret all this. If anyone
> could point me to a document or two that will help me decipher it, I
> will greatly appreciate it.

I assume you have looked at:
http://www.postgresql.org/docs/8.0/static/performance-tips.html
and didn't find it helpful enough.

I'm not really sure what help you are asking for. Are you saying that this query is performing slowly and you want to speed it up? Or do you just want to understand how to interpret the output of explain?

> Secondly, I have figured out that SEQ SCANs are typically bad. I am
> concerned that a SEQ SCAN is being performed on 'biogenic_beld3_data',
> which is the largest table in the query. I would rather have a SEQ SCAN
> performed on 'tmpgrid', which contains the keys that subset the data
> from 'biogenic_beld3_data.' Is this naive on my part?

It depends how much data is being extracted. If you have 1,000,000 rows and only need 10, then an index scan is wonderful. If you need 999,999, then a sequential scan is much better (the break-even point is < 10%). From the explain, it thinks it is going to be needing 5,637 rows from biogenic_beld3_data; what is that portion relative to the total?

The values at least look like you've run vacuum analyze. Have you tried running "explain analyze" instead of just explain? Then you can see if the planner's estimates are accurate.

If you want some help to force it, you could try a subselect query. Something like:

select * from biogenic_beld3_data b
 where b.beld3_icell = (select b_icell from tmpgrid_pk)
   and b.beld3_jcell = (select b_jcell from tmpgrid_pk);

> Thirdly, I have run EXPLAIN on other queries that report back a
> GroupAggregate Cost= that runs in about 30 minutes on my relatively
> high-end linux machine. But when I run this particular query, it takes
> on the order of 90 minutes to complete. Any thoughts on why this
> happens will be appreciated.

Remember cost is in terms of page fetches, not in seconds. Probably it is just an issue of postgres mis-estimating the selectivity of one of your queries. Also, you have a fairly complex SUM occurring involving 4 multiplications on an estimated 150,000 rows. While that doesn't seem like it should take 90 minutes, it also isn't a trivial operation.

> Finally, if anyone can be so kind as to provide insight on how to better
> optimize this query, I will, again, be deeply grateful.
>
> Thanks in advance.
>
> terrakit

John
=:->
Re: [PERFORM] vacuum full, why multiple times ?
On Wed, Mar 09, 2005 at 02:02:13AM +0100, Gaetano Mendola wrote:

> Basically I'm noticing that a simple vacuum full is not enough to
> shrink the table completely:
>
> # vacuum full verbose url;
> INFO: vacuuming "public.url"
> INFO: "url": found 268392 removable, 21286 nonremovable row versions in 8563 pages
> DETAIL: 22 dead row versions cannot be removed yet.

How busy is the database? I'd guess that each time you run VACUUM, there are still open transactions that have visibility to the dead rows, so VACUUM doesn't touch them. Those transactions eventually complete, and eventually VACUUM FULL does what you're expecting. I don't know if that's the only possible cause, but I get results similar to yours if I have transactions open when I run VACUUM.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
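One way to look for such lingering transactions while the VACUUM runs - a sketch, assuming command-string tracking is enabled (stats_command_string = true) and you are connected as a superuser:

SELECT procpid, usename, current_query
FROM pg_stat_activity
WHERE current_query LIKE '<IDLE> in transaction%';

Any session sitting "<IDLE> in transaction" since before the VACUUM started can pin dead row versions and keep them nonremovable.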
[PERFORM] 64bit Opteron multi drive raid. Help with best config settings
Hi folks,

I'm about to start testing PG v8 on an Opteron 64bit box with 12GB RAM running RedHat, with a bunch of raided drives in the backend. Expecting 50GB of data per month (100GB+ initial load).

I do not see any example config settings. I have some MySql experience, and for it there are config settings for small or large server operations. Does PG have similar examples? (If so, they are well hidden - at least from a Google search.) If not, can any of you send me a typical config for such an environment?

I basically want to get a good setting so I can get good insert performance with low-volume selects. DB data consists of customer info and their call history. Lots of customers, lots of call history. Not that wide rows. Want to be able to insert at a rate of 1000's per sec. Should think that's possible.

Any help you folks can provide to optimize this in a shootout between it and MySql (we hope to move from Oracle) is appreciated.

Tx,
David
Re: [PERFORM] 64bit Opteron multi drive raid. Help with best config
On Tue, 2005-03-08 at 19:07 -0800, David B wrote:
> I do not see any example config settings. I have some MySql experience,
> and for it there are config settings for small or large server
> operations.

For starters, this might be helpful:
http://www.powerpostgresql.com/PerfList

Then this:
http://www.powerpostgresql.com/Downloads/annotated_conf_80.html
http://www.powerpostgresql.com/Downloads/annotated_conf_80.pdf

Someone else might have an example config for you.

HTH,
--
Karim Nassar
Department of Computer Science
Box 15600, College of Engineering and Natural Sciences
Northern Arizona University, Flagstaff, Arizona 86011
Office: (928) 523-5868 -=- Mobile: (928) 699-9221
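As a rough starting point for a 12GB-RAM, insert-heavy box like the one described - a sketch only, using 8.0-era units (shared_buffers, wal_buffers and effective_cache_size are in 8KB pages; maintenance_work_mem is in KB), and every number here is an assumption to be validated against your own workload:

# postgresql.conf excerpt (PostgreSQL 8.0) -- illustrative starting values
shared_buffers = 25000          # ~200MB
work_mem = 8192                 # 8MB per sort/hash operation
maintenance_work_mem = 262144   # 256MB, for index builds and VACUUM
checkpoint_segments = 32        # spread checkpoints out under heavy insert load
wal_buffers = 64
effective_cache_size = 1000000  # ~8GB; tells the planner about the OS cache

For raw insert rates in the 1000s per second, batching rows with COPY, or at least grouping many INSERTs per transaction, matters at least as much as any of these settings.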
Re: [PERFORM] index scan on =, but not < ?
On Tue, Mar 08, 2005 at 13:35:53 -0500, Rick Schumeyer <[EMAIL PROTECTED]> wrote:
> I have two index questions. The first is about an issue that has been
> recently discussed, and I just wanted to be sure of my understanding.
> Functions like count(), max(), etc. will use sequential scans instead of
> index scans because the index doesn't know which rows are actually
> visible... is this correct?

Not exactly. If the number of rows to be examined is on the order of 5% of the table, an index scan will probably be slower than a sequential scan. The visibility issue makes index scans slower in the case that the only columns of interest are in the index.

Another issue is that max could in theory use an index, but there isn't a mechanism for Postgres to know how to do this in general for aggregates where it is possible. There have been discussions in the past about how this could be done, but no one has done it yet.
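The well-known workaround for max() follows from that: rewrite it as an ORDER BY/LIMIT query, which the planner can satisfy by reading one tuple from the end of the index. A sketch against the "data" table from this thread, assuming the index on x exists:

-- equivalent to SELECT max(x) FROM data, but able to use the index on x
SELECT x FROM data ORDER BY x DESC LIMIT 1;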
Re: [PERFORM] Why would writes to pgsql_tmp bottleneck at 1mb/s?
Josh Berkus writes:
> Maybe I'm not an idiot (really!) even with almost 2GB of maintenance_mem, PG
> still writes to pgsql_tmp no faster than 2MB/s. I think there may be an
> artificial bottleneck there. Question is, PostgreSQL, OS or hardware?

AFAIR that's just fwrite() ...

regards, tom lane
Re: [PERFORM] Why would writes to pgsql_tmp bottleneck at 1mb/s?
Tom,

> > Maybe I'm not an idiot (really!) even with almost 2GB of
> > maintenance_mem, PG still writes to pgsql_tmp no faster than 2MB/s. I
> > think there may be an artificial bottleneck there. Question is,
> > PostgreSQL, OS or hardware?
>
> AFAIR that's just fwrite() ...

Well, are there any hacks to speed it up? It's about doubling the amount of time it takes to create an index on a very large table.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] Why would writes to pgsql_tmp bottleneck at 1mb/s?
Josh Berkus writes:
>> AFAIR that's just fwrite() ...
>
> Well, are there any hacks to speed it up? It's about doubling the amount of
> time it takes to create an index on a very large table.

Huh? Doubled compared to what?

regards, tom lane
Re: [PERFORM] index scan on =, but not < ?
On Tue, Mar 08, 2005 at 10:38:21PM -0600, Bruno Wolff III wrote:
> Not exactly. If the number of rows to be examined is on the order of 5%
> of the table, an index scan will probably be slower than a sequential
> scan. The visibility issue makes index scans slower in the case that

Shouldn't that be 50%?

--
Jim C. Nasby, Database Consultant [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
Re: [PERFORM] index scan on =, but not < ?
On Tue, Mar 08, 2005 at 22:55:19 -0600, "Jim C. Nasby" <[EMAIL PROTECTED]> wrote:
> On Tue, Mar 08, 2005 at 10:38:21PM -0600, Bruno Wolff III wrote:
> > Not exactly. If the number of rows to be examined is on the order of 5%
> > of the table, an index scan will probably be slower than a sequential
> > scan. The visibility issue makes index scans slower in the case that
>
> Shouldn't that be 50%?

No. When you are doing an index scan of a significant part of the table, you will fetch some heap pages more than once. You will also be fetching blocks out of order, so you will lose out on read-ahead optimization by the OS.

This assumes that you don't get a lot of cache hits on the heap pages. If a significant portion of the table is cached, then the trade-off point will be at a higher percentage of the table.
Re: [PERFORM] Why would writes to pgsql_tmp bottleneck at 1mb/s?
Tom,

> Huh? Doubled compared to what?

Compared to how much data writing I can do to the database when pgsql_tmp isn't engaged. In other words, when pgsql_tmp isn't being written, database writing is 9mb/s. When pgsql_tmp gets engaged, that drops to 4mb/s.

Alternatively, the WAL drive, which is the same hardware, will write at 10mb/s.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco