Re: [PERFORM] index usage on queries on inherited tables
On 04/27/2011 04:32 PM, Robert Haas wrote: In the first case, PostgreSQL evidently thinks that using the indexes will be slower than just ignoring them. You could find out whether it's right by trying it with enable_seqscan=off.

My point is that this is a problem specific to inherited tables. It should be obvious to postgres that few rows are being returned, but in the inherited-tables case it doesn't use indexes. This was just an example. On a 52 GB table, SELECT id FROM table ORDER BY id DESC LIMIT 1 returns instantly, but as soon as you declare a child table it tries to seq scan all the tables.

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
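A sketch of the usual workaround on versions that lack a Merge Append plan for inherited tables (the names "events" and "events_child" are hypothetical, not from the post): query each table's index separately and combine the per-table maxima by hand.

```sql
-- Each branch can use its own index on id; the outer max() picks the
-- winner across the parent (ONLY) and the child table.
SELECT max(id) FROM (
    SELECT max(id) AS id FROM ONLY events
    UNION ALL
    SELECT max(id) AS id FROM events_child
) AS per_table;
```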
[PERFORM] planner and worst case scenario
Here is my query, which returns one row:

SELECT f1, f2,
       (SELECT dfield FROM d WHERE d.ukey = f1)
  FROM m
 WHERE status IN (2) AND jid IN (17674)
 ORDER BY pkey DESC LIMIT 25 OFFSET 0;

Here was the really bad plan chosen. It didn't come back for a long while and had to be cancelled:

                                QUERY PLAN
--------------------------------------------------------------------------
 Limit  (cost=0.00..10493.05 rows=25 width=118)
   ->  Index Scan Backward using m_pkey on m  (cost=0.00..1883712.97 rows=4488 width=118)
         Filter: ((status = 2) AND (jid = 17674))
         SubPlan
           ->  Index Scan using d_pkey on d  (cost=0.00..3.83 rows=1 width=24)
                 Index Cond: (ukey = $0)
(6 rows)

After an ANALYZE the plan was much better:

                                QUERY PLAN
--------------------------------------------------------------------------
 Limit  (cost=22060.13..22060.19 rows=25 width=119)
   ->  Sort  (cost=22060.13..22067.61 rows=2993 width=119)
         Sort Key: serial
         ->  Index Scan using m_jid_uid_key on m  (cost=0.00..21887.32 rows=2993 width=119)
               Index Cond: (jid = 17674)
               Filter: (status = 2)
               SubPlan
                 ->  Index Scan using d_pkey on d  (cost=0.00..3.83 rows=1 width=24)
                       Index Cond: (ukey = $0)
(9 rows)

The thing is, since there was only 1 row in the (very big) table with that jid, the ANALYZE didn't include that row in the stats table, so I'm figuring a small random change made it choose the better query. Doing:

ALTER TABLE m ALTER jid SET STATISTICS 1000;

produced a much more accurate row guess:

                                QUERY PLAN
--------------------------------------------------------------------------
 Limit  (cost=2909.65..2909.71 rows=25 width=115)
   ->  Sort  (cost=2909.65..2910.64 rows=395 width=115)
         Sort Key: serial
         ->  Index Scan using m_jid_uid_key on m  (cost=0.00..2892.61 rows=395 width=115)
               Index Cond: (jid = 17674)
               Filter: (status = 2)
               SubPlan
                 ->  Index Scan using d_pkey on d  (cost=0.00..3.83 rows=1 width=24)
                       Index Cond: (ukey = $0)
(9 rows)

It seems the problem is that the pg planner goes for the plan with the lowest projected time but ignores the worst-case scenario. I think the odds of this problem happening again are lower since the SET STATISTICS, but I don't know what triggered the really bad plan in the first place.
Did pg think that, because so many rows would match, the LIMIT would be filled quickly, so that a more accurate (and lower) row estimate caused it to choose the better plan?
Re: [PERFORM] PostgreSQL performance in simple queries
Tom Lane wrote: * JDBC

With JDBC out of the core, I'm not sure why we still have a JDBC section in the core TODO. Speaking of which, why is the JDBC site so hard to find? For that matter, the new foundry can only be found through the news article on the front page.
[PERFORM] shared buffer size on linux
See http://kerneltrap.org/node/view/3148, about 40% down, under the header "2.6 -aa patchset, object-based reverse mapping". Does this mean that the more shared memory you use, the bigger the potential for a swap storm?
Re: [PERFORM] PostgreSQL performance in simple queries
Neil Conway wrote: PostgreSQL (< 7.5) won't consider using an indexscan when the predicate involves an integer literal and the column datatype is int2 or int8.

Is this fixed for 7.5? It isn't checked off on the TODO list at http://developer.postgresql.org/todo.php
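On the affected versions, the standard workaround (not spelled out in the post; table and column names here are hypothetical) is to cast or quote the literal so its type matches the column:

```sql
-- "WHERE int8col = 42" will not use the index on older releases,
-- because the bare literal is typed int4; casting or quoting it
-- lets the planner match the index:
SELECT * FROM t WHERE int8col = 42::int8;
SELECT * FROM t WHERE int8col = '42';
```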
Re: [PERFORM] Using LIKE expression problem..
Use the text_pattern_ops operator class when creating the index; see: http://www.postgresql.org/docs/7.4/static/indexes-opclass.html

Michael Ryan S. Puncia wrote: Sorry .. I am a newbie and I don't know :( How can I know that I am in the C locale? How can I change my database to use the C locale?

-----Original Message----- From: [EMAIL PROTECTED] On Behalf Of Christopher Kings-Lynne Sent: Wednesday, May 12, 2004 3:59 PM Subject: Re: [PERFORM] Using LIKE expression problem..

Are you in a non-C locale?

Chris

Michael Ryan S. Puncia wrote: Yes, I already did that, but with the same result .. LIKE uses a seq scan.

-----Original Message----- From: [EMAIL PROTECTED] On Behalf Of Christopher Kings-Lynne Sent: Wednesday, May 12, 2004 2:48 PM Subject: Re: [PERFORM] Using LIKE expression problem..

In the query plan it uses a seq scan rather than an index scan .. why? I have an index on lastname, firstname.

Have you run VACUUM ANALYZE; on the table recently?

Chris
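The advice above can be sketched as follows (the table "person" and its columns are assumed, not from the thread):

```sql
-- Check the database locale: a plain btree index only supports
-- left-anchored LIKE patterns under the C locale.
SHOW lc_collate;

-- In a non-C locale, build the index with text_pattern_ops so that
-- LIKE 'abc%' queries can use it:
CREATE INDEX person_lastname_idx ON person (lastname text_pattern_ops);
```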
Re: [PERFORM] cache table
scott.marlowe wrote: I think you might be interested in materialized views. You could create this as a materialized view which should be very fast to just select * from.

That seems to be the count table I envisioned. It just hides the details for me. It still has the problems of an extra UPDATE every time the data table is updated and generating a lot of dead tuples.
[PERFORM] cache table
I have a big table with some int fields. I frequently need to do queries like:

SELECT if2, count(*) FROM table WHERE if1 = 20 GROUP BY if2;

The problem is that this is slow and frequently requires a seqscan. I'd like to cache the results in a second table and update the counts with triggers, but this would (a) require another UPDATE for each INSERT/UPDATE, which would slow down adding and updating of data, and (b) produce a large amount of dead rows for vacuum to clear out.

It would also be nice if this small table could be locked into the pg cache somehow. It doesn't need to store its data on disk, because the counts can be regenerated from scratch anyway. So what is the best solution to this problem? I'm sure it must come up pretty often.
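A minimal sketch of the trigger-maintained count table the poster describes (all names hypothetical; note this carries exactly the extra-UPDATE and dead-tuple costs mentioned above, and the $$ quoting assumes a modern server):

```sql
-- Summary table keyed on the grouping columns.
CREATE TABLE counts (if1 int, if2 int, n bigint, PRIMARY KEY (if1, if2));

-- On each insert into the big table, bump (or create) the matching count.
CREATE FUNCTION bump_count() RETURNS trigger AS $$
BEGIN
    UPDATE counts SET n = n + 1 WHERE if1 = NEW.if1 AND if2 = NEW.if2;
    IF NOT FOUND THEN
        INSERT INTO counts VALUES (NEW.if1, NEW.if2, 1);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER bump_count_trig AFTER INSERT ON big_table
    FOR EACH ROW EXECUTE PROCEDURE bump_count();
```

The slow GROUP BY then becomes a cheap lookup: SELECT if2, n FROM counts WHERE if1 = 20;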
Re: [PERFORM] Insert only tables and vacuum performance
Tom Lane wrote: Joseph Shraibman [EMAIL PROTECTED] writes: I have a table that is never updated, only INSERTed into. Is there a way I can prevent vacuum wasting time on this table?

What makes you think vacuum is wasting much time on this table? AFAICS it will only update any unfixed hint bits ...

regards, tom lane

INFO:  found 0 removable, 12869411 nonremovable row versions in 196195 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 5 unused item pointers.
0 pages are entirely empty.
CPU 31.61s/4.53u sec elapsed 1096.83 sec.

It took 1096.83 seconds, and what did it accomplish? And what are hint bits?
[PERFORM] analyzer/planner and clustered rows
How does the analyzer/planner deal with rows clustered together? Does it just assume that if this col is clustered on, then the actual data will be clustered? What if the data in the table happens to be close together because it was inserted together originally?
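What the planner actually uses is the per-column physical-order correlation gathered by ANALYZE; it cannot tell whether that ordering came from CLUSTER or from ordered inserts. A hedged way to inspect it ("big_table" is an assumed name):

```sql
-- correlation near 1.0 or -1.0 means the rows are physically ordered
-- by that column (whatever the cause); values near 0 mean random order.
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'big_table';
```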
Re: [PERFORM] Insert only tables and vacuum performance
Joseph Shraibman wrote: I have a table that is never updated, only INSERTed into. Is there a way I can prevent vacuum wasting time on this table, besides vacuuming each table in the db by itself and omitting this table?

How feasible would it be to have a marker somewhere in pg that is set when a table is modified and cleared when vacuum runs, so that vacuum can skip tables whose marker is unset? Or, even better, an offset into the data table for the earliest deleted row: for a table where you update each row shortly after insert and then never touch it, vacuum could then skip most of the table (inserts are done at the end of the table, right?)
Re: [PERFORM] Help with query plan inconsistencies
I'm going to ask because someone else surely will: Do you regularly vacuum/analyze the database?

Woody Woodring wrote: Hello, I am using postgres 7.4.2 as a backend for geocode data for a mapping application. My question is: why can't I get consistent use of my indexes during a query? I tend to get a lot of seq scan results. I use a standard query:

SELECT lat, long, mac, status FROM (
    SELECT text(mac) AS mac, lat, long,
           CASE status WHEN 0 THEN 0 WHEN 1 THEN 1 ELSE -1 END AS status
    FROM cable_billing LEFT OUTER JOIN davic USING (mac)
    WHERE boxtype = 'd'
) AS foo
WHERE (long >= X1) AND (long <= X2) AND (lat >= Y1) AND (lat <= Y2)

where X1, X2, Y1, Y2 are the coordinates for the rectangle of the map viewing area.

QUERY PLANs #1 and #2 are from when I get a view from 10 miles out; sometimes it uses the index (#1) and most of the time not (#2). I do run into plans that seq scan both sides of the join. QUERY PLAN #3 is from when I view from 5 miles out, and I have a much greater chance of getting index scans (about 90% of the time). I have listed information about the database below.

cable_billing: ~500,000 rows, updated once per day
davic: ~500,000 rows, about 100 rows updated per minute

Any info or suggestions would be appreciated.
Woody

twc-ral-overview=# \d cable_billing
              Table "public.cable_billing"
     Column      |          Type          | Modifiers
-----------------+------------------------+-----------
 cable_billingid | integer                | not null
 mac             | macaddr                | not null
 account         | integer                |
 number          | character varying(10)  |
 address         | character varying(200) |
 region          | character varying(30)  |
 division        | integer                |
 franchise       | integer                |
 node            | character varying(10)  |
 lat             | numeric                |
 long            | numeric                |
 trunk           | character varying(5)   |
 ps              | character varying(5)   |
 fd              | character varying(5)   |
 le              | character varying(5)   |
 update          | integer                |
 boxtype         | character(1)           |
Indexes: cable_billing_pkey primary key btree (mac),
         cable_billing_account_index btree (account),
         cable_billing_lat_long_idx btree (lat, long),
         cable_billing_node_index btree (node),
         cable_billing_region_index btree (region)

twc-ral-overview=# \d davic
          Table "public.davic"
 Column  |         Type          | Modifiers
---------+-----------------------+-----------
 davicid | integer               | not null
 mac     | macaddr               | not null
 source  | character varying(20) |
 status  | smallint              |
 updtime | integer               |
 type    | character varying(10) |
 avail1  | integer               |
Indexes: davic_pkey primary key btree (mac)

twc-ral-overview=# vacuum analyze;
VACUUM
twc-ral-overview=# explain analyze SELECT lat, long, mac, status FROM (SELECT text(mac) as mac, lat, long, CASE status WHEN 0 THEN 0 WHEN 1 THEN 1 ELSE -1 END as status FROM cable_billing LEFT OUTER JOIN davic USING(mac) WHERE boxtype='d') AS foo WHERE (long >= -78.70723462816063) AND (long <= -78.53096764204116) AND (lat >= 35.5741118787) AND (lat <= 35.66366331376857);

                                QUERY PLAN #1
--------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..23433.18 rows=1871 width=34) (actual time=0.555..5095.434 rows=3224 loops=1)
   ->  Index Scan using cable_billing_lat_long_idx on cable_billing  (cost=0.00..12145.85 rows=1871 width=32) (actual time=0.431..249.931 rows=3224 loops=1)
         Index Cond: ((lat >= 35.5741118787) AND (lat <= 35.66366331376857) AND (long >= -78.70723462816063) AND (long <= -78.53096764204116))
         Filter: (boxtype = 'd'::bpchar)
   ->  Index Scan using davic_pkey on davic  (cost=0.00..6.01 rows=1 width=8) (actual time=1.476..1.480 rows=1 loops=3224)
         Index Cond: ("outer".mac = davic.mac)
 Total runtime: 5100.028 ms
(7 rows)

twc-ral-overview=# vacuum analyze;
VACUUM
twc-ral-overview=# explain analyze SELECT lat, long, mac, status FROM (SELECT text(mac) as mac, lat, long, CASE status WHEN 0 THEN 0 WHEN 1 THEN 1 ELSE -1 END as status FROM cable_billing LEFT OUTER JOIN davic USING(mac) WHERE boxtype='d') AS foo WHERE (long >= -78.87878592206046) AND (long <= -78.70220280717479) AND (lat >= 35.71703190638861) AND (lat <= 35.80658335998006);

                                QUERY PLAN #2
--------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..76468.90 rows=9223 width=34) (actual time=0.559..17387.427 rows=19997 loops=1)
   ->  Seq Scan on cable_billing  (cost=0.00..20837.76 rows=9223 width=32) (actual time=0.290..7117.799
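One knob that often stabilizes a borderline seqscan-vs-indexscan choice like this (a suggestion in the spirit of the SET STATISTICS fix seen in an earlier thread here, not advice from this thread) is to gather finer statistics on the columns the index condition uses:

```sql
-- Larger per-column histograms give the planner more accurate row
-- estimates for range conditions on lat/long; 1000 was the maximum
-- statistics target on releases of this vintage.
ALTER TABLE cable_billing ALTER COLUMN lat SET STATISTICS 1000;
ALTER TABLE cable_billing ALTER COLUMN long SET STATISTICS 1000;
ANALYZE cable_billing;
```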
[PERFORM] two separate queries run faster than queries ORed together
explain SELECT COUNT(u.ukey) FROM u, d WHERE d.ukey = u.ukey AND u.pkey = 260 AND (u.status = 3) AND NOT u.boolfield;

                                QUERY PLAN
--------------------------------------------------------------------------
 Aggregate  (cost=45707.84..45707.84 rows=1 width=4)
   ->  Nested Loop  (cost=0.00..45707.16 rows=273 width=4)
         ->  Seq Scan on usertable u  (cost=0.00..44774.97 rows=272 width=4)
               Filter: ((pkey = 260) AND (status = 3) AND (NOT boolfield))
         ->  Index Scan using d_pkey on d  (cost=0.00..3.41 rows=1 width=4)
               Index Cond: (d.ukey = "outer".ukey)

explain SELECT COUNT(u.ukey) FROM u, d WHERE d.ukey = u.ukey AND u.pkey = 260 AND (d.status = 3) AND NOT u.boolfield;

                                QUERY PLAN
--------------------------------------------------------------------------
 Aggregate  (cost=28271.38..28271.38 rows=1 width=4)
   ->  Nested Loop  (cost=0.00..28271.38 rows=1 width=4)
         ->  Seq Scan on d  (cost=0.00..28265.47 rows=1 width=4)
               Filter: (status = 3)
         ->  Index Scan using u_pkey on u  (cost=0.00..5.89 rows=1 width=4)
               Index Cond: (("outer".ukey = u.ukey) AND (u.pkey = 260))
               Filter: (NOT boolfield)

explain SELECT COUNT(u.ukey) FROM u, d WHERE d.ukey = u.ukey AND u.pkey = 260 AND (u.status = 3 OR d.status = 3) AND NOT u.boolfield;

                                QUERY PLAN
--------------------------------------------------------------------------
 Aggregate  (cost=128867.45..128867.45 rows=1 width=4)
   ->  Hash Join  (cost=32301.47..128866.77 rows=272 width=4)
         Hash Cond: ("outer".ukey = "inner".ukey)
         Join Filter: (("inner".status = 3) OR ("outer".status = 3))
         ->  Seq Scan on u  (cost=0.00..41215.97 rows=407824 width=6)
               Filter: ((pkey = 260) AND (NOT boolfield))
         ->  Hash  (cost=25682.98..25682.98 rows=1032998 width=6)
               ->  Seq Scan on d  (cost=0.00..25682.98 rows=1032998 width=6)

... so what do I do? It would be a real pain to rewrite this query to run twice and add the results up, especially since I don't always know beforehand which will be faster, given different values in the query.
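The manual rewrite the poster wants to avoid would look roughly like this (a sketch; it assumes ukey is unique within each table, so that UNION's duplicate removal gives the same count as the OR form):

```sql
-- Run the two fast single-condition queries and combine them;
-- UNION (not UNION ALL) removes ukeys matched by both branches so
-- they are not counted twice.
SELECT COUNT(*) FROM (
    SELECT u.ukey FROM u, d
    WHERE d.ukey = u.ukey AND u.pkey = 260
      AND u.status = 3 AND NOT u.boolfield
    UNION
    SELECT u.ukey FROM u, d
    WHERE d.ukey = u.ukey AND u.pkey = 260
      AND d.status = 3 AND NOT u.boolfield
) AS combined;
```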
Re: [PERFORM] two separate queries run faster than queries ORed together
Richard Huxton wrote: On Thursday 18 March 2004 21:21, Joseph Shraibman wrote:

explain SELECT COUNT(u.ukey) FROM u, d WHERE d.ukey = u.ukey AND u.pkey = 260 AND (u.status = 3 OR d.status = 3) AND NOT u.boolfield;

                                QUERY PLAN
--------------------------------------------------------------------------
 Aggregate  (cost=128867.45..128867.45 rows=1 width=4)
   ->  Hash Join  (cost=32301.47..128866.77 rows=272 width=4)
         Hash Cond: ("outer".ukey = "inner".ukey)
         Join Filter: (("inner".status = 3) OR ("outer".status = 3))
         ->  Seq Scan on u  (cost=0.00..41215.97 rows=407824 width=6)
               Filter: ((pkey = 260) AND (NOT boolfield))

There's your problem. For some reason it thinks it's getting 407,824 rows back from that filtered seq-scan. I take it that pkey is a primary key and is defined as being UNIQUE? If you actually did have several hundred thousand matches then a seq-scan might be sensible.

No, pkey is not the primary key in this case. The number of entries in u that have pkey 260 and not boolfield is 344706. The number of those that have status = 3 is 7. The total number of entries in d that have status = 3 is 4.

I'd start by analyze-ing the table in question,

That is done every night. The problem is that the planner doesn't seem to think of running the different parts of the OR separately and then combining the answers.
Re: [PERFORM] two separate queries run faster than queries ORed together
Tom Lane wrote: Joseph Shraibman [EMAIL PROTECTED] writes: No, pkey is not the primary key in this case. The number of entries in u that have pkey 260 and not boolfield is 344706.

... and every one of those rows *must* be included in the join input,

*If* you use one big join in the first place. If postgres first ran the query to get the values with status = 3 from u, then ran the query to get the entries from d, then combined them, the result would be the same but produced faster. Instead it is doing seq scans on both tables and an expensive join that returns only a few rows.
Re: [PERFORM] two separate queries run faster than queries ORed together
Stephan Szabo wrote: On Mon, 22 Mar 2004, Joseph Shraibman wrote: Tom Lane wrote: Joseph Shraibman [EMAIL PROTECTED] writes: No, pkey is not the primary key in this case. The number of entries in u that have pkey 260 and not boolfield is 344706.

... and every one of those rows *must* be included in the join input,

*If* you use one big join in the first place. If postgres first ran the query to get the values with status = 3 from u, then ran the query to get the entries from d, then combined them, the result would be the same but produced faster. Instead it is doing seq scans on both tables and ...

Well, you have to be careful in the combination not to give wrong answers if there's a row with u.status=3 that matches a row with d.status=3.

Right, you would have to avoid duplicates. The existing DISTINCT code should be able to handle that.
Re: [PERFORM] Postgresql on SAN
Josh Berkus wrote: See above. Also keep in mind that PostgreSQL's use of I/O should improve 100% in version 7.5.

Really? What happened?