Re: [PERFORM] query a table with lots of columns
On 19 September 2014 13:51, Björn Wittich bjoern_witt...@gmx.de wrote: Hi mailing list, I am relatively new to Postgres. I have a table with 500 columns and about 40 million rows. I call this a cache table, where one column is a unique key (indexed) and the other 499 columns (type integer) are values belonging to this key. Now I have a second (temporary) table (only 2 columns, one of them the key of my cache table) and I want to do an inner join between my temporary table and the large cache table and export all matching rows. I found out that performance increases when I split the join into lots of small parts. But it seems that the database needs a lot of disk I/O to gather all 499 data columns. Is there a possibility to tell the database that all these columns are always treated as a tuple and I always want to get the whole row? Perhaps the disk organization could then be optimized?

Hi, do you have indexes on the columns you use for joins? Szymon
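A sketch of the export described above, with hypothetical names (a cache_table keyed by id and a temp_keys table holding the keys to match; neither name is from the original thread):

```sql
-- Hypothetical schema: cache_table(id, c1 .. c499), temp_keys(id, val).
-- COPY streams the joined result straight to a file, avoiding a huge
-- client-side result set:
COPY (
    SELECT c.*
    FROM cache_table c
    JOIN temp_keys t USING (id)
) TO '/tmp/matched_rows.csv' WITH (FORMAT csv);

-- If splitting into small parts helps, batching on key ranges keeps
-- each pass over the wide table short:
SELECT c.*
FROM cache_table c
JOIN temp_keys t USING (id)
WHERE t.id BETWEEN 1 AND 100000;
```

Since whole rows are always fetched, CLUSTERing the cache table on its key index may also cut the random disk I/O, because matching rows then sit physically adjacent on disk.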
Re: [PERFORM] question about partial index
On 18 March 2014 22:26, Yu Zhao yzha...@gmail.com wrote: In PostgreSQL 9.3.3 Documentation 11.8. Partial Indexes Example 11-2 (http://www.postgresql.org/docs/9.3/interactive/indexes-partial.html), the partial index is created CREATE INDEX orders_unbilled_index ON orders (order_nr) WHERE billed is not true; And the suggested use mode is SELECT * FROM orders WHERE billed is not true AND order_nr < 10000; My question is: after an update to the billed column is done, will PG automatically add or remove records whose billed value was just set to false or true to/from the b-tree? Thanks in advance.

Hi, the short answer is: yes, it will work as you expect. The long answer is: no, it will not simply add/remove, because Postgres keeps many different versions of the same row. When you change the column from true to false, the new row version will be added to the index; when you change it from false to true, the previous row versions will still be stored in the index, because there could be some older transaction which should see an older version of the row. The mechanism is internal, and you shouldn't have to worry about it. As a database user you just see that the index is updated automatically, and it stores all rows where billed is not true. regards, Szymon
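The behaviour can be checked directly; a minimal sketch following the docs example (the orders table here is stripped down to the two relevant columns):

```sql
CREATE TABLE orders (
    order_nr integer PRIMARY KEY,
    billed   boolean
);

-- Partial index: only rows whose billed column is not true are indexed.
CREATE INDEX orders_unbilled_index ON orders (order_nr)
    WHERE billed IS NOT TRUE;

-- A query whose WHERE clause implies the index predicate can use it:
EXPLAIN SELECT * FROM orders
WHERE billed IS NOT TRUE AND order_nr < 10000;

-- After UPDATE orders SET billed = true WHERE ..., the old (unbilled)
-- row versions remain in the index until VACUUM removes them.
```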
Re: [PERFORM] SQL performance
On 2 June 2013 21:39, Robert DiFalco robert.difa...@gmail.com wrote: I have a table called contacts. It has a BIGINT owner_id which references a record in the user table. It also has a BIGINT user_id which may be null. Additionally it has a BOOLEAN blocked column to indicate if a contact is blocked. The final detail is that multiple contacts for an owner may reference the same user. I have a query to get all the user_ids of a non-blocked contact that is a mutual contact of the user. The important part of the table looks like this: CREATE TABLE contacts ( id BIGINT PRIMARY KEY NOT NULL, -- generated blocked BOOL, owner_id BIGINT NOT NULL, user_id BIGINT, FOREIGN KEY ( owner_id ) REFERENCES app_users ( id ) ON DELETE CASCADE, FOREIGN KEY ( user_id ) REFERENCES app_users ( id ) ON DELETE SET NULL ); CREATE INDEX idx_contact_owner ON contacts ( owner_id ); CREATE INDEX idx_contact_mutual ON contacts ( owner_id, user_id ) WHERE user_id IS NOT NULL AND NOT blocked; The query looks like this: explain analyze verbose select c.user_id from contact_entity c where c.owner_id=24 and c.user_id<>24 and c.user_id IS NOT NULL and NOT c.blocked and (exists ( select 1 from contact_entity c1 where NOT c1.blocked and c1.owner_id=c.user_id and c1.user_id IS NOT NULL and c1.user_id=24)) group by c.user_id; This will get all the users for user 24 that are mutual unblocked contacts, but exclude user 24 itself. I have run this through explain several times and I'm out of ideas on the index. I note that I can also write the query like this: explain analyze verbose select distinct c.user_id from contact_entity c left outer join contact_entity c1 on c1.owner_id = c.user_id and c1.user_id = c.owner_id where NOT c.blocked AND NOT c1.blocked AND c.owner_id = 24 AND c.user_id <> 24 AND c.user_id IS NOT NULL AND c1.user_id IS NOT NULL group by c.user_id; I don't notice a big difference in the query plans. I also notice no difference if I replace the GROUP BY with DISTINCT.
My question is: can this be tightened further in a way I haven't been creative enough to try? Does it matter if I use the EXISTS versus the OUTER JOIN, or the GROUP BY versus the DISTINCT? Is there a better index and I just have not been clever enough to come up with it yet? I've tried a bunch. Thanks in advance!! Robert

Hi Robert, could you show us the plans? thanks, Szymon
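One thing that may be worth trying (a sketch, not verified against the actual plans the thread never shows): an index whose leading column matches the inner probe of the EXISTS, so each mutual-contact check becomes a cheap index lookup:

```sql
-- Hypothetical index supporting the inner probe
--   WHERE c1.owner_id = c.user_id AND c1.user_id = 24 AND NOT c1.blocked
-- The existing idx_contact_mutual leads on owner_id, which fits the
-- outer scan but not the reversed (user_id, owner_id) lookup.
CREATE INDEX idx_contact_reverse
    ON contact_entity (user_id, owner_id)
    WHERE user_id IS NOT NULL AND NOT blocked;
```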
Re: [PERFORM] [planner] Ignore order by in subselect if parrent do count(*)
On 1 March 2012 13:02, Marcin Mirosław mar...@mejor.pl wrote: W dniu 01.03.2012 12:50, Szymon Guz pisze: Hi Szymon, If you have only 2 rows in the table, then the plan really doesn't matter too much. Sorting two rows would be really fast :) Try to check it with 10k rows. It doesn't matter (in this case) how many records are in the users_profile table. The planner does the sorting anyway. Here is a version with more rows: $ explain (analyze,verbose,buffers) SELECT count(*) from (select * from users_profile order by id) u_p; QUERY PLAN --- Aggregate (cost=1593639.92..1593639.93 rows=1 width=0) (actual time=11738.498..11738.498 rows=1 loops=1) Output: count(*) Buffers: shared hit=2499 read=41749 written=10595, temp read=17107 written=17107 - Sort (cost=1443640.26..1468640.21 rows=977 width=4) (actual time=9804.461..10963.911 rows=1000 loops=1) Output: users_profile.id Sort Key: users_profile.id Sort Method: external sort Disk: 136856kB Buffers: shared hit=2499 read=41749 written=10595, temp read=17107 written=17107 - Seq Scan on public.users_profile (cost=0.00..144247.77 rows=977 width=4) (actual time=0.021..1192.202 rows=1000 loops=1) Output: users_profile.id Buffers: shared hit=2499 read=41749 written=10595 Total runtime: 11768.199 ms (12 rows) And without order by: $ explain (analyze,verbose,buffers) SELECT count(*) from (select * from users_profile ) u_p; QUERY PLAN Aggregate (cost=169247.71..169247.72 rows=1 width=0) (actual time=1757.613..1757.613 rows=1 loops=1) Output: count(*) Buffers: shared hit=2522 read=41726 - Seq Scan on public.users_profile (cost=0.00..144247.77 rows=977 width=0) (actual time=0.032..946.166 rows=1000 loops=1) Output: users_profile.id Buffers: shared hit=2522 read=41726 Total runtime: 1757.656 ms (7 rows) Could you provide the postgres version and the structure of the users_profile table (with indexes)? - Szymon
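For what it's worth, while the planner (at least in the version discussed here) does not eliminate the inner sort itself, the workaround on the query side is trivial, since row order cannot affect an outer count(*):

```sql
-- The external sort below is wasted work: count(*) ignores row order.
SELECT count(*) FROM (SELECT * FROM users_profile ORDER BY id) u_p;

-- Equivalent result, without the sort:
SELECT count(*) FROM users_profile;
```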
Re: [PERFORM] Query tuning help
On 11 October 2011 19:52, CS DBA cs_...@consistentstate.com wrote: Hi all; I'm trying to tune a difficult query. I have 2 tables: cust_acct (9 million rows) cust_orders (200,000 rows) Here's the query: SELECT a.account_id, a.customer_id, a.order_id, a.primary_contact_id, a.status, a.customer_location_id, a.added_date, o.agent_id, o.order_location_id, COALESCE(a.customer_location_id, o.order_location_id) AS order_location_id FROM cust_acct a JOIN cust_orders o ON a.order_id = o.order_id; I can't get it to run much faster than about 13 seconds; in most cases it's more like 30 seconds. We have a box with 8 2.5GHz cores and 28GB of RAM; shared_buffers is at 8GB. I've tried separating the queries into filtering queries and joining the results, disabling seq scans, upping work_mem, and half a dozen other approaches. Here's the explain plan: Hash Join (cost=151.05..684860.30 rows=9783130 width=100) Hash Cond: (a.order_id = o.order_id) - Seq Scan on cust_acct a (cost=0.00..537962.30 rows=9783130 width=92) - Hash (cost=122.69..122.69 rows=2269 width=12) - Seq Scan on cust_orders o (cost=0.00..122.69 rows=2269 width=12) Thanks in advance for any help, tips, etc... Hi, two simple questions: - do you really need to get all 9M rows? - show us the table structure, together with index definitions regards Szymon
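If the full 9M-row result really is needed, a hash join over two sequential scans is already close to optimal; indexes only help once the query is filtered to a subset. A sketch using the names from the query (the WHERE clause and its value are invented for illustration):

```sql
-- Only useful when the query selects a subset of rows:
CREATE INDEX idx_cust_acct_order_id ON cust_acct (order_id);

-- Hypothetical filtered variant that can use the index:
SELECT a.account_id, a.customer_id, o.agent_id
FROM cust_acct a
JOIN cust_orders o ON a.order_id = o.order_id
WHERE o.agent_id = 42;   -- invented filter value
```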
Re: [PERFORM] postgresql query runtime
On 11 October 2011 21:08, Radhya sahal rad_cs_2...@yahoo.com wrote: Hi, I want to know how I can measure query runtime in PostgreSQL if I use command-line psql? Not EXPLAIN, but the runtime which appears in pgAdmin, such as "Total query runtime: 203 ms". run this in psql: \t regards Szymon
Re: [PERFORM] postgresql query runtime
On 11 October 2011 21:13, Szymon Guz mabew...@gmail.com wrote: run this in psql: \t yes... \timing of course... I think I shouldn't send emails when I've got a fever :) - Szymon
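For completeness, the corrected meta-command in a psql session looks like this (the sample query and its timing are illustrative only):

```sql
-- In psql:
\timing on
SELECT count(*) FROM pg_class;
-- psql then prints a line like:  Time: 1.234 ms
```

Note that \t (the command from the previous message) toggles tuples-only output instead; \timing is the one that reports per-query runtime.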
Re: [PERFORM] Which Join is better
On 2 August 2011 08:42, Adarsh Sharma adarsh.sha...@orkash.com wrote: Dear all, Just want to know which join is better for querying data faster. I have 2 tables: A ( 70 GB ) and B ( 7 MB ). A has 10 columns, B has 3 columns. Indexes exist on both tables' ids. select p.* from A p, B q where p.id=q.id or select p.* from B q, A p where q.id=p.id Hi, it really doesn't matter. PostgreSQL can reorder the joins as it likes. And you can always check, but I think the plans will be the same. regards Szymon
Re: [PERFORM] help speeding up a query in postgres 8.4.5
On 5 April 2011 21:25, Maria L. Wilson maria.l.wilso...@nasa.gov wrote: Would really appreciate someone taking a look at the query below Thanks in advance! this is on a linux box... Linux dsrvr201.larc.nasa.gov 2.6.18-164.9.1.el5 #1 SMP Wed Dec 9 03:27:37 EST 2009 x86_64 x86_64 x86_64 GNU/Linux explain analyze select MIN(IV.STRTDATE), MAX(IV.ENDDATE) from GRAN_VER GV left outer join INVENTORY IV on GV.GRANULE_ID = IV.GRANULE_ID, INVSENSOR INVS where IV.INV_ID='65' and GV.GRANULE_ID = INVS.granule_id and INVS.sensor_id='13' Aggregate (cost=736364.52..736364.53 rows=1 width=8) (actual time=17532.930..17532.930 rows=1 loops=1) - Hash Join (cost=690287.33..734679.77 rows=336949 width=8) (actual time=13791.593..17323.080 rows=924675 loops=1) Hash Cond: (invs.granule_id = gv.granule_id) - Seq Scan on invsensor invs (cost=0.00..36189.41 rows=1288943 width=4) (actual time=0.297..735.375 rows=1277121 loops=1) Filter: (sensor_id = 13) - Hash (cost=674401.52..674401.52 rows=1270865 width=16) (actual time=13787.698..13787.698 rows=1270750 loops=1) - Hash Join (cost=513545.62..674401.52 rows=1270865 width=16) (actual time=1998.702..13105.578 rows=1270750 loops=1) Hash Cond: (gv.granule_id = iv.granule_id) - Seq Scan on gran_ver gv (cost=0.00..75224.90 rows=4861490 width=4) (actual time=0.008..1034.885 rows=4867542 loops=1) - Hash (cost=497659.81..497659.81 rows=1270865 width=12) (actual time=1968.918..1968.918 rows=1270750 loops=1) - Bitmap Heap Scan on inventory iv (cost=24050.00..497659.81 rows=1270865 width=12) (actual time=253.542..1387.957 rows=1270750 loops=1) Recheck Cond: (inv_id = 65) - Bitmap Index Scan on inven_idx1 (cost=0.00..23732.28 rows=1270865 width=0) (actual time=214.364..214.364 rows=1270977 loops=1) Index Cond: (inv_id = 65) Total runtime: 17533.100 ms some additional info. the table inventory is about 4481 MB and also has postgis types. 
the table gran_ver is about 523 MB the table INVSENSOR is about 217 MB the server itself has 32G RAM with the following set in the postgres conf shared_buffers = 3GB work_mem = 64MB maintenance_work_mem = 512MB wal_buffers = 6MB let me know if I've forgotten anything! thanks a bunch!! Maria Wilson NASA/Langley Research Center Hampton, Virginia m.l.wil...@nasa.gov Hi, could you show us indexes that you have on all tables from this query? Have you tried running vacuum analyze on those tables? Do you have autovacuum active? regards Szymon
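Without seeing the existing index definitions, a couple of candidates that match the filters in this plan (index names are invented; the column list is inferred from the seq scan's filter and join condition):

```sql
-- The seq scan on invsensor filters on sensor_id; a composite index
-- also covers the granule_id join column:
CREATE INDEX invsensor_sensor_granule_idx
    ON invsensor (sensor_id, granule_id);

-- Refresh statistics so the planner sees current row counts:
VACUUM ANALYZE invsensor;
VACUUM ANALYZE inventory;
VACUUM ANALYZE gran_ver;
```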
Re: [PERFORM] Running PostgreSQL as fast as possible no matter the consequences
On 5 November 2010 11:59, A B gentosa...@gmail.com wrote: Hi there. If you just wanted PostgreSQL to go as fast as possible WITHOUT any care for your data (you accept 100% data loss and data corruption if any error should occur), what settings should you use then? I'm just curious, what do you need that for? regards Szymon
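For reference, the usual "speed over safety" knobs people mean by this question are along these lines (a crash can lose or corrupt data with these settings, so never use them for data you care about; ALTER SYSTEM needs a modern server version, on older ones edit postgresql.conf directly):

```sql
-- DANGEROUS: durability is sacrificed entirely.
ALTER SYSTEM SET fsync = off;
ALTER SYSTEM SET synchronous_commit = off;
ALTER SYSTEM SET full_page_writes = off;
SELECT pg_reload_conf();
```

Another common trick in the same spirit is creating throwaway tables as UNLOGGED, which skips WAL for those tables without endangering the rest of the cluster.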
Re: [PERFORM] which one is faster
On 26 October 2010 12:56, AI Rumman rumman...@gmail.com wrote: Which one is faster? select count(*) from table or select count(id) from table, where id is the primary key. Check the query plan; both queries are the same. regards Szymon
Re: [PERFORM] which one is faster
2010/10/26 Marcin Mirosław mar...@mejor.pl W dniu 26.10.2010 12:59, Szymon Guz pisze: both queries are the same. IMHO they aren't the same, but they return the same value in this case. I mean count(field) doesn't count NULL values, while count(*) does. I'm writing this only as a note :) Regards Yup, indeed. I omitted that note, as it was written that the field is the primary key :). regards Szymon
Re: [PERFORM] which one is faster
2010/10/26 Grzegorz Jaśkiewicz gryz...@gmail.com Implementation-wise, count(*) is faster. Very easy to test: SELECT COUNT(*) FROM generate_series(1,100) a, generate_series(1,1000) b; SELECT COUNT(a) FROM generate_series(1,100) a, generate_series(1,1000) b; ;] Well, strange. Why is the count(a) one slower?
Re: [PERFORM] which one is faster
2010/10/26 Grzegorz Jaśkiewicz gryz...@gmail.com 2010/10/26 Szymon Guz mabew...@gmail.com: Well, strange. Why is that slower? To answer that fully, you would need to see the implementation. Suffice to say, count(a) does: if (a is not null) { count++; } and count(*) does: count++; Yup, I was afraid of that, even if there is a NOT NULL constraint on the column... but I think usually nobody notices the difference with count. regards Szymon
Re: [PERFORM] why index is not working in operation?
2010/7/22 AI Rumman rumman...@gmail.com I have a table. \d email_track Table public.email_track Column | Type | Modifiers +-+ crmid | integer | not null default 0 mailid | integer | not null default 0 count | integer | Indexes: email_track_pkey PRIMARY KEY, btree (crmid, mailid) CLUSTER email_track_count_idx btree (count) explain analyze select * from email_track where count > 10 ; QUERY PLAN Bitmap Heap Scan on email_track (cost=12.79..518.05 rows=1941 width=12) (actual time=0.430..3.047 rows=1743 loops=1) Recheck Cond: (count > 10) - Bitmap Index Scan on email_track_count_idx (cost=0.00..12.79 rows=1941 width=0) (actual time=0.330..0.330 rows=1743 loops=1) Index Cond: (count > 10) Total runtime: 4.702 ms (5 rows) explain analyze select * from email_track where count > 1 ; QUERY PLAN -- Seq Scan on email_track (cost=0.00..1591.65 rows=88851 width=12) (actual time=0.011..118.499 rows=88852 loops=1) Filter: (count > 1) Total runtime: 201.206 ms (3 rows) I don't know why the index scan is not working for the count > 1 operation. Any idea please? The database knows, from table statistics, that the count > 10 query would return a small number of rows (1941), while the count > 1 query would return a big number (88851). "Small" and "big" are quite relative, but the result is that the database knows it would be faster not to use the index when the number of returned rows is big. regards Szymon Guz
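This can be confirmed as a cost-based choice (rather than a broken index) by disabling sequential scans for one session; a diagnostic sketch, not a fix:

```sql
-- Force the planner away from seq scans for this session only:
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM email_track WHERE count > 1;
-- The forced index scan will typically report a *higher* runtime than
-- the seq scan did, since most of the table matches the predicate.
RESET enable_seqscan;
```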
Re: [PERFORM] Slow function in queries SELECT clause.
2010/6/19 Davor J. dav...@live.com I think I have read what is to be read about queries being prepared in plpgsql functions, but I still can not explain the following, so I thought to post it here: Suppose 2 functions: factor(int,int) and offset(int, int). Suppose a third function: convert(float,int,int) which simply returns $1*factor($2,$3)+offset($2,$3) All three functions are IMMUTABLE. Very simple, right? Now I have very fast AND very slow executing queries on some 150k records: VERY FAST (half a second): SELECT data*factor(1,2)+offset(1,2) FROM tbl_data; VERY SLOW (a minute): SELECT convert(data, 1, 2) FROM tbl_data; The slowness cannot be due to calling a function 150k times. If I define convert2(float,int,int) to return a constant value, then it executes in about a second. (still half as slow as the VERY FAST query). I assume that factor and offset are cached in the VERY FAST query, and not in the slow one? If so, why not and how can I force it? Currently I need only one function for conversions. Regards, Davor Hi, show us the code of those two functions and explain analyze of those queries. regards Szymon Guz
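One common cause of exactly this pattern (an assumption here, since the function bodies were not shown): an expression written directly in the SELECT can be simplified by the planner, while a call into a PL/pgSQL function runs once per row and re-evaluates factor() and offset() each time. If the functions are simple expressions, writing the wrapper in LANGUAGE sql may let the planner inline it; a sketch with hypothetical bodies:

```sql
-- Hypothetical: a simple IMMUTABLE SQL-language function can be inlined
-- into the calling query; an equivalent PL/pgSQL function cannot.
-- "offset" must be quoted because it is a reserved word.
CREATE OR REPLACE FUNCTION convert(float, int, int)
RETURNS float
LANGUAGE sql IMMUTABLE AS $$
    SELECT $1 * factor($2, $3) + "offset"($2, $3)
$$;
```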
Re: [PERFORM] query hangs
2010/6/10 AI Rumman rumman...@gmail.com Can anyone please tell me why the following query hangs? This is a part of a large query. explain select * from vtiger_emaildetails inner join vtiger_vantage_email_track on vtiger_emaildetails.emailid = vtiger_vantage_email_track.mailid left join vtiger_seactivityrel on vtiger_seactivityrel.activityid = vtiger_emaildetails.emailid QUERY PLAN - Merge Left Join (cost=9500.30..101672.51 rows=2629549 width=506) Merge Cond: (outer.emailid = inner.activityid) - Merge Join (cost=9500.30..11658.97 rows=88852 width=498) Merge Cond: (outer.emailid = inner.mailid) - Index Scan using vtiger_emaildetails_pkey on vtiger_emaildetails (cost=0.00..714.40 rows=44595 width=486) - Sort (cost=9500.30..9722.43 rows=88852 width=12) Sort Key: vtiger_vantage_email_track.mailid - Seq Scan on vtiger_vantage_email_track (cost=0.00..1369.52 rows=88852 width=12) - Index Scan using seactivityrel_activityid_idx on vtiger_seactivityrel (cost=0.00..28569.29 rows=1319776 width=8) (9 rows) select relname, reltuples, relpages from pg_class where relname in ('vtiger_emaildetails','vtiger_vantage_email_track','vtiger_seactivityrel'); relname | reltuples | relpages +-+-- vtiger_emaildetails| 44595 | 1360 vtiger_seactivityrel | 1.31978e+06 | 6470 vtiger_vantage_email_track | 88852 | 481 (3 rows) Could you define what you mean by 'hangs'? Does it work or not? Check the pg_locks table for locking issues; maybe the query is just slow, not hung. Notice that the query returns an estimated 2.6M rows, which can be a huge amount of data depending on your database structure, data volume and current server configuration. regards Szymon Guz
Re: [PERFORM] query hangs
2010/6/10 AI Rumman rumman...@gmail.com I found only AccessShareLock in pg_locks during the query. And the query does not return data though I have been waiting for 10 mins. Do you have any idea? 1. Make vacuum analyze on used tables. 2. Check how long it would take if you limit the number of returned rows just to 100. 3. Do you have indexes on used columns? regards Szymon Guz
Re: [PERFORM] SELECT ignoring index even though ORDER BY and LIMIT present
2010/6/2 Jori Jovanovich j...@dimensiology.com hi, I have a problem space where the main goal is to search backward in time for events. Time can go back very far into the past, and so the table can get quite large. However, the vast majority of queries are all satisfied by relatively recent data. I have an index on the row creation date and I would like almost all of my queries to have a query plan looking something like: [CUT] Do you have autovacuum running? Have you tried updating statistics? regards Szymon Guz
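If statistics are current and the planner still sorts, the canonical shape for "most recent events first" is an index whose order matches the ORDER BY, so the planner can walk it and stop at the LIMIT. A generic sketch (table and column names invented, since the actual query was cut):

```sql
-- Hypothetical events table with a creation-date column:
CREATE INDEX events_created_at_idx ON events (created_at DESC);

SELECT *
FROM events
WHERE created_at < now()
ORDER BY created_at DESC
LIMIT 100;   -- the planner can read the index in order and stop early
```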
Re: [PERFORM] Optimizer showing wrong rows in plan
2010/3/28 Tadipathri Raghu traghu@gmail.com Hi All, Example on optimizer === postgres=# create table test(id int); CREATE TABLE postgres=# insert into test VALUES (1); INSERT 0 1 postgres=# select * from test; id 1 (1 row) postgres=# explain select * from test; QUERY PLAN Seq Scan on test (cost=0.00..34.00 *rows=2400* width=4) (1 row) In the above example, the optimizer is reporting that many rows when there is only one row in that table. If I analyze, I am getting one row. No, the optimizer is not retrieving anything; it just assumes that there are 2400 rows because that is the number of rows in the statistics for this table. The optimizer just tries to find the best plan and to optimize the query plan for execution, taking into consideration all information that can be found for this table (it also looks in the statistics information about rows from this table). postgres=# ANALYZE test; ANALYZE postgres=# explain select * from test; QUERY PLAN Seq Scan on test (cost=0.00..1.01 *rows=1* width=4) (1 row) My question here is: what is it reporting as rows when there are no such rows? One more thing: if I don't analyze and run the explain plan three or more times, then the catalogs get updated automatically, resulting in the correct row count of 1. Now ANALYZE changed the statistics for this table and the planner knows that there is just one row. In the background autovacuum can run, and it updates the statistics automatically (autovacuum's behaviour depends on the settings for the database). Q2. Does EXPLAIN update the catalogs automatically? No, EXPLAIN doesn't update the table's statistics. regards Szymon Guz
Re: [PERFORM] Optimizer showing wrong rows in plan
2010/3/28 Tadipathri Raghu traghu@gmail.com Hi Guz, Thank you for the prompt reply. No, the optimizer is not retrieving anything, it just assumes that there are 2400 rows because that is the number of rows that exists in the statistics for this table. The optimizer just tries to find the best plan and to optimize the query plan for execution taking into consideration all information that can be found for this table (it also looks in the statistics information about rows from this table). So, what is it assuming here as rows=2400? Could you explain this? It is assuming that there are 2400 rows in this table. Probably you've deleted some rows from the table, leaving just one. regards Szymon Guz