Re: [HACKERS] Curious buildfarm failures (fwd)
On Wed, 16 Jan 2013, Andres Freund wrote:
>> So unless somebody suggests otherwise, I'm going to switch to gcc on this buildfarm.
> What about switching to -O1 of 11.1?

I don't know. It is up to -hackers to decide. I think that icc on ia64 has shown bugginess time after time. But if you think that a buildfarm member with icc 11.1 -O1 carries more information than, say, running gcc, I can still run icc.

S

* Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Curious buildfarm failures (fwd)
Hi,

On Wed, 16 Jan 2013, Andres Freund wrote:
> On 2013-01-16 01:28:09 -0500, Tom Lane wrote:
>> It's a compiler bug.
> Thanks for investigating.

But I'm not sure there is any way for me other than switching to gcc, because Intel stopped providing their IA64 version of the compilers free of charge, even for non-commercial/educational users: http://software.intel.com/en-us/intel-software-development-tools-for-intel-itanium-processors (Their website is also a bit of a maze, so I don't see anywhere the updated versions of the 10.0, 10.1 and 11.1 compilers that I have.)

So unless somebody suggests otherwise, I'm going to switch to gcc on this buildfarm.

Cheers,
Sergey
Re: [HACKERS] Curious buildfarm failures (fwd)
On Tue, 15 Jan 2013, Tom Lane wrote:
> BTW, I just finished trying to reproduce this on an IA64 machine belonging to Red Hat, without success. So that seems to eliminate any possibility of the machine architecture being the trigger issue. The compiler's still a likely cause though. Anybody have a similar ICC version (dugong's says it is 10.0 20070809) to try? Also, Sergey, can you find a non-dot-zero release to try?

I think it is indeed the main issue. I've tried 10.1 (10.1.011) and it doesn't fail. When I tried 11.1 (icc (ICC) 11.1 20100401) it failed in a quite strange way (I don't quite remember it happening before):

test tablespace ... ok
parallel group (18 tests): txid int2 text name oid varchar int4 char money float8 uuid float4 int8 boolean bit enum numeric rangetypes
     boolean ... ok
     char ... ok
     name ... ok
     varchar ... ok
     text ... ok
     int2 ... ok
     int4 ... ok
     int8 ... ok
     oid ... ok
     float4 ... ok
     float8 ... ok
     bit ... ok
     numeric ... ok
     txid ... ok
     uuid ... ok
     enum ... ok
     money ... ok
     rangetypes ... FAILED (test process exited with exit code 2)
test strings ... FAILED (test process exited with exit code 2)
test numerology ... FAILED (test process exited with exit code 2)
parallel group (19 tests): path interval time inet circle macaddr comments timestamp timestamptz reltime date tstypes tinterval abstime timetz lseg box polygon point
     point ... FAILED (test process exited with exit code 2)
     lseg ... FAILED (test process exited with exit code 2)
     box ... FAILED (test process exited with exit code 2)
     path ... FAILED (test process exited with exit code 2)
     polygon ... FAILED (test process exited with exit code 2)
     circle ... FAILED (test process exited with exit code 2)
     date ... FAILED (test process exited with exit code 2)
     time ... FAILED (test process exited with exit code 2)
     timetz ... FAILED (test process exited with exit code 2)
     timestamp ... FAILED (test process exited with exit code 2)
     timestamptz ... FAILED (test process exited with exit code 2)
     interval ... FAILED (test process exited with exit code 2)
     abstime ... FAILED (test process exited with exit code 2)
     reltime ... FAILED (test process exited with exit code 2)
     tinterval ... FAILED (test process exited with exit code 2)
     inet ... FAILED (test process exited with exit code 2)
     macaddr ... FAILED (test process exited with exit code 2)
     tstypes ... FAILED (test process exited with exit code 2)
     comments ... FAILED (test process exited with exit code 2)
parallel group (6 tests): geometry regex horology type_sanity oidjoins opr_sanity
     geometry ... FAILED
     horology ... FAILED
     regex ... ok
     oidjoins ... ok
     type_sanity ... ok
     opr_sanity ... ok
test insert ... ok
test create_function_1 ... ok
test create_type ... ok
test create_table ... ok
test create_function_2 ... ok
parallel group (2 tests): copyselect copy
     copy ... ok
     copyselect ... ok
parallel group (2 tests): create_operator create_misc
     create_misc ... ok
     create_operator ... ok
parallel group (2 tests): create_view create_index
     create_index ... FAILED (test process exited with exit code 2)
     create_view ... ok
parallel group (11 tests): constraints triggers create_cast create_function_3 updatable_views inherit drop_if_exists create_aggregate create_table_like typed_table vacuum
     create_aggregate ... FAILED (test process exited with exit code 2)
     create_function_3 ... FAILED (test process exited with exit code 2)
     create_cast ... FAILED (test process exited with exit code 2)
     constraints ... FAILED (test process exited with exit code 2)
     triggers ... FAILED (test process exited with exit code 2)
     inherit ... FAILED (test process exited with exit code 2)
     create_table_like ... FAILED (test process exite
Re: [HACKERS] Curious buildfarm failures (fwd)
On Tue, 15 Jan 2013, Andres Freund wrote:
> Any chance you could run make check again but with log_statement=all and log_min_messages=debug2? That might tell us a bit more, whether the dependency code doesn't work right or whether the checkpoint is doing strange things.

Here it is:

2013-01-15 23:06:18 MSK [50f5a8aa.1162:1] DEBUG: SlruScanDirectory invoking callback on pg_notify/
2013-01-15 23:06:18 MSK [50f5a8aa.1162:2] DEBUG: removing file "pg_notify/"
2013-01-15 23:06:18 MSK [50f5a8aa.1162:3] DEBUG: max_safe_fds = 985, usable_fds = 1000, already_open = 5
2013-01-15 23:06:18 MSK [50f5a8aa.1167:1] LOG: database system was shut down at 2013-01-15 23:06:18 MSK
2013-01-15 23:06:18 MSK [50f5a8aa.1167:2] DEBUG: checkpoint record is at 0/17700E0
2013-01-15 23:06:18 MSK [50f5a8aa.1167:3] DEBUG: redo record is at 0/17700E0; shutdown TRUE
2013-01-15 23:06:18 MSK [50f5a8aa.1167:4] DEBUG: next transaction ID: 0/686; next OID: 12031
2013-01-15 23:06:18 MSK [50f5a8aa.1167:5] DEBUG: next MultiXactId: 1; next MultiXactOffset: 0
2013-01-15 23:06:18 MSK [50f5a8aa.1167:6] DEBUG: oldest unfrozen transaction ID: 676, in database 1
2013-01-15 23:06:18 MSK [50f5a8aa.1167:7] DEBUG: transaction ID wrap limit is 2147484323, limited by database with OID 1
2013-01-15 23:06:18 MSK [50f5a8aa.1168:1] DEBUG: checkpointer updated shared memory configuration values
2013-01-15 23:06:18 MSK [50f5a8aa.116b:1] LOG: autovacuum launcher started
2013-01-15 23:06:18 MSK [50f5a8aa.1162:4] LOG: database system is ready to accept connections
2013-01-15 23:06:19 MSK [50f5a8aa.1162:5] DEBUG: forked new backend, pid=4463 socket=8
2013-01-15 23:06:19 MSK [50f5a8aa.1162:6] DEBUG: server process (PID 4463) exited with exit code 0
2013-01-15 23:06:19 MSK [50f5a8aa.1162:7] DEBUG: forked new backend, pid=4465 socket=8
2013-01-15 23:06:19 MSK [50f5a8ab.1171:1] LOG: statement: CREATE DATABASE "regression" TEMPLATE=template0
2013-01-15 23:06:19 MSK [50f5a8aa.1168:2] LOG: checkpoint starting: immediate force wait
2013-01-15 23:06:19 MSK [50f5a8aa.1168:3] DEBUG: SlruScanDirectory invoking callback on pg_multixact/offsets/
2013-01-15 23:06:19 MSK [50f5a8aa.1168:4] DEBUG: SlruScanDirectory invoking callback on pg_multixact/members/
2013-01-15 23:06:19 MSK [50f5a8aa.1168:5] DEBUG: attempting to remove WAL segments older than log file
2013-01-15 23:06:19 MSK [50f5a8aa.1168:6] DEBUG: SlruScanDirectory invoking callback on pg_subtrans/
2013-01-15 23:06:19 MSK [50f5a8aa.1168:7] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.000 s, total=0.001 s; sync files=0, longest=0.000 s, average=0.000 s
2013-01-15 23:06:19 MSK [50f5a8aa.1168:8] LOG: checkpoint starting: immediate force wait
2013-01-15 23:06:19 MSK [50f5a8aa.1168:9] DEBUG: attempting to remove WAL segments older than log file
2013-01-15 23:06:19 MSK [50f5a8aa.1168:10] DEBUG: SlruScanDirectory invoking callback on pg_subtrans/
2013-01-15 23:06:19 MSK [50f5a8aa.1168:11] LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.000 s, total=0.001 s; sync files=0, longest=0.000 s, average=0.000 s
2013-01-15 23:06:19 MSK [50f5a8aa.1162:8] DEBUG: server process (PID 4465) exited with exit code 0
2013-01-15 23:06:19 MSK [50f5a8aa.1162:9] DEBUG: forked new backend, pid=4467 socket=8
2013-01-15 23:06:19 MSK [50f5a8ab.1173:1] LOG: statement: ALTER DATABASE "regression" SET lc_messages TO 'C';ALTER DATABASE "regression" SET lc_monetary TO 'C';ALTER DATABASE "regression" SET lc_numeric TO 'C';ALTER DATABASE "regression" SET lc_time TO 'C';ALTER DATABASE "regression" SET timezone_abbreviations TO 'Default';
2013-01-15 23:06:19 MSK [50f5a8aa.1162:10] DEBUG: server process (PID 4467) exited with exit code 0
2013-01-15 23:06:19 MSK [50f5a8aa.1162:11] DEBUG: forked new backend, pid=4469 socket=8
2013-01-15 23:06:19 MSK [50f5a8ab.1175:1] LOG: statement: CREATE TABLESPACE testspace LOCATION '/home/math/pg_git/src/test/regress/testtablespace';
2013-01-15 23:06:19 MSK [50f5a8ab.1175:2] LOG: statement: ALTER TABLESPACE testspace SET (random_page_cost = 1.0);
2013-01-15 23:06:19 MSK [50f5a8ab.1175:3] LOG: statement: ALTER TABLESPACE testspace SET (some_nonexistent_parameter = true);
2013-01-15 23:06:19 MSK [50f5a8ab.1175:4] ERROR: unrecognized parameter "some_nonexistent_parameter"
2013-01-15 23:06:19 MSK [50f5a8ab.1175:5] STATEMENT: ALTER TABLESPACE testspace SET (some_nonexistent_parameter = true);
2013-01-15 23:06:19 MSK [50f5a8ab.1175:6] LOG: statement: ALTER TABLESPACE testspace RESET (random_page_cost = 2.0);
2013-01-15 23:06:19 MSK [50f5a8ab.1175:7] ERROR: RESET must not include values for parameters
2013-01-15 23:06:19 MSK [50f5a8ab.1175:8] STATEMENT: ALTER TABLESPACE testspace RESET (random_page_cost = 2.0);
2013-01-15 23:
Re: [HACKERS] Curious buildfarm failures (fwd)
Hi,

Date: Tue, 15 Jan 2013 11:57:07 -0500
From: Tom Lane
To: Andres Freund
Cc: m...@sai.msu.ru, pgsql-hackers@postgreSQL.org, Andrew Dunstan
Subject: Re: Curious buildfarm failures

> Well, it could be quite reproducible, if for example what's happening is that the DROP is failing to wait for the checkpointer at all. Is there a way to enable log_checkpoints during a buildfarm run? It'd be good to get timestamps added to the postmaster log entries, too.

Here is the log output from the failing pg_regress after enabling checkpoints and timestamps:

2013-01-15 21:20:19 MSK [50f58fd3.589e:1] LOG: database system was shut down at 2013-01-15 21:20:19 MSK
2013-01-15 21:20:19 MSK [50f58fd3.58a2:1] LOG: autovacuum launcher started
2013-01-15 21:20:19 MSK [50f58fd3.5899:1] LOG: database system is ready to accept connections
2013-01-15 21:20:20 MSK [50f58fd3.589f:1] LOG: checkpoint starting: immediate force wait
2013-01-15 21:20:21 MSK [50f58fd3.589f:2] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.604 s, sync=0.000 s, total=0.605 s; sync files=0, longest=0.000 s, average=0.000 s
2013-01-15 21:20:21 MSK [50f58fd3.589f:3] LOG: checkpoint starting: immediate force wait
2013-01-15 21:20:21 MSK [50f58fd3.589f:4] LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.000 s, total=0.000 s; sync files=0, longest=0.000 s, average=0.000 s
2013-01-15 21:20:21 MSK [50f58fd5.58ac:1] ERROR: unrecognized parameter "some_nonexistent_parameter"
2013-01-15 21:20:21 MSK [50f58fd5.58ac:2] STATEMENT: ALTER TABLESPACE testspace SET (some_nonexistent_parameter = true);
2013-01-15 21:20:21 MSK [50f58fd5.58ac:3] ERROR: RESET must not include values for parameters
2013-01-15 21:20:21 MSK [50f58fd5.58ac:4] STATEMENT: ALTER TABLESPACE testspace RESET (random_page_cost = 2.0);
2013-01-15 21:20:21 MSK [50f58fd5.58ac:5] ERROR: duplicate key value violates unique constraint "anindex"
2013-01-15 21:20:21 MSK [50f58fd5.58ac:6] DETAIL: Key (column1)=(1) already exists.
2013-01-15 21:20:21 MSK [50f58fd5.58ac:7] STATEMENT: INSERT INTO testschema.atable VALUES(1);
2013-01-15 21:20:21 MSK [50f58fd5.58ac:8] ERROR: directory "/no/such/location" does not exist
2013-01-15 21:20:21 MSK [50f58fd5.58ac:9] STATEMENT: CREATE TABLESPACE badspace LOCATION '/no/such/location';
2013-01-15 21:20:21 MSK [50f58fd5.58ac:10] ERROR: tablespace "nosuchspace" does not exist
2013-01-15 21:20:21 MSK [50f58fd5.58ac:11] STATEMENT: CREATE TABLE bar (i int) TABLESPACE nosuchspace;
2013-01-15 21:20:21 MSK [50f58fd3.589f:5] LOG: checkpoint starting: immediate force wait
2013-01-15 21:20:21 MSK [50f58fd3.589f:6] LOG: checkpoint complete: wrote 37 buffers (0.2%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.000 s, total=0.001 s; sync files=0, longest=0.000 s, average=0.000 s
2013-01-15 21:20:21 MSK [50f58fd5.58ac:12] ERROR: tablespace "testspace" is not empty
2013-01-15 21:20:21 MSK [50f58fd5.58ac:13] STATEMENT: DROP TABLESPACE testspace;
2013-01-15 21:20:21 MSK [50f58fd3.589f:7] LOG: checkpoint starting: immediate force wait
2013-01-15 21:20:21 MSK [50f58fd3.589f:8] LOG: checkpoint complete: wrote 9 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.000 s, total=0.001 s; sync files=0, longest=0.000 s, average=0.000 s
2013-01-15 21:20:21 MSK [50f58fd5.58ac:14] ERROR: tablespace "testspace" is not empty
2013-01-15 21:20:21 MSK [50f58fd5.58ac:15] STATEMENT: DROP TABLESPACE testspace;

And I do see the tblspc file left after the finish of "make check": tmp_check/data/pg_tblspc/16385/PG_9.3_201212081/16384/16387

Cheers,
S

PS: I wouldn't be surprised if it is a compiler bug though. But I did see the failure with a newer icc as well.
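The settings used to produce the log above boil down to a couple of postgresql.conf lines. The exact log_line_prefix format string below is an assumption, inferred from the timestamp/session-id prefix visible in the output, not quoted from the thread:

```
log_checkpoints = on
log_line_prefix = '%t [%c:%l] '
```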
Re: [HACKERS] bitmap scan much slower than index scan, hash_search_with_hash_value
On Sun, 2 Sep 2012, Peter Geoghegan wrote:
> On 2 September 2012 16:26, Sergey Koposov wrote:
> That's why we support altering that value with an ALTER TABLE...ALTER COLUMN DDL statement. You might at least consider increasing the statistics target for the column first though, to see if that will make the n_distinct value better accord with reality.

Thanks, I'd forgotten about that possibility. After fixing the n_distinct to the correct value, the index scan became the preferred option, and the row count estimate for the index scan became more realistic. Sorry for the noise.

Sergey
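The two options Peter mentions can be sketched as follows. Table and column names are taken from the thread; the specific statistics target and n_distinct values are illustrative assumptions, not the values actually used:

```sql
-- Option 1: raise the per-column statistics target so ANALYZE samples
-- more rows and may estimate n_distinct better on its own
-- (1000 is illustrative; the default target is 100):
ALTER TABLE crts.data ALTER COLUMN objid SET STATISTICS 1000;
ANALYZE crts.data (objid);

-- Option 2: override n_distinct directly if the estimate is still off.
-- A negative value means a fraction of the table's row count;
-- -0.05 here is purely illustrative:
ALTER TABLE crts.data ALTER COLUMN objid SET (n_distinct = -0.05);
ANALYZE crts.data;  -- the override takes effect at the next ANALYZE
```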
Re: [HACKERS] bitmap scan much slower than index scan, hash_search_with_hash_value
On Sun, 2 Sep 2012, Tom Lane wrote:
> Sergey Koposov writes:
>> The problem is definitely the misestimation here:
>>   -> Index Scan using data_objid_idx on data d1 (cost=0.00..26603.32 rows=415080 width=40) (actual time=0.010..0.050 rows=248 loops=11456)
>>        Index Cond: (objid = t.objid)
> The planner thinks that indexscan will be 2000 times more expensive than it really is (assuming that the cost per retrieved row is linear, which it isn't entirely, but close enough for the moment). Of course, it's also thinking the bitmap component scan on the same index will be 2000 times more expensive than reality, but that has only perhaps a 4X impact on the total cost of the bitmap scan, since the use of the other index is what dominates there. With a more accurate idea of this join condition's selectivity, I'm pretty certain it would have gone for a plain indexscan, or else a bitmap scan using only this index. So if there's a bug here, it's in eqjoinsel(). Can you pinpoint anything unusual about the stats of the objid columns?

Here are the pg_stats rows for the two tables:

schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | ...
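The pg_stats rows in question can be pulled with a query along these lines (a sketch; the schema names are taken from the \d listings elsewhere in the thread):

```sql
SELECT schemaname, tablename, attname, inherited, null_frac,
       avg_width, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE (schemaname, tablename) IN (('koposov', 'test'), ('crts', 'data'))
  AND attname = 'objid';
```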
Re: [HACKERS] bitmap scan much slower than index scan, hash_search_with_hash_value
Thanks for your comments.

On Sun, 2 Sep 2012, Peter Geoghegan wrote:
> On 2 September 2012 06:21, Sergey Koposov wrote:
> I think that this kind of question is better suited to the pgsql-performance list. Granted, it was presented as a bug report (though they're generally sent to -bugs rather than -hackers), but I don't think that this is a valid bug.

The reason I was inclined to think that it is a bug is that I encountered a similar bug before with bitmap scans and very big tables: http://archives.postgresql.org/pgsql-hackers/2011-08/msg00958.php Furthermore, two orders of magnitude more CPU time for bitmap scans compared to the index scan didn't sound right to me (although obviously I'm not in a position to claim that it's 100% a bug).

> One obvious red flag from your query plans is that there is a misestimation of the row return count of a few orders of magnitude in the Bitmap Index Scan node. Did you try performing an ANALYZE to see if that helped? It may also be helpful to show pg_stats entries for both the data.mjd and test.mjd columns. You may find, prior to doing an ANALYZE, that there are no entries for one of those tables.

The main large table is static and was analyzed. The test table was as well. But as mentioned in another recent email, the query is a join, so column correlation is a problem.

> Turning off the enable_* planner options in production is generally discouraged. Certainly, you'd be crazy to do that on a server-wide basis.

I'm using PG for data mining and data analysis purposes, with very few clients connected and very large tables, so enable_* is used quite often to fix incorrect plans due to incorrect selectivities.

Regards,
Sergey
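For what it's worth, the enable_* switches can be confined to a single session rather than set server-wide; a sketch using the query from this thread:

```sql
SET enable_bitmapscan = off;   -- affects only the current session

EXPLAIN ANALYZE
SELECT * FROM test AS t, crts.data AS d1
WHERE d1.objid = t.objid AND d1.mjd = t.mjd
LIMIT 1;

RESET enable_bitmapscan;       -- back to the server default
```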
Re: [HACKERS] bitmap scan much slower than index scan, hash_search_with_hash_value
On Sun, 2 Sep 2012, Pavel Stehule wrote:
> statistics on data_objid_idx table are absolutely out - so planner cannot find optimal plan

That doesn't have anything to do with the problem, AFAIU. First, the data table is static and was analysed. Second, the query in question is a join, and AFAIK the estimation of the number of rows is known to be incorrect in the case of column correlation. Third, according at least to my understanding, in the fully cached regime a bitmap scan should not take two orders of magnitude more CPU time than an index scan.

Sergey
[HACKERS] bitmap scan much slower than index scan, hash_search_with_hash_value
Hi,

I'm experiencing a case where a bitmap scan is ~70 times slower than an index scan, which seems to be caused by 1) a very big table and 2) some hash search logic (hash_search_with_hash_value). Here is the explain analyze of the query with bitmap scans allowed:

wsdb=> explain analyze select * from test as t, crts.data as d1 where d1.objid=t.objid and d1.mjd=t.mjd limit 1;

 Limit (cost=11514.04..115493165.44 rows=1 width=68) (actual time=27.512..66620.231 rows=1 loops=1)
   -> Nested Loop (cost=11514.04..1799585184.18 rows=155832 width=68) (actual time=27.511..66616.807 rows=1 loops=1)
       -> Seq Scan on test t (cost=0.00..2678.40 rows=156240 width=28) (actual time=0.010..4.685 rows=11456 loops=1)
       -> Bitmap Heap Scan on data d1 (cost=11514.04..11518.05 rows=1 width=40) (actual time=5.807..5.807 rows=1 loops=11456)
            Recheck Cond: ((mjd = t.mjd) AND (objid = t.objid))
            -> BitmapAnd (cost=11514.04..11514.04 rows=1 width=0) (actual time=5.777..5.777 rows=0 loops=11456)
                 -> Bitmap Index Scan on data_mjd_idx (cost=0.00..2501.40 rows=42872 width=0) (actual time=3.920..3.920 rows=22241 loops=11456)
                      Index Cond: (mjd = t.mjd)
                 -> Bitmap Index Scan on data_objid_idx (cost=0.00..8897.90 rows=415080 width=0) (actual time=0.025..0.025 rows=248 loops=11456)
                      Index Cond: (objid = t.objid)
 Total runtime: 66622.026 ms
(11 rows)

Here is the output when bitmap scans are disabled:

 Limit (cost=0.00..329631941.65 rows=1 width=68) (actual time=0.082..906.876 rows=1 loops=1)
   -> Nested Loop (cost=0.00..4979486036.95 rows=151062 width=68) (actual time=0.081..905.683 rows=1 loops=1)
       Join Filter: (t.mjd = d1.mjd)
       -> Seq Scan on test t (cost=0.00..2632.77 rows=151677 width=28) (actual time=0.009..1.679 rows=11456 loops=1)
       -> Index Scan using data_objid_idx on data d1 (cost=0.00..26603.32 rows=415080 width=40) (actual time=0.010..0.050 rows=248 loops=11456)
            Index Cond: (objid = t.objid)
 Total runtime: 907.462 ms

When the bitmap scans are enabled, the "prof" of postgres shows:

47.10% postmaster postgres [.] hash_search_with_hash_value
       |--- hash_search_with_hash_value
11.06% postmaster postgres [.] hash_seq_search
       |--- hash_seq_search
 6.95% postmaster postgres [.] hash_any
       |--- hash_any
 5.17% postmaster postgres [.] _bt_checkkeys
       |--- _bt_checkkeys
 4.07% postmaster postgres [.] tbm_add_tuples
       |--- tbm_add_tuples
 3.41% postmaster postgres [.] hash_search
       |--- hash_search

And the last note is that the crts.data table which is being bitmap scanned is a 1.1 Tb table with ~20e9 rows. My feeling is that the bitmap index scan code is somehow unprepared to combine two bitmaps for such a big table, and this leads to the terrible performance.

Regards,
Sergey

PS: Here are the schemas of the tables, just in case:

wsdb=> \d test
     Table "koposov.test"
 Column  |       Type       | Modifiers
---------+------------------+-----------
 mjd     | double precision |
 fieldid | bigint           |
 intmag  | integer          |
 objid   | bigint           |

wsdb=> \d crts.data
      Table "crts.data"
 Column |       Type       | Modifiers
--------+------------------+-----------
 objid  | bigint           |
 mjd    | double precision |
 mag    | real             |
 emag   | real             |
 ra     | double precision |
 dec    | double precision |
Indexes:
    "data_mjd_idx" btree (mjd) WITH (fillfactor=100)
    "data_objid_idx" btree (objid) WITH (fillfactor=100)
    "data_q3c_ang2ipix_idx" btree (q3c_ang2ipix(ra, "dec")) WITH (fillfactor=100)

PPS: shared_buffers=10GB, work_mem=1GB. All the tests shown here were done in the fully cached regime.

PPPS: I can believe that what I'm seeing is a feature, not a bug, of bitmap scans, and I can live with disabling them, but I still thought it was worth reporting.
Re: [HACKERS] slow dropping of tables, DropRelFileNodeBuffers, tas
On Thu, 7 Jun 2012, Tom Lane wrote:
> I extended the patch to also cover DropDatabaseBuffers, FlushRelationBuffers, and FlushDatabaseBuffers, which have got the exact same type of full-pool scan loop, and committed it.

Thanks everybody for the patches/commits.

Sergey
Re: [HACKERS] slow dropping of tables, DropRelFileNodeBuffers, tas
On Tue, 5 Jun 2012, Simon Riggs wrote:
>>> Sounds less good and we'd need reasonable proof it actually did anything useful without being dangerous.
>> Doing an initial unlocked test speeds things up another 2.69-fold (on top of 3.55 for your patch) for me, with 1GB of shared buffers. That seems like it should be worthwhile. How do we go about getting reasonable proof that it is safe?
> That's enough for me.

So is it planned to apply that patch for 9.2?

Thanks,
S
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Wed, 6 Jun 2012, Merlin Moncure wrote:
> I think this is the expected result. In the single user case the spinlock never spins and only has to make the cpu-locking cache instructions once. Can we see results @24 threads?

Here: https://docs.google.com/open?id=0B7koR68V2nM1NDJHLUhNSS0zbUk
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Wed, 6 Jun 2012, Ants Aasma wrote:
> On Wed, Jun 6, 2012 at 2:27 PM, Sergey Koposov wrote:
>> I've quickly tested your lockfree-getbuffer.patch patch with the test case you provided and I barely see any improvement (2% at max) https://docs.google.com/open?id=0B7koR68V2nM1QVBxWGpZdW4wd0U tested with 24 cores (48 HT cores, Xeon E7-4807). Although the tps vs number of threads looks weird.
> Was this the range scan on the test table? (sorry about the error in the query, the x should really be id) In that case the results look really suspicious.

Yes, partially my fault, because without much thought I put "value" instead of "x" in the script. Now, after replacing it with "id", the tps are much smaller. Here is the tps vs nthreads; I tested up to 10 threads on my 24-CPU system (I disabled HT though): https://docs.google.com/open?id=0B7koR68V2nM1Nk9OcWNJOTRrYVE

Your patch clearly improves the situation (the peak tps is ~10% higher), but the general picture is the same: flattening of tps vs nthreads.

Cheers,
S
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
Hi,

On Mon, 4 Jun 2012, Ants Aasma wrote:
> On Mon, Jun 4, 2012 at 7:44 PM, Merlin Moncure wrote:
> I tried to keep it simple at first to find an answer to the question of whether it's even worth trying before expending large effort on it. If anyone with a multisocket machine would chip in, I'd love to know how this patch handles on larger machines. I think the most interesting workloads are read-only loads with heavy buffer thrashing but inside OS memory. Select-only pgbench with 32MB shared_buffers was within error margin, although slightly faster on my machine (Intel i2500K). The workload that I used to demonstrate the gain was an aggregated index scan, to minimise other overheads.

I've quickly tested your lockfree-getbuffer.patch patch with the test case you provided and I barely see any improvement (2% at max): https://docs.google.com/open?id=0B7koR68V2nM1QVBxWGpZdW4wd0U tested with 24 cores (48 HT cores, Xeon E7-4807). Although the tps vs number of threads looks weird.

Cheers,
S
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Thu, 31 May 2012, Jeff Janes wrote:
>> No, idt_match is getting filled by multi-threaded copy() and then joined with 4 other big tables like idt_phot. The result is then split into partitions.
> That does make things more complicated. But could you partition it at that level and then do the joins partition-wise?

Unfortunately the information needed to understand which partition the data should go into is only available from the idt_match table. So I have to do at least one join with idt_match. But I will experiment further with ways to perform the queries; I just stopped doing that because I saw these problems with scalability.

> I don't have much experience at data partitioning (well, I do, but the experience is with partitioning in Perl with terabytes of flat files, not in PG :) ) but I think that once you have your partitioning keys you want to apply them the same way up and down the data set.

I'm not sure what you mean by "the same way up and down the data set".

Cheers,
S
Re: [HACKERS] slow dropping of tables, DropRelFileNodeBuffers, tas
On Fri, 1 Jun 2012, Simon Riggs wrote: Why do you have 10,000 tables and why is it important to drop them so quickly? 10,000 tables are there because that's the number of partitions. And I'm dropping them at the moment because I'm doing testing, so it won't be really crucial for production. But I still thought it was worth reporting, especially when each table drop took ~0.5 sec. The problem is that when I set shared_buffers to, say, 48G, the timings of the drops rise significantly again. If it's that important, why not run the drop in parallel sessions? Yes, before there was a strong reason to do that; now the timings are more manageable, but maybe I'll implement that. Cheers, S * Sergey E. Koposov, PhD, Research Associate Institute of Astronomy, University of Cambridge Madingley road, CB3 0HA, Cambridge, UK Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] slow dropping of tables, DropRelFileNodeBuffers, tas
On Thu, 31 May 2012, Simon Riggs wrote: That struck me as a safe and easy optimisation. This was a problem I'd been trying to optimise for 9.2, so I've written a patch that appears simple and clean enough to be applied directly. Thanks! The patch indeed improved the timings. The dropping of 100 tables in a single commit took ~50 seconds before the patch; now it takes ~5 sec (it would be nice to reduce it further though, because dropping all 10,000 tables still takes ~10 min). Cheers, S * Sergey E. Koposov, PhD, Research Associate Institute of Astronomy, University of Cambridge Madingley road, CB3 0HA, Cambridge, UK Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Thu, 31 May 2012, Robert Haas wrote: Oh, ho. So from this we can see that the problem is that we're getting huge amounts of spinlock contention when pinning and unpinning index pages. It would be nice to have a self-contained reproducible test case for this, so that we could experiment with it on other systems. I created one a few days ago: http://archives.postgresql.org/pgsql-hackers/2012-05/msg01143.php It is still valid, and it is exactly what I'm using to test. The only thing to change is to create a two-column index and drop the other index. The scripts are precisely the ones I'm using now. The problem is that in order to see a really big slowdown (10 times slower than a single thread) I had to raise shared_buffers to 48G, but it was slow for smaller shared_buffers settings as well. I'm not sure, though, how sensitive the test is to the hardware. Cheers, S * Sergey E. Koposov, PhD, Research Associate Institute of Astronomy, University of Cambridge Madingley road, CB3 0HA, Cambridge, UK Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Wed, 30 May 2012, Jeff Janes wrote: But the question now is whether there is a *PG* problem here or not, or is it Intel's or Linux's problem? Because the slowdown was still caused by locking: if there were no locking there would be no problem (as demonstrated a while ago by just cat'ting the files in multiple threads). You cannot have a traditional RDBMS without locking. I understand the need for significant locking when there are concurrent writes, but not when there are only reads. But I'm not an RDBMS expert, so maybe that's a misunderstanding on my side. From your description of the problem, I probably wouldn't be using a traditional database system at all for this, but rather flat files and Perl. Flat files and Perl for 25-50 TB of data over a few years is a bit extreme ;) Or at least, I would partition the data before loading it to the DB, rather than trying to do it after. I intentionally did otherwise, because I thought that PG would be much smarter than me at juggling the data I'm ingesting (~ tens of GB each day), joining the appropriate bits of data and then splitting by partitions. Unfortunately I see that there are some scalability issues on the way, which I didn't expect. Those aren't fatal, but slightly disappointing. But anyway, is idt_match a fairly static table? If so, I'd partition that into 16 tables, and then have each one of your tasks join against a different one of those tables. That should relieve the contention on the index root block, and might have some other benefits as well. No, idt_match is getting filled by multi-threaded copy() and then joined with 4 other big tables like idt_phot. The result is then split into partitions. And I was trying different approaches to fully utilize the CPUs and/or I/O and somehow parallelize the queries. That's the reasoning for the somewhat contrived queries in my test. Cheers, S * Sergey E.
Koposov, PhD, Research Associate Institute of Astronomy, University of Cambridge Madingley road, CB3 0HA, Cambridge, UK Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
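Jeff's "partition the data before loading it" suggestion can be done entirely outside the database: route each incoming row to a per-partition batch keyed on the partition key, then load each batch into its own table with a separate COPY. A hedged Python sketch (the 16-way modulo rule mirrors the x % 16 split used in the test queries; the column names are illustrative, not from the thread):

```python
from collections import defaultdict

N_PARTITIONS = 16  # illustrative; matches the 16-way split in the test queries

def partition_key(healpixid: int) -> int:
    # Same modulo rule the test queries apply after the join.
    return healpixid % N_PARTITIONS

def split_rows(rows):
    """Group (healpixid, payload) rows into per-partition batches,
    each of which could then be COPYed into its own partition table."""
    batches = defaultdict(list)
    for healpixid, payload in rows:
        batches[partition_key(healpixid)].append((healpixid, payload))
    return batches

rows = [(i, f"obs-{i}") for i in range(100)]
batches = split_rows(rows)
print(sum(len(b) for b in batches.values()))  # 100: every row lands in exactly one batch
```

Pre-partitioning this way trades the in-database join/split for a cheap streaming pass over the input, at the cost of needing the partition key available before load — which, as noted above, requires at least one join with idt_match in this workload.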
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Thu, 31 May 2012, Florian Pflug wrote: Wait, so performance *increased* by spreading the backends out over as many dies as possible, not by using as few as possible? That'd be exactly the opposite of what I'd have expected. (I'm assuming that cores on one die have ascending ids on linux. If you could post the contents of /proc/cpuinfo, we could verify that) Yes, you are correct. And I can also confirm that the cpus in cpuinfo are ordered by "physical id", e.g. they go like 0 0 0 0 0 0 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 I did a specific test with just 6 threads (== number of cores per cpu): run on a single physical cpu it took ~ 12 seconds per thread, and when I spread it across 4 cpus it took 7-9 seconds per thread. But all these numbers are anyway significantly better than when I didn't use taskset, which probably means that without it the processes were jumping from core to core? ... Because the slowdown was still caused by locking: if there were no locking there would be no problem (as demonstrated a while ago by just cat'ting the files in multiple threads). Yup, we'll have to figure out a way to reduce the locking overhead. 9.2 already scales much better to a large number of cores than previous versions did, but your test case shows that there's still room for improvement. Yes, and unfortunately these scaling problems show up in read-only, CPU-bound queries, where naively I wouldn't expect any. Thanks, Sergey * Sergey E. Koposov, PhD, Research Associate Institute of Astronomy, University of Cambridge Madingley road, CB3 0HA, Cambridge, UK Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Wed, 30 May 2012, Florian Pflug wrote: I wonder if the huge variance could be caused by non-uniform synchronization costs across different cores. That's not all that unlikely, because at least some cache levels (L2 and/or L3, I think) are usually shared between all cores on a single die. Thus, a cache line bouncing between cores on the same die might very well be faster than one bouncing between cores on different dies. On linux, you can use the taskset command to explicitly assign processes to cores. The easiest way to check if that makes a difference is to assign one core for each connection to the postmaster before launching your test. Assuming that cpu assignments are inherited by child processes, that should then spread your backends out over exactly the cores you specify. Wow, thanks! This seems to be working to some extent. I've found that distributing the threads evenly across the cpus (reminder: I have HT disabled and in total I have 4 cpus with 6 proper cores each) gives quite good results. And after a few runs, I seem to be getting more or less stable results for the multiple threads, with the performance of multithreaded runs going from 6 to 11 seconds for the various threads. (Another reminder: 5-6 seconds is roughly the timing of my queries running in a single thread.) So to some extent one can say that the problem is partially solved (i.e. it is probably understood). But the question now is whether there is a *PG* problem here or not, or is it Intel's or Linux's problem? Because the slowdown was still caused by locking: if there were no locking there would be no problem (as demonstrated a while ago by just cat'ting the files in multiple threads). Cheers, S * Sergey E.
Koposov, PhD, Research Associate Institute of Astronomy, University of Cambridge Madingley road, CB3 0HA, Cambridge, UK Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
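The placement experiment above — spreading backends across the four sockets rather than packing them onto one — amounts to a simple index mapping, given the cpuinfo ordering Sergey confirmed (cores 0-5 on socket 0, 6-11 on socket 1, and so on). A sketch of the "spread" assignment; the round-robin formula is my reconstruction of the idea, not the exact rule from the email:

```python
CORES_PER_SOCKET = 6  # HT disabled: 4 sockets x 6 real cores
N_SOCKETS = 4

def spread_core(thread_idx: int) -> int:
    """Round-robin threads across sockets first: thread 0 -> core 0 (socket 0),
    thread 1 -> core 6 (socket 1), thread 2 -> core 12, ..."""
    socket = thread_idx % N_SOCKETS
    slot = thread_idx // N_SOCKETS
    return socket * CORES_PER_SOCKET + slot

assignment = [spread_core(i) for i in range(8)]
print(assignment)  # [0, 6, 12, 18, 1, 7, 13, 19]

# To actually pin a backend one would use taskset, or on Linux:
#   os.sched_setaffinity(backend_pid, {spread_core(i)})
```

With 8 threads this puts at most two backends per socket, so a contended cache line bounces between dies less often than in the packed layout — which is what made the spread placement faster in the test.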
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Wed, 30 May 2012, Merlin Moncure wrote: How big is idt_match? What if you drop all indexes on idt_match, encouraging all the backends to do hash joins against it, which occur in local memory and so don't have contention? You just missed his post -- it's only 3G. can you run your 'small' working set against 48gb shared buffers? Just tried 3 times, and it actually got much worse: ~ 70-80 seconds per query in the parallel setup (i.e. >10 times slower than the single run). The oprofile then looks like this:
CPU: Intel Architectural Perfmon, speed 1862 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 10
samples  %        linenr info                symbol name
---
883523   46.3676  (no location information)  s_lock
883523   100.000  (no location information)  s_lock [self]
---
303984   15.9532  (no location information)  PinBuffer
303984   100.000  (no location information)  PinBuffer [self]
---
The problem is that, since there is that variability in times, I don't really 100% know whether this trend of slowing down with increasing shared memory is genuine or not. I've also tried, just in case, shared_buffers=1G, and it is still very slow (50 sec). After that I changed the shared buffers back to 10G and the timings got back to 25 sec. Weird... I still wonder whether there is a problem with the way the locking is done (as referenced in the recent "slow dropping of tables" thread). Cheers, S * Sergey E. Koposov, PhD, Research Associate Institute of Astronomy, University of Cambridge Madingley road, CB3 0HA, Cambridge, UK Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Wed, 30 May 2012, Merlin Moncure wrote: hurk -- ISTM that since IOS is masking the heap lookups, there must be contention on the index itself? Does this working set fit in shared memory? If so, what happens when you do a database restart and repeat the IOS test? The dataset fits well in shared buffers. Here are the data sizes:
public.idt_photoobservation_small        |  521 MB |  528756
public.idt_match_idx                     | 1084 MB |  528762
public.idt_match_transitid_healpixid_idx | 1522 MB | 1955140
public.idt_match                         | 2906 MB |  528753
And shared buffers are 10G. If I restart the db the timings do not change significantly. There is always some variation which I don't really understand, e.g. the parallel runs sometimes take 18, 25, or 30 seconds per thread. So there is something else affecting the runs -- I don't know, maybe it's related to which thread starts first, or what the starting point of the seq scan is... (there is no other activity on the machine, btw). Cheers, S * Sergey E. Koposov, PhD, Research Associate Institute of Astronomy, University of Cambridge Madingley road, CB3 0HA, Cambridge, UK Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Wed, 30 May 2012, Merlin Moncure wrote: Hm, why aren't we getting a IOS? Just for kicks (assuming this is test data), can we drop the index on just transitid, leaving the index on transitid, healpixid?Is enable_indexonlyscan on? Has idt_match been vacuumed? What kind of plan do you get when do: Okay dropping the index on transitid solved the issue with indexonlyscan but didn't solve the original problem. Actually the indexonlyscan made the sequential queries faster but not the parallel ones. Here is a side by side explain analyze again for indexonly scans Bogged: QUERY PLAN - Sort (cost=82262274.19..82262286.69 rows=5000 width=498) (actual time=19635.318..19635.318 rows=0 loops=1) Sort Key: y.x Sort Method: quicksort Memory: 25kB -> Subquery Scan on y (cost=0.00..82261967.00 rows=5000 width=498) (actual time=19635.282..19635.282 rows=0 loops=1) Filter: ((y.x % 16::bigint) = 7) Rows Removed by Filter: 100 -> Limit (cost=0.00..82246967.00 rows=100 width=490) (actual time=0.076..19373.922 rows=100 loops=1) -> Seq Scan on idt_photoobservation_small o (cost=0.00..82246967.00 rows=100 width=490) (actual time=0.074..19179.336 rows=100 loops=1) SubPlan 1 -> Index Only Scan using idt_match_transitid_healpixid_idx on idt_match m (cost=0.00..82.17 rows=1 width=8) (actual time=0.016..0.017 rows=1 loops=100) Index Cond: (transitid = o.transitid) Heap Fetches: 0 Total runtime: 19702.727 ms (13 rows) *** vs non-bogged *** QUERY PLAN - Sort (cost=82262274.19..82262286.69 rows=5000 width=498) (actual time=6198.334..6198.334 rows=0 loops=1) Sort Key: y.x Sort Method: quicksort Memory: 25kB -> Subquery Scan on y (cost=0.00..82261967.00 rows=5000 width=498) (actual time=6198.285..6198.285 rows=0 loops=1) Filter: ((y.x % 16::bigint) = 7) Rows Removed by Filter: 100 -> Limit (cost=0.00..82246967.00 rows=100 width=490) (actual time=0.116..5932.766 rows=100 loops=1) -> Seq Scan on idt_photoobservation_small o (cost=0.00..82246967.00 rows=100 width=490) (actual time=0.113..5730.012 
rows=100 loops=1) SubPlan 1 -> Index Only Scan using idt_match_transitid_healpixid_idx on idt_match m (cost=0.00..82.17 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=100) Index Cond: (transitid = o.transitid) Heap Fetches: 0 Total runtime: 6239.512 ms (13 rows) ** Cheers, S PS I didn't know that index on two columns together with the indexonlyscan can help the queries based on one of those columns. Very helpful! * Sergey E. Koposov, PhD, Research Associate Institute of Astronomy, University of Cambridge Madingley road, CB3 0HA, Cambridge, UK Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Wed, 30 May 2012, Merlin Moncure wrote: 1. Can we see an explain analyze during a 'bogged' case? Here is the one to one comparison of the 'bogged' ** QUERY PLAN -- Sort (cost=63835201.73..63835214.23 rows=5000 width=498) (actual time=18007.500..18007.500 rows=0 loops=1) Sort Key: y.x Sort Method: quicksort Memory: 25kB -> Subquery Scan on y (cost=0.00..63834894.54 rows=5000 width=498) (actual time=18007.454..18007.454 rows=0 loops=1) Filter: ((y.x % 16::bigint) = 7) Rows Removed by Filter: 100 -> Limit (cost=0.00..63819894.51 rows=102 width=490) (actual time=0.047..17734.570 rows=100 loops=1) -> Seq Scan on idt_photoobservation_small o (cost=0.00..63819894.51 rows=102 width=490) (actual time=0.045..17543.902 rows=100 loops=1) SubPlan 1 -> Index Scan using idt_match_transitid_idx on idt_match m (cost=0.00..63.74 rows=1 width=8) (actual time=0.015..0.015 rows=1 loops=100) Index Cond: (transitid = o.transitid) Total runtime: 18056.866 ms (12 rows) Time: 18067.929 ms * vs non-bogged: ** QUERY PLAN -- Sort (cost=63835201.73..63835214.23 rows=5000 width=498) (actual time=6635.133..6635.133 rows=0 loops=1) Sort Key: y.x Sort Method: quicksort Memory: 25kB -> Subquery Scan on y (cost=0.00..63834894.54 rows=5000 width=498) (actual time=6635.091..6635.091 rows=0 loops=1) Filter: ((y.x % 16::bigint) = 7) Rows Removed by Filter: 100 -> Limit (cost=0.00..63819894.51 rows=102 width=490) (actual time=0.059..6344.683 rows=100 loops=1) -> Seq Scan on idt_photoobservation_small o (cost=0.00..63819894.51 rows=102 width=490) (actual time=0.056..6149.429 rows=100 loops=1) SubPlan 1 -> Index Scan using idt_match_transitid_idx on idt_match m (cost=0.00..63.74 rows=1 width=8) (actual time=0.003..0.004 rows=1 loops=100) Index Cond: (transitid = o.transitid) Total runtime: 6669.215 ms (12 rows) Time: 6673.991 ms ** 2. Can we try to get 'index only scan' working over idt_match? 
That should be a matter of adjusting the index so that it's: create index on idt_match (transitid, healpixid); I don't think I could do that. I created the index you asked for (although it is unclear why it would help...), tried explain, and it still used a simple index scan. After disabling the index scan it decided to use the bitmap scan (which isn't actually faster in the multithreaded setup), and after disabling the bitmapscan it switched to a seqscan. Cheers, Sergey PS Just for reference, the current indices on idt_match are:
"idt_match_idx" btree (healpixid)
"idt_match_transitid_healpixid_idx" btree (transitid, healpixid)
"idt_match_transitid_idx" btree (transitid)
* Sergey E. Koposov, PhD, Research Associate Institute of Astronomy, University of Cambridge Madingley road, CB3 0HA, Cambridge, UK Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
Here is the actual explain analyze of the query on the smaller dataset which I have been using for the recent testing. test=# explain analyze create table _tmp0 as select * from ( select *, (select healpixid from idt_match as m where m.transitid=o.transitid) as x from idt_photoobservation_small as o offset 0 ) as y where x%16=0 order by x; QUERY PLAN p-- Sort (cost=63835201.73..63835214.23 rows=5000 width=498) (actual time=8203.041..8252.216 rows=173696 loops=1) Sort Key: y.x Sort Method: quicksort Memory: 182555kB -> Subquery Scan on y (cost=0.00..63834894.54 rows=5000 width=498) (actual time=0.102..7602.947 rows=173696 loops=1) Filter: ((y.x % 16::bigint) = 0) Rows Removed by Filter: 826304 -> Limit (cost=0.00..63819894.51 rows=102 width=490) (actual time=0.041..7296.401 rows=100 loops=1) -> Seq Scan on idt_photoobservation_small o (cost=0.00..63819894.51 rows=102 width=490) (actual time=0.038..7094.555 rows=100 loops=1) SubPlan 1 -> Index Scan using idt_match_transitid_idx on idt_match m (cost=0.00..63.74 rows=1 width=8) (actual time=0.003..0.004 rows=1 loops=100) Index Cond: (transitid = o.transitid) Total runtime: 8908.631 ms * Sergey E. Koposov, PhD, Research Associate Institute of Astronomy, University of Cambridge Madingley road, CB3 0HA, Cambridge, UK Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
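The x%16=0 filter in the query above is the manual work-splitting device used throughout these tests: each of 16 sessions runs the same scan and keeps only the rows whose computed key falls in its residue class. In miniature (the row shape is illustrative, not the real schema):

```python
def worker_rows(rows, worker_id, n_workers=16):
    """Rows this worker keeps: same scan over all rows,
    filtered to one residue class of the key, like x % 16 = worker_id."""
    return [r for r in rows if r["x"] % n_workers == worker_id]

rows = [{"x": i, "val": i * i} for i in range(1000)]
shards = [worker_rows(rows, w) for w in range(16)]

# The shards are disjoint and together cover every row exactly once.
print(sum(len(s) for s in shards))  # 1000
```

The catch, which the thread demonstrates, is that every worker still performs the full scan and the full set of index probes — the filter only divides the output, not the shared-buffer traffic, so lock contention grows with the number of workers.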
[HACKERS] slow dropping of tables, DropRelFileNodeBuffers, tas
Hello, I was running some tests on PG9.2beta where I'm creating and dropping a large number of tables (~ 2). And I noticed that table dropping was extremely slow -- e.g. like half a second per table. I tried to move the table dropping into bigger transactions (100 per transaction), increasing at the same time max_locks_per_transaction to 128. And still, the commits took ~ 50-60 seconds. I tried to oprofile it, and I saw that 99% is spent in DropRelFileNodeBuffers, and when compiled with symbols (CFLAGS=-g) I saw that actually most of the time is spent in the tas() function, see below:
CPU: Intel Architectural Perfmon, speed 1862 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 10
samples  %        linenr info      symbol name
---
831665   86.6756  s_lock.h:204     tas
831665   100.000  s_lock.h:204     tas [self]
---
122573   12.7745  bufmgr.c:2048    DropRelFileNodeBuffers
122573   100.000  bufmgr.c:2048    DropRelFileNodeBuffers [self]
---
849       0.0885  xlog.c:697       XLogInsert
849      100.000  xlog.c:697       XLogInsert [self]
---
269       0.0280  catcache.c:444   CatalogCacheIdInvalidate
269      100.000  catcache.c:444   CatalogCacheIdInvalidate [self]
---
232       0.0242  catcache.c:1787  PrepareToInvalidateCacheTuple
232      100.000  catcache.c:1787  PrepareToInvalidateCacheTuple [self]
---
202       0.0211  dynahash.c:807   hash_search_with_hash_value
202      100.000  dynahash.c:807   hash_search_with_hash_value [self]
---
199       0.0207  nbtsearch.c:344  _bt_compare
199      100.000  nbtsearch.c:344  _bt_compare [self]
---
198       0.0206  list.c:506       list_member_oid
198      100.000  list.c:506       list_member_oid [self]
Is the current behaviour expected? Because I saw the comment around DropRelFileNodeBuffers, saying "XXX currently it sequentially searches the buffer pool, should be changed to more clever ways of searching. However, this routine is used only in code paths that aren't very performance-critical, and we shouldn't slow down the hot paths to make it faster".
Maybe it is stupid, but I also wonder whether the root cause of what I'm seeing can also be responsible for the problems I recently reported about the scaling and locking: http://archives.postgresql.org/pgsql-hackers/2012-05/msg01118.php Some additional info: the database is accessed in a single thread, shared_buffers are 10G, and the tables themselves are essentially empty. The cpu, if it matters, is 4x Xeon E7-4807 (Westmere architecture). I did a vacuum full of everything just in case and it didn't help. Another maybe important factor I noticed is that pg_catalog.pg_attribute is quite large (238 meg) (because of the large number of tables times columns). I also stopped PG with gdb a few times and it was always at this backtrace:
(gdb) bt
#0 tas (lock=0x7fa4e3007538 "\001") at ../../../../src/include/storage/s_lock.h:218
#1 0x006e6956 in DropRelFileNodeBuffers (rnode=..., forkNum=VISIBILITYMAP_FORKNUM, firstDelBlock=0) at bufmgr.c:2062
#2 0x0070c014 in smgrdounlink (reln=0x1618210, forknum=VISIBILITYMAP_FORKNUM, isRedo=0 '\000') at smgr.c:354
#3 0x0051ecf6 in smgrDoPendingDeletes (isCommit=1 '\001') at storage.c:364
#4 0x004a7b33 in CommitTransaction () at xact.c:1925
#5 0x004a8479 in CommitTransactionCommand () at xact.c:2524
#6 0x00710b3f in finish_xact_command () at postgres.c:2419
#7 0x0070ff4e in exec_execute_message (portal_name=0x1608990 "", max_rows=1) at postgres.c:1956
#8 0x00712988 in PostgresMain (argc=2, argv=0x1548568, username=0x1548390
with only forknum changing to MAIN_FORKNUM or FSM_FORKNUM. Thanks in advance, Sergey * Sergey E. Koposov, PhD, Research Associate Institute of Astronomy, University of Cambridge Madingley road, CB3 0HA, Cambridge, UK Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.pos
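The oprofile result matches what the quoted code comment warns about: DropRelFileNodeBuffers scans the entire buffer pool once per relation fork dropped, so dropping many tables costs on the order of n_tables × forks × NBuffers buffer-header inspections, each taking the spinlock that tas() implements. A back-of-the-envelope Python sketch of that cost model (the fork count of 3 matches the MAIN/FSM/VISIBILITYMAP forks seen in the backtrace; the numbers are illustrative, not measurements):

```python
BLCKSZ = 8192  # default PostgreSQL block size in bytes

def drop_cost(n_tables, shared_buffers_bytes, forks_per_table=3):
    """Buffer headers inspected while dropping n_tables: a full pool
    scan per fork (main, FSM, visibility map), per table."""
    nbuffers = shared_buffers_bytes // BLCKSZ
    return n_tables * forks_per_table * nbuffers

per_10g = drop_cost(100, 10 * 1024**3)   # 100 tables at shared_buffers=10G
per_48g = drop_cost(100, 48 * 1024**3)   # same drop at shared_buffers=48G
print(per_48g // per_10g)  # 4 (floor of the ~4.8x increase in header scans)
```

This is why the timings rise with shared_buffers even though the tables are empty: the cost is driven by pool size, not table size — and why batching drops into one transaction helps less than hoped, since the per-fork pool scan still happens for every relation.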
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
Hi, I did another test using the same data and the same code which I've provided before, and the per-thread execution time seems to be degrading quadratically with the number of threads. Here are the results:
Nthreads  Time_to_execute_one_thread (s)
 1    8.1
 2    7.8
 3    8.1
 4    9.0
 5   10.2
 6   11.4
 7   13.3
 8   16.1
 9   19.0
10   21.4
11   23.8
12   27.3
13   30.2
14   32.0
15   34.1
16   37.5
Regards, Sergey On Sat, 26 May 2012, Stephen Frost wrote: * Sergey Koposov (kopo...@ast.cam.ac.uk) wrote: Turning off synch seq scans doesn't help either. 18 sec multithreaded run vs 7 sec single threaded. Alright, can you just time 'cat' when they're started a few seconds or whatever apart from each other? I can't imagine it being affected in the same way as these, but seems like it wouldn't hurt to check. Thanks, Stephen * Sergey E. Koposov, PhD, Research Associate Institute of Astronomy, University of Cambridge Madingley road, CB3 0HA, Cambridge, UK Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Sat, 26 May 2012, Stephen Frost wrote: Alright, can you just time 'cat' when they're started a few seconds or whatever apart from each other? I can't imagine it being affected in the same way as these, but seems like it wouldn't hurt to check. I've tried cat'ting an 8 GB file (created in advance) on the same filesystem, and during the multi-threaded run it takes ~4 sec, roughly the same as when run separately. * Sergey E. Koposov, PhD, Research Associate Institute of Astronomy, University of Cambridge Madingley road, CB3 0HA, Cambridge, UK Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Sat, 26 May 2012, Robert Haas wrote: This theory is seeming fairly plausible - how can we test it? How about trying it with synchronize_seqscans = off? If the synchronized-seqscan logic is causing contention on the buf mapping locks and individual buffer locks, that should fix it. Turning off synch seq scans doesn't help either. 18 sec multithreaded run vs 7 sec single threaded. S * Sergey E. Koposov, PhD, Research Associate Institute of Astronomy, University of Cambridge Madingley road, CB3 0HA, Cambridge, UK Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Fri, 25 May 2012, Merlin Moncure wrote: can you hack this in heapam.c and see if it helps? line 131-ish:
if (!RelationUsesLocalBuffers(scan->rs_rd) && scan->rs_nblocks > NBuffers / 4)
becomes
if (!RelationUsesLocalBuffers(scan->rs_rd))
(also you can set the partition count back). The first few runs of my test with the suggested change made the multithreaded queries even slower, by a factor of two: e.g. 44 seconds multithreaded vs ~ 7 sec single threaded. (With the default 9.2beta I get ~ 14-22 sec for a multithreaded run.) But now, after another few reruns of my test, I see again this variability in times for multithreaded runs. It went down to 18-22 seconds, then again to 44 sec. So overall the change you suggested either doesn't improve the situation or makes it worse. Regards, S * Sergey E. Koposov, PhD, Research Associate Institute of Astronomy, University of Cambridge Madingley road, CB3 0HA, Cambridge, UK Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Fri, 25 May 2012, Robert Haas wrote:
> On Thu, May 24, 2012 at 7:26 PM, Sergey Koposov wrote:
>> Here is the output from a multi-threaded run (8 threads, 22 seconds)
>> sorted by blk. Not sure whether that's of much use or not:
> What are the top dozen or so entries if you sort by shacq?

Here it is:

PID 7108 lwlock 44: shacq 1127902 exacq 1548 blk 511
PID 7115 lwlock 44: shacq 1127495 exacq 633 blk 401
PID 7114 lwlock 44: shacq 1127418 exacq 1024 blk 702
PID 7111 lwlock 44: shacq 1127324 exacq 957 blk 651
PID 7109 lwlock 44: shacq 1127261 exacq 388 blk 466
PID 7110 lwlock 44: shacq 1127256 exacq 1132 blk 774
PID 7113 lwlock 44: shacq 1127179 exacq 920 blk 665
PID 7112 lwlock 44: shacq 1127148 exacq 1323 blk 838
PID 7108 lwlock 48: shacq 1125039 exacq 1592 blk 546
PID 7115 lwlock 48: shacq 1124666 exacq 559 blk 397
PID 7111 lwlock 48: shacq 1124523 exacq 1009 blk 955
PID 7114 lwlock 48: shacq 1124502 exacq 1041 blk 976
PID 7110 lwlock 48: shacq 1124460 exacq 1128 blk 1110
PID 7109 lwlock 48: shacq 1124402 exacq 384 blk 602
PID 7112 lwlock 48: shacq 1124394 exacq 1350 blk 1373
PID 7113 lwlock 48: shacq 1124383 exacq 868 blk 871
PID 7115 lwlock 1020: shacq 100 exacq 0 blk 0
PID 7115 lwlock 1018: shacq 100 exacq 0 blk 0
PID 7114 lwlock 1020: shacq 100 exacq 0 blk 0
PID 7114 lwlock 1018: shacq 100 exacq 0 blk 0
PID 7113 lwlock 1020: shacq 100 exacq 0 blk 0
PID 7113 lwlock 1018: shacq 100 exacq 0 blk 0
PID 7112 lwlock 1020: shacq 100 exacq 0 blk 0
PID 7112 lwlock 1018: shacq 100 exacq 0 blk 0
PID 7111 lwlock 1020: shacq 100 exacq 0 blk 0
PID 7111 lwlock 1018: shacq 100 exacq 0 blk 0
PID 7110 lwlock 1020: shacq 100 exacq 0 blk 0
PID 7110 lwlock 1018: shacq 100 exacq 0 blk 0
PID 7109 lwlock 1020: shacq 100 exacq 0 blk 0
PID 7109 lwlock 1018: shacq 100 exacq 0 blk 0
PID 7108 lwlock 1020: shacq 100 exacq 0 blk 0
PID 7108 lwlock 1018: shacq 100 exacq 0 blk 0
PID 7108 lwlock 43: shacq 439454 exacq 1614 blk 151
PID 7115 lwlock 43: shacq 438993 exacq 558 blk 126
PID 7111 lwlock 43: shacq 438817 exacq 899 blk 200
PID 7114 lwlock 43: shacq 438795 exacq 898 blk 230
PID 7109 lwlock 43: shacq 438787 exacq 314 blk 153
PID 7110 lwlock 43: shacq 438768 exacq 1026 blk 231
PID 7113 lwlock 43: shacq 438720 exacq 833 blk 178
PID 7112 lwlock 43: shacq 438691 exacq 1123 blk 215
PID 7108 lwlock 34: shacq 343982 exacq 1595 blk 177
PID 7115 lwlock 34: shacq 343582 exacq 600 blk 139
PID 7109 lwlock 34: shacq 343410 exacq 348 blk 167
PID 7111 lwlock 34: shacq 343398 exacq 926 blk 255
PID 7114 lwlock 34: shacq 343362 exacq 939 blk 214
PID 7110 lwlock 34: shacq 343338 exacq 1074 blk 226
PID 7113 lwlock 34: shacq 343283 exacq 832 blk 226
PID 7112 lwlock 34: shacq 343247 exacq 1255 blk 271
PID 7108 lwlock 38: shacq 333989 exacq 1704 blk 137
PID 7115 lwlock 38: shacq 333512 exacq 581 blk 106
PID 7109 lwlock 38: shacq 40 exacq 322 blk 115
PID 7111 lwlock 38: shacq 06 exacq 882 blk 152
PID 7114 lwlock 38: shacq 333266 exacq 977 blk 183
PID 7110 lwlock 38: shacq 333249 exacq 1013 blk 165
PID 7113 lwlock 38: shacq 333226 exacq 828 blk 163
PID 7112 lwlock 38: shacq 333175 exacq 1246 blk 183
PID 7108 lwlock 42: shacq 233215 exacq 1539 blk 237
PID 7115 lwlock 42: shacq 232846 exacq 517 blk 138
PID 7111 lwlock 42: shacq 232645 exacq 885 blk 215
PID 7109 lwlock 42: shacq 232611 exacq 363 blk 140
PID 7113 lwlock 42: shacq 232600 exacq 860 blk 192
PID 7110 lwlock 42: shacq 232599 exacq 1049 blk 262
PID 7114 lwlock 42: shacq 232591 exacq 935 blk 318
PID 7112 lwlock 42: shacq 232505 exacq 1262 blk 240
PID 7108 lwlock 47: shacq 228682 exacq 1495 blk 177
PID 7115 lwlock 47: shacq 228397 exacq 516 blk 109
PID 7110 lwlock 47: shacq 228218 exacq 1082 blk 378
PID 7109 lwlock 47: shacq 228097 exacq 386 blk 158
PID 7111 lwlock 47: shacq 228093 exacq 907 blk 321
PID 7113 lwlock 47: shacq 228085 exacq 909 blk 306
PID 7112 lwlock 47: shacq 227993 exacq 1248 blk 387
PID 7114 lwlock 47: shacq 227986 exacq 929 blk 449
PID 7108 lwlock 36: shacq 205287 exacq 1592 blk 78
PID 7115 lwlock 36: shacq 204845 exacq 535 blk 86
PID 7111 lwlock 36: shacq 204675 exacq 830 blk 108
PID 7114 lwlock 36: shacq 204634 exacq 930 blk 109
PID 7109 lwlock 36: shacq 204581 exacq 363 blk 83
PID 7113 lwlock 36: shacq 204566 exacq 833 blk 112
PID 7110 lwlock 36: shacq 204546 exacq 1014 blk 106
PID 7112 lwlock 36: shacq 204508 exacq 1180 blk 140
PID 7108 lwlock 35: shacq 136067 exacq 1537 blk 56
PID 7115 lwlock 35: shacq 135532 exacq 494 blk 47
PID 7111 lwlock 35: shacq 135404 exacq 820 blk 76
PID 7109 lwlock 35: shacq 135362 exacq 326 blk 56
PID 7114 lwlock 35: shacq 135342 exacq 879 blk 102
PID 7110 lwlock 35: shacq 135333 exacq 1016 blk 88
PID 7113 lwlock 35: shacq 135318 exacq 829 blk 75
PID 7112 lwlock 35: shacq 135257 exacq 1229 blk 98
PID 7108 lwlock 41: shacq 134338 exacq 1540 blk 55
PID 7115 lwlock 41: shacq 133929 exacq 542 blk 50
PID 7109 lwlock 41: shacq 133781 exacq 360 blk 49
PID 7111 lwlock 41: shacq 1
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Fri, 25 May 2012, Merlin Moncure wrote:
> how likely is it to see multiple simultaneous scans in the real world?
> Interleaving scans like that is not a very effective optimization -- if it
> was me, I'd be trying to organize something around a partitioned tid scan
> for parallel sequential access.

Regarding the "real world use": speaking for myself, the reason I'm running queries like the ones in my test is that I have a large inflow of data every day, tens of GB, which has to be processed. Since PG doesn't have any query parallelization built in, and running the processing in a single thread would take too long, I'm trying to work around that by manually running multiple threads in PG and splitting the work among them. So not having scalability here is going to hurt us (though I admit it's probably not a very popular use-case for PG).

Regards,
S

* Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
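The manual split described above (each connection handling the rows whose key satisfies `x % 16 = ZZZ`) can be sketched outside the database. A minimal illustration of the residue-class partitioning, with a hypothetical `partition` helper (not code from the thread):

```python
NUM_WORKERS = 8

def partition(rows, num_workers):
    """Split rows into disjoint residue classes, mimicking the x % N = ZZZ filters
    that each PG connection runs in the setup described above."""
    return [[r for r in rows if r % num_workers == w] for w in range(num_workers)]

rows = list(range(1000))
parts = partition(rows, NUM_WORKERS)

# Every row lands in exactly one partition, so the workers never overlap.
assert sum(len(p) for p in parts) == len(rows)
assert all(not (set(p) & set(q))
           for i, p in enumerate(parts) for q in parts[i + 1:])
```

The same disjointness is what makes the approach safe: each `create table _tmpXX as ...` statement touches a distinct slice of the input.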
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Fri, 25 May 2012, Merlin Moncure wrote:
> These are all on the buffer partition locks. That makes sense... I was
> wrong earlier: this case was in fact 'create table as', not 'insert select',
> which rules out both the freelist lock and the wal insert lock, because
> create table as gets to use both a bulk insert strategy and WAL-avoiding
> logic (assuming the WAL level is not 'archive' or higher). So, why aren't
> the lock partitions helping here?

FYI the WAL level is default, and here are the relevant changed params from the .conf (autovacuum was disabled but the data was vacuumed manually):

max_connections = 100           # (change requires restart)
shared_buffers = 10GB           # min 128kB
work_mem = 500MB                # min 64kB
maintenance_work_mem = 500MB    # min 1MB
synchronous_commit = off        # synchronization level
checkpoint_segments = 20        # in logfile segments, min 1, 16MB each
enable_hashjoin = off
enable_mergejoin = off
effective_cache_size = 10GB
autovacuum = off                # Enable autovacuum subprocess? 'on'

Regards,
S
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Thu, 24 May 2012, Jeff Janes wrote:
> Add #define LWLOCK_STATS near the top of src/backend/storage/lmgr/lwlock.c,
> recompile, and run a reduced-size workload. When the processes exit, they
> will dump a lot of data about LWLock usage to the logfile. Generally the
> LWLock with the most blocks on it will be the main culprit.

Here is the output from a multi-threaded run (8 threads, 22 seconds) sorted by blk. Not sure whether that's of much use or not:

PID 7112 lwlock 48: shacq 1124394 exacq 1350 blk 1373
PID 7110 lwlock 48: shacq 1124460 exacq 1128 blk 1110
PID 7114 lwlock 48: shacq 1124502 exacq 1041 blk 976
PID 7111 lwlock 48: shacq 1124523 exacq 1009 blk 955
PID 7113 lwlock 48: shacq 1124383 exacq 868 blk 871
PID 7112 lwlock 44: shacq 1127148 exacq 1323 blk 838
PID 7110 lwlock 44: shacq 1127256 exacq 1132 blk 774
PID 7114 lwlock 44: shacq 1127418 exacq 1024 blk 702
PID 7113 lwlock 44: shacq 1127179 exacq 920 blk 665
PID 7111 lwlock 44: shacq 1127324 exacq 957 blk 651
PID 7109 lwlock 48: shacq 1124402 exacq 384 blk 602
PID 7108 lwlock 48: shacq 1125039 exacq 1592 blk 546
PID 7108 lwlock 44: shacq 1127902 exacq 1548 blk 511
PID 7109 lwlock 44: shacq 1127261 exacq 388 blk 466
PID 7114 lwlock 47: shacq 227986 exacq 929 blk 449
PID 7115 lwlock 44: shacq 1127495 exacq 633 blk 401
PID 7115 lwlock 48: shacq 1124666 exacq 559 blk 397
PID 7112 lwlock 47: shacq 227993 exacq 1248 blk 387
PID 7110 lwlock 47: shacq 228218 exacq 1082 blk 378
PID 7111 lwlock 47: shacq 228093 exacq 907 blk 321
PID 7114 lwlock 42: shacq 232591 exacq 935 blk 318
PID 7113 lwlock 47: shacq 228085 exacq 909 blk 306
PID 7112 lwlock 34: shacq 343247 exacq 1255 blk 271
PID 7110 lwlock 42: shacq 232599 exacq 1049 blk 262
PID 7111 lwlock 34: shacq 343398 exacq 926 blk 255
PID 7112 lwlock 42: shacq 232505 exacq 1262 blk 240
PID 7108 lwlock 42: shacq 233215 exacq 1539 blk 237
PID 7110 lwlock 43: shacq 438768 exacq 1026 blk 231
PID 7114 lwlock 43: shacq 438795 exacq 898 blk 230
PID 7113 lwlock 34: shacq 343283 exacq 832 blk 226
PID 7110 lwlock 34: shacq 343338 exacq 1074 blk 226
PID 7114 lwlock 0: shacq 0 exacq 14864 blk 219
PID 7112 lwlock 43: shacq 438691 exacq 1123 blk 215
PID 7111 lwlock 42: shacq 232645 exacq 885 blk 215
PID 7114 lwlock 34: shacq 343362 exacq 939 blk 214
PID 7111 lwlock 0: shacq 0 exacq 13638 blk 205
PID 7111 lwlock 43: shacq 438817 exacq 899 blk 200
PID 7114 lwlock 7: shacq 0 exacq 547 blk 193
PID 7113 lwlock 42: shacq 232600 exacq 860 blk 192
PID 7110 lwlock 0: shacq 0 exacq 16862 blk 191
PID 7114 lwlock 38: shacq 333266 exacq 977 blk 183
PID 7112 lwlock 38: shacq 333175 exacq 1246 blk 183
PID 7113 lwlock 7: shacq 0 exacq 548 blk 178
PID 7113 lwlock 43: shacq 438720 exacq 833 blk 178
PID 7115 lwlock 7: shacq 0 exacq 549 blk 177
PID 7108 lwlock 47: shacq 228682 exacq 1495 blk 177
PID 7108 lwlock 34: shacq 343982 exacq 1595 blk 177
PID 7112 lwlock 0: shacq 0 exacq 19538 blk 172
PID 7111 lwlock 7: shacq 0 exacq 549 blk 172
PID 7115 lwlock 17: shacq 0 exacq 9927 blk 167
PID 7109 lwlock 34: shacq 343410 exacq 348 blk 167
PID 7112 lwlock 7: shacq 0 exacq 548 blk 166
PID 7110 lwlock 38: shacq 333249 exacq 1013 blk 165
PID 7113 lwlock 38: shacq 333226 exacq 828 blk 163
PID 7110 lwlock 7: shacq 0 exacq 548 blk 162
PID 7109 lwlock 7: shacq 0 exacq 548 blk 161
PID 7109 lwlock 47: shacq 228097 exacq 386 blk 158
PID 7112 lwlock 45: shacq 130843 exacq 1201 blk 154
PID 7109 lwlock 43: shacq 438787 exacq 314 blk 153
PID 7111 lwlock 38: shacq 06 exacq 882 blk 152
PID 7108 lwlock 43: shacq 439454 exacq 1614 blk 151
PID 7108 lwlock 17: shacq 0 exacq 23165 blk 147
PID 7108 lwlock 7: shacq 0 exacq 549 blk 145
PID 7113 lwlock 0: shacq 0 exacq 13394 blk 142
PID 7112 lwlock 46: shacq 132972 exacq 1196 blk 142
PID 7112 lwlock 36: shacq 204508 exacq 1180 blk 140
PID 7109 lwlock 42: shacq 232611 exacq 363 blk 140
PID 7115 lwlock 34: shacq 343582 exacq 600 blk 139
PID 7114 lwlock 45: shacq 130818 exacq 903 blk 139
PID 7115 lwlock 42: shacq 232846 exacq 517 blk 138
PID 7108 lwlock 38: shacq 333989 exacq 1704 blk 137
PID 7115 lwlock 43: shacq 438993 exacq 558 blk 126
PID 7115 lwlock 0: shacq 0 exacq 3161 blk 125
PID 7110 lwlock 40: shacq 129577 exacq 1021 blk 122
PID 7113 lwlock 45: shacq 130833 exacq 814 blk 120
PID 7111 lwlock 17: shacq 0 exacq 4607 blk 119
PID 7109 lwlock 0: shacq 0 exacq 5711 blk 119
PID 7110 lwlock 45: shacq 130865 exacq 1001 blk 117
PID 7109 lwlock 38: shacq 40 exacq 322 blk 115
PID 7114 lwlock 46: shacq 132997 exacq 860 blk 112
PID 7113 lwlock 36: shacq 204566 exacq 833 blk 112
PID 7112 lwlock 40: shacq 129528 exacq 1234 blk 111
PID 7115 lwlock 47: shacq 228397 exacq 516 blk 109
PID 7114 lwlock 36: shacq 204634 exacq 930 blk 109
PID 7112 lwlock 41: shacq 133600 exacq 1136 blk 109
PID 7111 lwlock 36: shacq 204675 exacq 830 blk 108
PID 7108 lwlock 0: shacq 0 exacq 6504 blk 108
PID 7115 lwlock 38: shacq 333512 exacq 581 blk 106
PID 7110 lwlock 36: shacq 204546 exacq 1014 blk 106
PID 7113 lwlock 37: shacq 127040 exacq 79
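A dump like the one above is easier to digest after a quick post-processing pass. A hedged sketch of such a pass (the record format is assumed from the output shown here, not taken from lwlock.c itself):

```python
import re

# One record per LWLock per backend, as printed with LWLOCK_STATS enabled.
LINE = re.compile(r"PID (\d+) lwlock (\d+): shacq (\d+) exacq (\d+) blk (\d+)")

def parse_stats(text):
    """Extract (pid, lwlock, shacq, exacq, blk) tuples from a stats dump."""
    return [tuple(int(g) for g in m.groups()) for m in LINE.finditer(text)]

def top_by(stats, field, n=5):
    # field index: 2 = shacq, 3 = exacq, 4 = blk
    return sorted(stats, key=lambda s: s[field], reverse=True)[:n]

sample = ("PID 7112 lwlock 48: shacq 1124394 exacq 1350 blk 1373 "
          "PID 7108 lwlock 44: shacq 1127902 exacq 1548 blk 511")
stats = parse_stats(sample)
# Sorting by blk surfaces the most-contended lock first.
assert top_by(stats, 4, 1)[0][1] == 48
```

Sorting the full dump by the `blk` column is exactly what produces the ordering shown above.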
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
Hi,

On Thu, 24 May 2012, Robert Haas wrote:
> Not sure. It might be some other LWLock, but it's hard to tell which one
> from the information provided.

If you can tell me the best way to find out the info you need, I can run it reasonably quickly.

S
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Thu, 24 May 2012, Robert Haas wrote:
> As you can see, raw performance isn't much worse with the larger data sets,
> but scalability at high connection counts is severely degraded once the
> working set no longer fits in shared_buffers.

Actually the problem persists even when I trim the dataset size to fit within shared_buffers. Here is the dump (0.5 GB in size, tested with shared_buffers=10GB, work_mem=500MB): http://www.ast.cam.ac.uk/~koposov/files/dump.gz And I attach the script.

For my toy dataset the performance of a single thread goes down from ~6.4 to 18 seconds (~3 times worse). And actually, while running the script repeatedly on my main machine, for some reason I saw some variation in how much slower threaded execution is than a single thread; now I see 25 seconds for a multi-threaded run vs the same ~6 seconds for a single thread.

The oprofile shows:

782355 21.5269 s_lock
  782355 100.000 s_lock [self]
---
709801 19.5305 PinBuffer
  709801 100.000 PinBuffer [self]
---
326457 8.9826 LWLockAcquire
  326457 100.000 LWLockAcquire [self]
---
309437 8.5143 UnpinBuffer
  309437 100.000 UnpinBuffer [self]
---
252972 6.9606 ReadBuffer_common
  252972 100.000 ReadBuffer_common [self]
---
201558 5.5460 LockBuffer
  201558 100.000 LockBuffer [self]

It is interesting that on another machine with much smaller shared memory (3GB), smaller RAM (12GB), a smaller number of CPUs, and PG 9.1 running, I was getting consistently ~7.2 vs 4.5 sec (multi- vs single-threaded).

PS Just in case: the CPU on the main machine I'm testing is a Xeon(R) CPU E7-4807 (the total number of real cores is 24).

script.sh
Description: Bourne shell script

drop table _tmpXX ;
\timing
create table _tmpXX as
select * from (
  select *, (select healpixid from idt_match as m
             where m.transitid=o.transitid) as x
  from idt_photoobservation_small as o offset 0
) as y where x%16=XX order by x;
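The attached script presumably substitutes a concrete worker number for the XX placeholder before feeding the query to each connection. A minimal sketch of that substitution (the template text is from the message; the `render` helper is hypothetical):

```python
# Query template from the attached script; XX is the per-worker placeholder.
TEMPLATE = """create table _tmpXX as
select * from (
  select *, (select healpixid from idt_match as m
             where m.transitid=o.transitid) as x
  from idt_photoobservation_small as o offset 0
) as y where x%16=XX order by x;"""

def render(worker_id):
    """Give each of the 16 workers its own residue class and scratch table."""
    return TEMPLATE.replace("XX", str(worker_id))

sql = render(3)
assert "x%16=3" in sql and "_tmp3" in sql
```

Each rendered statement can then be piped to its own psql session, which is how the parallel runs in this thread were driven.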
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Thu, 24 May 2012, Robert Haas wrote:
> On Thu, May 24, 2012 at 1:42 PM, Sergey Koposov wrote:
>> I guess there is nothing catastrophically wrong with that, but still I'm
>> very surprised that you get severe locking problems (factor of two
>> slow-down) when running parallel read-only transactions.
> Me, too. How many concurrent connections are you running, and does your
> working set exceed shared_buffers? Can you provide a self-contained
> reproducible test case?

The last tests I've been doing were with 8 connections. And the working set is roughly 30 GB, which is ~3x the shared buffers (but ~50% of RAM). Regarding the test case, I'll try to see whether I can still observe the same slow-down if I chop the main table down by a factor of a few, so I can put the data somewhere for download.

S
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Thu, 24 May 2012, Merlin Moncure wrote:
> Are you sure? I looked at all the ReleasePredicateLocks calls and they
> appear to be guarded by:
>
>     /* Nothing to do if this is not a serializable transaction */
>     if (MySerializableXact == InvalidSerializableXact)
>         return false;
>
> What's the default isolation mode set to?

Sorry, it was indeed my mistake. I was pointing opreport at a binary which had been recompiled while the old version was still running, so I guess oprofile was picking up the wrong function names. Here is the correct oprofile.

When multiple queries are running:

samples  %        symbol name
952457 13.8715 LWLockAcquire
  952457 100.000 LWLockAcquire [self]
---
779077 11.3464 PinBuffer
  779077 100.000 PinBuffer [self]
---
759898 11.0671 s_lock
  759898 100.000 s_lock [self]
---
689753 6.7178 slot_deform_tuple
  689753 100.000 slot_deform_tuple [self]
---
526002 5.1230 UnpinBuffer
  526002 100.000 UnpinBuffer [self]

When only one is running:

samples  %        symbol name
163268 14.0343 slot_deform_tuple
  163268 100.000 slot_deform_tuple [self]
---
126018 10.8324 _bt_compare
  126018 100.000 _bt_compare [self]
---
113975 9.7972 ExecProject
  113975 100.000 ExecProject [self]
---
49760 4.2773 FunctionCall2Coll
  49760 100.000 FunctionCall2Coll [self]
---
49164 4.2261 LWLockAcquire
  49164 100.000 LWLockAcquire [self]
---
43526 3.7414 hash_search_with_hash_value
  43526 100.000 hash_search_with_hash_value [self]

I guess there is nothing catastrophically wrong with that, but still I'm very surprised that you get severe locking problems (factor of two slow-down) when running parallel read-only transactions.

Sergey
[HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
Hi,

I've been running some tests on pg 9.2beta1, in particular a set of queries like

create table _tmp0 as
select * from (
  select *, (select healpixid from idt_match as m
             where m.transitid=o.transitid) as x
  from idt_photoobservation as o offset 0
) as y where x%16=ZZZ order by x;

(where ZZZ is some integer, 0 <= ZZZ < 16) with the following plan:

Sort  (cost=3228814504.96..3228815137.21 rows=252902 width=498)
  Sort Key: y.x
  ->  Subquery Scan on y  (cost=0.00..3228791809.25 rows=252902 width=498)
        Filter: ((y.x % 16::bigint) = 0)
        ->  Limit  (cost=0.00..3228033102.41 rows=50580456 width=490)
              ->  Seq Scan on idt_photoobservation o  (cost=0.00..3228033102.41 rows=50580456 width=490)
                    SubPlan 1
                      ->  Index Scan using idt_match_transitid_idx on idt_match m  (cost=0.00..63.74 rows=1 width=8)
                            Index Cond: (transitid = o.transitid)

The schemas of the tables are:

e2es2=> \d idt_match
      Table "public.idt_match"
  Column   |   Type   | Modifiers
-----------+----------+-----------
 sourceid  | bigint   |
 transitid | bigint   |
 healpixid | bigint   |
 flagsxm   | smallint |
Indexes:
    "idt_match_idx" btree (healpixid)
    "idt_match_transitid_idx" btree (transitid)

 Table "public.idt_photoobservation"
  Column   |  Type  | Modifiers
-----------+--------+-----------
 transitid | bigint |
 fluxbp0   | real   |
 (more columns)
Indexes:
    "idt_photoobservation_idx" btree (transitid)

And I noticed that when I run queries like the one shown above in parallel (in multiple connections for ZZZ=0...8), the performance of each query drops significantly (a factor of 2), despite the fact that during the execution postgres is mostly CPU bound.
So I tried to oprofile and strace it, and this is what I saw.

The strace -c of each process shows something like this:

# Process 18660 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 76.25    0.001342           0    268987           semop
 23.75    0.000418           0     61694           read
  0.00    0.000000           0       138           lseek
  0.00    0.000000           0       355           select
  0.00    0.000000           0         3           kill
------ ----------- ----------- --------- --------- ----------------
100.00    0.001760                331177           total

And the oprofile shows this on top:

2863981 25.7117 ReleasePredicateLocks
  2863981 100.000 ReleasePredicateLocks [self]
---
1246629 11.1917 LocalBufferAlloc
  1246629 100.000 LocalBufferAlloc [self]
---
1135393 10.1931 CheckForSerializableConflictIn
  1135393 100.000 CheckForSerializableConflictIn [self]

So there is a lot of locking for some reason, and I was surprised to see anything related to SSI in the profile at all, because I'm not running serializable transactions (I was just running my queries from multiple psql sessions). Is there a reasonable explanation for what I'm seeing?

Thanks in advance,
Sergey

More info: there is no other concurrent activity on the machine. shared_buffers is 10GB; the machine has ~60GB of RAM, 24 cores, and a proper RAID. Another note: the oprofile and strace were obtained during the stage when the idt_photo... table was being scanned and data was being stored in pgsql_tmp (not during the last bit when the data in pgsql_tmp was actually being sorted).
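The strace call counts alone already show where the contention is: a quick sanity check on the numbers above (call counts copied from the strace -c table):

```python
# Syscall call counts from the strace -c summary in the message.
calls = {"semop": 268987, "read": 61694, "lseek": 138, "select": 355, "kill": 3}

total = sum(calls.values())
assert total == 331177  # matches strace's own "total" row

# Over 80% of all syscalls are semop, i.e. semaphore waits on lock contention.
semop_share = calls["semop"] / total
assert semop_share > 0.8
```

That dominance of semop is consistent with the heavy locking seen in the oprofile output.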