Re: [PERFORM] Fast insert, but slow join and updates for table with 4 billion rows
Hi

I have now tested the inserts and the updates, and it works extremely well without doing any partitioning of the big table (4.288.128.392 rows) in this case, when we use a common index as Tom suggested. We are able to insert 172.000 rows per second. The number of rows is computed from the total time from when we start to read the CSV files until the last file is done. We use GNU parallel and run 5 threads. The number of inserts is actually 172.000 * 2, because first we copy the rows into a temp table, where we prepare the data, and then insert them into the common big main table. There are no errors in the log.

We are able to update 98.000 rows per second. Since each update also means one insert, we are close to 200.000 inserts and updates per second. For the update we give a column that is null a value; that is done for all the 4.3 billion rows. We run 5 threads in parallel here also, and there are no errors and no deadlocks.

The problem of duplicated indexes is solvable in this project, because first we add data and then we do analyses; this means that we can have different indexes when adding data and when using it. In this project we are going to add about 25 billion geolocated observations which will be used for doing analyses. I suppose that at some level we will have to do partitioning, but so far Postgres has worked extremely well, even though it's based on MVCC.

The Postgres/PostGIS software and communities are for sure really fun to work with, and the Postgres/PostGIS open source software holds a very high quality.

Thanks.

Lars

From: pgsql-performance-ow...@postgresql.org on behalf of Tom Lane
Sent: 24 October 2016 14:52
To: Lars Aksel Opsahl
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Fast insert, but slow join and updates for table with 4 billion rows

Lars Aksel Opsahl writes:
> In this example I have two tables, one with 4 billion rows and another with
> 5 rows, and then I try to do a standard simple join between these two
> tables, and this takes 397391 ms. with this SQL (the query plan is added
> further down)

This particular query would work a lot better if you had an index on nora_bc25_observation (point_uid_ref, epoch), ie both join columns in one index. I get the impression that that ought to be the primary key of the table, which would be an even stronger reason to have a unique index on it.

regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
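The load pipeline described above (CSV into a temp table, prepare the data, then insert into the big main table) can be sketched roughly like this. The staging table name, file path, and preparation step are made up for illustration; only the main table name comes from the thread:

```sql
-- Hypothetical sketch of the described load pipeline.
CREATE TEMP TABLE observation_stage
  (LIKE met_vaer_wisline.nora_bc25_observation);

-- Bulk-load one raw CSV file into the staging table.
COPY observation_stage FROM '/data/in/file_0001.csv' WITH (FORMAT csv);

-- ...prepare/derive columns here...

-- Move the prepared rows into the common big main table.
INSERT INTO met_vaer_wisline.nora_bc25_observation
SELECT * FROM observation_stage;
```

Running several such pipelines in parallel (one per input file, as with GNU parallel above) is what yields the combined insert rate.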
Re: [PERFORM] Fast insert, but slow join and updates for table with 4 billion rows
Hi

Yes, that helps. I have now tested this on the first column. This basically means that only the first column of a multi-column index may be used in a single-column query.

EXPLAIN ANALYZE SELECT count(o.*) FROM met_vaer_wisline.nora_bc25_observation o WHERE o.point_uid_ref = 15;
                                   QUERY PLAN
 Aggregate  (cost=45540.97..45540.98 rows=1 width=42) (actual time=24.715..24.715 rows=1 loops=1)
   ->  Bitmap Heap Scan on nora_bc25_observation o  (cost=477.66..45427.40 rows=45430 width=42) (actual time=6.436..19.006 rows=43832 loops=1)
         Recheck Cond: (point_uid_ref = 15)
         ->  Bitmap Index Scan on idx_met_vaer_wisline_nora_bc25_observation_test  (cost=0.00..466.30 rows=45430 width=0) (actual time=6.320..6.320 rows=43832 loops=1)
               Index Cond: (point_uid_ref = 15)
 Total runtime: 24.767 ms
(6 rows)

Thanks

Lars

From: Scott Marlowe
Sent: 24 October 2016 22:23
To: Lars Aksel Opsahl
Cc: Tom Lane; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Fast insert, but slow join and updates for table with 4 billion rows

On Mon, Oct 24, 2016 at 2:07 PM, Lars Aksel Opsahl wrote:
> Hi
>
> Yes this makes both the update and both selects much faster. We are now down
> to 3000 ms. for select, but then I get a problem with another SQL where I
> only use epoch in the query.
>
> SELECT count(o.*) FROM met_vaer_wisline.nora_bc25_observation o WHERE
> o.epoch = 128844;
>  count
> -------
>  97831
> (1 row)
> Time: 92763.389 ms
>
> To get the SQL above to work fast it seems like we also need a single index on
> the epoch column, this means two indexes on the same column and that eats
> memory when we have more than 4 billion rows.
>
> Is there any way to avoid two indexes on the epoch column?

You could try reversing the order. Basically whatever comes first in a two-column index is easier / possible for postgres to use like a single-column index. If not, then you're probably stuck with two indexes.
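The behavior observed above is the standard leftmost-prefix rule: a b-tree index on (a, b) can serve a query that filters only on a, but not efficiently one that filters only on b. A sketch using the thread's table; the exact definition of the combined index is an assumption:

```sql
-- Assumed definition of the combined index discussed in the thread:
CREATE INDEX idx_met_vaer_wisline_nora_bc25_observation_test
    ON met_vaer_wisline.nora_bc25_observation (point_uid_ref, epoch);

-- Can use the index: point_uid_ref is the leading column.
SELECT count(o.*) FROM met_vaer_wisline.nora_bc25_observation o
 WHERE o.point_uid_ref = 15;

-- Cannot use the index efficiently: epoch is only the trailing column,
-- so the whole index must be scanned.
SELECT count(o.*) FROM met_vaer_wisline.nora_bc25_observation o
 WHERE o.epoch = 128844;
```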
Re: [PERFORM] Fast insert, but slow join and updates for table with 4 billion rows
Hi

Yes, this makes both the update and both selects much faster. We are now down to 3000 ms. for the select, but then I get a problem with another SQL where I only use epoch in the query.

SELECT count(o.*) FROM met_vaer_wisline.nora_bc25_observation o WHERE o.epoch = 128844;
 count
-------
 97831
(1 row)
Time: 92763.389 ms

To get the SQL above to work fast, it seems like we also need a single index on the epoch column; this means two indexes on the same column, and that eats memory when we have more than 4 billion rows.

Is there any way to avoid two indexes on the epoch column?

Thanks.

Lars

EXPLAIN ANALYZE SELECT count(o.*) FROM met_vaer_wisline.nora_bc25_observation o WHERE o.epoch = 128844;
 Aggregate  (cost=44016888.13..44016888.14 rows=1 width=42) (actual time=91307.470..91307.471 rows=1 loops=1)
   ->  Index Scan using idx_met_vaer_wisline_nora_bc25_observation_test on nora_bc25_observation o  (cost=0.58..44016649.38 rows=95500 width=42) (actual time=1.942..91287.495 rows=97831 loops=1)
         Index Cond: (epoch = 128844)
 Total runtime: 91307.534 ms

EXPLAIN ANALYZE SELECT count(o.*) FROM met_vaer_wisline.nora_bc25_observation o, met_vaer_wisline.new_data n WHERE o.point_uid_ref = n.id_point AND o.epoch = n.epoch;
 Aggregate  (cost=131857.71..131857.72 rows=1 width=42) (actual time=182.459..182.459 rows=1 loops=1)
   ->  Nested Loop  (cost=0.58..131727.00 rows=52283 width=42) (actual time=0.114..177.420 rows=5 loops=1)
         ->  Seq Scan on new_data n  (cost=0.00..1136.00 rows=5 width=8) (actual time=0.050..7.873 rows=5 loops=1)
         ->  Index Scan using idx_met_vaer_wisline_nora_bc25_observation_test on nora_bc25_observation o  (cost=0.58..2.60 rows=1 width=50) (actual time=0.003..0.003 rows=1 loops=5)
               Index Cond: ((point_uid_ref = n.id_point) AND (epoch = n.epoch))
 Total runtime: 182.536 ms
Time: 3095.618 ms

Lars

From: pgsql-performance-ow...@postgresql.org on behalf of Tom Lane
Sent: 24 October 2016 14:52
To: Lars Aksel Opsahl
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Fast insert, but slow join and updates for table with 4 billion rows

Lars Aksel Opsahl writes:
> In this example I have two tables, one with 4 billion rows and another with
> 5 rows, and then I try to do a standard simple join between these two
> tables, and this takes 397391 ms. with this SQL (the query plan is added
> further down)

This particular query would work a lot better if you had an index on nora_bc25_observation (point_uid_ref, epoch), ie both join columns in one index. I get the impression that that ought to be the primary key of the table, which would be an even stronger reason to have a unique index on it.

regards
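One way to read Scott's "try reversing the order" suggestion: with the columns reversed, the epoch-only query can use the leading column, and the two-column join predicate can still use the whole index; the trade-off is that a point_uid_ref-only query then loses index support. A sketch (the index name is made up, and this is untested against this dataset):

```sql
-- Reversed column order:
CREATE INDEX idx_obs_epoch_point
    ON met_vaer_wisline.nora_bc25_observation (epoch, point_uid_ref);

-- Now indexable via the leading column:
SELECT count(o.*) FROM met_vaer_wisline.nora_bc25_observation o
 WHERE o.epoch = 128844;

-- The two-column join predicate can still use the full index:
SELECT count(o.*)
  FROM met_vaer_wisline.nora_bc25_observation o,
       met_vaer_wisline.new_data n
 WHERE o.point_uid_ref = n.id_point AND o.epoch = n.epoch;
```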
[PERFORM] Fast insert, but slow join and updates for table with 4 billion rows
                    | integer           | not null
 windspeed_10m      | real              |
 air_temperature_2m | real              |
Indexes:
    "idx_met_vaer_wisline_nora_bc25_observation_epoch" btree (epoch)
    "idx_met_vaer_wisline_nora_bc25_observation_point_uid_ref" btree (point_uid_ref)

\d met_vaer_wisline.new_data ;
Unlogged table "met_vaer_wisline.new_data"
       Column       |       Type        | Modifiers
--------------------+-------------------+-----------
 windspeed_10m      | real              |
 air_temperature_2m | real              |
 lon                | character varying | not null
 lat                | character varying | not null
 epoch              | integer           |
 epoch_as_numeric   | numeric           | not null
 rest               | character varying |
 id_point           | integer           |
Indexes:
    "idx_met_vaer_wisline_new_data_epoch" btree (epoch)
    "idx_met_vaer_wisline_new_data_id_point" btree (id_point)

vacuum analyze met_vaer_wisline.nora_bc25_observation;
vacuum analyze met_vaer_wisline.new_data;

SELECT count(*) from met_vaer_wisline.new_data;
 count
-------
     5
(1 row)

SELECT count(*) from met_vaer_wisline.nora_bc25_observation;
   count
------------
 4263866304

Thanks.

Lars
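For reference, the fix the thread converges on is to replace the two single-column indexes above with one two-column index on the join columns, unique (or a primary key) if the pair really identifies a row. A sketch; the index name is made up:

```sql
-- Both join columns in one index, as Tom Lane suggests in the replies:
CREATE UNIQUE INDEX idx_nora_bc25_observation_point_epoch
    ON met_vaer_wisline.nora_bc25_observation (point_uid_ref, epoch);

-- Or, if (point_uid_ref, epoch) is in fact the row identity:
-- ALTER TABLE met_vaer_wisline.nora_bc25_observation
--     ADD PRIMARY KEY (point_uid_ref, epoch);
```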
Re: [PERFORM] Which Join is better
Unless you use the explicit join syntax:

select p.* from A p join B q on (p.id = q.id)

and also set join_collapse_limit = 1, the order of the joins is determined by the planner. Also, EXPLAIN is your friend :)

From: Adarsh Sharma
To: pgsql-performance@postgresql.org
Sent: Monday, August 1, 2011 11:42 PM
Subject: [PERFORM] Which Join is better

Dear all,

Just want to know which join is better for querying data faster. I have 2 tables, A (70 GB) & B (7 MB). A has 10 columns & B has 3 columns. Indexes exist on both tables' ids.

select p.* from table A p, B q where p.id=q.id

or

select p.* from table B q, A p where q.id=p.id

Thanks
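A minimal sketch of the point above, using the question's hypothetical tables a and b: by default the planner reorders joins freely, so the two spellings in the question should produce the same plan; only the explicit JOIN syntax combined with join_collapse_limit = 1 pins the written order:

```sql
-- Planner picks the join order; these two are equivalent:
SELECT p.* FROM a p, b q WHERE p.id = q.id;
SELECT p.* FROM b q, a p WHERE q.id = p.id;

-- Forcing the written order (rarely a good idea):
SET join_collapse_limit = 1;
SELECT p.* FROM a p JOIN b q ON (p.id = q.id);
```

Comparing EXPLAIN output for the two spellings is the quickest way to confirm they get the same plan.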
Re: [PERFORM] synchronous_commit off
No: the commit has the same guarantees as a synchronous commit w.r.t. data consistency. The commit can only fail (as a whole) due to hardware problems or postgres backend crashes.

And yes: the client commit returns, but the server can fail later and not persist the transaction, and it will be lost (again as a whole). Your application should be able to tolerate losing the latest committed transactions if you use this.

The difference to fsync=off is that a server crash will leave the database in a consistent state with just the latest transactions lost.

From: Anibal David Acosta
To: pgsql-performance@postgresql.org
Sent: Monday, August 1, 2011 6:29 AM
Subject: [PERFORM] synchronous_commit off

Can a transaction committed asynchronously report an error, a duplicate key or something like that, causing a client with an OK transaction but a server with a FAILED transaction?

Thanks
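A note on scope, since it matters for the trade-off described above: synchronous_commit can be set per session or even per transaction, so only the work that can tolerate loss needs to run with it off. A sketch:

```sql
-- Per session:
SET synchronous_commit = off;

-- Or per transaction, leaving the rest of the session synchronous:
BEGIN;
SET LOCAL synchronous_commit = off;
-- ...work whose loss on a crash is acceptable...
COMMIT;  -- returns before the WAL for this transaction is flushed to disk
```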
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
Thanks Pavan!

I think the most important points are still that:
1. The WAL write should be happening asynchronously (if that is possible).
2. There should be an option to not perform these compactions if the page is only touched by reads.

(Assuming that when most of the database resides in the cache these optimizations are less important.)

-- Lars

----- Original Message -----
From: Pavan Deolasee
To: Merlin Moncure
Cc: lars; Kevin Grittner; Ivan Voras; pgsql-performance@postgresql.org
Sent: Wednesday, July 27, 2011 7:15 AM
Subject: Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database

On Wed, Jul 13, 2011 at 10:52 AM, Merlin Moncure wrote: ...

There are a couple of other (very important) things that HOT does, but probably it's not advertised a lot. Even for non-HOT updates (which means either indexed columns were changed or the page ran out of free space) or deletes, HOT prunes those tuples and instead marks the line pointer as DEAD. The page is defragmented and dead space is recovered. Each such dead tuple now only consumes two bytes in the page until vacuum removes the dead line pointers. That's the reason why the OP is seeing the behavior even when index columns are being updated.

We made a few adjustments to ensure that a page is not pruned too early. So we track the oldest XID that did any updates/deletes to the page and attempt pruning only when the RecentXmin is past the XID. We also mark the page as "full" if some previous update did not find enough free space to do an in-block update, and use that hint to decide if we should attempt to prune the page. Finally, we prune only if we get the cleanup lock without blocking.

What might be worth looking at is this condition in pruneheap.c:

    /*
     * We prune when a previous UPDATE failed to find enough space on the page
     * for a new tuple version, or when free space falls below the relation's
     * fill-factor target (but not less than 10%).
     *
     * Checking free space here is questionable since we aren't holding any
     * lock on the buffer; in the worst case we could get a bogus answer. It's
     * unlikely to be *seriously* wrong, though, since reading either pd_lower
     * or pd_upper is probably atomic. Avoiding taking a lock seems more
     * important than sometimes getting a wrong answer in what is after all
     * just a heuristic estimate.
     */
    minfree = RelationGetTargetPageFreeSpace(relation,
                                             HEAP_DEFAULT_FILLFACTOR);
    minfree = Max(minfree, BLCKSZ / 10);

    if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
    {

So if the free space in a page falls below the fill-factor target or 10% of the block size, we would try to prune the page. We probably need to revisit this area and see if we need to tune HOT even better. One option could be to see how much space we are going to free and carry out the operation only if it's significant enough to justify the cost. I know we did several benchmarking tests during HOT development, but the tuning mechanism still may not be perfect for all kinds of workloads, and it probably never will be.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB
http://www.enterprisedb.com
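The threshold arithmetic in that snippet is easy to check by hand. A rough Python sketch of the heuristic, assuming the default 8 kB block size; the names mirror the C code, but this is an illustration, not the real implementation:

```python
BLCKSZ = 8192  # default PostgreSQL block size in bytes

def prune_threshold(fillfactor: int = 100) -> int:
    """Free-space threshold below which a page becomes a prune candidate."""
    # RelationGetTargetPageFreeSpace: free space the fill-factor setting
    # tries to keep on each page (0 with the default fill-factor of 100).
    target_free = BLCKSZ * (100 - fillfactor) // 100
    # ...but never less than 10% of the block size: Max(minfree, BLCKSZ / 10)
    return max(target_free, BLCKSZ // 10)

def should_prune(page_is_full: bool, page_free_space: int,
                 fillfactor: int = 100) -> bool:
    """Mirrors: PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree"""
    return page_is_full or page_free_space < prune_threshold(fillfactor)

# With the default fill-factor, the 10% floor dominates:
print(prune_threshold())           # 819 bytes
print(should_prune(False, 4000))   # False: plenty of free space
print(should_prune(False, 500))    # True: below the 10% floor
print(should_prune(True, 4000))    # True: a prior UPDATE marked the page full
```

So with the defaults, any read that finds less than about 800 bytes free on a page (or a page hinted "full") can trigger pruning, which is the cost the thread is discussing.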
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 07/14/2011 08:47 AM, Tom Lane wrote:

The implementation I was imagining was to define another bit in the info parameter for XLogInsert, say XLOG_NON_TRANSACTIONAL. This could be a high-order bit that would not go to disk. Anytime it was *not* set, XLogInsert would set a global boolean that would remember that the current transaction wrote a transactional WAL record. This is the right default since the vast majority of call sites are writing records that we would want to have flushed at commit. There are just a couple of places that would need to be changed to add this flag to their calls.

regards, tom lane

If you have a patch in mind I'm happy to test it on my setup and report back.

-- Lars
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 07/14/2011 04:03 PM, Jeff Janes wrote:

On Wed, Jul 13, 2011 at 3:41 PM, lars wrote:
On 07/13/2011 11:42 AM, Kevin Grittner wrote:
So transactions without an XID *are* sensitive to synchronous_commit. That's likely a useful clue. How much did it help the run time of the SELECT which followed the UPDATE?

It has surprisingly little impact on the SELECT side:

If your fsync is truly fsyncing, it seems like it should have considerable effect. Could you strace with both -ttt and -T, with and without synchronous commit?

Cheers, Jeff

Ok, here we go:

"Q\0\0\0_select count(*) from test w"..., 8192, 0, NULL, NULL) = 96 <5.357152>
1310774187.750791 gettimeofday({1310774187, 750809}, NULL) = 0 <0.22>
1310774187.751023 lseek(12, 0, SEEK_END) = 329908224 <0.23>
1310774187.751109 lseek(15, 0, SEEK_END) = 396607488 <0.22>
1310774187.751186 lseek(18, 0, SEEK_END) = 534175744 <0.22>
1310774187.751360 lseek(12, 0, SEEK_END) = 329908224 <0.23>
1310774187.753389 brk(0x248e000) = 0x248e000 <0.26>
1310774187.753953 brk(0x24ce000) = 0x24ce000 <0.23>
1310774187.755158 brk(0x254e000) = 0x254e000 <0.24>
1310774187.766605 brk(0x245) = 0x245 <0.000170>
1310774187.766852 lseek(23, 4513792, SEEK_SET) = 4513792 <0.23>
1310774187.766927 write(23, "f\320\1\0\1\0\0\0\320\0\0\0\0\340D-\22\0\0\0\30@!0"..., 32768) = 32768 <0.75>
1310774187.767071 fdatasync(23) = 0 <0.002618>
1310774187.769760 gettimeofday({1310774187, 769778}, NULL) = 0 <0.22>
1310774187.769848 sendto(5, "\2\0\0\0\350\0\0\0\1@\0\0\2\0\0\0\1\0\0\0\0\0\0\0\r\201\0\0>\0\2\0"..., 232, 0, NULL, 0) = 232 <0.64>
1310774187.769993 sendto(6, "T\0\0\0\36\0\1count\0\0\0\0\0\0\0\0\0\0\24\0\10\377\377\377\377\0\0D"..., 66, 0, NULL, 0) = 66 <0.000199>

(23 is the WAL fd)

vs.

"Q\0\0\0_select count(*) from test w"..., 8192, 0, NULL, NULL) = 96 <7.343720>
1310774306.978767 gettimeofday({1310774306, 978785}, NULL) = 0 <0.21>
1310774306.978994 lseek(12, 0, SEEK_END) = 330883072 <0.24>
1310774306.979080 lseek(15, 0, SEEK_END) = 397131776 <0.21>
1310774306.979157 lseek(18, 0, SEEK_END) = 534732800 <0.22>
1310774306.979332 lseek(12, 0, SEEK_END) = 330883072 <0.22>
1310774306.983096 brk(0x248e000) = 0x248e000 <0.26>
1310774306.983653 brk(0x24ce000) = 0x24ce000 <0.23>
1310774306.984667 brk(0x254e000) = 0x254e000 <0.23>
1310774306.996040 brk(0x245) = 0x245 <0.000168>
1310774306.996298 gettimeofday({1310774306, 996317}, NULL) = 0 <0.21>
1310774306.996388 sendto(5, "\2\0\0\0\350\0\0\0\1@\0\0\2\0\0\0\1\0\0\0\0\0\0\0\r\201\0\0>\0\2\0"..., 232, 0, NULL, 0) = 232 <0.78>
1310774306.996550 sendto(6, "T\0\0\0\36\0\1count\0\0\0\0\0\0\0\0\0\0\24\0\10\377\377\377\377\0\0D"..., 66, 0, NULL, 0) = 66 <0.000202>

So the difference is only 2 ms. The size of the WAL buffer written is only 32k.

Here's an example with more dirty rows (I basically let the updater run for a while, dirtying very many rows):

"Q\0\0\0_select count(*) from test w"..., 8192, 0, NULL, NULL) = 96 <23.690018>
1310775141.398780 gettimeofday({1310775141, 398801}, NULL) = 0 <0.28>
1310775141.399018 lseek(12, 0, SEEK_END) = 372514816 <0.23>
1310775141.399105 lseek(15, 0, SEEK_END) = 436232192 <0.22>
1310775141.399185 lseek(18, 0, SEEK_END) = 573620224 <0.23>
1310775141.399362 lseek(12, 0, SEEK_END) = 372514816 <0.24>
1310775141.414017 brk(0x249) = 0x249 <0.28>
1310775141.414575 brk(0x24d) = 0x24d <0.25>
1310775141.415600 brk(0x255) = 0x255 <0.24>
1310775141.417757 semop(229383, {{0, -1, 0}}, 1) = 0 <0.24>
...
1310775141.448998 semop(229383, {{0, -1, 0}}, 1) = 0 <0.25>
1310775141.453134 brk(0x2452000) = 0x2452000 <0.000167>
1310775141.453377 fadvise64(22, 0, 0, POSIX_FADV_DONTNEED) = 0 <0.25>
1310775141.453451 close(22) = 0 <0.32>
1310775141.453537 open("pg_xlog/000100D100C2", O_RDWR) = 22 <0.59>
1310775141.453696 write(22, "f\320\3\0\1\0\0\0\321\0\0\0\0\0\0\3023\356\17N\23l\vN\0\0\0\1\0 \0\0"..., 5365760) = 5365760 <0.005991>
1310775141.459798 write(22, "f\320\1\0\1\0\0\0\321\0\0\0\0\340Q\302`\5\0\915!000"..., 9019392) = 9019392 <0.010062>
1310775141.469965 fdatasync(22) = 0 <0.231385>
1310775141.701424 semop(229383, {{2, 1, 0}}, 1) = 0 <0.31>
1310775141.702657 gettimeofday({1310775141, 702682}, NULL) = 0 <0.28>
1310775141.702765 sendto(5, "\2\0\0\0\350\0\0\0\1
Re: [PERFORM] Statistics and Multi-Column indexes
On 07/10/2011 02:31 PM, Samuel Gendler wrote:

What about partitioning tables by tenant id and then maintaining indexes on each partition independent of tenant id, since constraint exclusion should handle filtering by tenant id for you. That seems like a potentially more tolerable variant of #5. How many tenants are we talking about? I gather partitioning starts to become problematic when the number of partitions gets large.

I thought I had replied... Apparently I didn't.

The database can grow in two dimensions: the number of tenants and the number of rows per tenant. We have many tenants with relatively little data and a few with a lot of data. So the number of tenants is known ahead of time and might be 1000's.

-- Lars
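For what it's worth, the per-tenant partitioning idea in the quoted text can be sketched with declarative list partitioning, which did not exist at the time of this thread (it arrived in PostgreSQL 10); table and column names follow the thread's test table:

```sql
CREATE TABLE test (
    tenant       char(15),
    created_by   char(15),
    created_date date
) PARTITION BY LIST (tenant);

-- One partition per tenant (or per group of tenants):
CREATE TABLE test_t001 PARTITION OF test FOR VALUES IN ('001');
CREATE TABLE test_t002 PARTITION OF test FOR VALUES IN ('002');

-- Indexes can then be local to each partition, with no tenant column,
-- since partition pruning handles the tenant filter:
CREATE INDEX ON test_t001 (created_date);
CREATE INDEX ON test_t002 (created_date);
```

With thousands of tenants the partition count itself becomes a planning-overhead concern, which is the caveat raised above.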
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 07/13/2011 11:42 AM, Kevin Grittner wrote:

So transactions without an XID *are* sensitive to synchronous_commit. That's likely a useful clue. How much did it help the run time of the SELECT which followed the UPDATE?

It has surprisingly little impact on the SELECT side:

=> set synchronous_commit = on;
=> update test set created_by = '001' where tenant = '001';
UPDATE 3712
Time: 384.702 ms
lars=> select count(*) from test where tenant = '001' and created_date = '2011-6-30';
 count
-------
  3712
(1 row)
Time: 36.571 ms
=> select count(*) from test where tenant = '001' and created_date = '2011-6-30';
 count
-------
  3712
(1 row)
Time: 5.702 ms
=> select count(*) from test where tenant = '001' and created_date = '2011-6-30';
 count
-------
  3712
(1 row)
Time: 5.822 ms
=> set synchronous_commit = off;
SET
Time: 0.145 ms
=> update test set created_by = '001' where tenant = '001';
UPDATE 3712
Time: 96.227 ms
=> select count(*) from test where tenant = '001' and created_date = '2011-6-30';
 count
-------
  3712
(1 row)
Time: 32.422 ms
=> select count(*) from test where tenant = '001' and created_date = '2011-6-30';
 count
-------
  3712
(1 row)
Time: 6.080 ms

I tried it multiple times, and while the numbers change by 5-10 ms the relationship is the same. The same results show when I use my JDBC code to run updates/selects as fast as possible. When synchronous_commit is off for the SELECTing process it seems to be slightly faster.

-- Lars
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 07/13/2011 07:46 AM, Kevin Grittner wrote:

I've mentioned this in a hand-wavy general sense, but I should have mentioned specifics ages ago: for a database where the active portion of the database is fully cached, it is best to set seq_page_cost and random_page_cost to the same value, somewhere in the 0.1 to 0.05 range. (In your case I would use 0.05.) In highly cached databases I have sometimes also found it necessary to increase cpu_tuple_cost. (In your case I might try 0.02.)

I've been doing that for other tests already (I didn't want to add too many variations here). The Bitmap Heap scans through the table are only useful for spinning media and not the cache (just to state the obvious). As an aside: I found that queries in a cold database take almost twice as long when I make that change, so for spinning media this is very important.

Which raises an interesting question -- what happens to the timings if your SELECTs are done with synchronous_commit = off?

Just tried that... In that case the WAL is still written (as seen via iostat), but not synchronously by the transaction (as seen by strace).

-- Lars
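Kevin's suggested settings above, written out; shown per session here, but they could equally go in postgresql.conf:

```sql
-- Planner costs for a fully cached active data set, per the advice above:
SET seq_page_cost = 0.05;
SET random_page_cost = 0.05;  -- same value as seq_page_cost
SET cpu_tuple_cost = 0.02;    -- the "might try" value from the quoted advice
```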
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
 Aggregate  (cost=12284.68..12284.69 rows=1 width=0) (actual time=30.738..30.739 rows=1 loops=1)
   Buffers: shared hit=2001
   ->  Bitmap Heap Scan on test  (cost=91.78..12276.38 rows=3319 width=0) (actual time=2.589..28.361 rows=3712 loops=1)
         Recheck Cond: (tenant = '001'::bpchar)
         Filter: (created_date = '2011-06-30'::date)
         Buffers: shared hit=2001
         ->  Bitmap Index Scan on i1  (cost=0.00..90.95 rows=3319 width=0) (actual time=2.301..2.301 rows=17123 loops=1)
               Index Cond: (tenant = '001'::bpchar)
               Buffers: shared hit=107
 Total runtime: 30.785 ms
(10 rows)

There definitely seems to be something funky going on. Since created_by is indexed it shouldn't do any HOT logic.

Is there any other information that I can provide? I'm happy to recompile with a patch applied, etc.

Thanks.

-- Lars
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
shared_buffers is big enough to hold the entire database, and there is plenty of extra space. (Verified with pg_buffercache.) So I don't think that is the reason.

Tom Lane wrote:
>Jeff Janes writes:
>> On 7/12/11, lars wrote:
>>> The fact that a select (maybe a big analytical query we'll run) touching
>>> many rows will update the WAL and wait
>>> (apparently) for that IO to complete is making a fully cached database
>>> far less useful.
>>> I just artificially created this scenario.
>
>> I can't think of any reason that that WAL would have to be flushed
>> synchronously.
>
>Maybe he's running low on shared_buffers? We would have to flush WAL
>before writing a dirty buffer out, so maybe excessive pressure on
>available buffers is part of the issue here.
>
> regards, tom lane
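The pg_buffercache check mentioned above can be done roughly like this (a sketch; the available columns vary a little between versions of the module):

```sql
-- One-time setup, as superuser:
CREATE EXTENSION pg_buffercache;

-- How many shared buffers are actually holding a relation page:
SELECT count(*) AS used_buffers
  FROM pg_buffercache
 WHERE relfilenode IS NOT NULL;
```

Comparing used_buffers against the shared_buffers setting (in 8 kB pages) shows whether there is headroom, which is the claim being made above.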
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 07/12/2011 02:38 PM, Merlin Moncure wrote:

Something is not adding up here. Perhaps there is an alternate route to WAL logged activity from selects I'm not thinking of. Right now I'm thinking to run the selects on table 'a' and the inserts concurrently on table 'b' and seeing how that behaves. Another way to get to the bottom is to oprofile the selecting-during-load backend to see where the time is getting spent. An alternative way to do this is to strace-attach to the selecting-during-load backend to see if it's really writing to the WAL (I'm really curious about this). Another interesting test would be to try and reproduce the results on a native machine. It should be possible to do this on your workstation with a more modestly sized scaling factor.

merlin

Just tried with two of my test tables. Updates on 'a' have no (measurable) effect on selects from 'b'.

Back to the first case, here's an strace from the backend doing the select right after the updates:

"Q\0\0\0`select count(*) from test1 "..., 8192, 0, NULL, NULL) = 97
gettimeofday({1310512219, 723762}, NULL) = 0
open("base/16385/33032", O_RDWR) = 8
lseek(8, 0, SEEK_END) = 1073741824
open("base/16385/33032.1", O_RDWR|O_CREAT, 0600) = 9
lseek(9, 0, SEEK_END) = 1073741824
open("base/16385/33032.2", O_RDWR|O_CREAT, 0600) = 10
lseek(10, 0, SEEK_END) = 191348736
open("base/16385/33035", O_RDWR) = 11
lseek(11, 0, SEEK_END) = 1073741824
open("base/16385/33035.1", O_RDWR|O_CREAT, 0600) = 12
lseek(12, 0, SEEK_END) = 3571712
lseek(10, 0, SEEK_END) = 191348736
brk(0x28ad000) = 0x28ad000
mmap(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5f28ca
mmap(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5f28c5f000
munmap(0x7f5f28c5f000, 266240) = 0
munmap(0x7f5f28ca, 135168) = 0
open("pg_xlog/000100BB0012", O_RDWR) = 13
lseek(13, 1564672, SEEK_SET) = 1564672
write(13, "f\320\1\0\1\0\0\0\273\0\0\0\0\340\27\22`\32\0\0002833!000"..., 2400256) = 2400256
fdatasync(13) = 0
semop(229383, {{9, 1, 0}}, 1) = 0
gettimeofday({1310512219, 885287}, NULL) = 0
sendto(5, "\2\0\0\0\300\3\0\0\1@\0\0\t\0\0\0\1\0\0\0\0\0\0\0\353\4\0\0@\0\2\0"..., 960, 0, NULL, 0) = 960
sendto(5, "\2\0\0\0\300\3\0\0\1@\0\0\t\0\0\0\0\0\0\0\0\0\0\0009\n\0\0@\0\2\0"..., 960, 0, NULL, 0) = 960
sendto(5, "\2\0\0\0\300\3\0\0\1@\0\0\t\0\0\0\0\0\0\0\0\0\0\0v\n\0\0@\0\2\0"..., 960, 0, NULL, 0) = 960
sendto(5, "\2\0\0\0\270\1\0\0\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\275\4\0\0\377\177\0\0"..., 440, 0, NULL, 0) = 440
sendto(6, "T\0\0\0\36\0\1count\0\0\0\0\0\0\0\0\0\0\24\0\10\377\377\377\377\0\0D"..., 66, 0, NULL, 0) = 66

So the backend is definitely writing to the WAL, directly and synchronously.

Selecting the same set of rows again:

"Q\0\0\0`select count(*) from test1 "..., 8192, 0, NULL, NULL) = 97
gettimeofday({1310512344, 823728}, NULL) = 0
lseek(10, 0, SEEK_END) = 191348736
lseek(12, 0, SEEK_END) = 3571712
lseek(10, 0, SEEK_END) = 191348736
brk(0x28d5000) = 0x28d5000
brk(0x2915000) = 0x2915000
brk(0x2897000) = 0x2897000
gettimeofday({1310512344, 831043}, NULL) = 0
sendto(5, "\2\0\0\0\350\0\0\0\1@\0\0\2\0\0\0\1\0\0\0\0\0\0\0\10\201\0\0?\0\2\0"..., 232, 0, NULL, 0) = 232
sendto(6, "T\0\0\0\36\0\1count\0\0\0\0\0\0\0\0\0\0\24\0\10\377\377\377\377\0\0D"..., 66, 0, NULL, 0) = 66

No writing to the WAL.

-- Lars
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 07/12/2011 02:51 PM, Kevin Grittner wrote:

I ran x a bunch of times to get a baseline, then y once, then x a bunch more times. The results were a bit surprising:

cir=> \timing
Timing is on.
cir=> execute x('001','2011-6-30');
 count
-------
  3456
(1 row)
Time: 9.823 ms
cir=> execute x('001','2011-6-30');
 count
-------
  3456
(1 row)
Time: 8.481 ms
cir=> execute x('001','2011-6-30');
 count
-------
  3456
(1 row)
Time: 14.054 ms
cir=> execute x('001','2011-6-30');
 count
-------
  3456
(1 row)
Time: 10.169 ms
cir=> execute y('001', '001','2011-6-30');
UPDATE 3456
Time: 404.244 ms
cir=> execute x('001','2011-6-30');
 count
-------
  3456
(1 row)
Time: 128.643 ms
cir=> execute x('001','2011-6-30');
 count
-------
  3456
(1 row)
Time: 2.657 ms
cir=> execute x('001','2011-6-30');
 count
-------
  3456
(1 row)
Time: 5.883 ms
cir=> execute x('001','2011-6-30');
 count
-------
  3456
(1 row)
Time: 2.645 ms
cir=> execute x('001','2011-6-30');
 count
-------
  3456
(1 row)
Time: 2.753 ms
cir=> execute x('001','2011-6-30');
 count
-------
  3456
(1 row)
Time: 2.253 ms

Interesting. When you did your test, did you also find WAL write activity when running x the first time after y? (It's very hard to catch in only a single query, though.)

Running the update made the next SELECT slow, then it was much *faster*. My best guess is that the data landed in a more concentrated set of pages after the update, and once autovacuum kicked in and cleaned things up it was able to get to that set of data faster.

autovacuum | off

Well, certainly not while under modification without running autovacuum. That's disabling an integral part of what keeps performance up.

Oh, it's just switched off for testing, so that I can control when vacuum runs and make sure that it's not skewing the results while I am measuring something. In a real database I would probably err on the side of vacuuming more rather than less.

For a fully cached database I would probably want to switch off HOT pruning and compaction (which from what we see is done synchronously with the select) and leave it up to the asynchronous autovacuum to do that. But maybe I am still not quite understanding the performance implications.
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 07/12/2011 01:04 PM, lars wrote: On 07/12/2011 12:08 PM, Kevin Grittner wrote: lars wrote: select count(*) from test where tenant = $1 and created_date = $2 Ah, that might be a major clue -- prepared statements. What sort of a plan do you get for that as a prepared statement? (Note, it is very likely *not* to be the same plan as you get if you run with literal values!) It is not at all unlikely that it could resort to a table scan if you have one tenant which is five or ten percent of the table, which would likely trigger the pruning as it passed over the modified pages. -Kevin So a read of a row *will* trigger dead tuple pruning, and that requires WAL logging, and this is known/expected? This is actually the only answer I am looking for. :) I have not seen this documented anywhere. I know that Postgres will generate general plans for prepared statements (how could it do otherwise?), I also know that it sometimes chooses a sequential scan. This can always be tweaked to touch fewer rows and/or use a different plan. That's not my objective, though! The fact that a select (maybe a big analytical query we'll run) touching many rows will update the WAL and wait (apparently) for that IO to complete is making a fully cached database far less useful. I just artificially created this scenario. ... Just dropped the table to test something so I can't get the plan right now. Will send an update as soon as I get it setup again. Thanks again. -- Lars Ok... Slightly changes the indexes: \d test Table "lars.test" Column| Type | Modifiers --+---+--- tenant | character(15) | created_by | character(15) | created_date | date | Indexes: "i1" btree (tenant) So just just a simple index on tenant. 
prepare x as select count(*) from test where tenant = $1 and created_date = $2;
PREPARE
explain execute x('001','2011-6-30');
                                     QUERY PLAN
-----------------------------------------------------------------------------------
 Aggregate  (cost=263301.40..263301.41 rows=1 width=0)
   ->  Bitmap Heap Scan on test  (cost=3895.99..263299.28 rows=847 width=0)
         Recheck Cond: (tenant = $1)
         Filter: (created_date = $2)
         ->  Bitmap Index Scan on i1  (cost=0.00..3895.77 rows=169372 width=0)
               Index Cond: (tenant = $1)
(6 rows)

-- this is when the WAL rows are written:
explain (analyze on, buffers on) execute x('001','2011-6-30');
                                     QUERY PLAN
-----------------------------------------------------------------------------------
 Aggregate  (cost=263301.40..263301.41 rows=1 width=0) (actual time=191.150..191.151 rows=1 loops=1)
   Buffers: shared hit=3716
   ->  Bitmap Heap Scan on test  (cost=3895.99..263299.28 rows=847 width=0) (actual time=1.966..188.221 rows=3712 loops=1)
         Recheck Cond: (tenant = $1)
         Filter: (created_date = $2)
         Buffers: shared hit=3716
         ->  Bitmap Index Scan on i1  (cost=0.00..3895.77 rows=169372 width=0) (actual time=1.265..1.265 rows=3712 loops=1)
               Index Cond: (tenant = $1)
               Buffers: shared hit=20
 Total runtime: 191.243 ms
(10 rows)

-- this is when no WAL is written:
explain (analyze on, buffers on) execute x('001','2011-6-30');
                                     QUERY PLAN
-----------------------------------------------------------------------------------
 Aggregate  (cost=263301.40..263301.41 rows=1 width=0) (actual time=11.529..11.530 rows=1 loops=1)
   Buffers: shared hit=3715
   ->  Bitmap Heap Scan on test  (cost=3895.99..263299.28 rows=847 width=0) (actual time=1.341..9.187 rows=3712 loops=1)
         Recheck Cond: (tenant = $1)
         Filter: (created_date = $2)
         Buffers: shared hit=3715
         ->  Bitmap Index Scan on i1  (cost=0.00..3895.77 rows=169372 width=0) (actual time=0.756..0.756 rows=3712 loops=1)
               Index Cond: (tenant = $1)
               Buffers: shared hit=19
 Total runtime: 11.580 ms
(10 rows)

If you wanted to recreate this scenario I created a simple script to create the table:

create table test(tenant char(15), created_by char(15), created_date date);
insert into test values('x', 'y','2011-6-30');
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test select * from test;
insert into test sele
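The doubling script above is cut off mid-statement. A hedged sketch of an equivalent setup using generate_series (my substitution for the repeated self-inserts; the final row count is illustrative, since the original target is truncated in the message):

```sql
-- Hypothetical equivalent of the repeated self-insert script above:
-- build the same uniform rows in one statement. The 1M row count is
-- illustrative only; the original script's target is truncated.
create table test(tenant char(15), created_by char(15), created_date date);
insert into test
select 'x', 'y', date '2011-06-30'
from generate_series(1, 1000000);
create index i1 on test (tenant);
```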
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 07/12/2011 08:13 AM, Ivan Voras wrote:
On 12/07/2011 02:09, lars wrote:
Oh, and iowait hovers around 20% when SELECTs are slow:
avg-cpu:  %user %nice %system %iowait %steal  %idle
           1.54  0.00    0.98   18.49   0.07  78.92
When SELECTs are fast it looks like this:
avg-cpu:  %user %nice %system %iowait %steal  %idle
           8.72  0.00    0.26    0.00   0.00  91.01
Note that this is a 12 core VM. So one core at 100% would show as 8.33% CPU.

Now only if you could do an "iostat -x" and show the output in both cases...

Sure (sorry for missing details):

iostat -x during selects when all's fine:
avg-cpu:  %user %nice %system %iowait %steal  %idle
           8.25  0.00    0.00    0.00   0.00  91.75
Device:  rrqm/s  wrqm/s   r/s     w/s  rsec/s   wsec/s avgrq-sz avgqu-sz  await svctm %util
xvdap1     0.00    1.00  0.00    2.00    0.00    24.00    12.00     0.00   0.00  0.00  0.00
xvdf       0.00    0.00  0.00    0.00    0.00     0.00     0.00     0.00   0.00  0.00  0.00
xvdg       0.00    0.00  0.00    0.00    0.00     0.00     0.00     0.00   0.00  0.00  0.00

xvdap1 is the OS volume. xvdf holds the database files, xvdg holds the WAL. No IO on database/WAL volumes, one core is pegged close to 100% CPU.

iostat -x during update:
avg-cpu:  %user %nice %system %iowait %steal  %idle
           1.05  0.00    0.58    4.00   0.00  94.37
Device:  rrqm/s  wrqm/s   r/s     w/s  rsec/s   wsec/s avgrq-sz avgqu-sz  await svctm %util
xvdap1     0.00    0.00  0.00    0.00    0.00     0.00     0.00     0.00   0.00  0.00  0.00
xvdf       0.00    0.00  7.00    0.00  128.00     0.00    18.29     0.00   0.00  0.00  0.00
xvdg       0.00 7352.00  0.00  804.00    0.00 62368.00    77.57    66.07  68.83  0.86 69.20

Just updating the WAL.

--- and while it's checkpointing:
avg-cpu:  %user %nice %system %iowait %steal  %idle
           0.64  0.00    0.32    8.88   0.00  90.16
Device:  rrqm/s  wrqm/s   r/s     w/s  rsec/s   wsec/s avgrq-sz avgqu-sz  await svctm  %util
xvdap1     0.00    0.00  0.00    0.00    0.00     0.00     0.00     0.00   0.00  0.00   0.00
xvdf       0.00 2548.00  2.00 1658.00   32.00 33408.00    20.14   144.18  86.69  0.60 100.00
xvdg       0.00 5428.00  0.00  778.00    0.00 58480.00    75.17    77.44 100.22  1.21  94.00

Updating the WAL, and database volume due to checkpointing.

-- iostat -x after I stopped the update process and checkpointing is done:
avg-cpu:  %user %nice %system %iowait %steal  %idle
           0.00  0.00    0.00    0.00   0.00 100.00
Device:  rrqm/s  wrqm/s   r/s     w/s  rsec/s   wsec/s avgrq-sz avgqu-sz  await svctm %util
xvdap1     0.00    0.00  0.00    0.00    0.00     0.00     0.00     0.00   0.00  0.00  0.00
xvdf       0.00    0.00  0.00    0.00    0.00     0.00     0.00     0.00   0.00  0.00  0.00
xvdg       0.00    0.00  0.00    0.00    0.00     0.00     0.00     0.00   0.00  0.00  0.00

No activity at all.

- iostat -x after I started the select queries after the updates:
avg-cpu:  %user %nice %system %iowait %steal  %idle
           2.09  0.00    1.49   12.15   0.00  84.26
Device:  rrqm/s  wrqm/s   r/s     w/s  rsec/s   wsec/s avgrq-sz avgqu-sz  await svctm %util
xvdap1     0.00    0.00  0.00    0.00    0.00     0.00     0.00     0.00   0.00  0.00  0.00
xvdf       0.00    8.00  0.00    2.00    0.00    80.00    40.00     0.00   2.00  2.00  0.40
xvdg       0.00 7844.00  1.00 1098.00    8.00 82336.00    74.93    58.27  59.39  0.70 77.20

Heavy writes to the WAL volume.

select * from pg_stat_activity;
 datid | datname | procpid | usesysid | usename  | application_name | client_addr | client_hostname | client_port | backend_start                 | xact_start                    | query_start                   | waiting | current_query
 16385 | lars    |    2654 |    16384 | lars     |                  | 127.0.0.1   |                 |       44972 | 2011-07-12 18:44:09.479581+00 | 2011-07-12 18:50:32.629412+00 | 2011-07-12 18:50:32.629473+00 | f       | select count(*) from test where tenant = $1 and created_date = $2
 16385 | lars    |    2658 |       10 | postgres | psql             |             |                 |          -1 | 2011-07-12 18:49:02.675436+00 | 2011-07-12 18:50:32.631013+00 | 2011-07-12
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 07/11/2011 08:26 AM, Robert Klemme wrote: On Mon, Jul 11, 2011 at 3:13 PM, k...@rice.edu wrote: I do not know if this makes sense in PostgreSQL, given that readers do not block writers and writers do not block readers. Are your UPDATEs to individual rows, each in a separate transaction, or do you UPDATE multiple rows in the same transaction? If you perform multiple updates in a single transaction, you are synchronizing the changes to that set of rows, and that constraint is causing other readers that need to get the correct values post-transaction to wait until the COMMIT completes. This means that the WAL write must be completed.

What readers should that be? Docs explicitly state that readers are never blocked by writers: http://www.postgresql.org/docs/9.0/interactive/mvcc-intro.html http://www.postgresql.org/docs/9.0/interactive/mvcc.html From what I understand about this issue the observed effect must be caused by the implementation and not by a conceptual issue with transactions.

Have you tried disabling synchronous_commit? If this scenario holds, you should be able to reduce the slowdown by un-batching your UPDATEs, as counter-intuitive as that is. This seems to be similar to a problem that I have been looking at with using PostgreSQL as the backend to a Bayesian engine. I am following this thread with interest.

I don't think this will help (see above). Also, I would be very cautious about doing this, because although the client might get a faster acknowledgement, the DB still has to do the same work as without synchronous_commit (i.e. WAL, checkpointing etc.), but it now has to do significantly more transactions than in the batched version. Typically there is an optimum batch size: if the batch size is too small (say, one row) the ratio of TX overhead to "work" is too bad. If the batch size is too large (say, millions of rows) you hit resource limitations (memory) which inevitably force the RDBMS to do additional disk IO.
Kind regards, Robert

Thanks Ken and Robert, What I am observing is definitely not readers blocked by writers by means of row-level locking. This seems to be some implementation detail in Postgres about how dirty pages (or dead versions of tuples) are flushed to the disk (see the other parts of this thread) when they are accessed by a SELECT query. The batch size in this case is one SELECT statement accessing 1 rows via an aggregate (such as COUNT) and an UPDATE updating 1 rows in a single statement. I am not trying to optimize this particular use case, but rather to understand what Postgres is doing, and why SELECT queries are affected negatively (sometimes severely) by concurrent (or even preceding) UPDATEs at all when the database resides in the cache completely. -- Lars
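One way to observe the pruning effect discussed in this thread is to watch dead-tuple counts around the SELECT. A sketch, assuming the contrib module pgstattuple is available (it is not something used in the thread itself, and the table/filter values are the earlier test setup):

```sql
-- Sketch, assuming the contrib module pgstattuple is installed
-- (CREATE EXTENSION works on 9.1; older releases load a contrib SQL file).
create extension pgstattuple;

-- After the UPDATE batch, before any SELECT:
select dead_tuple_count, dead_tuple_percent from pgstattuple('test');

-- Run the aggregate SELECT, then check again; a drop in dead tuples
-- together with WAL traffic would be consistent with pruning being
-- triggered by the read.
select count(*) from test where tenant = '001' and created_date = '2011-06-30';
select dead_tuple_count, dead_tuple_percent from pgstattuple('test');
```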
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 07/11/2011 04:02 PM, lars wrote:
On 07/11/2011 02:43 PM, Merlin Moncure wrote:
On Mon, Jul 11, 2011 at 2:16 PM, Kevin Grittner wrote: lars wrote: Stopping the UPDATEs, waiting for any CHECKPOINTs to finish, and then running the SELECTs indeed shows a similar slowdown. Interestingly I see very heavy WAL traffic while executing the SELECTs. (So I was confused as to what caused the WAL traffic.)

Hint bit changes aren't logged, so if it was that you would be seeing writes to the heap, but not to the WAL. Clean-up of dead tuples is logged -- this is probably the result of pruning dead tuples. You could probably reduce the impact on your SELECT statements at least a little by making autovacuum more aggressive.

yeah. In fact, I'd like to disable autovacuum completely just to confirm this. In particular I'd like to know if that removes wal traffic when only selects are going on. Another way to check is to throw some queries at pg_stat_activity during your select period and see if there is any non-select activity (like an autovacuum vacuum). Basically I'm suspicious there is more to this story. hint bit flushing causing i/o during SELECT is a typical complaint (especially on systems with underperformant i/o), but I'm suspicious if that's really the problem here. Since you are on a virtualized platform, I can't help but wonder if you are running into some bottleneck that you wouldn't see on native hardware. What's iowait during the slow period? merlin

Thanks Kevin and Merlin, this is extremely helpful... Ok, that makes much more sense (WALing hint bits did not make sense). I disabled auto-vacuum and did four tests:
1. Run a bunch of updates, stop that process, wait until checkpointing is finished, and run the selects (as before).
2. Run VACUUM manually, then run the SELECTs.
3. Have the UPDATEs and SELECTs touch mutually exclusive, random sets of rows (still in sets of 1). So the SELECTs are guaranteed not to select rows that were updated.
4.
Lastly, change the UPDATEs to update a non-indexed column. To rule out Index maintenance. Still distinct set of rows. In the first case I see the same slowdown (from ~14ms to ~400-500ms). pg_stat_activity shows no other load during that time. I also see write activity only on the WAL volume. In the 2nd case after VACUUM is finished the time is back to 14ms. As an aside: If I run the SELECTs while VACUUM is running the slowdown is about the same as in the first case until (apparently) VACUUM has cleaned up most of the table, at which point the SELECTs become faster again (~50ms). In the 3rd case I see exactly the same behavior, which is interesting. Both before VACUUM is run and after. There's no guarantee obviously that distinct rows do not share the same page of course especially since the index is updated as part of this (but see the 4th case). In case 4 I still see the same issue. Again both before and after VACUUM. In all cases I see from pg_stat_bgwriter that no backend writes buffers directly (but I think that only pertains to dirty buffers, and not the WAL). So I think I have a partial answer to my initial question. However, that brings me to some other questions: Why do SELECTs cause dead tuples to be pruned (as Kevin suggests)? That even happens when the updates did not touch the selected rows(?) And why does that slow down the SELECTs? (checkpointing activity on the EBS volume holding the database for example does not slow down SELECTs at all, only WAL activity does). Does the selecting backend do that work itself? Lastly, is this documented somewhere? (I apologize if it is and I missed it). If not I'd be happy to write a wiki entry for this. Oh, and iowait hovers around 20% when SELECTs are slow: avg-cpu: %user %nice %system %iowait %steal %idle 1.540.000.98 18.490.07 78.92 When SELECTs are fast it looks like this: avg-cpu: %user %nice %system %iowait %steal %idle 8.720.000.260.000.00 91.01 Note that this is a 12 core VM. 
So one core at 100% would show as 8.33% CPU.
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 07/11/2011 10:33 AM, Kevin Grittner wrote: lars hofhansl wrote: Yep, I am not seeing the SELECTs slow down (measurably) during checkpoints (i.e. when dirty pages are flushed to disk), but only during writing of the WAL files.

How about if you do a whole slew of the UPDATEs and then stop those and run a bunch of SELECTs? (I don't think I've seen anything mentioned so far which rules out hint bit rewrites as an issue.) I see you have tweaked things to balance the writes -- you might want to try further adjustments to reduce backend writes and see what happens. -Kevin

Hmm... You are right. Stopping the UPDATEs, waiting for any CHECKPOINTs to finish, and then running the SELECTs indeed shows a similar slowdown. Interestingly I see very heavy WAL traffic while executing the SELECTs. (So I was confused as to what caused the WAL traffic.) Why do changes to the hint bits need to be logged to the WAL? If we lose them we can always get that information back from the commit log. Maybe the backend does not know why the page is dirty and will write it to the WAL anyway(?) If that is the case there seems to be room to optimize that. -- Lars
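One way to confirm that a plain SELECT generates WAL is to compare the WAL insert position before and after running it. A sketch using the 9.1-era function name (it was renamed to pg_current_wal_insert_lsn() in PostgreSQL 10); the query and values are from the earlier test setup:

```sql
-- Sketch: bracket the query with the WAL insert position. If the
-- position advances, the SELECT wrote WAL (e.g. via heap pruning).
select pg_current_xlog_insert_location();   -- note the position
select count(*) from test where tenant = '001' and created_date = '2011-06-30';
select pg_current_xlog_insert_location();   -- compare with the first value
```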
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
Thanks Craig. Yep, I am not seeing the SELECTs slow down (measurably) during checkpoints (i.e. when dirty pages are flushed to disk), but only during writing of the WAL files. The shared buffers and OS cache are big enough to hold the entire database, so evicting cached data should not be necessary. (The database including indexes can fit into 16 or so GB, and I have 68GB on that machine.) Interestingly I initially thought there might be a correlation between checkpointing and slower SELECTs, but it turns out that checkpointing just slowed down IO to the WAL - until I moved it to its own drive, which then increased the effect I was seeing. I'll do more research and try to provide more useful details. Thanks for the pg_catalog link, I'll have a look at it. -- Lars

- Original Message From: Craig Ringer To: pgsql-performance@postgresql.org Sent: Sun, July 10, 2011 4:11:39 PM Subject: Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database

On 11/07/2011 4:34 AM, lars wrote: > I have since moved the WAL to its own EBS volume (ext4, data=writeback) > to make it easier to monitor IO. > The times where the SELECTs slow down coincide with heavy write traffic > to the WAL volume.

In theory, UPDATEs shouldn't be blocking or slowing SELECTs. Whether that holds up to the light of reality, real-world hardware, and software implementation detail, I really don't know. I avoided responding to your first mail because I generally work with smaller and less performance critical databases, so I haven't accumulated much experience with fine-tuning. If your SELECTs were slower *after* your UPDATEs I'd be wondering if your SELECTs are setting hint bits on the pages touched by the UPDATEs. See: http://wiki.postgresql.org/wiki/Hint_Bits . It doesn't sound like that's the case if the SELECTs are slowed down *during* a big UPDATE that hasn't yet committed, though. Could it just be cache pressure - either on shm, or operating system disk cache?
All the dirty buffers that have to be flushed to WAL and to the heap may be evicting cached data your SELECTs were benefitting from. Unfortunately, diagnostics in this area are ... limited ... though some of the pg_catalog views (http://www.postgresql.org/docs/9.0/static/catalogs.html) may offer some information. -- Craig Ringer POST Newspapers 276 Onslow Rd, Shenton Park Ph: 08 9381 3088 Fax: 08 9388 2258 ABN: 50 008 917 717 http://www.postnewspapers.com.au/
[PERFORM] Statistics and Multi-Column indexes
I know this has been discussed various times... We are maintaining a large multi-tenant database where *all* tables have a tenant-id and all indexes and PKs lead with the tenant-id. Statistics and counts for all other columns are only really meaningful within the context of the tenant they belong to. There appear to be five options for me:

1. Use single column indexes on all interesting columns and rely on PostgreSQL's bitmap index scans to combine them (which are pretty cool).
2. Use multi column indexes and accept that sometimes Postgres picks the wrong index (because a non-tenant-id column might seem highly selective over the table, but it is not for a particular tenant - or vice versa).
3. Use a functional index that combines multiple columns and only query via these, which causes statistics gathering for the expression. I.e. create index i on t((tenantid||column1)) and SELECT ... FROM t WHERE tenantid||column1 = '...'
4. Play with n_distinct and/or set the statistics for the inner columns to some fixed values that lead to the plans that we want.
5. Have a completely different schema and maybe a database per tenant.

Currently we use Oracle and query hinting, but I do not like that practice at all (and Postgres does not have hints anyway). Are there any other options? #1 would be the simplest, but I am concerned about the overhead, both maintaining two indexes and building the bitmap during queries - for every query. I don't think #2 is actually an option. We have some tenants with many (sometimes 100s of) millions of rows per table, and picking the wrong index would be disastrous. Could something like #3 be generally added to Postgres? I.e. if there is a multi column index, keep combined statistics for the involved columns. Of course in that case it is no longer possible to query the index by prefix. #3 also seems expensive as the expression needs to be evaluated for each changed row. Still trying #4.
I guess it involves setting the stat target for the inner columns to 0 and then inserting my own records into pg_statistic. Probably only setting n_distinct, i.e. set it "low" if the inner column is not selective within the context of a tenant and "high" otherwise. For various reasons #5 is also not an option. And of course the same set of questions comes up with joins. Thanks. -- Lars
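Option #3 from the list above might look like the following sketch; the index name and separator are mine (the thread's own example uses a bare tenantid||column1 concatenation), and t/tenantid/column1 are the hypothetical names from the message:

```sql
-- Sketch of option #3: a single expression index, so ANALYZE gathers
-- statistics for the combined expression. The separator guards against
-- ambiguous concatenations ('ab'||'c' vs 'a'||'bc'); my addition.
create index i_tenant_col1 on t ((tenantid || '|' || column1));
analyze t;
-- Queries must use the exact same expression to match the index:
select count(*) from t where tenantid || '|' || column1 = 'acme|42';
```

As the message notes, the trade-off is that the index can no longer be queried by tenantid prefix alone, and the expression is re-evaluated on every row change.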
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
I have since moved the WAL to its own EBS volume (ext4, data=writeback) to make it easier to monitor IO. The times where the SELECTs slow down coincide with heavy write traffic to the WAL volume. Maybe this has to do with WALInsertLock or WALWriteLock (or some other lock). Since the slowdown was less severe with WAL on its own volume, it seems some exclusive lock on the pages in shared_buffers is held while WAL IO is in progress(?) - that would be "frustrating". (wal_buffers defaults to 16mb in my setup.) Next I am going to have a look at the code. I would be thankful for any further insights, though :) Thanks. -- Lars

On 07/07/2011 04:56 PM, lars wrote: I am doing some research that will hopefully lead to replacing a big Oracle installation with a set of PostgreSQL servers. The current Oracle installation consists of multiple RAC clusters with 8 RAC nodes each. Each RAC node has 256gb of memory (to be doubled soon). The nature of our service is such that over a reasonable time (a day or so) the database *is* the working set. So I am looking at Postgres in a context where (almost) all of the data is cached and disk IO is only required for persistence.

Setup: PostgreSQL 9.1beta2 on a high memory (~68gb, 12 cores) EC2 Linux instance (kernel 2.6.35) with the database and WAL residing on the same EBS volume with EXT4 (data=ordered, barriers=1) - yes that is not an ideal setup (WAL should be on a separate drive, EBS is slow to begin with, etc.), but I am mostly interested in read performance for a fully cached database. shared_buffers: varied between 1gb and 20gb. checkpoint_segments/timeout: varied accordingly between 16-256 and 5-10m, resp. bgwriter tweaked to get a good distribution of checkpoints, bg-writes, and backend writes. wal_sync_method: tried fdatasync and open_datasync. I read "PostgreSQL 9.0 High Performance", and have spent some significant amount of time on this already.
PostgreSQL holds up extremely well, once things like "storing hint-bits", checkpoints vs bgwriter vs backend_writes, etc. are understood. I installed pg_buffercache and pgfincore to monitor how and where the database is stored. There is one observation that I wasn't able to explain: A SELECT-only client is severely slowed down by a concurrent client performing UPDATEs on the same table the other client selects from, even when the database resides 100% in the cache (I tried with shared_buffers large enough to hold the database, and also with a smaller setting relying on the OS cache; the behavior is the same). As long as only the reader is running I get great performance (20-30ms, query reading a random set of about 1 rows out of a 100m row table in a single SELECT). The backend is close to 100% cpu, which is what I want in a cached database. Once the writer starts, the read performance drops almost immediately to >200ms. The reading backend's cpu drops to <10%, and it is mostly waiting (D state in top). The UPDATE touches a random set of also about 1 rows (in one update statement; one of the columns touched is indexed - and that is the same index used for the SELECTs). What I would have expected is that the SELECTs would just continue to read from the cached buffers (whether dirtied or not) and not be affected by concurrent updates. I could not find anything explaining this. The most interesting part: this does not happen with an exact clone of that relation that is UNLOGGED. The same amount of buffers get dirty, the same amount of checkpointing, bgwriting, vacuuming. The only difference is WAL maintenance as far as I can tell. Is there some (intentional or not) synchronization between backends when the WAL is maintained? Are there times when a read-only query needs to compete for disk IO when everything is cached? Or are there any other explanations? I am happy to provide more information, although I am mainly looking for a qualitative answer that could explain this behavior.
Thanks. -- Lars
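The UNLOGGED comparison described in the message above can be set up as follows. This is a sketch: the table name is mine, and LIKE ... INCLUDING INDEXES and UNLOGGED tables are both available as of the 9.1 release being tested:

```sql
-- Sketch: an identical copy of the table that skips WAL, isolating
-- WAL maintenance as the only variable between the two workloads.
create unlogged table test_unlogged (like test including indexes);
insert into test_unlogged select * from test;
-- Repeat the UPDATE/SELECT workload against test_unlogged and compare.
```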
Re: [PERFORM] very long updates very small tables
On 03/30/2011 06:54 PM, Kevin Grittner wrote:
Lars Feistner wrote:
On 03/29/2011 09:28 PM, Kevin Grittner wrote:
Lars Feistner wrote:
The log tells me that certain update statements sometimes take about 3-10 minutes. But we are talking about updates on tables with 1000 to 1 rows and updates that are supposed to update 1 row.
The top possibilities that come to my mind are: [all eliminated as possibilities]
If you haven't already done so, you should probably turn on checkpoint logging to see if this corresponds to checkpoint activity. If it does, you can try cranking up how aggressive your background writer is, and perhaps limiting your shared_buffers to something around the size of your RAID controller's BBU cache. (I hope you have a RAID controller with BBU cache configured for write-back, anyway.)
-Kevin

Hello Kevin,

I am sorry to disappoint you here. As I said in my first e-mail, we don't have much traffic and the database fits easily into memory. The traffic might increase; at least it was increasing over the last 12 months. The database will always fit into memory.

No, we don't have a RAID and thus we don't have a BBU. Actually, we started off with a big SAN that our data centre offered. But sometimes this SAN was a bit slow, and when we first encountered the very long updates I thought there was a connection between the long-running updates and the slowness of the SAN, so I started to use the local disk (we are talking about one disk, not disks) for the database. I am still seeing the long-running inserts and updates.

I am still following the autovacuum trail; it still does not run frequently enough.

Thanks a lot for the replies so far. I will keep you guys informed about my next steps and the results.
Thanx a lot

Lars

--
~~~
Lars Feistner
Kompetenzzentrum für Prüfungen in der Medizin
Medizinische Fakultät Heidelberg, Im Neuenheimer Feld 346, Raum 013
69120 Heidelberg
E-Mail: feist...@uni-heidelberg.de
Fon: +49-6221-56-8269 Fax: +49-6221-56-7175
WWW: http://www.ims-m.de http://www.kompmed.de
~~~
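Kevin's suggestion above to turn on checkpoint logging can be done with settings along these lines (a hedged sketch; values are illustrative and lock-wait logging is included since lock waits were also visible in the logs):

```ini
# postgresql.conf: diagnose whether the stalls line up with checkpoints
log_checkpoints = on                # log start/end and buffer counts of each checkpoint
log_lock_waits = on                 # log any wait longer than deadlock_timeout
deadlock_timeout = 1s
log_min_duration_statement = 1000   # log statements slower than 1 second
```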
Re: [PERFORM] very long updates very small tables
Hello Kevin,

On 03/29/2011 09:28 PM, Kevin Grittner wrote:
Lars Feistner wrote:
The log tells me that certain update statements sometimes take about 3-10 minutes. But we are talking about updates on tables with 1000 to 1 rows and updates that are supposed to update 1 row.
The top possibilities that come to my mind are:
(1) The tables are horribly bloated. If autovacuum is off or not aggressive enough, things can degenerate to this level.

Some tables are autovacuumed regularly, others are not. The specific table extjs_recentlist was never autovacuumed. So I would think that updates on this table should always be very slow, but they are not. Only every 4th or 5th day for maybe half an hour, and then everything is fine again. And ;-) there is no antivirus installed.

(2) Memory is over-committed and your machine is thrashing.

We can rule this out. There is enough memory installed and the database is less than 500MB.

(3) There are explicit LOCK commands in the software which is contributing to the blocking.

We use the JDBC driver. The JDBC driver might do some locking, but we don't.

(4) There is some external delay within the transaction, such as waiting for user input while the transaction is open.

No, no user interaction within a transaction.

Maybe there's a combination of the above at play. Can you rule any of these out?
-Kevin

So, I will try to get the autovacuum to be more aggressive and will report again if nothing changes.

Thanks a lot.

Lars

--
~~~
Lars Feistner
Kompetenzzentrum für Prüfungen in der Medizin
Medizinische Fakultät Heidelberg, Im Neuenheimer Feld 346, Raum 013
69120 Heidelberg
E-Mail: feist...@uni-heidelberg.de
Fon: +49-6221-56-8269 Fax: +49-6221-56-7175
WWW: http://www.ims-m.de http://www.kompmed.de
~~~
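Making autovacuum more aggressive for one hot table, as discussed above, can be sketched like this on 8.4 (the table name is taken from the log excerpts; the thresholds are illustrative, not recommendations):

```sql
-- Sketch: per-table autovacuum storage parameters (supported since 8.4)
-- to vacuum the small, frequently updated table much more often.
ALTER TABLE extjs_recentlist SET (
    autovacuum_vacuum_threshold    = 50,
    autovacuum_vacuum_scale_factor = 0.02
);

-- Verify whether and when autovacuum has actually run on it:
SELECT relname, last_vacuum, last_autovacuum, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'extjs_recentlist';
```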
[PERFORM] very long updates very small tables
Dear list,

we have got a web application, and when people log in some information is written to the user tables. We have around 500 active users, but at most 5 people are logged in at the same time. From time to time people log in and then the application is not responsive any more. What we see in the Postgres server logs is that processes are waiting for other transactions to finish, though not because of a deadlock. The log tells me that certain update statements sometimes take about 3-10 minutes. But we are talking about updates on tables with 1000 to 1 rows and updates that are supposed to update 1 row.

We are running under Windows 2008 and Postgres 8.4.7. (Sorry for the Windows, it was not MY first choice.)

My only explanation at the moment would be that there must be some kind of Windows process that stops all other processes until it is finished, or something like that. (Could it even be autovacuum?) Is there a way to find out how long autovacuum took? Has anyone seen anything similar? Or could it really be that we need a bigger machine with more IO? But the one disk in the system still seems not very busy, and response times in the Windows resource monitor are not higher than 28 ms.

Following is an excerpt of our server log.
LOG: process 1660 acquired ShareLock on transaction 74652 after 533354.000 ms
STATEMENT: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG: process 4984 acquired ShareLock on transaction 74652 after 1523530.000 ms
STATEMENT: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG: process 956 acquired ExclusiveLock on tuple (4,188) of relation 16412 of database 16384 after 383055.000 ms
STATEMENT: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG: process 4312 acquired ExclusiveLock on tuple (9,112) of relation 16412 of database 16384 after 1422677.000 ms
STATEMENT: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG: duration: 1523567.000 ms execute : UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL: parameters: $1 = 't', $2 = '1362'
LOG: duration: 533391.000 ms execute : UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL: parameters: $1 = 't', $2 = '31'
LOG: process 5504 acquired ExclusiveLock on tuple (9,112) of relation 16412 of database 16384 after 183216.000 ms
STATEMENT: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG: process 1524 acquired ExclusiveLock on tuple (4,188) of relation 16412 of database 16384 after 376370.000 ms
STATEMENT: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG: duration: 1422688.000 ms execute : UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL: parameters: $1 = 't', $2 = '1362'
LOG: duration: 383067.000 ms execute : UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL: parameters: $1 = 'f', $2 = '31'
LOG: process 4532 acquired ExclusiveLock on tuple (9,112) of relation 16412 of database 16384 after 118851.000 ms
STATEMENT: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG: process 4448 acquired ExclusiveLock on tuple (4,188) of relation 16412 of database 16384 after 366304.000 ms
STATEMENT: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG: duration: 183241.000 ms execute : UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL: parameters: $1 = 't', $2 = '1362'
LOG: duration: 376395.000 ms execute : UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL: parameters: $1 = 't', $2 = '31'
LOG: process 4204 acquired ExclusiveLock on tuple (4,188) of relation 16412 of database 16384 after 339893.000 ms
STATEMENT: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG: duration: 366342.000 ms execute : UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL: parameters: $1 = 't', $2 = '31'
LOG: process 4760 acquired ExclusiveLock on tuple (4,188) of relation 16412 of database 16384 after 205943.000 ms
STATEMENT: UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
LOG: duration: 339923.000 ms execute : UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL: parameters: $1 = 't', $2 = '31'
LOG: duration: 205963.000 ms execute : UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL: parameters: $1 = 't', $2 = '31'
LOG: duration: 124654.000 ms execute : UPDATE extjs_recentlist SET visible=$1 WHERE recentlist_id=$2
DETAIL: parameters: $1 = 't', $2 = '1362'
LOG: process 3844 still waiting for ShareLock on transaction 74839 after 8000.000 ms

Thanx in advance.

Lars

--
~~~
Lars Feistner
Kompet
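Lock waits like the ones in this excerpt can be inspected while they are happening by self-joining pg_locks. A sketch suitable for 8.4 (where pg_stat_activity still exposes procpid and current_query):

```sql
-- Sketch: pair each waiting lock request with the granted lock it is
-- queued behind, and show what the holding backend is running.
SELECT w.pid           AS waiting_pid,
       h.pid           AS holding_pid,
       a.current_query AS holding_query
FROM pg_locks w
JOIN pg_locks h
  ON  h.granted
  AND NOT w.granted
  AND h.pid <> w.pid
  AND h.locktype      IS NOT DISTINCT FROM w.locktype
  AND h.database      IS NOT DISTINCT FROM w.database
  AND h.relation      IS NOT DISTINCT FROM w.relation
  AND h.page          IS NOT DISTINCT FROM w.page
  AND h.tuple         IS NOT DISTINCT FROM w.tuple
  AND h.transactionid IS NOT DISTINCT FROM w.transactionid
JOIN pg_stat_activity a ON a.procpid = h.pid;
```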
Re: [PERFORM] Migrating to Postgresql and new hardware
Interesting. Would have been nice if the test had been with a RAID-10 setup, as RAID-5 is not very good for writes... Would you get much of a performance increase with a write-cached SSD even if you got a RAID controller with (battery-backed) cache?

/Lars

-----Original Message-----
From: Greg Smith [mailto:g...@2ndquadrant.com]
Sent: 29 January 2011 01:27
To: Lars
Cc: mark; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Migrating to Postgresql and new hardware

Lars wrote:
> Below is a quote from the Pliant datasheet:
> "No Write Cache: Pliant EFDs deliver outstanding write performance without any dependence on write cache and thus does not use battery/supercap."

I liked the article The Register wrote about them, with the headline "Pliant's SSDs are awesome, says Pliant". Of course they do. Check out the write benchmark figures in the information review at http://oliveraaltonen.com/2010/09/29/preliminary-benchmark-results-of-the-pliant-ssd-drives/ to see how badly performance suffers on their design from those decisions. The Fusion I/O devices get nearly an order of magnitude more write IOPS in those tests.

As far as I've been able to tell, what Pliant does is just push writes out all the time without waiting for them to be aligned with block sizes, followed by cleaning up the wreckage later via their internal automatic maintenance ASICs (it's sort of an always-on TRIM implementation, if I'm guessing right). That has significant limitations both in regards to total write speed as well as device longevity. For a database, I'd much rather have a supercap and get ultimate write performance without those downsides. Depends on the read/write ratio, though; I could see a heavily read-biased system work well with their approach. Of course, a heavily read-based system would be better served by having a ton of RAM instead in most cases.

Could be worse, though--they could be misleading about the whole topic of write durability like Intel is.
I consider claiming high performance when you don't always really have it, which is what Pliant is doing here, to be a much lesser sin than losing data at random and not being clear about when that can happen. I'd like FusionIO to put a big "expect your server to be down for many minutes after a power interruption" warning on their drives, too, while I'm wishing for complete vendor transparency here.

--
Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Re: [PERFORM] Migrating to Postgresql and new hardware
Thanks for the reply!

MyISAM was chosen back in 2000. I'm not aware of the reasoning behind this choice...

Dell claims both the Samsung and the Pliant are safe to use. Below is a quote from the Pliant datasheet:
"No Write Cache: Pliant EFDs deliver outstanding write performance without any dependence on write cache and thus does not use battery/supercap."

> As others have mentioned, how are you going to be doing your "shards"?

Hmm... shards might not have been a good word to describe it. I'll paste what I wrote in another reply: I used sharding as an expression for partitioning data into several databases. Each user in the system is unaware of any other user. The user never accesses the private data of another user. Each user could in theory be assigned their own database server. This makes it easy to split the 4 users over a number of database servers. There is some shared data that is stored in a special "shared" database.

/Lars

-----Original Message-----
From: mark [mailto:dvlh...@gmail.com]
Sent: 19 January 2011 05:10
To: Lars
Cc: pgsql-performance@postgresql.org
Subject: RE: [PERFORM] Migrating to Postgresql and new hardware

Comments in line, take em for what you paid for em.

> -----Original Message-----
> From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Lars
> Sent: Tuesday, January 18, 2011 3:57 AM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] Migrating to Postgresql and new hardware
>
> Hi,
>
> We are in the process of moving a web based application from a MySql to Postgresql database.
> Our main reason for moving to Postgresql is problems with MySql (MyISAM) table locking.

I would never try and talk someone out of switching, but MyISAM? What version of MySQL, and did you pick MyISAM for a good reason or just happened to end up there?

> We will buy a new set of servers to run the Postgresql databases.
>
> The current setup is five Dell PowerEdge 2950 with 2 * XEON E5410, 4GB RAM.
> PERC 5/I 256MB NV Cache, 4 * 10K Disks (3 in RAID 5 + 1 spare).
>
> One server is used for shared data.
> Four servers are used for sharded data. A user in the system only has data in one of the shards.
> There is another server to which all data is replicated but I'll leave that one out of this discussion.
> These are dedicated database servers. There are more or less no stored procedures. The shared database size is about 20GB and each shard database is about 40GB (total of 20 + 40 * 4 = 180GB). I would expect the size will grow 10%-15% this year. Server load might increase with 15%-30% this year. This setup is disk I/O bound. The overwhelming majority of sql statements are fast (typically single row selects, updates, inserts and deletes on primary key) but there are some slow long running (10min) queries.
>
> As new server we are thinking of PowerEdge R510, 1 * Xeon X5650, 24GB RAM, H700 512MB NV Cache.

One would think you should notice a nice speed improvement, ceteris paribus, since the X5650 will have ->significantly<- more memory bandwidth than the 5410s you are used to, and you are going to have a heck of a lot more RAM for things to cache in. I think the H700 is a step up in RAID cards as well, but with only 4 disks you're probably not maxing out there.

> Dell has offered two alternative SSDs:
> Samsung model SS805 (100GB Solid State Disk SATA 2.5").
> (http://www.samsung.com/global/business/semiconductor/products/SSD/Products_Enterprise_SSD.html)
> Pliant model LB 150S (149GB Solid State Drive SAS 3Gbps 2.5").
> (http://www.plianttechnology.com/lightning_lb.php)

The Samsung ones seem to indicate that they have protection in the event of a power failure, and the Pliant does not mention it. Granted, I haven't done or seen any pull-the-plug-under-max-load tests on either family, so I've got nothing beyond that it is the first thing I look at with every SSD that crosses my path.

>
> Both are SLC drives.
> The price of the Pliant is about 2.3 times the price of the Samsung (does it have twice the performance?).
>
> One alternative is 5 servers (1 shared and 4 shards) with 5 Samsung drives (4 in RAID 10 + 1 spare).
> Another alternative would be 3 servers (1 shared and 2 shards) with 5 Pliant drives (4 in RAID 10 + 1 spare). This would be slightly more expensive than the first alternative but would be easier to upgrade with two new shard servers when it's needed.

As others have mentioned, how are you going to be doing your "shards"?

>
> Anyone have experience using the Samsung or the Pliant SSD? Any information about degraded performance over time?
> Any comments on the setups? How would an alternative with 1
Re: [PERFORM] Migrating to Postgresql and new hardware
> Are you going to RAID the SSD drives at all?

Yes, I was thinking four drives in RAID 10 and a (hot) spare drive...

> Of course this is based on my experience, and I have my fireproof suit since I mentioned the word fusionIO :)

Hehe. FusionIO has some impressive stats! SSD in RAID 10 provides redundancy in case of disk failure. How do you handle this with FusionIO? Two mirrored cards?

/Lars
Re: [PERFORM] Migrating to Postgresql and new hardware
> No idea what mysql thinks a shard is, but in PG we have read-only hot standbys.

I used sharding as an expression for partitioning data into several databases. Each user in the system is unaware of any other user. The user never accesses the private data of another user. Each user could in theory be assigned their own database server. This makes it easy to split the 4 users over a number of database servers. There is some shared data that is stored in a special "shared" database.

> The standby database is exactly the same as the master (save a bit of data that has not been synced yet.) I assume you know this... but I'd really recommend trying out PG's hot-standby and make sure it works the way you need (because I bet it's different than mysql's).
> Assuming the "shared" and the "sharded" databases are totally different (let's call them database a and b), with the PG setup you'd have database a on one computer, then one master with database b on it (where all writes go), then several hot-standbys mirroring database b (that support read-only queries).

As our data is easily partitioned onto any number of servers, we do not plan to use replication for load balancing. We do however plan to use it to set up a backup site.

> It's pretty hard to guess what your usage pattern is (70% read, small columns, no big blobs (like photos), etc)... and even then we'd still have to guess.

It's more like 40% read, 60% write.

> Not only will I not compare apples to oranges, but I really won't compare apples in Canada to oranges in Japan. :-)

Hehe

/Lars
[PERFORM] Migrating to Postgresql and new hardware
Hi,

We are in the process of moving a web based application from a MySql to a Postgresql database. Our main reason for moving to Postgresql is problems with MySql (MyISAM) table locking. We will buy a new set of servers to run the Postgresql databases.

The current setup is five Dell PowerEdge 2950 with 2 * XEON E5410, 4GB RAM, PERC 5/I 256MB NV Cache, 4 * 10K Disks (3 in RAID 5 + 1 spare). One server is used for shared data. Four servers are used for sharded data. A user in the system only has data in one of the shards. There is another server to which all data is replicated, but I'll leave that one out of this discussion. These are dedicated database servers. There are more or less no stored procedures.

The shared database size is about 20GB and each shard database is about 40GB (total of 20 + 40 * 4 = 180GB). I would expect the size will grow 10%-15% this year. Server load might increase by 15%-30% this year. This setup is disk I/O bound. The overwhelming majority of SQL statements are fast (typically single-row selects, updates, inserts and deletes on primary key), but there are some slow, long-running (10min) queries.

As new servers we are thinking of PowerEdge R510, 1 * Xeon X5650, 24GB RAM, H700 512MB NV Cache. Dell has offered two alternative SSDs:
Samsung model SS805 (100GB Solid State Disk SATA 2.5") (http://www.samsung.com/global/business/semiconductor/products/SSD/Products_Enterprise_SSD.html)
Pliant model LB 150S (149GB Solid State Drive SAS 3Gbps 2.5") (http://www.plianttechnology.com/lightning_lb.php)

Both are SLC drives. The price of the Pliant is about 2.3 times the price of the Samsung (does it have twice the performance?).

One alternative is 5 servers (1 shared and 4 shards) with 5 Samsung drives (4 in RAID 10 + 1 spare). Another alternative would be 3 servers (1 shared and 2 shards) with 5 Pliant drives (4 in RAID 10 + 1 spare).
This would be slightly more expensive than the first alternative but would be easier to upgrade with two new shard servers when needed.

Anyone have experience using the Samsung or the Pliant SSD? Any information about degraded performance over time? Any comments on the setups? How would an alternative with 15K disks (6 in RAID 10 + 1 spare, or even 10 in RAID 10 + 1 spare) compare? How would these alternatives compare in I/O performance to the old setup? Anyone care to guess how the two alternatives would compare in performance running Postgresql? How would the hardware usage of Postgresql compare to MySql's?

Regards
/Lars
Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS
On 2 Jan 2007, at 14:54, Ragnar wrote:
On þri, 2007-01-02 at 09:04 -0500, Geoffrey wrote:
Alvaro Herrera wrote:
Actually it has been suggested that a combination of ext2 (for WAL) and ext3 (for data, with data journalling disabled) is a good performer. AFAIK you don't want the overhead of journalling for the WAL partition.
I'm curious as to why ext3 for data with journalling disabled? Would that not be the same as ext2?
I believe Alvaro was referring to ext3 with journalling enabled for meta-data, but not for data. I also believe this is the standard ext3 configuration, but I could be wrong on that.
gnari

It doesn't really belong here, but ext3 has three data modes: journal (data and meta-data journaled), ordered (meta-data journaled, but data written before meta-data; this is the default), and writeback (meta-data-only journal). The performance differences between ordered and meta-data-only journaling should be very small anyway.

--
Best regards,
Lars Heidieker
[EMAIL PROTECTED]
http://paradoxon.info

Mystical explanations. Mystical explanations are considered deep; the truth is that they are not even superficial. -- Friedrich Nietzsche

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at
http://www.postgresql.org/about/donate