Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 7/29/11, lars hofhansl wrote:
> Thanks Pavan!
>
> I think the most important points are still that:
> 1. The WAL write should be happening asynchronously (if that is possible)

I think it is agreed that this is a "todo"; but since you reported that
turning off synchronous commit did not improve performance, it is not
directly relevant for you.

> 2. There should be an option to not perform these compactions if the
> page is only touched by reads.

If the page is only touched by reads, there would be nothing to compact.

Cheers,

Jeff

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
Thanks Pavan!

I think the most important points are still that:
1. The WAL write should be happening asynchronously (if that is possible)
2. There should be an option to not perform these compactions if the page
   is only touched by reads.

(Assuming that when most of the database resides in the cache these
optimizations are less important.)

-- Lars

----- Original Message -----
From: Pavan Deolasee
To: Merlin Moncure
Cc: lars; Kevin Grittner; Ivan Voras; pgsql-performance@postgresql.org
Sent: Wednesday, July 27, 2011 7:15 AM
Subject: Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database

On Wed, Jul 13, 2011 at 10:52 AM, Merlin Moncure wrote:
...

There are a couple of other (very important) things that HOT does, though
they are probably not advertised a lot. Even for non-HOT updates (which
means either indexed columns were changed or the page ran out of free
space) or deletes, HOT prunes those tuples and instead marks the line
pointers as DEAD. The page is defragmented and the dead space is recovered.
Each such dead tuple now consumes only two bytes in the page until vacuum
removes the dead line pointers. That's the reason why the OP is seeing the
behavior even when index columns are being updated.

We made a few adjustments to ensure that a page is not pruned too early.
So we track the oldest XID that did any updates/deletes to the page and
attempt pruning only when the RecentXmin is past that XID. We also mark
the page as "full" if some previous update did not find enough free space
to do an in-block update, and use that hint to decide if we should attempt
to prune the page. Finally, we prune only if we can get the cleanup lock
without blocking.

What might be worth looking at is this condition in pruneheap.c:

    /*
     * We prune when a previous UPDATE failed to find enough space on the page
     * for a new tuple version, or when free space falls below the relation's
     * fill-factor target (but not less than 10%).
     *
     * Checking free space here is questionable since we aren't holding any
     * lock on the buffer; in the worst case we could get a bogus answer. It's
     * unlikely to be *seriously* wrong, though, since reading either pd_lower
     * or pd_upper is probably atomic. Avoiding taking a lock seems more
     * important than sometimes getting a wrong answer in what is after all
     * just a heuristic estimate.
     */
    minfree = RelationGetTargetPageFreeSpace(relation,
                                             HEAP_DEFAULT_FILLFACTOR);
    minfree = Max(minfree, BLCKSZ / 10);

    if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
    {

So if the free space in a page falls below the fill-factor target or 10%
of the block size, we try to prune the page. We probably need to revisit
this area and see if we need to tune HOT even better. One option could be
to see how much space we are going to free and carry out the operation
only if it's significant enough to justify the cost.

I know we did several rounds of benchmarking during HOT development, but
the tuning mechanism still may not be perfect for all kinds of workloads,
and it probably never will be.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On Wed, Jul 13, 2011 at 10:52 AM, Merlin Moncure wrote:
> On Tue, Jul 12, 2011 at 6:15 PM, lars wrote:
>> Back to the first case, here's an strace from the backend doing the select
>> right after the updates.
>> write(13,
>> "f\320\1\0\1\0\0\0\273\0\0\0\0\340\27\22`\32\0\0002833!000"..., 2400256)
>> = 2400256
>
> On Wed, Jul 13, 2011 at 9:46 AM, Kevin Grittner wrote:
>> Code comments indicate that they expect the pruning to be a pretty
>> clear win on multiple reads, although I don't know how much that was
>> benchmarked. Jeff does raise a good point, though -- it seems odd
>> that WAL-logging of this pruning would need to be synchronous. We
>> support asynchronous commits -- why not use that feature
>
> Right -- here are my thoughts. Notice the above is writing out 293
> pages. This is suggesting to me that Kevin is right and you've
> identified a pattern where you are aggravating the page cleanup
> facilities of HOT. What threw me off here (and perhaps bears some
> additional investigation) is that early on in the report you were
> claiming an update to an indexed field, which effectively disables HOT.

There are a couple of other (very important) things that HOT does, though
they are probably not advertised a lot. Even for non-HOT updates (which
means either indexed columns were changed or the page ran out of free
space) or deletes, HOT prunes those tuples and instead marks the line
pointers as DEAD. The page is defragmented and the dead space is recovered.
Each such dead tuple now consumes only two bytes in the page until vacuum
removes the dead line pointers. That's the reason why the OP is seeing the
behavior even when index columns are being updated.

We made a few adjustments to ensure that a page is not pruned too early.
So we track the oldest XID that did any updates/deletes to the page and
attempt pruning only when the RecentXmin is past that XID. We also mark
the page as "full" if some previous update did not find enough free space
to do an in-block update, and use that hint to decide if we should attempt
to prune the page. Finally, we prune only if we can get the cleanup lock
without blocking.

What might be worth looking at is this condition in pruneheap.c:

    /*
     * We prune when a previous UPDATE failed to find enough space on the page
     * for a new tuple version, or when free space falls below the relation's
     * fill-factor target (but not less than 10%).
     *
     * Checking free space here is questionable since we aren't holding any
     * lock on the buffer; in the worst case we could get a bogus answer. It's
     * unlikely to be *seriously* wrong, though, since reading either pd_lower
     * or pd_upper is probably atomic. Avoiding taking a lock seems more
     * important than sometimes getting a wrong answer in what is after all
     * just a heuristic estimate.
     */
    minfree = RelationGetTargetPageFreeSpace(relation,
                                             HEAP_DEFAULT_FILLFACTOR);
    minfree = Max(minfree, BLCKSZ / 10);

    if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
    {

So if the free space in a page falls below the fill-factor target or 10%
of the block size, we try to prune the page. We probably need to revisit
this area and see if we need to tune HOT even better. One option could be
to see how much space we are going to free and carry out the operation
only if it's significant enough to justify the cost.

I know we did several rounds of benchmarking during HOT development, but
the tuning mechanism still may not be perfect for all kinds of workloads,
and it probably never will be.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 07/16/2011 06:33 PM, Jeff Janes wrote:
> 2.6 ms for an fsync seems awfully quick. I wonder if EBS uses
> nonvolatile/battery-backed write cache, or if it just lies about fsync
> actually hitting disk.

They have the right type of cache in there to make fsync quick, when you
happen to be the lucky one to find it free of a write backlog. So the best
case is much better than a typical spinning drive with no such cache. The
worst case is in the 100ms+ range though on EBS.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 07/14/2011 08:47 AM, Tom Lane wrote:
> The implementation I was imagining was to define another bit in the info
> parameter for XLogInsert, say XLOG_NON_TRANSACTIONAL. This could be a
> high-order bit that would not go to disk. Anytime it was *not* set,
> XLogInsert would set a global boolean that would remember that the
> current transaction wrote a transactional WAL record. This is the right
> default since the vast majority of call sites are writing records that
> we would want to have flushed at commit. There are just a couple of
> places that would need to be changed to add this flag to their calls.
>
> regards, tom lane

If you have a patch in mind I'm happy to test it on my setup and report back.

-- Lars
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On Fri, Jul 15, 2011 at 5:21 PM, lars wrote:
> On 07/14/2011 04:03 PM, Jeff Janes wrote:
>> On Wed, Jul 13, 2011 at 3:41 PM, lars wrote:
>>> On 07/13/2011 11:42 AM, Kevin Grittner wrote:
>>>> So transactions without an XID *are* sensitive to
>>>> synchronous_commit. That's likely a useful clue.
>>>>
>>>> How much did it help the run time of the SELECT which followed the
>>>> UPDATE?
>>>
>>> It has surprisingly little impact on the SELECT side:
>>
>> If your fsync is truly fsyncing, it seems like it should have
>> considerable effect.
>>
>> Could you strace with both -ttt and -T, with and without synchronous
>> commit?
>>
>> Cheers,
>>
>> Jeff
>
> Ok, here we go:
>
> "Q\0\0\0_select count(*) from test w"..., 8192, 0, NULL, NULL) = 96 <5.357152>
> 1310774187.750791 gettimeofday({1310774187, 750809}, NULL) = 0 <0.22>
> 1310774187.751023 lseek(12, 0, SEEK_END) = 329908224 <0.23>
> 1310774187.751109 lseek(15, 0, SEEK_END) = 396607488 <0.22>
> 1310774187.751186 lseek(18, 0, SEEK_END) = 534175744 <0.22>
> 1310774187.751360 lseek(12, 0, SEEK_END) = 329908224 <0.23>
> 1310774187.753389 brk(0x248e000) = 0x248e000 <0.26>
> 1310774187.753953 brk(0x24ce000) = 0x24ce000 <0.23>
> 1310774187.755158 brk(0x254e000) = 0x254e000 <0.24>
> 1310774187.766605 brk(0x245) = 0x245 <0.000170>
> 1310774187.766852 lseek(23, 4513792, SEEK_SET) = 4513792 <0.23>
> 1310774187.766927 write(23,
> "f\320\1\0\1\0\0\0\320\0\0\0\0\340D-\22\0\0\0\30@!0"..., 32768) =
> 32768 <0.75>
> 1310774187.767071 fdatasync(23) = 0 <0.002618>
> 1310774187.769760 gettimeofday({1310774187, 769778}, NULL) = 0 <0.22>
> 1310774187.769848 sendto(5,
> "\2\0\0\0\350\0\0\0\1@\0\0\2\0\0\0\1\0\0\0\0\0\0\0\r\201\0\0>\0\2\0"...,
> 232, 0, NULL, 0) = 232 <0.64>
> 1310774187.769993 sendto(6,
> "T\0\0\0\36\0\1count\0\0\0\0\0\0\0\0\0\0\24\0\10\377\377\377\377\0\0D"...,
> 66, 0, NULL, 0) = 66 <0.000199>

The total time for this is about 19 ms, but your previous example was
around 35 ms. Is this reproducible? A change of setup between then and now?

2.6 ms for an fsync seems awfully quick. I wonder if EBS uses
nonvolatile/battery-backed write cache, or if it just lies about fsync
actually hitting disk.

But anyway, it looks like you aren't blocking much in system calls, and I
don't think there are non-system-call ways to block, so the time is
probably being spent in something CPU intensive.

On my (physical) computer, synchronous_commit=off does eliminate the
timing differences between the select immediately after the update and
subsequent selects. So while I could reproduce timing differences that
were superficially similar to yours, they seem to have some fundamentally
different cause.

Maybe the best way to figure out what is going on is to loop the update
and the select in different processes, and use perf or oprofile to profile
just the select process (with and without the update running). It would
also be good to know the timings without profiling turned on as well, to
know how much the profiling is disturbing the timing.

...

> Here's an example with more dirty rows (I basically let the updater run
> for a while dirtying very many rows).

I'm surprised that makes that much of a difference. The select should only
clean up blocks it actually visits, and updating more rows shouldn't
change that very much.

...

> 1310775407.027503 semop(229383, {{2, -1, 0}}, 1) = 0 <2.215137>
...
> No WAL, but check out that one expensive semop! 2s!!

Is that reproducible, or is it just a one-time anomaly?

Cheers,

Jeff
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 07/14/2011 04:03 PM, Jeff Janes wrote:
> On Wed, Jul 13, 2011 at 3:41 PM, lars wrote:
>> On 07/13/2011 11:42 AM, Kevin Grittner wrote:
>>> So transactions without an XID *are* sensitive to
>>> synchronous_commit. That's likely a useful clue.
>>>
>>> How much did it help the run time of the SELECT which followed the
>>> UPDATE?
>>
>> It has surprisingly little impact on the SELECT side:
>
> If your fsync is truly fsyncing, it seems like it should have
> considerable effect.
>
> Could you strace with both -ttt and -T, with and without synchronous
> commit?

Ok, here we go:

"Q\0\0\0_select count(*) from test w"..., 8192, 0, NULL, NULL) = 96 <5.357152>
1310774187.750791 gettimeofday({1310774187, 750809}, NULL) = 0 <0.22>
1310774187.751023 lseek(12, 0, SEEK_END) = 329908224 <0.23>
1310774187.751109 lseek(15, 0, SEEK_END) = 396607488 <0.22>
1310774187.751186 lseek(18, 0, SEEK_END) = 534175744 <0.22>
1310774187.751360 lseek(12, 0, SEEK_END) = 329908224 <0.23>
1310774187.753389 brk(0x248e000) = 0x248e000 <0.26>
1310774187.753953 brk(0x24ce000) = 0x24ce000 <0.23>
1310774187.755158 brk(0x254e000) = 0x254e000 <0.24>
1310774187.766605 brk(0x245) = 0x245 <0.000170>
1310774187.766852 lseek(23, 4513792, SEEK_SET) = 4513792 <0.23>
1310774187.766927 write(23, "f\320\1\0\1\0\0\0\320\0\0\0\0\340D-\22\0\0\0\30@!0"..., 32768) = 32768 <0.75>
1310774187.767071 fdatasync(23) = 0 <0.002618>
1310774187.769760 gettimeofday({1310774187, 769778}, NULL) = 0 <0.22>
1310774187.769848 sendto(5, "\2\0\0\0\350\0\0\0\1@\0\0\2\0\0\0\1\0\0\0\0\0\0\0\r\201\0\0>\0\2\0"..., 232, 0, NULL, 0) = 232 <0.64>
1310774187.769993 sendto(6, "T\0\0\0\36\0\1count\0\0\0\0\0\0\0\0\0\0\24\0\10\377\377\377\377\0\0D"..., 66, 0, NULL, 0) = 66 <0.000199>

(23 is the WAL fd)

vs.

"Q\0\0\0_select count(*) from test w"..., 8192, 0, NULL, NULL) = 96 <7.343720>
1310774306.978767 gettimeofday({1310774306, 978785}, NULL) = 0 <0.21>
1310774306.978994 lseek(12, 0, SEEK_END) = 330883072 <0.24>
1310774306.979080 lseek(15, 0, SEEK_END) = 397131776 <0.21>
1310774306.979157 lseek(18, 0, SEEK_END) = 534732800 <0.22>
1310774306.979332 lseek(12, 0, SEEK_END) = 330883072 <0.22>
1310774306.983096 brk(0x248e000) = 0x248e000 <0.26>
1310774306.983653 brk(0x24ce000) = 0x24ce000 <0.23>
1310774306.984667 brk(0x254e000) = 0x254e000 <0.23>
1310774306.996040 brk(0x245) = 0x245 <0.000168>
1310774306.996298 gettimeofday({1310774306, 996317}, NULL) = 0 <0.21>
1310774306.996388 sendto(5, "\2\0\0\0\350\0\0\0\1@\0\0\2\0\0\0\1\0\0\0\0\0\0\0\r\201\0\0>\0\2\0"..., 232, 0, NULL, 0) = 232 <0.78>
1310774306.996550 sendto(6, "T\0\0\0\36\0\1count\0\0\0\0\0\0\0\0\0\0\24\0\10\377\377\377\377\0\0D"..., 66, 0, NULL, 0) = 66 <0.000202>

So the difference is only 2 ms, and the size of the WAL buffers written is
only 32k.

Here's an example with more dirty rows (I basically let the updater run
for a while dirtying very many rows):

"Q\0\0\0_select count(*) from test w"..., 8192, 0, NULL, NULL) = 96 <23.690018>
1310775141.398780 gettimeofday({1310775141, 398801}, NULL) = 0 <0.28>
1310775141.399018 lseek(12, 0, SEEK_END) = 372514816 <0.23>
1310775141.399105 lseek(15, 0, SEEK_END) = 436232192 <0.22>
1310775141.399185 lseek(18, 0, SEEK_END) = 573620224 <0.23>
1310775141.399362 lseek(12, 0, SEEK_END) = 372514816 <0.24>
1310775141.414017 brk(0x249) = 0x249 <0.28>
1310775141.414575 brk(0x24d) = 0x24d <0.25>
1310775141.415600 brk(0x255) = 0x255 <0.24>
1310775141.417757 semop(229383, {{0, -1, 0}}, 1) = 0 <0.24>
...
1310775141.448998 semop(229383, {{0, -1, 0}}, 1) = 0 <0.25>
1310775141.453134 brk(0x2452000) = 0x2452000 <0.000167>
1310775141.453377 fadvise64(22, 0, 0, POSIX_FADV_DONTNEED) = 0 <0.25>
1310775141.453451 close(22) = 0 <0.32>
1310775141.453537 open("pg_xlog/000100D100C2", O_RDWR) = 22 <0.59>
1310775141.453696 write(22, "f\320\3\0\1\0\0\0\321\0\0\0\0\0\0\3023\356\17N\23l\vN\0\0\0\1\0 \0\0"..., 5365760) = 5365760 <0.005991>
1310775141.459798 write(22, "f\320\1\0\1\0\0\0\321\0\0\0\0\340Q\302`\5\0\915!000"..., 9019392) = 9019392 <0.010062>
1310775141.469965 fdatasync(22) = 0 <0.231385>
1310775141.701424 semop(229383, {{2, 1, 0}}, 1) = 0 <0.31>
1310775141.702657 gettimeofday({1310775141, 702682}, NULL) = 0 <0.28>
1310775141.702765 sendto(5, "\2\0\0\0\350\0\0\0\1@\0\0\2\0\0\0\1\0\0\0\0\0\0\0\r\201\0\0>\0\2\0"..., 232, 0, NULL, 0) = 232 <0.71>
1310775141.702942 sendto(6, "T\0\0\0\36\0\1count\0\0\0\0\0\0\0\0\0\0\24\0\10\377\377\377\377\0\0D"..., 66, 0, NULL, 0) = 66 <0.000220>

vs

"Q\0\0\0_select count(*) from test w"..., 8192, 0, NULL, NULL) = 96 <55.595425>
1310775406.842823 gettimeofday({1310775406, 842842}, NULL) = 0 <0.26>
1310775406.843092 lseek(12, 0, SEEK_END) = 3827
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On Wed, Jul 13, 2011 at 3:41 PM, lars wrote:
> On 07/13/2011 11:42 AM, Kevin Grittner wrote:
>> So transactions without an XID *are* sensitive to
>> synchronous_commit. That's likely a useful clue.
>>
>> How much did it help the run time of the SELECT which followed the
>> UPDATE?
>
> It has surprisingly little impact on the SELECT side:

If your fsync is truly fsyncing, it seems like it should have
considerable effect.

Could you strace with both -ttt and -T, with and without synchronous
commit?

Cheers,

Jeff
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On Wed, Jul 13, 2011 at 11:10 AM, lars wrote:
...
> => update test set created_by = '001' where tenant = '001';
> UPDATE 3712
...
> There seems to be definitely something funky going on. Since created_by
> is indexed it shouldn't do any HOT logic.

Once the update has been run once, further executions are degenerate (they
replace the updated indexed column with the same value it already holds).
The HOT code detects this and uses a HOT update in this case despite the
apparent update of an indexed column.

Cheers,

Jeff
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
Tom Lane wrote:
> It seems like we ought to distinguish heap cleanup activities from
> user-visible semantics (IOW, users shouldn't care if a HOT cleanup
> has to be done over after restart, so if the transaction only
> wrote such records there's no need to flush). This'd require more
> process-global state than we keep now, I'm afraid.

That makes sense, and seems like the right long-term fix. It seems like a
boolean might do it; the trick would be setting it (or not) in all the
right places.

> Another approach we could take (also nontrivial) is to prevent
> select-only queries from doing HOT cleanups. You said upthread
> that there were alleged performance benefits from aggressive
> cleanup, but IMO that can charitably be described as unproven.
> The real reason it happens is that we didn't see a simple way for
> page fetches to know soon enough whether a tuple update would be
> likely to happen later, so they just do cleanups unconditionally.

Hmm. One trivial change could be to skip it when the top-level transaction
is declared to be READ ONLY. At least that would give people a way to work
around it for now. Of course, that can't be back-patched before 9.1
because subtransactions could override READ ONLY before that.

-Kevin
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
"Kevin Grittner" writes:
> Tom Lane wrote:
>> It seems like we ought to distinguish heap cleanup activities from
>> user-visible semantics (IOW, users shouldn't care if a HOT cleanup
>> has to be done over after restart, so if the transaction only
>> wrote such records there's no need to flush). This'd require more
>> process-global state than we keep now, I'm afraid.
>
> That makes sense, and seems like the right long-term fix. It seems
> like a boolean might do it; the trick would be setting it (or not)
> in all the right places.

The implementation I was imagining was to define another bit in the info
parameter for XLogInsert, say XLOG_NON_TRANSACTIONAL. This could be a
high-order bit that would not go to disk. Anytime it was *not* set,
XLogInsert would set a global boolean that would remember that the current
transaction wrote a transactional WAL record. This is the right default
since the vast majority of call sites are writing records that we would
want to have flushed at commit. There are just a couple of places that
would need to be changed to add this flag to their calls.

regards, tom lane
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On Thu, Jul 14, 2011 at 4:05 PM, Kevin Grittner wrote:
> Tom Lane wrote:
>
>> It seems like we ought to distinguish heap cleanup activities from
>> user-visible semantics (IOW, users shouldn't care if a HOT cleanup
>> has to be done over after restart, so if the transaction only
>> wrote such records there's no need to flush). This'd require more
>> process-global state than we keep now, I'm afraid.
>
> That makes sense, and seems like the right long-term fix. It seems
> like a boolean might do it; the trick would be setting it (or not)
> in all the right places.

I also believe this is the right way to go. I think the crucial thing is
in "distinguish heap cleanup activities from user-visible semantics" --
basically this is what happens with autovacuum: it does work concurrently
that you do not want to burden on user transactions.

>> Another approach we could take (also nontrivial) is to prevent
>> select-only queries from doing HOT cleanups. You said upthread
>> that there were alleged performance benefits from aggressive
>> cleanup, but IMO that can charitably be described as unproven.
>> The real reason it happens is that we didn't see a simple way for
>> page fetches to know soon enough whether a tuple update would be
>> likely to happen later, so they just do cleanups unconditionally.
>
> Hmm. One trivial change could be to skip it when the top level
> transaction is declared to be READ ONLY. At least that would give
> people a way to work around it for now. Of course, that can't be
> back-patched before 9.1 because subtransactions could override READ
> ONLY before that.

What I don't like about this approach is that it (a) increases complexity
for the user, (b) might not work for everyone (i.e. tools like OR mappers
which do not allow such setting of the TX, or cases where you do not know
what type of TX it is when you start it), and (c) it still leaves the
performance penalty to suddenly come to haunt a different TX.

I can only speculate whether the latter might actually cause other people
to run into issues because their usage patterns currently force the
cleanup activity into an unimportant TX, while the workaround would
suddenly have the cleanup delay show up in an important TX which used to
be fast. This is also hard to debug, since you would normally only look at
the slow TX before you realize you need to look elsewhere (the length of
this thread is kind of proof of this already :-)).

My 0.01 EUR...

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 07/13/2011 11:42 AM, Kevin Grittner wrote:
> So transactions without an XID *are* sensitive to
> synchronous_commit. That's likely a useful clue.
>
> How much did it help the run time of the SELECT which followed the
> UPDATE?

It has surprisingly little impact on the SELECT side:

=> set synchronous_commit = on;
=> update test set created_by = '001' where tenant = '001';
UPDATE 3712
Time: 384.702 ms
lars=> select count(*) from test where tenant = '001' and created_date = '2011-6-30';
 count
-------
  3712
(1 row)

Time: 36.571 ms
=> select count(*) from test where tenant = '001' and created_date = '2011-6-30';
 count
-------
  3712
(1 row)

Time: 5.702 ms
=> select count(*) from test where tenant = '001' and created_date = '2011-6-30';
 count
-------
  3712
(1 row)

Time: 5.822 ms
=> set synchronous_commit = off;
SET
Time: 0.145 ms
=> update test set created_by = '001' where tenant = '001';
UPDATE 3712
Time: 96.227 ms
=> select count(*) from test where tenant = '001' and created_date = '2011-6-30';
 count
-------
  3712
(1 row)

Time: 32.422 ms
=> select count(*) from test where tenant = '001' and created_date = '2011-6-30';
 count
-------
  3712
(1 row)

Time: 6.080 ms

I tried it multiple times, and while the numbers change by 5-10 ms the
relationship is the same. The same results show when I use my JDBC code to
run updates/selects as fast as possible. When synchronous_commit is off
for the SELECTing process it seems to be slightly faster.

-- Lars
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
"Kevin Grittner" writes:
> Tom Lane wrote:
>> Huh? If there was never an XID, there's no commit WAL record,
>> hence nothing to make asynchronous.
>
> If you look at the RecordTransactionCommit() function in xact.c
> you'll see that's not correct.

Oh, hmmm ... that code was written with the idea that things like sequence
XLOG_SEQ_LOG records ought to be flushed to disk before reporting commit;
otherwise you don't have a guarantee that the same sequence value couldn't
be handed out again after crash/restart, in a transaction that just does
something like SELECT nextval('myseq'); without any updates of regular
tables.

It seems like we ought to distinguish heap cleanup activities from
user-visible semantics (IOW, users shouldn't care if a HOT cleanup has to
be done over after restart, so if the transaction only wrote such records
there's no need to flush). This'd require more process-global state than
we keep now, I'm afraid.

Another approach we could take (also nontrivial) is to prevent select-only
queries from doing HOT cleanups. You said upthread that there were alleged
performance benefits from aggressive cleanup, but IMO that can charitably
be described as unproven. The real reason it happens is that we didn't see
a simple way for page fetches to know soon enough whether a tuple update
would be likely to happen later, so they just do cleanups unconditionally.

regards, tom lane
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
Tom Lane wrote:
> Huh? If there was never an XID, there's no commit WAL record,
> hence nothing to make asynchronous.

If you look at the RecordTransactionCommit() function in xact.c you'll see
that's not correct. Currently the commit record has nothing to do with
whether it synchronizes on WAL writes. In particular, this section around
line 1096 is where the choice is made:

    if ((wrote_xlog && synchronous_commit > SYNCHRONOUS_COMMIT_OFF) ||
        forceSyncCommit || nrels > 0)
    {
        /*
         * Synchronous commit case:

In the OP's test case, wrote_xlog is true, while forceSyncCommit is false
and nrels == 0. It doesn't seem like commit of a read-only transaction
should be a magical time for pruning WAL entries to hit the disk, so it
probably would work to modify that "if" to not drop into the synchronous
commit code if the transaction is explicitly declared READ ONLY or if it
never acquired an XID; although there's likely some better way to deal
with it.

-Kevin
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On Wed, Jul 13, 2011 at 1:10 PM, lars wrote:
> On 07/13/2011 08:17 AM, Tom Lane wrote:
>> "Kevin Grittner" writes:
>>> ... Jeff does raise a good point, though -- it seems odd
>>> that WAL-logging of this pruning would need to be synchronous.
>>
>> Yeah, we need to get to the bottom of that. If there's enough
>> shared_buffer space then it shouldn't be.
>
> This thread has gotten long, let me try to compile all the relevant
> information in one email.
>
> \d test
>            Table "lars.test"
>     Column    |     Type      | Modifiers
> --------------+---------------+-----------
>  tenant       | character(15) |
>  created_by   | character(15) |
>  created_date | date          |

A small aside here: try to avoid use of the character(n) type --
varchar(n) is superior in every way, including performance (although that
has nothing to do with your WAL issues on this thread).

merlin
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
lars wrote:
> On 07/13/2011 07:46 AM, Kevin Grittner wrote:
>> I've mentioned this in a hand-wavy general sense, but I should
>> have mentioned specifics ages ago: for a database where the
>> active portion of the database is fully cached, it is best to set
>> seq_page_cost and random_page_cost to the same value, somewhere
>> in the 0.1 to 0.05 range. (In your case I would use 0.05.) In
>> highly cached databases I have sometimes also found it necessary
>> to increase cpu_tuple_cost. (In your case I might try 0.02.)
>
> I've been doing that for other tests already (I didn't want to add
> too many variations here).
> The Bitmap Heap scans through the table are only useful for
> spinning media and not the cache (just to state the obvious).
>
> As an aside: I found that queries in a cold database take almost
> twice as long when I make that change,
> so for spinning media this is very important.

No doubt. We normally run months to years between reboots, with most of
our cache at the OS level. We don't have much reason to ever restart
PostgreSQL except to install new versions. So we don't worry overly much
about the cold cache scenario.

>> Which raises an interesting question -- what happens to the
>> timings if your SELECTs are done with synchronous_commit = off?
>
> Just tried that...
> In that case the WAL is still written (as seen via iostat), but
> not synchronously by the transaction (as seen by strace).

So transactions without an XID *are* sensitive to synchronous_commit.
That's likely a useful clue.

How much did it help the run time of the SELECT which followed the UPDATE?

-Kevin
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 07/13/2011 07:46 AM, Kevin Grittner wrote: I've mentioned this in a hand-wavy general sense, but I should have mentioned specifics ages ago: for a database where the active portion of the database is fully cached, it is best to set seq_page_cost and random_page_cost to the same value, somewhere in the 0.1 to 0.05 range. (In your case I would use 0.05.) In highly cached databases I have sometimes also found it necessary to increase cpu_tuple_cost. (In your case I might try 0.02.) I've been doing that for other tests already (I didn't want to add too many variations here). The Bitmap Heap scans through the table are only useful for spinning media and not the cache (just to state the obvious). As an aside: I found that queries in a cold database take almost twice as long when I make that change, so for spinning media this is very important. Which raises an interesting question -- what happens to the timings if your SELECTs are done with synchronous_commit = off? Just tried that... In that case the WAL is still written (as seen via iostat), but not synchronously by the transaction (as seen by strace). -- Lars
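The synchronous_commit experiment Lars describes can be repeated in a psql session along these lines (a sketch using the table and query from this thread; synchronous_commit is per session, so no restart is needed):

```sql
-- Per-session experiment: WAL is still written, but the commit does not
-- wait for the flush. (Asynchronous commit risks losing the most recent
-- transactions on a crash, but never corrupts the database.)
SET synchronous_commit = off;
\timing on
UPDATE test SET created_by = '001' WHERE tenant = '001';
SELECT count(*) FROM test WHERE tenant = '001' AND created_date = '2011-6-30';
RESET synchronous_commit;
```

As Lars observes, this does not help the read-triggered pruning case: that WAL write happens inside the SELECT, not at commit.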
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 07/13/2011 08:17 AM, Tom Lane wrote: "Kevin Grittner" writes: ... Jeff does raise a good point, though -- it seems odd that WAL-logging of this pruning would need to be synchronous. Yeah, we need to get to the bottom of that. If there's enough shared_buffer space then it shouldn't be. This thread has gotten long, let me try to compile all the relevant information in one email. \d test Table "lars.test" Column| Type | Modifiers --+---+--- tenant | character(15) | created_by | character(15) | created_date | date | Indexes: "i1" btree (tenant) "i11" btree (created_by) -- Table is populated like this: create table test(tenant char(15), created_by char(15), created_date date); insert into test values('x', 'y','2011-6-30'); insert into test select * from test; insert into test select * from test; insert into test select * from test; insert into test select * from test; insert into test select * from test; insert into test select * from test; insert into test select * from test; insert into test select * from test; insert into test select * from test; insert into test select * from test; insert into test select * from test; insert into test select * from test; insert into test select * from test; insert into test select * from test; insert into test select * from test; insert into test select * from test; insert into test select * from test; insert into test select * from test; -- 256k rows update test set tenant = lpad((random()*1)::int::text,15,'0'), created_by = lpad((random()*1)::int::text,15,'0'); insert into test select * from test; insert into test select * from test; insert into test select * from test; insert into test select * from test; insert into test select * from test; insert into test select * from test; insert into test select * from test; -- 32m rows create index i1 on test(tenant); create index i11 on test(created_by); vacuum analyze; -- I doubt it needs that many rows. 
=> SELECT 'version'::text AS "name", version() AS "current_setting" UNION ALL SELECT name,current_setting(name) FROM pg_settings WHERE NOT source='default' AND NOT name IN ('config_file','data_directory','hba_file','ident_file', 'log_timezone','DateStyle','lc_messages','lc_monetary', 'lc_numeric','lc_time','timezone_abbreviations', 'default_text_search_config','application_name', 'transaction_deferrable','transaction_isolation', 'transaction_read_only'); name | current_setting --+-- version | PostgreSQL 9.1beta2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.4 20100726 (Red Hat 4.4.4-13), 64-bit bgwriter_delay | 10ms bgwriter_lru_maxpages| 1000 checkpoint_completion_target | 0.9 checkpoint_segments | 128 client_encoding | UTF8 effective_cache_size | 64GB lc_collate | en_US.UTF-8 lc_ctype | en_US.UTF-8 log_checkpoints | on log_line_prefix | %m maintenance_work_mem | 2GB max_connections | 100 max_stack_depth | 2MB server_encoding | UTF8 shared_buffers | 20GB TimeZone | UTC wal_buffers | 16MB work_mem | 1GB (19 rows) -- Now: -- => select count(*) from test where tenant = '001' and created_date = '2011-6-30'; count --- 3712 (1 row) => SELECT c.relname, isdirty, count(*) * 8192 / 1024/1024 AS buffers FROM pg_buffercache b, pg_class c WHERE b.relfilenode = pg_relation_filenode(c.oid) AND b.reldatabase IN (0, (SELECT oid FROM pg_database WHERE datname = current_database())) GROUP BY c.relname,isdirty ORDER BY 3 DESC LIMIT 6; relname| isdirty | buffers ---+-+- test | t | 14 pg_opclass_oid_index | f | 0 pg_rewrite| f | 0 i11 | t | 0 pg_rewrite_rel_rulename_index | f | 0 pg_constraint | f | 0 -- Just started the server, no nothing else is cached, yet -- it doesn't matter if that update is executed by the same or another backend. 
=> update test set created_by = '001' where tenant = '001'; UPDATE 3712 => select count(*) from test where tenant = '001' and created_date = '2011-6-30'; count --- 3712 (1 row) strace now shows: - Q\0\0\0_select count(*) from test w"..., 8192, 0, NULL, NULL) = 96 gettimeofday({1310579341, 854669}, N
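The doubling-based population script in this post could also be written with generate_series. The sketch below is an illustrative alternative, not the original script: the random() multiplier is an assumption (the original's value is truncated in this archive), and unlike the doubling approach it does not produce the same physically clustered duplicate rows.

```sql
-- Illustrative alternative to the repeated "insert into test select *"
-- doublings above; produces roughly the same 32M-row table.
-- NOTE: the 10000 multiplier is a placeholder, not the original value.
CREATE TABLE test(tenant char(15), created_by char(15), created_date date);
INSERT INTO test
SELECT lpad((random()*10000)::int::text, 15, '0'),
       lpad((random()*10000)::int::text, 15, '0'),
       DATE '2011-6-30'
FROM generate_series(1, 32 * 1024 * 1024);
CREATE INDEX i1  ON test(tenant);
CREATE INDEX i11 ON test(created_by);
VACUUM ANALYZE test;
```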
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
"Kevin Grittner" writes: > ... Jeff does raise a good point, though -- it seems odd > that WAL-logging of this pruning would need to be synchronous. Yeah, we need to get to the bottom of that. If there's enough shared_buffer space then it shouldn't be. > We > support asynchronous commits -- why not use that feature > automatically for transactions where the only writes are this sort > of thing. Which raises an interesting question -- what happens to > the timings if your SELECTs are done with synchronous_commit = off? > I wonder if it would make any sense to implicitly use async commit > for a transaction which is declared READ ONLY or which never > acquires an XID? Huh? If there was never an XID, there's no commit WAL record, hence nothing to make asynchronous. regards, tom lane
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On Tue, Jul 12, 2011 at 6:15 PM, lars wrote: > Back to the first case, here's an strace from the backend doing the select > right after the updates. > write(13, > "f\320\1\0\1\0\0\0\273\0\0\0\0\340\27\22`\32\0\0002833!000"..., 2400256) > = 2400256 On Wed, Jul 13, 2011 at 9:46 AM, Kevin Grittner wrote: > Code comments indicate that they expect the pruning to be a pretty > clear win on multiple reads, although I don't know how much that was > benchmarked. Jeff does raise a good point, though -- it seems odd > that WAL-logging of this pruning would need to be synchronous. We > support asynchronous commits -- why not use that feature Right -- here are my thoughts. notice the above is writing out 293 pages. this is suggesting to me that Kevin is right and you've identified a pattern where you are aggravating the page cleanup facilities of HOT. What threw me off here (and perhaps bears some additional investigation) is that early on in the report you were claiming an update to an indexed field which effectively disables HOT. The fairly lousy I/O performance of EBS is further hurting you here: you have a very fast computer with lots of memory with a late 90's disk system in terms of performance. This means that AWS is not all that great for simulating load profiles unless you are also highly underweight I/O in your servers. Postgres btw demands (as does Oracle) a decent i/o system for many workloads that might be surprising. A note about HOT: there is no way to disable it (other than updating an indexed field to bypass it) -- HOT was a performance revolution for Postgres and numerous benchmarks as well as anecdotal reports have confirmed this. HOT mitigates the impact of dead tuples by 1. highly reducing index bloat under certain conditions and 2. allowing dead tuples to be more aggressively cleaned up -- a 'page level vacuum' if it were. 
HOT is an especially huge win when updates are frequent and transactions are small and short, but maybe in your simulated case it's not helping. hm. merlin
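Whether updates are actually taking the HOT path, as Merlin discusses, can be checked from the statistics collector; a sketch:

```sql
-- n_tup_upd counts all updates; n_tup_hot_upd counts those that took
-- the HOT path (same-page new version, no new index entries needed).
SELECT n_tup_upd,
       n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1) AS hot_pct
FROM pg_stat_user_tables
WHERE relname = 'test';
```

A low hot_pct after updating an indexed column would confirm that these updates bypass HOT, while the pruning described above still occurs.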
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
[combining responses to two posts on this thread by lars] lars wrote: > On the face of it, though, this looks like Postgres would not be > that useful as a database that resides (mostly) in the cache. I've mentioned this in a hand-wavy general sense, but I should have mentioned specifics ages ago: for a database where the active portion of the database is fully cached, it is best to set seq_page_cost and random_page_cost to the same value, somewhere in the 0.1 to 0.05 range. (In your case I would use 0.05.) In highly cached databases I have sometimes also found it necessary to increase cpu_tuple_cost. (In your case I might try 0.02.) This won't directly address the specific issue you've been investigating in this thread, but it will tend to give you faster plans for your actual environment without needing to fuss with things on a query-by-query basis. It may indirectly mitigate the problem at hand through heavier use of indexes which would reduce pruning and hint-bit setting by readers. > Interesting. When you did your test, did you also find WAL write > activity when running x the first time after y? I wasn't able to check that in this quick, ad hoc run. > Oh, it's just switched off for testing, so that I can control > when vacuum runs and make sure that it's not skewing the results > while I am measuring something. Again, that's an impulse I can certainly understand, but the problem is that turning autovacuum off skews results in other ways, such as forcing readers to do maintenance work which might otherwise be done in a cost-limited background process. Or if that didn't happen you would be constantly chasing through lots of dead line pointers which would hurt performance in another way. It's probably best to consider autovacuum an integral part of normal database operations and run benchmarks like this with it operational.
This will also give you an opportunity to tune thresholds and costing factors to evaluate the impact that such adjustments have on potential workloads. > For a fully cached database I would probably want to switch off > HOT pruning and compaction (which from what we see is done > synchronously with the select) and leave it up to the asynchronous > auto vacuum to do that. But maybe I am still not quite > understanding the performance implications. Code comments indicate that they expect the pruning to be a pretty clear win on multiple reads, although I don't know how much that was benchmarked. Jeff does raise a good point, though -- it seems odd that WAL-logging of this pruning would need to be synchronous. We support asynchronous commits -- why not use that feature automatically for transactions where the only writes are this sort of thing. Which raises an interesting question -- what happens to the timings if your SELECTs are done with synchronous_commit = off? I wonder if it would make any sense to implicitly use async commit for a transaction which is declared READ ONLY or which never acquires an XID? -Kevin
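Kevin's suggested cost settings can be tried per session without touching postgresql.conf; a sketch using the values he mentions:

```sql
-- Try the cached-database costs for just this session.
SET seq_page_cost    = 0.05;
SET random_page_cost = 0.05;  -- equal costs: no penalty assumed for random access
SET cpu_tuple_cost   = 0.02;  -- raised so index scans look relatively cheaper
EXPLAIN SELECT count(*) FROM test
 WHERE tenant = '001' AND created_date = '2011-6-30';
```

If the resulting plans look good, the same settings can then be made permanent in postgresql.conf.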
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
shared_buffers is big enough to hold the entire database, and there is plenty of extra space. (verified with pg_buffercache) So I don't think that is the reason. Tom Lane schrieb: >Jeff Janes writes: >> On 7/12/11, lars wrote: >>> The fact that a select (maybe a big analytical query we'll run) touching >>> many rows will update the WAL and wait >>> (apparently) for that IO to complete is making a fully cached database >>> far less useful. >>> I just artificially created this scenario. > >> I can't think of any reason that that WAL would have to be flushed >> synchronously. > >Maybe he's running low on shared_buffers? We would have to flush WAL >before writing a dirty buffer out, so maybe excessive pressure on >available buffers is part of the issue here. > > regards, tom lane
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
Jeff Janes writes: > On 7/12/11, lars wrote: >> The fact that a select (maybe a big analytical query we'll run) touching >> many rows will update the WAL and wait >> (apparently) for that IO to complete is making a fully cached database >> far less useful. >> I just artificially created this scenario. > I can't think of any reason that that WAL would have to be flushed > synchronously. Maybe he's running low on shared_buffers? We would have to flush WAL before writing a dirty buffer out, so maybe excessive pressure on available buffers is part of the issue here. regards, tom lane
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 7/12/11, lars wrote: > > > The fact that a select (maybe a big analytical query we'll run) touching > many rows will update the WAL and wait > (apparently) for that IO to complete is making a fully cached database > far less useful. > I just artificially created this scenario. I can't think of any reason that that WAL would have to be flushed synchronously. There is already code that makes transactions that only have certain kinds of "maintenance" WAL to skip the flush. Could this pruning WAL be added to that group?
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 07/12/2011 02:38 PM, Merlin Moncure wrote: Something is not adding up here. Perhaps there is an alternate route to WAL logged activity from selects I'm not thinking of. Right now I'm thinking to run the selects on table 'a' and the inserts concurrently on table 'b' and seeing how that behaves. Another way to get to the bottom is to oprofile the selecting-during-load backend to see where the time is getting spent. Alternative way to do this is to strace attach to the selecting-during-load backend to see if it's really writing to the WAL (I'm really curious about this). Another interesting test would be to try and reproduce the results on native machine. It should be possible to do this on your workstation with a more modestly sized scaling factor. merlin Just tried with two of my test tables. Updates on 'a' have no (measurable) effect on select from 'b'. Back to the first case, here's an strace from the backend doing the select right after the updates. "Q\0\0\0`select count(*) from test1 "..., 8192, 0, NULL, NULL) = 97 gettimeofday({1310512219, 723762}, NULL) = 0 open("base/16385/33032", O_RDWR)= 8 lseek(8, 0, SEEK_END) = 1073741824 open("base/16385/33032.1", O_RDWR|O_CREAT, 0600) = 9 lseek(9, 0, SEEK_END) = 1073741824 open("base/16385/33032.2", O_RDWR|O_CREAT, 0600) = 10 lseek(10, 0, SEEK_END) = 191348736 open("base/16385/33035", O_RDWR)= 11 lseek(11, 0, SEEK_END) = 1073741824 open("base/16385/33035.1", O_RDWR|O_CREAT, 0600) = 12 lseek(12, 0, SEEK_END) = 3571712 lseek(10, 0, SEEK_END) = 191348736 brk(0x28ad000) = 0x28ad000 mmap(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5f28ca mmap(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5f28c5f000 munmap(0x7f5f28c5f000, 266240) = 0 munmap(0x7f5f28ca, 135168) = 0 open("pg_xlog/000100BB0012", O_RDWR) = 13 lseek(13, 1564672, SEEK_SET)= 1564672 write(13, "f\320\1\0\1\0\0\0\273\0\0\0\0\340\27\22`\32\0\0002833!000"..., 2400256) = 2400256 fdatasync(13) = 0 
semop(229383, {{9, 1, 0}}, 1) = 0 gettimeofday({1310512219, 885287}, NULL) = 0 sendto(5, "\2\0\0\0\300\3\0\0\1@\0\0\t\0\0\0\1\0\0\0\0\0\0\0\353\4\0\0@\0\2\0"..., 960, 0, NULL, 0) = 960 sendto(5, "\2\0\0\0\300\3\0\0\1@\0\0\t\0\0\0\0\0\0\0\0\0\0\0009\n\0\0@\0\2\0"..., 960, 0, NULL, 0) = 960 sendto(5, "\2\0\0\0\300\3\0\0\1@\0\0\t\0\0\0\0\0\0\0\0\0\0\0v\n\0\0@\0\2\0"..., 960, 0, NULL, 0) = 960 sendto(5, "\2\0\0\0\270\1\0\0\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\275\4\0\0\377\177\0\0"..., 440, 0, NULL, 0) = 440 sendto(6, "T\0\0\0\36\0\1count\0\0\0\0\0\0\0\0\0\0\24\0\10\377\377\377\377\0\0D"..., 66, 0, NULL, 0) = 66 So the backend definitely writing to the WAL, directly and synchronously. Selecting the same set of rows again: "Q\0\0\0`select count(*) from test1 "..., 8192, 0, NULL, NULL) = 97 gettimeofday({1310512344, 823728}, NULL) = 0 lseek(10, 0, SEEK_END) = 191348736 lseek(12, 0, SEEK_END) = 3571712 lseek(10, 0, SEEK_END) = 191348736 brk(0x28d5000) = 0x28d5000 brk(0x2915000) = 0x2915000 brk(0x2897000) = 0x2897000 gettimeofday({1310512344, 831043}, NULL) = 0 sendto(5, "\2\0\0\0\350\0\0\0\1@\0\0\2\0\0\0\1\0\0\0\0\0\0\0\10\201\0\0?\0\2\0"..., 232, 0, NULL, 0) = 232 sendto(6, "T\0\0\0\36\0\1count\0\0\0\0\0\0\0\0\0\0\24\0\10\377\377\377\377\0\0D"..., 66, 0, NULL, 0) = 66 No writing to the WAL. -- Lars -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
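The WAL write that strace catches above can also be observed from SQL, without attaching to the backend, by noting the WAL insert position around the first post-update SELECT; a sketch:

```sql
-- Note the WAL insert position, run the first post-update SELECT, and
-- check the position again; pruning WAL shows up as movement.
-- (pg_current_xlog_insert_location() is the 8.3-to-9.x name; it was
-- renamed pg_current_wal_insert_lsn() in PostgreSQL 10.)
SELECT pg_current_xlog_insert_location();
SELECT count(*) FROM test WHERE tenant = '001' AND created_date = '2011-6-30';
SELECT pg_current_xlog_insert_location();  -- advanced => WAL was generated
```

Any backend's activity moves this position, so this is only meaningful on an otherwise idle server, as in these tests.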
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 07/12/2011 02:51 PM, Kevin Grittner wrote: I ran x a bunch of times to get a baseline, then y once, then x a bunch more times. The results were a bit surprising: cir=> \timing Timing is on. cir=> execute x('001','2011-6-30'); count --- 3456 (1 row) Time: 9.823 ms cir=> execute x('001','2011-6-30'); count --- 3456 (1 row) Time: 8.481 ms cir=> execute x('001','2011-6-30'); count --- 3456 (1 row) Time: 14.054 ms cir=> execute x('001','2011-6-30'); count --- 3456 (1 row) Time: 10.169 ms cir=> execute y('001', '001','2011-6-30'); UPDATE 3456 Time: 404.244 ms cir=> execute x('001','2011-6-30'); count --- 3456 (1 row) Time: 128.643 ms cir=> execute x('001','2011-6-30'); count --- 3456 (1 row) Time: 2.657 ms cir=> execute x('001','2011-6-30'); count --- 3456 (1 row) Time: 5.883 ms cir=> execute x('001','2011-6-30'); count --- 3456 (1 row) Time: 2.645 ms cir=> execute x('001','2011-6-30'); count --- 3456 (1 row) Time: 2.753 ms cir=> execute x('001','2011-6-30'); count --- 3456 (1 row) Time: 2.253 ms Interesting. When you did you test, did you also find WAL write activity when running x the first time after y? (It's very hard to catch in only a single query, though). Running the update made the next SELECT slow, then it was much *faster*. My best guess is that the data landed in a more concentrated set of pages after the update, and once autovacuum kicked in and cleaned things up it was able to get to that set of data faster. autovacuum | off Well, certainly not while under modification without running autovacuum. That's disabling an integral part of what keeps performance up. Oh, it's just switched off for testing, so that I can control when vacuum runs and make sure that it's not skewing the results while I am measuring something. In a real database I would probably err on vacuuming more than less. 
For a fully cached database I would probably want to switch off HOT pruning and compaction (which from what we see is done synchronously with the select) and leave it up to the asynchronous auto vacuum to do that. But maybe I am still not quite understanding the performance implications.
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
lars wrote: > vacuum analyze; I tried this out on a 16 core, 64 GB machine. It was a replication target for a few dozen source databases into a couple 2 TB reporting databases, and had some light testing going on, but it was only at about 50% capacity, so that shouldn't throw this off by *too* much, I hope. Since our data is long-lived enough to worry about transaction ID freezing issues, I always follow a bulk load with VACUUM FREEZE ANALYZE; so I did that here. I also just threw this into the 2 TB database without changing our configuration. Among other things, that means that autovacuum was on. > prepare x as select count(*) from test where tenant = $1 and > created_date = $2; > prepare y as update test set created_by = $1 where tenant = $2 and > created_date = $3; > > execute y('001', '001','2011-6-30'); > execute x('001','2011-6-30'); I ran x a bunch of times to get a baseline, then y once, then x a bunch more times. The results were a bit surprising: cir=> \timing Timing is on. cir=> execute x('001','2011-6-30'); count --- 3456 (1 row) Time: 9.823 ms cir=> execute x('001','2011-6-30'); count --- 3456 (1 row) Time: 8.481 ms cir=> execute x('001','2011-6-30'); count --- 3456 (1 row) Time: 14.054 ms cir=> execute x('001','2011-6-30'); count --- 3456 (1 row) Time: 10.169 ms cir=> execute y('001', '001','2011-6-30'); UPDATE 3456 Time: 404.244 ms cir=> execute x('001','2011-6-30'); count --- 3456 (1 row) Time: 128.643 ms cir=> execute x('001','2011-6-30'); count --- 3456 (1 row) Time: 2.657 ms cir=> execute x('001','2011-6-30'); count --- 3456 (1 row) Time: 5.883 ms cir=> execute x('001','2011-6-30'); count --- 3456 (1 row) Time: 2.645 ms cir=> execute x('001','2011-6-30'); count --- 3456 (1 row) Time: 2.753 ms cir=> execute x('001','2011-6-30'); count --- 3456 (1 row) Time: 2.253 ms Running the update made the next SELECT slow, then it was much *faster*. 
My best guess is that the data landed in a more concentrated set of pages after the update, and once autovacuum kicked in and cleaned things up it was able to get to that set of data faster. > On the face of it, though, this looks like Postgres would not be > that useful as a database that resides (mostly) in the cache. > autovacuum | off Well, certainly not while under modification without running autovacuum. That's disabling an integral part of what keeps performance up. There are very few, if any, situations where running PostgreSQL in production without autovacuum makes any sense, and benchmarks which disable it don't give a very accurate picture of typical performance. Now, if you're looking to artificially create a worst-case scenario, then it makes sense, but I'm not clear on the point of it. I do understand the impulse, though. When we first started using PostgreSQL there were certain very small tables which were updated very frequently which got slow when autovacuum kicked in. We made autovacuum less aggressive, and found that things got worse! So we went the other way and made autovacuum much more aggressive than the defaults, and everything was fine. -Kevin
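Making autovacuum "much more aggressive than the defaults", as Kevin describes, usually means lowering the scale factors and the cost delay. An illustrative postgresql.conf fragment (example values only; the thread does not give Kevin's actual settings):

```ini
# Example of an aggressive autovacuum configuration (illustrative values).
autovacuum = on
autovacuum_naptime = 10s               # wake workers more often (default 1min)
autovacuum_vacuum_scale_factor = 0.02  # vacuum at ~2% dead rows (default 20%)
autovacuum_analyze_scale_factor = 0.01
autovacuum_vacuum_cost_delay = 2ms     # let each worker do more work per second
```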
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On Tue, Jul 12, 2011 at 2:08 PM, Kevin Grittner wrote: > lars wrote: > >> select count(*) from test where tenant = $1 and created_date = $2 > > Ah, that might be a major clue -- prepared statements. I'm really skeptical that this is the case -- the table is 100m and there is no way you are banging through 100m records in 500ms, particularly if doing i/o. Also regarding the page cleanup, IIRC the optimization only takes place if the dead tuples are HOT -- when the OP opened the thread he stated it was happening with update on indexed field (and later tested it on unindexed field). Something is not adding up here. Perhaps there is an alternate route to WAL logged activity from selects I'm not thinking of. Right now I'm thinking to run the selects on table 'a' and the inserts concurrently on table 'b' and seeing how that behaves. Another way to get to the bottom is to oprofile the selecting-during-load backend to see where the time is getting spent. Alternative way to do this is to strace attach to the selecting-during-load backend to see if it's really writing to the WAL (I'm really curious about this). Another interesting test would be to try and reproduce the results on a native machine. It should be possible to do this on your workstation with a more modestly sized scaling factor. merlin
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 07/12/2011 01:04 PM, lars wrote: On 07/12/2011 12:08 PM, Kevin Grittner wrote: lars wrote: select count(*) from test where tenant = $1 and created_date = $2 Ah, that might be a major clue -- prepared statements. What sort of a plan do you get for that as a prepared statement? (Note, it is very likely *not* to be the same plan as you get if you run with literal values!) It is not at all unlikely that it could resort to a table scan if you have one tenant which is five or ten percent of the table, which would likely trigger the pruning as it passed over the modified pages. -Kevin So a read of a row *will* trigger dead tuple pruning, and that requires WAL logging, and this is known/expected? This is actually the only answer I am looking for. :) I have not seen this documented anywhere. I know that Postgres will generate general plans for prepared statements (how could it do otherwise?), I also know that it sometimes chooses a sequential scan. This can always be tweaked to touch fewer rows and/or use a different plan. That's not my objective, though! The fact that a select (maybe a big analytical query we'll run) touching many rows will update the WAL and wait (apparently) for that IO to complete is making a fully cached database far less useful. I just artificially created this scenario. ... Just dropped the table to test something so I can't get the plan right now. Will send an update as soon as I get it setup again. Thanks again. -- Lars Ok... Slightly changes the indexes: \d test Table "lars.test" Column| Type | Modifiers --+---+--- tenant | character(15) | created_by | character(15) | created_date | date | Indexes: "i1" btree (tenant) So just just a simple index on tenant. 
prepare x as select count(*) from test where tenant = $1 and created_date = $2; PREPARE explain execute x('001','2011-6-30'); QUERY PLAN --- Aggregate (cost=263301.40..263301.41 rows=1 width=0) -> Bitmap Heap Scan on test (cost=3895.99..263299.28 rows=847 width=0) Recheck Cond: (tenant = $1) Filter: (created_date = $2) -> Bitmap Index Scan on i1 (cost=0.00..3895.77 rows=169372 width=0) Index Cond: (tenant = $1) (6 rows) -- this is when the WAL rows are written: explain (analyze on, buffers on) execute x('001','2011-6-30'); QUERY PLAN Aggregate (cost=263301.40..263301.41 rows=1 width=0) (actual time=191.150..191.151 rows=1 loops=1) Buffers: shared hit=3716 -> Bitmap Heap Scan on test (cost=3895.99..263299.28 rows=847 width=0) (actual time=1.966..188.221 rows=3712 loops=1) Recheck Cond: (tenant = $1) Filter: (created_date = $2) Buffers: shared hit=3716 -> Bitmap Index Scan on i1 (cost=0.00..3895.77 rows=169372 width=0) (actual time=1.265..1.265 rows=3712 loops=1) Index Cond: (tenant = $1) Buffers: shared hit=20 Total runtime: 191.243 ms (10 rows) -- this is when no WAL is written: explain (analyze on, buffers on) execute x('001','2011-6-30'); QUERY PLAN Aggregate (cost=263301.40..263301.41 rows=1 width=0) (actual time=11.529..11.530 rows=1 loops=1) Buffers: shared hit=3715 -> Bitmap Heap Scan on test (cost=3895.99..263299.28 rows=847 width=0) (actual time=1.341..9.187 rows=3712 loops=1) Recheck Cond: (tenant = $1) Filter: (created_date = $2) Buffers: shared hit=3715 -> Bitmap Index Scan on i1 (cost=0.00..3895.77 rows=169372 width=0) (actual time=0.756..0.756 rows=3712 loops=1) Index Cond: (tenant = $1) Buffers: shared hit=19 Total runtime: 11.580 ms (10 rows) If you wanted to recreate this scenario I created a simple script to create the table: create table test(tenant char(15), created_by char(15), created_date date); insert into test values('x', 'y','2011-6-30'); insert into test select * from test; insert into test select * from test; insert into test select * 
from test; insert into test select * from test; insert into test select * from test; insert into test select * from test; insert into test select * from test; insert into test select * from test; insert into test select * from test; insert into test select * from test; insert into test select * from test; insert into test select * from test; insert into test select * from test; insert into test select * from test; insert into test sel
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
lars wrote: > So a read of a row *will* trigger dead tuple pruning, and that > requires WAL logging, and this is known/expected? Yes, because pruning dead line pointers will make subsequent reads faster. It's intended to be an optimization. > This is actually the only answer I am looking for. :) I have not > seen this documented anywhere. You would currently need to look at the README-HOT file or source code, I think. There probably should be some mention in the user docs, but I haven't noticed any, and it is more technical than most of the documentation gets. Perhaps a "note" block somewhere... > The fact that a select (maybe a big analytical query we'll run) > touching many rows will update the WAL and wait (apparently) for > that IO to complete is making a fully cached database far less > useful. Well, I've never run into this because I have directly attached storage through a RAID controller with a battery-backed cache configured for write-back. The pruning is pretty light on CPU usage, and with a BBU controller, the WAL writes just move from one cache to another. If that's not an option for you, you could contrive to have the update code reread the modified rows after COMMIT, or configure your autovacuum to be very aggressive so that a background process usually takes care of this before a SELECT gets to it. And there's a good chance that tuning your query and/or running with literal values available to the planner would be a big net win even without this issue; if this issue is a problem for you, it's just another reason to do that tuning. > Just dropped the table to test something so I can't get the plan > right now. Will send an update as soon as I get > it setup again. I'll be surprised if you don't see a seqscan. The most interesting bit at this point (at least to me) is working on tuning the cost factors for the planner. 
-Kevin
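Kevin's "reread the modified rows after COMMIT" workaround can be sketched as follows; the idea is that the writing session absorbs the prune-and-WAL cost instead of a later reader:

```sql
-- The writer pays the pruning cost itself by re-reading what it wrote.
BEGIN;
UPDATE test SET created_by = '001' WHERE tenant = '001';
COMMIT;
-- This SELECT triggers the HOT prune (and its WAL write) in the
-- updating session, so later readers find already-compacted pages.
SELECT count(*) FROM test WHERE tenant = '001';
```

With a battery-backed write cache this extra read is cheap; on slow WAL storage like the EBS volumes in this thread it simply moves the stall to the writer, which may still be preferable for read-latency-sensitive workloads.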
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 07/12/2011 12:08 PM, Kevin Grittner wrote: lars wrote: select count(*) from test where tenant = $1 and created_date = $2 Ah, that might be a major clue -- prepared statements. What sort of a plan do you get for that as a prepared statement? (Note, it is very likely *not* to be the same plan as you get if you run with literal values!) It is not at all unlikely that it could resort to a table scan if you have one tenant which is five or ten percent of the table, which would likely trigger the pruning as it passed over the modified pages. -Kevin So a read of a row *will* trigger dead tuple pruning, and that requires WAL logging, and this is known/expected? This is actually the only answer I am looking for. :) I have not seen this documented anywhere. I know that Postgres will generate general plans for prepared statements (how could it do otherwise?), I also know that it sometimes chooses a sequential scan. This can always be tweaked to touch fewer rows and/or use a different plan. That's not my objective, though! The fact that a select (maybe a big analytical query we'll run) touching many rows will update the WAL and wait (apparently) for that IO to complete is making a fully cached database far less useful. I just artificially created this scenario. ... Just dropped the table to test something so I can't get the plan right now. Will send an update as soon as I get it setup again. Thanks again. -- Lars -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
lars wrote:
> select count(*) from test where tenant = $1 and created_date = $2

Thinking about this some more, it would be interesting to know your PostgreSQL configuration. I seem to remember you mentioning some settings up-thread, but I'm not sure whether it was comprehensive. Could you paste the result of running the query on this Wiki page?:

http://wiki.postgresql.org/wiki/Server_Configuration

It might be that if you generate your queries with literals rather than using server-side prepared statements, and tweak a couple configuration settings, this problem will evaporate.

-Kevin
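[For readers following along: the query on that wiki page is essentially a dump of the server version plus every setting that differs from its default. The sketch below approximates it; check the page itself for the current canonical form.]

```sql
-- Roughly what http://wiki.postgresql.org/wiki/Server_Configuration asks for:
SELECT version();

SELECT name, current_setting(name), source
  FROM pg_settings
 WHERE source NOT IN ('default', 'override');
```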
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
lars wrote:
> select count(*) from test where tenant = $1 and created_date = $2

Ah, that might be a major clue -- prepared statements. What sort of a plan do you get for that as a prepared statement? (Note, it is very likely *not* to be the same plan as you get if you run with literal values!) It is not at all unlikely that it could resort to a table scan if you have one tenant which is five or ten percent of the table, which would likely trigger the pruning as it passed over the modified pages.

-Kevin
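[A sketch of how to compare the two plans Kevin is asking about. The parameter types are guesses -- the thread only shows the query text, so `int` for tenant and `date` for created_date are assumptions.]

```sql
-- Plan for the prepared (generic) form:
PREPARE q (int, date) AS
  SELECT count(*) FROM test WHERE tenant = $1 AND created_date = $2;
EXPLAIN EXECUTE q(1, '2011-07-12');

-- Compare against the plan chosen when the planner sees literal values:
EXPLAIN SELECT count(*) FROM test
 WHERE tenant = 1 AND created_date = '2011-07-12';
```

If the first shows a seqscan and the second an index scan, the prepared statement is the reason the SELECTs walk over (and prune) the freshly updated pages.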
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 07/12/2011 08:13 AM, Ivan Voras wrote:
> On 12/07/2011 02:09, lars wrote:
>> Oh, and iowait hovers around 20% when SELECTs are slow:
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>            1.54    0.00    0.98   18.49    0.07   78.92
>> When SELECTs are fast it looks like this:
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>            8.72    0.00    0.26    0.00    0.00   91.01
>> Note that this is a 12 core VM. So one core at 100% would show as
>> 8.33% CPU.
>
> Now only if you could do an "iostat -x" and show the output in both
> cases...

Sure (sorry for missing details):

iostat -x during selects when all's fine:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.25    0.00    0.00    0.00    0.00   91.75

Device:  rrqm/s   wrqm/s   r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvdap1     0.00     1.00  0.00    2.00     0.00    24.00    12.00     0.00    0.00   0.00   0.00
xvdf       0.00     0.00  0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdg       0.00     0.00  0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

xvdap1 is the OS volume. xvdf holds the database files. xvdg holds the WAL. No IO on database/WAL volumes; one core is pegged close to 100% CPU.

iostat -x during update:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.05    0.00    0.58    4.00    0.00   94.37

Device:  rrqm/s   wrqm/s   r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvdap1     0.00     0.00  0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdf       0.00     0.00  7.00    0.00   128.00     0.00    18.29     0.00    0.00   0.00   0.00
xvdg       0.00  7352.00  0.00  804.00     0.00 62368.00    77.57    66.07   68.83   0.86  69.20

Just updating the WAL.

--- and while it's checkpointing:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.64    0.00    0.32    8.88    0.00   90.16

Device:  rrqm/s   wrqm/s   r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvdap1     0.00     0.00  0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdf       0.00  2548.00  2.00 1658.00    32.00 33408.00    20.14   144.18   86.69   0.60 100.00
xvdg       0.00  5428.00  0.00  778.00     0.00 58480.00    75.17    77.44  100.22   1.21  94.00

Updating the WAL, and the database volume due to checkpointing.
iostat -x after I stopped the update process and checkpointing is done:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device:  rrqm/s   wrqm/s   r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvdap1     0.00     0.00  0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdf       0.00     0.00  0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdg       0.00     0.00  0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

No activity at all.

iostat -x after I started the select queries after the updates:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.09    0.00    1.49   12.15    0.00   84.26

Device:  rrqm/s   wrqm/s   r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvdap1     0.00     0.00  0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdf       0.00     8.00  0.00    2.00     0.00    80.00    40.00     0.00    2.00   2.00   0.40
xvdg       0.00  7844.00  1.00 1098.00     8.00 82336.00    74.93    58.27   59.39   0.70  77.20

Heavy writes to the WAL volume.

select * from pg_stat_activity;

-[ RECORD 1 ]----+-------------------------------------------------------------------
datid            | 16385
datname          | lars
procpid          | 2654
usesysid         | 16384
usename          | lars
application_name |
client_addr      | 127.0.0.1
client_hostname  |
client_port      | 44972
backend_start    | 2011-07-12 18:44:09.479581+00
xact_start       | 2011-07-12 18:50:32.629412+00
query_start      | 2011-07-12 18:50:32.629473+00
waiting          | f
current_query    | select count(*) from test where tenant = $1 and created_date = $2
-[ RECORD 2 ]----+-------------------------------------------------------------------
datid            | 16385
datname          | lars
procpid          | 2658
usesysid         | 10
usename          | postgres
application_name | psql
client_addr      |
client_hostname  |
client_port      | -1
backend_start    | 2011-07-12 18:49:02.675436+00
xact_start       | 2011-07-12 18:50:32.631013+00
query_start      | 2011-07-12 18:50:32.631013+00
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On Tue, Jul 12, 2011 at 9:36 AM, Kevin Grittner wrote:
> lars wrote:
>
>> I am not trying to optimize this particular use case, but rather
>> to understand what Postgres is doing, and why SELECT queries are
>> affected negatively (sometimes severely) by concurrent (or even
>> preceding) UPDATEs at all when the database resides in the cache
>> completely.
>
> I think we're stuck in terms of trying to guess what is causing the
> behavior you are seeing. Things which would help get it "unstuck":
>
> (1) More information about your load process. Looking at the code,
> I could sort of see a possible path to this behavior if the load
> process involves any adjustments beyond straight INSERTs or COPYs
> in.
>
> (2) You could poke around with a profiler, a debugger, and/or
> the contrib/pageinspect module to sort things out.

hm, also strace on the 'select' process might give some clues.

merlin
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 12/07/2011 02:09, lars wrote:
> Oh, and iowait hovers around 20% when SELECTs are slow:
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            1.54    0.00    0.98   18.49    0.07   78.92
> When SELECTs are fast it looks like this:
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            8.72    0.00    0.26    0.00    0.00   91.01
> Note that this is a 12 core VM. So one core at 100% would show as
> 8.33% CPU.

Now only if you could do an "iostat -x" and show the output in both cases...
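[The per-core arithmetic behind the "8.33% CPU" remark, written out as a query against lars's numbers above. iostat's CPU percentages average over all cores, so on 12 cores one fully busy core contributes 100/12 of the total, and the 18.49% iowait figure corresponds to a bit over two cores' worth of time spent waiting on I/O.]

```sql
-- Sanity-check the core math from the iostat samples:
SELECT round(100.0 / 12, 2)              AS one_core_pct,    -- ~8.33
       round(18.49 / (100.0 / 12), 2)    AS cores_in_iowait; -- ~2.22
```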
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 12/07/2011 16:18, Merlin Moncure wrote:
> On Tue, Jul 12, 2011 at 8:22 AM, Ivan Voras wrote:
>> On 08/07/2011 01:56, lars wrote:
>>> Setup: PostgreSQL 9.1beta2 on a high memory (~68gb, 12 cores) EC2
>>> Linux instance (kernel 2.6.35) with the database and WAL residing
>>> on the same EBS volume with EXT4 (data=ordered, barriers=1) - yes
>>> that is not an ideal setup (WAL should be on separate drive, EBS
>>> is slow to begin with, etc), but I am mostly interested in read
>>> performance for a fully cached database.
>>
>> I know you said you know these things - but do you really know the
>> (huge) extent to which all your IO is slowed? Even context switches
>> in a virtualized environment are slowed down by a huge margin -
>> which would make practically all in-memory lock operations very
>> slow - much slower than they would be on "real" hardware, and EBS
>> by definition is even slower than regular private virtual storage
>> environments. I regrettably didn't bookmark the page which did
>> exact measurements of EBS, but
>> http://www.google.com/search?q=how+slow+is+ec2 will illustrate my
>> point. (of course, you may already know all this :) ).
>
> sure, but the OP's question is valid: in postgres, readers don't
> block writers, so why is the reader waiting?

Yes, but I'm suggesting a different question: are we sure we are not seeing the influences of the environment (EC2+EBS) instead of the software system?

> We need some more information here.

Definitely.
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
lars wrote:
> I am not trying to optimize this particular use case, but rather
> to understand what Postgres is doing, and why SELECT queries are
> affected negatively (sometimes severely) by concurrent (or even
> preceding) UPDATEs at all when the database resides in the cache
> completely.

I think we're stuck in terms of trying to guess what is causing the behavior you are seeing. Things which would help get it "unstuck":

(1) More information about your load process. Looking at the code, I could sort of see a possible path to this behavior if the load process involves any adjustments beyond straight INSERTs or COPYs in.

(2) You could poke around with a profiler, a debugger, and/or the contrib/pageinspect module to sort things out.

(3) You could post a reproducible test case -- where you start with CREATE TABLE and populate with something like the generate_series() function and go through a clearly described set of steps to see the behavior. With that, someone else could do the profiling, debugging, and/or page inspection.

From what you've said, it seems like either you're omitting some detail as irrelevant which is actually significant, or you've found a bug we should hunt down and fix. I really don't know which it is.

-Kevin
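[For point (3), a skeleton of such a test case might look like the following. The column names come from the query quoted elsewhere in this thread; the types, sizes, and distributions are placeholders, not lars's actual setup.]

```sql
-- Hypothetical repro skeleton: populate with generate_series(),
-- update a slice, then time the SELECTs in a fresh session.
CREATE TABLE test (
    tenant       int,
    created_date date,
    filler       text
);

INSERT INTO test
SELECT i % 100,                         -- 100 hypothetical tenants
       date '2011-01-01' + (i % 365),
       repeat('x', 100)
  FROM generate_series(1, 1000000) AS s(i);

CREATE INDEX ON test (tenant, created_date);

UPDATE test SET filler = repeat('y', 100) WHERE tenant = 1;

-- Then, in a fresh session, time repeated runs of:
SELECT count(*) FROM test WHERE tenant = 1 AND created_date = '2011-02-01';
```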
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On Tue, Jul 12, 2011 at 8:22 AM, Ivan Voras wrote:
> On 08/07/2011 01:56, lars wrote:
>
>> Setup:
>> PostgreSQL 9.1beta2 on a high memory (~68gb, 12 cores) EC2 Linux
>> instance (kernel 2.6.35) with the database and
>> WAL residing on the same EBS volume with EXT4 (data=ordered, barriers=1)
>> - yes that is not an ideal setup
>> (WAL should be on separate drive, EBS is slow to begin with, etc), but I am
>> mostly interested in read performance for a fully cached database.
>
> I know you said you know these things - but do you really know the (huge)
> extent to which all your IO is slowed? Even context switches in a
> virtualized environment are slowed down by a huge margin - which would make
> practically all in-memory lock operations very slow - much slower than they
> would be on "real" hardware, and EBS by definition is even slower than
> regular private virtual storage environments. I regrettably didn't bookmark
> the page which did exact measurements of EBS, but
> http://www.google.com/search?q=how+slow+is+ec2 will illustrate my point. (of
> course, you may already know all this :) ).

sure, but the OP's question is valid: in postgres, readers don't block writers, so why is the reader waiting? I'd like to know definitively:

*) is the reader bottlenecked on disk i/o (it seems yes)
*) is that disk i/o heap or wal (it seems wal)
*) is that disk i/o reading/writing (it seems writing)
*) given the above, why is this happening (presumably disk page tidying)?

We need some more information here. I'd like to see the table information -- at least the average width of the record both pre/post update, whether it is or is not toasted, and the number and size of indexes pre/post update. I'm really suspicious of the virtualization tech as well -- is it possible to run this test on at least semi-decent native hardware?
merlin
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 08/07/2011 01:56, lars wrote:
> Setup:
> PostgreSQL 9.1beta2 on a high memory (~68gb, 12 cores) EC2 Linux
> instance (kernel 2.6.35) with the database and WAL residing on the
> same EBS volume with EXT4 (data=ordered, barriers=1) - yes that is
> not an ideal setup (WAL should be on separate drive, EBS is slow to
> begin with, etc), but I am mostly interested in read performance for
> a fully cached database.

I know you said you know these things - but do you really know the (huge) extent to which all your IO is slowed? Even context switches in a virtualized environment are slowed down by a huge margin - which would make practically all in-memory lock operations very slow - much slower than they would be on "real" hardware, and EBS by definition is even slower than regular private virtual storage environments. I regrettably didn't bookmark the page which did exact measurements of EBS, but http://www.google.com/search?q=how+slow+is+ec2 will illustrate my point. (Of course, you may already know all this :) ).
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 07/11/2011 08:26 AM, Robert Klemme wrote:
> On Mon, Jul 11, 2011 at 3:13 PM, k...@rice.edu wrote:
>> I do not know if this makes sense in PostgreSQL and that readers do
>> not block writers and writes do not block readers. Are your UPDATEs
>> to individual rows, each in a separate transaction, or do you
>> UPDATE multiple rows in the same transaction? If you perform
>> multiple updates in a single transaction, you are synchronizing the
>> changes to that set of rows and that constraint is causing other
>> readers that need to get the correct values post-transaction to
>> wait until the COMMIT completes. This means that the WAL write must
>> be completed.
>
> What readers should that be? Docs explicitly state that readers are
> never blocked by writers:
> http://www.postgresql.org/docs/9.0/interactive/mvcc-intro.html
> http://www.postgresql.org/docs/9.0/interactive/mvcc.html
> From what I understand about this issue the observed effect must be
> caused by the implementation and not by a conceptual issue with
> transactions.
>
>> Have you tried disabling synchronous_commit? If this scenario
>> holds, you should be able to reduce the slowdown by un-batching
>> your UPDATEs, as counter-intuitive as that is. This seems to be
>> similar to a problem that I have been looking at with using
>> PostgreSQL as the backend to a Bayesian engine. I am following this
>> thread with interest.
>
> I don't think this will help (see above). Also, I would be very
> cautious to do this because although the client might get a faster
> acknowledge the DB still has to do the same work as without
> synchronous_commit (i.e. WAL, checkpointing etc.) but it still has
> to do significantly more transactions than in the batched version.
>
> Typically there is an optimum batch size: if batch size is too small
> (say, one row) the ratio of TX overhead to "work" is too bad. If
> batch size is too large (say, millions of rows) you hit resource
> limitations (memory) which inevitably force the RDBMS to do
> additional disk IO.
>
> Kind regards
>
> robert

Thanks Ken and Robert,

What I am observing is definitely not readers blocked by writers by means of row-level locking. This seems to be some implementation detail in Postgres about how dirty pages (or dead versions of tuples) are flushed to the disk (see the other parts of this thread) when they are accessed by a SELECT query.

The batch size in this case is one SELECT statement accessing 10,000 rows via an aggregate (such as COUNT) and an UPDATE updating 10,000 rows in a single statement.

I am not trying to optimize this particular use case, but rather to understand what Postgres is doing, and why SELECT queries are affected negatively (sometimes severely) by concurrent (or even preceding) UPDATEs at all when the database resides in the cache completely.

-- Lars
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 07/11/2011 04:02 PM, lars wrote:
> On 07/11/2011 02:43 PM, Merlin Moncure wrote:
>> [...]
>> Since you are on a virtualized platform, I can't help but wonder if
>> you are running into some bottleneck that you wouldn't see on
>> native hardware. What's iowait during the slow period?
>>
>> merlin
>
> Thanks Kevin and Merlin, this is extremely helpful... Ok, that makes
> much more sense (WALing hint bits did not make sense).
>
> I disabled auto-vacuum and did four tests:
>
> 1. Run a bunch of updates, stop that process, wait until
>    checkpointing is finished, and run the selects (as before).
> 2. Run VACUUM manually, then run the SELECTs.
> 3. Have the UPDATEs and SELECTs touch mutually exclusive, random
>    sets of rows (still in sets of 10,000), so the SELECTs are
>    guaranteed not to select rows that were updated.
> 4. Lastly, change the UPDATEs to update a non-indexed column, to
>    rule out index maintenance. Still distinct sets of rows.
>
> [...]
>
> Lastly, is this documented somewhere? (I apologize if it is and I
> missed it.) If not I'd be happy to write a wiki entry for this.

Oh, and iowait hovers around 20% when SELECTs are slow:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.54    0.00    0.98   18.49    0.07   78.92

When SELECTs are fast it looks like this:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.72    0.00    0.26    0.00    0.00   91.01

Note that this is a 12 core VM.
So one core at 100% would show as 8.33% CPU.
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 07/11/2011 02:43 PM, Merlin Moncure wrote:
> On Mon, Jul 11, 2011 at 2:16 PM, Kevin Grittner wrote:
>> lars wrote:
>>> Stopping the UPDATEs, waiting for any CHECKPOINTs to finish, and
>>> then running the SELECTs indeed shows a similar slowdown.
>>> Interestingly I see very heavy WAL traffic while executing the
>>> SELECTs. (So I was confused as to what caused the WAL traffic).
>>
>> Hint bit changes aren't logged, so if it was that you would be
>> seeing writes to the heap, but not to the WAL. Clean-up of dead
>> tuples is logged -- this is probably the result of pruning dead
>> tuples. You could probably reduce the impact on your SELECT
>> statements at least a little by making autovacuum more aggressive.
>
> yeah. In fact, I'd like to disable autovacuum completely just to
> confirm this. In particular I'd like to know if that removes wal
> traffic when only selects are going on. Another way to check is to
> throw some queries to pg_stat_activity during your select period and
> see if any non-select activity (like an autovacuum vacuum) shows up.
>
> Basically I'm suspicious there is more to this story. hint bit
> flushing causing i/o during SELECT is a typical complaint (especially
> on systems with underperformant i/o), but I'm suspicious if that's
> really the problem here. Since you are on a virtualized platform, I
> can't help but wonder if you are running into some bottleneck that
> you wouldn't see on native hardware. What's iowait during the slow
> period?
>
> merlin

Thanks Kevin and Merlin, this is extremely helpful... Ok, that makes much more sense (WALing hint bits did not make sense).

I disabled auto-vacuum and did four tests:

1. Run a bunch of updates, stop that process, wait until checkpointing is finished, and run the selects (as before).
2. Run VACUUM manually, then run the SELECTs.
3. Have the UPDATEs and SELECTs touch mutually exclusive, random sets of rows (still in sets of 10,000), so the SELECTs are guaranteed not to select rows that were updated.
4. Lastly, change the UPDATEs to update a non-indexed column, to rule out index maintenance. Still distinct sets of rows.

In the first case I see the same slowdown (from ~14ms to ~400-500ms). pg_stat_activity shows no other load during that time. I also see write activity only on the WAL volume.

In the 2nd case, after VACUUM is finished the time is back to 14ms. As an aside: if I run the SELECTs while VACUUM is running, the slowdown is about the same as in the first case until (apparently) VACUUM has cleaned up most of the table, at which point the SELECTs become faster again (~50ms).

In the 3rd case I see exactly the same behavior, which is interesting. Both before VACUUM is run and after. There's no guarantee obviously that distinct rows do not share the same page, of course, especially since the index is updated as part of this (but see the 4th case).

In case 4 I still see the same issue. Again both before and after VACUUM.

In all cases I see from pg_stat_bgwriter that no backend writes buffers directly (but I think that only pertains to dirty buffers, and not the WAL).

So I think I have a partial answer to my initial question. However, that brings me to some other questions: Why do SELECTs cause dead tuples to be pruned (as Kevin suggests)? That even happens when the updates did not touch the selected rows(?) And why does that slow down the SELECTs? (Checkpointing activity on the EBS volume holding the database, for example, does not slow down SELECTs at all; only WAL activity does.) Does the selecting backend do that work itself?

Lastly, is this documented somewhere? (I apologize if it is and I missed it.) If not I'd be happy to write a wiki entry for this.
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On Mon, Jul 11, 2011 at 4:55 PM, Kevin Grittner wrote: > Merlin Moncure wrote: >> Kevin Grittner wrote: >>> lars wrote: >>> Stopping the UPDATEs, waiting for any CHECKPOINTs to finish, and then running the SELECTs indeed shows a similar slowdown. Interestingly I see very heavy WAL traffic while executing the SELECTs. (So I was confused as to what caused the WAL traffic). >>> >>> Hint bit changes aren't logged, so if it was that you would be >>> seeing writes to the heap, but not to the WAL. Clean-up of dead >>> tuples is logged -- this is probably the result of pruning dead >>> tuples. You could probably reduce the impact on your SELECT >>> statements at least a little by making autovacuum more >>> aggressive. >> >> yeah. In fact, I'd like to disable autovacuum completely just to >> confirm this. > > If I'm right, disabling autovacuum would tend to make this *worse*. > >> In particular I'd like to know if that removes wal traffic when >> only selects are going on. > > My guess: no. > >> Another way to check is to throw some queries to pg_stat_activity >> during your select period and see if any non-select activity (like >> autovacum vacuum). > > That's not a bad thing to check, but be careful what causality you > assume based on a correlation here -- blaming autovacuum might be a > bit like like blaming firefighters for fires, because you keep > seeing them at the same time. You might actually want them to > respond faster and more aggressively, rather than keeping them away. > > You do realize, that just reading a page with dead tuples can cause > dead tuple pruning, right? No autovacuum involved. Your SELECT > statement waits for things to be tidied up and the page is marked > dirty. I'm thinking that more aggressive autovacuum runs would > clean more of this up in background processes and let the SELECT > statement avoid some of this work -- speeding them up. 
Yeah, but that doesn't jive with the facts as I understand them -- 10k records are being written at a random place and 10k records are being read at a random place in some large table. I'm assuming (!) that most of the time the 'selects' are not hitting tuples that are recently updated unless the table is relatively small against the 10k window size. If the select is reading a bunch of un-recently-updated tuples, and this is not a sequential scan, and autovacuum is not running and causing sideband i/o, wal activity should be small.

What percentage of the table is updated at the end of the test? If it's high, or greater than a 1.0 write ratio, then disabling AV would be a huge negative. Or maybe the random select/update index is synchronized -- but then autovacuum wouldn't really be a player either way.

merlin
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
Merlin Moncure wrote:
> Kevin Grittner wrote:
>> lars wrote:
>>
>>> Stopping the UPDATEs, waiting for any CHECKPOINTs to finish,
>>> and then running the SELECTs indeed shows a similar slowdown.
>>>
>>> Interestingly I see very heavy WAL traffic while executing the
>>> SELECTs. (So I was confused as to what caused the WAL traffic).
>>
>> Hint bit changes aren't logged, so if it was that you would be
>> seeing writes to the heap, but not to the WAL. Clean-up of dead
>> tuples is logged -- this is probably the result of pruning dead
>> tuples. You could probably reduce the impact on your SELECT
>> statements at least a little by making autovacuum more
>> aggressive.
>
> yeah. In fact, I'd like to disable autovacuum completely just to
> confirm this.

If I'm right, disabling autovacuum would tend to make this *worse*.

> In particular I'd like to know if that removes wal traffic when
> only selects are going on.

My guess: no.

> Another way to check is to throw some queries to pg_stat_activity
> during your select period and see if any non-select activity (like
> autovacuum vacuum).

That's not a bad thing to check, but be careful what causality you assume based on a correlation here -- blaming autovacuum might be a bit like blaming firefighters for fires, because you keep seeing them at the same time. You might actually want them to respond faster and more aggressively, rather than keeping them away.

You do realize that just reading a page with dead tuples can cause dead tuple pruning, right? No autovacuum involved. Your SELECT statement waits for things to be tidied up and the page is marked dirty. I'm thinking that more aggressive autovacuum runs would clean more of this up in background processes and let the SELECT statements avoid some of this work -- speeding them up.

-Kevin
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On Mon, Jul 11, 2011 at 2:16 PM, Kevin Grittner wrote:
> lars wrote:
>
>> Stopping the UPDATEs, waiting for any CHECKPOINTs to finish,
>> and then running the SELECTs indeed shows a similar slowdown.
>>
>> Interestingly I see very heavy WAL traffic while executing the
>> SELECTs. (So I was confused as to what caused the WAL traffic).
>
> Hint bit changes aren't logged, so if it was that you would be
> seeing writes to the heap, but not to the WAL. Clean-up of dead
> tuples is logged -- this is probably the result of pruning dead
> tuples. You could probably reduce the impact on your SELECT
> statements at least a little by making autovacuum more aggressive.

yeah. In fact, I'd like to disable autovacuum completely just to confirm this. In particular I'd like to know if that removes wal traffic when only selects are going on. Another way to check is to throw some queries to pg_stat_activity during your select period and see if any non-select activity (like an autovacuum vacuum) shows up.

Basically I'm suspicious there is more to this story. hint bit flushing causing i/o during SELECT is a typical complaint (especially on systems with underperformant i/o), but I'm suspicious if that's really the problem here. Since you are on a virtualized platform, I can't help but wonder if you are running into some bottleneck that you wouldn't see on native hardware. What's iowait during the slow period?

merlin
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
lars wrote:
> Stopping the UPDATEs, waiting for any CHECKPOINTs to finish,
> and then running the SELECTs indeed shows a similar slowdown.
>
> Interestingly I see very heavy WAL traffic while executing the
> SELECTs. (So I was confused as to what caused the WAL traffic).

Hint bit changes aren't logged, so if it was that you would be seeing writes to the heap, but not to the WAL. Clean-up of dead tuples is logged -- this is probably the result of pruning dead tuples. You could probably reduce the impact on your SELECT statements at least a little by making autovacuum more aggressive. At some point you could see the overhead of autovacuum having some impact on your SELECT statements, so you may need to experiment to find the "sweet spot" for your load.

-Kevin
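["More aggressive" here usually means lowering the autovacuum thresholds and cost delay so cleanup happens in the background before SELECTs hit the dead tuples. The values below are illustrative starting points, not tuned for this workload.]

```
# postgresql.conf -- make autovacuum kick in sooner and work faster
autovacuum = on
autovacuum_naptime = 15s                 # check tables more often (default 1min)
autovacuum_vacuum_scale_factor = 0.05    # vacuum after ~5% of rows change (default 0.2)
autovacuum_vacuum_cost_delay = 2ms       # throttle the workers less (default 20ms)
```

As Kevin notes, push these too far and the vacuum workers themselves start competing with the SELECTs, hence the "sweet spot" experimentation.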
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 07/11/2011 10:33 AM, Kevin Grittner wrote:
> lars hofhansl wrote:
>> Yep, I am not seeing the SELECTs slow down (measurably) during
>> checkpoints (i.e. when dirty pages are flushed to disk), but only
>> during writing of the WAL files.
>
> How about if you do a whole slew of the UPDATEs and then stop those
> and run a bunch of SELECTs? (I don't think I've seen anything
> mentioned so far which rules out hint bit rewrites as an issue.)
> I see you have tweaked things to balance the writes -- you might
> want to try further adjustments to reduce backend writes and see
> what happens.
>
> -Kevin

Hmm... You are right. Stopping the UPDATEs, waiting for any CHECKPOINTs to finish, and then running the SELECTs indeed shows a similar slowdown. Interestingly I see very heavy WAL traffic while executing the SELECTs. (So I was confused as to what caused the WAL traffic.)

Why do changes to the hint bits need to be logged to the WAL? If we lose them we can always get that information back from the commit log. Maybe the backend does not know why the page is dirty and will write it to the WAL anyway(?) If that is the case there seems to be room to optimize that.

-- Lars
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
lars hofhansl wrote: > Yep, I am not seeing the SELECTs slow down (measurably) during > checkpoints (i.e. when dirty pages are flushed to disk), but only > during writing of the WAL files. How about if you do a whole slew of the UPDATEs and then stop those and run a bunch of SELECTs? (I don't think I've seen anything mentioned so far which rules out hint bit rewrites as an issue.) I see you have tweaked things to balance the writes -- you might want to try further adjustments to reduce backend writes and see what happens. -Kevin
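[Editorial sketch] The experiment Kevin proposes could be run along the following lines in psql. This is a hedged sketch only: the table and column names (test_table, id, val) are made up, not from the thread, and the row range should mirror whatever the update client touches.

```sql
-- Sketch of the "slew of UPDATEs, then SELECTs" experiment (names assumed)
UPDATE test_table SET val = val + 1 WHERE id BETWEEN 1 AND 10000;  -- dirty a slew of rows
CHECKPOINT;   -- flush dirty pages so checkpoint IO does not pollute the timings
\timing on
SELECT sum(val) FROM test_table WHERE id BETWEEN 1 AND 10000;  -- may rewrite hint bits / prune
SELECT sum(val) FROM test_table WHERE id BETWEEN 1 AND 10000;  -- repeat and compare timings
```

If the first SELECT after the UPDATE burst is markedly slower than the repeat, page cleanup work (hint bits, pruning) done by the reader is a plausible culprit.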
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On Mon, Jul 11, 2011 at 05:26:49PM +0200, Robert Klemme wrote: > On Mon, Jul 11, 2011 at 3:13 PM, k...@rice.edu wrote: > > I do not know if this makes sense in PostgreSQL, given that readers > > do not block writers and writers do not block readers. Are your > > UPDATEs to individual rows, each in a separate transaction, or > > do you UPDATE multiple rows in the same transaction? If you > > perform multiple updates in a single transaction, you are > > synchronizing the changes to that set of rows and that constraint > > is causing other readers that need to get the correct values > > post-transaction to wait until the COMMIT completes. This means that > > the WAL write must be completed. > > What readers should that be? Docs explicitly state that readers are > never blocked by writers: > http://www.postgresql.org/docs/9.0/interactive/mvcc-intro.html > http://www.postgresql.org/docs/9.0/interactive/mvcc.html > > From what I understand about this issue the observed effect must be > caused by the implementation and not by a conceptual issue with > transactions. > > > Have you tried disabling synchronous_commit? If this scenario > > holds, you should be able to reduce the slowdown by un-batching > > your UPDATEs, as counter-intuitive as that is. This seems to > > be similar to a problem that I have been looking at with using > > PostgreSQL as the backend to a Bayesian engine. I am following > > this thread with interest. > > I don't think this will help (see above). Also, I would be very > cautious to do this because although the client might get a faster > acknowledgement the DB still has to do the same work as without > synchronous_commit (i.e. WAL, checkpointing etc.) but it still has to > do significantly more transactions than in the batched version. > > Typically there is an optimum batch size: if batch size is too small > (say, one row) the ratio of TX overhead to "work" is too bad.
> If batch size is too large (say, millions of rows) you hit resource > limitations (memory) which inevitably force the RDBMS to do additional > disk IO. > > Kind regards > > robert > Okay, if we assume that the current implementation of MVCC is preventing readers from blocking writers and writers from blocking readers, then the application may have some statements that are implicitly locking the database and that is conflicting with the UPDATEs. Maybe the slowdown is caused by index updates from the write activity contending with the read-only queries. Just throwing out some ideas. Regards, Ken > -- > remember.guy do |as, often| as.you_can - without end > http://blog.rubybestpractices.com/
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On Mon, Jul 11, 2011 at 3:13 PM, k...@rice.edu wrote: > I do not know if this makes sense in PostgreSQL, given that readers > do not block writers and writers do not block readers. Are your > UPDATEs to individual rows, each in a separate transaction, or > do you UPDATE multiple rows in the same transaction? If you > perform multiple updates in a single transaction, you are > synchronizing the changes to that set of rows and that constraint > is causing other readers that need to get the correct values > post-transaction to wait until the COMMIT completes. This means that > the WAL write must be completed. What readers should that be? Docs explicitly state that readers are never blocked by writers: http://www.postgresql.org/docs/9.0/interactive/mvcc-intro.html http://www.postgresql.org/docs/9.0/interactive/mvcc.html From what I understand about this issue the observed effect must be caused by the implementation and not by a conceptual issue with transactions. > Have you tried disabling synchronous_commit? If this scenario > holds, you should be able to reduce the slowdown by un-batching > your UPDATEs, as counter-intuitive as that is. This seems to > be similar to a problem that I have been looking at with using > PostgreSQL as the backend to a Bayesian engine. I am following > this thread with interest. I don't think this will help (see above). Also, I would be very cautious to do this because although the client might get a faster acknowledgement the DB still has to do the same work as without synchronous_commit (i.e. WAL, checkpointing etc.) but it still has to do significantly more transactions than in the batched version. Typically there is an optimum batch size: if batch size is too small (say, one row) the ratio of TX overhead to "work" is too bad. If batch size is too large (say, millions of rows) you hit resource limitations (memory) which inevitably force the RDBMS to do additional disk IO.
Kind regards robert -- remember.guy do |as, often| as.you_can - without end http://blog.rubybestpractices.com/
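[Editorial sketch] Robert's batch-size trade-off can be illustrated with two SQL patterns; the table name `t` and column `v` are assumptions for illustration only.

```sql
-- One transaction per row: maximal per-transaction overhead
-- (each implicit commit may force a WAL flush).
UPDATE t SET v = v + 1 WHERE id = 1;
UPDATE t SET v = v + 1 WHERE id = 2;
-- ... one commit per statement

-- One transaction per batch: the commit overhead is amortized
-- over many rows, at the cost of more memory/lock footprint.
BEGIN;
UPDATE t SET v = v + 1 WHERE id BETWEEN 1 AND 1000;
COMMIT;
```

Somewhere between these extremes lies the optimum batch size Robert describes; too large a batch starts incurring its own resource costs.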
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
Hi Lars, I do not know if this makes sense in PostgreSQL, given that readers do not block writers and writers do not block readers. Are your UPDATEs to individual rows, each in a separate transaction, or do you UPDATE multiple rows in the same transaction? If you perform multiple updates in a single transaction, you are synchronizing the changes to that set of rows and that constraint is causing other readers that need to get the correct values post-transaction to wait until the COMMIT completes. This means that the WAL write must be completed. Have you tried disabling synchronous_commit? If this scenario holds, you should be able to reduce the slowdown by un-batching your UPDATEs, as counter-intuitive as that is. This seems to be similar to a problem that I have been looking at with using PostgreSQL as the backend to a Bayesian engine. I am following this thread with interest. Regards, Ken On Thu, Jul 07, 2011 at 04:56:13PM -0700, lars wrote: > I am doing some research that will hopefully lead to replacing a big > Oracle installation with a set of PostgreSQL servers. > > The current Oracle installation consists of multiple RAC > clusters with 8 RAC nodes each. Each RAC node has 256gb of > memory (to be doubled soon). > The nature of our service is such that over a reasonable time (a day > or so) the database *is* the working set. > > So I am looking at Postgres in a context where (almost) all of the > data is cached and disk IO is only required for persistence. > > Setup: > PostgreSQL 9.1beta2 on a high memory (~68gb, 12 cores) EC2 Linux > instance (kernel 2.6.35) with the database and > WAL residing on the same EBS volume with EXT4 (data=ordered, > barriers=1) - yes that is not an ideal setup > (WAL should be on separate drive, EBS is slow to begin with, etc), but I > am mostly interested in read performance for a fully cached > database. > > shared_buffers: varied between 1gb and 20gb > checkpoint_segments/timeout: varied accordingly between 16-256 and > 5-10m, resp.
> bgwriter tweaked to get a good distribution of checkpoints, > bg-writes, and backend writes. > wal_sync_method: tried fdatasync and open_datasync. > > I read "PostgreSQL 9.0 high performance", and have spent some > significant amount of time on this already. > > PostgreSQL holds up extremely well, once things like "storing > hint-bits", checkpoints vs bgwriter vs backend_writes, etc > are understood. I installed pg_buffercache and pgfincore to monitor > how and where the database is stored. > > There is one observation that I wasn't able to explain: > A SELECT only client is severely slowed down by a concurrent client > performing UPDATES on the same table the other > client selects from, even when the database resides 100% in the > cache (I tried with shared_buffers large enough to hold > the database, and also with a smaller setting relying on the OS > cache, the behavior is the same). > > As long as only the reader is running I get great performance > (20-30ms, query reading a random set of about 1 rows > out of 100m row table in a single SELECT). The backend is close to > 100% cpu, which is what I want in a cached database. > > Once the writer starts the read performance drops almost immediately > to >200ms. > The reading backend's cpu drops to <10%, and is mostly waiting > (D state in top). > The UPDATE touches a random set of also about 1 rows (in one > update statement, one of the columns touched is > indexed - and that is the same index used for the SELECTs). > > What I would have expected is that the SELECTs would just continue > to read from the cached buffers (whether dirtied > or not) and not be affected by concurrent updates. I could not find > anything explaining this. > > The most interesting part: > this does not happen with an exact clone of that relation but > UNLOGGED. The same amount of buffers get dirty, > the same amount checkpointing, bgwriting, vacuuming. The only > difference is WAL maintenance as far as I can tell.
> > Is there some (intentional or not) synchronization between backends > when the WAL is maintained? Are there times when a > read-only query needs to compete for disk IO when everything is cached? > Or are there any other explanations? > > I am happy to provide more information. Although I am mainly looking > for a qualitative answer, which could explain this behavior. > > Thanks. > > -- Lars
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
Thanks Craig. Yep, I am not seeing the SELECTs slow down (measurably) during checkpoints (i.e. when dirty pages are flushed to disk), but only during writing of the WAL files. The shared buffers and OS cache are big enough to hold the entire database, so evicting cached data should not be necessary. (The database including indexes can fit into 16 or so GB, and I have 68GB on that machine). Interestingly I initially thought there might be a correlation between checkpointing and slower SELECTs, but it turns out that checkpointing just slowed down IO to the WAL - until I moved it to its own drive, which then increased the effect I was seeing. I'll do more research and try to provide more useful details. Thanks for the pg_catalog link, I'll have a look at it. -- Lars - Original Message From: Craig Ringer To: pgsql-performance@postgresql.org Sent: Sun, July 10, 2011 4:11:39 PM Subject: Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database On 11/07/2011 4:34 AM, lars wrote: > I have since moved the WAL to its own EBS volume (ext4, data=writeback) > to make it easier to monitor IO. > The times where the SELECTs slow down coincide with heavy write traffic > to the WAL volume. In theory, UPDATEs shouldn't be blocking or slowing SELECTs. Whether that holds up to the light of reality, real-world hardware, and software implementation detail, I really don't know. I avoided responding to your first mail because I generally work with smaller and less performance critical databases so I haven't accumulated much experience with fine-tuning. If your SELECTs were slower *after* your UPDATEs I'd be wondering if your SELECTs are setting hint bits on the pages touched by the UPDATEs. See: http://wiki.postgresql.org/wiki/Hint_Bits . It doesn't sound like that's the case if the SELECTs are slowed down *during* a big UPDATE that hasn't yet committed, though. Could it just be cache pressure - either on shm, or operating system disk cache?
All the dirty buffers that have to be flushed to WAL and to the heap may be evicting cached data your SELECTs were benefitting from. Unfortunately, diagnostics in this area are ... limited ... though some of the pg_catalog views (http://www.postgresql.org/docs/9.0/static/catalogs.html) may offer some information. -- Craig Ringer POST Newspapers 276 Onslow Rd, Shenton Park Ph: 08 9381 3088 Fax: 08 9388 2258 ABN: 50 008 917 717 http://www.postnewspapers.com.au/
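[Editorial sketch] On Craig's point about limited diagnostics: since pg_buffercache is already installed (per the original post), a query along these lines can show which relations hold dirty buffers in shared memory. This is a hedged sketch; the column names come from the pg_buffercache view, and joining through relfilenode is one common approach.

```sql
-- Per-relation buffer and dirty-buffer counts in shared_buffers
CREATE EXTENSION pg_buffercache;  -- once per database (9.1+)

SELECT c.relname,
       count(*) AS buffers,
       sum(CASE WHEN b.isdirty THEN 1 ELSE 0 END) AS dirty
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;
```

Running this before and after an UPDATE burst should make visible how much of the test table (and its indexes) has been dirtied in cache.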
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On 11/07/2011 4:34 AM, lars wrote: I have since moved the WAL to its own EBS volume (ext4, data=writeback) to make it easier to monitor IO. The times where the SELECTs slow down coincide with heavy write traffic to the WAL volume. In theory, UPDATEs shouldn't be blocking or slowing SELECTs. Whether that holds up to the light of reality, real-world hardware, and software implementation detail, I really don't know. I avoided responding to your first mail because I generally work with smaller and less performance critical databases so I haven't accumulated much experience with fine-tuning. If your SELECTs were slower *after* your UPDATEs I'd be wondering if your SELECTs are setting hint bits on the pages touched by the UPDATEs. See: http://wiki.postgresql.org/wiki/Hint_Bits . It doesn't sound like that's the case if the SELECTs are slowed down *during* a big UPDATE that hasn't yet committed, though. Could it just be cache pressure - either on shm, or operating system disk cache? All the dirty buffers that have to be flushed to WAL and to the heap may be evicting cached data your SELECTs were benefitting from. Unfortunately, diagnostics in this area are ... limited ... though some of the pg_catalog views (http://www.postgresql.org/docs/9.0/static/catalogs.html) may offer some information. -- Craig Ringer POST Newspapers 276 Onslow Rd, Shenton Park Ph: 08 9381 3088 Fax: 08 9388 2258 ABN: 50 008 917 717 http://www.postnewspapers.com.au/
Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
I have since moved the WAL to its own EBS volume (ext4, data=writeback) to make it easier to monitor IO. The times where the SELECTs slow down coincide with heavy write traffic to the WAL volume. Maybe this has to do with WALInsertLock or WALWriteLock (or some other lock). Since the slowdown was less severe with WAL on its own volume it seems some exclusive lock on the pages in shared_buffers is held while WAL IO is in progress(?) - that would be "frustrating". (wal_buffers defaults to 16mb in my setup) Next I am going to have a look at the code. I would be thankful for any further insights, though :) Thanks. -- Lars On 07/07/2011 04:56 PM, lars wrote: I am doing some research that will hopefully lead to replacing a big Oracle installation with a set of PostgreSQL servers. The current Oracle installation consists of multiple RAC clusters with 8 RAC nodes each. Each RAC node has 256gb of memory (to be doubled soon). The nature of our service is such that over a reasonable time (a day or so) the database *is* the working set. So I am looking at Postgres in a context where (almost) all of the data is cached and disk IO is only required for persistence. Setup: PostgreSQL 9.1beta2 on a high memory (~68gb, 12 cores) EC2 Linux instance (kernel 2.6.35) with the database and WAL residing on the same EBS volume with EXT4 (data=ordered, barriers=1) - yes that is not an ideal setup (WAL should be on separate drive, EBS is slow to begin with, etc), but I am mostly interested in read performance for a fully cached database. shared_buffers: varied between 1gb and 20gb checkpoint_segments/timeout: varied accordingly between 16-256 and 5-10m, resp. bgwriter tweaked to get a good distribution of checkpoints, bg-writes, and backend writes. wal_sync_method: tried fdatasync and open_datasync. I read "PostgreSQL 9.0 high performance", and have spent some significant amount of time on this already.
PostgreSQL holds up extremely well, once things like "storing hint-bits", checkpoints vs bgwriter vs backend_writes, etc are understood. I installed pg_buffercache and pgfincore to monitor how and where the database is stored. There is one observation that I wasn't able to explain: A SELECT only client is severely slowed down by a concurrent client performing UPDATES on the same table the other client selects from, even when the database resides 100% in the cache (I tried with shared_buffers large enough to hold the database, and also with a smaller setting relying on the OS cache, the behavior is the same). As long as only the reader is running I get great performance (20-30ms, query reading a random set of about 1 rows out of 100m row table in a single SELECT). The backend is close to 100% cpu, which is what I want in a cached database. Once the writer starts the read performance drops almost immediately to >200ms. The reading backend's cpu drops to <10%, and is mostly waiting (D state in top). The UPDATE touches a random set of also about 1 rows (in one update statement, one of the columns touched is indexed - and that is the same index used for the SELECTs). What I would have expected is that the SELECTs would just continue to read from the cached buffers (whether dirtied or not) and not be affected by concurrent updates. I could not find anything explaining this. The most interesting part: this does not happen with an exact clone of that relation but UNLOGGED. The same amount of buffers get dirty, the same amount checkpointing, bgwriting, vacuuming. The only difference is WAL maintenance as far as I can tell. Is there some (intentional or not) synchronization between backends when the WAL is maintained? Are there times when a read-only query needs to compete for disk IO when everything is cached? Or are there any other explanations? I am happy to provide more information. Although I am mainly looking for a qualitative answer, which could explain this behavior.
Thanks. -- Lars
[PERFORM] UPDATEDs slowing SELECTs in a fully cached database
I am doing some research that will hopefully lead to replacing a big Oracle installation with a set of PostgreSQL servers. The current Oracle installation consists of multiple RAC clusters with 8 RAC nodes each. Each RAC node has 256gb of memory (to be doubled soon). The nature of our service is such that over a reasonable time (a day or so) the database *is* the working set. So I am looking at Postgres in a context where (almost) all of the data is cached and disk IO is only required for persistence. Setup: PostgreSQL 9.1beta2 on a high memory (~68gb, 12 cores) EC2 Linux instance (kernel 2.6.35) with the database and WAL residing on the same EBS volume with EXT4 (data=ordered, barriers=1) - yes that is not an ideal setup (WAL should be on separate drive, EBS is slow to begin with, etc), but I am mostly interested in read performance for a fully cached database. shared_buffers: varied between 1gb and 20gb checkpoint_segments/timeout: varied accordingly between 16-256 and 5-10m, resp. bgwriter tweaked to get a good distribution of checkpoints, bg-writes, and backend writes. wal_sync_method: tried fdatasync and open_datasync. I read "PostgreSQL 9.0 high performance", and have spent some significant amount of time on this already. PostgreSQL holds up extremely well, once things like "storing hint-bits", checkpoints vs bgwriter vs backend_writes, etc are understood. I installed pg_buffercache and pgfincore to monitor how and where the database is stored. There is one observation that I wasn't able to explain: A SELECT only client is severely slowed down by a concurrent client performing UPDATES on the same table the other client selects from, even when the database resides 100% in the cache (I tried with shared_buffers large enough to hold the database, and also with a smaller setting relying on the OS cache, the behavior is the same).
As long as only the reader is running I get great performance (20-30ms, query reading a random set of about 1 rows out of 100m row table in a single SELECT). The backend is close to 100% cpu, which is what I want in a cached database. Once the writer starts the read performance drops almost immediately to >200ms. The reading backend's cpu drops to <10%, and is mostly waiting (D state in top). The UPDATE touches a random set of also about 1 rows (in one update statement, one of the columns touched is indexed - and that is the same index used for the SELECTs). What I would have expected is that the SELECTs would just continue to read from the cached buffers (whether dirtied or not) and not be affected by concurrent updates. I could not find anything explaining this. The most interesting part: this does not happen with an exact clone of that relation but UNLOGGED. The same amount of buffers get dirty, the same amount checkpointing, bgwriting, vacuuming. The only difference is WAL maintenance as far as I can tell. Is there some (intentional or not) synchronization between backends when the WAL is maintained? Are there times when a read-only query needs to compete for disk IO when everything is cached? Or are there any other explanations? I am happy to provide more information. Although I am mainly looking for a qualitative answer, which could explain this behavior. Thanks. -- Lars
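[Editorial sketch] One plausible postgresql.conf rendering of the setup described above; the values are taken from the post where stated, and where the post gives a range this picks one point in it. (On 9.1, checkpoint_segments predates the later max_wal_size setting.)

```ini
# One point in the ranges described in the post (PostgreSQL 9.1beta2)
shared_buffers = 20GB          # varied between 1GB and 20GB
checkpoint_segments = 256      # varied between 16 and 256
checkpoint_timeout = 10min     # varied between 5 and 10 minutes
wal_sync_method = fdatasync    # open_datasync was also tried
```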