Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
Haribabu Kommi <haribabu.ko...@huawei.com> writes:
> To handle the above case, instead of directly resetting the dead tuples to zero, how about removing the exact number of dead tuples from the table stats? With this approach vacuum gets triggered frequently, thus reducing the bloat.

This does not seem like a very good idea as-is, because it will mean that n_dead_tuples can diverge arbitrarily far from reality over time, as a result of accumulation of errors. It also doesn't seem like a very good idea that VACUUM sets n_live_tuples while only adjusting n_dead_tuples incrementally; ideally those counters should move in the same fashion. In short, I think this patch will create at least as many problems as it fixes.

What would make more sense to me is for VACUUM to estimate the number of remaining dead tuples somehow and send that in its message. However, since the whole point here is that we aren't accounting for transactions that commit while VACUUM runs, it's not very clear how to do that.

Another way to look at it is that we want to keep any increments to n_dead_tuples that occur after VACUUM takes its snapshot. Maybe we could have VACUUM copy the n_dead_tuples value as it exists when VACUUM starts, and then send that as the value to subtract when it's done?

			regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
This paragraph reads a bit strange to me:

+ A statistics session is the time period when statistics are gathered by the statistics collector
+ without being reset. So a statistics session continues across normal shutdowns,
+ but whenever statistics are reset, like during a crash or upgrade, a new time period
+ of statistics collection commences, i.e. a new statistics session.
+ The query_id value generation is linked to the statistics session to emphasize the fact
+ that whenever statistics are reset, the query_id for the same queries will also change.

"time period when"? Shouldn't that be "time period during which"? Also, doesn't a new statistics session start when a stats reset is invoked by the user? The bit after "commences" appears correct (to me, not a native speaker by any means) but seems also a bit strange.

I have tried to rephrase this. Hopefully it is less confusing:

A statistics session refers to the time period during which statement statistics are gathered by the statistics collector. A statistics session persists across normal shutdowns. Whenever statistics are reset, such as during a crash or upgrade, a new statistics session starts. The query_id value generation is linked to the statistics session to emphasize that whenever statistics are reset, the query_id for the same queries will also change.

regards
Sameer

-- 
View this message in context: http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5774365.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Re: [HACKERS] Patch for fast gin cache performance improvement
> I think it is desirable that this patch should be resubmitted for the next CommitFest for further review and the testing mentioned above. So I'd like to mark this patch as Returned with Feedback. Is it OK?

Sounds like a good idea. Thanks for the review!

Ian Link

Etsuro Fujita  Thursday, October 10, 2013 1:01 AM

Ian Link wrote:

Although I asked this question, I've reconsidered these parameters, and it seems that they not only make the code rather complex but are a little confusing to users. So I'd like to propose introducing only one parameter: fast_cache_size. While users that give weight to update performance for the fast update technique would set this parameter to a large value, users that give weight not only to update performance but also to search performance would set it to a small value. What do you think about this?

I think it makes sense to maintain this separation. If the user doesn't need a per-index setting, they don't have to use the parameter. Since the parameter is off by default, they don't even need to worry about it. There might as well be one parameter for users that don't need fine-grained control. We can document this and I don't think it will be confusing to the user.

OK, though I'd like to hear the opinion of others.

4. In my understanding, a small value of gin_fast_limit/fast_cache_size leads to an increase in GIN search performance, which, however, leads to a decrease in GIN update performance. Am I right? If so, I think the tradeoff should be noted in the documentation.

I believe this is correct.

5. The following documents in Chapter 57, GIN Indexes, need to be updated:
 * 57.3.1. GIN Fast Update Technique
 * 57.4. GIN Tips and Tricks

Sure, I can add something.

6. I would like to see the results for the additional test cases (tsvectors).

I don't really have any good test cases for this available, and have very limited time for postgres at the moment. Feel free to create a test case, but I don't believe I can at the moment. Sorry!
Unfortunately, I don't have much time to do such a thing, though I think we should do that. (In addition, we should do another performance test to make clear the influence on fast update performance of introducing these parameters, which would be required to determine their default values.)

7. The commented-out elog() code should be removed.

Sorry about that, I shouldn't have submitted the patch with those still there. I should have a new patch soonish, hopefully. Thanks for your feedback!

I think it is desirable that this patch should be resubmitted for the next CommitFest for further review and the testing mentioned above. So I'd like to mark this patch as Returned with Feedback. Is it OK?

Thanks,

Best regards,
Etsuro Fujita

Ian Link  Monday, September 30, 2013 3:09 PM

Hi Etsuro,

Sorry for the delay but I have been very busy with work. I have been away from postgres for a while, so I will need a little time to review the code and make sure I give you an informed response. I'll get back to you as soon as I am able. Thanks for understanding.

Ian Link

Etsuro Fujita  Friday, September 27, 2013 2:24 AM

I wrote:
> I had a look over this patch. I think this patch is interesting and very useful. Here are my review points:

8. I think there are no issues in this patch. However, I have one question: how does this patch work in the case where gin_fast_limit/fast_cache_size = 0? In this case, in my understanding, this patch inserts new entries into the pending list temporarily and immediately moves them to the main GIN data structure using ginInsertCleanup(). Am I right? If so, that is obviously inefficient.

Sorry, there are incorrect expressions above. I mean gin_fast_limit > 0 and fast_cache_size = 0.

Although I asked this question, I've reconsidered these parameters, and it seems that they not only make the code rather complex but are a little confusing to users. So I'd like to propose introducing only one parameter: fast_cache_size.
While users that give weight to update performance for the fast update technique would set this parameter to a large value, users that give weight not only to update performance but also to search performance would set it to a small value. What do you think about this?

Thanks,

Best regards,
Etsuro Fujita

Etsuro Fujita  Thursday, September 26, 2013 6:02 AM

Hi Ian,

> This patch contains a performance improvement for the fast gin cache. As you may know, the performance of the fast gin cache decreases with its size. Currently, the size of the fast gin cache is tied to work_mem. The size of work_mem can often be quite high. The large size of work_mem is inappropriate for the fast gin cache size. Therefore, we created a separate cache size called gin_fast_limit. This global variable controls the size of the fast gin cache, independently of work_mem. Currently, the default
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On Oct 12, 2013 2:13 AM, MauMau <maumau...@gmail.com> wrote:
> From: Bruce Momjian <br...@momjian.us>
>> On Thu, Oct 10, 2013 at 11:01:52PM +0900, MauMau wrote:
>>> Although this is not directly related to memory, could you set max_prepared_transactions = max_connections at initdb time? People must feel frustrated when they can't run applications on a Java or .NET application server and notice that they have to set max_prepared_transactions and restart PostgreSQL. This is far from friendly.
>>
>> I think the problem is that many users don't need prepared transactions and therefore don't want the overhead.
>
> Is that still accurate? I'm not sure if many use XA features, but I saw the questions and answers a few times, IIRC. In the trouble situation, PostgreSQL outputs an intuitive message like "increase max_prepared_transactions", so many users might possibly have been able to change the setting and solve the problem themselves without asking for help, while feeling stress like "Why do I have to set this?" For example, max_prepared_transactions is called a "hideous creature" on the following page:
>
> https://community.jboss.org/wiki/InstallPostgreSQLOnFedora?_sscc=t

Anybody who follows that page is screwed anyway. I notice they recommend running regular VACUUM FULL across the whole database, so it's obvious they know nothing about postgresql. There's nothing we can do about what people write on random pages around the Internet.

> According to the page below, the amount of memory consumed for this is (770 + 270 * max_locks_per_transaction) * max_prepared_transactions. With the default setting of max_connections=100 and max_locks_per_transaction=64, this is only 180KB. So the overhead is negligible.

You are assuming memory is the only overhead. I don't think it is.

> If the goal is to make PostgreSQL more friendly and run smoothly without frustration from the start, and not perfect tuning, I think max_prepared_transactions = max_connections is an easy and good item.
> If the goal is limited to auto-tuning memory sizes, this improvement can be treated separately.

Frankly, I think we'd help 1000 times more users if we enabled a few WAL senders by default and bumped the WAL level, mainly so they could run one-off base backups. That's used by orders of magnitude more users than XA.

/Magnus
Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem
On Oct 11, 2013 10:23 PM, Josh Berkus <j...@agliodbs.com> wrote:
> On 10/11/2013 01:11 PM, Bruce Momjian wrote:
>> In summary, I think we need to:
>>  * decide on new defaults for work_mem and maintenance_work_mem
>>  * add an initdb flag to allow users/packagers to set shared_buffers?
>>  * add an autovacuum_work_mem setting?
>>  * change the default for temp_buffers?
>
> If we're changing defaults, bgwriter_lru_maxpages and vacuum_cost_limit could also use a bump; those thresholds were set for servers with 1GB of RAM.

Uh, those are there to limit I/O, not memory, right? More memory isn't the reason to increase them; more I/O is.

For people deploying on modern server hardware, then yes, it's often low. But for all those deploying in virtualized environments with I/O performance reminding you of the 1990s, I'm not so sure it is...

/Magnus
Re: [HACKERS] GIN improvements part 1: additional information
On Sat, Oct 12, 2013 at 1:55 AM, Tomas Vondra <t...@fuzzy.cz> wrote:
> On 10.10.2013 13:57, Heikki Linnakangas wrote:
>> On 09.10.2013 02:04, Tomas Vondra wrote:
>>> On 8.10.2013 21:59, Heikki Linnakangas wrote:
>>>> On 08.10.2013 17:47, Alexander Korotkov wrote:
>>>>> Hi, Tomas! On Sun, Oct 6, 2013 at 3:58 AM, Tomas Vondra <t...@fuzzy.cz> wrote:
>>>>>> I've attempted to rerun the benchmark tests I did a few weeks ago, but I got repeated crashes when loading the data (into a table with a tsvector+GIN index). Right before a crash, there's this message in the log: PANIC: not enough space in leaf page!
>>>>> Thanks for testing. Heikki's version of the patch doesn't work for me either, even on much simpler examples. I can try to get it working if he answers my question about the GinDataLeafPageGetPostingList* macros.
>>>> The new macros in that patch version were quite botched. Here's a new attempt.
>>> Nope, still the same errors :-(
>>>
>>>   PANIC: not enough space in leaf page!
>>>   LOG: server process (PID 29722) was terminated by signal 6: Aborted
>>>   DETAIL: Failed process was running: autovacuum: ANALYZE public.messages
>>
>> I've continued hacking away at the patch, here's yet another version. I've done a lot of cleanup and refactoring to make the code more readable (I hope). I'm not sure what caused the panic you saw, but it's probably fixed now. Let me know if not.
>
> Yup, this version fixed the issues. I haven't been able to do any benchmarks yet; all I have is some basic stats:
>
>                 |  HEAD   | patched
>  ===============|=========|=========
>  load duration  | 1084 s  | 1086 s
>  subject index  |   96 MB |   96 MB
>  body index     | 2349 MB | 2051 MB
>
> So there's virtually no difference in speed (which is expected, AFAIK) and the large index on full message bodies is significantly smaller.

Yes, there should be no significant difference in speed. But the difference in index sizes seems too small. Could you share the database dump somewhere?

-- 
With best regards,
Alexander Korotkov.
Re: [HACKERS] Compression of full-page-writes
On Fri, Oct 11, 2013 at 10:36 PM, Andres Freund <and...@2ndquadrant.com> wrote:
> On 2013-10-11 09:22:50 +0530, Amit Kapila wrote:
>> I think it will be difficult to prove, for any compression algorithm, that it compresses in most scenarios. In many cases it can happen that the WAL is not reduced and tps also comes down if the data is non-compressible, because any compression algorithm has to try to compress the data, burning some CPU for it, which in turn reduces tps.
>
> Then those concepts maybe aren't such a good idea after all. Storing lots of compressible data in an uncompressed fashion isn't an all that common usecase. I most certainly don't want postgres to optimize for blank-padded data, especially if it can hurt other scenarios. Just not enough benefit.
>
> That said, I actually have relatively high hopes for compressing full page writes. There often enough is a lot of repetitiveness between rows on the same page that it should be useful outside of such strange scenarios. But maybe pglz is just not a good fit for this; it really isn't a very good algorithm in this day and age.

Do you think that if WAL reduction or performance with another compression algorithm (for example, snappy) is better, then the chances of getting the new compression algorithm into PostgreSQL are higher?

Wouldn't it be okay if we had a GUC to enable it and a pluggable API for calling the compression method? With this we could even include other compression algorithms if they prove to be good, and reduce the dependency of this patch on the inclusion of new compression methods in PostgreSQL.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] GIN improvements part 1: additional information
On 12.10.2013 12:11, Alexander Korotkov wrote:
> On Sat, Oct 12, 2013 at 1:55 AM, Tomas Vondra <t...@fuzzy.cz> wrote:
>> Yup, this version fixed the issues. I haven't been able to do any benchmarks yet; all I have is some basic stats:
>>
>>                 |  HEAD   | patched
>>  ===============|=========|=========
>>  load duration  | 1084 s  | 1086 s
>>  subject index  |   96 MB |   96 MB
>>  body index     | 2349 MB | 2051 MB
>>
>> So there's virtually no difference in speed (which is expected, AFAIK) and the large index on full message bodies is significantly smaller.
>
> Yes, there should be no significant difference in speed. But the difference in index sizes seems too small. Could you share the database dump somewhere?

Turns out that if I do VACUUM FULL after loading the data (a sequence of INSERT commands), the index sizes drop significantly:

                |  HEAD  | patched
 ===============|========|=========
 subject index  |  42 MB |  15 MB
 body index     | 624 MB | 328 MB

So there's a significant improvement, as expected. I'm wondering if the bloat is expected too? Is that a consequence of incremental index updates vs. rebuilding the whole index at once during VACUUM FULL?

Tomas
Re: [HACKERS] background workers, round three
I briefly checked these patches. Let me add some comments.

* terminate-worker-v1.patch

TerminateBackgroundWorker() turns on the slot->terminate flag under an LW_SHARED lock. Is that reasonable because the only possible caller is the background worker process itself?

* ephemeral-precious-v1.patch

AtEOXact_BackgroundWorker() is located around the other AtEOXact_* routines. Doesn't that make resource management complicated? In the case where the main process goes into its error handler but the worker process is still running healthily, it may continue to calculate something and put its results in the shared memory segment, even though the main process has asked the postmaster to kill it. All the ResourceOwnerRelease() callbacks run prior to AtEOXact_BackgroundWorker(), so it is hard to release resources still in use by a background worker: the workers keep running until they receive the termination signal, which is sent later. In addition, it makes the implementation complicated if we need to design background workers to release resources if and when they are terminated. I don't think it is good coding style to release resources in different locations depending on context.

So, I'd like to propose adding a new invocation point for ResourceOwnerRelease() after all the AtEOXact_* jobs, with a new label, something like RESOURCE_RELEASE_FINAL.

In addition, AtEOXact_BackgroundWorker() does not synchronize on the termination of the background worker processes being killed. Of course it depends on the situation, but I think it is a good idea to wait for the worker processes to terminate, to ensure that the resources to be released are handed back to the main process if the ResourceOwnerRelease() above does the job.

Thanks,

2013/10/11 Robert Haas <robertmh...@gmail.com>:
> On Fri, Oct 11, 2013 at 9:27 AM, Michael Paquier <michael.paqu...@gmail.com> wrote:
>> Finally I got the chance to put my hands on this code. Really sorry for the late reply.
>
> Thanks for the review.
I'll respond to this in more detail later, but to make a long story short, I'm looking to apply terminate-worker-v1.patch (possibly with modifications after going over your review comments), but I'm not feeling too good any more about ephemeral-precious-v1.patch, because my experience with those facilities has so far proved unsatisfying. I think I'd like to withdraw the latter patch pending further study.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
KaiGai Kohei <kai...@kaigai.gr.jp>
[HACKERS] removing old ports and architectures
Hi,

As discussed in 20130926225545.gb26...@awork2.anarazel.de and on quite a few other occasions, there are quite some scalability improvements we could make if we had cross-platform support for atomic operations. Providing that is a fair bit of work for every architecture/compiler/OS, so I think it's a reasonable thing to remove unused things first. Alternatively we could maintain parallel code paths for platforms with and without atomic ops, but that seems like a bad idea from a complexity and testing perspective, given that all even remotely current platforms should support them.

I think we should remove support for the following ports:
- IRIX
- UnixWare
- Tru64

None of those are relevant.

I think we should remove support for the following architectures:
- VAX
- univel (s_lock support remaining)
- sinix (s_lock support remaining)
- sun3 (I think it's just s_lock support remaining)
- natsemi 32k
- SuperH
- Alpha (big pain in the ass to get right, nobody uses it anymore)
- m68k (doesn't have a usable CAS on later iterations like ColdFire)
- M32R (no userspace CAS afaics)
- mips for anything but gcc >= 4.4, using gcc's atomics support
- s390 for anything but gcc >= 4.4, using gcc's atomics support
- 32bit/v9 sparc (doesn't have proper atomics, old)

Possibly:
- all mips
- PA-RISC. I think Tom was the remaining user there? Maybe just !gcc.

Any arguments against?

Last round of discussion of removing dead ports: 1335292179.13481.4.ca...@vanquo.pezone.net
Discusses the state of spinlocks and barriers on various platforms: 20130920151110.ga8...@awork2.anarazel.de

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption
On 11.10.2013 13:42, Huchev wrote:
>     gettimeofday(&start, NULL);
>     for (i = 0; i < VALUES; i++)
>     {
>         state = XXH32_init(result);
>         XXH32_update(state, &i, 4);
>         XXH32_digest(state);
>     }
>     gettimeofday(&end, NULL);
>
> This code is using the update variant, which is only useful when dealing with a very large amount of data that can't fit into a single block of memory. This is obviously overkill for a 4-bytes-only test: 3 function calls, a malloc, intermediate data bookkeeping, etc. To hash a single block of data, it's better to use the simpler (and faster) variant XXH32():
>
>     gettimeofday(&start, NULL);
>     for (i = 0; i < VALUES; i++)
>     {
>         XXH32(&i, 4, result);
>     }
>     gettimeofday(&end, NULL);
>
> You'll probably get better results by an order of magnitude. For better results still, you could even inline it (yes, for such a short loop with almost no work to do, it makes a very noticeable difference).

Not really. Even with this change it's slightly slower than crc32, at least with the 32-bit integers. With 64-bit integers it's about 2x as fast. But even then it's like ~1% of the total runtime, so any improvements here don't really change anything.

The inlining is not a good idea IMHO, because that'd be very different from the actual usage (there won't be such a tight loop). OTOH I'm not sure the compiler doesn't already inline it as an optimization.

> That being said, it's true that these advanced hash algorithms only shine with a big enough amount of data to hash. Hashing a 4-byte value into a 4-byte hash is a bit of a limited exercise. There is no pigeonhole issue. A simple multiplication by a 32-bit prime would fare well enough and result in zero collisions.

Agreed. I'll revisit this if/when I need to support larger data types in this aggregate.

Tomas
Re: [HACKERS] removing old ports and architectures
On Sat, Oct 12, 2013 at 5:46 PM, Andres Freund <and...@2ndquadrant.com> wrote:
> I think we should remove support for the following ports:
> - IRIX
> - UnixWare
> - Tru64
>
> None of those are relevant.

Seems reasonable.

> I think we should remove support for the following architectures:
> - VAX

Agreed.

> - univel (s_lock support remaining)
> - sinix (s_lock support remaining)
> - sun3 (I think it's just s_lock support remaining)
> - natsemi 32k

I don't know enough about these, which doesn't bode well for them.

> - SuperH

SuperH isn't dead, but it is only used for very small embedded systems, I think (mostly microcontrollers). So maybe.

> - Alpha (big pain in the ass to get right, nobody uses it anymore)

Yes, for many years now Alpha has only been useful as a way of illustrating how bad it's possible for CPU memory-operation reordering considerations to get. So I quite agree.

> - m68k (doesn't have a usable CAS on later iterations like ColdFire)

It does seem like Motorola 68k is vanishingly close to dead.

> - M32R (no userspace CAS afaics)
> - mips for anything but gcc >= 4.4, using gcc's atomics support
> - s390 for anything but gcc >= 4.4, using gcc's atomics support
> - 32bit/v9 sparc (doesn't have proper atomics, old)

Not so sure about these.

> Possibly:
> - all mips
> - PA-RISC. I think Tom was the remaining user there? Maybe just !gcc.

I think we should think hard about removing support for MIPS. A lot of Chinese chip manufacturers have licensed MIPS technology in just the last couple of years, so there is plenty of it out there; I'd be slightly concerned that the proposed restrictions on MIPS would be onerous. Much of this is the kind of hardware that a person might plausibly want to run Postgres on.

-- 
Peter Geoghegan