Re: [CORE] [HACKERS] back-branch multixact fixes 9.5 alpha/beta: schedule
On Mon, 8 Jun 2015 13:53:42 -0300 Alvaro Herrera alvhe...@2ndquadrant.com wrote:

> * people with the wrong oldestMulti setting in pg_control (which would be due to a buggy pg_upgrade being used long ago) will be unable to start if they upgrade to 9.3.7 or 9.3.8. A solution for them would be to downgrade to 9.3.6. We had reports of this problem starting just a couple of days after we released 9.4.2, I think.

Does this mean that people with wrong oldestMulti settings in pg_control, due to a buggy pg_upgrade being used long ago, can fix this by updating to 9.3.9 when it is released? Asking for a friend...

-dg

--
David Gould 510 282 0869 da...@sonic.net
If simplicity worked, the world would be overrun with insects.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [CORE] Restore-reliability mode
On Mon, 8 Jun 2015 13:03:56 -0300 Claudio Freire klaussfre...@gmail.com wrote:

>> Ohmygosh, you have to rpm install a bunch of -devel stuff? What a massive hardship.
>
> It's not about the 5 minutes of compile time, it's about the signalling. Just *when* is git ready for testing? You don't know from the outside. I do lurk here a lot and still am unsure quite often. Even simply releasing an alpha *tarball* would be useful enough. What is needed is the signal to test, rather than a fully-built package.

This. The clients I referred to earlier don't even use the rpm packages, they build from sources. They need to know when it is worthwhile to take a new set of sources and test. Some sort of labeling about what the contents are would enable them to do this. I don't think a monthly snapshot would work as well, since the requirement is knowing that grouping sets are in, not that it is July now.

-dg

--
David Gould da...@sonic.net
If simplicity worked, the world would be overrun with insects.
Re: [HACKERS] [CORE] Restore-reliability mode
I think Alphas are valuable and useful and even more so if they have release notes. For example, some of my clients are capable of fetching sources and building from scratch and filing bug reports and are often interested in particular new features. They even have staging infrastructure that could test new postgres releases with real applications. But they don't do it. They also don't follow -hackers, they don't track git, and they don't have any easy way to tell if the new feature they are interested in is actually complete and ready to test at any particular time.

A lot of features are developed in multiple commits over a period of time and they see no point in testing until at least most of the feature is complete and expected to work. But it is not obvious from outside when that happens for any given feature. For my clients the value of Alpha releases would mainly be the release notes, or some other mark in the sand that says "As of Alpha-3 feature X is included and expected to mostly work."

-dg

--
David Gould da...@sonic.net
If simplicity worked, the world would be overrun with insects.
Re: [HACKERS] Vitesse DB call for testing
On Fri, 17 Oct 2014 13:12:27 -0400 Tom Lane t...@sss.pgh.pa.us wrote:

> CK Tan ck...@vitessedata.com writes:
>> The bigint sum,avg,count case in the example you tried has some optimization. We use int128 to accumulate the bigint instead of numeric in pg. Hence the big speed up. Try the same query on int4 for the improvement where both pg and vitessedb are using int4 in the execution.
>
> Well, that's pretty much cheating: it's too hard to disentangle what's coming from JIT vs what's coming from using a different accumulator datatype. If we wanted to depend on having int128 available we could get that speedup with a couple hours' work.
>
> But what exactly are you compiling here? I trust not the actual data accesses; that seems far too complicated to try to inline.
>
> regards, tom lane

I don't have any inside knowledge, but from the presentation given at the recent SFPUG followed by a bit of google-fu I think these papers are relevant:

http://www.vldb.org/pvldb/vol4/p539-neumann.pdf
http://sites.computer.org/debull/A14mar/p3.pdf

-dg

--
David Gould 510 282 0869 da...@sonic.net
If simplicity worked, the world would be overrun with insects.
Re: [HACKERS] Spin Lock sleep resolution
On Tue, 18 Jun 2013 10:09:55 +0300 Heikki Linnakangas hlinnakan...@vmware.com wrote:

> On 02.04.2013 22:58, David Gould wrote:
>> I'll give the patch a try, I have a workload that is impacted by spinlocks fairly heavily sometimes and this might help or at least give me more information. Thanks!
>
> Did you ever get around to test this?
>
> I repeated these pgbench tests I did earlier:
>
> http://www.postgresql.org/message-id/5190e17b.9060...@vmware.com
>
> I concluded in that thread that on this platform, the TAS_SPIN macro really needs a non-locked test before the locked one. That fixes the big fall in performance with more than 28 clients.
> ...
> I wasn't expecting much of a gain from this, just wanted to verify that it's not making things worse. So looks good to me.

Thanks for the followup, and I really like your graph, it looks exactly like what we were hitting. My client ended up configuring around it and adding more hosts so the urgency to run more tests sort of declined, although I think we still hit it from time to time.

If you would like to point me at or send me the latest flavor of the patch it may be timely for me to test again. Especially if this is a more or less finished version, we are about to roll out a new build to all these hosts and I'd be happy to try to incorporate this patch and get some production experience with it on 80 core hosts.

-dg

--
David Gould 510 282 0869 da...@sonic.net
If simplicity worked, the world would be overrun with insects.
Re: [HACKERS] Spin Lock sleep resolution
On Tue, 18 Jun 2013 11:41:06 +0300 Heikki Linnakangas hlinnakan...@vmware.com wrote:

> Oh, interesting. What kind of hardware are you running on? To be honest, I'm not sure what my test hardware is, it's managed by another team across the world, but /proc/cpuinfo says:
>
> model name: Intel(R) Xeon(R) CPU E5-4640 0 @ 2.40GHz

It claims to have 80 of these:

model name : Intel(R) Xeon(R) CPU E7-L8867 @2.13GHz

Postgres is on ramfs on these with unlogged tables.

> And it's running in a virtual machine on VMware; that might also be a factor.

I'm not a fan of virtualization. It makes performance even harder to reason about.

> It would be good to test the TAS_SPIN nonlocked patch on a variety of systems. The comments in s_lock.h say that on Opteron, the non-locked test is a huge loss. In particular, would be good to re-test that on a modern AMD system.

I'll see what I can do. However, I don't have access to any large modern AMD systems.

-dg

--
David Gould 510 282 0869 da...@sonic.net
If simplicity worked, the world would be overrun with insects.
Re: [HACKERS] Spin Lock sleep resolution
On Tue, 2 Apr 2013 09:01:36 -0700 Jeff Janes jeff.ja...@gmail.com wrote:

> Sorry. I triple checked that the patch was there, but it seems like if you save a draft with an attachment, when you come back later to finish and send it, the attachment may not be there anymore. The Gmail Offline team still has a ways to go. Hopefully it is actually there this time.

I'll give the patch a try, I have a workload that is impacted by spinlocks fairly heavily sometimes and this might help or at least give me more information.

Thanks!

-dg

--
David Gould da...@sonic.net
If simplicity worked, the world would be overrun with insects.
Re: [HACKERS] Re: bulk_multi_insert infinite loops with large rows and small fill factors
On Fri, 14 Dec 2012 15:39:44 -0500 Robert Haas robertmh...@gmail.com wrote:

> On Wed, Dec 12, 2012 at 8:29 AM, David Gould da...@sonic.net wrote:
>> We lose noticeable performance when we raise fill-factor above 10. Even 20 is slower.
>
> Whoa.

Any interest in a fill-factor patch to place exactly one row per page? That would be the least contended. There are applications where it might help.

>> During busy times these hosts sometimes fall into a stable state with very high cpu use mostly in s_lock() and LWLockAcquire() and I think PinBuffer plus very high system cpu in the scheduler (I don't have the perf trace in front of me so take this with a grain of salt). In this mode they fall from the normal 7000 queries per second to below 3000.
>
> I have seen signs of something similar to this when running pgbench -S tests at high concurrency. I've never been able to track down where

I think I may have seen that with pgbench -S too. I did not have time to investigate more, but out of a sequence of three minute runs I got most runs at 300k+ qps but a couple were around 200k qps.

> the problem is happening. My belief is that once a spinlock starts to be contended, there's some kind of death spiral that can't be arrested until the workload eases up. But I haven't had much luck identifying exactly which spinlock is the problem or if it even is just one...

I agree about the death spiral. I think what happens is all the backends get synchronized by waiting and they are more likely to contend again.

-dg

--
David Gould 510 282 0869 da...@sonic.net
If simplicity worked, the world would be overrun with insects.
[HACKERS] bulk_multi_insert infinite loops with large rows and small fill factors
COPY IN loops in heap_multi_insert(), extending the table until it fills the disk, when trying to insert a wide row into a table with a low fill-factor. Internally fill-factor is implemented by reserving some space on a page. For large enough rows and small enough fill-factor heap_multi_insert() can't fit the row even on a new empty page, so it keeps allocating new pages but is never able to place the row. It should always put at least one row on an empty page.

In the excerpt below saveFreeSpace is the reserved space for the fill-factor.

    while (ndone < ntuples)
    {
        ...
        /*
         * Find buffer where at least the next tuple will fit.  If the page is
         * all-visible, this will also pin the requisite visibility map page.
         */
        buffer = RelationGetBufferForTuple(relation, heaptuples[ndone]->t_len,
        ...
        /* Put as many tuples as fit on this page */
        for (nthispage = 0; ndone + nthispage < ntuples; nthispage++)
        {
            HeapTuple   heaptup = heaptuples[ndone + nthispage];

            if (PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)
                break;

            RelationPutHeapTuple(relation, buffer, heaptup);
        }
        ...Do a bunch of dirtying and logging etc ...
    }

This was introduced in 9.2 as part of the bulk insert speedup.

One more point: in the case where we don't insert any rows, we still do all the dirtying and logging work even though we did not modify the page. I have tried to skip all this if no rows are added (nthispage == 0), but my access method foo is sadly out of date, so someone should take a skeptical look at that.

A test case and patch against 9.2.2 are attached. The patch fixes the problem and passes make check. Most of the diff is just indentation changes. Whoever tries this will want to test it on a small partition by itself.

-dg

--
David Gould 510 282 0869 da...@sonic.net
If simplicity worked, the world would be overrun with insects.
\set ECHO all
\timing on
drop table if exists big_sparse;
CREATE TABLE big_sparse (
    u_id integer NOT NULL,
    c_id integer DEFAULT 0 NOT NULL,
    a_id integer DEFAULT 0 NOT NULL,
    cr_id integer DEFAULT 0 NOT NULL,
    b integer,
    c_t_u text,
    l_date abstime DEFAULT now(),
    c_b double precision DEFAULT 0.0,
    d_b double precision,
    c_s text,
    c_s_u abstime,
    c_e_u abstime,
    co_b_r_ts abstime,
    e_a_s integer,
    c_c_i_l text,
    c_c_e_l text,
    ag_c_i_l text,
    ag_c_e_l text,
    c_c_t_l text,
    c_r_t_l text,
    c_m_t_l text,
    c_ci_t_l text,
    ag_c_t_l text,
    ag_r_t_l text,
    ag_m_t_l text,
    ag_ci_t_l text,
    c_l_t_l text,
    ag_l_t_l text,
    r_s_id smallint DEFAULT 1 NOT NULL,
    r_s text NOT NULL,
    i_u_l text,
    e_u_l text,
    f_l text,
    f smallint,
    e_s_l_e text,
    r_h_f_n text,
    cr_h smallint,
    cr_w smallint,
    n_b integer,
    cr_a_l text,
    e_id bigint NOT NULL,
    a_ex text,
    ca_b_a real[],
    su smallint DEFAULT 0 NOT NULL,
    h_b_f real[],
    p_id integer,
    ca_b_aj real[],
    ca_b_aj_i integer[]
) WITH (fillfactor=10);
COPY big_sparse FROM stdin;
1688 700032834 700580073 704810483 25 foobar.grotxbarb/xdexyzzyen.xyzzyv=xxixsrc=201002RAA9 2011-11-12 05:00:01+00 0 500 i 2010-10-31 04:00:00+00 \N 2011-11-12 05:00:01+00 113 \N \N \N \N US \N \N \N \N \N \N \N en \N 5 Rxyzzyn \N foobar./|foobar.xyzzyatxyzzy.xyzzyk/|foobar.grotxyzzyroupbarb/|grotxyzzxyzzybarb|foobar.grotxyxyzzybarb/|foobar.xyzzyxyzzyily.xyzzy/|grotxyzzyancebarb|grotxyzzydog|foobar.grotxyzzyxyzzysbarbdog/|foobar.grotxyzzyaobarb/|grotxyzzxyzzywsbarb|foobar.xyzzyun.xyzzy/|foobar.grotogejbarb/|foobar.grotxyzzxyzzybarb/|foobar.grotogcdbarbdog/|grotxyzzyetbarb|foobar.grotxyzzyipobarb/|foobar.grotxyzzyaobarb/|grotxyzzyxyzzyiabarb|grotxyzzybarbdog xyzzyign 2 1:1291|0|0 ABC DEF 36|0|0 OR 1290|0|0 ABC DEF 36|0|0 OR 13|0|0 ABC DEF 36|0|0 OR 84|2592000|0 ABC DEF 36|0|0 OR 83|2592000|0 ABC DEF 36|0|0 OR 82|2592000|0 ABC DEF 36|0|0 OR 12|0|0 ABC DEF 36|0|0 \N 0 0 25 \N 0 \N \N 0 {1,1,1,1,1,0.20003,0.40006,0.60024,0.80012,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1} \N \N \N
\.
*** postgresql-9.2.2/src/backend/access/heap/heapam.c 2012-12-03 12:16:10.0 -0800
--- postgresql-9.2.2dg/src/backend/access/heap/heapam.c 2012-12-12 01:55:58.174653706 -0800
***************
*** 2158,2163 ****
--- 2158,2164 ----
      Buffer      buffer;
      Buffer      vmbuffer = InvalidBuffer;
      bool        all_visible_cleared = false;
+     bool        page_is_empty;
      int         nthispage;

      /*
***************
*** 2173,2299 ****
      START_CRIT_SECTION();

      /* Put as many tuples as fit on this page */
      for (nthispage = 0; ndone + nthispage < ntuples; nthispage++)
      {
          HeapTuple   heaptup = heaptuples[ndone + nthispage];

!         if (PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)
              break;

!         RelationPutHeapTuple(relation, buffer, heaptup
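The failure described in this report can be modelled outside the server. The following is a minimal Python sketch, not PostgreSQL code: it assumes the reserved space is BLCKSZ * (100 - fillfactor) / 100, that an empty 8 kB page loses 28 bytes to the page header and one line pointer, and it ignores MAXALIGN rounding. The names reserved_space, pages_allocated and allow_on_empty_page are hypothetical and exist only for this illustration.

```python
BLCKSZ = 8192          # default PostgreSQL block size
PAGE_OVERHEAD = 28     # assumed: page header plus one line pointer


def reserved_space(fillfactor):
    """Bytes kept free on each page for the given fillfactor (percent)."""
    return BLCKSZ * (100 - fillfactor) // 100


def pages_allocated(tuple_len, fillfactor, allow_on_empty_page, max_pages=5):
    """Count pages allocated before one tuple is placed.

    Returns None if the tuple is still unplaced after max_pages pages,
    which stands in for the endless table extension in the report.
    """
    save_free_space = reserved_space(fillfactor)
    for pages in range(1, max_pages + 1):
        free = BLCKSZ - PAGE_OVERHEAD      # every new page starts out empty
        too_big = free < tuple_len + save_free_space
        if too_big and not allow_on_empty_page:
            continue                       # give up on this page, extend again
        return pages
    return None
```

Under these assumptions a 1000 byte row with fillfactor=10 is never placed when the free-space check is applied to an empty page, and lands on the first page once an empty page is exempted from the check.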
Re: [HACKERS] Re: bulk_multi_insert infinite loops with large rows and small fill factors
On Wed, 12 Dec 2012 12:27:11 +0100 Andres Freund and...@2ndquadrant.com wrote:

> On 2012-12-12 03:04:19 -0800, David Gould wrote:
>> COPY IN loops in heap_multi_insert() extending the table until it fills the
>
> Heh. Nice one. Did you hit that in practice?

Yeah, with a bunch of hosts that run postgres on a ramdisk, and that copy happens late in the initial setup script for new hosts. The first batch of new hosts to be set up with 9.2 filled the ramdisk, oomed and fell over within a minute. Since the script sets up a lot of stuff we had no idea at first who oomed.

> ISTM this would be fixed with a smaller footprint by just making
>
>     if (PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)
>
> into
>
>     if (!PageIsEmpty(page) &&
>         PageGetHeapFreeSpace(page) < MAXALIGN(heaptup->t_len) + saveFreeSpace)
>
> I think that should work?

I like PageIsEmpty() better (and would have used it if I had known about it), but I'm not so crazy about the negation.

-dg

--
David Gould 510 282 0869 da...@sonic.net
If simplicity worked, the world would be overrun with insects.
Re: [HACKERS] Re: bulk_multi_insert infinite loops with large rows and small fill factors
On Wed, 12 Dec 2012 13:56:08 +0200 Heikki Linnakangas hlinnakan...@vmware.com wrote:

> However, RelationGetBufferForTuple() won't return such a page, it guarantees that the first tuple does indeed fit on the page it returns. For the same reason, the later check that at least one tuple was actually placed on the page is not necessary.
>
> I committed a slightly different version, which unconditionally puts the first tuple on the page, and only applies the freespace check to the subsequent tuples. Since RelationGetBufferForTuple() guarantees that the first tuple fits, we can trust that, like heap_insert does.
>
> --- a/src/backend/access/heap/heapam.c
> +++ b/src/backend/access/heap/heapam.c
> @@ -2172,8 +2172,12 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
>      /* NO EREPORT(ERROR) from here till changes are logged */
>      START_CRIT_SECTION();
>
> -    /* Put as many tuples as fit on this page */
> -    for (nthispage = 0; ndone + nthispage < ntuples; nthispage++)
> +    /*
> +     * () has ensured that the first tuple fits.
> +     * Put that on the page, and then as many other tuples as fit.
> +     */
> +    RelationPutHeapTuple(relation, buffer, heaptuples[ndone]);
> +    for (nthispage = 1; ndone + nthispage < ntuples; nthispage++)
>      {
>          HeapTuple   heaptup = heaptuples[ndone + nthispage];

I don't know if this is the same thing. At least in the comments I was reading while trying to figure this out, there was some concern that someone else could change the space on the page. Does RelationGetBufferForTuple() guarantee against this too?

-dg

--
David Gould 510 282 0869 da...@sonic.net
If simplicity worked, the world would be overrun with insects.
Re: [HACKERS] Re: bulk_multi_insert infinite loops with large rows and small fill factors
On Wed, 12 Dec 2012 14:23:12 +0200 Heikki Linnakangas hlinnakan...@vmware.com wrote:

> The bug's been fixed now, but note that huge tuples like this will always cause the table to be extended. Even if there are completely empty pages in the table, after a vacuum. Even a completely empty existing page is not considered spacious enough in this case, because it's still too small when you take fillfactor into account, so the insertion will always extend the table. If you regularly run into this situation, you might want to raise your fillfactor..

Actually, we'd like it lower. Ideally, one row per page. We lose noticeable performance when we raise fill-factor above 10. Even 20 is slower.

During busy times these hosts sometimes fall into a stable state with very high cpu use mostly in s_lock() and LWLockAcquire() and I think PinBuffer plus very high system cpu in the scheduler (I don't have the perf trace in front of me so take this with a grain of salt). In this mode they fall from the normal 7000 queries per second to below 3000. Once in this state they tend to stay that way. If we turn down the number of incoming requests they go back to normal.

Our conjecture is that most requests are for only a few keys and so we have multiple sessions contending for few pages and convoying in the buffer manager. The table is under 20k rows, but the hot items are probably only a couple hundred different rows. The busy processes are doing reads only, but there is some update activity on this table too.

Ah, found an email with the significant part of the perf output:

    ... set number of client threads = number of postgres backends = 70. That way all my threads have constant access to a backend and they just spin in a tight loop running the same query over and over (with different values). ... this seems to have tapped into 9.2's resonant frequency, right now we're spending almost all our time spin locking.
    ...
    762377.00 71.0% s_lock         /usr/local/bin/postgres
     22279.00  2.1% LWLockAcquire  /usr/local/bin/postgres
     18916.00  1.8% LWLockRelease  /usr/local/bin/postgres

I was trying to resurrect the pthread s_lock() patch to see if that helps, but it did not apply at all and I have not had time to pursue it.

We have tried many different numbers of processes and get the best result with about ten fewer active postgresql backends than HT cores. System is 128GB with:

    processor  : 79
    vendor_id  : GenuineIntel
    cpu family : 6
    model      : 47
    model name : Intel(R) Xeon(R) CPU E7-L8867 @ 2.13GHz
    stepping   : 2
    cpu MHz    : 2128.478
    cache size : 30720 KB

-dg

--
David Gould 510 282 0869 da...@sonic.net
If simplicity worked, the world would be overrun with insects.
[HACKERS] Strange errors from 9.2.1 and 9.2.2 (I hope I'm missing something obvious)
I'm sure I've had a stroke or something in the middle of the night and just didn't notice, but I'm able to reproduce the following on three different hosts on both 9.2.1 and 9.2.2. As far as I know the only difference between these queries is whitespace since I just up-arrowed them in psql and deleted a space or lf. And as far as I can tell none of these errors are correct.

Complete transcript, freshly started 9.2.2.

    dg@jekyl:~$ psql
    psql (9.2.2)
    Type "help" for help.

    dg=# CREATE TABLE t (
    i INTEGER,
    PRIMARY KEY (i)
    );
    ERROR: type "key" does not exist
    LINE 3: PRIMARY KEY (i)
    ^
    dg=# CREATE TABLE t (
    i INTEGER,
    PRIMARY KEY (i)
    );
    ERROR: syntax error at or near "PRIMARY"
    LINE 3: PRIMARY KEY (i)
    ^
    dg=# CREATE TABLE t (
    i INTEGER, PRIMARY KEY (i)
    );
    ERROR: column "i" named in key does not exist
    LINE 2: i INTEGER, PRIMARY KEY (i)
    ^

Someone please set me straight, and tell me I've had a brain injury because I am not comfortable with computers just fucking with me which is the other explanation.

-dg

--
David Gould 510 282 0869 da...@sonic.net
If simplicity worked, the world would be overrun with insects.
Re: [HACKERS] Strange errors from 9.2.1 and 9.2.2 (I hope I'm missing something obvious)
On Tue, 11 Dec 2012 18:58:58 -0700 Josh Kupershmidt schmi...@gmail.com wrote:

> On Tue, Dec 11, 2012 at 6:01 PM, David Gould da...@sonic.net wrote:
>> I'm sure I've had a stroke or something in the middle of the night and just didn't notice, but I'm able to reproduce the following on three different hosts on both 9.2.1 and 9.2.2. As far as I know the only difference between these queries is whitespace since I just up-arrowed them in psql and deleted a space or lf. And as far as I can tell none of these errors are correct.
>>
>> Complete transcript, freshly started 9.2.2.
>>
>> dg@jekyl:~$ psql
>> psql (9.2.2)
>> Type "help" for help.
>>
>> dg=# CREATE TABLE t ( i INTEGER, PRIMARY KEY (i) );
>> ERROR: type "key" does not exist
>> LINE 3: PRIMARY KEY (i)
>
> Hrm, although I didn't see such characters in your above text, perhaps you have some odd Unicode characters in your input. For example, the attached superficially similar input file will generate the same error message for me. (The odd character in my input is U+2060, 'Word Joiner', encoded 0xE2 0x81 0xA0.)

Thank you. I got the example via cut and paste from email and pasted it into psql on different hosts. od tells me it ends each line with: \n followed by 0xC2 0xA0 and then normal spaces. The C2A0 thing is apparently NO-BREAK SPACE. Invisible, silent, odorless but still deadly.

Which will teach me not to accept text files from the sort of people who write code in Word I guess.

-dg

--
David Gould 510 282 0869 da...@sonic.net
If simplicity worked, the world would be overrun with insects.
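A check like the one done here with od can also be scripted. The following is a small Python sketch, offered as one possible way to flag the two characters named in this thread (U+00A0 NO-BREAK SPACE and U+2060 WORD JOINER) along with other non-ASCII space and format characters. The function name find_suspicious_chars and the choice of Unicode categories are assumptions made for this illustration, not part of psql or PostgreSQL.

```python
import unicodedata

# Unicode general categories treated as suspicious when non-ASCII:
# Zs/Zl/Zp are space and separator characters, Cf is invisible formatting.
SUSPICIOUS_CATEGORIES = ("Zs", "Zl", "Zp", "Cf")


def find_suspicious_chars(text):
    """Return (line, column, codepoint, name) for every non-ASCII
    whitespace or invisible format character in text."""
    hits = []
    # Split on "\n" only, so unusual separators stay inside a line.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) < 128:
                continue
            if unicodedata.category(ch) in SUSPICIOUS_CATEGORIES:
                hits.append((lineno, col, "U+%04X" % ord(ch),
                             unicodedata.name(ch, "UNKNOWN")))
    return hits
```

Running it over a statement pasted from email before feeding it to psql would report the line and column of each invisible character, which is the information the error messages above do not give.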
Re: [HACKERS] huge tlb support
On Tue, 21 Aug 2012 18:06:38 +0200 Andres Freund and...@2ndquadrant.com wrote:

> On Tuesday, August 21, 2012 05:56:58 PM Robert Haas wrote:
>> On Tue, Aug 21, 2012 at 11:31 AM, Andres Freund and...@2ndquadrant.com wrote:
>>> On Tuesday, August 21, 2012 05:30:28 PM Robert Haas wrote:
>>>> On Thu, Aug 16, 2012 at 10:53 PM, David Gould da...@sonic.net wrote:
>>>>> A warning, on RHEL 6.1 (2.6.32-131.4.1.el6.x86_64 #1 SMP) we have had horrible problems caused by transparent_hugepages running postgres on largish systems (128GB to 512GB memory, 32 cores). The system sometimes goes 99% system time and is very slow and unresponsive to the point of not successfully completing new tcp connections. Turning off transparent_hugepages fixes it.
>>>>
>>>> Yikes! Any idea WHY that happens?
>>>
>>> Afair there were several bugs that could cause that in earlier version of the hugepage feature. The prominent was something around never really stopping to search for mergeable pages even though the probability was small or such.

This is what I think was going on. We did see a lot (99%) of time in some routine in the VM (I forget exactly which), and my interpretation was that it was trying to create hugepages from scattered fragments.

>>>> I'm inclined to think this torpedos any idea we might have of enabling hugepages automatically whenever possible. I think we should just add a GUC for this and call it good. If the state of the world improves sufficiently in the future, we can adjust, but I think for right now we should just do this in the simplest way possible and move on.
>>>
>>> He is talking about transparent hugepages not hugepages afaics.
>>
>> Hmm. I guess you're right. But why would it be different?
>
> Because in this case explicit hugepage usage reduces the pain instead of increasing it. And we cannot do much against transparent hugepages being enabled by default. Unless I misremember how things work the problem is/was independent of anonymous mmap or sysv shmem.

Explicit hugepages work because the pages can be created early before all of memory is fragmented and you either succeed or fail. Transparent hugepages uses a daemon that looks for processes that might benefit from hugepages and tries to create hugepages on the fly. On a system that has been up for some time memory may be so fragmented that this is just a waste of time.

Real as opposed to transparent hugepages would be a huge win for applications that try to use high connection counts. Each backend attached to the postgresql shared memory uses its own set of page table entries at the rate of 2KB per MB of mapped shared memory. At 8GB of shared buffers and 1000 connections this uses 16GB just for page tables.

-dg

--
David Gould 510 282 0869 da...@sonic.net
If simplicity worked, the world would be overrun with insects.
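The page table figure quoted in this message can be checked with a few lines of arithmetic. The Python sketch below assumes 4 kB pages and 8 byte page table entries, which is where 2KB per MB comes from (256 entries per MB), and counts only the last-level entries. The function name page_table_bytes is hypothetical, and the 2 MB huge page case is included purely for comparison.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

PAGE_SIZE = 4 * KB   # assumed: standard x86-64 small page
PTE_SIZE = 8         # assumed: bytes per page table entry on x86-64


def page_table_bytes(shared_bytes, backends, page_size=PAGE_SIZE):
    """Approximate last-level page table memory when every backend maps
    the whole shared segment with its own page table entries."""
    entries_per_backend = shared_bytes // page_size
    return entries_per_backend * PTE_SIZE * backends


small_pages = page_table_bytes(8 * GB, 1000)
huge_pages = page_table_bytes(8 * GB, 1000, page_size=2 * MB)
print("4 kB pages: %.1f GiB of page tables" % (small_pages / GB))
print("2 MB pages: %.1f MiB of page tables" % (huge_pages / MB))
```

With these assumptions each backend needs 16 MB of entries for 8GB of shared buffers, so 1000 backends need roughly 16GB in total, while the same mapping with 2 MB pages needs a few tens of megabytes.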
Re: [HACKERS] huge tlb support
On Mon, 9 Jul 2012 12:30:23 +0200 Andres Freund and...@2ndquadrant.com wrote:

> On Monday, July 09, 2012 08:11:00 AM Tom Lane wrote:
>> y...@mwd.biglobe.ne.jp (YAMAMOTO Takashi) writes:
>>>> Also, I was under the impression that recent Linux kernels use hugepages automatically if they can, so I wonder exactly what Andres was testing on ...
>>>
>>> if you mean the transparent hugepage feature, iirc it doesn't affect MAP_SHARED mappings like this.
>>
>> Oh! That would explain some things. It seems like a pretty nasty restriction though ... do you know why they did that?
>
> Looking a bit deeper they explicitly only work on private memory. The reason apparently being that it's too hard to update the page table entries in multiple processes at once without introducing locking problems/scalability issues.
>
> To be sure one can check /proc/$pid_of_pg_process/smaps and look for the mapping to /dev/zero or the biggest mapping ;). It's not counted as Anonymous memory and it doesn't have transparent hugepages. I was confused before because there is quite some (400mb here) huge pages allocated for postgres during a pgbench run but that's just all the local memory...

A warning, on RHEL 6.1 (2.6.32-131.4.1.el6.x86_64 #1 SMP) we have had horrible problems caused by transparent_hugepages running postgres on largish systems (128GB to 512GB memory, 32 cores). The system sometimes goes 99% system time and is very slow and unresponsive to the point of not successfully completing new tcp connections. Turning off transparent_hugepages fixes it.

That said, explicit hugepage support for the buffer cache would be a big win especially for high connection counts.

-dg

--
David Gould da...@sonic.net
If simplicity worked, the world would be overrun with insects.
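The smaps check suggested in this thread can be automated. The following Python sketch parses the text of a /proc/PID/smaps file, finds the biggest mapping, and reports its AnonHugePages value. It assumes the usual layout of a mapping header line followed by "Name: value kB" lines. The function names parse_smaps and largest_mapping are hypothetical, and the parser is deliberately minimal.

```python
import re

# Header of a mapping: "start-end perms offset dev inode [path]"
MAPPING_RE = re.compile(
    r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S+)\s+\S+\s+\S+\s+\S+\s*(.*)$")
# Per-mapping counter: "Name:   1234 kB"
FIELD_RE = re.compile(r"^(\w+):\s+(\d+) kB$")


def parse_smaps(text):
    """Turn smaps text into a list of mappings with their kB counters."""
    mappings = []
    for line in text.splitlines():
        header = MAPPING_RE.match(line)
        if header:
            mappings.append({
                "range": header.group(1) + "-" + header.group(2),
                "perms": header.group(3),
                "path": header.group(4).strip(),
                "fields": {},
            })
            continue
        field = FIELD_RE.match(line.strip())
        if field and mappings:
            mappings[-1]["fields"][field.group(1)] = int(field.group(2))
    return mappings


def largest_mapping(mappings):
    """Return the mapping with the largest Size, usually shared buffers."""
    return max(mappings, key=lambda m: m["fields"].get("Size", 0))
```

For a running backend one would read the file with open("/proc/%d/smaps" % pid).read(), pass the text to parse_smaps, and look at largest_mapping(...)["fields"].get("AnonHugePages", 0). A shared mapping shows an "s" as the last permission character.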