Merlin Moncure <mmonc...@gmail.com> wrote:
> On Sun, Jun 8, 2014 at 5:45 PM, Kevin Grittner <kgri...@ymail.com> wrote:
> Hm, your patch seems to boil down to
> interleave_memory(start, size, numa_all_nodes_ptr)
> inside PGSharedMemoryCreate().

That's the functional part -- the rest is about not breaking the
builds for environments which are not NUMA-aware.

> I've read your email a couple of times and am a little hazy
> around a couple of points, in particular: "the above is probably
> more important than the attached patch". So I have a couple of
> questions:
>
> *) There is a lot of advice floating around (for example here:
> http://frosty-postgres.blogspot.com/2012/08/postgresql-numa-and-zone-reclaim-mode.html
> ) to instruct operators to disable zone_reclaim. Will your changes
> invalidate any of that advice?

I expect it will make the need for that far less acute, although it
is probably still best to disable zone_reclaim (based on the
documented conditions under which disabling it makes sense).

> *) is there any downside to enabling --with-libnuma if you have
> support?

Not that I can see. There are two additional system calls on
postmaster start-up. I don't expect the time those take to be
significant.

> Do you expect packagers will enable it generally?

I suspect so.

> Why not just always build it in (if configure allows it) and rely
> on a GUC if there is some kind of tradeoff (and if there is one,
> what kinds of things are you looking for to manage it)?

If a build is done on a machine with the NUMA library, and the
executable is deployed on a machine without it, the postmaster will
get an error on the missing library. I talked about this briefly
with Tom in Ottawa, and he thought it would be up to packagers to
create a dependency on the library if they build PostgreSQL using
the --with-libnuma option. The reason to require the option is so
that a build which won't run on target machines is not created when
a packager does nothing to deal with NUMA.

> *) The bash script above, what problem does the 'alternate
> policy' solve?
By default, all OS buffers and cache are placed in the memory node
closest to the process whose read or write first causes them to be
used. For something like the cp command, that probably makes sense.
For something like PostgreSQL it can lead to unbalanced placement
of shared resources (like pages in shared tables and indexes).

> *) What kinds of improvements (even if in very general terms)
> will we see from better numa management? Are there further
> optimizations possible?

When I spread both OS cache and PostgreSQL shared memory, I got
about 2% better performance overall for a read-only load on a
4-node system which started with everything on one node. I used
pgbench and picked a scale which put the database size at about 25%
of machine memory before I initialized the database, so that one
memory node was 100% filled with minimal "spill" to the other
nodes. The run times between the two cases had very little overlap.
The balanced memory usage gave more consistent results; the
unbalanced load had more variable performance timings, with a rare
run showing better than all the balanced times. I didn't spend as
much time with read/write benchmarks, but those seemed overall
worse for the unbalanced load, and one outlier on the bad side was
about 20% below the (again, pretty tightly clustered) times for the
balanced load. These tests were designed to create a pretty bad
case for the unbalanced load in a default cpuset configuration,
with an unlucky sizing of the working set relative to a memory node
size. At PGCon I had a discussion over lunch with someone who saw
far worse performance from unbalanced memory, but he carefully
engineered a really bad case by using one cpuset to force all data
into one node, and then another cpuset to force PostgreSQL to run
only on cores from which access to that node was relatively slow.
If I remember correctly, he saw about 20% of the throughput that
way versus using the same cores with balanced memory usage.
He conceded that this was a pretty artificial case, and that you
would have to be *trying* to hurt performance to set things up that
way, but he wanted to establish a "worst case" so that he had a
hard bound on what the maximum possible benefit from balancing load
might be.

There is definitely a need for more benchmarks, on more
environments, but my preliminary tests all looked favorable to the
combination of this patch and the cpuset changes. I would have
posted this months ago if I had found enough time to do more
benchmarks and put together a nice presentation of the results, but
I figured it was a good idea to put this in front of people even
with only preliminary results, so that those who were interested
could see what results they got in their environments or with
workloads I had not considered.

I will note that, given the wide differences I saw between run
times with the unbalanced memory usage, there must be some variable
that matters which I was not properly controlling. I still haven't
figured out what that was. It might be something as simple as a
particular process (like the checkpointer or bgwriter process?)
landing on the fully-allocated memory node versus landing somewhere
else. I will also note that if the buffers and cache are populated
by small OLTP queries running on a variety of cores, the data can
be spread just by happenstance, and in that case this patch should
not be expected to make any difference at all.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers