Re: [PERFORM] Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
Hi Wei Shan, Thank you for your response. Query B was run after initializing the DB ex-novo doing VACUUM ANALYZE before and after creating and clustering indexes. By the way, these results are consistent through time and are reproducible, so it’s not a metter of statistic collector (I guess). Your observation is the same done at dba.stackexchange.com and this make me think that the built-in Postgres of OS X Server is truly optimized. Best regards, Pietro PS on the other response I reported both postgresql.conf Il giorno 01/apr/2015, alle ore 16:44, Wei Shan weishan@gmail.com ha scritto: Just looking at the 2 B_2 queries, I'm curious as to why is the execution plan different between the 2 machines. Is the optimiser stats updated on both databases? Regards, Wei Shan On 1 April 2015 at 22:32, Aidan Van Dyk ai...@highrise.ca wrote: On Wed, Apr 1, 2015 at 9:56 AM, Pietro Pugni pietro.pu...@gmail.com wrote: Now let’s propose some query profiling times. B type set are transactions, so it's impossible for me to post EXPLAIN ANALYZE results. I've extracted two querys from a single transactions and executed the twos on both system. Here are the results: T420 Query B_1 [55999.649 ms + 0.639 ms] http://explain.depesz.com/s/LbM Query B_2 [95664.832 ms + 0.523 ms] http://explain.depesz.com/s/v06 MacMini Query B_1 [56315.614 ms] http://explain.depesz.com/s/uZTx Query B_2 [44890.813 ms] http://explain.depesz.com/s/y7Dk Looking at the 2 B_2 queries (since they are so drastically different), the in-memory quicksorts stand out on the Dell as being *drastically* slower than the disk-based sorts on your mac-mini -- Regards, Ang Wei Shan
Re: [PERFORM] Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
On Thu, Apr 2, 2015 at 6:33 AM, Pietro Pugni pietro.pu...@gmail.com wrote: *T420* work_mem = 512MB *MacMini* work_mem = 32MB So that is why the T420 does memory sorts and the mini does disk sorts. I'd start looking at why memory sorts on the T420 is so slow. Check your numa settings, etc (as already mentioned). For a drastic test, disable the 2nd socket on the dell, and just use one (eliminate any numa/QPI costs) and see how it compares to the no-numa MacMini. If you want to see how bad the NUMA/QPI is, play with stream to benchmark memory performance. a.
Re: [PERFORM] Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
Hi Jeff, thank you for your response. I’m using Postgres 9.0 on MacMini because I’ve noticed that it’s quite fast compared to different Ubuntu machines on which I’ve worked with different (and more performant) hardware. The built-in Postgres version on OS X Server is impossible to update. I should stop it and install a parallel and independent distribution which has not been optimized by Apple. On opensource.appel.com they have different Postgres versions but the latest one is 9.2.x. They stopped updating it in 2012. pg_test_fsync tells me that T420 disk iops are ~7 times faster than MacMini, which is ok, but queries run ~2-5 times slower (for brevity I didn’t report all test results in my first mail). I’ve searched just now what a collation is because I’ve never explicitly used one before, so I think it uses the default one. B_2 query is of the form: WITH soggetti AS ( SELECT ... FROM ... GROUP BY ...) SELECT ... INTO ... FROM soggetti, ... WHERE ... (I omit the … part because they’re not relevant) Best regards, Pietro PS it’s the first time for me on this list so I don’t know if you read the other answers. I reported the postgresql.conf for both systems Il giorno 01/apr/2015, alle ore 18:38, Jeff Janes jeff.ja...@gmail.com ha scritto: On Wed, Apr 1, 2015 at 6:56 AM, Pietro Pugni pietro.pu...@gmail.com wrote: This question was posted originally on http://dba.stackexchange.com/questions/96444/cant-get-dell-pe-t420-perc-h710-perform-better-than-a-macmini-with-postgresql and they suggested to post it on this mailing list. It's months that I'm trying to solve a performance issue with PostgreSQL. I’m able to give you all the technical details needed. SYSTEM CONFIGURATION Our deployment machine is a Dell PowerEdge T420 with a Perc H710 RAID controller configured in this way: two Intel Xeon E5-2640 v2 @2Ghz PostgreSQL 9.4 (updated to the latest available version) My personal low cost and low profile development machine is a MacMini configured in this way: one Intel i7 @2.2Ghz PostgreSQL 9.0.13 (the original built-in shipped with OS X Server) Using such different versions of PostgreSQL seems like a recipe for frustration. Here are two benchmarks generated using pg_test_fsync: This is unlikely to be important for the type of workload you describe. Fsyncs are the bottleneck for many short transactions, but not often the bottleneck for very large transactions. T420 Query B_2 [95664.832 ms + 0.523 ms] http://explain.depesz.com/s/v06 MacMini Query B_2 [44890.813 ms] http://explain.depesz.com/s/y7Dk What collation is used for both databases? Perhaps the T420 is using a much slower collation. How can you sort 2,951,191 but then materialize 4,458,971 rows out of that? I've never seen that before. (Or, in the other plan, put 2,951,191 rows into the sort from the CTE but get 4,458,971 out of the sort? Cheers, Jeff
Re: [PERFORM] Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
Hi Pietro, The modern CPUs trying to be too smart. try to run this code to disable CPUs c-states: setcpulatency.c #include stdio.h #include fcntl.h #include stdint.h int main(int argc, char **argv) { int32_t l; int fd; if (argc != 2) { fprintf(stderr, Usage: %s latency in us\n, argv[0]); return 2; } l = atoi(argv[1]); printf(setting latency to %d us\n, l); fd = open(/dev/cpu_dma_latency, O_WRONLY); if (fd 0) { perror(open /dev/cpu_dma_latency); return 1; } if (write(fd, l, sizeof(l)) != sizeof(l)) { perror(write to /dev/cpu_dma_latency); return 1; } while (1) pause(); } you can use i7z (https://code.google.com/p/i7z/) to see the percentage of CPU power to be used. Changing CPU from C1 to C0 takes quite some time and for DB workload not optimal (if you need a high throughout and any given moment). I see ~65% boost when run './setcpulatency 0'. Tigran. - Original Message - From: Pietro Pugni pietro.pu...@gmail.com To: i...@postgresql-consulting.com Cc: pgsql-performance pgsql-performance@postgresql.org Sent: Thursday, April 2, 2015 12:57:22 PM Subject: Re: [PERFORM] Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL Hi Ilya, thank your for your response. Both system were configured for each test I’ve done. On T420 I’ve optimized the kernel following the official Postgres documentation ( http://www.postgresql.org/docs/9.4/static/kernel-resources.html ): kernel.shmmax=68719476736 kernel.shmall=16777216 vm.overcommit_memory=2 vm.overcommit_ratio=90 RAID controllers were configured as following: - Write cache: WriteBack - Read cache: ReadAhead - Disk cache (only T420): disabled to take full advantage of WriteBack cache (BBU is charged and working) - NCQ (only MacMini because it’s a SATA option): enabled (this affects a lot the overall performance) For postgresql.conf: T420 Normal operations autovacuum = on maintenance_work_mem = 512MB work_mem = 512MB wal_buffers = 64MB effective_cache_size = 64GB # this helps A LOT in disk write speed when creating indexes shared_buffers = 32GB checkpoint_segments = 2000 checkpoint_completion_target = 1.0 effective_io_concurrency = 0 # 1 doesn’t make any substantial difference max_connections = 10 # 20 doesn’t make any difference Data loading (same as above with the following changes): autovacuum = off maintenance_work_mem = 64GB MacMini Normal operations autovacuum = on maintenance_work_mem = 128MB work_mem = 32MB wal_buffers = 32MB effective_cache_size = 800MB shared_buffers = 512MB checkpoint_segments = 32 checkpoint_completion_target = 1.0 effective_io_concurrency = 1 max_connections = 20 Data loading (same as above with the following changes): autovacuum = off maintenance_work_mem = 6GB Best regards, Pietro Il giorno 01/apr/2015, alle ore 16:27, Ilya Kosmodemiansky ilya.kosmodemian...@postgresql-consulting.com ha scritto: Hi Pietro, On Wed, Apr 1, 2015 at 3:56 PM, Pietro Pugni pietro.pu...@gmail.com wrote: T420: went from 311seconds (default postgresql.conf) to 195seconds doing tuning adjustments over RAID, kernel and postgresql.conf; MacMini: 40seconds. I'am afraid, the matter is, that PostgreSQL is not configured properly (and so do operating system and probably controller, however pg_test_fsync shows that things are not so bad there as with postgresql.conf). It is pretty useless to benchmark a database using out-of-the-box configuration. You need at least configure shared memory related, checkpoints-related and autovacuum-related settings. And as a first step, please compare postgresql.conf on Mac and on the server: sometimes (with some mac installers) default postgresql.conf can be not the same as on server. Best regards, Ilya -- Ilya Kosmodemiansky, PostgreSQL-Consulting.com tel. +14084142500 cell. +4915144336040 i...@postgresql-consulting.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
Hi, On Thu, Apr 2, 2015 at 12:47 PM, Pietro Pugni pietro.pu...@gmail.com wrote: Hi Jeff, thank you for your response. I’m using Postgres 9.0 on MacMini because I’ve noticed that it’s quite fast compared to different Ubuntu machines on which I’ve worked with different (and more performant) hardware. The built-in Postgres version on OS X Server is impossible to update. I should stop it and install a parallel and independent distribution which has not been optimized by Apple. On opensource.appel.com they have different Postgres versions but the latest one is 9.2.x. They stopped updating it in 2012. If you want you can compile 9.0 on OSX and double check. I don't remember well but ITSM that a fsync used by psql was a noop on OSX. pg_test_fsync tells me that T420 disk iops are ~7 times faster than MacMini, which is ok, but queries run ~2-5 times slower (for brevity I didn’t report all test results in my first mail). I’ve searched just now what a collation is because I’ve never explicitly used one before, so I think it uses the default one. What's the output of free and sysctl -a | grep vm.zone_reclaim_mode Search the mailing list for zone_reclaim_mode there's some tips. For testing you can also use the mac mini config with the dell, at least it should give you the same plan. With your example disks don't seem to matter, it's all in memory. Keep in mind that a psql query is still single thread so the mac and the dell should get more or less the same speed for in memory queries. B_2 query is of the form: WITH soggetti AS ( SELECT ... FROM ... GROUP BY ...) SELECT ... INTO ... FROM soggetti, ... WHERE ... (I omit the … part because they’re not relevant) Best regards, Pietro -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
Hi Gerardo, thank you for your response. At the moment I can’t switch to RAID10. I know it has best performance, but both systems have RAID5 and MacMini has a consumer desktop RAID solution while T420 has a server-grade one. Anyway, I used two configurations for each system: one for data loading operations and the other one for any other kind of operation (SELECT etc.). These configurations were made studying different combinations. I’ve changed kernel parameters as stated in the official Postgres documentation ( www.postgresql.org/docs/9.4/static/kernel-resources.html ). I copy and paste here the various postgresql.conf involved: T420 Normal operations autovacuum = on maintenance_work_mem = 512MB work_mem = 512MB wal_buffers = 64MB effective_cache_size = 64GB # this helps A LOT in disk write speed when creating indexes shared_buffers = 32GB checkpoint_segments = 2000 checkpoint_completion_target = 1.0 effective_io_concurrency = 0 # 1 doesn’t make any substantial difference max_connections = 10 # 20 doesn’t make any difference Data loading (same as above with the following changes): autovacuum = off maintenance_work_mem = 64GB MacMini Normal operations autovacuum = on maintenance_work_mem = 128MB work_mem = 32MB wal_buffers = 32MB effective_cache_size = 800MB shared_buffers = 512MB checkpoint_segments = 32 checkpoint_completion_target = 1.0 effective_io_concurrency = 1 max_connections = 20 Data loading (same as above with the following changes): autovacuum = off maintenance_work_mem = 6GB Best regards, Pietro Il giorno 02/apr/2015, alle ore 04:19, Gerardo Herzig gher...@fmed.uba.ar ha scritto: Ok, a quick view on the system, and some things that may be important to note: Our deployment machine is a Dell PowerEdge T420 with a Perc H710 RAID controller configured in this way: * VD0: two 15k SAS disks (ext4, OS partition, WAL partition, RAID1) * VD1: ten 10k SAS disks (XFS, Postgres data partition, RAID5) Well...usually RAID5 have the worst performance in writing...EVER!!! Have you tested this in another raid configuration? RAID10 is usually the best bet. This system has the following configuration: * Ubuntu 14.04.2 LTS (GNU/Linux 3.13.0-48-generic x86_64) * 128GB RAM (DDR3, 8x16GB @1600Mhz) * two Intel Xeon E5-2640 v2 @2Ghz * Dell Perc H710 with 512MB RAM (Write cache: WriteBack, Read cache: ReadAhead, Disk cache: disabled): * VD0 (OS and WAL partition): two 15k SAS disks (ext4, RAID1) * VD1 (Postgres data partition): ten 10k SAS disks (XFS, RAID5) * PostgreSQL 9.4 (updated to the latest available version) * moved pg_stat_tmp to RAM disk [...] versions. You did not mention any postgres configuration at all. If you let the default checkpoint_segments=3, that would be an IO hell for your disk controler...and the RAID5 making things worst...Can you show us the values of: checkpoint_segments shared_buffers work_mem maintenance_work_mem effective_io_concurrency I would start from there, few changes, and check again. I would change the RAID first of all things, and try those tests again. Cheers. Gerardo
Re: [PERFORM] Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
Hi Ilya, thank your for your response. Both system were configured for each test I’ve done. On T420 I’ve optimized the kernel following the official Postgres documentation ( http://www.postgresql.org/docs/9.4/static/kernel-resources.html ): kernel.shmmax=68719476736 kernel.shmall=16777216 vm.overcommit_memory=2 vm.overcommit_ratio=90 RAID controllers were configured as following: - Write cache: WriteBack - Read cache: ReadAhead - Disk cache (only T420): disabled to take full advantage of WriteBack cache (BBU is charged and working) - NCQ (only MacMini because it’s a SATA option): enabled (this affects a lot the overall performance) For postgresql.conf: T420 Normal operations autovacuum = on maintenance_work_mem = 512MB work_mem = 512MB wal_buffers = 64MB effective_cache_size = 64GB # this helps A LOT in disk write speed when creating indexes shared_buffers = 32GB checkpoint_segments = 2000 checkpoint_completion_target = 1.0 effective_io_concurrency = 0 # 1 doesn’t make any substantial difference max_connections = 10 # 20 doesn’t make any difference Data loading (same as above with the following changes): autovacuum = off maintenance_work_mem = 64GB MacMini Normal operations autovacuum = on maintenance_work_mem = 128MB work_mem = 32MB wal_buffers = 32MB effective_cache_size = 800MB shared_buffers = 512MB checkpoint_segments = 32 checkpoint_completion_target = 1.0 effective_io_concurrency = 1 max_connections = 20 Data loading (same as above with the following changes): autovacuum = off maintenance_work_mem = 6GB Best regards, Pietro Il giorno 01/apr/2015, alle ore 16:27, Ilya Kosmodemiansky ilya.kosmodemian...@postgresql-consulting.com ha scritto: Hi Pietro, On Wed, Apr 1, 2015 at 3:56 PM, Pietro Pugni pietro.pu...@gmail.com wrote: T420: went from 311seconds (default postgresql.conf) to 195seconds doing tuning adjustments over RAID, kernel and postgresql.conf; MacMini: 40seconds. I'am afraid, the matter is, that PostgreSQL is not configured properly (and so do operating system and probably controller, however pg_test_fsync shows that things are not so bad there as with postgresql.conf). It is pretty useless to benchmark a database using out-of-the-box configuration. You need at least configure shared memory related, checkpoints-related and autovacuum-related settings. And as a first step, please compare postgresql.conf on Mac and on the server: sometimes (with some mac installers) default postgresql.conf can be not the same as on server. Best regards, Ilya -- Ilya Kosmodemiansky, PostgreSQL-Consulting.com tel. +14084142500 cell. +4915144336040 i...@postgresql-consulting.com
Re: [PERFORM] Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
Hi On Thu, Apr 2, 2015 at 3:52 PM, Pietro Pugni pietro.pu...@gmail.com wrote: I’ve searched just now what a collation is because I’ve never explicitly used one before, so I think it uses the default one. What's the output of free and sysctl -a | grep vm.zone_reclaim_mode Search the mailing list for zone_reclaim_mode there's some tips. vm.zone_reclaim_mode = 0 In my understanding it's the rigth value there's also huge page /sys/kernel/mm/transparent_hugepage/enabled can you try to disable it? Also test on the dell: select tmp.cf, tmp.dt from grep_studi.tmp; and select tmp.cf, tmp.dt from grep_studi.tmp order by tmp.cf; in Query B_2 the sort is 9 time slower on the dell, you have to find why... For testing you can also use the mac mini config with the dell, at least it should give you the same plan. With your example disks don't seem to matter, it's all in memory. T420 with optimal postgresql.conf Query B_1 [55999.649 ms + 0.639 ms] http://explain.depesz.com/s/LbM Query B_2 [95664.832 ms + 0.523 ms] http://explain.depesz.com/s/v06 T420 with MacMini postgresql.conf Query B_1 [51280.208ms + 0.699ms] http://explain.depesz.com/s/wlb Query B_2 [177278.205ms + 0.428ms] http://explain.depesz.com/s/rzr 32 GB for buffers is too high for the queries in your test but it doesn't matter. MacMini Query B_1 [56315.614 ms] http://explain.depesz.com/s/uZTx Query B_2 [44890.813 ms] http://explain.depesz.com/s/y7Dk -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
Hi Tigran, The modern CPUs trying to be too smart. try to run this code to disable CPUs c-states: setcpulatency.c #include stdio.h #include fcntl.h #include stdint.h int main(int argc, char **argv) { int32_t l; int fd; if (argc != 2) { fprintf(stderr, Usage: %s latency in us\n, argv[0]); return 2; } l = atoi(argv[1]); printf(setting latency to %d us\n, l); fd = open(/dev/cpu_dma_latency, O_WRONLY); if (fd 0) { perror(open /dev/cpu_dma_latency); return 1; } if (write(fd, l, sizeof(l)) != sizeof(l)) { perror(write to /dev/cpu_dma_latency); return 1; } while (1) pause(); } —— your C code should be equivalent to the following: echo 0 /dev/cpu_dma_latency Right? I executed the above command but time execution increase of about 2 seconds over 129seconds (I’ve executed the transaction several times repeating the procedure of restarting db and redoing transaction). With setting echo 1 /dev/cpu_dma_latency it returns to 129seconds. you can use i7z (https://code.google.com/p/i7z/) to see the percentage of CPU power to be used. I’ve installed i7z-GUI but it reports the following and crashes with segmentation fault (T420 has Intel Xeon, not i-series): i7z DEBUG: i7z version: svn-r77-(20-Nov-2011) i7z DEBUG: Found Intel Processor i7z DEBUG:Stepping 4 i7z DEBUG:Model e i7z DEBUG:Family 6 i7z DEBUG:Processor Type 0 i7z DEBUG:Extended Model 3 i7z DEBUG: msr = Model Specific Register i7z DEBUG: detected a newer model of ivy bridge processor i7z DEBUG: my coder doesn't know about it, can you send the following info to him? i7z DEBUG: model e, extended model 3, proc_family 6 i7z DEBUG: msr device files DONOT exist, trying out a makedev script i7z DEBUG: modprobbing for msr [1]+ Segmentation fault (core dumped) i7z_GUI Changing CPU from C1 to C0 takes quite some time and for DB workload not optimal (if you need a high throughout and any given moment). I see ~65% boost when run './setcpulatency 0'. Tigran. With “takes quite some time” you mean that it will take some time to take effect? Thank you a lot for your help. Best regards, Pietro
Re: [PERFORM] Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
Il giorno 02/apr/2015, alle ore 14:29, didier did...@gmail.com ha scritto: Hi, On Thu, Apr 2, 2015 at 12:47 PM, Pietro Pugni pietro.pu...@gmail.com wrote: Hi Jeff, thank you for your response. I’m using Postgres 9.0 on MacMini because I’ve noticed that it’s quite fast compared to different Ubuntu machines on which I’ve worked with different (and more performant) hardware. The built-in Postgres version on OS X Server is impossible to update. I should stop it and install a parallel and independent distribution which has not been optimized by Apple. On opensource.appel.com they have different Postgres versions but the latest one is 9.2.x. They stopped updating it in 2012. If you want you can compile 9.0 on OSX and double check. I don't remember well but ITSM that a fsync used by psql was a noop on OSX. You’re referring to disk scheduler? I’ve tried to change it on T420 with no significant variations over performance. I’ve also tried different fsync options with no improvements. pg_test_fsync tells me that T420 disk iops are ~7 times faster than MacMini, which is ok, but queries run ~2-5 times slower (for brevity I didn’t report all test results in my first mail). I’ve searched just now what a collation is because I’ve never explicitly used one before, so I think it uses the default one. What's the output of free and sysctl -a | grep vm.zone_reclaim_mode Search the mailing list for zone_reclaim_mode there's some tips. vm.zone_reclaim_mode = 0 I’ve also set these options in /etc/sysctl.conf: kernel.shmmax=68719476736 kernel.shmall=16777216 vm.overcommit_memory=2 vm.overcommit_ratio=90 I’ll search the mailing list. For testing you can also use the mac mini config with the dell, at least it should give you the same plan. With your example disks don't seem to matter, it's all in memory. The same transaction took 106s on MacMini; 129s on T420 with my optimized configuration; 180s on T420 using MacMini configuration. Query plans for B_1 and B_2 queries with the two configurations on T420: T420 with optimal postgresql.conf Query B_1 [55999.649 ms + 0.639 ms] http://explain.depesz.com/s/LbM Query B_2 [95664.832 ms + 0.523 ms] http://explain.depesz.com/s/v06 T420 with MacMini postgresql.conf Query B_1 [51280.208ms + 0.699ms] http://explain.depesz.com/s/wlb Query B_2 [177278.205ms + 0.428ms] http://explain.depesz.com/s/rzr MacMini Query B_1 [56315.614 ms] http://explain.depesz.com/s/uZTx Query B_2 [44890.813 ms] http://explain.depesz.com/s/y7Dk Keep in mind that a psql query is still single thread so the mac and the dell should get more or less the same speed for in memory queries. Yes I know ;) With 128GB I try to maximize RAM usage, but it’s difficult to fully understand how to achieve this. Thank you again, Pietro
Re: [PERFORM] Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
Hi Aidan, T420 work_mem = 512MB MacMini work_mem = 32MB So that is why the T420 does memory sorts and the mini does disk sorts. I'd start looking at why memory sorts on the T420 is so slow. Check your numa settings, etc (as already mentioned). For a drastic test, disable the 2nd socket on the dell, and just use one (eliminate any numa/QPI costs) and see how it compares to the no-numa MacMini. the command dmesg | grep -i numa doesn’t display me anything. I think T420 hasn’t NUMA on it. Is there a way to enable it from Ubuntu? I don’t have immediate access to BIOS (server is in another location). For QPI I don’t know what to do. Please, can you give me more details? If you want to see how bad the NUMA/QPI is, play with stream to benchmark memory performance. With stream you refer to this: https://sites.utexas.edu/jdm4372/tag/stream-benchmark/ ? Do you suggest me some way to do this kind of tests? Thank you very much Pietro
Re: [PERFORM] Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
On Thu, Apr 2, 2015 at 9:23 AM, Pietro Pugni pietro.pu...@gmail.com wrote: the command dmesg | grep -i numa doesn’t display me anything. I think T420 hasn’t NUMA on it. Is there a way to enable it from Ubuntu? I don’t have immediate access to BIOS (server is in another location). NUMA stands for Non-Uniform-Memory-Access . It's basically the label for systems which have memory attached to different cpu sockets, such that accessing all of the memory from a paritciular cpu thread has different costs based on where the actual memory is located (i.e. on some other socket, or the local socket). For QPI I don’t know what to do. Please, can you give me more details? QPI is the the intel QuickPath Interconnect. It's the communication path between CPU sockets. Memory ready by one cpu thread that has to come from another cpu socket's memory controller goes through QPI. Google has lots of info on these, and how they impact performance, etc. If you want to see how bad the NUMA/QPI is, play with stream to benchmark memory performance. With stream you refer to this: https://sites.utexas.edu/jdm4372/tag/stream-benchmark/ ? Do you suggest me some way to do this kind of tests? Ya, that's the one. I don't have specific tests in mind. A more simple overview might be numactl --hardware a.