Re: [PERFORM] Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL

2015-04-02 Thread Pietro Pugni
Hi Wei Shan,
Thank you for your response.
Query B was run after initializing the DB ex-novo doing VACUUM ANALYZE before 
and after creating and clustering indexes.
By the way, these results are consistent through time and are reproducible, so 
it’s not a metter of statistic collector (I guess).
Your observation is the same done at dba.stackexchange.com and this make me 
think that the built-in Postgres of OS X Server is truly optimized. 

Best regards,
 Pietro

PS on the other response I reported both postgresql.conf 


Il giorno 01/apr/2015, alle ore 16:44, Wei Shan weishan@gmail.com ha 
scritto:

 Just looking at the 2 B_2 queries, I'm curious as to why is the execution 
 plan different between the 2 machines. Is the optimiser stats updated on both 
 databases?
 
 Regards,
 Wei Shan
 
 On 1 April 2015 at 22:32, Aidan Van Dyk ai...@highrise.ca wrote:
 On Wed, Apr 1, 2015 at 9:56 AM, Pietro Pugni pietro.pu...@gmail.com wrote:
  
 Now let’s propose some query profiling times.
 
 B type set are transactions, so it's impossible for me to post EXPLAIN 
 ANALYZE results. I've extracted two querys from a single transactions and 
 executed the twos on both system. Here are the results:
 
 T420
 
 Query B_1 [55999.649 ms + 0.639 ms] http://explain.depesz.com/s/LbM
 
 Query B_2 [95664.832 ms + 0.523 ms] http://explain.depesz.com/s/v06
 
 MacMini
 
 Query B_1 [56315.614 ms] http://explain.depesz.com/s/uZTx
 
 Query B_2 [44890.813 ms] http://explain.depesz.com/s/y7Dk
 
 
 Looking at the 2 B_2 queries (since they are so drastically different), the 
 in-memory quicksorts stand out on the Dell as being *drastically* slower than 
 the disk-based sorts on your mac-mini
 
 
 
 
 -- 
 Regards,
 Ang Wei Shan



Re: [PERFORM] Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL

2015-04-02 Thread Aidan Van Dyk
On Thu, Apr 2, 2015 at 6:33 AM, Pietro Pugni pietro.pu...@gmail.com wrote:


 *T420*
 work_mem = 512MB



 *MacMini*
 work_mem = 32MB


So that is why the T420 does memory sorts and the mini does disk sorts.

I'd start looking at why memory sorts on the T420 is so slow.   Check your
numa settings, etc (as already mentioned).

For a drastic test, disable the 2nd socket on the dell, and just use one
(eliminate any numa/QPI costs) and see how it compares to the no-numa
MacMini.

If you want to see how bad the NUMA/QPI is, play with stream to benchmark
memory performance.

a.


Re: [PERFORM] Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL

2015-04-02 Thread Pietro Pugni
Hi Jeff,
thank you for your response.
I’m using Postgres 9.0 on MacMini because I’ve noticed that it’s quite fast 
compared to different Ubuntu machines on which I’ve worked with different (and 
more performant) hardware.
The built-in Postgres version on OS X Server is impossible to update. I should 
stop it and install a parallel and independent distribution which has not been 
optimized by Apple. On opensource.appel.com they have different Postgres 
versions but the latest one is 9.2.x. They stopped updating it in 2012.
pg_test_fsync tells me that T420 disk iops are ~7 times faster than MacMini, 
which is ok, but queries run ~2-5 times slower (for brevity I didn’t report all 
test results in my first mail).

I’ve searched just now what a collation is because I’ve never explicitly used 
one before, so I think it uses the default one.

B_2 query is of the form:
WITH soggetti AS (
 SELECT ... FROM ... GROUP BY ...)
SELECT ... INTO ... FROM soggetti, ... WHERE ... 

(I omit the … part because they’re not relevant)

Best regards,
 Pietro

PS it’s the first time for me on this list so I don’t know if you read the 
other answers. I reported the postgresql.conf for both systems




Il giorno 01/apr/2015, alle ore 18:38, Jeff Janes jeff.ja...@gmail.com ha 
scritto:

 On Wed, Apr 1, 2015 at 6:56 AM, Pietro Pugni pietro.pu...@gmail.com wrote:
 This question was posted originally on 
 http://dba.stackexchange.com/questions/96444/cant-get-dell-pe-t420-perc-h710-perform-better-than-a-macmini-with-postgresql
  and they suggested to post it on this mailing list.
 
 It's months that I'm trying to solve a performance issue with PostgreSQL. I’m 
 able to give you all the technical details needed.
 
 SYSTEM CONFIGURATION
 
 Our deployment machine is a Dell PowerEdge T420 with a Perc H710 RAID 
 controller configured in this way:
 
 two Intel Xeon E5-2640 v2 @2Ghz
 PostgreSQL 9.4 (updated to the latest available version)
 My personal low cost and low profile development machine is a MacMini 
 configured in this way:
 
 one Intel i7 @2.2Ghz
 PostgreSQL 9.0.13 (the original built-in shipped with OS X Server)
 
 Using such different versions of PostgreSQL seems like a recipe for 
 frustration.
 
 
 Here are two benchmarks generated using pg_test_fsync:
 
 
 This is unlikely to be important for the type of workload you describe.  
 Fsyncs are the bottleneck for many short transactions, but not often the 
 bottleneck for very large transactions.
 
 
 
 T420
 
 Query B_2 [95664.832 ms + 0.523 ms] http://explain.depesz.com/s/v06
 
 MacMini
 
 Query B_2 [44890.813 ms] http://explain.depesz.com/s/y7Dk
 
 
 
 What collation is used for both databases?  Perhaps the T420 is using a much 
 slower collation.
 
 How can you sort 2,951,191 but then materialize 4,458,971 rows out of that?  
 I've never seen that before.  (Or, in the other plan, put 2,951,191 rows into 
 the sort from the CTE but get 4,458,971 out of the sort?
 
 Cheers,
 
 Jeff



Re: [PERFORM] Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL

2015-04-02 Thread Mkrtchyan, Tigran
Hi Pietro,

The modern CPUs trying to be too smart.

try to run this code to disable CPUs c-states:

 setcpulatency.c 

#include stdio.h
#include fcntl.h
#include stdint.h

int main(int argc, char **argv) {
   int32_t l;
   int fd;

   if (argc != 2) {
  fprintf(stderr, Usage: %s latency in us\n, argv[0]);
  return 2;
   }

   l = atoi(argv[1]);
   printf(setting latency to %d us\n, l);
   fd = open(/dev/cpu_dma_latency, O_WRONLY);
   if (fd  0) {
  perror(open /dev/cpu_dma_latency);
  return 1;
   }
   
   if (write(fd, l, sizeof(l)) != sizeof(l)) {
  perror(write to /dev/cpu_dma_latency);
  return 1;
   }
   
   while (1) pause();
}




you can use i7z (https://code.google.com/p/i7z/) to see the percentage of CPU 
power to be used.
Changing CPU from C1 to C0 takes quite some time and for DB workload not 
optimal (if you need a 
high throughout and any given moment).

I see ~65% boost when run './setcpulatency 0'.

Tigran.

- Original Message -
 From: Pietro Pugni pietro.pu...@gmail.com
 To: i...@postgresql-consulting.com
 Cc: pgsql-performance pgsql-performance@postgresql.org
 Sent: Thursday, April 2, 2015 12:57:22 PM
 Subject: Re: [PERFORM] Can't get Dell PE T420 (Perc H710) perform better than 
 a MacMini with PostgreSQL

 Hi Ilya,
 thank your for your response.
 Both system were configured for each test I’ve done. On T420 I’ve optimized 
 the
 kernel following the official Postgres documentation (
 http://www.postgresql.org/docs/9.4/static/kernel-resources.html ):
 kernel.shmmax=68719476736
 kernel.shmall=16777216
 vm.overcommit_memory=2
 vm.overcommit_ratio=90
 
 
 RAID controllers were configured as following:
 - Write cache: WriteBack
 - Read cache: ReadAhead
 - Disk cache (only T420): disabled to take full advantage of WriteBack cache
 (BBU is charged and working)
 - NCQ (only MacMini because it’s a SATA option): enabled (this affects a lot 
 the
 overall performance)
 
 For postgresql.conf:
 
 T420
 Normal operations
 autovacuum = on
 maintenance_work_mem = 512MB
 work_mem = 512MB
 wal_buffers = 64MB
 effective_cache_size = 64GB # this helps A LOT in disk write speed when 
 creating
 indexes
 shared_buffers = 32GB
 checkpoint_segments = 2000
 checkpoint_completion_target = 1.0
 effective_io_concurrency = 0 # 1 doesn’t make any substantial difference
 max_connections = 10 # 20 doesn’t make any difference
 
 Data loading (same as above with the following changes):
 autovacuum = off
 maintenance_work_mem = 64GB
 
 
 MacMini
 Normal operations
 autovacuum = on
 maintenance_work_mem = 128MB
 work_mem = 32MB
 wal_buffers = 32MB
 effective_cache_size = 800MB
 shared_buffers = 512MB
 checkpoint_segments = 32
 checkpoint_completion_target = 1.0
 effective_io_concurrency = 1
 max_connections = 20
 
 Data loading (same as above with the following changes):
 autovacuum = off
 maintenance_work_mem = 6GB
 
 
 Best regards,
 Pietro
 
 
 
 Il giorno 01/apr/2015, alle ore 16:27, Ilya Kosmodemiansky
 ilya.kosmodemian...@postgresql-consulting.com ha scritto:
 
 Hi Pietro,
 
 On Wed, Apr 1, 2015 at 3:56 PM, Pietro Pugni pietro.pu...@gmail.com wrote:
 T420: went from 311seconds (default postgresql.conf) to 195seconds doing
 tuning adjustments over RAID, kernel and postgresql.conf;
 MacMini: 40seconds.
 
 I'am afraid, the matter is, that PostgreSQL is not configured properly
 (and so do operating system and probably controller, however
 pg_test_fsync shows that things are not so bad there as with
 postgresql.conf).
 
 It is pretty useless to benchmark a database using out-of-the-box
 configuration. You need at least configure shared memory related,
 checkpoints-related and autovacuum-related settings. And as a first
 step, please compare postgresql.conf on Mac and on the server:
 sometimes (with some mac installers) default postgresql.conf can be
 not the same as on server.
 
 Best regards,
 Ilya
 
 
 --
 Ilya Kosmodemiansky,
 
 PostgreSQL-Consulting.com
 tel. +14084142500
 cell. +4915144336040
  i...@postgresql-consulting.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL

2015-04-02 Thread didier
Hi,

On Thu, Apr 2, 2015 at 12:47 PM, Pietro Pugni pietro.pu...@gmail.com wrote:
 Hi Jeff,
 thank you for your response.
 I’m using Postgres 9.0 on MacMini because I’ve noticed that it’s quite fast
 compared to different Ubuntu machines on which I’ve worked with different
 (and more performant) hardware.
 The built-in Postgres version on OS X Server is impossible to update. I
 should stop it and install a parallel and independent distribution which has
 not been optimized by Apple. On opensource.appel.com they have different
 Postgres versions but the latest one is 9.2.x. They stopped updating it in
 2012.
If you want you can compile 9.0 on OSX and double check.
I don't remember well but ITSM that a fsync used by psql was a noop on  OSX.

 pg_test_fsync tells me that T420 disk iops are ~7 times faster than MacMini,
 which is ok, but queries run ~2-5 times slower (for brevity I didn’t report
 all test results in my first mail).


 I’ve searched just now what a collation is because I’ve never explicitly
 used one before, so I think it uses the default one.

What's the output of free and sysctl -a | grep vm.zone_reclaim_mode

Search the mailing list for zone_reclaim_mode there's some tips.


For testing you can also use the mac mini config with the dell, at
least it should give you the same plan.
With your example disks don't seem to matter, it's all in memory.

Keep in mind that a psql query is still single thread so the mac and
the dell should get more or less the same speed for in memory queries.


 B_2 query is of the form:
 WITH soggetti AS (
  SELECT ... FROM ... GROUP BY ...)
 SELECT ... INTO ... FROM soggetti, ... WHERE ...

 (I omit the … part because they’re not relevant)

 Best regards,
  Pietro


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL

2015-04-02 Thread Pietro Pugni
Hi Gerardo,
thank you for your response.
At the moment I can’t switch to RAID10. I know it has best performance, but 
both systems have RAID5 and MacMini has a consumer desktop RAID solution while 
T420 has a server-grade one.
Anyway, I used two configurations for each system: one for data loading 
operations and the other one for any other kind of operation (SELECT etc.). 
These configurations were made studying different combinations. I’ve changed 
kernel parameters as stated in the official Postgres documentation ( 
www.postgresql.org/docs/9.4/static/kernel-resources.html ).
I copy and paste here the various postgresql.conf involved:

T420
Normal operations
autovacuum = on
maintenance_work_mem = 512MB
work_mem = 512MB
wal_buffers = 64MB
effective_cache_size = 64GB # this helps A LOT in disk write speed when 
creating indexes
shared_buffers = 32GB
checkpoint_segments = 2000
checkpoint_completion_target = 1.0
effective_io_concurrency = 0 # 1 doesn’t make any substantial difference
max_connections = 10 # 20 doesn’t make any difference

Data loading (same as above with the following changes):
autovacuum = off
maintenance_work_mem = 64GB


MacMini
Normal operations
autovacuum = on
maintenance_work_mem = 128MB
work_mem = 32MB
wal_buffers = 32MB
effective_cache_size = 800MB
shared_buffers = 512MB
checkpoint_segments = 32
checkpoint_completion_target = 1.0
effective_io_concurrency = 1
max_connections = 20

Data loading (same as above with the following changes):
autovacuum = off
maintenance_work_mem = 6GB


Best regards,
 Pietro


Il giorno 02/apr/2015, alle ore 04:19, Gerardo Herzig gher...@fmed.uba.ar ha 
scritto:

 Ok, a quick view on the system, and some things that may be important to note:
 
 Our deployment machine is a Dell PowerEdge T420 with a Perc H710 RAID
 controller configured in this way:
 
* VD0: two 15k SAS disks (ext4, OS partition, WAL partition,
RAID1)
* VD1: ten 10k SAS disks (XFS, Postgres data partition, RAID5)
 
 
 Well...usually RAID5 have the worst performance in writing...EVER!!! Have you 
 tested this in another raid configuration? RAID10 is usually the best bet.
 
 
 
 This system has the following configuration:
 
* Ubuntu 14.04.2 LTS (GNU/Linux 3.13.0-48-generic x86_64)
* 128GB RAM (DDR3, 8x16GB @1600Mhz)
* two Intel Xeon E5-2640 v2 @2Ghz
* Dell Perc H710 with 512MB RAM (Write cache: WriteBack, Read
cache: ReadAhead, Disk cache: disabled):
* VD0 (OS and WAL partition): two 15k SAS disks (ext4, RAID1)
* VD1 (Postgres data partition): ten 10k SAS disks (XFS,
RAID5)
* PostgreSQL 9.4 (updated to the latest available version)
* moved pg_stat_tmp to RAM disk
 
 
 [...] versions.
 
 You did not mention any postgres configuration at all. If you let the 
 default checkpoint_segments=3, that would be an IO hell for your disk 
 controler...and the RAID5 making things worst...Can you show us the values of:
 
 checkpoint_segments
 shared_buffers
 work_mem
 maintenance_work_mem
 effective_io_concurrency
 
 I would start from there, few changes, and check again. I would change the 
 RAID first of all things, and try those tests again.
 
 Cheers.
 Gerardo



Re: [PERFORM] Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL

2015-04-02 Thread Pietro Pugni
Hi Ilya,
thank your for your response.
Both system were configured for each test I’ve done. On T420 I’ve optimized the 
kernel following the official Postgres documentation ( 
http://www.postgresql.org/docs/9.4/static/kernel-resources.html ):
kernel.shmmax=68719476736
kernel.shmall=16777216
vm.overcommit_memory=2
vm.overcommit_ratio=90


RAID controllers were configured as following:
- Write cache: WriteBack
- Read cache: ReadAhead
- Disk cache (only T420): disabled to take full advantage of WriteBack cache 
(BBU is charged and working)
- NCQ (only MacMini because it’s a SATA option): enabled (this affects a lot 
the overall performance)

For postgresql.conf:

T420
Normal operations
autovacuum = on
maintenance_work_mem = 512MB
work_mem = 512MB
wal_buffers = 64MB
effective_cache_size = 64GB # this helps A LOT in disk write speed when 
creating indexes
shared_buffers = 32GB
checkpoint_segments = 2000
checkpoint_completion_target = 1.0
effective_io_concurrency = 0 # 1 doesn’t make any substantial difference
max_connections = 10 # 20 doesn’t make any difference

Data loading (same as above with the following changes):
autovacuum = off
maintenance_work_mem = 64GB


MacMini
Normal operations
autovacuum = on
maintenance_work_mem = 128MB
work_mem = 32MB
wal_buffers = 32MB
effective_cache_size = 800MB
shared_buffers = 512MB
checkpoint_segments = 32
checkpoint_completion_target = 1.0
effective_io_concurrency = 1
max_connections = 20

Data loading (same as above with the following changes):
autovacuum = off
maintenance_work_mem = 6GB


Best regards,
 Pietro



Il giorno 01/apr/2015, alle ore 16:27, Ilya Kosmodemiansky 
ilya.kosmodemian...@postgresql-consulting.com ha scritto:

 Hi Pietro,
 
 On Wed, Apr 1, 2015 at 3:56 PM, Pietro Pugni pietro.pu...@gmail.com wrote:
 T420: went from 311seconds (default postgresql.conf) to 195seconds doing
 tuning adjustments over RAID, kernel and postgresql.conf;
 MacMini: 40seconds.
 
 I'am afraid, the matter is, that PostgreSQL is not configured properly
 (and so do operating system and probably controller, however
 pg_test_fsync shows that things are not so bad there as with
 postgresql.conf).
 
 It is pretty useless to benchmark a database using out-of-the-box
 configuration. You need at least configure shared memory related,
 checkpoints-related and autovacuum-related settings. And as a first
 step, please compare postgresql.conf on Mac and on the server:
 sometimes (with some mac installers) default postgresql.conf can be
 not the same as on server.
 
 Best regards,
 Ilya
 
 
 -- 
 Ilya Kosmodemiansky,
 
 PostgreSQL-Consulting.com
 tel. +14084142500
 cell. +4915144336040
 i...@postgresql-consulting.com



Re: [PERFORM] Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL

2015-04-02 Thread didier
Hi

On Thu, Apr 2, 2015 at 3:52 PM, Pietro Pugni pietro.pu...@gmail.com wrote:


 I’ve searched just now what a collation is because I’ve never explicitly
 used one before, so I think it uses the default one.


 What's the output of free and sysctl -a | grep vm.zone_reclaim_mode

 Search the mailing list for zone_reclaim_mode there's some tips.

 vm.zone_reclaim_mode = 0
In my understanding it's the rigth value
there's also huge page
/sys/kernel/mm/transparent_hugepage/enabled
can you try to disable it?

Also test on the dell:
select tmp.cf, tmp.dt from grep_studi.tmp;
and
select tmp.cf, tmp.dt from grep_studi.tmp order by tmp.cf;
in Query B_2
the sort is 9 time slower on the dell, you have to find why...


 For testing you can also use the mac mini config with the dell, at
 least it should give you the same plan.
 With your example disks don't seem to matter, it's all in memory.

 T420 with optimal postgresql.conf
 Query B_1 [55999.649 ms + 0.639 ms] http://explain.depesz.com/s/LbM
 Query B_2 [95664.832 ms + 0.523 ms] http://explain.depesz.com/s/v06


 T420 with MacMini postgresql.conf
 Query B_1 [51280.208ms + 0.699ms] http://explain.depesz.com/s/wlb
 Query B_2 [177278.205ms + 0.428ms] http://explain.depesz.com/s/rzr

32 GB for buffers is too high for the queries in your test but it
doesn't matter.

 MacMini
 Query B_1 [56315.614 ms] http://explain.depesz.com/s/uZTx
 Query B_2 [44890.813 ms] http://explain.depesz.com/s/y7Dk



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL

2015-04-02 Thread Pietro Pugni
Hi Tigran,

 The modern CPUs trying to be too smart.
 
 try to run this code to disable CPUs c-states:
 
  setcpulatency.c 
 
 #include stdio.h
 #include fcntl.h
 #include stdint.h
 
 int main(int argc, char **argv) {
   int32_t l;
   int fd;
 
   if (argc != 2) {
  fprintf(stderr, Usage: %s latency in us\n, argv[0]);
  return 2;
   }
 
   l = atoi(argv[1]);
   printf(setting latency to %d us\n, l);
   fd = open(/dev/cpu_dma_latency, O_WRONLY);
   if (fd  0) {
  perror(open /dev/cpu_dma_latency);
  return 1;
   }
 
   if (write(fd, l, sizeof(l)) != sizeof(l)) {
  perror(write to /dev/cpu_dma_latency);
  return 1;
   }
 
   while (1) pause();
 }
 
 
 ——
 
your C code should be equivalent to the following:
echo 0  /dev/cpu_dma_latency
Right?
I executed the above command but time execution increase of about 2 seconds 
over 129seconds (I’ve executed the transaction several times repeating the 
procedure of restarting db and redoing transaction). With setting echo 1  
/dev/cpu_dma_latency it returns to 129seconds.

 you can use i7z (https://code.google.com/p/i7z/) to see the percentage of CPU 
 power to be used.

I’ve installed i7z-GUI but it reports the following and crashes with 
segmentation fault (T420 has Intel Xeon, not i-series):
i7z DEBUG: i7z version: svn-r77-(20-Nov-2011)
i7z DEBUG: Found Intel Processor
i7z DEBUG:Stepping 4
i7z DEBUG:Model e
i7z DEBUG:Family 6
i7z DEBUG:Processor Type 0
i7z DEBUG:Extended Model 3
i7z DEBUG: msr = Model Specific Register
i7z DEBUG: detected a newer model of ivy bridge processor
i7z DEBUG: my coder doesn't know about it, can you send the following info to 
him?
i7z DEBUG: model e, extended model 3, proc_family 6
i7z DEBUG: msr device files DONOT exist, trying out a makedev script
i7z DEBUG: modprobbing for msr
[1]+  Segmentation fault  (core dumped) i7z_GUI


 Changing CPU from C1 to C0 takes quite some time and for DB workload not 
 optimal (if you need a 
 high throughout and any given moment).
 
 I see ~65% boost when run './setcpulatency 0'.
 
 Tigran.
 
With “takes quite some time” you mean that it will take some time to take 
effect?

Thank you a lot for your help.
Best regards,
 Pietro



Re: [PERFORM] Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL

2015-04-02 Thread Pietro Pugni

Il giorno 02/apr/2015, alle ore 14:29, didier did...@gmail.com ha scritto:

 Hi,
 
 On Thu, Apr 2, 2015 at 12:47 PM, Pietro Pugni pietro.pu...@gmail.com wrote:
 Hi Jeff,
 thank you for your response.
 I’m using Postgres 9.0 on MacMini because I’ve noticed that it’s quite fast
 compared to different Ubuntu machines on which I’ve worked with different
 (and more performant) hardware.
 The built-in Postgres version on OS X Server is impossible to update. I
 should stop it and install a parallel and independent distribution which has
 not been optimized by Apple. On opensource.appel.com they have different
 Postgres versions but the latest one is 9.2.x. They stopped updating it in
 2012.
 If you want you can compile 9.0 on OSX and double check.
 I don't remember well but ITSM that a fsync used by psql was a noop on  OSX.
 
You’re referring to disk scheduler? I’ve tried to change it on T420 with no 
significant variations over performance.
I’ve also tried different fsync options with no improvements.

 pg_test_fsync tells me that T420 disk iops are ~7 times faster than MacMini,
 which is ok, but queries run ~2-5 times slower (for brevity I didn’t report
 all test results in my first mail).
 
 
 I’ve searched just now what a collation is because I’ve never explicitly
 used one before, so I think it uses the default one.
 
 What's the output of free and sysctl -a | grep vm.zone_reclaim_mode
 
 Search the mailing list for zone_reclaim_mode there's some tips.
 
vm.zone_reclaim_mode = 0

I’ve also set these options in /etc/sysctl.conf:
kernel.shmmax=68719476736
kernel.shmall=16777216
vm.overcommit_memory=2
vm.overcommit_ratio=90

I’ll search the mailing list.

 For testing you can also use the mac mini config with the dell, at
 least it should give you the same plan.
 With your example disks don't seem to matter, it's all in memory.
The same transaction took 106s on MacMini; 129s on T420 with my optimized 
configuration; 180s on T420 using MacMini configuration.
Query plans for B_1 and B_2 queries with the two configurations on T420:

T420 with optimal postgresql.conf
Query B_1 [55999.649 ms + 0.639 ms] http://explain.depesz.com/s/LbM
Query B_2 [95664.832 ms + 0.523 ms] http://explain.depesz.com/s/v06


T420 with MacMini postgresql.conf
Query B_1 [51280.208ms + 0.699ms] http://explain.depesz.com/s/wlb
Query B_2 [177278.205ms + 0.428ms] http://explain.depesz.com/s/rzr

MacMini
Query B_1 [56315.614 ms] http://explain.depesz.com/s/uZTx
Query B_2 [44890.813 ms] http://explain.depesz.com/s/y7Dk


 Keep in mind that a psql query is still single thread so the mac and
 the dell should get more or less the same speed for in memory queries.
Yes I know ;) With 128GB I try to maximize RAM usage, but it’s difficult to 
fully understand how to achieve this.

Thank you again,
 Pietro

Re: [PERFORM] Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL

2015-04-02 Thread Pietro Pugni
Hi Aidan,

  
 T420
 work_mem = 512MB
  
 MacMini
 work_mem = 32MB
 
 So that is why the T420 does memory sorts and the mini does disk sorts.
 
 I'd start looking at why memory sorts on the T420 is so slow.   Check your 
 numa settings, etc (as already mentioned).
 
 For a drastic test, disable the 2nd socket on the dell, and just use one 
 (eliminate any numa/QPI costs) and see how it compares to the no-numa MacMini.
 

the command 
dmesg | grep -i numa
doesn’t display me anything. I think T420 hasn’t NUMA on it. Is there a way to 
enable it from Ubuntu? I don’t have immediate access to BIOS (server is in 
another location).
For QPI I don’t know what to do. Please, can you give me more details?

 If you want to see how bad the NUMA/QPI is, play with stream to benchmark 
 memory performance.


With stream you refer to this: 
https://sites.utexas.edu/jdm4372/tag/stream-benchmark/ ? Do you suggest me some 
way to do this kind of tests?

Thank you very much 
 Pietro



Re: [PERFORM] Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL

2015-04-02 Thread Aidan Van Dyk
On Thu, Apr 2, 2015 at 9:23 AM, Pietro Pugni pietro.pu...@gmail.com wrote:


 the command
 dmesg | grep -i numa
 doesn’t display me anything. I think T420 hasn’t NUMA on it. Is there a
 way to enable it from Ubuntu? I don’t have immediate access to BIOS (server
 is in another location).


NUMA stands for Non-Uniform-Memory-Access .  It's basically the label
for systems which have memory attached to different cpu sockets, such that
accessing all of the memory from a paritciular cpu thread has different
costs based on where the actual memory is located (i.e. on some other
socket, or the local socket).


 For QPI I don’t know what to do. Please, can you give me more details?


QPI is the the intel QuickPath Interconnect. It's the communication path
between CPU sockets.   Memory ready by one cpu thread that has to come from
another cpu socket's memory controller goes through QPI.

Google has lots of info on these, and how they impact performance, etc.

 If you want to see how bad the NUMA/QPI is, play with stream to benchmark
 memory performance.


 With stream you refer to this:
 https://sites.utexas.edu/jdm4372/tag/stream-benchmark/ ? Do you suggest
 me some way to do this kind of tests?


Ya, that's the one.  I don't have specific tests in mind.

A more simple overview might be numactl --hardware

a.