Re: [PERFORM] select with max functions

2017-10-03 Thread Mark Kirkwood

On 03/10/17 04:29, Tom Lane wrote:

Mariel Cherkassky  writes:

explain analyze
SELECT Ma.User_Id,
       COUNT(*) COUNT
FROM   Manuim Ma
WHERE  Ma.Bb_Open_Date =
       (SELECT Bb_Open_Date
        FROM   Manuim Man
        WHERE  Man.User_Id = Ma.User_Id
        ORDER  BY bb_open_date DESC LIMIT 1)
GROUP  BY Ma.User_Id
HAVING COUNT(*) > 1;

The core problem with this query is that the sub-select has to be done
over again for each row of the outer table, since it's a correlated
sub-select (ie, it refers to Ma.User_Id from the outer table).  Replacing
a max() call with handmade logic doesn't do anything to help that.
I'd try refactoring it so that you calculate the max Bb_Open_Date just
once for each user id, perhaps along the lines of

SELECT Ma.User_Id,
COUNT(*) COUNT
FROM   Manuim Ma,
   (SELECT User_Id, max(Bb_Open_Date) as max
FROM   Manuim Man
GROUP BY User_Id) ss
WHERE  Ma.User_Id = ss.User_Id AND
   Ma.Bb_Open_Date = ss.max
GROUP  BY Ma.User_Id
HAVING COUNT(*) > 1;

This is still not going to be instantaneous, but it might be better.

It's possible that an index on (User_Id, Bb_Open_Date) would help,
but I'm not sure.
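
For example, such an index could be created as follows (a sketch only - column names/types assumed from the query above):

```sql
-- Hypothetical index; adjust names/types to the real schema:
CREATE INDEX manuim_user_open_date_idx
    ON Manuim (User_Id, Bb_Open_Date DESC);

-- With it, each user's latest Bb_Open_Date can be read straight off
-- the index, e.g. via DISTINCT ON:
SELECT DISTINCT ON (User_Id) User_Id, Bb_Open_Date
FROM   Manuim
ORDER  BY User_Id, Bb_Open_Date DESC;
```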

regards, tom lane




Further ideas based on Tom's rewrite: If that MAX is still expensive it 
might be worth breaking



SELECT User_Id, max(Bb_Open_Date) as max
   FROM   Manuim Man
   GROUP BY User_Id

out into a VIEW, and considering making it MATERIALIZED, or creating an 
equivalent trigger-based summary table (there are examples in the docs 
of how to do this).
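
A sketch of what that could look like (names are illustrative, and REFRESH ... CONCURRENTLY requires a unique index on the matview):

```sql
CREATE MATERIALIZED VIEW user_max_open_date AS
SELECT User_Id, max(Bb_Open_Date) AS max
FROM   Manuim
GROUP  BY User_Id;

CREATE UNIQUE INDEX ON user_max_open_date (User_Id);

-- Re-run after data changes (CONCURRENTLY avoids blocking readers):
REFRESH MATERIALIZED VIEW CONCURRENTLY user_max_open_date;
```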


Cheers

Mark


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Odd sudden performance degradation related to temp object churn

2017-08-21 Thread Mark Kirkwood

On 19/08/17 13:49, Mark Kirkwood wrote:




On 19/08/17 02:21, Jeremy Finzel wrote:
On Tue, Aug 15, 2017 at 12:07 PM, Scott Marlowe 
<scott.marl...@gmail.com> wrote:


So do iostat or iotop show you if / where your disks are working
hardest? Or is this CPU overhead that's killing performance?


Sorry for the delayed reply. I took a look in more detail at the 
query plans from our problem query during this incident. There are 
actually 6 plans, because there were 6 unique queries.  I traced one 
query through our logs, and found something really interesting. That 
is that all of the first 5 queries are creating temp tables, and all 
of them took upwards of 500ms each to run.  The final query, however, 
is a simple select from the last temp table, and that query took 
0.035ms!  This really confirms that somehow, the issue had to do with 
*writing* to the SAN, I think.  Of course this doesn't answer a whole 
lot, because we had no other apparent issues with write performance 
at all.


I also provide some graphs below.


Hi, graphs for latency (or await etc) might be worth looking at too - 
sometimes the troughs between the IO spikes are actually when the 
disks have been overwhelmed with queued up pending IOs...





Sorry - I see you *did* actually have iowait in there under your CPU 
graph... which doesn't look to be showing a lot of waiting. However it 
still might be well worth getting graphs showing per-device waits and 
utilizations.


regards

Mark




Re: [PERFORM] Very poor read performance, query independent

2017-08-19 Thread Mark Kirkwood

Nice!

Pleased that the general idea worked well for you!

I'm also relieved that you did not follow my recommendation exactly - 
I've been trialling a Samsung 960 Evo (256GB) and an Intel 600p (256GB), 
and I've stumbled across the serious disadvantages of (consumer) M.2 
drives using TLC NAND - terrible sustained write performance! While 
these guys can happily do ~2GB/s reads, their write performance is only 
'burst capable'. They have small SLC NAND 'write caches' that do ~1GB/s 
for a *limited time* (10-20s), and after that you get ~200MB/s! Ouch - 
my old Crucial M550 can do 350MB/s sustained writes (so two of them in 
RAID0 are doing 700MB/s for hours).


Bigger capacity drives can do better - but overall I'm not that 
impressed with the current trend of using TLC NAND.


regards

Mark


On 21/07/17 00:50, Charles Nadeau wrote:

Mark,

I received yesterday a second server having 16 drive bays. Just for a 
quick trial, I used 2 old 60GB SSDs (a Kingston V300 and an ADATA SP900) 
to build a RAID0. To my surprise, my very picky RAID controller (HP 
P410i) recognised them without a fuss (although as SATA II drives at 
3Gb/s). A quick fio benchmark gives me 22000 random 4k read IOPS, more 
than my 5 146GB 10k SAS disks in RAID0. I moved my most frequently 
used index to this array and will try to do some benchmarks.
Knowing that SSDs based on SandForce-2281 controller are recognised by 
my server, I may buy a pair of bigger/newer ones to put my tables on.


Thanks!

Charles

On Sat, Jul 15, 2017 at 1:57 AM, Mark Kirkwood 
<mark.kirkw...@catalyst.net.nz> wrote:


Thinking about this a bit more - if somewhat more blazing
performance is needed, then this could be achieved via losing the
RAID card and spinning disks altogether and buying 1 of the NVME
or SATA solid state products: e.g

- Samsung 960 Pro or Evo 2 TB (approx 1 or 2 GB/s seq scan speeds
and 200K IOPS)

- Intel S3610 or similar 1.2 TB (500 MB/s seq scan and 30K IOPS)


The Samsung needs an M.2 port on the mobo (but most should have
'em - and if not PCIe X4 adapter cards are quite cheap). The Intel
is a bit more expensive compared to the Samsung, and is slower but
has a longer lifetime. However for your workload the Sammy is
probably fine.

regards

Mark

On 15/07/17 11:09, Mark Kirkwood wrote:

Ah yes - that seems more sensible (but still slower than I
would expect for 5 disks RAID 0).








--
Charles Nadeau Ph.D.
http://charlesnadeau.blogspot.com/






Re: [PERFORM] Odd sudden performance degradation related to temp object churn

2017-08-18 Thread Mark Kirkwood



On 19/08/17 02:21, Jeremy Finzel wrote:
On Tue, Aug 15, 2017 at 12:07 PM, Scott Marlowe wrote:


So do iostat or iotop show you if / where your disks are working
hardest? Or is this CPU overhead that's killing performance?


Sorry for the delayed reply. I took a look in more detail at the query 
plans from our problem query during this incident. There are actually 
6 plans, because there were 6 unique queries.  I traced one query 
through our logs, and found something really interesting. That is that 
all of the first 5 queries are creating temp tables, and all of them 
took upwards of 500ms each to run.  The final query, however, is a 
simple select from the last temp table, and that query took 0.035ms!  
This really confirms that somehow, the issue had to do with *writing* 
to the SAN, I think.  Of course this doesn't answer a whole lot, 
because we had no other apparent issues with write performance at all.


I also provide some graphs below.


Hi, graphs for latency (or await etc) might be worth looking at too - 
sometimes the troughs between the IO spikes are actually when the disks 
have been overwhelmed with queued up pending IOs...


Also SANs are notorious for this sort of thing - typically they have a 
big RAM cache that you are actually writing to, and everything is nice 
and fast until your workload (along with everyone else's) fills up the 
cache, and then performance drops off a cliff for a while (I've seen SAN 
disks with iostat utilizations of 105% <-- lol... and await numbers that 
scroll off the page in that scenario)!


regards
Mark




Re: [PERFORM] Very poor read performance, query independent

2017-07-15 Thread Mark Kirkwood
Right, that is a bit of a show stopper for those SSD (the Intel needs 
SATA 6Gb/s and the Sammy's need PCIe 3.0 to perform to their rated specs).


regards

Mark


On 16/07/17 04:12, Charles Nadeau wrote:

Mark,

The server is a . It doesn't really work with SATA drives. And when 
you find one that is compatible, it is only used at 3Gb/s with a 
maximum of 5 IOPS (a well-known characteristic of the HP P410i SAS 
RAID controller). I am looking at getting a Kingston Digital HyperX 
Predator that I could use in one of the PCIe 2.0 x4 slots. However I am 
worried about "thermal runaway", i.e. when the server can't get a 
temperature reading from a PCIe card, it spins the fans at full speed 
to protect the server against high temperature. The machine being next 
to my desk, I worry about the deafening noise it will create.

Thanks!

Charles

On Sat, Jul 15, 2017 at 1:57 AM, Mark Kirkwood 
<mark.kirkw...@catalyst.net.nz> wrote:


Thinking about this a bit more - if somewhat more blazing
performance is needed, then this could be achieved via losing the
RAID card and spinning disks altogether and buying 1 of the NVME
or SATA solid state products: e.g

- Samsung 960 Pro or Evo 2 TB (approx 1 or 2 GB/s seq scan speeds
and 200K IOPS)

- Intel S3610 or similar 1.2 TB (500 MB/s seq scan and 30K IOPS)


The Samsung needs an M.2 port on the mobo (but most should have
'em - and if not PCIe X4 adapter cards are quite cheap). The Intel
is a bit more expensive compared to the Samsung, and is slower but
has a longer lifetime. However for your workload the Sammy is
probably fine.

regards

Mark

    On 15/07/17 11:09, Mark Kirkwood wrote:

Ah yes - that seems more sensible (but still slower than I
would expect for 5 disks RAID 0).








--
Charles Nadeau Ph.D.
http://charlesnadeau.blogspot.com/






Re: [PERFORM] Very poor read performance, query independent

2017-07-14 Thread Mark Kirkwood
Thinking about this a bit more - if somewhat more blazing performance is 
needed, then this could be achieved via losing the RAID card and 
spinning disks altogether and buying 1 of the NVME or SATA solid state 
products: e.g


- Samsung 960 Pro or Evo 2 TB (approx 1 or 2 GB/s seq scan speeds and 
200K IOPS)


- Intel S3610 or similar 1.2 TB (500 MB/s seq scan and 30K IOPS)


The Samsung needs an M.2 port on the mobo (but most should have 'em - 
and if not PCIe X4 adapter cards are quite cheap). The Intel is a bit 
more expensive compared to the Samsung, and is slower but has a longer 
lifetime. However for your workload the Sammy is probably fine.


regards

Mark

On 15/07/17 11:09, Mark Kirkwood wrote:
Ah yes - that seems more sensible (but still slower than I would 
expect for 5 disks RAID 0).






Re: [PERFORM] Very poor read performance, query independent

2017-07-14 Thread Mark Kirkwood
Ah yes - that seems more sensible (but still slower than I would expect 
for 5 disks RAID 0). You should be able to get something like 5 * 
(single disk speed) i.e about 500MB/s.


Might be worth increasing device read-ahead (more than you have 
already). Some of these so-called 'smart' RAID cards need to be hit over 
the head before they will perform. E.g: I believe you have it set to 128 
- I'd try 4096 or even 16384 (in the past I've used those settings on 
some extremely stupid cards that refused to max out their disks' known 
speeds).


Also worth investigating is RAID stripe size - for DW work it makes 
sense for it to be reasonably big (256K to 1M), which again will help 
speed in sequential scans.


Cheers

Mark


On 15/07/17 02:09, Charles Nadeau wrote:

Mark,

First I must say that I changed my disks configuration from 4 disks in 
RAID 10 to 5 disks in RAID 0 because I almost ran out of disk space 
during the last ingest of data.

Here is the result test you asked. It was done with a cold cache:

flows=# \timing
Timing is on.
flows=# explain select count(*) from flows;
                                   QUERY PLAN
---------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=17214914.09..17214914.09 rows=1 width=8)
   ->  Gather  (cost=17214914.07..17214914.09 rows=1 width=8)
         Workers Planned: 1
         ->  Partial Aggregate  (cost=17213914.07..17213914.07 rows=1 width=8)
               ->  Parallel Seq Scan on flows  (cost=0.00..17019464.49 rows=388899162 width=0)
(5 rows)

Time: 171.835 ms
flows=# select pg_relation_size('flows');
 pg_relation_size
--
 129865867264
(1 row)

Time: 57.157 ms
flows=# select count(*) from flows;
LOG:  duration: 625546.522 ms  statement: select count(*) from flows;
   count
---
 589831190
(1 row)

Time: 625546.662 ms

The throughput reported by PostgreSQL is almost 198MB/s, and the 
throughput as measured by dstat during the query execution was between 
25 and 299MB/s. It is much better than what I had before! The I/O wait 
was about 12% all through the query. One thing I noticed is the 
discrepancy between the read throughput reported by pg_activity and 
the one reported by dstat: pg_activity always reports a value lower 
than dstat.


Besides the change of disks configuration, here is what contributed 
the most to the improvement of the performance so far:


- Using huge pages
- Increasing effective_io_concurrency to 256
- Reducing random_page_cost from 22 to 4
- Reducing min_parallel_relation_size to 512kB to have more workers
  when doing parallel sequential scan of my biggest table


Thanks for recommending this test, I now know what the real throughput 
should be!


Charles

On Wed, Jul 12, 2017 at 4:11 AM, Mark Kirkwood 
<mark.kirkw...@catalyst.net.nz> wrote:


Hmm - how are you measuring that sequential scan speed of 4MB/s?
I'd recommend doing a very simple test e.g, here's one on my
workstation - 13 GB single table on 1 SATA drive - cold cache
after reboot, sequential scan using Postgres 9.6.2:

bench=#  EXPLAIN SELECT count(*) FROM pgbench_accounts;
 QUERY PLAN
----------------------------------------------------------------------
 Aggregate  (cost=2889345.00..2889345.01 rows=1 width=8)
   ->  Seq Scan on pgbench_accounts  (cost=0.00..2639345.00 rows=1 width=0)
(2 rows)


bench=#  SELECT pg_relation_size('pgbench_accounts');
 pg_relation_size
--
  13429514240
(1 row)

bench=# SELECT count(*) FROM pgbench_accounts;
   count
---
 1
(1 row)

Time: 118884.277 ms


So doing the math seq read speed is about 110MB/s (i.e 13 GB in
120 sec). Sure enough, while I was running the query iostat showed:

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00  926.00    0.00   114.89     0.00   254.10     1.90    2.03    2.03    0.00   1.08 100.00



So might be useful for us to see something like that from your
system - note you need to check you really have flushed the cache,
and that no other apps are using the db.

regards

Mark


On 12/07/17 00:46, Charles Nadeau wrote:

After reducing random_page_cost to 4 and testing more, I can
report that the aggregate read throughput for parallel
sequential scan is about 90MB/s. However the throughput for
sequential scan is still around 4MB/s.





--
Charles Nadeau Ph.D.
http://charlesnadeau.blogspot.com/






Re: [PERFORM] Very poor read performance, query independent

2017-07-11 Thread Mark Kirkwood
Hmm - how are you measuring that sequential scan speed of 4MB/s? I'd 
recommend doing a very simple test e.g, here's one on my workstation - 
13 GB single table on 1 SATA drive - cold cache after reboot, sequential 
scan using Postgres 9.6.2:


bench=#  EXPLAIN SELECT count(*) FROM pgbench_accounts;
 QUERY PLAN
----------------------------------------------------------------------
 Aggregate  (cost=2889345.00..2889345.01 rows=1 width=8)
   ->  Seq Scan on pgbench_accounts  (cost=0.00..2639345.00 rows=1 width=0)
(2 rows)


bench=#  SELECT pg_relation_size('pgbench_accounts');
 pg_relation_size
--
  13429514240
(1 row)

bench=# SELECT count(*) FROM pgbench_accounts;
   count
---
 1
(1 row)

Time: 118884.277 ms


So doing the math seq read speed is about 110MB/s (i.e 13 GB in 120 
sec). Sure enough, while I was running the query iostat showed:


Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00  926.00    0.00   114.89     0.00   254.10     1.90    2.03    2.03    0.00   1.08 100.00



So might be useful for us to see something like that from your system - 
note you need to check you really have flushed the cache, and that no 
other apps are using the db.


regards

Mark

On 12/07/17 00:46, Charles Nadeau wrote:
After reducing random_page_cost to 4 and testing more, I can report 
that the aggregate read throughput for parallel sequential scan is 
about 90MB/s. However the throughput for sequential scan is still 
around 4MB/s.








Re: [PERFORM] Can postgresql plan a query using multiple CPU cores?

2017-05-22 Thread Mark Kirkwood
On 23/05/17 08:21, Clemens Eisserer wrote:
> Hi,
>
> I have a letancy-sensitive legacy application, where the time consumed
> by query planning was always causing some headaches.
> Currently it is running on postgresql-8.4 - will postgresql-10 support
> generating plans using multiple CPU cores to reduce the time required
> to generate a single plan?
>
> Thank you in advance and best regards, Clemens
>
>
Hi,

Might be worthwhile posting an example (query + EXPLAIN ANALYZE etc), so
we can see what type of queries are resulting in long plan times.
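
For instance, more recent versions report planning time directly in EXPLAIN ANALYZE output (table name here is hypothetical):

```sql
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
-- The output ends with lines like:
--   Planning time: ... ms
--   Execution time: ... ms
-- On 8.4 there is no "Planning time" line; \timing in psql around a
-- plain EXPLAIN gives a rough proxy for plan time.
```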

Cheers

Mark





Re: [PERFORM] Size of Temporary tablespace is increasing very much in postgresql 9.1.

2016-12-16 Thread Mark Kirkwood

On 15/12/16 23:28, Dinesh Chandra 12108 wrote:


Dear expert,

In postgreSQL-9.1,the size of *pgsql_tmp* inside tablespace (Temp 
tablespace) is increased by 544G in one day.


However, the DBsize is as usual but tablespace size is getting increased.

Could you please suggest why it is happening ?




That is due to queries doing sorts or (hash) joins. You can log which 
queries are doing this with the log_temp_files parameter.


Now it might be that this is just normal/expected (e.g complex data 
warehouse style workload), but it could also be many small queries that 
might benefit from some additional indexes (logging the queries will 
help you decide what if anything needs to be done).
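
For example (the parameter takes a size threshold in kB, with 0 meaning log every temp file):

```sql
-- Set in postgresql.conf and reload, or per-session as a superuser:
SET log_temp_files = 0;

-- The server log will then contain entries along the lines of:
--   LOG:  temporary file: path "pg_tblspc/.../pgsql_tmp/...", size ...
--   STATEMENT:  SELECT ... ORDER BY ...
```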


regards

Mark




Re: [PERFORM] Postgresql 8.4 optimize for heavy select load

2016-09-19 Thread Mark Kirkwood

On 19/09/16 19:40, Job wrote:


Hello,

i would please like to have some suggestions to optimize Postgres 8.4 for a 
very heavy number of select (with join) queries.
The queries read data, very rarely they write.



We probably need to see schema and query examples to help you (with 
EXPLAIN ANALYZE output). Also - err 8.4 - I (and others probably) will 
recommend you upgrade to a more recent (and supported for that matter) 
version - currently 9.5/9.6 - lots of performance improvements you are 
missing out on!


Best wishes

Mark




Re: [PERFORM] Possible to find disk IOs for a Query?

2016-09-01 Thread Mark Kirkwood

On 01/09/16 17:56, Mark Kirkwood wrote:

 the other way to attack this is to trace your backend postgres 
process (err perfmon...no idea how to do this on windows...)


No idea why I thought you were on windows (maybe was reading another 
message just before yours) - sorry!






Re: [PERFORM] Possible to find disk IOs for a Query?

2016-08-31 Thread Mark Kirkwood

On 01/09/16 10:01, Bobby Mozumder wrote:


Is it possible to find the number of disk IOs performed for a query?  EXPLAIN 
ANALYZE looks like it shows number of sequential rows scanned, but not number 
of IOs.

My database is on an NVMe SSD, and am trying to cut microseconds of disk IO per 
query by possibly denormalizing.




Try EXPLAIN (ANALYZE, BUFFERS) e.g:

bench=# EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM pgbench_accounts 
WHERE bid=1;

QUERY PLAN
------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=217118.90..217118.91 rows=1 width=8) (actual time=259.723..259.723 rows=1 loops=1)
   Buffers: shared hit=2370 read=161727
   ->  Gather  (cost=217118.68..217118.89 rows=2 width=8) (actual time=259.686..259.720 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=2370 read=161727
         ->  Partial Aggregate  (cost=216118.68..216118.69 rows=1 width=8) (actual time=258.473..258.473 rows=1 loops=3)
               Buffers: shared hit=2208 read=161727
               ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..216018.33 rows=40139 width=0) (actual time=0.014..256.820 rows=3 loops=3)
                     Filter: (bid = 1)
                     Rows Removed by Filter: 330
                     Buffers: shared hit=2208 read=161727
 Planning time: 0.044 ms
 Execution time: 260.357 ms
(14 rows)

...shows the number of (8k unless you've changed it) pages read from 
disk or cache. Now this might not be exactly what you are after - the 
other way to attack this is to trace your backend postgres process (err 
perfmon...no idea how to do this on windows...) and count read and write 
calls.


regards

Mark






Re: [PERFORM] Performance problems with 9.2.15

2016-07-22 Thread Mark Kirkwood

On 22/07/16 13:07, Johan Fredriksson wrote:

And by the way, I have also tried to upgrade to PostgreSQL 9.4.8 (the latest 
version in postgresql.org's own repository) without improvement.



Not sure what repo you are using, but 9.5.3 and 9.6 Beta are the 
*actual* latest versions. Now I'm not sure they will actually help your 
particular query, but are probably worth a try out!


regards

Mark






Re: [PERFORM] Tuning guidelines for server with 256GB of RAM and SSDs?

2016-07-07 Thread Mark Kirkwood

On 08/07/16 02:09, Wes Vaske (wvaske) wrote:

The Crucial drive does not have power loss protection. The Samsung drive does.


(The Crucial M550 has capacitors to protect data that's already been written to the 
device but not the entire cache. For instance, if data is read from the device during a 
garbage collection operation, the M550 will protect that data instead of introducing 
corruption of old data. This is listed as "power loss protection" on the spec 
sheet but it's not the level of protection that people on this list would expect from a 
drive)



Yes - the MX200 board (see):

http://www.anandtech.com/show/9258/crucial-mx200-250gb-500gb-1tb-ssd-review

looks to have the same sort of capacitors that the M550 uses, so not 
ideal for db or transaction logs!


Cheers

Mark




Re: [PERFORM] Tuning guidelines for server with 256GB of RAM and SSDs?

2016-07-06 Thread Mark Kirkwood

On 06/07/16 07:17, Mkrtchyan, Tigran wrote:

Hi,

We had a similar situation and the best performance was with 64MB
background_bytes and 512 MB dirty_bytes.

Tigran.

On Jul 5, 2016 16:51, Kaixi Luo  wrote:


 Here are my server specs:

 RAID1 - 2x480GB Samsung SSD with power loss protection (will be used to
 store the PostgreSQL database)
 RAID1 - 2x240GB Crucial SSD with power loss protection. (will be used to
 store PostgreSQL transactions logs)



Can you tell the exact model numbers for the Samsung and Crucial SSD's? 
It typically matters! E.g I have some Crucial M550 that have capacitors 
and (originally) claimed to be power off safe, but with testing have 
been shown to be not really power off safe at all. I'd be dubious about 
Samsungs too.


The Intel Datacenter range (S3700 and similar) are known to have power 
off safety that does work.


regards

Mark




Re: [PERFORM] hyperthreadin low performance

2015-07-24 Thread Mark Kirkwood

On 23/07/15 23:37, domenico febbo wrote:

is the problem also in PostgreSQL 9.4.x?
I'm going to buy a production's server with 4 sockets E7-4850 12 cores
so 12*4 = 48 cores (and 96 threads using HT).

What do you suggest?
Using or not HT?



From my experience 9.4 is considerably better (we are using it on the 
60-core box mentioned previously).


48 cores should be fine; enabling HT and asking Postgres to effectively 
handle 96 could provoke issues. However it is reasonably easy to test - 
tune shared_buffers and checkpoint_segments sensibly and run pgbench for 
a steadily increasing number of clients. With 48 cores you should 
(hopefully) see a tps curve that increases and then gently flattens off 
somewhere. If 96 cores are too many then you will see a tps curve that 
initially increases then sharply drops.


Cheers

Mark




Re: [PERFORM] hyperthreadin low performance

2015-07-21 Thread Mark Kirkwood

On 21/07/15 20:04, David Rowley wrote:

On 21 July 2015 at 14:59, Jeison Bedoya Delgado
<jeis...@audifarma.com.co> wrote:

hi everyone,

Recently update a database to machine with RHEL7, but i see that the
performance is betther if the hyperthreading tecnology is
deactivated and use only 32 cores.

is normal that the machine performance is better with 32 cores that
64 cores?.


You might be interested in
http://www.postgresql.org/message-id/53f4f36e.6050...@agliodbs.com



However I do wonder if we have been misinterpreting these tests. We tend 
to assume the position of "hyperthreading is bad, switch it off".


The linked post under the one above:

http://www.postgresql.org/message-id/53ed371d@catalyst.net.nz

shows that 60 core (no hyperthreading) performance is also pessimal, 
leading me to conclude that *perhaps* it is simply the number of cores 
that is the problem - particularly as benchmark results for single 
socket cpus clearly show hyperthreading helps performance...


Regards

Mark




Re: [PERFORM] Slow query (planner insisting on using 'external merge' sort type)

2015-06-24 Thread Mark Kirkwood
On 24/06/15 09:05, Jim Nasby wrote:
 On 6/19/15 9:57 AM, Ian Pushee wrote:


 On 6/19/2015 10:47 AM, Andreas Kretschmer wrote:
 Explain Analyze outputs (links as requested):
 Default plan: http://explain.depesz.com/s/ib3k
 Forced index (random_page_cost=1.0): http://explain.depesz.com/s/lYaP

 Software/Hardware: PGSql 9.2.1, Windows 8.1, 8GB RAM
 All pgsql settings are at their defaults.
 increase work_mem. per session via set work_mem  = 'xxxMB'; or in
 postgresql.conf, reload.



 Hi Andreas,

 The number of rows in the events table isn't constrained, so
 unfortunately it isn't feasible to set work_mem high enough to allow an
 in-memory sort. Forcing the planner to use the index works to produce a
 fast query, so I'm wondering if there is a more general way to getting
 the planner to take into account that work_mem isn't big enough to fit
 the query which will result in a MUCH more costly external merge.
 
 What Andreas is saying is the reason the sort is so expensive is because
 it spilled to disk. If you don't have enough memory to do the sort
 in-memory, then you probably don't have enough memory to buffer the
 table either, which means the index scan is going to be a LOT more
 expensive than a sort.
 
 That said, the better your IO system is the lower you need to set
 random_page_cost. With a good raid setup 2.0 is a good starting point,
 and I've run as low as 1.1. I've never run a system on all SSD, but I've
 heard others recommend setting it as low as 1.0 on an all SSD setup.
 
 It's also worth noting that there's some consensus that the optimizer is
 generally too eager to switch from an index scan to a seqscan.


Mind you, this eagerness could be caused by the OP having
effective_cache_size set to the default. This should be changed (set to
a few GB...)!
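
For example, a session-level experiment (the value is illustrative; it should roughly reflect shared_buffers plus the OS cache available to Postgres):

```sql
SET effective_cache_size = '6GB';  -- illustrative value, session-local
-- then re-run EXPLAIN on the problem query to see whether the planner
-- now chooses the index scan of its own accord
```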

Cheers

Mark




Re: [PERFORM] How to reduce writing on disk ? (90 gb on pgsql_tmp)

2015-06-03 Thread Mark Kirkwood

On 04/06/15 12:58, Scott Marlowe wrote:

On Wed, Jun 3, 2015 at 6:53 PM, Scott Marlowe scott.marl...@gmail.com wrote:

On Wed, Jun 3, 2015 at 4:29 PM, Joshua D. Drake j...@commandprompt.com wrote:


On 06/03/2015 03:16 PM, Tomas Vondra wrote:


What is more important, though, is the amount of memory. OP reported the
query writes ~95GB of temp files (and dies because of full disk, so
there may be more). The on-disk format is usually more compact than the
in-memory representation - for example on-disk sort often needs 3x less
space than in-memory qsort. So we can assume the query needs 95GB of
data. Can you explain how that's going to fit into the 64GB RAM?


Cache is free memory. If you think of it any other way when you're
looking at memory usage and pressure on theings like swap you're
gonna make some bad decisions.



Cache is not free memory - it's there for a purpose and usually plays a
significant role in performance. Sure, it may be freed and used for
other purposes, but that has consequences - e.g. it impacts performance
of other queries etc. You generally don't want to do that on production.



Exactly. If your cache is reduced your performance is reduced because less
things are in cache. It is not free memory. Also the command free is not
useful in this scenario. It is almost always better to use sar so you can
see where the data points are that free is using.


But if that WAS happening he wouldn't still HAVE 60G of cache! That's
my whole point. He's NOT running out of memory. He's not even having
to dump cache right now.


Further if he started using a few gig here for this one it wouldn't
have a big impact on cache (60G-1G etc) but might make it much faster,
as spilling to disk is a lot less intrusive when you've got a bigger
chunk of ram to work in. OTOH doing something like setting work_mem to
60G would likely be fatal.

But he's not down to 3GB of memory by any kind of imagination. Any
working machine will slowly, certainly fill its caches since it's not
using the memory for anything else. That's normal. As long as you're
not blowing out the cache you're fine.




I agree with Scott's analysis here.

It seems to me that the issue is the query(s) using too much disk space. 
As others have said, it may not be practical to up work_mem to the point 
where it all happens in memory...so you probably need to:


- get more disk or,
- tweak postgres params to get a less disk hungry plan (need to see that 
explain analyze)!
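
A narrower option than raising work_mem globally (a sketch only, assuming the expensive query can be identified) is to raise it for just that query's transaction:

```sql
-- Illustrative only: give one known-expensive query a larger sort/hash budget
-- without touching the global setting (which applies per sort node, per backend).
BEGIN;
SET LOCAL work_mem = '2GB';              -- reverts automatically at COMMIT/ROLLBACK
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;   -- the disk-hungry query goes here
COMMIT;
```

With log_temp_files = 0 set, the server log will then show whether the query still spills to disk, and by how much.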


Cheers

Mark




Re: [PERFORM] Survey: Max TPS you've ever seen

2015-02-12 Thread Mark Kirkwood

On 13/02/15 00:20, Gudmundsson Martin (mg) wrote:

Hi all!


- checkpoint_segments 1000
- checkpoint_completion_target 0.9
- wal_buffers  256MB
- shared_buffers 31 gb
- max_connections 500


I see that some of you are using wal_buffers = 256MB.
I was under the impression that Postgres will not benefit from higher value 
than the segment size, i.e. 16MB. More than that will not do/help anything.

What's the reasoning behind setting it to higher than 16MB? Do I have old 
information?

Best regards, Martin



There was some discussion a while ago in which 32MB and 8MB both 
demonstrated better performance than 16MB (probably related to the fact 
that the default wal file size is 16MB). We just experimented further 
with bigger values, and saw some improvement.
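
For anyone wanting to repeat that experiment: the setting lives in postgresql.conf and needs a server restart to take effect (the values here are illustrative, not recommendations):

```
# postgresql.conf - compare pgbench tps across successive runs
wal_buffers = 8MB      # then try 16MB, 32MB, 256MB
```

SHOW wal_buffers; in psql confirms the value the server is actually using.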


Cheers

Mark




Re: [PERFORM] Survey: Max TPS you've ever seen

2015-02-10 Thread Mark Kirkwood

On 10/02/15 10:29, Gavin Flower wrote:

On 10/02/15 08:30, Luis Antonio Dias de Sá Junior wrote:

Hi,

A survay: with pgbench using TPS-B, what is the maximum TPS you're
ever seen?

For me: 12000 TPS.

--
Luis Antonio Dias de Sá Junior

Important to specify:

1. O/S
2. version of PostgreSQL
3. PostgreSQL configuration
4. hardware configuration
5. anything else that might affect performance

I suspect that Linux will out perform Microsoft on the same hardware,
and optimum configuration for both O/S's...




Yes, exactly - and also the pgbench parameters:

- scale
- number of clients
- number of threads
- statement options (prepared or simple etc)
- length of test

We've managed to get 40,000 to 60,000 TPS on some pretty serious hardware:

- 60 core, 1 TB ram
- 16 SSD + 4 PCIe SSD storage
- Ubuntu 14.04
- Postgres 9.4 (beta and rc)

...with Postgres parameters customized:

- checkpoint_segments 1920
- checkpoint_completion_target 0.8
- wal_buffers  256MB
- wal_sync_method open_datasync
- shared_buffers 10GB
- max_connections 600
- effective_io_concurrency 10

..and finally pgbench parameters

- scale 2000
- clients 32, 64, 128, 256 (best results at 32 and 64 generally)
- threads = 1/2 client number
- prepared option
- 10 minute test run time

Points to note: we did *not* disable fsync or prevent buffers being 
actually written (common dirty tricks in benchmarks). However, as others 
have remarked - raw numbers mean little. Pgbench is very useful for 
testing how tuning configurations are helping (or not) for a particular 
hardware and software setup, but is less useful for answering the 
question "how many TPS can postgres do"...
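
For reference, the parameter list above corresponds roughly to an invocation like this (a sketch only; it assumes a running server and a database named bench):

```
pgbench -i -s 2000 bench                      # one-off: initialize at scale 2000
pgbench -M prepared -c 64 -j 32 -T 600 bench  # 64 clients, 32 threads, 10 minutes
```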


Regards

Mark







Re: [PERFORM] Strange choice of general index over partial index

2015-01-15 Thread Mark Kirkwood

On 16/01/15 11:30, Josh Berkus wrote:

This is an obfuscation and mock up, but:

table files (
id serial pk,
filename text not null,
state varchar(20) not null
... 18 more columns
)

index file_state on (state)
(35GB in size)
index file_in_flight_state (state) where state in (
'waiting','assigning', 'processing' )
(600MB in size)
... 10 more indexes

More important facts:
* state = 'done' 95% of the time.  thereform the partial index
represents only 5% of the table
* all indexes and the table are very bloated
* server has 128GB RAM
* Version 9.2.

Given this setup, I would expect the planner to *always* choose
file_in_flight_state over file_state for this query:

SELECT id, filename FROM files WHERE state = 'waiting';

... and yet it keeps selecting file_state based on extremely small
changes to the stats.   This is important because the same query, using
file_state, is 20X to 50X slower, because that index frequently gets
pushed out of memory.

What am I missing?  Or is this potentially a planner bug for costing?



Are you seeing a bitmapscan access plan? If so see if disabling it gets 
you a plan on the files_in_flight index. I'm seeing this scenario with a 
fake/generated dataset a bit like yours in 9.2 (9.5 uses the 
files_in_flight w/o any coercing).
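
A minimal way to try that (per-session only, nothing persistent):

```sql
SET enable_bitmapscan = off;
EXPLAIN SELECT id, filename FROM files WHERE state = 'waiting';
RESET enable_bitmapscan;
```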


regards

Mark




Re: [PERFORM] Strange choice of general index over partial index

2015-01-15 Thread Mark Kirkwood

On 16/01/15 13:37, Mark Kirkwood wrote:

On 16/01/15 11:30, Josh Berkus wrote:

This is an obfuscation and mock up, but:

table files (
id serial pk,
filename text not null,
state varchar(20) not null
... 18 more columns
)

index file_state on (state)
(35GB in size)
index file_in_flight_state (state) where state in (
'waiting','assigning', 'processing' )
(600MB in size)
... 10 more indexes

More important facts:
* state = 'done' 95% of the time.  thereform the partial index
represents only 5% of the table
* all indexes and the table are very bloated
* server has 128GB RAM
* Version 9.2.

Given this setup, I would expect the planner to *always* choose
file_in_flight_state over file_state for this query:

SELECT id, filename FROM files WHERE state = 'waiting';

... and yet it keeps selecting file_state based on extremely small
changes to the stats.   This is important because the same query, using
file_state, is 20X to 50X slower, because that index frequently gets
pushed out of memory.

What am I missing?  Or is this potentially a planner bug for costing?



Are you seeing a bitmapscan access plan? If so see if disabling it gets
you a plan on the files_in_flight index. I'm seeing this scenario with a
fake/generated dataset a bit like yours in 9.2 (9.5 uses the
files_in_flight w/o any coercing).



FWIW: For me 9.2 and 9.3 (default config) generate plans like:
state=# EXPLAIN ANALYZE
SELECT id, filename
FROM files
WHERE state = 'processing';
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on files  (cost=3102.02..89228.68 rows=164333 width=15) (actual time=26.629..803.507 rows=166696 loops=1)
   Recheck Cond: ((state)::text = 'processing'::text)
   Rows Removed by Index Recheck: 7714304
   ->  Bitmap Index Scan on file_state  (cost=0.00..3060.93 rows=164333 width=0) (actual time=25.682..25.682 rows=166696 loops=1)
         Index Cond: ((state)::text = 'processing'::text)
 Total runtime: 808.662 ms
(6 rows)


whereas 9.4 and 9.5 get:

                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Index Scan using file_in_flight on files  (cost=0.42..62857.39 rows=158330 width=15) (actual time=0.055..202.732 rows=166696 loops=1)
   Index Cond: ((state)::text = 'processing'::text)
 Planning time: 24.203 ms
 Execution time: 208.926 ms
(4 rows)


This is with each version loading exactly the same dataset (generated by 
the attached script). Obviously this is a vast simplification of what 
Josh is looking at - but it is (hopefully) interesting that these later 
versions are doing so much better...


Cheers

Mark



gendata.pl
Description: Perl program



Re: [PERFORM] Strange choice of general index over partial index

2015-01-15 Thread Mark Kirkwood

On 16/01/15 15:32, Mark Kirkwood wrote:

On 16/01/15 13:37, Mark Kirkwood wrote:

On 16/01/15 11:30, Josh Berkus wrote:

This is an obfuscation and mock up, but:

table files (
id serial pk,
filename text not null,
state varchar(20) not null
... 18 more columns
)

index file_state on (state)
(35GB in size)
index file_in_flight_state (state) where state in (
'waiting','assigning', 'processing' )
(600MB in size)
... 10 more indexes

More important facts:
* state = 'done' 95% of the time.  thereform the partial index
represents only 5% of the table
* all indexes and the table are very bloated
* server has 128GB RAM
* Version 9.2.

Given this setup, I would expect the planner to *always* choose
file_in_flight_state over file_state for this query:

SELECT id, filename FROM files WHERE state = 'waiting';

... and yet it keeps selecting file_state based on extremely small
changes to the stats.   This is important because the same query, using
file_state, is 20X to 50X slower, because that index frequently gets
pushed out of memory.

What am I missing?  Or is this potentially a planner bug for costing?



Are you seeing a bitmapscan access plan? If so see if disabling it gets
you a plan on the files_in_flight index. I'm seeing this scenario with a
fake/generated dataset a bit like yours in 9.2 (9.5 uses the
files_in_flight w/o any coercing).



FWIW: For me 9.2 and 9.3 (default config) generate plans like:
state=# EXPLAIN ANALYZE
SELECT id, filename
FROM files
WHERE state = 'processing';
 QUERY PLAN
--

  Bitmap Heap Scan on files  (cost=3102.02..89228.68 rows=164333
width=15) (actual time=26.629..803.507 rows=166696 loops=1)
Recheck Cond: ((state)::text = 'processing'::text)
Rows Removed by Index Recheck: 7714304
->  Bitmap Index Scan on file_state  (cost=0.00..3060.93 rows=164333
width=0) (actual time=25.682..25.682 rows=166696 loops=1)
  Index Cond: ((state)::text = 'processing'::text)
  Total runtime: 808.662 ms
(6 rows)


whereas 9.4 and 9.5 get:

   QUERY PLAN

---

  Index Scan using file_in_flight on files  (cost=0.42..62857.39
rows=158330 width=15) (actual time=0.055..202.732 rows=166696 loops=1)
Index Cond: ((state)::text = 'processing'::text)
  Planning time: 24.203 ms
  Execution time: 208.926 ms
(4 rows)


This is with each version loading exactly the same dataset (generated by
the attached scripty). Obviously this is a vast simplification of what
Josh is looking at - but it is (hopefully) interesting that these later
versions are doing so much better...



A bit more poking about shows that the major factor (with this fake 
dataset anyway) is the default for effective_cache_size (changes from 
128MB to 4GB in 9.4). Increasing this makes 9.2 start using the 
files_in_flight index in a plain index scan too.


Josh - might be worth experimenting with this parameter.
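
Since effective_cache_size only influences the planner (no memory is actually allocated), it can be tried per session without a restart:

```sql
SET effective_cache_size = '4GB';   -- the 9.4 default; 9.2/9.3 default to 128MB
EXPLAIN SELECT id, filename FROM files WHERE state = 'waiting';
```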

regards

Mark





Re: [PERFORM] Strange choice of general index over partial index

2015-01-15 Thread Mark Kirkwood

On 16/01/15 16:06, Mark Kirkwood wrote:


A bit more poking about shows that the major factor (which this fake
dataset anyway) is the default for effective_cache_size (changes from
128MB to 4GB in 9.4). Increasing this makes 9.2 start using the
files_in_flight index in a plain index scan too.



Arrg - misread the planner output: in 9.2 what changes is a plan that 
uses an index scan on the *file_state* index (not 
files_in_flight)...which appears much faster than the bitmap scan on 
file_state. Apologies for the confusion.


I'm thinking that I'm seeing the effect Tom has just mentioned.

regards

Mark






Re: [PERFORM] Strange choice of general index over partial index

2015-01-15 Thread Mark Kirkwood

On 16/01/15 16:28, Josh Berkus wrote:

On 01/16/2015 04:17 PM, Mark Kirkwood wrote:

On 16/01/15 16:06, Mark Kirkwood wrote:


A bit more poking about shows that the major factor (which this fake
dataset anyway) is the default for effective_cache_size (changes from
128MB to 4GB in 9.4). Increasing this makes 9.2 start using the
files_in_flight index in a plain index scan too.



Arrg - misread the planner output: in 9.2 what changes is a plan that
uses an index scan on the *file_state* index (not
files_in_flight)...which appears much faster than the bitmap scan on
file_state. Apologies for the confusion.

I'm thinking that I'm seeing the effect Tom has just mentioned.


It's not using a bitmapscan in either case; it's a straight indexscan.




Right, I suspect that bloating is possibly the significant factor then - 
can you REINDEX?
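
Worth noting that plain REINDEX takes a lock that blocks writes; on a busy 9.2 table a lower-impact alternative (a sketch, with index names assumed from the earlier mockup) is to rebuild concurrently:

```sql
-- Build a fresh, unbloated copy without blocking writes, then swap it in:
CREATE INDEX CONCURRENTLY file_in_flight_state_new ON files (state)
    WHERE state IN ('waiting', 'assigning', 'processing');
DROP INDEX file_in_flight_state;
ALTER INDEX file_in_flight_state_new RENAME TO file_in_flight_state;
```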


Cheers

Mark




Re: [PERFORM] Tuning the configuration

2014-12-11 Thread Mark Kirkwood

On 12/12/14 11:36, Eric Pierce wrote:



From: pgsql-performance-ow...@postgresql.org pgsql-performance-ow...@postgresql.org 
on behalf of Evgeniy Shishkin itparan...@gmail.com
Sent: Thursday, December 11, 2014 7:11 AM
To: Andrea Suisani
Cc: mfatticci...@mbigroup.it; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Tuning the configuration


On 11 Dec 2014, at 15:02, Andrea Suisani sick...@opinioni.net wrote:

On 12/10/2014 11:44 AM, Maila Fatticcioni wrote:

2- I would like to use the two SDD to store the wal file. Do you think
it is useful or how should I use them?


I definitely would give it a try.





I don't understand the logic behind using drives,
which are best for random io, for sequential io workloads.



Better use 10k sas with BBU raid for wal, money wise.


Very much agree with this.  Just because SSD is fast doesn't make it suited for 
certain things, and a streaming sequential 100% write workload is one of them.  
I've worked with everything from local disk to high-end SAN and even at the 
high end we've always put any DB logs on spinning disk.  RAID1 is generally 
sufficient.  SSD is king for read-heavy random I/O workloads.




Mind you wal is a little different - the limiting factor is (usually) 
not raw sequential speed but fsync latency. These days a modern SSD has 
fsync response pretty much equal to that of a card with BBU + spinners - 
and has more high speed storage available (cards usually have only 
1G or so of RAM on them).
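
One way to compare candidate WAL devices on this basis (assuming both are mounted and the pg_test_fsync utility is installed) is:

```
cd /mnt/ssd  && pg_test_fsync -s 5    # fsync latency on the SSD
cd /mnt/raid && pg_test_fsync -s 5    # vs the BBU RAID volume
```

The usecs/op figures for fdatasync/open_datasync are the numbers that matter for WAL commit latency.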



regards

Mark




Re: [PERFORM] 8xIntel S3500 SSD in RAID10 on Dell H710p

2014-12-10 Thread Mark Kirkwood

On 10/12/14 21:30, Strahinja Kustudić wrote:

On Wed, Dec 10, 2014 at 4:55 AM, Mark Kirkwood 
mark.kirkw...@catalyst.net.nz wrote:


That is interesting: I've done some testing on this type of card with 16
(slightly faster Hitachi) SSD attached. Setting WT and NORA should enable
the so-called 'fastpath' mode for the card [1]. We saw performance improve
markedly (300MB/s random write go to 1300MB/s).

This *might* be related to the fact that 16 SSD can put out more IOPS than
the card can actually handle - whereas your 8 S3500 is probably the perfect
number (e.g 8*11000 = 88000 which the card can handle ok).


[1] If you make the change while there are no outstanding background
operations (array rebuild etc) in progress (see
http://www.flagshiptech.com/eBay/Dell/poweredgeh310h710h810UsersGuide.pdf
).



I read that guide too, which is the reason why I tried with WT/NORA, but
the document also states:  NOTE: RAID 10, RAID 50, and RAID 60 virtual
disks cannot use FastPath. Which is a little odd, since usually if you
want performance with reliability, you go RAID10.

Do you have any suggestions what I could try to tweak to get more
performance?



We are using these configured as *individual* RAID0 drives that are 
then md raided in a (software) RAID 10 array. Maybe try that out (as 
fastpath only cares about the HW RAID setup).


Interestingly we were also seeing better performance on a fully HW RAID 
10 array with WT/NORA...so (I guess) our Hitachi SSDs probably have lower 
latency than the S3500 does.


Cheers

Mark




Re: [PERFORM] 8xIntel S3500 SSD in RAID10 on Dell H710p

2014-12-09 Thread Mark Kirkwood

On 10/12/14 12:28, Strahinja Kustudić wrote:


  * These tests are with the H710p controller set to write-back (WB) and
with adaptive read ahead (ADRA). I ran a few tests with
write-through (WT) and no read ahead (NORA), but the results were worse.


That is interesting: I've done some testing on this type of card with 16 
(slightly faster Hitachi) SSD attached. Setting WT and NORA should 
enable the so-called 'fastpath' mode for the card [1]. We saw 
performance improve markedly (300MB/s random write go to 1300MB/s).


This *might* be related to the fact that 16 SSD can put out more IOPS 
than the card can actually handle - whereas your 8 S3500 is probably the 
perfect number (e.g 8*11000 = 88000 which the card can handle ok).



[1] If you make the change while there are no outstanding background 
operations (array rebuild etc) in progress (see 
http://www.flagshiptech.com/eBay/Dell/poweredgeh310h710h810UsersGuide.pdf).


Cheers

Mark




Re: [PERFORM] pgtune + configurations with 9.3

2014-11-14 Thread Mark Kirkwood

On 15/11/14 06:06, Shaun Thomas wrote:

Alexey,

The issue is not that 8GB is the maximum. You *can* set it higher. What I'm 
saying, and I'm not alone in this, is that setting it higher can actually 
decrease performance for various reasons. Setting it to 25% of memory on a 
system with 512GB of RAM for instance, would be tantamount to disaster. A 
checkpoint with a setting that high could overwhelm pretty much any disk 
controller and end up  completely ruining DB performance. And that's just *one* 
of the drawbacks.



It is probably time to revisit this 8GB limit with some benchmarking. We 
don't really have a hard and fast rule that is known to be correct, and 
that makes Alexey's job really difficult. Informally folk (including 
myself at times) have suggested:


min(ram/4, 8GB)

as the 'rule of thumb' for setting shared_buffers. However I was 
recently benchmarking a machine with a lot of ram (1TB) and entirely SSD 
storage [1], and that seemed quite happy with 50GB of shared buffers 
(better performance than with 8GB). Now shared_buffers was not the 
variable we were concentrating on so I didn't get too carried away and 
try much bigger than about 100GB - but this seems like a good thing to 
come out with some numbers for, i.e. pgbench read-write and read-only tps 
vs shared_buffers 1 - 100 GB in size.
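
As a postgresql.conf fragment, the informal rule reads something like this (illustrative values only, not a recommendation):

```
# informal 'rule of thumb': shared_buffers = min(RAM/4, 8GB)
# on a 16GB box:
shared_buffers = 4GB
# on a 64GB box the 8GB cap kicks in:
shared_buffers = 8GB
```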


Cheers

Mark

[1] I may be in a position to benchmark the machines these replaced at 
some not too distant time. These are the previous generation (0.5TB ram, 
32 cores and all SSD storage) but probably still good for this test.






Re: [PERFORM] pgtune + configurations with 9.3

2014-11-14 Thread Mark Kirkwood

On 15/11/14 15:08, Jim Nasby wrote:

On 11/14/14, 5:00 PM, Mark Kirkwood wrote:


as the 'rule of thumb' for setting shared_buffers. However I was
recently benchmarking a machine with a lot of ram (1TB) and entirely
SSD storage [1], and that seemed quite happy with 50GB of shared
buffers (better performance than with 8GB). Now shared_buffers was not
the variable we were concentrating on so I didn't get too carried away
and try much bigger than about 100GB - but this seems like a good
thing to come out with some numbers for i.e pgbench read write and
read only tps vs shared_buffers 1 - 100 GB in size.


What PG version?

One of the huge issues with large shared_buffers is the immense overhead
you end up with for running the clock sweep, and on most systems that
overhead is born by every backend individually. You will only see that
overhead if your database is larger than shared bufers, because you only
pay it when you need to evict a buffer. I suspect you'd actually need a
database at least 2x  shared_buffers for it to really start showing up.



That was 9.4 beta1 and 2.

A variety of db sizes were tried, some just fitting inside 
shared_buffers and some a bit over 2x larger, and one variant where we 
sized the db to 600GB, and used 4, 8 and 50GB shared_buffers (50 was the 
best by a small margin...and certainly no worse).


Now we were mainly looking at 60 core performance issues (see the thread 
"60 core performance with 9.3"), and possibly some detrimental effects of 
larger shared_buffers may have been masked by this - but performance was 
certainly not hurt with larger shared_buffers.


regards

Mark





Re: [PERFORM] postgres 9.3 vs. 9.4

2014-09-24 Thread Mark Kirkwood

On 24/09/14 21:23, Mkrtchyan, Tigran wrote:

Hi Merlin et al.

after building postgres 9.4 myself from sources I get the same performance as
with 9.3. The difference was in the value of debug_assertions setting.

Now the next step. Why does my 3-year-old laptop get 1.8x more tps than my 
one-month-old server?
And Mark Kirkwood's desktop gets 2x more tps as well? Is there some 
special optimization
for i7 which does not work with Intel(R) Xeon(R) CPU E5-2660?




Yes - firstly, nicely done re finding the assertions (my 9.4 beta2 was 
built from src - never thought to mention sorry)!


I'd guess that you are seeing some bios setting re the p320 SSD - it 
*should* be seriously fast...but does not seem to be. You could try 
running some pure IO benchmarks to confirm this (e.g. fio). Also see if 
the manual for however it is attached to the system allows for some 
optimized-for-ssd settings that tend to work better (although these usually 
imply the drive is plugged into an adapter card of some kind - mind you 
your p320 *does* use a custom connector that does 2.5" SATA to PCIe 
style interconnect, so I'd look to debug that first).
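
Given that the earlier 2x slowdown turned out to be debug_assertions, a quick sanity check before benchmarking any self-built server is worthwhile:

```sql
SHOW debug_assertions;   -- should read 'off' on a build meant for benchmarking
```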


Cheers

Mark





Re: [PERFORM] postgres 9.3 vs. 9.4

2014-09-24 Thread Mark Kirkwood

On 25/09/14 01:03, Mkrtchyan, Tigran wrote:


With pg_test_timing I can see, that overhead is 48 nsec on my server and 32 
nsec on the laptop.
what makes this difference and have it any influence on the overall performance?



Hmm - 22 nsec for my workstation, so while it could be a factor, your 
laptop and my workstation performed the pgbench about the same, so I'd 
look elsewhere - in particular sync IO performance:



$ cd where ssd mounted
$ pg_test_fsync
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync             140.231 ops/sec    7131 usecs/op
        fdatasync                 138.159 ops/sec    7238 usecs/op
        fsync                     137.680 ops/sec    7263 usecs/op
        fsync_writethrough                    n/a
        open_sync                 137.202 ops/sec    7289 usecs/op

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync              68.832 ops/sec   14528 usecs/op
        fdatasync                 135.994 ops/sec    7353 usecs/op
        fsync                     137.454 ops/sec    7275 usecs/op
        fsync_writethrough                    n/a
        open_sync                  69.092 ops/sec   14473 usecs/op

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB
in different write open_sync sizes.)
         1 * 16kB open_sync write        136.904 ops/sec    7304 usecs/op
         2 *  8kB open_sync writes        68.857 ops/sec   14523 usecs/op
         4 *  4kB open_sync writes        34.744 ops/sec   28782 usecs/op
         8 *  2kB open_sync writes       write failed: Invalid argument





Re: [PERFORM] postgres 9.3 vs. 9.4

2014-09-19 Thread Mark Kirkwood

On 19/09/14 17:53, Mkrtchyan, Tigran wrote:



- Original Message -

From: Mark Kirkwood mark.kirkw...@catalyst.net.nz



Further to the confusion, here's my 9.3 vs 9.4 on two M550 (one for 9.3
one for 9.4), see below for results.

I'm running xfs on them with trim/discard enabled:

$ mount|grep pg
/dev/sdd4 on /mnt/pg94 type xfs (rw,discard)
/dev/sdc4 on /mnt/pg93 type xfs (rw,discard)


I'm *not* seeing any significant difference between 9.3 and 9.4, and the
numbers are both about 2x your best number, which is food for thought
(those P320's should toast my M550 for write performance...).


cool! any details on OS and other options? I still get the same numbers
as before.



Sorry, Ubuntu 14.04 on a single socket i7 3.4 GHz, 16G (i.e. my workstation).

I saw the suggestion that Didier made to run 9.3 on the SSD that you 
were using for 9.4, and see if it suddenly goes slow - then we'd know 
it's something about the disk (or filesystem/mount options). Can you 
test this?


Cheers

Mark




Re: [PERFORM] postgres 9.3 vs. 9.4

2014-09-19 Thread Mark Kirkwood

On 19/09/14 19:24, Mkrtchyan, Tigran wrote:



- Original Message -

From: Mark Kirkwood mark.kirkw...@catalyst.net.nz
To: Tigran Mkrtchyan tigran.mkrtch...@desy.de
Cc: Merlin Moncure mmonc...@gmail.com, postgres performance list 
pgsql-performance@postgresql.org
Sent: Friday, September 19, 2014 8:26:27 AM
Subject: Re: [PERFORM] postgres 9.3 vs. 9.4

On 19/09/14 17:53, Mkrtchyan, Tigran wrote:



- Original Message -

From: Mark Kirkwood mark.kirkw...@catalyst.net.nz



Further to the confusion, here's my 9.3 vs 9.4 on two M550 (one for 9.3
one for 9.4), see below for results.

I'm running xfs on them with trim/discard enabled:

$ mount|grep pg
/dev/sdd4 on /mnt/pg94 type xfs (rw,discard)
/dev/sdc4 on /mnt/pg93 type xfs (rw,discard)


I'm *not* seeing any significant difference between 9.3 and 9.4, and the
numbers are both about 2x your best number, which is food for thought
(those P320's should toast my M550 for write performance...).


cool! any details on OS and other options? I still get the same numbers
as before.



Sorry, Ubuntu 14.04 on a single socket i7 3.4 Ghz, 16G (i.e my workstation).

I saw the suggestion that Didier made to run 9.3 on the SSD that you
were using for 9.4, and see if it suddenly goes slow - then we'd know
it's something about the disk (or filesystem/mount options). Can you
test this?



swapping the disks did not change the results.
Nevertheless, I ran the same test on my fedora20 laptop
(8GB RAM, i7 2.2GHz) and got 2600 tps! I am totally
confused now! Is it kernel version? libc?




Well, that's progress anyway!

I guess you could try fedora 20 on the Dell server and see if that makes 
any difference. But yes, confusing. Having been dealing with a high end 
Dell server myself recently (R920), some re-reading of any manuals you 
can find might be useful; we were continually surprised how easy it was 
to have everything configured *slow*... and the detail in the 
manuals...could be better!


Cheers

Mark





Re: [PERFORM] postgres 9.3 vs. 9.4

2014-09-19 Thread Mark Kirkwood

On 19/09/14 19:24, Mkrtchyan, Tigran wrote:



- Original Message -

From: Mark Kirkwood mark.kirkw...@catalyst.net.nz
To: Tigran Mkrtchyan tigran.mkrtch...@desy.de
Cc: Merlin Moncure mmonc...@gmail.com, postgres performance list 
pgsql-performance@postgresql.org
Sent: Friday, September 19, 2014 8:26:27 AM
Subject: Re: [PERFORM] postgres 9.3 vs. 9.4

On 19/09/14 17:53, Mkrtchyan, Tigran wrote:



- Original Message -

From: Mark Kirkwood mark.kirkw...@catalyst.net.nz



Further to the confusion, here's my 9.3 vs 9.4 on two M550 (one for 9.3
one for 9.4), see below for results.

I'm running xfs on them with trim/discard enabled:

$ mount|grep pg
/dev/sdd4 on /mnt/pg94 type xfs (rw,discard)
/dev/sdc4 on /mnt/pg93 type xfs (rw,discard)


I'm *not* seeing any significant difference between 9.3 and 9.4, and the
numbers are both about 2x your best number, which is food for thought
(those P320's should toast my M550 for write performance...).


cool! any details on OS and other options? I still get the same numbers
as before.



Sorry, Ubuntu 14.04 on a single socket i7 3.4 Ghz, 16G (i.e my workstation).

I saw the suggestion that Didier made to run 9.3 on the SSD that you
were using for 9.4, and see if it suddenly goes slow - then we'd know
it's something about the disk (or filesystem/mount options). Can you
test this?



swapping the disks did not change the results.




Do you mean that 9.3 was still faster using the disk that 9.4 had used? 
If so that strongly suggests that there is something you have configured 
differently in the 9.4 installation [1]. Not wanting to sound mean - but 
it is really easy to accidentally connect to the wrong instance when 
there are two on the same box (ahem, yes, done it myself). So perhaps 
take another look at the 9.4 vs 9.3 setup (or even post the config files 
postgresql.conf + postgresql.auto.conf for 9.4 here).
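
A quick way to compare two installs for accidental differences (works on both 9.3 and 9.4) is to list every setting changed from its built-in default:

```sql
-- Run on each server and diff the two outputs
SELECT name, setting, source
FROM pg_settings
WHERE source NOT IN ('default', 'override')
ORDER BY name;
```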


Regards

Mark

[1] In the light of my previous test of (essentially) your config + 
numerous other folk have been benchmarking 9.4.






Re: [PERFORM] postgres 9.3 vs. 9.4

2014-09-18 Thread Mark Kirkwood

On 18/09/14 21:58, Mkrtchyan, Tigran wrote:



Hi Folk,

I am trying to investigate some performance issues which we have with postgres
(a different topic by itself) and tried postgres.9.4beta2, with a hope that it
perform better.

Turned out that 9.4 is 2x slower than 9.3.5 on the same hardware.

Some technical details:

   Host: rhel 6.5 2.6.32-431.23.3.el6.x86_64
   256 GB RAM, 40 cores, Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
   2x160GB  PCIe SSD DELL_P320h-MTFDGAL175SAH (9.3 on one, 9.4 on the other)

postgres tweaks:


default_statistics_target = 100
wal_writer_delay = 10s
vacuum_cost_delay = 50
synchronous_commit = off
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.9
effective_cache_size = 94GB
work_mem = 402MB
wal_buffers = 16MB
checkpoint_segments = 64
shared_buffers = 8GB
max_connections = 100
random_page_cost = 1.5
# other goodies
log_line_prefix = '%m %d %u %r %%'
log_temp_files = 0
log_min_duration_statement = 5

in both cases databases are fresh - no data.

Here is a results with pgbench.


9.3.5:

# /usr/pgsql-9.3/bin/pgbench -r -j 1 -c 1 -T 60
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 96361
tps = 1605.972262 (including connections establishing)
tps = 1606.064501 (excluding connections establishing)
statement latencies in milliseconds:
0.001391  \set nbranches 1 * :scale
0.000473  \set ntellers 10 * :scale
0.000430  \set naccounts 10 * :scale
0.000533  \setrandom aid 1 :naccounts
0.000393  \setrandom bid 1 :nbranches
0.000468  \setrandom tid 1 :ntellers
0.000447  \setrandom delta -5000 5000
0.025161  BEGIN;
0.131317  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.100211  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
0.117406  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
0.114332  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.086660  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
0.035940  END;


9.4beta2:

# /usr/pgsql-9.3/bin/pgbench -r -j 1 -c 1 -T 60
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 34017
tps = 566.948384 (including connections establishing)
tps = 567.008666 (excluding connections establishing)
statement latencies in milliseconds:
0.001879  \set nbranches 1 * :scale
0.000526  \set ntellers 10 * :scale
0.000490  \set naccounts 10 * :scale
0.000595  \setrandom aid 1 :naccounts
0.000421  \setrandom bid 1 :nbranches
0.000480  \setrandom tid 1 :ntellers
0.000484  \setrandom delta -5000 5000
0.055047  BEGIN;
0.172179  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.135392  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
0.157224  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
0.147969  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.123001  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
0.957854  END;

any ideas?



Hi Tigran,

Some ideas:

60s is too short for reliable results (the default checkpoint timeout is
300s, so 600s is the typical elapsed time needed to get reasonably repeatable
numbers, i.e. to ensure you get about one checkpoint in your run). In addition
I usually do


psql <<!
CHECKPOINT;
!

Plus

$ sleep 10

before each run so that I've got some confidence that we are starting 
from approximately the same state each time (and getting hopefully only 
*one* checkpoint per run)!
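
The discipline above can be wrapped in a small driver script. This sketch defaults to a dry run (it only echoes the commands, so it is safe to run anywhere); clear DRYRUN to execute against a real server. The database name and run count are illustrative:

```shell
# Dry-run sketch of the run discipline: CHECKPOINT, settle for 10s, then a
# timed pgbench run -- repeated so each run starts from a similar state.
# With DRYRUN unset this would need a live server and a 'bench' database.
DRYRUN=${DRYRUN:-echo}
DB=bench
LOG=$(mktemp)
for run in 1 2 3; do
    $DRYRUN psql -d "$DB" -c 'CHECKPOINT;'
    $DRYRUN sleep 10
    $DRYRUN pgbench -r -j 1 -c 1 -T 600 "$DB"
done | tee "$LOG"
```

In dry-run mode the script just prints the three checkpoint/sleep/pgbench cycles it would execute.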


Cheers

Mark


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] postgres 9.3 vs. 9.4

2014-09-18 Thread Mark Kirkwood

On 19/09/14 08:32, Merlin Moncure wrote:

On Thu, Sep 18, 2014 at 4:58 AM, Mkrtchyan, Tigran
tigran.mkrtch...@desy.de wrote:


9.3.5:
 0.035940END;


9.4beta2:
 0.957854END;



time being spent on 'END' is definitely suggesting i/o related issues.
This is making me very skeptical that postgres is the source of the
problem.   I also think synchronous_commit is not set properly on the
new instance (or possibly there is a bug or some such).  Can you
verify via:

select * from pg_settings where name = 'synchronous_commit';

on both servers?



Yes, does look suspicious. It *could* be that the 9.4 case is getting 
unlucky and checkpointing just before the end of the 60s run, and 9.3 
isn't.



What is iowait?  For pci-e SSD, these drives don't seem very fast...





These look like rebranded Micron P320's and should be extremely 
fast...However I note that my Crucial/Micron M550's are very fast for 
most writes *but* are much slower for sync writes (and fsync) that 
happen at commit...


Cheers

Mark


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] postgres 9.3 vs. 9.4

2014-09-18 Thread Mark Kirkwood

On 19/09/14 09:10, Mkrtchyan, Tigran wrote:



- Original Message -

From: Mark Kirkwood mark.kirkw...@catalyst.net.nz
To: Merlin Moncure mmonc...@gmail.com, Tigran Mkrtchyan 
tigran.mkrtch...@desy.de
Cc: postgres performance list pgsql-performance@postgresql.org
Sent: Thursday, September 18, 2014 10:56:36 PM
Subject: Re: [PERFORM] postgres 9.3 vs. 9.4

On 19/09/14 08:32, Merlin Moncure wrote:

On Thu, Sep 18, 2014 at 4:58 AM, Mkrtchyan, Tigran
tigran.mkrtch...@desy.de wrote:


9.3.5:
  0.035940END;


9.4beta2:
  0.957854END;



time being spent on 'END' is definitely suggesting i/o related issues.
This is making me very skeptical that postgres is the source of the
problem.   I also think synchronous_commit is not set properly on the
new instance (or possibly there is a bug or some such).  Can you
verify via:

select * from pg_settings where name = 'synchronous_commit';

on both servers?



Yes, does look suspicious. It *could* be that the 9.4 case is getting
unlucky and checkpointing just before the end of the 60s run, and 9.3
isn't.


10 minutes run had the same results.

Is there some kind of statistics which can tell where the time is spent?
Or is the only way to run on Solaris with dtrace? For me it's more important
to find out why I get only 1500 tps with 9.3. The test with 9.4 was just a
hope for a magic code change that would give me better performance.




Interesting. With respect to dtrace, you can use systemtap on Linux to 
achieve similar things.


However before getting too carried away with that - we already *know* 
that 9.4 is spending longer in END (i.e commit) than 9.3 is. I'd 
recommend you see what wal_sync_method is set to on both systems. If it 
is the same, then my suspicion is that one of the SSD's needs to be 
trimmed [1]. You can do this by running:


$ fstrim /mountpoint

Also - are you using the same filesystem and mount options on each SSD?
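
The mount-options question can be scripted too. This sketch runs an awk pass over a canned mount table (the devices and mount points are made up; on a real box you would read /proc/mounts directly) and reports filesystems missing the discard option:

```shell
# Sketch: flag mounted filesystems that lack the 'discard' mount option.
# The sample table below is illustrative; substitute /proc/mounts for real use.
mounts=$(mktemp)
cat > "$mounts" <<'EOF'
/dev/sdd4 /mnt/pg94 xfs rw,discard 0 0
/dev/sdc4 /mnt/pg93 xfs rw,noatime 0 0
EOF
# Field 4 is the comma-separated option list; match 'discard' as a whole token.
missing=$(awk '$4 !~ /(^|,)discard(,|$)/ { print $2 }' "$mounts")
echo "missing discard: $missing"
```

With the sample data only /mnt/pg93 is flagged, which is exactly the kind of asymmetry that would explain one SSD degrading over a run.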

Cheers

Mark

[1] in fact, for the paranoid - I usually secure erase any SSD before 
performance testing, and then check the SMART counters too...




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] postgres 9.3 vs. 9.4

2014-09-18 Thread Mark Kirkwood

On 19/09/14 10:16, Mark Kirkwood wrote:

On 19/09/14 09:10, Mkrtchyan, Tigran wrote:



- Original Message -

From: Mark Kirkwood mark.kirkw...@catalyst.net.nz
To: Merlin Moncure mmonc...@gmail.com, Tigran Mkrtchyan
tigran.mkrtch...@desy.de
Cc: postgres performance list pgsql-performance@postgresql.org
Sent: Thursday, September 18, 2014 10:56:36 PM
Subject: Re: [PERFORM] postgres 9.3 vs. 9.4

On 19/09/14 08:32, Merlin Moncure wrote:

On Thu, Sep 18, 2014 at 4:58 AM, Mkrtchyan, Tigran
tigran.mkrtch...@desy.de wrote:


9.3.5:
  0.035940END;


9.4beta2:
  0.957854END;



time being spent on 'END' is definitely suggesting i/o related issues.
This is making me very skeptical that postgres is the source of the
problem.   I also think synchronous_commit is not set properly on the
new instance (or possibly there is a bug or some such).  Can you
verify via:

select * from pg_settings where name = 'synchronous_commit';

on both servers?



Yes, does look suspicious. It *could* be that the 9.4 case is getting
unlucky and checkpointing just before the end of the 60s run, and 9.3
isn't.


10 minutes run had the same results.

Is there some kind of statistics which can tell where the time is spent?
Or is the only way to run on Solaris with dtrace? For me it's more important
to find out why I get only 1500 tps with 9.3. The test with 9.4 was just a
hope for a magic code change that would give me better performance.




Interesting. With respect to dtrace, you can use systemtap on Linux to
achieve similar things.

However before getting too carried away with that - we already *know*
that 9.4 is spending longer in END (i.e commit) than 9.3 is. I'd
recommend you see what wal_sync_method is set to on both systems. If it
is the same, then my suspicion is that one of the SSD's needs to be
trimmed [1]. You can do this by running:

$ fstrim /mountpoint

Also - are you using the same filesystem and mount options on each SSD?

Cheers

Mark

[1] in fact, for the paranoid - I usually secure erase any SSD before
performance testing, and then check the SMART counters too...



Further to the confusion, here's my 9.3 vs 9.4 on two M550 (one for 9.3 
one for 9.4), see below for results.


I'm running xfs on them with trim/discard enabled:

$ mount|grep pg
/dev/sdd4 on /mnt/pg94 type xfs (rw,discard)
/dev/sdc4 on /mnt/pg93 type xfs (rw,discard)


I'm *not* seeing any significant difference between 9.3 and 9.4, and the 
numbers are both about 2x your best number, which is food for thought 
(those P320's should toast my M550 for write performance...).



9.3:

$ pgbench -r -j 1 -c 1 -T 60 bench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 194615
tps = 3243.567115 (including connections establishing)
tps = 3243.771688 (excluding connections establishing)
statement latencies in milliseconds:
0.000798  \set nbranches 1 * :scale
0.000302  \set ntellers 10 * :scale
0.000276  \set naccounts 10 * :scale
0.000330  \setrandom aid 1 :naccounts
0.000265  \setrandom bid 1 :nbranches
0.000278  \setrandom tid 1 :ntellers
0.000298  \setrandom delta -5000 5000
0.012818  BEGIN;
0.065403  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.048516  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
0.058343  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
0.057763  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.043293  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
0.017087  END;


9.4:

$ pgbench -r -j 1 -c 1 -T 60 bench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 194130
latency average: 0.309 ms
tps = 3235.488190 (including connections establishing)
tps = 3235.560235 (excluding connections establishing)
statement latencies in milliseconds:
0.000460  \set nbranches 1 * :scale
0.000231  \set ntellers 10 * :scale
0.000224  \set naccounts 10 * :scale
0.000258  \setrandom aid 1 :naccounts
0.000252  \setrandom bid 1 :nbranches
0.000266  \setrandom tid 1 :ntellers
0.000272  \setrandom delta -5000 5000
0.011724  BEGIN;
0.083750  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.045553  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
0.054412  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
0.053371  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;

Re: [PERFORM] tuning postgresql 9.3.5 and multiple cores

2014-08-26 Thread Mark Kirkwood

On 26/08/14 06:47, Jeison Bedoya Delgado wrote:

hi, recently I changed the hardware of my database server from 32 cores up
to 64 cores and 128GB RAM, but the performance is the same. Perhaps I have
to change some parameter in postgresql.conf?



In addition to the points that others have made, even if you do have 
32 active sessions it is not clear that 64 cores will automagically get 
you twice the performance of 32 (or indeed any improvement at all). We are seeing 
exactly this effect with a (60 core) machine that gets pretty much the 
same performance as an older generation 32 core one.


Interestingly while this is *likely* a software issue - it is not 
immediately obvious where it lies - we tested Postgres (9.3/9.4/9.5) and 
Mysql (5.5/5.6/5.7) *all* of which exhibited the same lack of improvement 
with more cores.


Profiling suggested numa effects - but trying to eliminate these seemed 
to simply throw up new factors to inhibit performance. My *guess* (and 
it is a guess) is that we are seeing 2 (perhaps more) performance 
bottlenecks very close to each other: numa and spinlock contention at least.


Regards

Mark




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Turn off Hyperthreading! WAS: 60 core performance with 9.3

2014-08-26 Thread Mark Kirkwood

On 26/08/14 10:13, Josh Berkus wrote:

On 08/22/2014 07:02 AM, Andres Freund wrote:

On 2014-08-21 14:02:26 -0700, Josh Berkus wrote:

On 08/20/2014 07:40 PM, Bruce Momjian wrote:

Not sure how you can make such a blanket statement when so many people
have tested and shown the benefits of hyper-threading.


Actually, I don't know that anyone has posted the benefits of HT.
Link?


There's definitely cases where it can help. But it's highly workload
*and* hardware dependent.


The only cases I've seen where HT can be beneficial is when you have
large numbers of idle connections.  Then the idle connections can be
parked on the HT virtual cores.  However, even in this case I haven't
seen a head-to-head performance comparison.



I've just had a pair of Crucial m550's arrive, so a bit of benchmarking 
is in order. The results (below) seem to suggest that HT enabled is 
certainly not inhibiting scaling performance for single socket i7's. I 
performed several runs (typical results shown below).


Intel i7-4770 3.4 Ghz, 16G
2x Crucial m550
Ubuntu 14.04
Postgres 9.4 beta2

logging_collector = on
max_connections = 600
shared_buffers = 1GB
wal_buffers = 32MB
checkpoint_segments = 128
effective_cache_size = 10GB

pgbench scale = 300
test duration (each) = 600s

db on 1x m550
xlog on 1x m550

clients |  tps (HT)|  tps (no HT)
+--+-
4   |  517 |  520
8   | 1013 |  999
16  | 1938 | 1913
32  | 3574 | 3560
64  | 5873 | 5412
128 | 8351 | 7450
256 | 9426 | 7840
512 | 9357 | 7288
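
A quick arithmetic pass over the table above makes the trend explicit; the tps values are copied straight from the runs, and nothing below touches a database:

```shell
# Compute tps(HT) / tps(no HT) for each client count from the table above.
ratios=$(awk 'BEGIN {
    split("4 8 16 32 64 128 256 512", c)
    split("517 1013 1938 3574 5873 8351 9426 9357", ht)
    split("520 999 1913 3560 5412 7450 7840 7288", noht)
    for (i = 1; i <= 8; i++)
        printf "%4d clients: HT/noHT = %.2f\n", c[i], ht[i] / noht[i]
}')
echo "$ratios"
```

The ratio sits at ~1.0 up to 32 clients and climbs to ~1.28 at 256-512 clients, i.e. HT only starts paying off once clients exceed the physical core count.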


Regards

Mark


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Turn off Hyperthreading! WAS: 60 core performance with 9.3

2014-08-25 Thread Mark Kirkwood

On 26/08/14 10:13, Josh Berkus wrote:

On 08/22/2014 07:02 AM, Andres Freund wrote:

On 2014-08-21 14:02:26 -0700, Josh Berkus wrote:

On 08/20/2014 07:40 PM, Bruce Momjian wrote:

Not sure how you can make such a blanket statement when so many people
have tested and shown the benefits of hyper-threading.


Actually, I don't know that anyone has posted the benefits of HT.
Link?


There's definitely cases where it can help. But it's highly workload
*and* hardware dependent.


The only cases I've seen where HT can be beneficial is when you have
large numbers of idle connections.  Then the idle connections can be
parked on the HT virtual cores.  However, even in this case I haven't
seen a head-to-head performance comparison.



I recall HT beneficial on a single socket (i3 or i7), using pgbench as 
the measuring tool. However I didn't save the results at the time. I've 
just got some new ssd's to play with so might run some pgbench tests on 
my home machine (Haswell i7) with HT on and off.


Regards

Mark


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Turn off Hyperthreading! WAS: 60 core performance with 9.3

2014-08-21 Thread Mark Kirkwood

On 21/08/14 11:14, Mark Kirkwood wrote:


You didn't mention what
cpu this is for (or how many sockets etc), would be useful to know.



Just to clarify - while you mentioned that the production system was 40 
cores, it wasn't immediately obvious that the same system was the source 
of the measurements you posted...sorry if I'm being a mixture of 
pedantic and dense - just trying to make sure it is clear what 
systems/cpus etc we are talking about (with this in mind it never hurts 
to quote cpu and mobo model numbers)!


Cheers

Mark




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Turn off Hyperthreading! WAS: 60 core performance with 9.3

2014-08-21 Thread Mark Kirkwood

On 22/08/14 11:29, Josh Berkus wrote:

On 08/21/2014 04:08 PM, Steve Crawford wrote:

On 08/21/2014 03:51 PM, Josh Berkus wrote:

On 08/21/2014 02:26 PM, Scott Marlowe wrote:

I'm running almost the exact same setup in production as a spare. It
has 4 of those CPUs, 256G RAM, and is currently set to use HT. Since
it's a spare node I might be able to do some testing on it as well.
It's running a 3.2 kernel right now. I could probably get a later
model kernel on it even.

You know about the IO performance issues with 3.2, yes?


Were those 3.2 only and since fixed or are there issues persisting in
3.2+? The 12.04 LTS release of Ubuntu Server was 3.2 but the 14.04 is 3.13.


The issues I know of were fixed in 3.9.



There is a 3.11 kernel series for Ubuntu 12.04 Precise.

Regards

Mark


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Turn off Hyperthreading! WAS: 60 core performance with 9.3

2014-08-20 Thread Mark Kirkwood

On 21/08/14 07:13, Josh Berkus wrote:

Mark, all:

So, this is pretty damning:

Read-only test with HT ON:

[pgtest@db ~]$ pgbench -c 20 -j 4 -T 600 -S bench
starting vacuum...end.
transaction type: SELECT only
scaling factor: 30
query mode: simple
number of clients: 20
number of threads: 4
duration: 600 s
number of transactions actually processed: 47167533
tps = 78612.471802 (including connections establishing)
tps = 78614.604352 (excluding connections establishing)

Read-only test with HT Off:

[pgtest@db ~]$ pgbench -c 20 -j 4 -T 600 -S bench
starting vacuum...end.
transaction type: SELECT only
scaling factor: 30
query mode: simple
number of clients: 20
number of threads: 4
duration: 600 s
number of transactions actually processed: 82457739
tps = 137429.508196 (including connections establishing)
tps = 137432.893796 (excluding connections establishing)


On a read-write test, it's 10% faster with HT off as well.

Further, from their production machine we've seen that having HT on
causes the machine to slow down by 5X whenever you get more than 40
cores (as in 100% of real cores or 50% of HT cores) worth of activity.

So we're definitely back to If you're using PostgreSQL, turn off
Hyperthreading.




Hmm - that is interesting - I don't think we compared read only scaling 
for hyperthreading on and off (only read write). You didn't mention what 
cpu this is for (or how many sockets etc), would be useful to know.


Notwithstanding the above results, my workmate Matt made an interesting 
observation: the scaling graph for (our) 60 core box (HT off), looks 
just like the one for our 32 core box with HT *on*.


We are wondering if a lot of the previous analysis of HT performance 
regressions should actually be reevaluated in the light of ...err is it 
just that we have a lot more cores...? [1]


Regards

Mark

[1] Particularly as in *some* cases (single socket i7 for instance) HT 
on seems to scale fine.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 60 core performance with 9.3

2014-08-14 Thread Mark Kirkwood

On 15/08/14 06:18, Josh Berkus wrote:

Mark,

Is the 60-core machine using some of the Intel chips which have 20
hyperthreaded virtual cores?

If so, I've been seeing some performance issues on these processors.
I'm currently doing a side-by-side hyperthreading on/off test.



Hi Josh,

The board has 4 sockets with E7-4890 v2 cpus. They have 15 cores/30 
threads. We've running with hyperthreading off (noticed the usual 
steep/sudden scaling dropoff with it on).


What model are your 20-core cpus?

Cheers

Mark






--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 60 core performance with 9.3

2014-08-11 Thread Mark Kirkwood

On 01/08/14 09:38, Alvaro Herrera wrote:

Matt Clarkson wrote:


The LWLOCK_STATS below suggest that ProcArrayLock might be the main
source of locking that's causing throughput to take a dive as the client
count increases beyond the core count.



Any thoughts or comments on these results are welcome!


Do these results change if you use Heikki's patch for CSN-based
snapshots?  See
http://www.postgresql.org/message-id/539ad153.9000...@vmware.com for the
patch (but note that you need to apply on top of 89cf2d52030 in the
master branch -- maybe it applies to HEAD the 9.4 branch but I didn't
try).



Hi Alvaro,

Applying the CSN patch on top of the rwlock + numa in 9.4 (bit of a 
patch-fest we have here now) shows modest improvement at highest client 
number (but appears to hurt performance in the mid range):


 clients |  tps
-+
6|  8445
12   | 14548
24   | 20043
48   | 27451
96   | 27718
192  | 23614
384  | 24737


Initial runs were quite disappointing, until we moved the csnlog 
directory onto the same filesystem that the xlogs are on (PCIe SSD). We 
could potentially look at locating them on their own separate volume if 
that makes sense.


Adding in LWLOCK stats again shows quite a different picture from the 
previous:


48 clients

  Lock  |Blk   | SpinDelay | Blk % | SpinDelay %
+--+---+---+-
WALWriteLock| 25426001 | 1239  | 62.227442 | 14.373550
CLogControlLock |  1793739 | 1376  |  4.389986 | 15.962877
ProcArrayLock   |  1007765 | 1305  |  2.466398 | 15.139211
CSNLogControlLock   |  609556  | 1722  |  1.491824 | 19.976798
WALInsertLocks 4|  994170  |  247  |  2.433126 |  2.865429
WALInsertLocks 7|  983497  |  243  |  2.407005 |  2.819026
WALInsertLocks 5|  993068  |  239  |  2.430429 |  2.772622
WALInsertLocks 3|  991446  |  229  |  2.426459 |  2.656613
WALInsertLocks 0|  964185  |  235  |  2.359741 |  2.726218
WALInsertLocks 1|  995237  |  221  |  2.435737 |  2.563805
WALInsertLocks 2|  997593  |  213  |  2.441503 |  2.470998
WALInsertLocks 6|  978178  |  201  |  2.393987 |  2.331787
BufFreelistLock |  887194  |  206  |  2.171313 |  2.389791
XidGenLock  |  327385  |  366  |  0.801240 |  4.245940
CheckpointerCommLock|  104754  |  151  |  0.256374 |  1.751740
WALBufMappingLock   |  274226  |7  |  0.671139 |  0.081206


96 clients

  Lock  |Blk   | SpinDelay | Blk % | SpinDelay %
+--+---+---+-
WALWriteLock| 30097625 |  9616 | 48.550747 | 19.068393
CLogControlLock |  3193429 | 13490 | 5.151349  | 26.750481
ProcArrayLock   |  2007103 | 11754 | 3.237676  | 23.308017
CSNLogControlLock   |  1303172 |  5022 | 2.102158  |  9.958556
BufFreelistLock |  1921625 |  1977 | 3.099790  |  3.920363
WALInsertLocks 0|  2011855 |   681 | 3.245341  |  1.350413
WALInsertLocks 5|  1829266 |   627 | 2.950805  |  1.243332
WALInsertLocks 7|  1806966 |   632 | 2.914833  |  1.253247
WALInsertLocks 4|  1847372 |   591 | 2.980012  |  1.171945
WALInsertLocks 1|  1948553 |   557 | 3.143228  |  1.104523
WALInsertLocks 6|  1818717 |   582 | 2.933789  |  1.154098
WALInsertLocks 3|  1873964 |   552 | 3.022908  |  1.094608
WALInsertLocks 2|  1912007 |   523 | 3.084276  |  1.037102
XidGenLock  |   512521 |   699 | 0.826752  |  1.386107
CheckpointerCommLock|   386853 |   711 | 0.624036  |  1.409903
WALBufMappingLock   |   546462 |65 | 0.881503  |  0.128894


384 clients

  Lock  |Blk   | SpinDelay | Blk % | SpinDelay %
+--+---+---+-
WALWriteLock| 20703796 |  87265| 27.749961 | 15.360068
CLogControlLock |  3273136 | 122616|  4.387089 | 21.582422
ProcArrayLock   |  3969918 | 100730|  5.321008 | 17.730128
CSNLogControlLock   |  3191989 | 115068|  4.278325 | 20.253851
BufFreelistLock |  2014218 |  27952|  2.699721 |  4.920009
WALInsertLocks 0|  2750082 |   5438|  3.686023 |  0.957177
WALInsertLocks 1|  2584155 |   5312|  3.463626 |  0.934999
WALInsertLocks 2|  2477782 |   5497|  3.321051 |  0.967562
WALInsertLocks 4|  2375977 |   5441|  3.184598 |  0.957705
WALInsertLocks 5|  2349769 |   5458|  3.149471 |  0.960697
WALInsertLocks 6|  2329982 |   5367|  3.122950 |  0.944680
WALInsertLocks 3|  2415965 |   4771|  3.238195 |  0.839774
WALInsertLocks 7|  2316144 |   4930|  3.104402 |  0.867761
CheckpointerCommLock|   584419 |  10794|  0.783316 |  1.899921
XidGenLock  |   391212 |  

Re: [PERFORM] 60 core performance with 9.3

2014-07-30 Thread Mark Kirkwood

Hi Tomas,

Unfortunately I think you are mistaken - disabling the stats collector 
(i.e. track_counts = off) means that autovacuum has no idea about 
when/if it needs to start a worker (as it uses those counts to decide), 
and hence you lose all automatic vacuum and analyze as a result.


With respect to comments like it shouldn't make difference etc etc, 
well the profile suggests otherwise, and the change in tps numbers 
support the observation.


regards

Mark

On 30/07/14 20:42, Tomas Vondra wrote:

On 30 Červenec 2014, 3:44, Mark Kirkwood wrote:


While these numbers look great in the middle range (12-96 clients), the
benefit looks to be tailing off as client numbers increase. Also running
with no stats (and hence no auto vacuum or analyze) is way too scary!


I assume you've disabled statistics collector, which has nothing to do
with vacuum or analyze.

There are two kinds of statistics in PostgreSQL - data distribution
statistics (which is collected by ANALYZE and stored in actual tables
within the database) and runtime statistics (which is collected by the
stats collector and stored in a file somewhere on the disk).

By disabling statistics collector you loose runtime counters - number of
sequential/index scans on a table, tuples read from a relation, etc. But
it does not influence VACUUM or planning at all.

Also, it's mostly async (send over UDP and you're done) and shouldn't make
much difference unless you have large number of objects. There are ways to
improve this (e.g. by placing the stat files into a tmpfs).
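
For reference, on these versions relocating the stats temp files onto tmpfs is one mount plus one setting; a sketch (the mount point and size are illustrative):

# /etc/fstab: a small tmpfs for the stats temp files
tmpfs  /var/run/pg_stat_tmp  tmpfs  size=64M,uid=postgres,gid=postgres  0 0

# postgresql.conf
stats_temp_directory = '/var/run/pg_stat_tmp'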

Tomas





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 60 core performance with 9.3

2014-07-30 Thread Mark Kirkwood

On 31/07/14 00:47, Tomas Vondra wrote:

On 30 Červenec 2014, 14:39, Tom Lane wrote:

Tomas Vondra t...@fuzzy.cz writes:

On 30 Červenec 2014, 3:44, Mark Kirkwood wrote:

While these numbers look great in the middle range (12-96 clients), the
benefit looks to be tailing off as client numbers increase. Also running
with no stats (and hence no auto vacuum or analyze) is way too scary!



By disabling statistics collector you loose runtime counters - number of
sequential/index scans on a table, tuples read from a relation aetc. But
it does not influence VACUUM or planning at all.


It does break autovacuum.


Of course, you're right. It throws away info about how much data was
modified and when the table was last (auto)vacuumed.

This is a clear proof that I really need to drink at least one cup of
coffee before doing anything in the morning.



Lol - thanks for taking a look anyway. Yes, coffee is often an important 
part of the exercise.


Regards

Mark



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 60 core performance with 9.3

2014-07-29 Thread Mark Kirkwood

On 17/07/14 11:58, Mark Kirkwood wrote:



Trying out with numa_balancing=0 seemed to get essentially the same
performance. Similarly wrapping postgres startup with --interleave.

All this made me want to try with numa *really* disabled. So rebooted
the box with numa=off appended to the kernel cmdline. Somewhat
surprisingly (to me anyway), the numbers were essentially identical. The
profile, however is quite different:



A little more tweaking got some further improvement:

rwlocks patch as before

wal_buffers = 256MB
checkpoint_segments = 1920
wal_sync_method = open_datasync

LSI RAID adaptor disable read ahead and write cache for SSD fast path mode
numa_balancing = 0


Pgbench scale 2000 again:

clients  | tps (prev) |  tps (tweaked config)
-++-
6|   8175 |   8281
12   |  14409 |  15896
24   |  17191 |  19522
48   |  23122 |  29776
96   |  22308 |  32352
192  |  23109 |  28804


Now recall we were seeing no actual tps changes with numa_balancing=0 or 
1 (so the improvement above is from the other changes), but figured it 
might be informative to try to track down what the non-numa bottlenecks 
looked like. We tried profiling the entire 10 minute run which showed 
the stats collector as a possible source of contention:



 3.86%  postgres  [kernel.kallsyms]  [k] _raw_spin_lock_bh
  |
  --- _raw_spin_lock_bh
 |
 |--95.78%-- lock_sock_nested
 |  udpv6_sendmsg
 |  inet_sendmsg
 |  sock_sendmsg
 |  SYSC_sendto
 |  sys_sendto
 |  tracesys
 |  __libc_send
 |  |
 |  |--99.17%-- pgstat_report_stat
 |  |  PostgresMain
 |  |  ServerLoop
 |  |  PostmasterMain
 |  |  main
 |  |  __libc_start_main
 |  |
 |  |--0.77%-- pgstat_send_bgwriter
 |  |  BackgroundWriterMain
 |  |  AuxiliaryProcessMain
 |  |  0x7f08efe8d453
 |  |  reaper
 |  |  __restore_rt
 |  |  PostmasterMain
 |  |  main
 |  |  __libc_start_main
 |   --0.07%-- [...]
 |
 |--2.54%-- __lock_sock
 |  |
 |  |--91.95%-- lock_sock_nested
 |  |  udpv6_sendmsg
 |  |  inet_sendmsg
 |  |  sock_sendmsg
 |  |  SYSC_sendto
 |  |  sys_sendto
 |  |  tracesys
 |  |  __libc_send
 |  |  |
 |  |  |--99.73%-- pgstat_report_stat
 |  |  |  PostgresMain
 |  |  |  ServerLoop



Disabling track_counts and rerunning pgbench:

clients  | tps (no counts)
-+
6|9806
12   |   18000
24   |   29281
48   |   43703
96   |   54539
192  |   36114


While these numbers look great in the middle range (12-96 clients), the 
benefit looks to be tailing off as client numbers increase. Also running 
with no stats (and hence no auto vacuum or analyze) is way too scary!


Trying out less write heavy workloads shows that the stats overhead does 
not appear to be significant for *read* heavy cases, so this result 
above is perhaps more of a curiosity than anything (given that read 
heavy is more typical...and our real workload is more similar to read 
heavy).


The profile for counts off looks like:

 4.79% swapper  [kernel.kallsyms][k] read_hpet
   |
   --- read_hpet
  |
  |--97.10%-- ktime_get
  |  |
  |  |--35.24%-- clockevents_program_event
  |  |  tick_program_event
  |  |  |
 |  |  |--56.59%-- __hrtimer_start_range_ns
  |  |  |  |
  |  |  |  |--78.12

Re: [PERFORM] 60 core performance with 9.3

2014-07-16 Thread Mark Kirkwood

On 11/07/14 20:22, Andres Freund wrote:

On 2014-07-11 12:40:15 +1200, Mark Kirkwood wrote:

Full report http://paste.ubuntu.com/886/



#
  8.82%  postgres  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
   |
   --- _raw_spin_lock_irqsave
  |
  |--75.69%-- pagevec_lru_move_fn
  |  __lru_cache_add
  |  lru_cache_add
  |  putback_lru_page
  |  migrate_pages
  |  migrate_misplaced_page
  |  do_numa_page
  |  handle_mm_fault
  |  __do_page_fault
  |  do_page_fault
  |  page_fault


So, the majority of the time is spent in numa page migration. Can you
disable numa_balancing? I'm not sure if your kernel version does that at
runtime or whether you need to reboot.
The kernel.numa_balancing sysctl might work. Otherwise you probably need
to boot with numa_balancing=0.

It'd also be worthwhile to test this with numactl --interleave.



Trying out with numa_balancing=0 seemed to get essentially the same 
performance. Similarly wrapping postgres startup with --interleave.
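(For anyone following along, the runtime and boot-time forms of that 
switch are sketched below - whether the sysctl exists at all depends on 
the kernel build:)

```shell
# Runtime toggle (3.13-era kernels):
sysctl -w kernel.numa_balancing=0

# Persistent form, e.g. in /etc/sysctl.conf:
#   kernel.numa_balancing = 0

# Boot-time fallback (as suggested above), appended to the kernel
# command line:
#   numa_balancing=0
```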


All this made me want to try with numa *really* disabled. So rebooted 
the box with numa=off appended to the kernel cmdline. Somewhat 
surprisingly (to me anyway), the numbers were essentially identical. The 
profile, however is quite different:


Full report at http://paste.ubuntu.com/7806285/


  4.56%  postgres  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave



   |
   --- _raw_spin_lock_irqsave
      |
      |--41.89%-- try_to_wake_up
      |  |
      |  |--96.12%-- default_wake_function
      |  |  |
      |  |  |--99.96%-- pollwake
      |  |  |          __wake_up_common
      |  |  |          __wake_up_sync_key
      |  |  |          sock_def_readable
      |  |  |          |
      |  |  |          |--99.94%-- unix_stream_sendmsg
      |  |  |          |           sock_sendmsg
      |  |  |          |           SYSC_sendto
      |  |  |          |           sys_sendto
      |  |  |          |           tracesys
      |  |  |          |           __libc_send
      |  |  |          |           pq_flush
      |  |  |          |           ReadyForQuery
      |  |  |          |           PostgresMain
      |  |  |          |           ServerLoop
      |  |  |          |           PostmasterMain
      |  |  |          |           main
      |  |  |          |           __libc_start_main
      |  |  |           --0.06%-- [...]
      |  |   --0.04%-- [...]
      |  |
      |  |--2.87%-- wake_up_process
      |  |  |
      |  |  |--95.71%-- wake_up_sem_queue_do
      |  |  |          SYSC_semtimedop
      |  |  |          sys_semop
      |  |  |          tracesys
      |  |  |          __GI___semop
      |  |  |          |
      |  |  |          |--99.75%-- LWLockRelease
      |  |  |          |           |
      |  |  |          |           |--25.09%-- RecordTransactionCommit
      |  |  |          |           |           CommitTransaction
      |  |  |          |           |           CommitTransactionCommand
      |  |  |          |           |           finish_xact_command.part.4
      |  |  |          |           |           PostgresMain
      |  |  |          |           |           ServerLoop
      |  |  |          |           |           PostmasterMain
      |  |  |          |           |           main
      |  |  |          |           |           __libc_start_main




regards

Mark

Re: [PERFORM] 60 core performance with 9.3

2014-07-16 Thread Mark Kirkwood

On 12/07/14 01:19, Kevin Grittner wrote:


It might be worth a test using a cpuset to interleave OS cache and
the NUMA patch I submitted to the current CF to see whether this is
getting into territory where the patch makes a bigger difference.
I would expect it to do much better than using numactl --interleave
because work_mem and other process-local memory would be allocated
in near memory for each process.

http://www.postgresql.org/message-id/1402267501.4.yahoomail...@web122304.mail.ne1.yahoo.com



Thanks Kevin - I did try this out - seemed slightly better than using 
--interleave, but almost identical to the results posted previously.


However looking at my postgres binary with ldd, I'm not seeing any link 
to libnuma (despite it demanding the library whilst building), so I 
wonder if my package build has somehow vanilla-ified the result :-(


Also I am guessing that with 60 cores I do:

$ sudo /bin/bash -c "echo 0-59 > /dev/cpuset/postgres/cpus"

i.e. cpus are cores, not packages...? If I've stuffed it up I'll redo!
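(For the record, a fuller cpuset setup might look like the sketch below - 
the mount point and group name are assumptions, and on a four-socket box 
the mems list needs setting alongside cpus:)

```shell
# Assumes the cpuset filesystem is mounted at /dev/cpuset
mkdir /dev/cpuset/postgres
/bin/bash -c "echo 0-59 > /dev/cpuset/postgres/cpus"   # all 60 cores
/bin/bash -c "echo 0-3 > /dev/cpuset/postgres/mems"    # all 4 NUMA nodes
# Add the postmaster pid so backends inherit the cpuset
# ($POSTMASTER_PID is a placeholder):
/bin/bash -c "echo $POSTMASTER_PID > /dev/cpuset/postgres/tasks"
```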


Cheers

Mark


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 60 core performance with 9.3

2014-07-11 Thread Mark Kirkwood

On 11/07/14 20:22, Andres Freund wrote:

On 2014-07-11 12:40:15 +1200, Mark Kirkwood wrote:



Postgres 9.4 beta
rwlock patch
pgbench scale = 2000


On that scale - that's bigger than shared_buffers IIRC - I'd not expect
the patch to make much of a difference.



Right - we did test with it bigger (can't recall exactly how big), but 
will retry again after setting the numa parameters below.



#
  8.82%  postgres  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
   |
   --- _raw_spin_lock_irqsave
  |
  |--75.69%-- pagevec_lru_move_fn
  |  __lru_cache_add
  |  lru_cache_add
  |  putback_lru_page
  |  migrate_pages
  |  migrate_misplaced_page
  |  do_numa_page
  |  handle_mm_fault
  |  __do_page_fault
  |  do_page_fault
  |  page_fault


So, the majority of the time is spent in numa page migration. Can you
disable numa_balancing? I'm not sure if your kernel version does that at
runtime or whether you need to reboot.
The kernel.numa_balancing sysctl might work. Otherwise you probably need
to boot with numa_balancing=0.

It'd also be worthwhile to test this with numactl --interleave.



That was my feeling too - but I had no idea what the magic switch was to 
tame it (appears to be in 3.13 kernels), will experiment and report 
back. Thanks again!


Mark





Re: [PERFORM] 60 core performance with 9.3

2014-07-10 Thread Mark Kirkwood

On 01/07/14 22:13, Andres Freund wrote:

On 2014-07-01 21:48:35 +1200, Mark Kirkwood wrote:

- cherry picking the last 5 commits into 9.4 branch and building a package
from that and retesting:

Clients | 9.4 tps 60 cores (rwlock)
--------+--------------------------
6   |  70189
12  | 128894
24  | 233542
48  | 422754
96  | 590796
192 | 630672

Wow - that is more like it! Andres that is some nice work, we definitely owe
you some beers for that :-) I am aware that I need to retest with an
unpatched 9.4 src - as it is not clear from this data how much is due to
Andres's patches and how much to the steady stream of 9.4 development. I'll
post an update on that later, but figured this was interesting enough to
note for now.


Cool. That's what I like (and expect) to see :). I don't think unpatched
9.4 will show significantly different results than 9.3, but it'd be good
to validate that. If you do so, could you post the results in the
-hackers thread I just CCed you on? That'll help the work to get into
9.5.


So we seem to have nailed read only performance. Going back and 
revisiting read write performance finds:


Postgres 9.4 beta
rwlock patch
pgbench scale = 2000

max_connections = 200;
shared_buffers = 10GB;
maintenance_work_mem = 1GB;
effective_io_concurrency = 10;
wal_buffers = 32MB;
checkpoint_segments = 192;
checkpoint_completion_target = 0.8;

clients  | tps (32 cores) | tps (60 cores)
---------+----------------+----------------
6|   8313 |   8175
12   |  11012 |  14409
24   |  16151 |  17191
48   |  21153 |  23122
96   |  21977 |  22308
192  |  22917 |  23109


So we are back to not doing significantly better than 32 cores. Hmmm. 
Doing quite a few more tweaks gets some better numbers:


kernel.sched_autogroup_enabled=0
kernel.sched_migration_cost_ns=5000000
net.core.somaxconn=1024
/sys/kernel/mm/transparent_hugepage/enabled [never]

+checkpoint_segments = 1920
+wal_buffers = 256MB;


clients  | tps
---------+--------
6|   8366
12   |  15988
24   |  19828
48   |  30315
96   |  31649
192  |  29497

One more:

+wal_sync_method = open_datasync

clients  | tps
---------+--------
6|  9566
12   | 17129
24   | 22962
48   | 34564
96   | 32584
192  | 28367

So this looks better - however I suspect 32 core performance would 
improve with these as well!


The problem does *not* look to be connected with IO (I will include some 
iostat below). So time to get the profiler out (192 clients for 1 minute):
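(The capture itself was done along the lines below, per the perf header 
further down - the sleep duration is an assumption based on the one 
minute run:)

```shell
perf record -ag -- sleep 60   # system-wide, with call graphs
perf report --stdio           # browse the hierarchical profile
```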


Full report http://paste.ubuntu.com/886/

# 
# captured on: Fri Jul 11 03:09:06 2014
# hostname : ncel-prod-db3
# os release : 3.13.0-24-generic
# perf version : 3.13.9
# arch : x86_64
# nrcpus online : 60
# nrcpus avail : 60
# cpudesc : Intel(R) Xeon(R) CPU E7-4890 v2 @ 2.80GHz
# cpuid : GenuineIntel,6,62,7
# total memory : 1056692116 kB
# cmdline : /usr/lib/linux-tools-3.13.0-24/perf record -ag
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 
= 0x0, excl_usr = 0, excl_kern = 0, excl_host = 0, excl_guest = 1, 
precise_ip = 0, attr_mmap2 = 0, attr_mmap  = 1, attr_mmap_data = 0

# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: cpu = 4, uncore_cbox_10 = 17, uncore_cbox_11 = 18, 
uncore_cbox_12 = 19, uncore_cbox_13 = 20, uncore_cbox_14 = 21, software 
= 1, uncore_irp = 33, uncore_pcu = 22, tracepoint = 2, uncore_imc_0 = 
25, uncore_imc_1 = 26, uncore_imc_2 = 27, uncore_imc_3 = 28, 
uncore_imc_4 = 29, uncore_imc_5 = 30, uncore_imc_6 = 31, uncore_imc_7 = 
32, uncore_qpi_0 = 34, uncore_qpi_1 = 35, uncore_qpi_2 = 36, 
uncore_cbox_0 = 7, uncore_cbox_1 = 8, uncore_cbox_2 = 9, uncore_cbox_3 = 
10, uncore_cbox_4 = 11, uncore_cbox_5 = 12, uncore_cbox_6 = 13, 
uncore_cbox_7 = 14, uncore_cbox_8 = 15, uncore_cbox_9 = 16, 
uncore_r2pcie = 37, uncore_r3qpi_0 = 38, uncore_r3qpi_1 = 39, breakpoint 
= 5, uncore_ha_0 = 23, uncore_ha_1 = 24, uncore_ubox = 6

# 
#
# Samples: 1M of event 'cycles'
# Event count (approx.): 359906321606
#
# Overhead  Command  Shared Object     Symbol
# ........  .......  ................  ......

#
  8.82%  postgres  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave

  |
  --- _raw_spin_lock_irqsave
 |
 |--75.69%-- pagevec_lru_move_fn
 |  __lru_cache_add
 |  lru_cache_add
 |  putback_lru_page
 |  migrate_pages
 |  migrate_misplaced_page
 |  do_numa_page
 |  handle_mm_fault
 |  __do_page_fault
 |  do_page_fault

Re: [PERFORM] 60 core performance with 9.3

2014-07-01 Thread Mark Kirkwood

On 27/06/14 21:19, Andres Freund wrote:

On 2014-06-27 14:28:20 +1200, Mark Kirkwood wrote:

My feeling is spinlock or similar, 'perf top' shows

kernel find_busiest_group
kernel _raw_spin_lock

as the top time users.


Those don't tell that much by themselves, could you do a hierarchical
profile? I.e. perf record -ga? That'll at least give the callers for
kernel level stuff. For more information compile postgres with
-fno-omit-frame-pointer.



Unfortunately this did not help - had lots of unknown symbols from 
postgres in the profile - I'm guessing the Ubuntu postgresql-9.3 package 
needs either the -dev package or to be rebuilt with the enable profile 
option (debug and no-omit-frame-pointer seem to be there already).


However further investigation did uncover *very* interesting things. 
Firstly I had previously said that read only performance looked 
ok...this was wrong, purely based on comparison to Robert's blog post. 
Rebooting the 60 core box with 32 cores enabled showed that we got 
*better* scaling performance in the read only case and illustrated we 
were hitting a serious regression with more cores. At this point data is 
needed:


Test: pgbench
Options: scale 500
 read only
Os: Ubuntu 14.04
Pg: 9.3.4
Pg Options:
max_connections = 200
shared_buffers = 10GB
maintenance_work_mem = 1GB
effective_io_concurrency = 10
wal_buffers = 32MB
checkpoint_segments = 192
checkpoint_completion_target = 0.8


Results

Clients | 9.3 tps 32 cores | 9.3 tps 60 cores
--------+------------------+------------------
6   |  70400   |  71028
12  |  98918   | 129140
24  | 230345   | 240631
48  | 324042   | 409510
96  | 346929   | 120464
192 | 312621   |  92663

So we have anti scaling with 60 cores as we increase the client 
connections. Ouch! A level of urgency led to trying out Andres's 
'rwlock' 9.4 branch [1] - cherry picking the last 5 commits into 9.4 
branch and building a package from that and retesting:


Clients | 9.4 tps 60 cores (rwlock)
--------+--------------------------
6   |  70189
12  | 128894
24  | 233542
48  | 422754
96  | 590796
192 | 630672

Wow - that is more like it! Andres that is some nice work, we definitely 
owe you some beers for that :-) I am aware that I need to retest with an 
unpatched 9.4 src - as it is not clear from this data how much is due to 
Andres's patches and how much to the steady stream of 9.4 development. 
I'll post an update on that later, but figured this was interesting 
enough to note for now.



Regards

Mark

[1] from git://git.postgresql.org/git/users/andresfreund/postgres.git, 
commits:

4b82477dcaf81ad7b0c102f4b66e479a5eb9504a
10d72b97f108b6002210ea97a414076a62302d4e
67ffebe50111743975d54782a3a94b15ac4e755f
fe686ed18fe132021ee5e557c67cc4d7c50a1ada
f2378dc2fa5b73c688f696704976980bab90c611





Re: [PERFORM] 60 core performance with 9.3

2014-07-01 Thread Mark Kirkwood

On 01/07/14 21:48, Mark Kirkwood wrote:


[1] from git://git.postgresql.org/git/users/andresfreund/postgres.git,
commits:
4b82477dcaf81ad7b0c102f4b66e479a5eb9504a
10d72b97f108b6002210ea97a414076a62302d4e
67ffebe50111743975d54782a3a94b15ac4e755f
fe686ed18fe132021ee5e557c67cc4d7c50a1ada
f2378dc2fa5b73c688f696704976980bab90c611




Hmmm, that should read "the last 5 commits in 'rwlock-contention'", and I 
had pasted the commit nos from my tree, not Andres's - sorry. Here are the 
right ones:

472c87400377a7dc418d8b77e47ba08f5c89b1bb
e1e549a8e42b753cc7ac60e914a3939584cb1c56
65c2174469d2e0e7c2894202dc63b8fa6f8d2a7f
959aa6e0084d1264e5b228e5a055d66e5173db7d
a5c3ddaef0ee679cf5e8e10d59e0a1fe9f0f1893






Re: [PERFORM] 60 core performance with 9.3

2014-06-27 Thread Mark Kirkwood

On 27/06/14 21:19, Andres Freund wrote:

On 2014-06-27 14:28:20 +1200, Mark Kirkwood wrote:

My feeling is spinlock or similar, 'perf top' shows

kernel find_busiest_group
kernel _raw_spin_lock

as the top time users.


Those don't tell that much by themselves, could you do a hierarchical
profile? I.e. perf record -ga? That'll at least give the callers for
kernel level stuff. For more information compile postgres with
-fno-omit-frame-pointer.



Excellent suggestion, will do next week!

regards

Mark





[PERFORM] 60 core performance with 9.3

2014-06-26 Thread Mark Kirkwood

I have a nice toy to play with: Dell R920 with 60 cores and 1TB ram [1].

The context is the current machine in use by the customer is a 32 core 
one, and due to growth we are looking at something larger (hence 60 cores).


Some initial tests show similar pgbench read only performance to what 
Robert found here 
http://rhaas.blogspot.co.nz/2012/04/did-i-say-32-cores-how-about-64.html 
(actually a bit quicker - around 400,000 tps).


However doing a mixed read-write workload is getting results the same or 
only marginally quicker than the 32 core machine - particularly at 
higher number of clients (e.g 200 - 500). I have yet to break out the 
perf toolset, but I'm wondering if any folk has compared 32 and 60 (or 
64) core read write pgbench performance?


regards

Mark

[1] Details:

4x E7-4890 15 cores each.
1 TB ram
16x Toshiba PX02SS SATA SSD
4x Samsung NVMe XS1715 PCIe SSD

Ubuntu 14.04  (Linux 3.13)





Re: [PERFORM] 60 core performance with 9.3

2014-06-26 Thread Mark Kirkwood

On 27/06/14 14:01, Scott Marlowe wrote:

On Thu, Jun 26, 2014 at 5:49 PM, Mark Kirkwood
mark.kirkw...@catalyst.net.nz wrote:

I have a nice toy to play with: Dell R920 with 60 cores and 1TB ram [1].

The context is the current machine in use by the customer is a 32 core one,
and due to growth we are looking at something larger (hence 60 cores).

Some initial tests show similar pgbench read only performance to what Robert
found here
http://rhaas.blogspot.co.nz/2012/04/did-i-say-32-cores-how-about-64.html
(actually a bit quicker - around 400,000 tps).

However doing a mixed read-write workload is getting results the same or
only marginally quicker than the 32 core machine - particularly at higher
number of clients (e.g 200 - 500). I have yet to break out the perf toolset,
but I'm wondering if any folk has compared 32 and 60 (or 64) core read write
pgbench performance?


My guess is that the read only test is CPU / memory bandwidth limited,
but the mixed test is IO bound.

What's your iostat / vmstat / iotop etc look like when you're doing
both read only and read/write mixed?




That was what I would have thought too, but it does not appear to be the 
case, here is a typical iostat:


Device:  rrqm/s  wrqm/s   r/s       w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
sda        0.00    0.00  0.00      0.00   0.00   0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
nvme0n1    0.00    0.00  0.00   4448.00   0.00  41.47    19.10     0.14   0.03    0.00    0.03   0.03  14.40
nvme1n1    0.00    0.00  0.00   4448.00   0.00  41.47    19.10     0.15   0.03    0.00    0.03   0.03  15.20
nvme2n1    0.00    0.00  0.00   4549.00   0.00  42.20    19.00     0.15   0.03    0.00    0.03   0.03  15.20
nvme3n1    0.00    0.00  0.00   4548.00   0.00  42.19    19.00     0.16   0.04    0.00    0.04   0.04  16.00
dm-0       0.00    0.00  0.00      0.00   0.00   0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
md0        0.00    0.00  0.00  17961.00   0.00  83.67     9.54     0.00   0.00    0.00    0.00   0.00   0.00
dm-1       0.00    0.00  0.00      0.00   0.00   0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
dm-2       0.00    0.00  0.00      0.00   0.00   0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
dm-3       0.00    0.00  0.00      0.00   0.00   0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
dm-4       0.00    0.00  0.00      0.00   0.00   0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00



My feeling is spinlock or similar, 'perf top' shows

kernel find_busiest_group
kernel _raw_spin_lock

as the top time users.




Re: [PERFORM] Increasing query time after updates

2014-01-21 Thread Mark Kirkwood

On 21/01/14 21:37, Katharina Koobs wrote:

Dear Heikki,
thank you for your valuable feedback. Regarding your questions: It
gradually slower every day. The database size is increasing only
slightly over time.

I will try your hint regarding CLUSTERING. The difference in effect of
VACUUM FULL in version 9.0 sounds very interesting. I will discuss the
update to version 9.0 with my colleague.

Any further idea or feedback is much appreciated.




Index bloat could be a factor too - performing a regular REINDEX on the 
relevant tables could be worth a try.
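(A sketch of what that might look like - the table and index names here 
are hypothetical, and note that REINDEX takes locks that block writes, so 
it is best run off-peak:)

```sql
-- Rebuild every index on one table:
REINDEX TABLE my_table;

-- Or just a single bloated index:
REINDEX INDEX my_table_pkey;
```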


Regards

Mark





Re: [PERFORM] Increasing query time after updates

2014-01-21 Thread Mark Kirkwood

On 21/01/14 21:45, Mark Kirkwood wrote:

On 21/01/14 21:37, Katharina Koobs wrote:

Dear Heikki,
thank you for your valuable feedback. Regarding your questions: It
gradually slower every day. The database size is increasing only
slightly over time.

I will try your hint regarding CLUSTERING. The difference in effect of
VACUUM FULL in version 9.0 sounds very interesting. I will discuss the
update to version 9.0 with my colleague.

Any further idea or feedback is much appreciated.




Index bloat could be a factor too - performing a regular REINDEX on the
relevant tables could be worth a try.



Sorry - I missed that you had tried reindex already.

regards

Mark





Re: [PERFORM] Are there some additional postgres tuning to improve performance in multi tenant system

2013-12-28 Thread Mark Kirkwood

On 28/12/13 18:19, ankush upadhyay wrote:

Hello All,

I am using multi tenant system and doing performance testing of multi
tenant application. In case of single tenant it is working fine but once
I enable tenants, then some time database servers not responding. Any clue?



It is a bit tricky to tell without any relevant information (e.g. schema 
description), but a likely culprit would be a missing index on the 
relevant 'tenant_id' type field in each table that you are using to 
distinguish the various tenant datasets.
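(As a sketch only - the table and column names below are hypothetical, 
since no schema was posted:)

```sql
-- One such index per tenant-partitioned table:
CREATE INDEX orders_tenant_id_idx ON orders (tenant_id);

-- A quick way to spot tables being sequentially scanned a lot:
SELECT relname, seq_scan, idx_scan
FROM pg_stat_user_tables
ORDER BY seq_scan DESC;
```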


Regards

Mark




Re: [PERFORM] query not using index

2013-12-27 Thread Mark Kirkwood

On 23/12/13 21:58, Johann Spies wrote:




On 19 December 2013 16:48, Tom Lane t...@sss.pgh.pa.us wrote:

Johann Spies johann.sp...@gmail.com writes:
  I would appreciate some help optimising the following query:

It's a mistake to imagine that indexes are going to help much with
a join of this size.  Hash or merge join is going to be a lot better
than nestloop.  What you need to do is make sure those will perform
as well as possible, and to that end, it'd likely help to raise
work_mem.  I'm not sure if you can sanely put it high enough to
make the query operate totally in memory --- it looks like you'd
need work_mem of 500MB or more to prevent any of the sorts or
hashes from spilling to disk, and keep in mind that this query
is going to use several times work_mem because there are multiple
sorts/hashes going on.  But if you can transiently dedicate a lot
of RAM to this query, that should help some.  I'd suggest increasing
work_mem via a SET command in the particular session running this
query --- you don't want such a high value to be the global default.


Thanks Tom.  Raising work_mem from 384MB to 512MB made a significant
difference.

You said hash or merge join is going to be a lot better than
nestloop.  Is that purely in the hands of the query planner, or what can
I do to get the planner to use those options apart from raising work_mem?




You can disable the hash and merge join options by doing:

SET enable_hashjoin=off;
SET enable_mergejoin=off;

before running the query again. Timing it (or EXPLAIN ANALYZE) should 
demonstrate whether the planner made the right call by choosing hash or 
merge in the first place.
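(And remember to switch them back on afterwards, otherwise the session 
keeps planning without those join methods:)

```sql
RESET enable_hashjoin;
RESET enable_mergejoin;
```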


regards

Mark





Re: [PERFORM] Optimizing a query

2013-12-23 Thread Mark Kirkwood

On 20/12/13 06:53, Shaun Thomas wrote:

On 12/17/2013 08:48 PM, Kai Sellgren wrote:

This is your select:


SELECT *
FROM Log
LEFT JOIN NewsArticle ON NewsArticle.id = Log.targetId AND
Log.targetType = 'NewsArticle'
ORDER BY Log.createdAt DESC
LIMIT 10


This is your index:


CREATE INDEX Log_targetId_targetType_idx
   ON Log
   USING btree
   (targetId, targetType COLLATE pg_catalog.default);


Unfortunately, this won't help you. You are not matching on any IDs you
indexed, aside from joining against the article table. You have no WHERE
clause to restrict the data set, so it absolutely must read the entire
table to find the most recent records. Without an index on createdAt,
how is it supposed to know what the ten most recent records are?

Add an index to the createdAt column:

CREATE INDEX idx_log_createdat ON Log (createdAt DESC);

Using that, it should get the ten most recent Log records almost
immediately, including associated article content.



Also, might be worth creating an index on NewsArticle(id) so that the 
join to this table does not require a full table scan:


CREATE INDEX newsarticle_id_idx ON NewsArticle (id);

(probably not a problem when you only have a few articles - but will be 
as the volume increases over time).


Regards

Mark





Re: [PERFORM] WAL + SSD = slow inserts?

2013-12-05 Thread Mark Kirkwood

On 06/12/13 05:13, Skarsol wrote:

On Thu, Dec 5, 2013 at 9:50 AM, Scott Marlowe scott.marl...@gmail.com wrote:


On Thu, Dec 5, 2013 at 8:16 AM, Skarsol skar...@gmail.com wrote:

psql (PostgreSQL) 9.2.5
Red Hat Enterprise Linux Server release 6.4 (Santiago)
Linux 2.6.32-358.6.1.el6.x86_64 #1 SMP Fri Mar 29 16:51:51 EDT 2013

x86_64

x86_64 x86_64 GNU/Linux
All relevant filesystems are ext4

Changes from defaults:
max_connections = 500
shared_buffers = 32000MB
temp_buffers = 24MB
work_mem = 1GB
maintenance_work_mem = 5GB
wal_level = archive
wal_buffers = 16MB
checkpoint_completion_target = 0.9
archive_mode = on
archive_command = 'test ! -f /databases/pg_archive/db/%f && cp %p
/databases/pg_archive/db/%f'
effective_cache_size = 64000MB
default_statistics_target = 5000
log_checkpoints = on
stats_temp_directory = '/tmp/pgstat'

OK I'd make the following changes.
1: Drop shared_buffers to something like 1000MB
2: drop work_mem to 16MB or so. 1GB is pathological, as it can make
the machine run out of memory quite fast.
3: drop max_connections to 100 or so. if you really need 500 conns,
then work_mem of 1G is that much worse.

Next, move pg_xlog OFF the SSDs and back onto spinning media and put
your data/base dir on the SSDs.

SSDs aren't much faster, if at all, for pg_xlog, but are much much
faster for data/base files.

Also changing the io schduler for the SSDs to noop:


http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/block/switching-sched.txt?id=HEAD


Changing the scheduler to noop seems to have had a decent effect. I've made
the other recommended changes other than the connections as we do need that
many currently. We're looking to implement PgBouncer, which should help
with that.




What model SSD are you using? Some can work better with deadline than 
noop (as their own scheduling firmware may be pretty poor). Also, check 
if there are updates for the SSD firmware. I have a couple of Crucial 
M4s that changed from being fairly average to very fast indeed after 
getting later firmware...
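(For reference, checking and switching the elevator per the kernel doc 
linked above - the device name is just an example:)

```shell
cat /sys/block/sda/queue/scheduler       # active scheduler shown in [brackets]
echo deadline > /sys/block/sda/queue/scheduler
```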


Cheers

Mark




Re: [PERFORM] Postgresql in a Virtual Machine

2013-11-25 Thread Mark Kirkwood

On 26/11/13 09:28, Andrew Dunstan wrote:


On 11/25/2013 03:19 PM, Heikki Linnakangas wrote:

On 25.11.2013 22:01, Lee Nguyen wrote:

Hi,

Having attended a few PGCons, I've always heard the remark from a few
presenters and attendees that Postgres shouldn't be run inside a VM.
That
bare metal is the only way to go.

Here at work we were entertaining the idea of running our Postgres
database
on our VM farm alongside our application vm's.  We are planning to run a
few Postgres synchronous replication nodes.

Why shouldn't we run Postgres in a VM?  What are the downsides? Does
anyone
have any metrics or benchmarks with the latest Postgres?


I've also heard people say that they've seen PostgreSQL to perform
worse in a VM. In the performance testing that we've done in VMware,
though, we haven't seen any big impact. So I guess the answer is that
it depends on the specific configuration of CPU, memory, disks and the
software. Synchronous replication is likely going to be the biggest
bottleneck by far, unless it's mostly read-only. I don't know if
virtualization will have a measurable impact on network latency, which
is what matters for synchronous replication.

So, I'd suggest that you try it yourself, and see how it performs. And
please report back to the list, I'd also love to see some numbers!





Yeah, and there are large numbers of public and/or private cloud-based
offerings out there (from Amazon RDS, Heroku, EnterpriseDB and VMware
among others.) Pretty much all of these are VM based, and can be
suitable for many workloads.

Maybe the advice is a bit out of date.



Agreed.

Possibly years ago the maturity of various virtualization layers was 
such that the advice was sound. But these days it seems that provided 
some reading is done (so you understand for instance how to make writes 
go to the hosting hardware), it should be fine.
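(With KVM/libvirt the usual knob is the disk cache mode - the fragment 
below is a sketch with made-up device paths; cache='none' does direct I/O 
against the host device, so a guest fsync actually reaches the hardware:)

```xml
<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none' io='native'/>
  <source dev='/dev/vg0/pgdata'/>
  <target dev='vda' bus='virtio'/>
</disk>
```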


We make use of many KVM guest VMs on usually Ubuntu and the IO 
performance is pretty indistinguishable from bare metal. In some tests 
we did notice that VMs with 8 cpus tended to stop scaling so we are 
using more smaller VMs rather than fewer big ones [1].


regards

Mark

[1] This was with Pgbench. Note this was over a year ago, so this effect 
may be not present (different kernels and kvm versions), or the magic 
number may be higher than 8 now...







Re: [PERFORM] ORDER BY, LIMIT and indexes

2013-08-06 Thread Mark Kirkwood

On 06/08/13 22:46, Ivan Voras wrote:

Here are two more unexpected results. Same test table (1 mil. records,
id is SERIAL PRIMARY KEY, PostgreSQL 9.1, VACUUM ANALYZE performed
before the experiments):

ivoras=# explain analyze select * from lt where id > 900000 limit 10;
QUERY PLAN

  Limit  (cost=0.00..1.71 rows=10 width=9) (actual
time=142.669..142.680 rows=10 loops=1)
   ->  Seq Scan on lt  (cost=0.00..17402.00 rows=101630 width=9)
(actual time=142.665..142.672 rows=10 loops=1)
         Filter: (id > 900000)
  Total runtime: 142.735 ms
(4 rows)

Note the Seq Scan.

ivoras=# explain analyze select * from lt where id > 900000;
   QUERY PLAN
-----------------------------------------------------------------------
  Bitmap Heap Scan on lt  (cost=1683.97..7856.35 rows=101630 width=9)
(actual time=38.462..85.780 rows=100000 loops=1)
    Recheck Cond: (id > 900000)
    ->  Bitmap Index Scan on lt_pkey  (cost=0.00..1658.56 rows=101630
width=0) (actual time=38.310..38.310 rows=100000 loops=1)
          Index Cond: (id > 900000)
  Total runtime: 115.674 ms
(5 rows)

This somewhat explains the above case - we are simply fetching 100,000
records here, and it's slow enough even with the index scan, so the
planner skips the index in the former case. BUT, if it did use the
index, it would have been expectedly fast:

ivoras=# set enable_seqscan to off;
SET
ivoras=# explain analyze select * from lt where id > 900000 limit 10;
  QUERY PLAN
-----------------------------------------------------------------------
  Limit  (cost=0.00..1.74 rows=10 width=9) (actual time=0.081..0.112
rows=10 loops=1)
   ->  Index Scan using lt_pkey on lt  (cost=0.00..17644.17
rows=101630 width=9) (actual time=0.078..0.100 rows=10 loops=1)
         Index Cond: (id > 900000)
  Total runtime: 0.175 ms
(4 rows)

It looks like the problem is in the difference between what the
planner expects and what the Filter or Index operations deliver:
(cost=0.00..17402.00 rows=101630 width=9) (actual
time=142.665..142.672 rows=10 loops=1).




Hmm - I wonder if the lack of ORDER BY is part of the problem here. 
Consider a similar query on pgbench_accounts:


bench=# explain analyze select aid from pgbench_accounts where aid > 
100000 limit 20;

QUERY PLAN
-----------------------------------------------------------------------
 Limit  (cost=0.00..0.91 rows=20 width=4) (actual time=0.005..0.464 
rows=20 loops=1)
   ->  Seq Scan on pgbench_accounts  (cost=0.00..499187.31 rows=10994846 
width=4) (actual time=0.005..0.463 rows=20 loops=1)
         Filter: (aid > 100000)
 Total runtime: 0.474 ms
(4 rows)

bench=# explain analyze select aid from pgbench_accounts where aid > 
10000000 limit 20;

QUERY PLAN
-----------------------------------------------------------------------
 Limit  (cost=0.00..2.25 rows=20 width=4) (actual time=0.014..0.018 
rows=20 loops=1)
   ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts  
(cost=0.00..207204.06 rows=1844004 width=4) (actual time=0.014..0.017 
rows=20 loops=1)
         Index Cond: (aid > 10000000)
 Total runtime: 0.030 ms
(4 rows)


So at some point you get index scans. Now add an ORDER BY:

bench=# explain analyze select aid from pgbench_accounts where aid > 
100000 order by aid limit 20;

QUERY PLAN
-----------------------------------------------------------------------
 Limit  (cost=0.00..2.25 rows=20 width=4) (actual time=0.008..0.012 
rows=20 loops=1)
   ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts  
(cost=0.00..1235355.34 rows=10994846 width=4) (actual time=0.008..0.011 
rows=20 loops=1)
         Index Cond: (aid > 100000)
 Total runtime: 0.023 ms
(4 rows)

bench=# explain analyze select aid from pgbench_accounts where aid > 
10000000 order by aid limit 20;

QUERY PLAN
-----------------------------------------------------------------------
 Limit  (cost=0.00..2.25 rows=20 width=4) (actual time=0.014..0.018 
rows=20 loops=1)
   ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts  
(cost=0.00..207204.06 rows=1844004 width=4) (actual time=0.014..0.016 
rows=20 loops=1)
         Index Cond: (aid > 10000000)
 Total runtime: 0.029 ms
(4 rows)


...and we have index scans for both cases.

Cheers

Mark



Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread Mark Kirkwood

On 23/05/13 13:01, Joshua D. Drake wrote:


On 05/22/2013 04:37 PM, Merlin Moncure wrote:


On Wed, May 22, 2013 at 5:42 PM, Joshua D. Drake 
j...@commandprompt.com wrote:

I am curious how the 710 or S3700 stacks up against the new M500 from
Crucial? I know Intel is kind of the goto for these things but the 
m500 is power off protected and rated at: "Endurance: 72TB total bytes 
written (TBW), equal to 40GB per day for 5 years".


I don't think the m500 is power safe (nor is any drive at the 1$/gb
price point).


According to the data sheet it is power safe.

http://investors.micron.com/releasedetail.cfm?ReleaseID=732650
http://www.micron.com/products/solid-state-storage/client-ssd/m500-ssd




Yeah - they apparently have a capacitor on board.

Their write endurance is where they don't compare so favorably to the 
S3700 (they are *much* cheaper mind you):


- M500 120GB drive: 40GB per day for 5 years
- S3700 100GB drive: 1000GB per day for 5 years

But great to see more reasonably priced SSD with power off protection.
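
(To put those ratings on a common scale - drive writes per day and total 
TB written over the 5-year warranty - some quick psql arithmetic; the 
inputs are the vendor ratings quoted above, not measurements:)

```sql
SELECT round(40.0 / 120, 2)     AS m500_dwpd,   -- ~0.33 drive writes/day
       round(1000.0 / 100, 2)   AS s3700_dwpd,  -- 10 drive writes/day
       40 * 365 * 5 / 1000.0    AS m500_tbw,    -- ~73 TB (datasheet: 72 TBW)
       1000 * 365 * 5 / 1000.0  AS s3700_tbw;   -- 1825 TB
```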

Cheers

Mark




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread Mark Kirkwood

On 23/05/13 13:32, Mark Kirkwood wrote:

On 23/05/13 13:01, Joshua D. Drake wrote:


On 05/22/2013 04:37 PM, Merlin Moncure wrote:


On Wed, May 22, 2013 at 5:42 PM, Joshua D. Drake 
j...@commandprompt.com wrote:

I am curious how the 710 or S3700 stacks up against the new M500 from
Crucial? I know Intel is kind of the goto for these things but the 
m500 is power off protected and rated at: "Endurance: 72TB total bytes 
written (TBW), equal to 40GB per day for 5 years".


I don't think the m500 is power safe (nor is any drive at the 1$/gb
price point).


According to the data sheet it is power safe.

http://investors.micron.com/releasedetail.cfm?ReleaseID=732650
http://www.micron.com/products/solid-state-storage/client-ssd/m500-ssd




Yeah - they apparently have a capacitor on board.



Make that quite a few capacitors (top right corner):

http://regmedia.co.uk/2013/05/07/m500_4.jpg




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread Mark Kirkwood

On 23/05/13 14:22, Greg Smith wrote:

On 5/22/13 10:04 PM, Mark Kirkwood wrote:

Make that quite a few capacitors (top right corner):
http://regmedia.co.uk/2013/05/07/m500_4.jpg


There are some more shots and descriptions of the internals in the 
excellent review at 
http://techreport.com/review/24666/crucial-m500-ssd-reviewed


That also highlights the big problem with this drive that's kept me 
from buying one so far:


Unlike rivals Intel and Samsung, Crucial doesn't provide utility 
software with a built-in health indicator. The M500's payload of SMART 
attributes doesn't contain any references to flash wear or bytes 
written, either. Several of the SMART attributes are labeled 
Vendor-specific, but you'll need to guess what they track and read 
the associated values using third-party software.


That's a serious problem for most business use of this sort of drive.



Agreed - I was thinking the same thing!

Cheers

Mark




Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-22 Thread Mark Kirkwood

On 23/05/13 14:26, Mark Kirkwood wrote:

On 23/05/13 14:22, Greg Smith wrote:

On 5/22/13 10:04 PM, Mark Kirkwood wrote:

Make that quite a few capacitors (top right corner):
http://regmedia.co.uk/2013/05/07/m500_4.jpg


There are some more shots and descriptions of the internals in the 
excellent review at 
http://techreport.com/review/24666/crucial-m500-ssd-reviewed


That also highlights the big problem with this drive that's kept me 
from buying one so far:


Unlike rivals Intel and Samsung, Crucial doesn't provide utility 
software with a built-in health indicator. The M500's payload of 
SMART attributes doesn't contain any references to flash wear or 
bytes written, either. Several of the SMART attributes are labeled 
Vendor-specific, but you'll need to guess what they track and read 
the associated values using third-party software.


That's a serious problem for most business use of this sort of drive.



Agreed - I was thinking the same thing!




Having said that, there does seem to be a wear leveling counter in its 
SMART attributes - but, yes - I'd like to see indicators more similar to 
the level of detail that Intel provides.


Cheers

Mark





Re: [PERFORM] Reliability with RAID 10 SSD and Streaming Replication

2013-05-16 Thread Mark Kirkwood

On 17/05/13 12:06, Tomas Vondra wrote:

Hi,

On 16.5.2013 16:46, Cuong Hoang wrote:



Pro for the master server. I'm aware of the write cache issue on SSDs in
case of power loss. However, our hosting provider doesn't offer any
other choices of SSD drives with supercapacitor. To minimise risk, we
will also set up another RAID 10 SAS in streaming replication mode. For
our application, a few seconds of data loss is acceptable.


Streaming replication allows zero data loss if used in synchronous mode.



I'm not sure synchronous replication is really an option here as it will 
slow the master down to spinning disk io speeds, unless the standby is 
configured with SSDs as well - which probably defeats the purpose of 
this setup.


On the other hand, if the system is so loaded that a pure SAS (spinning 
drive) solution can't keep up, then the standby lag may get to be way 
more than a few seconds...which means look out for huge data loss.


I'd be inclined to apply more leverage to hosting provider to source 
SSDs suitable for your needs, or change hosting providers.


Regards

Mark




Re: [PERFORM] statistics target for columns in unique constraint?

2013-05-13 Thread Mark Kirkwood

On 14/05/13 10:10, Marti Raudsepp wrote:

On Mon, May 13, 2013 at 6:01 PM, ach alanchi...@gmail.com wrote:

what I'm wondering is, since
the unique constraint already covers the whole table and all rows in
entirety, is it really necessary for statistics to be set that high on
those?


AFAIK if there are exact-matching unique constraints/indexes for a
query's WHERE clause, the planner will deduce that the query only
returns 1 row and won't consult statistics at all.


Or does that only serve to slow down inserts to that table?


It doesn't slow down inserts directly. Tables are analyzed in the
background by autovacuum. However, I/O traffic from autovacuum analyze
may slow down inserts running concurrently.




A higher number in stats target means larger stats structures - which in 
turn means that the planning stage of *all* queries may be impacted - 
e.g. they take up more memory, and planning is slightly slower as these 
larger structures are read, iterated over, freed etc.


So if your only access is via a defined unique key, then (as Marti 
suggests) - a large setting for stats target would seem to be unnecessary.


If you have access to a test environment I'd recommend you model the 
effect of reducing stats target down (back to the default of 100 or even 
to the old version default of 10).
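
(A hedged sketch of what that experiment looks like - the table and 
column names here are invented, substitute your own:)

```sql
-- Reduce the per-column target and re-gather stats so the smaller
-- structures actually take effect:
ALTER TABLE my_table ALTER COLUMN my_key SET STATISTICS 100;
ANALYZE my_table;

-- Setting -1 reverts the column to the default_statistics_target GUC:
ALTER TABLE my_table ALTER COLUMN my_key SET STATISTICS -1;
```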


A little paranoia: maybe switch on statement logging and ensure that 
there are no *other* ways this table is accessed...the fact that the 
number was cranked up from the default is a little suspicious!


Regards

Mark






Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-10 Thread Mark Kirkwood

(See below for the reply)

On 10/05/13 22:48, Vitalii Tymchyshyn wrote:

Well, could you write a trigger that would do what you need? AFAIR
analyze data is stored no matter transaction boundaries. You could store
some counters in session vars and issue an explicit analyze when enough
rows were added.

On 7 May 2013 08:33, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote:

On 07/05/13 18:10, Simon Riggs wrote:

On 7 May 2013 01:23, mark.kirkw...@catalyst.net.nz wrote:

I'm thinking that a variant of (2) might be simpler to
implement:

(I think Matt C essentially beat me to this suggestion - he
originally
discovered this issue). It is probably good enough for only
*new* plans to
react to the increased/increasing number of in progress
rows. So this
would require backends doing significant numbers of row
changes to either
directly update pg_statistic or report their in progress
numbers to the
stats collector. The key change here is the partial
execution numbers
would need to be sent. Clearly one would need to avoid doing
this too
often (!) - possibly only when number of changed rows >
autovacuum_analyze_scale_factor proportion of the relation
concerned or
similar.


Are you loading using COPY? Why not break down the load into chunks?


INSERT - but we could maybe workaround by chunking the INSERT.
However that *really* breaks the idea that in SQL you just say what
you want, not how the database engine should do it! And more
practically means that the most obvious and clear way to add your
new data has nasty side effects, and you have to tip toe around
muttering secret incantations to make things work well :-)

I'm still thinking that making postgres smarter about having current
stats for getting the actual optimal plan is the best solution.


Unfortunately a trigger will not really do the job - analyze ignores in 
progress rows (unless they were added by the current transaction), and 
then the changes made by analyze are not seen by any other sessions. So 
no changes to plans until the entire INSERT is complete and COMMIT 
happens (which could be a while - too long in our case).


Figuring out how to improve on this situation is tricky.


Cheers

Mark




Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-10 Thread Mark Kirkwood

On 11/05/13 01:30, Tom Lane wrote:

Mark Kirkwood mark.kirkw...@catalyst.net.nz writes:

Unfortunately a trigger will not really do the job - analyze ignores in
progress rows (unless they were added by the current transaction), and
then the changes made by analyze are not seen by any other sessions. So
no changes to plans until the entire INSERT is complete and COMMIT
happens (which could be a while - too long in our case).


I'm not sure I believe the thesis that plans won't change at all.
The planner will notice that the physical size of the table is growing.
That may not be enough, if the table-contents statistics are missing
or completely unreflective of reality, but it's something.

It is true that *already cached* plans won't change until after an
ANALYZE is done (the key point there being that ANALYZE sends out a
shared-inval message to force replanning of plans for the table).
Conceivably you could issue concurrent ANALYZEs occasionally while
the INSERT is running, not so much to update the stats --- because
they wouldn't --- as to force cached-plan invalidation.


Yeah - true, I was focusing on the particular type of query illustrated 
in the test case - pretty much entirely needing updated selectivity 
stats for a column, which wouldn't change unfortunately.
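
(For completeness, Tom's workaround amounts to something like this from a 
second session while the bulk INSERT is running - a sketch against the 
test-case table:)

```sql
-- The statistics themselves won't change (ANALYZE can't see another
-- session's in-progress rows), but each run broadcasts a shared-inval
-- message that forces other backends to discard and rebuild their
-- cached plans on "plan":
ANALYZE plan;
-- ...repeat every few seconds until the load commits.
```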


Cheers

Mark





Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-07 Thread Mark Kirkwood

On 07/05/13 18:10, Simon Riggs wrote:

On 7 May 2013 01:23,  mark.kirkw...@catalyst.net.nz wrote:


I'm thinking that a variant of (2) might be simpler to implement:

(I think Matt C essentially beat me to this suggestion - he originally
discovered this issue). It is probably good enough for only *new* plans to
react to the increased/increasing number of in progress rows. So this
would require backends doing significant numbers of row changes to either
directly update pg_statistic or report their in progress numbers to the
stats collector. The key change here is the partial execution numbers
would need to be sent. Clearly one would need to avoid doing this too
often (!) - possibly only when number of changed rows >
autovacuum_analyze_scale_factor proportion of the relation concerned or
similar.


Are you loading using COPY? Why not break down the load into chunks?



INSERT - but we could maybe workaround by chunking the INSERT. However 
that *really* breaks the idea that in SQL you just say what you want, 
not how the database engine should do it! And more practically means 
that the most obvious and clear way to add your new data has nasty side 
effects, and you have to tip toe around muttering secret incantations to 
make things work well :-)


I'm still thinking that making postgres smarter about having current 
stats for getting the actual optimal plan is the best solution.


Cheers

Mark





Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-07 Thread Mark Kirkwood

On 07/05/13 19:33, Simon Riggs wrote:

On 7 May 2013 07:32, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote:

On 07/05/13 18:10, Simon Riggs wrote:


On 7 May 2013 01:23,  mark.kirkw...@catalyst.net.nz wrote:


I'm thinking that a variant of (2) might be simpler to implement:

(I think Matt C essentially beat me to this suggestion - he originally
discovered this issue). It is probably good enough for only *new* plans
to
react to the increased/increasing number of in progress rows. So this
would require backends doing significant numbers of row changes to either
directly update pg_statistic or report their in progress numbers to the
stats collector. The key change here is the partial execution numbers
would need to be sent. Clearly one would need to avoid doing this too
often (!) - possibly only when number of changed rows >
autovacuum_analyze_scale_factor proportion of the relation concerned or
similar.



Are you loading using COPY? Why not break down the load into chunks?



INSERT - but we could maybe workaround by chunking the INSERT. However that
*really* breaks the idea that in SQL you just say what you want, not how the
database engine should do it! And more practically means that the most
obvious and clear way to add your new data has nasty side effects, and you
have to tip toe around muttering secret incantations to make things work
well :-)


Yes, we'd need to break up SQL statements into pieces and use external
transaction snapshots to do that.


I'm still thinking that making postgres smarter about having current stats
for getting the actual optimal plan is the best solution.


I agree.

The challenge now is to come up with something that actually works;
most of the ideas have been very vague and ignore the many downsides.
The hard bit is the analysis and balanced thinking, not the
developing.



Yeah - seeing likely downsides can be a bit tricky too. I'll have a play 
with some prototyping ideas, since this is actually an area of postgres 
(analyze/stats collector) that I've fiddled with before :-)


Cheers

Mark





Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-06 Thread mark . kirkwood
 Simon Riggs wrote:

 Patch works and improves things, but we're still swamped by the block
 accesses via the index.

Which *might* be enough to stop it making the server go unresponsive,
we'll look at the effect of this in the next few days, nice work!


 Which brings me back to Mark's original point, which is that we are
 x100 times slower in this case and it *is* because the choice of
 IndexScan is a bad one for this situation.

 After some thought on this, I do think we need to do something about
 it directly, rather than by tuning infrastructire (as I just
 attempted). The root cause here is that IndexScan plans are sensitive
 to mistakes in data distribution, much more so than other plan types.

 The two options, broadly, are to either

 1. avoid IndexScans in the planner unless they have a *significantly*
 better cost. At the moment we use IndexScans if cost is lowest, even
 if that is only by a whisker.

 2. make IndexScans adaptive so that they switch to other plan types
 mid-way through execution.

 (2) seems fairly hard generically, since we'd have to keep track of
 the tids returned from the IndexScan to allow us to switch to a
 different plan and avoid re-issuing rows that we've already returned.
 But maybe if we adapted the IndexScan plan type so that it adopted a
 more page oriented approach internally, it could act like a
 bitmapscan. Anyway, that would need some proof that it would work and
 sounds like a fair task.

 (1) sounds more easily possible and plausible. At the moment we have
 enable_indexscan = off. If we had something like
 plan_cost_weight_indexscan = N, we could selectively increase the cost
 of index scans so that they would be less likely to be selected. i.e.
 plan_cost_weight_indexscan = 2 would mean an indexscan would need to
 be half the cost of any other plan before it was selected. (parameter
 name selected so it could apply to all parameter types). The reason to
 apply this weighting would be to calculate risk adjusted cost not
 just estimated cost.


I'm thinking that a variant of (2) might be simpler to implement:

(I think Matt C essentially beat me to this suggestion - he originally
discovered this issue). It is probably good enough for only *new* plans to
react to the increased/increasing number of in progress rows. So this
would require backends doing significant numbers of row changes to either
directly update pg_statistic or report their in progress numbers to the
stats collector. The key change here is the partial execution numbers
would need to be sent. Clearly one would need to avoid doing this too
often (!) - possibly only when number of changed rows >
autovacuum_analyze_scale_factor proportion of the relation concerned or
similar.

regards

Mark





Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-05 Thread Mark Kirkwood

On 05/05/13 00:49, Simon Riggs wrote:

On 3 May 2013 13:41, Simon Riggs si...@2ndquadrant.com wrote:


(3) to make the check on TransactionIdIsInProgress() into a heuristic,
since we don't *need* to check that, so if we keep checking the same
xid repeatedly we can reduce the number of checks or avoid xids that
seem to be long running. That's slightly more coding than my quick
hack here but seems worth it.

I think we need both (1) and (3) but the attached patch does just (1).

This is a similar optimisation to the one I introduced for
TransactionIdIsKnownCompleted(), except this applies to repeated
checking of as yet-incomplete xids, and to bulk concurrent
transactions.


ISTM we can improve performance of TransactionIdIsInProgress() by
caching the procno of our last xid.

Mark, could you retest with both these patches? Thanks.



Thanks Simon, will do and report back.






Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-02 Thread mark . kirkwood
 On 2 May 2013 01:49, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote:

 I think we need a problem statement before we attempt a solution,
 which is what Tom is alluding to.


Actually no - I think Tom (quite correctly) was saying that the patch was
not a viable solution. With which I agree.

I believe the title of this thread is the problem statement.

 ISTM that you've got a case where the plan is very sensitive to a
 table load. Which is a pretty common situation and one that can be
 solved in various ways. I don't see much that Postgres can do because
 it can't know ahead of time you're about to load rows. We could
 imagine an optimizer that set thresholds on plans that caused the
 whole plan to be recalculated half way thru a run, but that would be a
 lot of work to design and implement and even harder to test. Having
 static plans at least allows us to discuss what it does after the fact
 with some ease.

 The plan is set using stats that are set when there are very few
 non-NULL rows, and those increase massively on load. The way to cope
 is to run the ANALYZE immediately after the load and then don't allow
 auto-ANALYZE to reset them later.

No. We do run analyze immediately after the load. The surprise was that
this was not sufficient - the (small) amount of time where non optimal
plans were being used due to the in progress row activity was enough to
cripple the system - that is the problem. The analysis of why not led to
the test case included in the original email. And sure it is deliberately
crafted to display the issue, and is therefore open to criticism for being
artificial. However it was purely meant to make it easy to see what I was
talking about.


Currently we are working around this by coercing one of the predicates in
the query to discourage the attractive looking but dangerous index.
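
(Illustrative only - the production query differs, but against the test 
case elsewhere in this thread the coercion looks something like wrapping 
the indexed column in a no-op expression, so the planner can no longer 
match the predicate to the risky index:)

```sql
-- Before: the planner matches "dat IS NOT NULL" to index plan_dat.
-- After: the expression no longer matches the index definition, so the
-- safer plan_typ index (or a seq scan) gets picked instead:
SELECT * FROM plan
WHERE  typ = 3
AND    (dat + '0 seconds'::interval) IS NOT NULL;
```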

I think the idea of telling postgres that we are doing a load is probably
the wrong way to go about this. We have a framework that tries to
automatically figure out the best plans...I think some more thought about
how to make that understand some of the more subtle triggers for a
time-to-do-new-plans moment is the way to go. I understand this is
probably hard - and may imply some radical surgery to how the stats
collector and planner interact.

Regards

Mark






Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-02 Thread mark . kirkwood
 mark.kirkw...@catalyst.net.nz wrote on 03.05.2013 00:19:
 I think the idea of telling postgres that we are doing a load is
 probably
 the wrong way to go about this. We have a framework that tries to
 automatically figure out the best plans...I think some more thought
 about
 how to make that understand some of the more subtle triggers for a
 time-to-do-new-plans moment is the way to go. I understand this is
 probably hard - and may imply some radical surgery to how the stats
 collector and planner interact.

 I wonder if freezing (analyze, then disable autovacuum) the statistics
 for the large number of rows would work.




I'm thinking that the issue is actually the opposite - it is that a new
plan is needed because the new (uncomitted) rows are changing the data
distribution. So we want more plan instability rather than plan stability
:-)

Cheers

Mark






Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-01 Thread Mark Kirkwood

On 26/04/13 15:34, Gavin Flower wrote:

On 26/04/13 15:19, Mark Kirkwood wrote:

While in general you are quite correct - in the above case
(particularly as I've supplied a test case) it should be pretty
obvious that any moderately modern version of postgres on any
supported platform will exhibit this.



While I admit that I did not look closely at your test case - I am aware
that several times changes to Postgres from one minor version to
another, can have drastic unintended side effects (which might, or might
not, be relevant to your situation). Besides, it helps sets the scene,
and is one less thing that needs to be deduced.



Indeed - however, my perhaps slightly grumpy reply to your email was 
based on an impression of over keen-ness to dismiss my message without 
reading it (!) and a - dare I say it - one-size-fits-all presentation of 
"here are the hoops to jump through". Now I spent a reasonable amount of 
time preparing the message and its attendant test case - and a comment 
such as yours based on *not reading it* ...errrm... well, let's say I think 
we can/should do better.


I am concerned that the deafening lack of any replies to my original 
message is a result of folk glancing at your original quick reply and 
thinking "incomplete problem spec...ignore", when that is not the 
case - yes, I should have muttered "9.2" in the original email, but we 
have covered that now.


Regards


Mark




Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-01 Thread Mark Kirkwood

On 02/05/13 02:06, Tom Lane wrote:

Mark Kirkwood mark.kirkw...@catalyst.net.nz writes:

I am concerned that the deafening lack of any replies to my original
message is a result of folk glancing at your original quick reply and
thinking "incomplete problem spec...ignore", when that is not the
case - yes, I should have muttered "9.2" in the original email, but we
have covered that now.

No, I think it's more that we're trying to get to beta, and so anything
that looks like new development is getting shuffled to folks' "to
look at later" queues.  The proposed patch is IMO a complete nonstarter
anyway; but I'm not sure what a less bogus solution would look like.



Yeah, I did think that beta might be consuming everyone's attention (of 
course immediately *after* sending the email)!


And yes, the patch was merely to illustrate the problem rather than any 
serious attempt at a solution.


Regards

Mark





[PERFORM] In progress INSERT wrecks plans on table

2013-04-25 Thread Mark Kirkwood

Recently we encountered the following unhappy sequence of events:

1/ system running happily
2/ batch load into table begins
3/ very quickly (some) preexisting queries on said table go orders of 
magnitude slower

4/ database instance becomes unresponsive
5/ application outage

After looking down a few false leads, We've isolated the cause to the 
following:


The accumulating in-progress row changes are such that previously 
optimal plans are optimal no longer. Now this situation will fix itself 
when the next autoanalyze happens (and new plan will be chosen) - 
however that cannot occur until the batch load is completed and 
committed (approx 70 seconds). However during that time there is enough 
of a performance degradation for queries still using the old plan to 
cripple the server.


Now that we know what is happening we can work around it. But I'm 
wondering - is there any way (or if not should there be one) to let 
postgres handle this automatically? I experimented with a quick hack to 
src/backend/commands/analyze.c (attached) that lets another session's 
ANALYZE see in progress rows - which works but a) may cause other 
problems and b) does not help autoanalyze, which has to wait for COMMIT 
+ stats message.


I've attached a (synthetic) test case that shows the issue, I'll 
reproduce the output below to hopefully make the point obvious:



             Table "public.plan"
 Column |            Type             | Modifiers
--------+-----------------------------+-----------
 id     | integer                     | not null
 typ    | integer                     | not null
 dat    | timestamp without time zone |
 val    | text                        | not null
Indexes:
    "plan_id" UNIQUE, btree (id)
    "plan_dat" btree (dat)
    "plan_typ" btree (typ)


[Session 1]
EXPLAIN ANALYZE
SELECT * FROM plan
WHERE typ = 3 AND dat IS NOT NULL;
 QUERY PLAN
-------------------------------------------------------------------------
 Index Scan using plan_dat on plan  (cost=0.00..265.47 rows=55 
width=117) (actual time=0.130..4.409 rows=75 loops=1)

   Index Cond: (dat IS NOT NULL)
   Filter: (typ = 3)
   Rows Removed by Filter: 5960
 Total runtime: 4.440 ms
(5 rows)

[Session 2]

BEGIN;
INSERT INTO plan
SELECT id + 201,typ,current_date + id * '1 seconds'::interval ,val
FROM plan
;

[Session 1]
EXPLAIN ANALYZE
SELECT * FROM plan
WHERE typ = 3 AND dat IS NOT NULL;

  QUERY PLAN
-------------------------------------------------------------------------
 Index Scan using plan_dat on plan  (cost=0.00..551.35 rows=91 
width=117) (actual time=0.131..202.699 rows=75 loops=1)

   Index Cond: (dat IS NOT NULL)
   Filter: (typ = 3)
   Rows Removed by Filter: 5960
 Total runtime: 202.729 ms
(5 rows)
[Session 2]
COMMIT;

[Session 1...wait for autoanalyze to finish then]

EXPLAIN ANALYZE
SELECT * FROM plan
WHERE typ = 3 AND dat IS NOT NULL;
QUERY PLAN
-------------------------------------------------------------------------
 Bitmap Heap Scan on plan  (cost=407.87..44991.95 rows=10116 width=117) 
(actual time=2.692..6.582 rows=75 loops=1)

   Recheck Cond: (typ = 3)
   Filter: (dat IS NOT NULL)
   Rows Removed by Filter: 19925
   ->  Bitmap Index Scan on plan_typ  (cost=0.00..405.34 rows=20346 
width=0) (actual time=2.573..2.573 rows=2 loops=1)

 Index Cond: (typ = 3)
 Total runtime: 6.615 ms


Regards

Mark


plan.tar.gz
Description: application/gzip
*** analyze.c.orig	2013-04-26 10:40:06.634942283 +1200
--- analyze.c	2013-04-26 11:36:13.537404101 +1200
***
*** 1173,1183 
  	 * has to adjust the numbers we send to the stats
  	 * collector to make this come out right.)
  	 */
! 	if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(targtuple.t_data)))
! 	{
! 		sample_it = true;
! 		liverows += 1;
! 	}
  	break;
  
  case HEAPTUPLE_DELETE_IN_PROGRESS:
--- 1173,1181 
  	 * has to adjust the numbers we send to the stats
  	 * collector to make this come out right.)
  	 */
! 	/* Amend so insert in progress tuples are counted */
! 	sample_it = true;
! 	liverows += 1;
  	break;
  
  case HEAPTUPLE_DELETE_IN_PROGRESS:



Re: [PERFORM] In progress INSERT wrecks plans on table

2013-04-25 Thread Mark Kirkwood

On 26/04/13 14:56, Gavin Flower wrote:

On 26/04/13 14:33, Mark Kirkwood wrote:

Recently we encountered the following unhappy sequence of events:

1/ system running happily
2/ batch load into table begins
3/ very quickly (some) preexisting queries on said table go orders of 
magnitude slower

4/ database instance becomes unresponsive
5/ application outage

After looking down a few false leads, We've isolated the cause to the 
following:


The accumulating in-progress row changes are such that previously 
optimal plans are optimal no longer. Now this situation will fix 
itself when the next autoanalyze happens (and new plan will be 
chosen) - however that cannot occur until the batch load is completed 
and committed (approx 70 seconds). However during that time there is 
enough of a performance degradation for queries still using the old 
plan to cripple the server.


Now that we know what is happening we can work around it. But I'm 
wondering - is there any way (or if not should there be one) to let 
postgres handle this automatically? I experimented with a quick hack 
to src/backend/commands/analyze.c (attached) that lets another 
session's ANALYZE see in progress rows - which works but a) may cause 
other problems and b) does not help autoanalyze, which has to wait 
for COMMIT + stats message.


I've attached a (synthetic) test case that shows the issue, I'll 
reproduce the output below to hopefully make the point obvious:



       Table "public.plan"
 Column |            Type             | Modifiers
--------+-----------------------------+-----------
 id     | integer                     | not null
 typ    | integer                     | not null
 dat    | timestamp without time zone |
 val    | text                        | not null
Indexes:
    "plan_id" UNIQUE, btree (id)
    "plan_dat" btree (dat)
    "plan_typ" btree (typ)


[Session 1]
EXPLAIN ANALYZE
SELECT * FROM plan
WHERE typ = 3 AND dat IS NOT NULL;
 QUERY PLAN
-------------------------------------------------------------------------------
 Index Scan using plan_dat on plan  (cost=0.00..265.47 rows=55 
width=117) (actual time=0.130..4.409 rows=75 loops=1)

   Index Cond: (dat IS NOT NULL)
   Filter: (typ = 3)
   Rows Removed by Filter: 5960
 Total runtime: 4.440 ms
(5 rows)

[Session 2]

BEGIN;
INSERT INTO plan
SELECT id + 201,typ,current_date + id * '1 seconds'::interval ,val
FROM plan
;

[Session 1]
EXPLAIN ANALYZE
SELECT * FROM plan
WHERE typ = 3 AND dat IS NOT NULL;

  QUERY PLAN
-------------------------------------------------------------------------------
 Index Scan using plan_dat on plan  (cost=0.00..551.35 rows=91 
width=117) (actual time=0.131..202.699 rows=75 loops=1)

   Index Cond: (dat IS NOT NULL)
   Filter: (typ = 3)
   Rows Removed by Filter: 5960
 Total runtime: 202.729 ms
(5 rows)
[Session 2]
COMMIT;

[Session 1...wait for autoanalyze to finish then]

EXPLAIN ANALYZE
SELECT * FROM plan
WHERE typ = 3 AND dat IS NOT NULL;
QUERY PLAN
-------------------------------------------------------------------------------
 Bitmap Heap Scan on plan  (cost=407.87..44991.95 rows=10116 
width=117) (actual time=2.692..6.582 rows=75 loops=1)

   Recheck Cond: (typ = 3)
   Filter: (dat IS NOT NULL)
   Rows Removed by Filter: 19925
   ->  Bitmap Index Scan on plan_typ  (cost=0.00..405.34 rows=20346 
width=0) (actual time=2.573..2.573 rows=2 loops=1)

 Index Cond: (typ = 3)
 Total runtime: 6.615 ms


Regards

Mark



Hmm...

You need to specify:

1. version of Postgres
2. Operating system
3. changes to postgresql.conf
4. CPU/RAM etc
5. anything else that might be relevant





While in general you are quite correct - in the above case (particularly 
as I've supplied a test case) it should be pretty obvious that any 
moderately modern version of postgres on any supported platform will 
exhibit this.


I produced the above test case on Postgres 9.2.4 Ubuntu 13.04, with no 
changes to the default postgresql.conf



Now our actual production server is a 32 CPU box with 512GB RAM, and 16 
SAS SSD running Postgres 9.2.4 on Ubuntu 12.04. And yes there are quite 
a few changes from the defaults there - and I wasted quite a lot of time 
chasing issues with high CPU and RAM, and changing various configs to 
see if they helped - before identifying that the issue was in progress 
row changes and planner statistics.  Also in the real case with much 
bigger datasets the difference between the plan being optimal and it 
*not* being optimal is a factor of 2000x elapsed time instead of a mere 
50x  !
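For reference, the slowdown factor in the synthetic case falls straight out of the two EXPLAIN ANALYZE runtimes quoted above; a quick Python check:

```python
# Total runtimes reported by EXPLAIN ANALYZE before and during the
# in-progress batch INSERT (from the test case above), in milliseconds.
before_ms = 4.440
during_ms = 202.729

slowdown = during_ms / before_ms
print(round(slowdown, 1))  # ~45.7, i.e. roughly the "50x" quoted
```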


regards

Mark





Re: [PERFORM] Postgres upgrade, security release, where?

2013-04-02 Thread Mark Kirkwood

On 02/04/13 21:34, Dave Page wrote:

On Mon, Apr 1, 2013 at 11:43 PM, Mark Kirkwood
mark.kirkw...@catalyst.net.nz wrote:

On 02/04/13 13:55, Bruce Momjian wrote:


On Tue, Apr  2, 2013 at 09:40:07AM +0900, Ian Lawrence Barwick wrote:


Due to the security nature of the release, the source and binaries will
only be publicly available on April 4 --- there are no pre-release
versions available.



The PostgreSQL homepage has a big announcement saying
PostgreSQL minor versions released!, including a mention of a
security issue;
unfortunately it's not obvious that this is for the prior 9.2.3 release
and as
the announcement of the upcoming security release
( http://www.postgresql.org/about/news/1454/ ) does not mention the
new release number, methinks there is plenty of room for confusion :(

It might be an idea to update the splash box with details of the
upcoming
release.



I agree updating the splash box would make sense.



Or perhaps include a date on said splashes, so we know when to panic :-)


I've added the date to the splash. You can cease panicking now :-)



...wipes forehead...





Re: [PERFORM] Problems with pg_locks explosion

2013-04-01 Thread Mark Kirkwood
In addition to tuning the various Postgres config knobs you may need to 
look at how your AWS server is set up. If your load is causing an IO 
stall then *symptoms* of this will be lots of locks...


You have quite a lot of memory (60G), so look at tuning the 
vm.dirty_background_ratio, vm.dirty_ratio sysctls to avoid trying to 
*suddenly* write out many gigs of dirty buffers.


Your provisioned volumes are much better than the default AWS ones, but 
are still not hugely fast (i.e. 1000 IOPS is about 8 MB/s worth of 
Postgres 8k buffers). So you may need to look at adding more volumes 
into the array, or adding some separate ones and putting pg_xlog 
directory on 'em.
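The "about 8 MB/s" figure is simply the provisioned IOPS multiplied by the default Postgres 8 KiB page size; a quick Python sketch of the arithmetic:

```python
BLOCK_SIZE = 8 * 1024  # default Postgres page size, in bytes
piops = 1000           # provisioned IOPS on a single EBS volume

throughput_mib_s = piops * BLOCK_SIZE / (1024 * 1024)
print(throughput_mib_s)  # 7.8125 -> "about 8 MB/s"
```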


However, before making changes I would recommend using iostat or sar to 
monitor how the volumes are handling the load (I usually choose a 1 sec 
granularity and look for 100% util and high awaits, i.e. several hundred 
ms). Also iotop could be enlightening.


Regards

Mark

On 02/04/13 11:35, Armand du Plessis wrote:


It's on Amazon EC2 -
* cc2.8xlarge instance type
* 6 volumes in RAID-0 configuration. (1000 PIOPS)

60.5 GiB of memory
88 EC2 Compute Units (2 x Intel Xeon E5-2670, eight-core)
3370 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
EBS-Optimized Available: No**
API name: cc2.8xlarge







Re: Fwd: [PERFORM] Problems with pg_locks explosion

2013-04-01 Thread Mark Kirkwood
Yeah, as I understand it you should have 6000 IOPS available for the md 
device (ideally).
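That ideal figure is just the per-volume provisioned IOPS summed across the RAID-0 members (striping overhead ignored); sketched in Python:

```python
volumes = 6       # EBS volumes in the RAID-0 md array
piops = 1000      # provisioned IOPS per volume
block = 8 * 1024  # Postgres page size in bytes (assumed default)

# RAID-0 stripes requests across all members, so the ideal aggregate
# is simply additive (real-world overhead will shave some off).
ideal_iops = volumes * piops
print(ideal_iops)                    # 6000
print(ideal_iops * block / 1024**2)  # 46.875 MiB/s of 8k page IO
```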


The iostats you display certainly look benign... but the key time to be 
sampling would be when you see the lock list explode - could look very 
different then.


Re vm.dirty* - I would crank the values down by a factor of 5:

vm.dirty_background_ratio = 1 (down from 5)
vm.dirty_ratio = 2 (down from 10)

Assuming of course that you actually are seeing an IO stall (which 
should be catchable via iostat or iotop)... and not some other issue. 
Otherwise leave 'em alone and keep looking :-)
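To put those percentages in absolute terms for a box like the one in this thread, they can be converted to bytes (a rough sketch, assuming the ~60 GiB of RAM mentioned earlier):

```python
ram_mib = 60 * 1024  # ~60 GiB of RAM on the cc2.8xlarge instance

for name, pct in (("vm.dirty_background_ratio", 1), ("vm.dirty_ratio", 2)):
    # The kernel starts background flushing (then hard-throttles writers)
    # once dirty pages exceed roughly this fraction of RAM.
    print(name, "=", pct, "->", ram_mib * pct // 100, "MiB of dirty buffers")
```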


Cheers

Mark


On 02/04/13 13:31, Armand du Plessis wrote:



I had a look at the iostat output (on a 5s interval) and pasted it
below. The utilization and waits seems low. Included a sample below, #1
taken during normal operation and then when the locks happen it
basically drops to 0 across the board. My (mis)understanding of the IOPS
was that it would be 1000 IOPS per volume, and when in RAID0 should give
me quite a bit higher throughput than in a single EBS volume setup. (My
naive envelop calculation was #volumes * PIOPS = Effective IOPS :/)


I'm looking into  vm.dirty_background_ratio, vm.dirty_ratio sysctls. Is
there any guidance or links available that would be useful as a starting
point?

Thanks again for the help, I really appreciate it.







Re: [PERFORM] Postgres upgrade, security release, where?

2013-04-01 Thread Mark Kirkwood

On 02/04/13 13:55, Bruce Momjian wrote:

On Tue, Apr  2, 2013 at 09:40:07AM +0900, Ian Lawrence Barwick wrote:

Due to the security nature of the release, the source and binaries will
only be publicly available on April 4 --- there are no pre-release
versions available.


The PostgreSQL homepage has a big announcement saying
PostgreSQL minor versions released!, including a mention of a
security issue;
unfortunately it's not obvious that this is for the prior 9.2.3 release and as
the announcement of the upcoming security release
( http://www.postgresql.org/about/news/1454/ ) does not mention the
new release number, methinks there is plenty of room for confusion :(

It might be an idea to update the splash box with details of the upcoming
release.


I agree updating the splash box would make sense.



Or perhaps include a date on said splashes, so we know when to panic :-)




Re: [PERFORM] New server setup

2013-03-20 Thread Mark Kirkwood

On 21/03/13 13:44, David Rees wrote:

On Thu, Mar 14, 2013 at 4:37 PM, David Boreham david_l...@boreham.org wrote:

You might want to evaluate the performance you can achieve with a single-SSD
(use several for capacity by all means) before considering a RAID card + SSD
solution.
Again I bet it depends on the application but our experience with the older
Intel 710 series is that their performance out-runs the CPU, at least under
our PG workload.


How many people are using a single enterprise grade SSD for production
without RAID? I've had a few consumer grade SSDs brick themselves -
but are the enterprise grade SSDs, like the new Intel S3700 which you
can get in sizes up to 800GB, reliable enough to run as a single drive
without RAID1? The performance of one is definitely good enough for
most medium sized workloads without the complexity of a BBU RAID and
multiple spinning disks...



If you are using Intel S3700 or 710's you can certainly use a pair setup 
in software RAID1 (so avoiding the need for RAID cards and BBU etc).


I'd certainly feel happier with 2 drives :-). However, a setup using 
replication with a number of hosts - each with a single SSD - is going to 
be OK.


Regards

Mark





Re: [PERFORM] New server setup

2013-03-14 Thread Mark Kirkwood

On 15/03/13 07:54, Bruce Momjian wrote:

Only use SSDs with a BBU cache, and don't set SSD caches to
write-through because an SSD needs to cache the write to avoid wearing
out the chips early, see:

http://momjian.us/main/blogs/pgblog/2012.html#August_3_2012



I'm not convinced about the need for a BBU with SSDs - you *can* use them 
without one, you just need to make sure of suitable longevity and also 
the presence of (proven) power-off protection (as discussed previously). 
It is worth noting that using unproven SSDs, or SSDs known to be lacking 
power-off protection, with a BBU will *not* save you from massive 
corruption (or device failure) upon unexpected power loss.


Also, in terms of performance, the faster PCIe SSD do about as well by 
themselves as connected to a RAID card with BBU. In fact they will do 
better in some cases (the faster SSD can get close to the max IOPS many 
RAID cards can handle...so more than a couple of 'em plugged into one 
card will be throttled by its limitations).


Cheers

Mark




Re: [PERFORM] New server setup

2013-03-14 Thread Mark Kirkwood

On 15/03/13 10:37, Mark Kirkwood wrote:


Also, in terms of performance, the faster PCIe SSD do about as well by 
themselves as connected to a RAID card with BBU.




Sorry - I meant to say the faster **SAS** SSD do..., since you can't 
currently plug PCIe SSD into RAID cards (confusingly, some of the PCIe 
guys actually have RAID card firmware on their boards...Intel 910 I think).


Cheers

Mark




Re: [PERFORM] New server setup

2013-03-14 Thread Mark Kirkwood

On 15/03/13 11:34, Bruce Momjian wrote:


I don't think any drive that corrupts on power-off is suitable for a
database, but for non-db uses, sure, I guess they are OK, though you
have to be pretty money-constrainted to like that tradeoff.



Agreed - really *all* SSDs should have capacitor (or equivalent) power 
off protection...the fact that it's a feature present on only a handful 
of drives is...disappointing.






Re: [PERFORM] New server setup

2013-03-13 Thread Mark Kirkwood

On 14/03/13 09:16, David Boreham wrote:

On 3/13/2013 1:23 PM, Steve Crawford wrote:


What concerns me more than wear is this:

InfoWorld Article:
http://www.infoworld.com/t/solid-state-drives/test-your-ssds-or-risk-massive-data-loss-researchers-warn-213715


Referenced research paper:
https://www.usenix.org/conference/fast13/understanding-robustness-ssds-under-power-fault


Kind of messes with the D in ACID.


It is somewhat surprising to discover that many SSD products are not
durable under sudden power loss (what were they thinking!? and ...why
doesn't anyone care??).

However, there is a set of SSD types known to be designed to address
power loss events that have been tested by contributors to this list.
Use only those devices and you won't see this problem. SSDs do have a
wear-out mechanism but wear can be monitored and devices replaced in
advance of failure. In practice longevity is such that most machines
will be in the dumpster long before the SSD wears out. We've had
machines running with several hundred wps constantly for 18 months using
Intel 710 drives and the wear level SMART value is still zero.

In addition, like any electronics module (CPU, memory, NIC), an SSD can
fail so you do need to arrange for valuable data to be replicated.
As with old school disk drives, firmware bugs are a concern so you might
want to consider what would happen if all the drives of a particular
type all decided to quit working at the same second in time (I've only
seen this happen myself with magnetic drives, but in theory it could
happen with SSD).
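That longevity claim is easy to sanity-check with a back-of-envelope calculation; the 300 wps, 8 KiB write size, and 30-day months below are illustrative assumptions, not figures from the post:

```python
wps = 300                      # "several hundred wps" -- assume 300
write_size = 8 * 1024          # assume Postgres-page-sized (8 KiB) writes
seconds = 18 * 30 * 24 * 3600  # ~18 months of constant load

total_tb = wps * write_size * seconds / 1e12
print(round(total_tb, 1))  # ~114.7 TB written
```

That total is comfortably below the endurance ratings typically quoted for the Intel 710 series (on the order of a petabyte), which is consistent with the wear indicator still reading zero.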




Just going through this now with a vendor. They initially assured us 
that the drives had end-to-end protection, so we did not need to worry. 
I had to post stripdown pictures of Intel's S3700, showing the obvious 
capacitors attached to the board, before I was taken seriously and 
actually meaningful specifications were revealed. So now I'm demanding 
to know:


- chipset (and version)
- original manufacturer (for re-badged ones)
- power off protection *explicitly* mentioned
- show me the circuit board (and where are the capacitors)

Seems like you gotta push 'em!

Cheers

Mark







Re: [PERFORM] hardware upgrade, performance degrade?

2013-03-04 Thread Mark Kirkwood

On 05/03/13 11:54, Steven Crandell wrote:

Here's our hardware break down.

The logvg on the new hardware is 30 MB/s slower (170 MB/s vs 200 MB/s)
than the logvg on the older hardware which was an immediately interesting
difference but we have yet to be able to create a test scenario that
successfully implicates this slower log speed in our problems. That is
something we are actively working on.


Old server hardware:
 Manufacturer: Dell Inc.
 Product Name: PowerEdge R810
 4x Intel(R) Xeon(R) CPU   E7540  @ 2.00GHz
 32x16384 MB 1066 MHz DDR3
 Controller 0: PERC H700 - 2 disk RAID-1 278.88 GB rootvg
 Controller 1: PERC H800 - 18 disk RAID-6 2,178.00 GB datavg, 4
drive RAID-10 272.25 GB logvg, 2 hot spare
 2x 278.88 GB 15K SAS on controller 0
 24x 136.13 GB 15K SAS on controller 1

New server hardware:
Manufacturer: Dell Inc.
 Product Name: PowerEdge R820
 4x Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz
 32x32 GB 1333 MHz DDR3
 Controller 0: PERC H710P  - 4 disk RAID-6 557.75 GB rootvg
 Controller 1: PERC H810- 20 disk RAID-60 4,462.00 GB datavg, 2
disk RAID-1  278.88 GB logvg, 2 hot spare
 28x278.88 GB 15K SAS drives total.


On Sun, Mar 3, 2013 at 1:34 PM, Jean-David Beyer jeandav...@verizon.netwrote:



Right - It is probably worth running 'pg_test_fsync' on the two logvg's 
and comparing the results. This will tell you if the commit latency is 
similar or not on the two disk systems.


One other difference that springs immediately to mind is that datavg is 
an 18 disk RAID 6 on the old system and a 20 disk RAID 60 on the new 
one...so you have about 1/2 the io performance right there.


Cheers

Mark




Re: [PERFORM] High CPU usage / load average after upgrading to Ubuntu 12.04

2013-02-19 Thread Mark Kirkwood

On 20/02/13 06:51, Josh Berkus wrote:

On 02/18/2013 08:28 PM, Mark Kirkwood wrote:

Might be worth looking at your vm.dirty_ratio, vm.dirty_background_ratio
and friends settings. We managed to choke up a system with 16x SSD by
leaving them at their defaults...

Yeah?  Any settings you'd recommend specifically?  What did you use on
the SSD system?



We set:

vm.dirty_background_ratio = 0
vm.dirty_background_bytes = 1073741824
vm.dirty_ratio = 0
vm.dirty_bytes = 2147483648

i.e. 1G for dirty_background and 2G for dirty. We didn't spend much time 
afterwards fiddling with the size. I'm guessing that we could have made 
it bigger - however the SSDs were happier to be constantly writing a 
few G than being handed (say) 50G of buffers to write at once. The 
system has 512G of RAM and 32 cores (no hyperthreading).
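For scale: those byte limits amount to a tiny fraction of the 512G of RAM, which is the point - a quick Python check (the kernel defaults quoted in the comment are approximate):

```python
ram_gib = 512                      # RAM on the 32-core SSD box described
background_gib, dirty_gib = 1, 2   # vm.dirty_background_bytes / vm.dirty_bytes

# Equivalent percentages -- far below common kernel defaults (e.g.
# dirty_ratio=20, dirty_background_ratio=10 on many kernels), which on
# a box this size would let tens of GiB of dirty buffers pile up.
print(round(100 * dirty_gib / ram_gib, 2))       # 0.39
print(round(100 * background_gib / ram_gib, 2))  # 0.2
```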


regards

Mark




Re: [PERFORM] High CPU usage / load average after upgrading to Ubuntu 12.04

2013-02-19 Thread Mark Kirkwood

On 20/02/13 12:24, Josh Berkus wrote:


NM, I tested lowering dirty_background_ratio, and it didn't help,
because checkpoints are kicking in before pdflush ever gets there.

So the issue seems to be that if you have this combination of factors:

1. large RAM
2. many/fast CPUs
3. a database which fits in RAM but is larger than the RAID controller's
WB cache
4. pg_xlog on the same volume as pgdata

... then you'll see checkpoint stalls and spread checkpoint will
actually make them worse by making the stalls longer.

Moving pg_xlog to a separate partition makes this better.  Making
bgwriter more aggressive helps a bit more on top of that.



We have pg_xlog on a pair of PCIe SSDs. We are also running the deadline 
IO scheduler.


Regards

Mark




Re: [PERFORM] High CPU usage / load average after upgrading to Ubuntu 12.04

2013-02-18 Thread Mark Kirkwood

On 19/02/13 13:39, Josh Berkus wrote:

Scott,


So do you have generally slow IO, or is it fsync behavior etc?

All tests except pgBench show this system as superfast.  Bonnie++ and DD
tests are good (200 to 300mb/s), and test_fsync shows 14K/second.
Basically it has no issues until checkpoint kicks in, at which time the
entire system basically halts for the duration of the checkpoint.

For that matter, if I run a pgbench and halt it just before checkpoint
kicks in, I get around 12000TPS, which is what I'd expect on this system.

At this point, we've tried 3.2.0.26, 3.2.0.27, 3.4.0, and tried updating
the RAID driver, and changing the IO scheduler.  Nothing seems to affect
the behavior.   Testing using Ext4 (instead of XFS) next.




Might be worth looking at your vm.dirty_ratio, vm.dirty_background_ratio 
and friends settings. We managed to choke up a system with 16x SSD by 
leaving them at their defaults...


Cheers

Mark



