[PERFORM] V8 optimisation (if you're using javascript in postgres)

2015-10-13 Thread Graeme B. Bell
https://medium.com/@c2c/nodejs-a-quick-optimization-advice-7353b820c92e

100% performance boost, for mysterious reasons that may be worth knowing about… 

Graeme Bell
-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] One long transaction or multiple short transactions?

2015-10-09 Thread Graeme B. Bell

> I don't think inserts can cause contention on the server. Inserts do not lock 
> tables during the transaction. You may have contention on a sequence, but it 
> won't vary with transaction size.

Perhaps there could be a trigger on inserts which creates some lock contention?
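One quick way to check for that while the import runs is to look for ungranted locks; a minimal sketch using only the standard pg_locks and pg_stat_activity views (nothing here is specific to the poster's schema):

  -- Show sessions currently waiting on a lock, and what they are running.
  SELECT a.pid,
         l.locktype,
         l.mode,
         a.query
  FROM   pg_locks l
  JOIN   pg_stat_activity a ON a.pid = l.pid
  WHERE  NOT l.granted;

If this stays empty while the import is running, plain lock contention is probably not the culprit.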




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] One long transaction or multiple short transactions?

2015-10-08 Thread Graeme B. Bell
Sounds like a locking problem, but assuming you aren’t Sherlock Holmes and 
simply want to get the thing working as soon as possible: 

Stick a fast SSD in there (whether you stay on VM or physical). If you have 
enough I/O, you may be able to solve the problem with brute force.
SSDs are a lot cheaper than your time. 

I suggest you forward this to your operators: a talk I gave about optimising 
multi-threaded work in postgres:  

  http://graemebell.net/foss4gcomo.pdf (Slides: “Input/Output” in the 
middle of the talk and also the slides at the end labelled “For Techies”)

Graeme Bell

p.s. You mentioned a VM. Consider making the machine physical and not a VM. 
You’ll get a performance boost and remove the risk of DB corruption from 
untrustworthy VM fsyncs. One day there will be a power cut or O/S crash during 
these writes, and with a VM you have a reasonable chance of nuking your DB, 
because VM virtualised storage often doesn’t honour fsync (for performance 
reasons), but fsync is fundamental to the correct operation of PG. 



> On 08 Oct 2015, at 01:40, Carlo  wrote:
> 
> 
> I am told 32 cores on a LINUX VM. The operators have tried limiting the 
> number of threads. They feel that the number of connections is optimal. 
> However, under the same conditions they noticed a sizable boost in 
> performance if the same import was split into two successive imports which 
> had shorter transactions.
>  



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] large object write performance

2015-10-08 Thread Graeme B. Bell

>> First the database was on a partition where compression was enabled, I 
>> changed it to an uncompressed one to see if it makes a difference thinking 
>> maybe the cpu couldn't handle the load.
> It made little difference in my case.
> 
> My regular gmirror partition seems faster:
> dd bs=8k count=25600 if=/dev/zero of=./test
> 25600+0 records in
> 25600+0 records out
> 209715200 bytes transferred in 1.513112 secs (138598612 bytes/sec)
> 
> the zfs compressed partition also goes faster:
> dd bs=8k count=25600 if=/dev/zero of=./test
> 25600+0 records in
> 25600+0 records out
> 209715200 bytes transferred in 0.979065 secs (214199479 bytes/sec)
> but this one didn't really go that fast in my test (maybe 10%)


Please can you run iozone and look for low random write performance with small 
blocks?  (4k)
http://www.slashroot.in/linux-file-system-read-write-performance-test

Also please can you CC to the list with your replies to my on-list emails?

Graeme Bell



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] large object write performance

2015-10-08 Thread Graeme B. Bell
Seems a bit slow.

1. Can you share the script (the portion that does the file transfer) with the 
list? Maybe you’re doing something unusual there by mistake.
Similarly, the settings you’re using for scp. 

2. What’s the network like?
For example, what if the underlying network is only capable of 10MB/s peak, and 
scp is using compression and the files are highly compressible?
Have you tried storing zip or gzip’d versions of the file into postgres? 
(that’s probably a good idea anyway)

3. ZFS performance can depend on available memory and use of caches (memory + 
L2ARC for reading, ZIL cache for writing).
Maybe put an intel SSD in there (or a pair of them) and use it as a ZIL cache. 

4. Use dd to measure the write performance of ZFS doing a local write to the 
machine. What speed do you get?

5. Transfer a zip’d file over the network using scp. What speed do you get?

6. Is your postgres running all the time or do you start it before this test? 
Perhaps check if any background tasks are running when you use postgres - 
autovacuum, autoanalyze etc. 
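For example, a quick way to spot such background workers during the test (a minimal sketch using only the standard pg_stat_activity view; nothing here is specific to your setup):

  -- Anything that is not idle, plus autovacuum/autoanalyze workers, shows up here.
  SELECT pid, state, query_start, query
  FROM   pg_stat_activity
  WHERE  query ILIKE 'autovacuum:%'
     OR  state <> 'idle';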

Graeme Bell

> On 08 Oct 2015, at 11:17, Bram Van Steenlandt  wrote:
> 
> Hi,
> 
> I use postgresql often but I'm not very familiar with how it works internal.
> 
> I've made a small script to backup files from different computers to a 
> postgresql database.
> Sort of a versioning networked backup system.
> It works with large objects (oid in table, linked to large object), which I 
> import using psycopg
> 
> It works well but slow.
> 
> The database (9.2.9) on the server (freebsd10) runs on a zfs mirror.
> If I copy a file to the mirror using scp I get 37MB/sec
> My script achieves something like 7 or 8MB/sec on large (+100MB) files.
> 
> I've never used postgresql for something like this, is there something I can 
> do to speed things up ?
> It's not a huge problem as it's only the initial run that takes a while 
> (after that, most files are already in the db).
> Still it would be nice if it would be a little faster.
> cpu is mostly idle on the server, filesystem is running 100%.
> This is a separate postgresql server (I've used freebsd profiles to have 2 
> postgresql servers running) so I can change this setup so it will work better 
> for this application.
> 
> I've read different suggestions online but I'm unsure which is best, they all 
> speak of files which are only a few Kb, not 100MB or bigger.
> 
> ps. english is not my native language
> 
> thx
> Bram
> 
> 
> -- 
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] large object write performance

2015-10-08 Thread Graeme B. Bell
>> 
>> 
> Like this ?
> 
> gmirror (iozone -s 4 -a /dev/mirror/gm0s1e) = 806376 (faster drives)
> zfs uncompressed (iozone -s 4 -a /datapool/data) = 650136
> zfs compressed (iozone -s 4 -a /datapool/data) = 676345


If you can get the complete tables (as in the images on the blog post), showing 
random vs. sequential performance at different block sizes, that would be very 
interesting.



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] large object write performance

2015-10-08 Thread Graeme B. Bell

> On 08 Oct 2015, at 11:17, Bram Van Steenlandt  wrote:
> 
> The database (9.2.9) on the server (freebsd10) runs on a zfs mirror.
> If I copy a file to the mirror using scp I get 37MB/sec
> My script achieves something like 7 or 8MB/sec on large (+100MB) files.


This may help - great blog article about ZFS with postgres and how use you can 
zfs compression to boost i/o performance substantially.
If your machine is making a lot of smaller writes in postgres (as opposed to 
presumably large writes by scp) then this may alleviate things a bit.

https://www.citusdata.com/blog/64-zfs-compression

Graeme Bell

p.s. Apologies for top-posting on my previous message.



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] large object write performance

2015-10-08 Thread Graeme B. Bell

> On 08 Oct 2015, at 13:50, Bram Van Steenlandt  wrote:
>>> 1. The part is "fobj = lobject(db.db,0,"r",0,fpath)", I don't think there 
>>> is anything there

Re: lobject

http://initd.org/psycopg/docs/usage.html#large-objects

"Psycopg large object support *efficient* import/export with file system files 
using the lo_import() and lo_export() libpq functions.”

See *

lobject seems to default to string handling in Python
That’s going to be slow.
Try using lo_import / export?
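For reference, the server-side lo_import()/lo_export() functions can also be called directly from SQL, which avoids pushing the file contents through the client as a Python string. A hedged sketch (the table and paths below are invented for illustration; the server-side variants read/write files on the database server's filesystem and, on these versions, need superuser rights):

  -- Import a file from the *server* filesystem; returns the new large object's OID.
  INSERT INTO backups (filename, file_oid)
  VALUES ('/srv/dumps/example.bin', lo_import('/srv/dumps/example.bin'));

  -- Export it back out to a file on the server.
  SELECT lo_export(file_oid, '/tmp/example_restored.bin')
  FROM   backups
  WHERE  filename = '/srv/dumps/example.bin';

If the files live on the client machine, the client-side equivalents are the libpq lo_import()/lo_export() calls mentioned in the psycopg documentation quoted above.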

Graeme Bell

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] large object write performance

2015-10-08 Thread Graeme B. Bell

>> 
>> http://initd.org/psycopg/docs/usage.html#large-objects
>> 
>> 
>> "Psycopg large object support *efficient* import/export with file system 
>> files using the lo_import() and lo_export() libpq functions.”
>> 
>> See *
>> 
> I was under the impression they meant that the lobject was using lo_import 
> and lo_export.
> I can't seem to find how to use lo_import en export, I searched google and 
> came to the conclusion the lobject was the way to go.
> >>> x.lo_import()
> Traceback (most recent call last):
>   File "", line 1, in 
> AttributeError: 'psycopg2._psyco

Bram,

I recommend posting this as a question on a python/psycopg mailing list, for 
advice.
You are probably not the first person to encounter it.

Graeme Bell



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Another parallel postgres project...

2015-09-29 Thread Graeme B. Bell
I previously posted about par_psql, but I recently found another PG parallelism 
project which can do a few extra things that par_psql can’t: 

https://github.com/moat/pmpp
pmpp: Poor Man's Parallel Processing. 

Corey Huinker had the idea of using dblink async as a foundation for 
distributing queries. This allows parallelisation at the query level and across 
multiple dbs simultaneously. 
Nice idea!
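For anyone who hasn't used dblink's async functions, the underlying trick looks roughly like this (a minimal sketch; the connection string, table name and predicates are invented for illustration):

  CREATE EXTENSION IF NOT EXISTS dblink;

  -- Open two extra connections back to the same (or another) database.
  SELECT dblink_connect('worker1', 'dbname=mydb');
  SELECT dblink_connect('worker2', 'dbname=mydb');

  -- Fire two queries without waiting for either to finish.
  SELECT dblink_send_query('worker1', 'SELECT count(*) FROM big_table WHERE region = 1');
  SELECT dblink_send_query('worker2', 'SELECT count(*) FROM big_table WHERE region = 2');

  -- Collect the results; each call blocks only until its own connection is done.
  SELECT * FROM dblink_get_result('worker1') AS t(n bigint);
  SELECT * FROM dblink_get_result('worker2') AS t(n bigint);

  SELECT dblink_disconnect('worker1');
  SELECT dblink_disconnect('worker2');

pmpp wraps this pattern up so you don't have to manage the connections by hand.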

Graeme Bell
-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] incredible surprise news from intel/micron right now...

2015-08-20 Thread Graeme B. Bell
On 28 Jul 2015, at 22:29, Graeme B. Bell graeme.b...@nibio.no wrote:

 Entering production, availability 2016
 1000x faster than nand flash/ssd, e.g. dram-latency
 10x denser than dram
 1000x write endurance of nand
 Priced between flash and dram
 Manufactured by intel/micron
 Non-volatile

http://www.anandtech.com/show/9541/intel-announces-optane-storage-brand-for-3d-xpoint-products

Some new information (for anyone putting thought into 2016 DB hardware 
purchases). 

Throughput seems to be good: 
7x better IOPS than one of the best enterprise PCIe SSDs on the market at 
queue depth 1, and 5x better as queue depth gets higher. 

Graeme. 




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any ideas how can I speed up this query?

2015-07-28 Thread Graeme B. Bell
 
 QUERY
 
 SELECT COUNT(*) FROM occurrences WHERE (lat >= -27.91550355958 AND lat 
 <= -27.015680440420002 AND lng >= 152.13307044728307 AND lng <= 
 153.03137355271693 AND category_id = 1 AND (ST_Intersects( 
 ST_Buffer(ST_PointFromText('POINT(152.58 -27.465592)')::geography, 
 5)::geography, location::geography)));

 How can I assist the planner in providing better row estimates for the Bitmap 
 Heap Scan section?

By googling this phrase from your EXPLAIN output - "Rows Removed by Index 
Recheck: 748669" - you can find this explanation: 

http://stackoverflow.com/questions/26418715/postgresql-rows-removed-by-index

The inner Bitmap Index Scan node is producing a bitmap, putting 1 in all the 
places where records that match your search key are found, and 0 otherwise. As 
your table is quite big, the bitmap grows larger than the memory available for 
this kind of operation (configured via work_mem), which becomes too small to 
hold the whole bitmap.

When short of memory, the inner node will start producing 1 not for individual 
records, but for whole blocks that are known to contain matching records. This 
means that the outer Bitmap Heap Scan node has to read all records from each 
such block and re-check them. Obviously there will be some non-matching ones, 
and their number is what you see as Rows Removed by Index Recheck.

Therefore, try substantially increasing your work_mem (use SET so that the 
change applies per-session, not globally) so that you don't have to read in all 
the rows just to re-check them.
This is why Googling phrases from your EXPLAIN before list-posting is always a 
good idea :-)
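On the work_mem point, a minimal sketch (the 256MB figure is only an illustration; pick a value that suits your RAM and concurrency):

  SET work_mem = '256MB';   -- session-local; does not touch postgresql.conf
  SHOW work_mem;            -- confirm the new value
  -- ...re-run the problem query with EXPLAIN ANALYZE here and check whether
  -- the "Rows Removed by Index Recheck" line shrinks or disappears...
  RESET work_mem;           -- back to the server default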

BTW - what is your statistics target set to? If you have a huge table, it can be 
worth raising it from the default.  
  http://www.postgresql.org/docs/9.4/static/planner-stats.html
ALTER TABLE ... ALTER COLUMN ... SET STATISTICS - try raising this to 1000.
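A rough sketch using the table and columns from your query (raise the target only on the columns whose estimates are off):

  -- Default per-column target is typically 100; 1000 gives the planner much
  -- finer statistics at the cost of a slower ANALYZE.
  ALTER TABLE occurrences ALTER COLUMN lat SET STATISTICS 1000;
  ALTER TABLE occurrences ALTER COLUMN lng SET STATISTICS 1000;
  ANALYZE occurrences;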


 POSTGRESQL VERSION INFO

For postgis-related questions, remember to also include the postgis version. 

Hope this helps and good luck

Graeme Bell.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] autofreeze/vacuuming - avoiding the random performance hit

2015-07-28 Thread Graeme B. Bell
Some of you may have had annoying problems in the past with autofreeze or 
autovacuum running at unexpected moments and dropping the performance of your 
server randomly. 

On our SSD-RAID10 based system we found a 20GB table finished its vacuum 
freeze in about 100 seconds. There were no noticeable interruptions to our 
services; maybe a tiny little bit of extra latency on the web maps, very hard 
to tell if it was real or imagination.

If auto-stuff in postgresql has been a pain point for you in the past, I can 
confirm that SSD drives are a nice solution (and also for any other 
autovacuum/analyze type stuff) since they can handle incoming random IO very 
nicely while also making very fast progress with the housekeeping work. 
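If you want to see the anti-wraparound freeze work coming before autovacuum picks the moment for you, something like this gives a rough ordering of which tables are next in line (a sketch using only standard catalogs; how soon freezing actually triggers depends on autovacuum_freeze_max_age):

  -- Tables with the oldest relfrozenxid are the ones autovacuum will choose
  -- for an anti-wraparound VACUUM FREEZE first.
  SELECT relname,
         age(relfrozenxid) AS xid_age,
         pg_size_pretty(pg_total_relation_size(oid)) AS total_size
  FROM   pg_class
  WHERE  relkind = 'r'
  ORDER  BY age(relfrozenxid) DESC
  LIMIT  20;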

Graeme Bell



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] autofreeze/vacuuming - avoiding the random performance hit

2015-07-28 Thread Graeme B. Bell

Entire database. People have talked about using SSDs for data/indices and 
spinning disks for WAL. However I find having everything on the same disks is 
good for 3 reasons. 

1. The SSD is simply vastly faster than the disks. That means that if a huge 
amount of WAL is being written out (e.g. tons of data inserted), WAL isn't 
lagging at all. Anyone arguing that WAL suits spinning disk because they write 
fast sequentially should acknowledge that SSDs also write fast sequentially - 
considerably faster. 

2. By having extra 'fsync' events, IO is less bumpy. Every time wal is written 
out, all your buffers are getting flushed out (in principle), which helps to 
avoid huge IO spikes. 

3. Simpler setup, fewer volumes to worry about in linux or disk types to manage. 
For example, we only need spare SSDs in the hotspare bay and on the shelf. Even 
a single HDD for WAL requires a mirrored HDD, plus a hotspare (that's 3 bays 
gone from e.g. 8), plus some more on the shelf...  all to get worse 
performance. 

Our DBs have been a total dream since I put SSDs everywhere. It got rid of 
every throughput/latency/io spike problem. The only thing I'd do differently 
today is that I'd buy intel ssds instead of the ones we chose; and preferably a 
NVMe direct connect with software raid in place of hardware raid and sata.

Graeme Bell.

On 28 Jul 2015, at 17:51, Wei Shan weishan@gmail.com wrote:

 Did you put your entire database on SSD or just the WAL/indexes?
 
 On 28 July 2015 at 23:39, Graeme B. Bell graeme.b...@nibio.no wrote:
 Some of you may have had annoying problems in the past with autofreeze or 
 autovacuum running at unexpected moments and dropping the performance of your 
 server randomly.
 
 On our SSD-RAID10 based system we found a 20GB table finished its vacuum 
 freeze in about 100 seconds. There were no noticeable interruptions to our 
 services; maybe a tiny little bit of extra latency on the web maps, very hard 
 to tell if it was real or imagination.
 
  If auto-stuff in postgresql has been a pain point for you in the past, I can 
 confirm that SSD drives are a nice solution (and also for any other 
 autovacuum/analyze type stuff) since they can handle incoming random IO very 
 nicely while also making very fast progress with the housekeeping work.
 
 Graeme Bell
 
 
 
 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance
 
 
 
 -- 
 Regards,
 Ang Wei Shan



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] incredible surprise news from intel/micron right now...

2015-07-28 Thread Graeme B. Bell
Entering production, availability 2016
1000x faster than nand flash/ssd, e.g. dram-latency
10x denser than dram
1000x write endurance of nand
Priced between flash and dram
Manufactured by intel/micron
Non-volatile

Guess what's going in my 2016 db servers :-)

Please, don't be vapourware... 

http://hothardware.com/news/intel-and-micron-jointly-drop-disruptive-game-changing-3d-xpoint-cross-point-memory-1000x-faster-than-nand




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] parallelisation provides postgres performance (script example + ppt slides)

2015-07-23 Thread Graeme B. Bell

Hi all,

1. For those that don't like par_psql (http://github.com/gbb/par_psql), here's 
an alternative approach that uses the GNU Parallel command to organise 
parallelism for queries that usually take days to run. Short script and 
GIS-focused, but it may give you a few ideas about how to parallelise your own 
code with GNU Parallel. 

https://github.com/gbb/fast_map_intersection


2. Also, I gave a talk at FOSS4G Como about these tools, and how to get better 
performance from your DB with parallelisation. May be helpful to people who are 
new to parallelisation / multi-core work with postgres. 

http://graemebell.net/foss4gcomo.pdf  


Graeme Bell.

p.s.  (this version of the slides still has a few typos, which will be fixed 
soon when I get the source ppts back from my colleague's laptop).

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] hyperthreadin low performance (and some discussion about benchmarking)

2015-07-23 Thread Graeme B. Bell
On 23 Jul 2015, at 13:37, domenico febbo mimmopastic...@gmail.com wrote:

 is the problem also in PostgreSQL 9.4.x?
 I'm going to buy a production server with 4 sockets of E7-4850, 12 cores each,
 so 12*4 = 48 cores (and 96 threads using HT).
 
 What do you suggest?
 Using or not HT?
 
 BR


1. If you have enough money to buy a 4-socket E7, then you certainly have 
enough money to pay someone (maybe yourself) for the 30 minutes of work needed 
to run a benchmark on the machine with and without hyperthreading and compare 
them.  I mean literally, run pgbench, reboot, turn on/off HT, run pgbench. Then 
you'll know what works best for your configuration. Don't be lazy about this, 
it's as important as the money you're throwing at the hardware. 

2. Keep in mind most of the numbers people throw around are pgbench numbers. 
Pgbench is representative of some workloads (e.g. bank transactions) and less 
representative of others (mixed query types, GIS work, scientific work, heavy 
IO, interaction with other applications/libraries...). Are you using the server 
for other tasks besides postgres, for example? I find I get better performance 
with HT when I'm using postgres with GDAL on the same server. Probably because 
the HT cores are being asked to do two different types of things, which is 
where HT shines. 

3. IMPORTANT : it doesn't matter how pgbench performs for other people on other 
computers and what they think is best.
What matters is 'how does YOUR normal workload perform on YOUR computer'.
The best way to do that is to put together a simple simulated workload that 
looks like your intended use of the system.
Leave it running.
If it's for an important system, look at all aspects of performance: 
transactions per second, I/O stalls, latency, ... 
If you can't do that, pgbench can be used instead.



Finally. A serious point. The lack of diversity in postgres benchmarking is 
quite amazing, to my mind, and is probably at the root of the eternal 
disagreements about optimal settings as well as the existence of long-standing 
hidden scaling/performance bugs (or weird kernel interactions). pgbench is 
useful, but really... let's make some more tools (or share links, if you know 
of them). 

Since contribution > gripe, here is my own (first, tiny) contribution, which 
I mentioned earlier in the month: https://github.com/gbb/t. 

As a point of contrast. Take a look at how computer game players measure the 
performance of graphics cards and disk drives in their product reviews. 
http://www.guru3d.com/articles-pages/radeon-r9-290-review-benchmarks,32.html

32 pages of data and discussion to test the performance of a single model 
(among thousands of possibilities and millions of configurations)! And this 
article is ordinary, run of the mill stuff in the gaming scene, literally the 
first link I hit in Google. Has anyone ever in the history of these lists ever 
posted so much diverse and structured evidence in support of their beliefs 
about a postgres setting?

Gaming reviewers use a multitude of real-world games, synthetic benchmarks, 
theoretical estimates... as someone with a foot in both worlds it is quite 
amusing to see that game-players address benchmarking and optimisation of 
performance far more seriously, scientifically (and successfully) than most 
professional database admins. 

Many graphics card reviews care very much about reproducibility/repeatable 
results, surrounding test conditions (very detailed information about other 
components used in the test, software versioning), warmup effects, benchmark 
quirks, performance at different scales/settings, and so on...  writing 'I saw 
some post where someone said they got a better result from XYZ' would certainly 
not be good enough in that community.  

Graeme Bell. 

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hmmm... why does pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-09 Thread Graeme B. Bell
 No, of course it doesn't.  It appears that you didn't look at the repo or 
 read my previous mail before you wrote this. 
 
 FFS, I *ran* some of the tests and reported on results. With you in CC.

Just checked back. So you did. I'm sorry, I made the mistake I accused you of. 

But... why then did you say I hadn't provided him with individual functions, 
when you've seen the repo yourself? I don't understand. You knew they were there.

 What I mean is that I don't just run random code from some random github
 repository.

Sure, but surely that's not an issue when the SQL functions are also separately 
provided and clearly labelled in the repo?

Do you feel there is a difference in trustworthiness between isolated files 
containing an SQL function presented in a github repo, and SQL functions 
presented in an email?

I am not sure I can agree with that idea, I think they are both just SQL 
functions. The difference is that one also offers you a bit more if you want to 
check/try it.

 I do not wish to antagonise you either, so please go and look at the
 repo before you write the next reply.
 
 Over and out.

Seems there has been a misunderstanding here and I feel I'm still missing 
something in what you're saying. Sorry Andres. Let's just forget this. I don't 
think we disagree especially on this and I am not looking to make an enemy here.

Also, thanks for running the benchmarks to get some numbers.

Graeme. 

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-09 Thread Graeme B. Bell

On 09 Jul 2015, at 15:22, Thomas Kellerer spam_ea...@gmx.net wrote:

 Graeme B. Bell wrote on 09.07.2015 at 11:44:
 I don't recall seeing a clear statement telling me I should mark pl/pgsql
 functions nonvolatile wherever possible or throw all performance and
 scalability out the window. 
 
 From: http://www.postgresql.org/docs/current/static/xfunc-volatility.html
 
   For best optimization results, you should label your functions 
with the strictest volatility category that is valid for them.


Hi Thomas,

Thank you very much for the link.

However, the point I was making wasn't that no sentence exists anywhere. My 
point was that I've read the docs more than anyone else in my institute and I 
was completely unaware of this. 

It is also quite vague - if you hand that to a younger programmer in particular, 
how do they implement it in practice? When is it important to do it?  If this 
one factor silently breaks multiprocessor scaling of pl/pgsql, and 
multiprocessing is the biggest trend in CPU processing of the last decade 
(comparing server CPUs of 2005 with 2015), then why is this information not up 
front and clear?
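For what it's worth, the labelling itself is just one keyword on the function definition. A minimal sketch (the function is invented for illustration, not code from this thread):

  -- The default is VOLATILE, which tells the planner the function may change
  -- the database or return different results on every call.
  CREATE OR REPLACE FUNCTION add_tax(price numeric) RETURNS numeric AS $$
  BEGIN
      RETURN price * 1.25;
  END;
  $$ LANGUAGE plpgsql
  IMMUTABLE;   -- same inputs always give the same output; touches no tables

Use STABLE instead when the function reads tables but doesn't modify them (its result is constant within a single statement).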


A second point to keep in mind is that optimization and 
parallelisation/scalability are not always the same thing. 

For example, in one project I took a bunch of looped parallel UPDATEs on a set 
of 50 tables, and rewrote them so as to run the loop all at once inside a 
pl/pgsql function. Crudely, I took out the table-level for loop and put it at 
row-level instead. 

I expected they'd execute much faster if UPDATEs were using data still in 
cache. Also, I would be updating without writing out WAL entries to disk 
repeatedly. 

It turns out the update per row ran much faster - as expected - when I used one 
table, but when I ran it in parallel on many tables, the performance was even 
worse than when I started. If you look at the benchmarks, you'll see that 
performance drops through the floor at 8-16 cores. I think that was when I 
first noticed this bug/feature.

[If anyone is curious, the way I solved that one in the end was to 
pre-calculate every possible way the tables might be updated after N loops of 
updates using Python, and import that as a lookup table into PG. It turned out 
that although we had tens of GB of data per table, there were only about 100,00 
different types of situation, and only e.g. 80 iterations to consider. Then I 
ran a single set of UPDATEs with no pl/pgsql. It was something like a 1x 
performance improvement.]

Graeme.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-09 Thread Graeme B. Bell
 
 3. I don't disagree that the benchmark code is objectively 'bad' in the 
 sense that it is missing an important optimisation.
 
 Particularly with regards documentation, a patch improving things is
 much more likely to improve the situation than griping.  Also,
 conversation on this list gets recorded for posterity and google is
 remarkably good at matching people looking for problems with
 solutions.  So, even in absence of a patch perhaps we've made the
 lives of future head-scratchers a little bit easier with this
 discussion.

I agree that patch > gripe, and about the google aspect. But nonetheless, a 
well-intentioned gripe is > ignorance of a problem. 

As mentioned earlier, I'm sick just now and will be back in hospital again 
tomorrow & Monday, so a patch may be a little bit much to ask from me here :-) 
It's a bit much even keeping up with the posts on the thread so far.

I might try to fix the documentation a bit later, though as someone with no 
experience in marking up volatility on pl/pgsql functions I doubt my efforts 
would be that great. I also have other OSS project contributions that need some 
attention first. 

Re: the google effect. Are these mailing list archives mirrored anywhere, 
incidentally? For example, I notice we just lost http://reddit.com/r/amd at the 
weekend; all the discussion of the last few years on that forum is out of 
reach.  

Graeme Bell


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-09 Thread Graeme B. Bell

On 09 Jul 2015, at 17:42, Merlin Moncure mmonc...@gmail.com wrote:

 The community maintains its own mailing list archives on
 postgresql.org.  Short of an array of tactical nuclear strikes this is
 going to be preserved 

Good to know, I've seen a lot of dead software projects throughout my life. 

But still - we will have to pray that Kim Jong Un never decides to become a 
MySQL contributor... :)

Graeme.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-09 Thread Graeme B. Bell
On 08 Jul 2015, at 22:27, Andres Freund and...@anarazel.de wrote:

 On 2015-07-08 13:46:53 -0500, Merlin Moncure wrote:
 On Wed, Jul 8, 2015 at 12:48 PM, Craig James cja...@emolecules.com wrote:
 
 Well, right, which is why I mentioned even with dozens of clients.
 Shouldn't that scale to at least all of the CPUs in use if the function is
 CPU intensive (which it is)?
 
 only in the absence of inter-process locking and cache line bouncing.
 
 And addititionally memory bandwidth (shared between everything, even in
 the numa case), cross socket/bus bandwidth (absolutely performance
 critical in multi-socket configurations), cache capacity (shared between
 cores, and sometimes even sockets!).

1. Note for future readers - it's also worth noting that, depending on the 
operation and on your hardware, you may have fewer CPU cores than you think 
to parallelise upon.

1a. For example, AMD CPUs list the number of integer cores (e.g. 16), but there 
are actually only half as many cores available for floating point work (8). So 
if your functions need to use floating point, your scaling will suffer badly on 
FP functions. 

https://en.wikipedia.org/wiki/Bulldozer_(microarchitecture)
  "In terms of hardware complexity and functionality, this module is equal to 
a dual-core processor in its integer power, and to a single-core processor in 
its floating-point power: for each two integer cores, there is one 
floating-point core."


1b. Or, if you have hyper-threading enabled on an Intel CPU, you may think you 
have e.g. 8 cores, but if all the threads are running the same type of 
operation repeatedly, it won't be possible for the hyper-threading to work well 
and you'll only get 4 in practice. Maybe less due to overheads. Or, if your 
work is continually going to main memory for data (e.g. limited by the memory 
bus), it will run at 4-core speed, because the cores have to share the same 
memory bus. 

Hyper-threading depends on the 2 logical cores being asked to perform two 
different types of tasks at once (each having relatively lower demands on 
memory).

When execution resources would not be used by the current task in a processor 
without hyper-threading, and especially when the processor is stalled, a 
hyper-threading equipped processor can use those execution resources to execute 
another scheduled task.
https://en.wikipedia.org/wiki/Hyper-threading
https://en.wikipedia.org/wiki/Superscalar


2. Keep in mind also when benchmarking that it's normal to see a small 
drop-off when you hit the maximum number of cores for your system. 
After all, the O/S and the benchmark program and anything else you have running 
will need a core or two.

 



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hmmm... why does pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-09 Thread Graeme B. Bell

On 08 Jul 2015, at 13:20, Andres Freund and...@anarazel.de wrote:

 On 2015-07-08 11:13:04 +, Graeme B. Bell wrote:
 I'm guessing you are maybe pressed for time at the moment because I
 already clearly included this on the last email, as well as the links
 to the alternative benchmarks with the same problem I referred to on
 both of my last emails which are also trivial to drop into pgbench
 (cut/paste).
 
 You realize that you want something here, not Merlin, right?

Hi Andres,

My email was saying it's not helpful for anyone on the list for him to keep 
asking me to give him X and me to keep sending it.  Do you disagree with that 
idea?

I tried to phrase my request politely, but perhaps I failed. If you have 
suggestions for better ways to say "I already sent it, twice" more politely in 
this situation, I'd welcome them off list. 

He asked me to disclose the function body I was testing. I did that, *and* also 
disclosed the entire approach to the benchmark too in a way that made it 
trivial for him or others to replicate the situation I'd found. I'm pretty sure 
you should not be discouraging this kind of thing in bug/performance reports. 

I get your point that when you're asking for other people to look at something 
with you, don't antagonise them. 

I didn't intend it as antagonising and Merlin hasn't mailed me anything to say 
he was antagonised. I'm quite sure he's capable of defending himself or 
communicating with me himself if he does feel antagonised by something. I hope 
we can end the discussion of that here?

Merlin, if you were antagonised, sorry, I did not mean to antagonise you. I 
just wanted to make it clear that I'd sent you what you asked for, and more, 
and that I was surprised you hadn't noticed it. 

 To clear up the issue I built a little test harness around your comment 
 below.
 http://github.com/gbb/t
 
 Well, that requires reviewing the source code of the run script and
 such.

No, of course it doesn't.  It appears that you didn't look at the repo or read 
my previous mail before you wrote this. 

I do not wish to antagonise you either, so please go and look at the repo 
before you write the next reply. 

http://github.com/gbb/t
Just pick any function you like, there are 6 there, and 3 of them demonstrate 2 
different problems, all of it is clearly documented.

When you open up the repo, there are the tests
https://github.com/gbb/t/tree/master/tests

You don't need to review any code from the run script. The functions are there 
as isolated files and what they are intended to demonstrate is clearly 
described with text and graphics. I could see your point if I had mailed out 
some giant script with a bunch of SQL calls embedded in its guts, but that's 
the opposite of what I did here.  

Did you find it difficult to navigate the repo structure (2 folders, a few 
files)? If so please let me know off-list what was difficult and I will see if 
I can improve it. 

 I think we shouldn't discuss this on two threads (-performance, -bugs),
 that makes it hard to follow. Given Tom's more detailed answer I think
 the -bugs thread already contains more pertinent information.

I don't necessarily disagree with this idea, but...

How many people concerned with performance are following the -bugs list? How 
much space is there for discussion of this on -bugs? Since the only working 
solutions for this performance problem so far are all user-side rather than 
committer-side, why would you want to restrict that information to a 
committer-side list?

It has developed this way because I noticed it as a performance issue first, 
then decided to report it as a potential bug.

Perhaps it would be useful to keep the discussions separate as the committer-side 
aspects (how to fix this at the server level) and the user-side aspects (what 
you can do to improve performance right now).  I will defer to general opinion 
on this in my follow-up posts. 

Graeme. 

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-09 Thread Graeme B. Bell
On 09 Jul 2015, at 05:38, Tom Lane t...@sss.pgh.pa.us wrote:

 If you
 write your is_prime function purely in plpgsql, and don't bother to mark
 it nonvolatile, *it will not scale*.  

 much for properly written plpgsql; but there's an awful lot of bad plpgsql
 code out there, and it can make a huge difference for that.


Hi Tom, 

I object to phrases like 'don't bother to mark it' and 'bad plpgsql' here. That 
is putting the blame on programmers. Clearly, if there is no end of code out 
there that isn't right in this regard, there's something very wrong in the 
project documentation.

1. I have been writing pl/pgsql on and off for a couple of years now and I've 
read quite a bit of the postgres documentation, but I don't recall seeing a 
clear statement telling me I should mark pl/pgsql functions nonvolatile 
wherever possible or throw all performance and scalability out the window. I'm 
sure there may be a line hidden somewhere in the docs, but judging from the 
impact it has in practice, this seems like a very fundamental concept that 
should be repeatedly and clearly marked in the docs. 

2. Furthermore, I have never come across anything in the documentation that 
made it clear to me that any pl/pgsql function I write will, by default, be 
taking out locks for every single statement in the code. I've written code in I 
dunno, maybe 15-20 different languages in my life, and I can't think of another 
language offhand that does that by default. From the reactions on this thread 
to this benchmark and the par_psql benchmarks, it doesn't seem that it was even 
immediately obvious to many postgres enthusiasts and developers.

3. I don't disagree that the benchmark code is objectively 'bad' in the sense 
that it is missing an important optimisation. 

But I really don't think it helps to frame this as laziness or bad in any 
other sense of the word e.g. 'clumsy'.

Let's look at the postgresql documentation for some examples of 'bad' and lazy 
code: 

http://www.postgresql.org/docs/9.3/static/plpgsql-structure.html
http://www.postgresql.org/docs/9.3/static/plpgsql-declarations.html

There are about 13 functions on that page.
How many functions on that page make use of non-volatile or immutable wherever 
it would be appropriate?  
Zero.

or this one: 
http://www.postgresql.org/docs/9.3/static/plpgsql-control-structures.html
Zero.

or this one: 
http://www.postgresql.org/docs/9.3/static/plpgsql-cursors.html#PLPGSQL-CURSOR-USING
Zero.

The reason 90% of people out there are 'not bothering' and 'writing bad code' 
is because **99% of the postgresql documentation teaches them to do it that 
way**. 

So when you talk about other people 'not bothering' to do things - who is 
really at fault here for what you see as endemic 'bad' or 'lazy' code? Is 
it the new postgres programmers, or the people that taught them with bad 
examples consistently throughout the *entire project documentation*, starting 
from the very first example? 

I think I'm going to raise this as a documentation bug. 

Graeme. 









-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] [BUGS] BUG #13493: pl/pgsql doesn't scale with cpus (PG9.3, 9.4)

2015-07-09 Thread Graeme B. Bell

This is a reply to Andres's post on the #13495 documentation thread in 
-bugs. 
I am responding to it here because it relates to #13493 only.

Andres wrote, re: #13493

 This issue is absolutely critical for performance and scalability of code,

 Pft. In most cases it doesn't actually matter that much because the
 contained query are the expensive stuff. It's just when you do lots of
 very short and cheap things that it has such a big effect.  Usually the
 effect on the planner is bigger.

Hi Andres,

'Pft' is kinda rude - I wouldn't comment on it normally,  but seeing as you 
just lectured me on -performance on something you perceived as impolite (just 
like you lectured me on not spreading things onto multiple threads), can you 
please try to set a good example? You don't encourage new contributors into 
open source communities this way. 

Getting to the point. I think the gap between our viewpoints comes from the 
fact I (and others here at my institute) have a bunch of pl/pgsql code here 
with for loops and calculations, which we see as 'code'. Thinking of all the 
users I know myself, I know there are plenty of GIS people out there using for 
loops and pgsql to simulate models on data in the DB, and I expect the same is 
true among e.g. older scientists with DB datasets. 

Whereas it sounds like you and Tom see pl/pgsql as 'glue' and don't see any 
problem. As I have never seen statistics on pl/pgsql use-cases among users at 
large, I don't know what happens everywhere else outside of GIS-world and 
pgdev-world. Have you any references/data you can share on that? I would be 
interested to know because I don't want to overclaim on the importance of these 
bugs or any other bugs in future. In this case, #13493 wrecked the code for 
estimates on a 20 million euro national roadbuilding project here and it cost 
me a few weeks of my life, but for all I know you're totally right about the 
general importance to the world at large.

Though keep in mind: this isn't just about scaling up one program. It's a 
db-level problem. If you have a large GIS DB server with many users, 
long-running queries etc. on large amounts of data, then you only need e.g. 2-3 
people to be running some code with for-loops or a long series of calculations 
in pl/pgsql, and everything will fall apart in pgsql-land. 

Last point. When I wrote 'absolutely critical' I was under the impression this 
bug could have some serious impact on postgis/pgrouting. Since I wanted to 
double check what you said about 'expensive stuff' vs 'short/cheap stuff', I 
ran some benchmarks to check on a few functions. 

You are right that only short, looped things are affected. e.g. for loops with 
calculations and so on. Didn't see any trouble with the calls I made to postgis 
inside or outside of pgsql. This confirms/replicates your findings. Updated 
numbers/tests posted to github shortly.

Regards

Graeme Bell

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hmmm... why does pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-08 Thread Graeme B. Bell
On 07 Jul 2015, at 22:52, Merlin Moncure mmonc...@gmail.com wrote:

 On Tue, Jul 7, 2015 at 3:33 PM, Graeme B. Bell graeme.b...@nibio.no wrote:
 
 Hi Merlin,
 
 Long story short - thanks for the reply, but you're not measuring anything 
 about the parallelism of code running in a pl/pgsql environment here. You're 
 just measuring whether postgres can parallelise entering that environment 
 and get back out. Don't get me wrong - it's great that this scales well 
 because it affects situations where you have lots of calls to trivial 
 functions.
 However it's not the problem I'm talking about. I mean 'real' pl'pgsql 
 functions. e.g. things that you might find in postgis or similar.
 
 Maybe so.  But it will be a lot easier for me (and others on this)
 list if you submit a self contained test case that runs via pgbench.


Hi Merlin, 

I'm guessing you are maybe pressed for time at the moment because I already 
clearly included this on the last email, as well as the links to the 
alternative benchmarks with the same problem I referred to on both of my last 
emails which are also trivial to drop into pgbench (cut/paste). 

e.g. did you see these parts of my previous email? 

To clear up the issue I built a little test harness around your comment below.
http://github.com/gbb/t

Just pick any function you like, there are 6 there, and 3 of them demonstrate 2 
different problems, all of it is clearly documented. 

I haven't used perf with pgbench before, and I can't run any code today. 
If you're interested in this but short on time, maybe you can glance at the 
repo above and just add 'perf' at the appropriate point in the rbuild wrapper.

Graeme. 



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-08 Thread Graeme B. Bell
 On 07/07/2015 08:05 PM, Craig James wrote:
 
 
 No ideas, but I ran into the same thing. I have a set of C/C++ functions
 that put some chemistry calculations into Postgres as extensions (things
 like, calculate the molecular weight of this molecule). As SQL
 functions, the whole thing bogged down, and we never got the scalability
 we needed. On our 8-CPU setup, we couldn't get more than 2 CPUs busy at
 the same time, even with dozens of clients.


Hi all,

The sample code / results were put up last night at http://github.com/gbb/t

Craig's problem sounds similar to my own, assuming he means running C 
indirectly via SQL vs running C more directly.
Lots of parallel connections to postgres but maximum 2 CPUs of scaling (and it 
gets worse, as you try to run more things).

Tom Lane has posted an interesting comment over on the bugs list which 
identifies a likely source of at least one of the problems, maybe both. 
It seems to be linked to internal locking inside postgres (which makes sense, 
given the results - both problems feel 'lock-y').
Also, he mentions a workaround for some functions that apparently scales to 
8-way. 

http://www.postgresql.org/message-id/31265.1436317...@sss.pgh.pa.us

I think it's potentially a big problem for CPU-intensive postgres libraries 
like pgrouting, or perhaps the postgis & postgis raster functions, things like 
that.
I don't know how well their functions are marked for e.g. immutability. 
Are there any postgis devs on this list?

Graeme Bell



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Graeme B. Bell

Hi Wes

1. The first interesting thing is that prior to my mentioning this problem to 
C_ a year or two back, the power loss protection was advertised everywhere 
as simply that, without qualifiers about 'not inflight data'. Check out the 
marketing of the M500 for the first year or so and try to find an example where 
they say 'but inflight data isn't protected!'. 

2. The second (and more important) interesting thing is that this is irrelevant!

Fsync'd data is BY DEFINITION not data in flight. 
Fsync means "this data is secure on the disk!" 
However, the drives corrupt it.

Postgres's sanity depends on a reliable fsync. That's why we see posts on the 
performance list saying 'fsync=no makes your postgres faster, but really, don't 
do it in production'. 
We are talking about internal DB corruption, not just a crash and a few lost 
transactions.

These drives return from fsync while data is still in volatile cache.
That's breaking the spec, and it's why they are not OK for postgres by 
themselves. 

This is not about 'in-flight' data, it's about fsync'd wal log data. 

Graeme. 


On 07 Jul 2015, at 16:15, Wes Vaske (wvaske) wva...@micron.com wrote:

 The M500/M550/M600 are consumer class drives that don't have power protection 
 for all inflight data.* (like the Samsung 8x0 series and the Intel 3x0 & 5x0 
 series).
 
 The M500DC has full power protection for inflight data and is an 
 enterprise-class drive (like the Samsung 845DC or Intel S3500 & S3700 series).
 
 So any drive without the capacitors to protect inflight data will suffer from 
 data loss if you're using disk write cache and you pull the power.
 
 *Big addendum:
 There are two issues on powerloss that will mess with Postgres. Data Loss and 
 Data Corruption. The micron consumer drives will have power loss protection 
 against Data Corruption and the enterprise drive will have power loss 
 protection against BOTH.
 
 https://www.micron.com/~/media/documents/products/white-paper/wp_ssd_power_loss_protection.pdf
  
 
 The Data Corruption problem is only an issue in non-SLC NAND but it's 
 industry wide. And even though some drives will protect against that, the 
 protection of inflight data that's been fsync'd is more important and should 
 disqualify *any* consumer drives from *any* company from consideration for 
 use with Postgres.
 
 Wes Vaske | Senior Storage Solutions Engineer
 Micron Technology 
 
 -Original Message-
 From: Graeme B. Bell [mailto:graeme.b...@nibio.no] 
 Sent: Tuesday, July 07, 2015 8:26 AM
 To: Merlin Moncure
 Cc: Wes Vaske (wvaske); Craig James; pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] New server: SSD/RAID recommendations?
 
 
 As I have warned elsewhere,
 
 The M500/M550 from $SOME_COMPANY is NOT SUITABLE for postgres unless you have 
 a RAID controller with BBU to protect yourself.
 The M500/M550 are NOT plug-pull safe despite the 'power loss protection' 
 claimed on the packaging. Not all fsync'd data is preserved in the event of a 
 power loss, which completely undermines postgres's sanity. 
 
 I would be extremely skeptical about the M500DC given the name and 
 manufacturer. 
 
 I went to quite a lot of trouble to provide $SOME_COMPANYs engineers with the 
 full details of this fault after extensive testing (we have e.g. 20-25 of 
 these disks) on multiple machines and controllers, at their request. Result: 
 they stopped replying to me, and soon after I saw their PR reps talking about 
 how 'power loss protection isn't about protecting all data during a power 
 loss'. 
 
 The only safe way to use an M500/M550 with postgres is:
 
 a) disable the disk cache, which will cripple performance to about 3-5% of 
 normal.
 b) use a battery backed or cap-backed RAID controller, which will generally 
 hurt performance, by limiting you to the peak performance of the flash on the 
 raid controller. 
 
 If you are buying such a drive, I strongly recommend buying only one and 
 doing extensive plug pull testing before commiting to several. 
 For myself, my time is valuable enough that it will be cheaper to buy intel 
 in future. 
 
 Graeme.
 
 On 07 Jul 2015, at 15:12, Merlin Moncure mmonc...@gmail.com wrote:
 
 On Thu, Jul 2, 2015 at 1:00 PM, Wes Vaske (wvaske) wva...@micron.com wrote:
 Storage Review has a pretty good process and reviewed the M500DC when it 
 released last year. 
 http://www.storagereview.com/micron_m500dc_enterprise_ssd_review
 
 
 
 The only database-specific info we have available are for Cassandra and 
 MSSQL:
 
 http://www.micron.com/~/media/documents/products/technical-marketing-brief/cassandra_and_m500dc_enterprise_ssd_tech_brief.pdf
 
 http://www.micron.com/~/media/documents/products/technical-marketing-brief/sql_server_2014_and_m500dc_raid_configuration_tech_brief.pdf
 
 
 
 (some of that info might be relevant)
 
 
 
 In terms of endurance, the M500DC is rated to 2 Drive Writes Per Day (DWPD) 
 for 5-years. For comparison:
 
 Micron M500DC (20nm) - 2 DWPD
 
 Intel S3500 (20nm

Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Graeme B. Bell

 Why would you think that you don't need RAID for ZFS?
 
 Reason I'm asking if because we are moving to ZFS on FreeBSD for our future 
 projects.


Because you have zraid. :-)

https://blogs.oracle.com/bonwick/entry/raid_z

General points:

1. It's my understanding that ZFS is designed to talk to the hardware directly, 
and so it would be bad to hide the physical layer from ZFS unless you had to.
After all, I don't think they implemented a raid-like system inside ZFS just 
for the fun of it. 

2. You have zraid built in and easy to manage within ZFS - and well tested 
compared to NewRaidController (TM) - why add another layer of management to 
your disk storage?

3. You reintroduce the raid write hole.

4. There might be some argument for hardware raid (existing system) but with 
software raid (the point I was addressing) it makes little sense at all.

5. If you're on hardware raid and your controller dies, you're screwed in 
several ways. It's harder to get a new raid controller than a new PC. Your 
chances of recovery are lower than with ZFS. IMHO it is more scary to recover 
from a failed raid controller, too. 

6. Recovery is faster if the disks aren't full, e.g. ZFS recovers only what is 
there. This might not seem a big deal, but chances are it would save you 50% of 
your downtime in a crisis. 

However, I think with Linux you might want to use RAID for the boot disk. I 
don't know if Linux can boot from ZFS yet. I would use (and am using) FreeBSD 
with ZFS.

Graeme.


On 07 Jul 2015, at 18:56, Wei Shan weishan@gmail.com wrote:

 Hi Graeme,
 
 Why would you think that you don't need RAID for ZFS?
 
 Reason I'm asking if because we are moving to ZFS on FreeBSD for our future 
 projects.
 
 Regards,
 Wei Shan
 
 On 8 July 2015 at 00:46, Graeme B. Bell graeme.b...@nibio.no wrote:
 
  RAID controllers are completely unnecessary for SSD as they currently
  exist.
 
 Agreed. The best solution is not to buy cheap disks and not to buy RAID 
 controllers now, imho.
 
 In my own situation, I had a tight budget, high performance demand and a 
 newish machine with RAID controller and HDDs in it as a starting point.
 So it was more a question of 'what can you do with a free raid controller and 
 not much money' back in 2013. And it has worked very well.
 Still, I had hoped for a bit more from the cheaper SSDs though, I'd hoped to 
 use fastpath on the controller and bypass the cache.
 
 The way NVMe prices are going though, I wouldn't do it again if I was doing 
 it this year. I'd just go direct to nvme and trash the raid controller. These 
 sammy and intel nvmes are basically enterprise hardware at consumer prices. 
 Heck, I'll probably put one in my next gaming PC.
 
 Re: software raid.
 
 I agree, but once you accept that software raid is now pretty much superior 
 to hardware raid, you start looking at ZFS and thinking 'why the heck am I 
 even using software raid?'
 
 G
 
 
 
 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance
 
 
 
 -- 
 Regards,
 Ang Wei Shan



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Graeme B. Bell
 
 RAID controllers are completely unnecessary for SSD as they currently
 exist.

Agreed. The best solution is not to buy cheap disks and not to buy RAID 
controllers now, imho.

In my own situation, I had a tight budget, high performance demand and a newish 
machine with RAID controller and HDDs in it as a starting point. 
So it was more a question of 'what can you do with a free raid controller and 
not much money' back in 2013. And it has worked very well.
Still, I had hoped for a bit more from the cheaper SSDs though, I'd hoped to 
use fastpath on the controller and bypass the cache. 

The way NVMe prices are going though, I wouldn't do it again if I was doing it 
this year. I'd just go direct to nvme and trash the raid controller. These 
sammy and intel nvmes are basically enterprise hardware at consumer prices. 
Heck, I'll probably put one in my next gaming PC. 

Re: software raid. 

I agree, but once you accept that software raid is now pretty much superior to 
hardware raid, you start looking at ZFS and thinking 'why the heck am I even 
using software raid?'

G



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Graeme B. Bell

The comment on HDDs is true and gave me another thought. 

These new 'shingled' HDDs (the 8TB ones) rely on rewriting all the data on 
tracks that overlap your data, any time you change the data. Result: disks 
8-20x slower during writes, after they fill up. 

Do they have power loss protection for the data being rewritten during 
reshingling? You could have data committed at position X and you accidentally 
nuke data at position Y.

[I know that using a shingled disk sounds crazy (it sounds crazy to me) but you 
can bet there are people that just want to max out the disk bays in their 
server... ]

Graeme. 

On 07 Jul 2015, at 19:28, Michael Nolan htf...@gmail.com wrote:

 
 
 On Tue, Jul 7, 2015 at 10:59 AM, Heikki Linnakangas hlinn...@iki.fi wrote:
 On 07/07/2015 05:15 PM, Wes Vaske (wvaske) wrote:
 The M500/M550/M600 are consumer class drives that don't have power
 protection for all inflight data.* (like the Samsung 8x0 series and
 the Intel 3x0  5x0 series).
 
 The M500DC has full power protection for inflight data and is an
 enterprise-class drive (like the Samsung 845DC or Intel S3500  S3700
 series).
 
 So any drive without the capacitors to protect inflight data will
 suffer from data loss if you're using disk write cache and you pull
 the power.
 
 Wow, I would be pretty angry if I installed a SSD in my desktop, and it loses 
 a file that I saved just before pulling the power plug.
 
 That can (and does) happen with spinning disks, too.
  
 
 *Big addendum: There are two issues on powerloss that will mess with
 Postgres. Data Loss and Data Corruption. The micron consumer drives
 will have power loss protection against Data Corruption and the
 enterprise drive will have power loss protection against BOTH.
 
 https://www.micron.com/~/media/documents/products/white-paper/wp_ssd_power_loss_protection.pdf
 
  The Data Corruption problem is only an issue in non-SLC NAND but
 it's industry wide. And even though some drives will protect against
 that, the protection of inflight data that's been fsync'd is more
 important and should disqualify *any* consumer drives from *any*
 company from consideration for use with Postgres.
 
 So it lies about fsync()... The next question is, does it nevertheless 
 enforce the correct ordering of persisting fsync'd data? If you write to file 
 A and fsync it, then write to another file B and fsync it too, is it 
 guaranteed that if B is persisted, A is as well? Because if it isn't, you can 
 end up with filesystem (or database) corruption anyway.
 
 - Heikki
 
 
 The sad fact is that MANY drives (ssd as well as spinning) lie about their 
 fsync status.
 --
 Mike Nolan   
 



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Graeme B. Bell

Yikes. I would not be able to sleep tonight if it were not for the BBU cache in 
front of these disks... 

diskchecker.pl consistently reported several examples of corruption 
post-power-loss (usually 10 - 30) on unprotected M500s/M550s, so I think it's 
pretty much open to debate what types of madness and corruption you'll find if 
you look close enough.
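
(For anyone who wants to repeat this at home, the test is roughly as follows - a 
sketch from memory, so check the usage text in Brad Fitzpatrick's diskchecker.pl 
before relying on it:

  # on a second machine that stays powered:
  ./diskchecker.pl -l
  # on the machine under test, writing to the filesystem being checked:
  ./diskchecker.pl -s otherhost create /mnt/ssd/test.file 500
  # ... pull the plug mid-run, reboot, then:
  ./diskchecker.pl -s otherhost verify /mnt/ssd/test.file

Any errors reported at the verify stage are writes the drive acknowledged but 
then lost.)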

G


On 07 Jul 2015, at 16:59, Heikki Linnakangas hlinn...@iki.fi wrote:

 
 So it lies about fsync()... The next question is, does it nevertheless 
 enforce the correct ordering of persisting fsync'd data? If you write to file 
 A and fsync it, then write to another file B and fsync it too, is it 
 guaranteed that if B is persisted, A is as well? Because if it isn't, you can 
 end up with filesystem (or database) corruption anyway.
 
 - Heikki



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Graeme B. Bell

On 07 Jul 2015, at 19:47, Scott Marlowe scott.marl...@gmail.com wrote:

 [I know that using a shingled disk sounds crazy (it sounds crazy to me) but 
 you can bet there are people that just want to max out the disk bays in 
 their server... ]
 
 Let's just say no online backup companies are using those disks. :)

I'm not so sure. Literally the most famous online backup company is (or was 
planning to):  
https://www.backblaze.com/blog/6-tb-hard-drive-face-off/
But I think that a massive read-only archive really is the only use for these 
things. I hope they go out of fashion, soon. 

But I was thinking more of the 'small company postgres server' or 'charitable 
organisation postgres server'.
Someone is going to make this mistake, you can bet. 
Probably not someone on THIS list, of course... 

 Biggest current production spinners being used I know of are 4TB,
 non-shingled.

I think we may have some 6TB WD reds around here. I'll need to look around.

G



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Graeme B. Bell

Cache flushing isn't an atomic operation though. Even if the ordering is right, 
you are likely to have a partial fsync on the disk when the lights go out - 
isn't your FS still corrupt?

On 07 Jul 2015, at 21:53, Heikki Linnakangas hlinn...@iki.fi wrote:

 On 07/07/2015 09:01 PM, Wes Vaske (wvaske) wrote:
 
 Right, to be precise, the problem isn't the drive lies about fsync(). It lies 
 about FLUSH CACHE instead. Search  replace fsync() with FLUSH CACHE, and the 
 same question remains: When the drive breaks its promise wrt. FLUSH CACHE, 
 does it nevertheless guarantee that the order the data is eventually flushed 
 to disk is consistent with the order in which the data and FLUSH CACHE were 
 sent to the drive? That's an important distinction, because it makes the 
 difference between the most recent data the application saved might be lost 
 even though the FLUSH CACHE command returned and your filesystem is 
 corrupt.
 



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Graeme B. Bell

That is a very good question, which I have raised elsewhere on the postgresql 
lists previously.

In practice: I have *never* managed to make diskchecker fail with the BBU 
enabled in front of the drives and I spent days trying with plug pulls till I 
reached the point where as a statistical event it just can't be that likely at 
all. That's not to say it can't ever happen, just that I've taken all 
reasonable measures that I can to find out on the time and money budget I had 
available. 

In theory: It may be the fact that the BBU makes the drives run at about half 
speed, so that the capacitors go a good bit further towards emptying the cache - 
after all, without the BBU in the way, the drive manages to save everything but 
the last fragment of writes. But I also suspect that the controller itself may be 
replaying the last set of writes from around the time of power loss. 

Anyway I'm 50/50 on those two explanations. Any other thoughts welcome. 

This raises another interesting question. Does anyone here have a document 
explaining how their BBU cache works EXACTLY (at cache / sata level) on their 
server? Because I haven't been able to find any for mine (Dell PERC 
H710/H710P). Can anyone tell me with godlike authority and precision, what 
exactly happens inside that BBU post-power failure?

There is rather too much magic involved for me to be happy.

G

On 07 Jul 2015, at 18:27, Vitalii Tymchyshyn v...@tym.im wrote:

 Hi.
 
 How would BBU cache help you if it lies about fsync? I suppose any RAID 
 controller removes data from BBU cache after it was fsynced by the drive. As 
 I know, there is no other magic command for drive to tell controller that 
 the data is safe now and can be removed from BBU cache.
 
 On Tue, 7 Jul 2015 at 11:59, Graeme B. Bell graeme.b...@nibio.no wrote:
 
 Yikes. I would not be able to sleep tonight if it were not for the BBU cache 
 in front of these disks...
 
 diskchecker.pl consistently reported several examples of corruption 
 post-power-loss (usually 10 - 30 ) on unprotected M500s/M550s, so I think 
 it's pretty much open to debate what types of madness and corruption you'll 
 find if you look close enough.
 
 G
 
 
 On 07 Jul 2015, at 16:59, Heikki Linnakangas hlinn...@iki.fi wrote:
 
 
  So it lies about fsync()... The next question is, does it nevertheless 
  enforce the correct ordering of persisting fsync'd data? If you write to 
  file A and fsync it, then write to another file B and fsync it too, is it 
  guaranteed that if B is persisted, A is as well? Because if it isn't, you 
  can end up with filesystem (or database) corruption anyway.
 
  - Heikki
 
 
 
 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Graeme B. Bell
 
 This raises another interesting question. Does anyone hear have a document 
 explaining how their BBU cache works EXACTLY (at cache / sata level) on their 
 server? Because I haven't been able to find any for mine (Dell PERC 
 H710/H710P). Can anyone tell me with godlike authority and precision, what 
 exactly happens inside that BBU post-power failure?


(and if you have that manual - how can you know it's accurate? that the 
implementation matches the manual and is free of bugs? because my M500s didn't 
match the packaging and neither did an H710 we bought - Dell had advertised 
features in some marketing material that were only present on the H710P)

And I see UBER (unrecoverable bit error) rates for SSDs and HDDs, but has 
anyone ever seen them for the flash-based cache on their raid controller?

Sleep well, friends.

Graeme. 

On 07 Jul 2015, at 18:54, Graeme B. Bell graeme.b...@nibio.no wrote:

 
 That is a very good question, which I have raised elsewhere on the postgresql 
 lists previously.
 
 In practice: I have *never* managed to make diskchecker fail with the BBU 
 enabled in front of the drives and I spent days trying with plug pulls till I 
 reached the point where as a statistical event it just can't be that likely 
 at all. That's not to say it can't ever happen, just that I've taken all 
 reasonable measures that I can to find out on the time and money budget I had 
 available. 
 
 In theory: It may be the fact the BBU makes the drives run at about half 
 speed, so that the capacitors go a good bit further to empty the cache, after 
 all: without the BBU in the way, the drive manages to save everything but the 
 last fragment of writes. But I also suspect that the controller itself maybe 
 replaying the last set of writes from around the time of power loss. 
 
 Anyway I'm 50/50 on those two explanations. Any other thoughts welcome. 
 
 This raises another interesting question. Does anyone hear have a document 
 explaining how their BBU cache works EXACTLY (at cache / sata level) on their 
 server? Because I haven't been able to find any for mine (Dell PERC 
 H710/H710P). Can anyone tell me with godlike authority and precision, what 
 exactly happens inside that BBU post-power failure?
 
 There is rather too much magic involved for me to be happy.
 
 G
 
 On 07 Jul 2015, at 18:27, Vitalii Tymchyshyn v...@tym.im wrote:
 
 Hi.
 
 How would BBU cache help you if it lies about fsync? I suppose any RAID 
 controller removes data from BBU cache after it was fsynced by the drive. As 
 I know, there is no other magic command for drive to tell controller that 
 the data is safe now and can be removed from BBU cache.
 
 On Tue, 7 Jul 2015 at 11:59, Graeme B. Bell graeme.b...@nibio.no wrote:
 
 Yikes. I would not be able to sleep tonight if it were not for the BBU cache 
 in front of these disks...
 
 diskchecker.pl consistently reported several examples of corruption 
 post-power-loss (usually 10 - 30 ) on unprotected M500s/M550s, so I think 
 it's pretty much open to debate what types of madness and corruption you'll 
 find if you look close enough.
 
 G
 
 
 On 07 Jul 2015, at 16:59, Heikki Linnakangas hlinn...@iki.fi wrote:
 
 
 So it lies about fsync()... The next question is, does it nevertheless 
 enforce the correct ordering of persisting fsync'd data? If you write to 
 file A and fsync it, then write to another file B and fsync it too, is it 
 guaranteed that if B is persisted, A is as well? Because if it isn't, you 
 can end up with filesystem (or database) corruption anyway.
 
 - Heikki
 
 
 
 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance
 


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hmmm... why does pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-07 Thread Graeme B. Bell

Hi Merlin, 

Long story short - thanks for the reply, but you're not measuring anything 
about the parallelism of code running in a pl/pgsql environment here. You're 
just measuring whether postgres can parallelise entering that environment and 
get back out. Don't get me wrong - it's great that this scales well because it 
affects situations where you have lots of calls to trivial functions. 
However it's not the problem I'm talking about. I mean 'real' pl/pgsql 
functions. e.g. things that you might find in postgis or similar. 

If you re-read my previous email or look at par_psql (http://parpsql.com) and 
the benchmarks there, you'll maybe see more about what I'm talking about.

To clear up the issue I built a little test harness around your comment below, 
in case anyone was wondering whether it's par_psql itself that causes the bad 
scaling in postgres.
The answer is clearly no. :-)

What I found this evening is that there are several problems here. I did some 
testing here using a machine with 16 physical cores and lots of memory/IO. 

- Using a table as a source of input rather than a fixed parameter e.g. 'select 
col1... ' vs. 'select 3'. Please note I am not talking about poor performance, 
I am talking about poor scaling of performance to multicore. There should be no 
reason for this when read-locks are being taken on the table, and no reason for 
this when it is combined with e.g. a bunch of pl/pgsql work in a function. 
However the impact of this problem is only seen above 8 cores where performance 
crashes. 

- Using pl/pgsql itself intensively (e.g. anything non-trivial) causes 
horrifically bad scaling above 2 cores on the systems I've tested and 
performance crashes very hard soon after. This matches what I've seen elsewhere 
in big projects and in par_psql's tests. 

Of course, it could be some wacky postgresql.conf setting (I doubt it here), so 
I'd be glad if others could give it a try. If you're bored, set the time to 5s 
and run; from testing I can tell you it shouldn't alter the results. 
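
If you just want to reproduce the shape of the test without cloning the repo, a 
minimal sketch of the idea (hypothetical function and file names, numbers chosen 
arbitrarily):

  # a deliberately CPU-heavy pl/pgsql function
  psql -c "CREATE OR REPLACE FUNCTION busywork(n int) RETURNS bigint AS \$\$
           DECLARE acc bigint := 0;
           BEGIN FOR i IN 1..n LOOP acc := acc + i; END LOOP; RETURN acc; END \$\$
           LANGUAGE plpgsql;"
  echo "SELECT busywork(200000);" > busywork.sql
  # compare throughput as the client count rises
  for c in 1 2 4 8 16; do pgbench -n -T 30 -c $c -j $c -f busywork.sql; done

If pl/pgsql scaled cleanly you'd expect tps to rise nearly linearly with client 
count on a 16 core box; the benchmarks above suggest it flattens out at around 
2-2.5x instead.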

The repo will be up in around 30 minutes time on http://github.com/gbb/t, 
and I'm going to submit it as a bug to the pg bugs list. 

Graeme. 


On 06 Jul 2015, at 18:40, Merlin Moncure mmonc...@gmail.com wrote:

 On Fri, Jul 3, 2015 at 9:48 AM, Graeme B. Bell graeme.b...@nibio.no wrote:
 Hi everyone,
 
 I've written a new open source tool for easily parallelising SQL scripts in 
 postgres.   [obligatory plug:   https://github.com/gbb/par_psql   ]
 
 Using it, I'm seeing a problem that I've also seen in other postgres 
 projects involving high degrees of parallelisation in the last 12 months.
 
 Basically:
 
 - I have machines here with up to 16 CPU cores and 128GB memory, very fast 
 SSDs and controller etc, carefully configured kernel/postgresql.conf for 
 high performance.
 
 - Ordinary queries parallelise nearly perfectly (e.g. SELECT some_stuff 
 ...), e.g. almost up to 16x performance improvement.
 
 - Non-DB stuff like GDAL, python etc. parallelise nearly perfectly.
 
 - HOWEVER calls to CPU-intensive user-defined pl/pgsql functions (e.g. 
 SELECT myfunction(some_stuff)) do not parallelise well, even when they are 
 independently defined functions, or accessing tables in a read-only way. 
 They hit a limit of 2.5x performance improvement relative to single-CPU 
 performance (pg9.4) and merely 2x performance (pg9.3) regardless of how many 
 CPU cores I throw at them. This is about 6 times slower than I'm expecting.
 
 I can't see what would be locking. It seems like it's the pl/pgsql 
 environment itself that is somehow locking or incurring some huge frictional 
 costs. Whether I use independently defined functions, independent source 
 tables, independent output tables, makes no difference whatsoever, so it 
 doesn't feel 'lock-related'. It also doesn't seem to be WAL/synchronisation 
 related, as the machines I'm using can hit absurdly high pgbench rates, and 
 I'm using unlogged tables for output.
 
 Take a quick peek here: 
 https://github.com/gbb/par_psql/blob/master/BENCHMARKS.md
 
 I'm wondering what I'm missing here. Any ideas?
 
 I'm not necessarily seeing your results.   via pgbench,
 
 mmoncure@mernix2 11:34 AM ~$ ~/pgdev/bin/pgbench -n -T 60 -f b.sql
 transaction type: Custom query
 scaling factor: 1
 query mode: simple
 number of clients: 1
 number of threads: 1
 duration: 60 s
 number of transactions actually processed: 658833
 latency average: 0.091 ms
 tps = 10980.538470 (including connections establishing)
 tps = 10980.994547 (excluding connections establishing)
 mmoncure@mernix2 11:35 AM ~$ ~/pgdev/bin/pgbench -n -T 60 -c4 -j4 -f b.sql
 transaction type: Custom query
 scaling factor: 1
 query mode: simple
 number of clients: 4
 number of threads: 4
 duration: 60 s
 number of transactions actually processed: 2847631
 latency average: 0.084 ms
 tps = 47460.430447 (including connections establishing)
 tps = 47463.702074 (excluding connections establishing)
 
 b.sql:
 select f

Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Graeme B. Bell

1. Does the sammy nvme have *complete* power loss protection though, for all 
fsync'd data?
I am very badly burned by my experiences with Crucial SSDs and their 'power 
loss protection' which doesn't actually ensure all fsync'd data gets into flash.
It certainly looks pretty with all those capacitors on top in the photos, but 
we need some plug pull tests to be sure. 

2. Apologies for the typo in the previous post, raidz5 should have been raidz1. 

3. Also, something to think about when you start having single disk solutions 
(or non-ZFS raid, for that matter).

SSDs are so unlike HDDs. 

The samsung nvme has a UBER (uncorrectable bit error rate) measured at 1 in 
10^17. That's one bit gone bad in 12500 TB, a good number.  Chances are the 
drives fails before you hit a bit error, and if not, ZFS would catch it.

Whereas current HDDs are at the 1 in 10^14 level. That means an error every 
12TB, by the specs. That means, every time you fill your cheap 6-8TB seagate 
drive, it likely corrupted some of your data *even if it performed according to 
the spec*. (That's also why RAID5 isn't viable for rebuilding large arrays, 
incidentally).
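
(Back-of-envelope check of those figures, if you want to verify them yourself:

  echo "10^17 / 8 / 10^12" | bc    # ~12500 TB written per expected bit error at 1 in 10^17
  echo "10^14 / 8 / 10^12" | bc    # ~12 TB per expected bit error at 1 in 10^14

i.e. divide the UBER interval in bits by 8 for bytes, then by 10^12 for terabytes.)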

Graeme Bell


On 07 Jul 2015, at 12:56, Mkrtchyan, Tigran tigran.mkrtch...@desy.de wrote:

 
 
 - Original Message -
 From: Graeme B. Bell graeme.b...@nibio.no
 To: Mkrtchyan, Tigran tigran.mkrtch...@desy.de
 Cc: Graeme B. Bell graeme.b...@nibio.no, Steve Crawford 
 scrawf...@pinpointresearch.com, Wes Vaske (wvaske)
 wva...@micron.com, pgsql-performance pgsql-performance@postgresql.org
 Sent: Tuesday, July 7, 2015 12:38:10 PM
 Subject: Re: [PERFORM] New server: SSD/RAID recommendations?
 
 I am unsure about the performance side but, ZFS is generally very attractive 
 to
 me.
 
 Key advantages:
 
 1) Checksumming and automatic fixing-of-broken-things on every file (not just
 postgres pages, but your scripts, O/S, program files).
 2) Built-in  lightweight compression (doesn't help with TOAST tables, in fact
 may slow them down, but helpful for other things). This may actually be a net
 negative for pg so maybe turn it off.
 3) ZRAID mirroring or ZRAID5/6. If you have trouble persuading someone that 
 it's
 safe to replace a RAID array with a single drive... you can use a couple of
 NVMe SSDs with ZFS mirror or zraid, and  get the same availability you'd get
 from a RAID controller. Slightly better, arguably, since they claim to have
 fixed the raid write-hole problem.
 4) filesystem snapshotting
 
 Despite the costs of checksumming etc., I suspect ZRAID running on a fast CPU
 with multiple NVMe drives will outperform quite a lot of the alternatives, 
 with
 great data integrity guarantees.
 
 
 We are planing to have a test setup as well. For now I have single NVMe SSD 
 on my
 test system:
 
 # lspci | grep NVM
 85:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD 
 Controller 171X (rev 03)
 
 # mount | grep nvm
 /dev/nvme0n1p1 on /var/lib/pgsql/9.5 type ext4 
 (rw,noatime,nodiratime,data=ordered)
 
 
 and quite happy with it. We have write heavy workload on it to see when it 
 will
 break. Postgres Performs very well. About x2.5 faster than with regular disks
 with a single client and almost linear with multiple clients (picture 
 attached.
 On Y number of high level op/s our application does, X number of clients). The
 setup is used last 3 months. Looks promising but for production we need to
 to have disk size twice as big as on the test system. Until today, I was
 planning to use a RAID10 with a HW controller...
 
 Related to ZFS. We use ZFSonlinux and behaviour is not as good as with 
 solaris.
 Let's re-phrase it: performance is unpredictable. We run RAIDZ2 with 30x3TB 
 disks.
 
 Tigran.
 
 
 Haven't built one yet. Hope to, later this year. Steve, I would love to know
 more about how you're getting on with your NVMe disk in postgres!
 
 Graeme.
 
 On 07 Jul 2015, at 12:28, Mkrtchyan, Tigran tigran.mkrtch...@desy.de wrote:
 
 Thanks for the Info.
 
 So if RAID controllers are not an option, what one should use to build
 big databases? LVM with xfs? BtrFs? Zfs?
 
 Tigran.
 
 - Original Message -
 From: Graeme B. Bell graeme.b...@nibio.no
 To: Steve Crawford scrawf...@pinpointresearch.com
 Cc: Wes Vaske (wvaske) wva...@micron.com, pgsql-performance
 pgsql-performance@postgresql.org
 Sent: Tuesday, July 7, 2015 12:22:00 PM
 Subject: Re: [PERFORM] New server: SSD/RAID recommendations?
 
 Completely agree with Steve.
 
 1. Intel NVMe looks like the best bet if you have modern enough hardware 
 for
 NVMe. Otherwise e.g. S3700 mentioned elsewhere.
 
 2. RAID controllers.
 
 We have e.g. 10-12 of these here and e.g. 25-30 SSDs, among various 
 machines.
 This might give people idea about where the risk lies in the path from 
 disk to
 CPU.
 
 We've had 2 RAID card failures in the last 12 months that nuked the array 
 with
 days of downtime, and 2 problems with batteries suddenly becoming useless 
 or
 suddenly reporting wildly varying

Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Graeme B. Bell
Hi Karl,

Great post, thanks. 

Though I don't think it's against conventional wisdom to aggregate writes into 
larger blocks rather than rely on 4k performance on ssds :-) 

128kb blocks + compression certainly makes sense. But it might make less sense 
I suppose if you had some incredibly high rate of churn in your rows. 
But for the work we do here, we could use 16MB blocks for all the difference it 
would make. (Tip to others: don't do that. 128kb block performance is already 
enough to max out the IO bus to most ssds.)

Do you have your WAL log on a compressed zfs fs? 
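
For reference, the knobs being discussed are per-dataset, so the two can be set 
differently - a rough sketch with hypothetical pool/dataset names (not a 
recommendation; Karl's numbers above are the ones with actual benchmarks behind 
them):

  zfs set recordsize=128k compression=lz4 tank/pgdata   # data files: large records + lz4
  zfs set recordsize=8k compression=off tank/pgwal      # one possible choice for the WAL dataset
  zfs get recordsize,compression tank/pgdata tank/pgwal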

Graeme Bell


On 07 Jul 2015, at 13:28, Karl Denninger k...@denninger.net wrote:

 Lz4 compression and standard 128kb block size has shown to be materially 
 faster here than using 8kb blocks and no compression, both with rotating 
 disks and SSDs.
 
 This is workload dependent in my experience but in the applications we put 
 Postgres to there is a very material improvement in throughput using 
 compression and the larger blocksize, which is counter-intuitive and also 
 opposite the conventional wisdom.
 
 For best throughput we use mirrored vdev sets.



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Graeme B. Bell

I am unsure about the performance side, but ZFS is generally very attractive to 
me. 

Key advantages:

1) Checksumming and automatic fixing-of-broken-things on every file (not just 
postgres pages, but your scripts, O/S, program files). 
2) Built-in & lightweight compression (doesn't help with TOAST tables, in fact 
may slow them down, but helpful for other things). This may actually be a net 
negative for pg so maybe turn it off. 
3) ZRAID mirroring or ZRAID5/6. If you have trouble persuading someone that 
it's safe to replace a RAID array with a single drive... you can use a couple 
of NVMe SSDs with ZFS mirror or zraid, and  get the same availability you'd get 
from a RAID controller. Slightly better, arguably, since they claim to have 
fixed the raid write-hole problem. 
4) filesystem snapshotting

Despite the costs of checksumming etc., I suspect ZRAID running on a fast CPU 
with multiple NVMe drives will outperform quite a lot of the alternatives, with 
great data integrity guarantees. 

Haven't built one yet. Hope to, later this year. Steve, I would love to know 
more about how you're getting on with your NVMe disk in postgres!

Graeme. 

On 07 Jul 2015, at 12:28, Mkrtchyan, Tigran tigran.mkrtch...@desy.de wrote:

 Thanks for the Info.
 
 So if RAID controllers are not an option, what one should use to build
 big databases? LVM with xfs? BtrFs? Zfs?
 
 Tigran.
 
 - Original Message -
 From: Graeme B. Bell graeme.b...@nibio.no
 To: Steve Crawford scrawf...@pinpointresearch.com
 Cc: Wes Vaske (wvaske) wva...@micron.com, pgsql-performance 
 pgsql-performance@postgresql.org
 Sent: Tuesday, July 7, 2015 12:22:00 PM
 Subject: Re: [PERFORM] New server: SSD/RAID recommendations?
 
 Completely agree with Steve.
 
 1. Intel NVMe looks like the best bet if you have modern enough hardware for
 NVMe. Otherwise e.g. S3700 mentioned elsewhere.
 
 2. RAID controllers.
 
 We have e.g. 10-12 of these here and e.g. 25-30 SSDs, among various machines.
 This might give people idea about where the risk lies in the path from disk 
 to
 CPU.
 
 We've had 2 RAID card failures in the last 12 months that nuked the array 
 with
 days of downtime, and 2 problems with batteries suddenly becoming useless or
 suddenly reporting wildly varying temperatures/overheating. There may have 
 been
 other RAID problems I don't know about.
 
 Our IT dept were replacing Seagate HDDs last year at a rate of 2-3 per week 
 (I
 guess they have 100-200 disks?). We also have about 25-30 Hitachi/HGST HDDs.
 
 So by my estimates:
 30% annual problem rate with RAID controllers
 30-50% failure rate with Seagate HDDs (backblaze saw similar results)
 0% failure rate with HGST HDDs.
 0% failure in our SSDs.   (to be fair, our one samsung SSD apparently has a 
 bug
 in TRIM under linux, which I'll need to investigate to see if we have been
 affected by).
 
 also, RAID controllers aren't free - not just the money but also the 
 management
 of them (ever tried writing a complex install script that interacts work with
 MegaCLI? It can be done but it's not much fun.). Just take a look at the
 MegaCLI manual and ask yourself... is this even worth it (if you have a good
 MTBF on an enterprise SSD).
 
 RAID was meant to be about ensuring availability of data. I have trouble
 believing that these days
 
 Graeme Bell
 
 
 On 06 Jul 2015, at 18:56, Steve Crawford scrawf...@pinpointresearch.com 
 wrote:
 
 
 2. We don't typically have redundant electronic components in our servers. 
 Sure,
 we have dual power supplies and dual NICs (though generally to handle 
 external
 failures) and ECC-RAM but no hot-backup CPU or redundant RAM banks and...no
 backup RAID card. Intel Enterprise SSD already have power-fail protection 
 so I
 don't need a RAID card to give me BBU. Given the MTBF of good enterprise SSD
 I'm left to wonder if placing a RAID card in front merely adds a new point 
 of
 failure and scheduled-downtime-inducing hands-on maintenance (I'm looking at
 you, RAID backup battery).
 
 
 
 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Graeme B. Bell

Completely agree with Steve.

1. Intel NVMe looks like the best bet if you have modern enough hardware for 
NVMe. Otherwise e.g. S3700 mentioned elsewhere.

2. RAID controllers. 

We have e.g. 10-12 of these here and e.g. 25-30 SSDs, among various machines. 
This might give people idea about where the risk lies in the path from disk to 
CPU. 

We've had 2 RAID card failures in the last 12 months that nuked the array with 
days of downtime, and 2 problems with batteries suddenly becoming useless or 
suddenly reporting wildly varying temperatures/overheating. There may have been 
other RAID problems I don't know about. 

Our IT dept were replacing Seagate HDDs last year at a rate of 2-3 per week (I 
guess they have 100-200 disks?). We also have about 25-30 Hitachi/HGST HDDs.

So by my estimates:
30% annual problem rate with RAID controllers
30-50% failure rate with Seagate HDDs (backblaze saw similar results)
0% failure rate with HGST HDDs. 
0% failure in our SSDs.   (to be fair, our one samsung SSD apparently has a bug 
in TRIM under linux, which I'll need to investigate to see if we have been 
affected by it). 

also, RAID controllers aren't free - not just the money but also the management 
of them (ever tried writing a complex install script that interacts with 
MegaCLI? It can be done but it's not much fun.). Just take a look at the 
MegaCLI manual and ask yourself... is this even worth it (if you have a good 
MTBF on an enterprise SSD).

RAID was meant to be about ensuring availability of data. I have trouble 
believing that these days

Graeme Bell


On 06 Jul 2015, at 18:56, Steve Crawford scrawf...@pinpointresearch.com wrote:

 
 2. We don't typically have redundant electronic components in our servers. 
 Sure, we have dual power supplies and dual NICs (though generally to handle 
 external failures) and ECC-RAM but no hot-backup CPU or redundant RAM banks 
 and...no backup RAID card. Intel Enterprise SSD already have power-fail 
 protection so I don't need a RAID card to give me BBU. Given the MTBF of good 
 enterprise SSD I'm left to wonder if placing a RAID card in front merely adds 
 a new point of failure and scheduled-downtime-inducing hands-on maintenance 
 (I'm looking at you, RAID backup battery).



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-07 Thread Graeme B. Bell
Hi everyone,

I've written a new open source tool for easily parallelising SQL scripts in 
postgres.   [obligatory plug:   https://github.com/gbb/par_psql   ]

Using it, I'm seeing a problem I've seen in other postgres projects involving 
parallelisation in the last 12 months.

Basically:

- I have machines here with up to 16 CPUs and 128GB memory, very fast SSDs and 
controller etc, carefully configured kernel/postgresql.conf for high 
performance.

- Ordinary queries parallelise nearly perfectly (e.g. SELECT some_stuff ...), 
e.g. almost up to 16x performance improvement.

- Calls to CPU-intensive user-defined pl/pgsql functions (e.g. SELECT 
myfunction(some_stuff)) do not parallelise well, even when they are independent 
or accessing tables in a read-only way. They hit a limit at 2.5x performance 
improvement relative to single-CPU performance (pg9.4) and 2x performance 
(pg9.3). This is about 6 times slower than I'm expecting. 

- Can't see what would be locking. It seems like it's the pl/pgsql environment 
itself that is somehow locking or incurring some huge frictional costs. Whether 
I use independently defined functions, independent source tables, independent 
output tables, makes no difference whatsoever, so it doesn't feel 'locky'. It 
also doesn't seem to be WAL/synchronisation related, as the machines I'm using 
can hit absurdly high pgbench rates, and I'm using unlogged tables.

Curious? Take a quick peek here: 
https://github.com/gbb/par_psql/blob/master/BENCHMARKS.md

Wondering what I'm missing here. Any ideas?

Graeme. 

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Graeme B. Bell

Thanks, this is very useful to know about the 730. When you say 'tested it with 
plug-pulls', you were using diskchecker.pl, right?

Graeme.

On 07 Jul 2015, at 14:39, Karl Denninger k...@denninger.net wrote:

 
 Incidentally while there are people who have questioned the 730 series power 
 loss protection I've tested it with plug-pulls and in addition it watchdogs 
 its internal power loss capacitors -- from the smartctl -a display of one of 
 them on an in-service machine here:
 
  175 Power_Loss_Cap_Test 0x0033   100   100   010    Pre-fail  Always   -   643 (4 6868)



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Graeme B. Bell

As I have warned elsewhere,

The M500/M550 from $SOME_COMPANY is NOT SUITABLE for postgres unless you have a 
RAID controller with BBU to protect yourself.
The M500/M550 are NOT plug-pull safe despite the 'power loss protection' 
claimed on the packaging. Not all fsync'd data is preserved in the event of a 
power loss, which completely undermines postgres's sanity. 

I would be extremely skeptical about the M500DC given the name and 
manufacturer. 

I went to quite a lot of trouble to provide $SOME_COMPANYs engineers with the 
full details of this fault after extensive testing (we have e.g. 20-25 of these 
disks) on multiple machines and controllers, at their request. Result: they 
stopped replying to me, and soon after I saw their PR reps talking about how 
'power loss protection isn't about protecting all data during a power loss'. 

The only safe way to use an M500/M550 with postgres is:

a) disable the disk cache (see the sketch below), which will cripple performance 
to about 3-5% of normal.
b) use a battery backed or cap-backed RAID controller, which will generally 
hurt performance, by limiting you to the peak performance of the flash on the 
raid controller. 
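
For (a), the usual approach on Linux is something like the following - a sketch 
with a hypothetical device name; verify the effect with diskchecker.pl afterwards 
rather than trusting the flag:

  hdparm -W 0 /dev/sdb    # turn the drive's volatile write cache off
  hdparm -W /dev/sdb      # check it reports write-caching = 0 (off)

(and remember it may need re-applying at boot, e.g. from a udev rule or rc.local).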

If you are buying such a drive, I strongly recommend buying only one and doing 
extensive plug pull testing before committing to several. 
For myself, my time is valuable enough that it will be cheaper to buy intel in 
future. 

Graeme.

On 07 Jul 2015, at 15:12, Merlin Moncure mmonc...@gmail.com wrote:

 On Thu, Jul 2, 2015 at 1:00 PM, Wes Vaske (wvaske) wva...@micron.com wrote:
 Storage Review has a pretty good process and reviewed the M500DC when it 
 released last year. 
 http://www.storagereview.com/micron_m500dc_enterprise_ssd_review
 
  
 
 The only database-specific info we have available are for Cassandra and MSSQL:
 
 http://www.micron.com/~/media/documents/products/technical-marketing-brief/cassandra_and_m500dc_enterprise_ssd_tech_brief.pdf
 
 http://www.micron.com/~/media/documents/products/technical-marketing-brief/sql_server_2014_and_m500dc_raid_configuration_tech_brief.pdf
 
  
 
 (some of that info might be relevant)
 
  
 
 In terms of endurance, the M500DC is rated to 2 Drive Writes Per Day (DWPD) 
 for 5-years. For comparison:
 
 Micron M500DC (20nm) – 2 DWPD
 
 Intel S3500 (20nm) – 0.3 DWPD
 
 Intel S3510 (16nm) – 0.3 DWPD
 
 Intel S3710 (20nm) – 10 DWPD
 
  
 
 They’re all great drives, the question is how write-intensive is the workload.
 
 
 
 
 Intel added a new product, the 3610, that is rated for 3 DWPD.  Pricing looks 
 to be around 1.20$/GB.
 
 merlin 



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Hmmm... why does pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

2015-07-03 Thread Graeme B. Bell
Hi everyone,

I've written a new open source tool for easily parallelising SQL scripts in 
postgres.   [obligatory plug:   https://github.com/gbb/par_psql   ]

Using it, I'm seeing a problem that I've also seen in other postgres projects 
involving high degrees of parallelisation in the last 12 months.

Basically:

- I have machines here with up to 16 CPU cores and 128GB memory, very fast SSDs 
and controller etc, carefully configured kernel/postgresql.conf for high 
performance.

- Ordinary queries parallelise nearly perfectly (e.g. SELECT some_stuff ...), 
e.g. almost up to 16x performance improvement.

- Non-DB stuff like GDAL, python etc. parallelise nearly perfectly. 

- HOWEVER calls to CPU-intensive user-defined pl/pgsql functions (e.g. SELECT 
myfunction(some_stuff)) do not parallelise well, even when they are 
independently defined functions, or accessing tables in a read-only way. They 
hit a limit of 2.5x performance improvement relative to single-CPU performance 
(pg9.4) and merely 2x performance (pg9.3) regardless of how many CPU cores I 
throw at them. This is about 6 times slower than I'm expecting. 


I can't see what would be locking. It seems like it's the pl/pgsql environment 
itself that is somehow locking or incurring some huge frictional costs. Whether 
I use independently defined functions, independent source tables, independent 
output tables, makes no difference whatsoever, so it doesn't feel 
'lock-related'. It also doesn't seem to be WAL/synchronisation related, as the 
machines I'm using can hit absurdly high pgbench rates, and I'm using unlogged 
tables for output. 

Take a quick peek here: 
https://github.com/gbb/par_psql/blob/master/BENCHMARKS.md

I'm wondering what I'm missing here. Any ideas? 

Graeme.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Need more IOPS? This should get you drooling... (5xnvme drives)

2015-06-04 Thread Graeme B. Bell
I previously mentioned on the list that nvme drives are going to be a very big 
thing this year for DB performance.

This video shows what happens if you get an 'enthusiast'-class motherboard and 
5 of the 400GB intel 750 drives.
https://www.youtube.com/watch?v=-hE8Vg1qPSw

Total transfer speed: 10.3  GB/second.
Total IOPS: 2 million  (!)

+ nice power loss protection (Intel)
+ lower latency too - about 20µs vs 100µs for SATA3   
(http://www.anandtech.com/show/7843/testing-sata-express-with-asus/4)
+ substantially lower CPU use per I/O  
(http://www.anandtech.com/show/8104/intel-ssd-dc-p3700-review-the-pcie-ssd-transition-begins-with-nvme/5)

You're probably wondering 'how much' though? 
$400 per drive! Peanuts. 

Assuming for the moment you're working in RAID0 or with tablespaces, and just 
want raw speed:
$2400 total for 2 TB of storage, including a good quality motherboard, with 2 
million battery backed IOPS and 10GB/second bulk transfers.
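
(The tablespace route needs no md/RAID layer at all - a rough sketch, with 
hypothetical device and path names:

  mkfs.ext4 /dev/nvme0n1 && mount -o noatime /dev/nvme0n1 /nvme0
  mkdir /nvme0/pgdata && chown postgres /nvme0/pgdata
  psql -c "CREATE TABLESPACE nvme0 LOCATION '/nvme0/pgdata';"
  psql -c "CREATE TABLE readings (id bigserial, v float8) TABLESPACE nvme0;"

Spread your hottest tables and indexes across one tablespace per drive and you 
get the aggregate bandwidth without trusting a striping layer.)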

These drives are going to utterly wreck the profit margins on high-end DB 
hardware. 

Graeme Bell

p.s. No, I don't have shares in Intel, but maybe I should...




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Need more IOPS? This should get you drooling... (5xnvme drives)

2015-06-04 Thread Graeme B. Bell

Images/data here

http://www.pcper.com/reviews/Storage/Five-Intel-SSD-750s-Tested-Two-Million-IOPS-and-10-GBsec-Achievement-Unlocked



On 04 Jun 2015, at 13:07, Graeme Bell g...@skogoglandskap.no wrote:

 I previously mentioned on the list that nvme drives are going to be a very 
 big thing this year for DB performance.
 
 This video shows what happens if you get an 'enthusiast'-class motherboard 
 and 5 of the 400GB intel 750 drives.
 https://www.youtube.com/watch?v=-hE8Vg1qPSw
 
 Total transfer speed: 10.3  GB/second.
 Total IOPS: 2 million  (!)
 
 + nice power loss protection (Intel)
 + lower latency too-about 20ms vs 100ms for SATA3   
 (http://www.anandtech.com/show/7843/testing-sata-express-with-asus/4)
 + substantially lower CPU use per I/O  
 (http://www.anandtech.com/show/8104/intel-ssd-dc-p3700-review-the-pcie-ssd-transition-begins-with-nvme/5)
 
 You're probably wondering 'how much' though? 
 $400 per drive! Peanuts. 
 
 Assuming for the moment you're working in RAID0 or with tablespaces, and just 
 want raw speed:
 $2400 total for 2 TB of storage, including a good quality motherboard, with 2 
 million battery backed IOPS and 10GB/second bulk transfers.
 
 These drives are going to utterly wreck the profit margins on high-end DB 
 hardware. 
 
 Graeme Bell
 
 p.s. No, I don't have shares in Intel, but maybe I should...
 
 



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Need more IOPS? This should get you drooling... (5xnvme drives)

2015-06-04 Thread Graeme B. Bell

Note also - these disks are close to the performance of memory from a few 
generations ago (e.g. 10GB/second bulk transfers)

They also have bigger/faster versions of the drives, 1.2TB each. 

I suspect that 5 of those would feel somewhat similar to having 6TB of memory 
in your db server ... :-)  [better in fact, since writes are fast too]

Graeme.

On 04 Jun 2015, at 13:29, Dorian Hoxha dorian.ho...@gmail.com wrote:

 This looks great when you want in-memory (something like unlogged tables) and 
 you also want replication. (meaning, I don't know of an alternative to get 
 replication with unlogged than to just get faster drives + logged tables?)
 
 On Thu, Jun 4, 2015 at 1:23 PM, Graeme B. Bell g...@skogoglandskap.no wrote:
 
 Images/data here
 
 http://www.pcper.com/reviews/Storage/Five-Intel-SSD-750s-Tested-Two-Million-IOPS-and-10-GBsec-Achievement-Unlocked
 
 
 
 On 04 Jun 2015, at 13:07, Graeme Bell g...@skogoglandskap.no wrote:
 
  I previously mentioned on the list that nvme drives are going to be a very 
  big thing this year for DB performance.
 
  This video shows what happens if you get an 'enthusiast'-class motherboard 
  and 5 of the 400GB intel 750 drives.
  https://www.youtube.com/watch?v=-hE8Vg1qPSw
 
  Total transfer speed: 10.3  GB/second.
  Total IOPS: 2 million  (!)
 
  + nice power loss protection (Intel)
  + lower latency too-about 20ms vs 100ms for SATA3   
  (http://www.anandtech.com/show/7843/testing-sata-express-with-asus/4)
  + substantially lower CPU use per I/O  
  (http://www.anandtech.com/show/8104/intel-ssd-dc-p3700-review-the-pcie-ssd-transition-begins-with-nvme/5)
 
  You're probably wondering 'how much' though?
  $400 per drive! Peanuts.
 
  Assuming for the moment you're working in RAID0 or with tablespaces, and 
  just want raw speed:
  $2400 total for 2 TB of storage, including a good quality motherboard, with 
  2 million battery backed IOPS and 10GB/second bulk transfers.
 
  These drives are going to utterly wreck the profit margins on high-end DB 
  hardware.
 
  Graeme Bell
 
  p.s. No, I don't have shares in Intel, but maybe I should...
 
 
 
 
 
 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance
 



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres is using 100% CPU

2015-06-02 Thread Graeme B. Bell
 I believe yes / 0 are the default settings for synchronous commit and 
 commit_delay.   ** (Interestingly the manual pages do not specify.) ** 

Sorry, I've just spotted the settings in the text. The statement (marked **) is 
incorrect. 

Defaults are yes/0. 
(http://www.postgresql.org/docs/9.4/static/runtime-config-wal.html)

Graeme.



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres is using 100% CPU

2015-06-02 Thread Graeme B. Bell
On Sun, May 31, 2015 at 7:53 PM, Yves Dorfsman y...@zioup.com wrote:

 That's the thing, even on an old laptop with a slow IDE disk, 273
 individual
 inserts should not take more than a second.
 

I think that would depend on settings such as synchronous_commit, commit_delay, 
or whether 2-phase commit is being used. 

If synchronous commit is enabled and commit_delay was not used (e.g. 0), and 
you have a client synchronously making individual inserts to the DB (1 
transaction each), then surely you have delays due to waiting for each 
transaction to commit synchronously to WAL on disk? 

I believe yes / 0 are the default settings for synchronous commit and 
commit_delay. (Interestingly the manual pages do not specify.)


Assuming a 5400RPM laptop drive (which is a typical drive - some laptop drives 
run < 5000RPM), and assuming you are writing a sequential log to disk (with 
very short gaps between entries being added, e.g. no seek time, only rotational 
latency) will mean 5400 transactions per minute, 1 write per rotation. 

That's a maximum 90 transactions per second synchronised to WAL. It would take 
just over 3 seconds.


Ashik, try altering your postgresql.conf to say 'commit_delay=100' or 
'synchronous_commit=off'. Let us know if that fixes the problem. Read up on the 
options before you change them.
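
(A sketch of one way to do that without editing the file by hand, if you're on 
9.4 or newer - and it can be reverted the same way:

  psql -c "ALTER SYSTEM SET synchronous_commit = off;"
  psql -c "SELECT pg_reload_conf();"
  psql -c "SHOW synchronous_commit;"

Alternatively, wrapping the 273 inserts in a single BEGIN ... COMMIT would remove 
most of the per-transaction WAL flushes without giving up durability.)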

Graeme Bell





-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Some performance testing?

2015-04-09 Thread Graeme B. Bell
 
 Josh, there seems to be an inconsistency in your blog.  You say 3.10.X is
 safe, but the graph you show with the poor performance seems to be from
 3.13.X which as I understand it is a later kernel.  Can you clarify which
 3.X kernels are good to use and which are not?

Sorry to cut in - 

So far we've found kernel 3.18 to be excellent for postgres 9.3 performance 
(pgbench + our own queries run much faster than with the 2.6.32-504 centos 6 
kernel, and we haven't encountered random stalls or slowness).

We use elrepo to get prebuilt rpms of the latest mainline stable kernel 
(kernel-ml).

http://elrepo.org/tiki/kernel-ml
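
(On CentOS/RHEL 6 the dance is roughly as follows - the release-rpm URL here is a 
placeholder, check the elrepo page above for the real one:

  rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
  yum install http://www.elrepo.org/elrepo-release-6.el6.elrepo.noarch.rpm   # placeholder filename
  yum --enablerepo=elrepo-kernel install kernel-ml

then update grub's default entry and reboot into the new kernel.)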

Graeme Bell

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Some performance testing?

2015-04-09 Thread Graeme B. Bell
ext4 settings

ext4, nobarrier
noatime + nodiratime, 
stripe/stride aligned between raid10 & ext4 correctly.


Some other useful things to know

-- h710p
readahead disabled on H710P
writeback cache enabled on H710P  
Direct IO enabled on H710P

-- os filesystem settings (commands sketched below)
linux readahead enabled (16384), 
nr_requests=975
NOOP scheduler
non-NUMA

-- pg
io_concurrency on
async commit.*** see below!
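
For anyone wanting to reproduce the OS-level part, roughly (a sketch with a 
hypothetical device name; stripe/stride values depend on your array geometry):

  # at mkfs time: mkfs.ext4 -E stride=<chunk/4k>,stripe-width=<stride*data_disks> /dev/sdb1
  mount -o noatime,nodiratime,nobarrier /dev/sdb1 /pgdata
  blockdev --setra 16384 /dev/sdb              # readahead, in 512-byte sectors
  echo noop > /sys/block/sdb/queue/scheduler
  echo 975 > /sys/block/sdb/queue/nr_requests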

All settings were kept identical on the server before and after the kernel 
change, so this performance increase can be entirely attributed to the newer 
kernel  and its synergies with our configuration. 3.18 contains about 5-10 
years of linux kernel development vs. 2.6 kernels (except where backported).

I have conducted quite a lot of plug-pull testing with diskchecker.pl, and 
rather a lot of testing of scheduling/IO/RAID controller/etc parameters. The 
OS/RAID controller/file system settings are as fast as I've been able to 
achieve without compromising database  integrity (please note: this server can 
run async_commit because of the work we use it for, but we do not use that 
setting on our other main production servers). 

Our local DBs run extremely nicely for all our normal queries which involve 
quite a mix of random small IO and full-table operations on e.g. 20GB+ tables , 
so they're not optimised for pgbench specifically.

Graeme Bell



On 09 Apr 2015, at 13:56, Przemysław Deć przemyslaw@linuxpolska.pl wrote:

 Wow, thats huge performance gain.
 And it was on ext4?
 
 -- 
 Linux Polska Sp. z o.o.
 Przemysław Deć - Senior Solutions Architect
 RHCSA, RHCJA, PostgreSQL Professional Certification
 mob: +48 519 130 141
 email: p...@linuxpolska.pl
 www.linuxpolska.pl 
 ___
 Linux Polska Sp. z o. o.  Al. Jerozolimskie 123A (26 p.); 02-017 Warszawa; 
 tel. (+48) 222139571; fax (+48)222139671
 KRS - 326158, District Court for the Capital City of Warsaw, XII Commercial 
 Division of the National Court Register
 Share capital paid up: PLN 1,000,500;  NIP 7010181018;  REGON 141791601
 
 
 2015-04-09 13:01 GMT+02:00 Graeme B. Bell g...@skogoglandskap.no:
 
 From a measurement I took back when we did the upgrade:
 
 performance with 2.6: (pgbench, size 100, 32 clients)
 
 48 651 transactions per second (read only)
 6 504 transactions per second (read-write)
 
 
 performance with 3.18 (pgbench, size 100, 32 clients)
 
 129 303 transactions per second  (read only)
 16 895 transactions (read-write)
 
 
 So that looks like 2.6x improvement to reads and writes. That was an 8 core 
 xeon server with H710P and 4x crucial M550 SSDs in RAID, pg9.3.
 
 Graeme Bell
 
 
 
 
 
 On 09 Apr 2015, at 12:39, Przemysław Deć przemyslaw@linuxpolska.pl 
 wrote:
 
  Can you say how much faster it was?
 
  Przemek Deć
 
  2015-04-09 11:04 GMT+02:00 Graeme B. Bell g...@skogoglandskap.no:
  
   Josh, there seems to be an inconsistency in your blog.  You say 3.10.X is
   safe, but the graph you show with the poor performance seems to be from
   3.13.X which as I understand it is a later kernel.  Can you clarify which
   3.X kernels are good to use and which are not?
 
  Sorry to cut in -
 
  So far we've found kernel 3.18 to be excellent for postgres 9.3 performance 
  (pgbench + our own queries run much faster than with the 2.6.32-504 centos 
  6 kernel, and we haven't encountered random stalls or slowness).
 
  We use elrepo to get prebuilt rpms of the latest mainline stable kernel 
  (kernel-ml).
 
  http://elrepo.org/tiki/kernel-ml
 
  Graeme Bell
 
  --
  Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
  To make changes to your subscription:
  http://www.postgresql.org/mailpref/pgsql-performance
 
 
 


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Some performance testing?

2015-04-09 Thread Graeme B. Bell

From a measurement I took back when we did the upgrade:

performance with 2.6: (pgbench, size 100, 32 clients)

48 651 transactions per second (read only)
6 504 transactions per second (read-write)


performance with 3.18 (pgbench, size 100, 32 clients)

129 303 transactions per second  (read only)
16 895 transactions (read-write)


So that looks like 2.6x improvement to reads and writes. That was an 8 core 
xeon server with H710P and 4x crucial M550 SSDs in RAID, pg9.3.   

Graeme Bell





On 09 Apr 2015, at 12:39, Przemysław Deć przemyslaw@linuxpolska.pl wrote:

 Can you say how much faster it was?
 
 Przemek Deć
 
 2015-04-09 11:04 GMT+02:00 Graeme B. Bell g...@skogoglandskap.no:
 
  Josh, there seems to be an inconsistency in your blog.  You say 3.10.X is
  safe, but the graph you show with the poor performance seems to be from
  3.13.X which as I understand it is a later kernel.  Can you clarify which
  3.X kernels are good to use and which are not?
 
 Sorry to cut in -
 
 So far we've found kernel 3.18 to be excellent for postgres 9.3 performance 
 (pgbench + our own queries run much faster than with the 2.6.32-504 centos 6 
 kernel, and we haven't encountered random stalls or slowness).
 
 We use elrepo to get prebuilt rpms of the latest mainline stable kernel 
 (kernel-ml).
 
 http://elrepo.org/tiki/kernel-ml
 
 Graeme Bell
 
 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance
 


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] NVMe or AHCI PCI-express? A comment for people benchmarking...

2015-04-09 Thread Graeme B. Bell
A tangent to the performance testing thread here, but an important issue that 
you will see come up in your work this year or next. 

PCIe SSD may include AHCI PCI SSD or   NVMe PCI SSD.

AHCI = old style, basically it's faster than SATA3 but quite similar in terms 
of how the operating system sees the flash device.
NVMe = new style, requires a very new motherboard, operating system & drivers, 
but extremely fast and low latency, very high IOPS. 
For example, Macbooks have PCIe SSDs in them, but not NVMe (currently).
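
(Quick way to check what you've actually got on Linux - a sketch:

  lspci | grep -i 'non-volatile memory'   # NVMe controllers show up here
  ls /dev/nvme*                           # NVMe block devices, vs /dev/sdX for AHCI devices

)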

The difference is very important since NVMe offers multiples of performance in 
terms of everything we love: lower latency, higher IOPS, lower CPU overhead and 
higher throughput. 
http://www.anandtech.com/show/7843/testing-sata-express-with-asus/4
scroll down to the App to SSD IO Read Latency graph. Look at the two bottom 
lines.

So I'd suggest it's probably worth noting in any benchmark if you are using 
NVMe and if so which driver version, since development is ongoing.

On the topic of PCIe NVMe SSDs, some interesting reading:

- 
http://www.tweaktown.com/reviews/6773/samsung-xs1715-1-6tb-2-5-inch-nvme-pcie-enterprise-ssd-review/index.html
it can deliver 750,000 random read IOPS and 115,000 write IOPS 

- or any of these nice toys... 
http://imagescdn.tweaktown.com/content/6/7/6773_11777_samsung_xs1715_1_6tb_2_5_inch_nvme_pcie_enterprise_ssd_review.png

all with capacitor backing (which you should plug-pull test, of course).

Graeme.


 I currently have access to a matched pair of 20-core, 128GB RAM servers
 with SSD-PCI storage, for about 2 weeks before they go into production.
 Are there any performance tests people would like to see me run on
 these?  Otherwise, I'll just do some pgbench and DVDStore.



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Configuration tips for very large database

2015-02-13 Thread Graeme B. Bell

 
 
 Hi Nico,
 
 No one has mentioned the elephant in the room, but a database can
 be very I/O intensive and you may not be getting the performance
 you need from your virtual disk running on your VMware disk subsystem.
 What do IOmeter or other disk performance evaluation software report?
 
 Regards,
 Ken

Anecdatum: 

Moving from a contended VMware hard-disk based filesystem running over the 
network, to a bare metal RAID10 SSD, resulted in many DB operations running 
20-30x faster.

Table sizes circa 10-20G, millions of rows. 

Graeme.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Survey: Max TPS you've ever seen

2015-02-12 Thread Graeme B. Bell
 1. O/S


Under O/S, don't forget to mention the Linux kernel version. 

We saw a MASSIVE increase in TPS (I think it was a doubling? Don't have the 
data to hand right now) on our multicore RHEL6 servers, when moving from a 
stock RHEL6 kernel to an ELREPO 3.18 series kernel. That's what 10 years of 
kernel development will do for you. 

 - 16 SSD + 4 PCIe SSD storage

Similarly, it's useful to specify

- exactly which drives were being used during the test (PCIe and SATA SSDs 
perform pretty differently!). Similarly, if you're using e.g. a Dell server with 
an SSD cache in front of the disks, remember to mention it. 

- Also exactly which PCIe interface, now that there are different types of 
PCI-attached SSD becoming available (traditional AHCI PCIe SSD vs NVMe) with 
substantially different performance and overheads. 

(Performance junkies: check out NVMe if you haven't heard of it) 
   
http://www.thessdreview.com/daily-news/latest-buzz/marvell-displays-88ss1094-nvme-ssd-controller-2-9gbs/
   
http://www.thessdreview.com/daily-news/latest-buzz/memblaze-pmc-collaborate-pblaze4-pcie-ssd-hyperscale-data-centers-3-2gbs-reads-85-iops/

- Which firmware (some SSDs exhibit noticeable performance changes between 
firmware versions; see the example commands after this list for one way to 
collect this kind of information)

- which filesystem and filesystem options (try benchmarking a fresh ext4 
filesystem mounted with nobarrier, then compare against a mostly full filesystem 
with barriers enabled, on an SSD. You should see quite a difference)

- which RAID controller.  (Good luck if you're using an H710 with modern SSDs 
for example... the controller's write cache is the choke point for performance)

- readahead settings (We *tripled* our read performance on large 
tables/transfers by changing this from the default value in linux up to around 
16MB)

- filesystem queue depth and scheduler ( e.g. shallow/deep queues on ssds and 
e.g. cfq vs. noop schedulers on ssds)

- if anything else is running on the same server/filesystem (e.g. background db 
activity, web servers etc, operating system sharing the same disk)

- even things like raid stripe size and filesystem block size can have a small 
impact if you're going for absolute maximum TPS. 
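
Here is the sketch referred to above: commands for collecting most of this 
information on Linux. The device name and the /data mountpoint are examples, and 
smartmontools is assumed for the firmware query.

  uname -r                               # kernel version
  smartctl -i /dev/sda                   # drive model and firmware revision
  cat /sys/block/sda/queue/scheduler     # current I/O scheduler
  cat /sys/block/sda/queue/nr_requests   # request queue depth
  blockdev --getra /dev/sda              # readahead, in 512-byte sectors
  mount | grep /data                     # filesystem type and mount options

Paste that alongside the pgbench numbers and the results become much easier to 
interpret.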

However, honestly, all of this is probably dwarfed by the question of what you're 
doing with your database. If what you do doesn't actually look like pgbench 
activity (e.g. your server is mostly burning clock cycles running ancient 
legacy pl/sql code) then pgbench is the wrong benchmark for you.  


(Also, another note for performance junkies - some interesting news from the 
gaming world - spending extra money on 'fast memory' is probably a waste in the 
current generation of computers)

  http://www.anandtech.com/show/7364/memory-scaling-on-haswell/3

Graeme Bell

On 11 Feb 2015, at 01:31, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote:

 On 10/02/15 10:29, Gavin Flower wrote:
 On 10/02/15 08:30, Luis Antonio Dias de Sá Junior wrote:
 Hi,
 
 A survay: with pgbench using TPS-B, what is the maximum TPS you're
 ever seen?
 
 For me: 12000 TPS.
 
 --
 Luis Antonio Dias de Sá Junior
 Important to specify:
 
 1. O/S
 2. version of PostgreSQL
 3. PostgreSQL configuration
 4. hardware configuration
 5. anything else that might affect performance
 
 I suspect that Linux will out perform Microsoft on the same hardware,
 and optimum configuration for both O/S's...
 
 
 
 Yes, exactly - and also the pgbench parameters:
 
 - scale
 - number of clients
 - number of threads
 - statement options (prepared or simple etc)
 - length of test
 
 We've managed to get 4 to 6 TPS on some pretty serious hardware:
 
 - 60 core, 1 TB ram
 - 16 SSD + 4 PCIe SSD storage
 - Ubuntu 14.04
 - Postgres 9.4 (beta and rc)
 
 ...with Postgres parameters customized:
 
 - checkpoint_segments 1920
 - checkpoint_completion_target 0.8
 - wal_buffers  256MB
 - wal_sync_method open_datasync
 - shared_buffers 10GB
 - max_connections 600
 - effective_io_concurrency 10
 
 ..and finally pgbench parameters
 
 - scale 2000
 - clients 32, 64, 128, 256 (best results at 32 and 64 generally)
 - threads = 1/2 client number
 - prepared option
 - 10 minute test run time
 
 Points to note, we did *not* disable fsync or prevent buffers being actually 
 written (common dirty tricks in benchmarks). However, as others have remarked 
 - raw numbers mean little. Pgbench is very useful for testing how tuning 
 configurations are helping (or not) for a particular hardware and software 
 setup, but is less useful for answering the question how many TPS can 
 postgres do...
 
 Regards
 
 Mark
 
 
 
 
 
 -- 
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Survey: Max TPS you've ever seen

2015-02-10 Thread Graeme B. Bell
I'd suggest you run it on a large ramdisk with fsync turned off on a 32-core 
computer and see what you get; that will be a good indication of a maximum.
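
A minimal sketch of that kind of ceiling test (paths and sizes are arbitrary; 
run the postgres steps as the postgres user and assume nothing else is listening 
on the default port):

  mount -t tmpfs -o size=16G tmpfs /mnt/pgram
  initdb -D /mnt/pgram/data
  pg_ctl -D /mnt/pgram/data -o "-c fsync=off -c synchronous_commit=off" start
  createdb bench
  pgbench -i -s 100 bench
  pgbench -c 32 -j 16 -T 60 bench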

Keep in mind though that 'postgres' with fsync (vs. without) is such a 
different creature that the comparison isn't meaningful. 
Similarly 'postgres' on volatile backing store vs. non-volatile isn't really a 
meaningful comparison. 

There's also a question here about the 't' in TPS. If you have no fsync and 
volatile storage, are you really doing 'transactions'? Depending on the 
definition you take, a transaction may have some sense of 'reliability' or 
atomicity which isn't reflected well in a ramdisk/no-fsync benchmark. 

It's probably not ideal to fill a mailing list with numbers that have no 
meaning attached to them, so why not set up a little web database or Google doc 
to record max TPS and how it was achieved?

For example, imagine I tell you that the highest I've achieved is 124 tps. 
How does it help you if I say that? 

Graeme Bell

On 10 Feb 2015, at 11:48, Luis Antonio Dias de Sá Junior 
luisjunior...@gmail.com wrote:

 No problem with this. If anyone want to specify more details.
 
 But I want to know how far postgres can go. No matter OS or other variables.
 
 Gavin, you got more than 12000 TPS?
 
 2015-02-09 19:29 GMT-02:00 Gavin Flower gavinflo...@archidevsys.co.nz:
 On 10/02/15 08:30, Luis Antonio Dias de Sá Junior wrote:
 Hi,
 
 A survay: with pgbench using TPS-B, what is the maximum TPS you're ever seen?
 
 For me: 12000 TPS.
 
 -- 
 Luis Antonio Dias de Sá Junior
 Important to specify:
 
 1. O/S
 2. version of PostgreSQL
 3. PostgreSQL configuration
 4. hardware configuration
 5. anything else that might affect performance
 
 I suspect that Linux will out perform Microsoft on the same hardware, and 
 optimum configuration for both O/S's...
 
 
 Cheers,
 Gavin
 
 
 -- 
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance
 
 
 
 -- 
 Luis Antonio Dias de Sá Junior



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 8xIntel S3500 SSD in RAID10 on Dell H710p

2014-12-16 Thread Graeme B. Bell
 
 I have a beast of a Dell server with the following specifications:
   • 4x Xeon E5-4657LV2 (48 cores total)
   • 196GB RAM
   • 2x SCSI 900GB in RAID1 (for the OS)
   • 8x Intel S3500 SSD 240GB in RAID10
   • H710p RAID controller, 1GB cache
 Centos 6.6, RAID10 SSDs uses XFS (mkfs.xfs -i size=512 /dev/sdb).

Things to check

- disk cache settings (EnDskCache - for SSD should be on or you're going to 
lose 90% of your performance)

- OS settings e.g. 

echo noop > /sys/block/sda/queue/scheduler       # use the noop I/O scheduler
echo 975 > /sys/block/sda/queue/nr_requests      # increase the request queue depth
blockdev --setra 16384 /dev/sdb                  # readahead: 16384 x 512-byte sectors = 8MB

- OS kernel version 

We use H710Ps with SSDs as well, and these settings make a measurable 
difference to our performance here (though we measure more than just pgbench 
since it's a poor proxy for our use cases).

Also

- SSDs - is the filesystem aligned and block size chosen correctly (you don't 
want to be forced to read 2 blocks of SSD to get every data block)? RAID stripe 
size? May make a small difference (a quick alignment check is sketched below). 

- are the SSDs all sitting on different SATA channels? You don't want them to 
be forced to share one channel's worth of bandwidth. The H710P has 8 SATA 
channels I think (?) and you mention 10 devices above. 
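
A quick way to sanity-check the alignment point above (device and partition 
number are examples):

  parted /dev/sdb align-check optimal 1          # is partition 1 optimally aligned?
  cat /sys/block/sdb/queue/physical_block_size   # compare with the filesystem block size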

Graeme Bell.

On 10 Dec 2014, at 00:28, Strahinja Kustudić strahin...@nordeus.com wrote:

 I have a beast of a Dell server with the following specifications:
   • 4x Xeon E5-4657LV2 (48 cores total)
   • 196GB RAM
   • 2x SCSI 900GB in RAID1 (for the OS)
   • 8x Intel S3500 SSD 240GB in RAID10
   • H710p RAID controller, 1GB cache
 Centos 6.6, RAID10 SSDs uses XFS (mkfs.xfs -i size=512 /dev/sdb).
 
 Here are some relevant postgresql.conf settings:
 shared_buffers = 8GB
 work_mem = 64MB
 maintenance_work_mem = 1GB
 synchronous_commit = off
 checkpoint_segments = 256
 checkpoint_timeout = 10min
 checkpoint_completion_target = 0.9
 seq_page_cost = 1.0
 effective_cache_size = 100GB
 
 I ran some fast pgbench tests with 4, 6 and 8 drives in RAID10 and here are 
 the results:
 
 time /usr/pgsql-9.1/bin/pgbench -U postgres -i -s 12000 pgbench # 292GB DB
 
 4 drives   6 drives   8 drives
 105 min    98 min     94 min
 
 /usr/pgsql-9.1/bin/pgbench -U postgres -c 96 -T 600 -N pgbench   # Write test
 
 4 drives   6 drives   8 drives
 6567       7427       8073
 
 /usr/pgsql-9.1/bin/pgbench -U postgres -c 96 -T 600 pgbench  # Read/Write test
 
 4 drives   6 drives   8 drives
 3651       5474       7203
 
 /usr/pgsql-9.1/bin/pgbench -U postgres -c 96 -T 600 -S pgbench  # Read test
 
 4 drives   6 drives   8 drives
 17628      25482      28698
 
 
 A few notes:
   • I ran these tests only once, so take these number with reserve. I 
 didn't have the time to run them more times, because I had to test how the 
 server works with our app and it takes a considerable amount of time to run 
 them all.
   • I wanted to use a bigger scale factor, but there is a bug in pgbench 
 with big scale factors.
   • Postgres 9.1 was chosen, since the app which will run on this server 
 uses 9.1.
   • These tests are with the H710p controller set to write-back (WB) and 
 with adaptive read ahead (ADRA). I ran a few tests with write-through (WT) 
 and no read ahead (NORA), but the results were worse.
   • All tests were run using 96 clients as recommended on the pgbench 
 wiki page, but I'm sure I would get better results if I used 48 clients (1 
 for each core), which I tried with the R/W test and got 7986 on 8 drives, 
 which is almost 800TPS better than with 96 clients.
 
 Since our app is tied to the Postgres performance a lot, I'm currently trying 
 to optimize it. Do you have any suggestions what Postgres/system settings I 
 could try to tweak to increase performance? I have a feeling I could get more 
 performance out of this system.
 
 
 Regards,
 Strahinja


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Tuning the configuration

2014-12-16 Thread Graeme B. Bell
 
 I don't understand the logic behind using drives, 
 which are best for random io, for sequent io workloads.

Because they are also best for sequential IO. I get 1.3-1.4GB/second from 4 
SSDs in RAID or 500MB/s for single disk systems, even with cheap models. 
Are you getting more than that from high-end spinning rust?

Graeme.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Tuning the configuration

2014-12-15 Thread Graeme B. Bell
 Very much agree with this.  Because SSD is fast doesn't make it suited for 
 certain things, and a streaming sequential 100% write workload is one of 
 them.   I've worked with everything from local disk to high-end SAN and even 
 at the high end we've always put any DB logs on spinning disk.  RAID1 is 
 generally sufficient.  SSD is king for read heavy random I/O workload.


1. Here we found sustained serial writes were faster on SSD than on spinning 
disk, by a factor of 3, both in RAID and single-disk configurations. 

2. Also, something to watch out for is extended stalls due to synchronous write 
activity / clearing out of cache when a lot of data has been building up in the 
write caches. By placing the WAL on the same disk as the ordinary database, you 
avoid having too much dirty cache build up, because the WAL forces the disk to 
flush more often. So you can trade off some DB filesystem performance here to 
avoid blocking / IO lag spikes. (The kernel's dirty-page limits matter here too; 
see the sysctl sketch after this list.)

3. There's also the question of disk bays. When you have extra disks for the OS, 
for logs, etc., in some situations you're using up disks that could be used to 
extend your main database filesystem, particularly when those disks also need 
to be protected by the appropriate RAID mirrors and RAID hotspares. It can be 
cheaper to put the logs on SSD than to have one extra HDD + its RAID1 mirror + 
its hotspare + possible shelf spare, plus pay for a bigger chassis to have 3 
more disk bays.

4. Finally there's the issue of simplicity. If you get a fast SSD and run 
OS/logs/DB off a single RAID volume, there's less chance for error when some 
unlucky person has to do an emergency fix/rebuild later, than if they have to 
check disk caching policy etc across a range of devices and ensure different 
parts of the filesystem are mounted in all the right places. Makes 
documentation easier. 
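
Regarding point 2 above: the kernel's dirty-page limits can also be capped 
directly, which helps avoid those long synchronous flush stalls. A sketch (the 
byte values are illustrative, not a recommendation):

  sysctl -w vm.dirty_background_bytes=268435456   # start background writeback at 256MB of dirty data
  sysctl -w vm.dirty_bytes=1073741824             # force blocking writeback at 1GB of dirty data

Put the equivalent lines in /etc/sysctl.conf if you want them to survive a 
reboot.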

Graeme Bell



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Planning for Scalability

2014-10-03 Thread Graeme B. Bell
Hi Roberto, 

Hardware etc. is a solution; but you have not yet characterised the problem. 

You should investigate if the events are mostly... 

- reads
- writes
- computationally intensive
- memory intensive
- I/O intensive
- network I/O intensive
- independent?  (e.g. does it matter if you split the database in two?)

You should also find out whether the current server comfortably supports 3 million 
events per day or whether you already have problems there that need to be 
addressed. If it handles 3 million with plenty of spare I/O, memory, CPU and 
network bandwidth, then maybe it will handle 5 million without changing anything.

Once you've gathered this information (using tools like pg_stat_statements, 
top, iotop, ... and by thinking about what the tables are doing), look at it 
and see if the answer is obvious.
If not, think about what is confusing for a while, and then write your thoughts 
and data as a new question to the list.
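
For the pg_stat_statements part, a minimal sketch ('mydb' is a placeholder; the 
module ships in contrib and needs a server restart to preload):

  # in postgresql.conf: shared_preload_libraries = 'pg_stat_statements', then restart
  psql mydb -c "CREATE EXTENSION pg_stat_statements;"
  psql mydb -c "SELECT query, calls, total_time FROM pg_stat_statements ORDER BY total_time DESC LIMIT 10;"

That quickly shows whether the load is dominated by a handful of heavy 
statements or spread thinly across many small ones.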

Graeme.



On 03 Oct 2014, at 10:55, Roberto Grandi roberto.gra...@trovaprezzi.it wrote:

 Dear Pg people,
 
 I would ask for your help considering this scaling issue. We are planning to 
 move from 3Millions of events/day instance of postgres (8 CPU, 65 gb ram) to 
 5 millions of items/day.
 What do you suggest in order to plan this switch? Add separate server? 
 Increase RAM? Use SSD?
 
 Any real help will be really precious and appreciated.
 Roberto
 
 
 -- 
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Yet another abort-early plan disaster on 9.3

2014-09-30 Thread Graeme B. Bell

 The existing cost estimation
 code effectively assumes that they're perfectly uniformly distributed;
 which is a good average-case assumption but can be horribly wrong in
 the worst case.


Sorry, just an outsider jumping in with a quick comment.

Every year or two the core count goes up. Can/should/does postgres ever attempt 
two strategies in parallel, in cases where strategy A is generally good but 
strategy B prevents bad worst case behaviour? Kind of like a Schrödinger's Cat 
approach to scheduling. What problems would it raise?

Graeme. 



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Yet another abort-early plan disaster on 9.3

2014-09-30 Thread Graeme B. Bell

Thanks for your replies everyone. 

 You can't run two plans and have them both returning rows to the client,

That wasn't what I had in mind. 

I can envisage cases where the worst-case behaviour of one plan means it has 
produced zero rows by the time the alternative plan has generated the complete 
result, never mind a single row (e.g. anything with LIMIT in it could fall into 
that category). Maybe it's enough to alleviate the problems caused by planning 
heuristics known to have bad worst-case performance that are hard to avoid with 
a single-threaded approach?

Providing we're not modifying data in the query, and providing we kill the 
'loser' thread when either the first result or all results come in, maybe 
there's value in letting them race and picking the best plan retrospectively.


I guess it's going into another topic, but I wonder what % of DBs/queries look 
like this: 

- little or no I/O thrash (e.g. tuples mostly in memory already or DB 
configured to have a relatively low 'random_page_cost')
- ordered results, or, the whole result set is being produced at once.
- SELECTs only

In my own work (national-scale GIS) this is what most of our queries & query 
environments look like. 

Graeme


On 30 Sep 2014, at 18:32, Tom Lane t...@sss.pgh.pa.us wrote:

 Graeme B. Bell g...@skogoglandskap.no writes:
 Every year or two the core count goes up. Can/should/does postgres ever 
 attempt two strategies in parallel, in cases where strategy A is generally 
 good but strategy B prevents bad worst case behaviour? Kind of like a 
 Schrödinger's Cat approach to scheduling. What problems would it raise?
 
 You can't run two plans and have them both returning rows to the client,
 or performing inserts/updates/deletes as the case may be.
 
   regards, tom lane
 
 
 -- 
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very slow postgreSQL 9.3.4 query

2014-09-29 Thread Graeme B. Bell

Hi,

Two things:

- Make sure you are creating a GIST index on your geometry column in postgis.
- Try using st_intersects rather than &&. I've noticed that && isn't using 
indices correctly in some situations, e.g. function indices for st_transform'd 
geo columns.
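
A sketch of both points, using placeholder table/column names (sensors, geom) 
and a trivial example polygon:

  psql mydb -c "CREATE INDEX sensors_geom_gist ON sensors USING GIST (geom);"
  psql mydb -c "ANALYZE sensors;"
  psql mydb -c "SELECT count(*) FROM sensors WHERE ST_Intersects(geom, ST_GeomFromText('POLYGON((0 0,1 0,1 1,0 1,0 0))', 4326));"

If the index still isn't used, check the EXPLAIN output and make sure the SRIDs 
on both sides of the comparison match.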

Graeme


On 26 Sep 2014, at 18:17, Burgess, Freddie fburg...@radiantblue.com wrote:

 Workflow description:
 
 1.) User draws a polygon around an area of interest, via UI.
 2.) UI responses with how many sensors reside within the area of the polygon.
 3.) Hibernate generates the count query detailed in the attachment.
 
 Performance data is included in the attachment, via EXPLAIN PLAN, query takes 
 approx 6 minutes to return count to UI.
 Amount of data processed is also included in the attachment, 185 million row 
 partition.
 
 Hardware
 
 VM 
 80GB memory
 8 CPU Xeon
 Linux 2.6.32-431.3.1.el6.x86-64
 40TB disk, Database size: 8TB 
 PostgreSQL 9.3.4 with POSTGIS 2.1.1, Red Hat 4.4.7-4, 64 bit  
 streaming replication
 
 Postgresql.conf
 
 max_connection = 100
 shared_buffers = 32GB
 work_mem = 16MB
 maintenance_work_mem = 1GB
 seq_page_cost = 1.0
 random_page_cost = 2.0
 cpu_tuple_cost = 0.03
 effective_cache_size = 48GB
 
 
 From: Graeme B. Bell [g...@skogoglandskap.no]
 Sent: Friday, September 26, 2014 9:55 AM
 To: Burgess, Freddie
 Cc: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Very slow postgreSQL 9.3.4 query
 
 A good way to start would be to introduce the query - describe what it is 
 meant to do, give some performance data (your measurements of time taken, 
 amount of data being processed, hardware used etc).
 
 Graeme.
 
 
 On 26 Sep 2014, at 15:04, Burgess, Freddie fburg...@radiantblue.com wrote:
 
 Help, please can anyone offer suggestions on how to speed this query up.
 
 thanks
 
 
 Poor Pref query.txt
 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance
 
 
 
 -- 
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Turn off Hyperthreading! WAS: 60 core performance with 9.3

2014-08-21 Thread Graeme B. Bell
 HT off is common knowledge for better benchmarking result

It's wise to use the qualifer 'for better benchmarking results'. 

It's worth keeping in mind here that a benchmark is not the same as normal 
production use. 

For example, where I work we do lots of long-running queries in parallel over a 
big range of datasets rather than many short-term transactions as fast as 
possible. Our biggest DB server is also used for GDAL work and R at the same 
time*. Pretty far from pgbench; not everyone is constrained by locks.

I suppose that if your code is basically N copies of the same function, 
hyper-threading isn't likely to help much because it was introduced to allow 
different parts of the processor to be used in parallel when you're running 
heterogeneous code. 

But if you're hammering just one part of the CPU... well, adding another layer 
of logical complexity for your CPU to manage probably isn't going to do much 
good.

Should HT be on or off when you're running 64 very mixed types of long-term 
queries which involve variously either heavy use of real number calculations or 
e.g. logic/string handling, and different data sets? It's a much more complex 
question than simply maxing out your pgbench scores. 

I don't have the data now unfortunately, but I remember seeing a benefit for HT 
on our 4 core e3 when running GDAL/Postgis work in parallel last year. It's not 
surprising though; the GDAL calls are almost certainly using different 
functions of the processor compared to postgres and there should be very little 
lock contention. In light of this interesting data I'm now leaning towards 
proposing HT off for our mapservers (which receive short, similar requests over 
and over), but for the heterogeneous servers, I think I'll keep it on for now.

Graeme. 



* unrelated. There's also huge advantages for us in keeping these different 
programs running on the same machine since we found we can get much better 
transfer rates through unix sockets than with TCP over the network.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PGSQL 9.3 - Materialized View - multithreading

2014-04-07 Thread Graeme B. Bell
On 04 Apr 2014, at 18:29, Nicolas Paris nipari...@gmail.com wrote:

 Hello,
 
 My question is about multiprocess and materialized View.
 http://www.postgresql.org/docs/9.3/static/sql-creatematerializedview.html
 I (will) have something like 3600 materialised views, and I would like to 
 know the way to refresh them in a multithread way
 (anderstand 8 cpu cores - 8 refresh process  in the same time)

Hi Nick,

out of DB solution: 

1. Produce a text file which contains the 3600 refresh commands you want to run 
in parallel. You can do that with select and format() if you don't have a list 
already. 

2. I'm going to simulate your 3600 'refresh' commands here with some select and 
sleep statements that finish at unknown times.

(In BASH): 
  for i in {1..3600} ; do echo "echo \"select pg_sleep(1+random()::int*10); select $i\" | psql mydb" ; done > 3600commands

3. Install Gnu Parallel and type: 

parallel < 3600commands

4. Parallel will automatically work out the appropriate number of cores/threads 
for your CPUs, or you can control it manually with -j. 
It will also give you a live progress report if you use --progress.
e.g. this command balances 8 jobs at a time, prints a dynamic progress report 
and dumps stdout to /dev/null

parallel -j 8 --progress < 3600commands > /dev/null

5. If you want to make debugging easier use the parameter --tag to tag output 
for each command. 

Of course it would be much more elegant if someone implemented something like 
Gnu Parallel inside postgres or psql ... :-)

Hope this helps & have a nice day, 

Graeme.







-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PGSQL 9.3 - Materialized View - multithreading

2014-04-07 Thread Graeme B. Bell

Hi again Nick.

Glad it helped. 

Generally, I would expect that doing all the A's first, then all the B's, and 
so on, would be fastest since you can re-use the data from cache.

Concurrency when reading isn't generally a problem. Lots of things can read at 
the same time and it will be nice and fast.
It's concurrent writes or concurrent read/write of the same data item that 
causes problems with locking. That shouldn't be happening here, judging by your 
description.

If possible, try to make sure nothing is modifying those source tables 
A/B/C/D/E/F when you are doing your view refresh.

Graeme.

On 07 Apr 2014, at 14:49, Nicolas Paris nipari...@gmail.com wrote:

 Hello,
 Thanks for this clear explanation !
 
 Then I have a sub-question :
 Supposed I have 3600 materialised views say 600 mat views from 6 main table. 
 (A,B,C,D,E,F are repetead 600 times with some differences) 
 Is it faster to :
 1) parallel refresh  600 time A, then 600 time B etc,
 OR
 2) parallel refresh  600 time A,B,C,D,E,F
 
 I guess 1) is faster because they are 600 access to same table loaded in 
 memory ? But do parallel access to the same table implies concurency
  and bad performance ?
 
 Thanks
 
 Nicolas PARIS
 
 
 2014-04-07 12:29 GMT+02:00 Graeme B. Bell g...@skogoglandskap.no:
 On 04 Apr 2014, at 18:29, Nicolas Paris nipari...@gmail.com wrote:
 
  Hello,
 
  My question is about multiprocess and materialized View.
  http://www.postgresql.org/docs/9.3/static/sql-creatematerializedview.html
  I (will) have something like 3600 materialised views, and I would like to 
  know the way to refresh them in a multithread way
  (anderstand 8 cpu cores - 8 refresh process  in the same time)
 
 Hi Nick,
 
 out of DB solution:
 
 1. Produce a text file which contains the 3600 refresh commands you want to 
 run in parallel. You can do that with select and format() if you don't have a 
 list already.
 
 2. I'm going to simulate your 3600 'refresh' commands here with some select 
 and sleep statements that finish at unknown times.
 
 (In BASH):
   for i in {1..3600} ; do echo "echo \"select pg_sleep(1+random()::int*10); select $i\" | psql mydb" ; done > 3600commands
 
 3. Install Gnu Parallel and type:
 
 parallel < 3600commands
 
 4. Parallel will automatically work out the appropriate number of 
 cores/threads for your CPUs, or you can control it manually with -j.
 It will also give you a live progress report if you use --progress.
 e.g. this command balances 8 jobs at a time, prints a dynamic progress report 
 and dumps stdout to /dev/null
 
 parallel -j 8 --progress < 3600commands > /dev/null
 
 5. If you want to make debugging easier use the parameter --tag to tag output 
 for each command.
 
 Of course it would be much more elegant if someone implemented something like 
 Gnu Parallel inside postgres or psql ... :-)
 
 Hope this helps & have a nice day,
 
 Graeme.
 
 
 
 
 
 



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PGSQL 9.3 - Materialized View - multithreading

2014-04-07 Thread Graeme B. Bell

- http://wiki.postgresql.org/wiki/Performance_Optimization
- run it on the most powerful machine you can find
- get some more memory
- get a big (512GB-1TB) SSD drive
- avoid recalculating the same things over and over. If your views have many 
similar elements, calculate those first into a partial result, then build the 
final views from the partial result (a sketch follows this list).
- make sure your source tables are fully indexed and have good statistics
- run all the views once with \timing and keep track of how long they took. Fix 
the slow ones.
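
A sketch of the partial-result idea from the list above, with invented 
table/column names (events, region_id, amount):

  psql mydb -c "CREATE MATERIALIZED VIEW shared_base AS SELECT region_id, sum(amount) AS total FROM events GROUP BY region_id;"
  psql mydb -c "CREATE MATERIALIZED VIEW mv_region_1 AS SELECT * FROM shared_base WHERE region_id = 1;"
  psql mydb -c "REFRESH MATERIALIZED VIEW shared_base;"    # refresh the expensive shared part once...
  psql mydb -c "REFRESH MATERIALIZED VIEW mv_region_1;"    # ...then the cheap per-view refreshes, in parallel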

G


On 07 Apr 2014, at 15:56, Nicolas Paris nipari...@gmail.com wrote:

 Excellent.
 
 Maybe the last sub-question :
 
 Those 3600 mat views do have indexes. 
 I guess I will get better performances in dropping indexes first, then 
 refresh, then re-creating indexes.
 
 Are there other way to improve performances (like mat views storage 
 parameters), because this routines will be at night, and need to be finished 
 quickly.
 
 Thanks
 
 Nicolas PARIS
 
 
 2014-04-07 14:59 GMT+02:00 Graeme B. Bell g...@skogoglandskap.no:
 
 Hi again Nick.
 
 Glad it helped.
 
 Generally, I would expect that doing all the A's first, then all the B's, and 
 so on, would be fastest since you can re-use the data from cache.
 
 Concurrency when reading isn't generally a problem. Lots of things can read 
 at the same time and it will be nice and fast.
 It's concurrent writes or concurrent read/write of the same data item that 
 causes problems with locking. That shouldn't be happening here, judging by 
 your description.
 
 If possible, try to make sure nothing is modifying those source tables 
 A/B/C/D/E/F when you are doing your view refresh.
 
 Graeme.
 
 On 07 Apr 2014, at 14:49, Nicolas Paris nipari...@gmail.com wrote:
 
  Hello,
  Thanks for this clear explanation !
 
  Then I have a sub-question :
  Supposed I have 3600 materialised views say 600 mat views from 6 main 
  table. (A,B,C,D,E,F are repetead 600 times with some differences)
  Is it faster to :
  1) parallel refresh  600 time A, then 600 time B etc,
  OR
  2) parallel refresh  600 time A,B,C,D,E,F
 
  I guess 1) is faster because they are 600 access to same table loaded in 
  memory ? But do parallel access to the same table implies concurency
   and bad performance ?
 
  Thanks
 
  Nicolas PARIS
 
 
  2014-04-07 12:29 GMT+02:00 Graeme B. Bell g...@skogoglandskap.no:
  On 04 Apr 2014, at 18:29, Nicolas Paris nipari...@gmail.com wrote:
 
   Hello,
  
   My question is about multiprocess and materialized View.
   http://www.postgresql.org/docs/9.3/static/sql-creatematerializedview.html
   I (will) have something like 3600 materialised views, and I would like to 
   know the way to refresh them in a multithread way
   (anderstand 8 cpu cores - 8 refresh process  in the same time)
 
  Hi Nick,
 
  out of DB solution:
 
  1. Produce a text file which contains the 3600 refresh commands you want to 
  run in parallel. You can do that with select and format() if you don't have 
  a list already.
 
  2. I'm going to simulate your 3600 'refresh' commands here with some select 
  and sleep statements that finish at unknown times.
 
  (In BASH):
     for i in {1..3600} ; do echo "echo \"select pg_sleep(1+random()::int*10); select $i\" | psql mydb" ; done > 3600commands
 
  3. Install Gnu Parallel and type:
 
   parallel < 3600commands
 
  4. Parallel will automatically work out the appropriate number of 
  cores/threads for your CPUs, or you can control it manually with -j.
  It will also give you a live progress report if you use --progress.
  e.g. this command balances 8 jobs at a time, prints a dynamic progress 
  report and dumps stdout to /dev/null
 
   parallel -j 8 --progress < 3600commands > /dev/null
 
  5. If you want to make debugging easier use the parameter --tag to tag 
  output for each command.
 
  Of course it would be much more elegant if someone implemented something 
  like Gnu Parallel inside postgres or psql ... :-)
 
   Hope this helps & have a nice day,
 
  Graeme.
 
 
 
 
 
 
 
 



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg_dump vs pg_basebackup

2014-03-25 Thread Graeme B. Bell

Postgresql rsync backups require the DB to be shut down during the 'second' 
rsync.

1. rsync the DB onto the backup filesystem  (produces e.g. 95-99.99% consistent 
DB on the backup filesystem)
2. shut down the DB
3. rsync the shut down DB onto the backup filesystem(synchronises the last 
few files to make the DB consistent, and is usually very fast)
4. start the DB up again
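
As a shell sketch of those four steps (paths and the stop mode are examples, not 
a recommendation):

  rsync -a /var/lib/pgsql/9.3/data/ /backup/pgdata/             # 1. first pass, DB still running
  pg_ctl -D /var/lib/pgsql/9.3/data stop -m fast                # 2. shut down
  rsync -a --delete /var/lib/pgsql/9.3/data/ /backup/pgdata/    # 3. second, fast pass for consistency
  pg_ctl -D /var/lib/pgsql/9.3/data start                       # 4. start up again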

Is there any way to notify postgres to pause transactions (and note that they 
should be restarted), and flush out write buffers etc., instead of doing a full 
shutdown? 
E.g. so that the second rsync call would bring the backup filesystem's 
representation of the DB into a recoverable state without needing to shut down 
the production DB completely. 

G

On 25 Mar 2014, at 16:29, Ilya Kosmodemiansky 
ilya.kosmodemian...@postgresql-consulting.com wrote:

 Joshua,
 
 On Tue, Mar 25, 2014 at 4:22 PM, Joshua D. Drake j...@commandprompt.com 
 wrote:
 The advantage is that you can create backups that don't
 have to be restored, just started. You can also use the differential
 portions of rsync to do it multiple times a day without much issue.
 
 Are you sure, that it is a nice idea on a database with heavy write workload?
 
 And also Im not sure, that differential backups using rsync will be
 recoverable, if you have actually meant that.
 
 
 
 JD
 
 --
 Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
 PostgreSQL Support, Training, Professional Services and Development
 High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
 Political Correctness is for cowards.
 
 
 
 -- 
 Ilya Kosmodemiansky,
 
 PostgreSQL-Consulting.com
 tel. +14084142500
 cell. +4915144336040
 i...@postgresql-consulting.com
 
 
 -- 
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance