[PERFORM] V8 optimisation (if you're using javascript in postgres)
https://medium.com/@c2c/nodejs-a-quick-optimization-advice-7353b820c92e

A 100% performance boost, for mysterious reasons that may be worth knowing about if you run JavaScript inside postgres...

Graeme Bell
Re: [PERFORM] One long transaction or multiple short transactions?
> I don't think inserts can cause contention on the server. Inserts do not
> lock tables during the transaction. You may have contention on a sequence,
> but it won't vary with transaction size.

Perhaps there is a trigger on inserts which creates some lock contention?
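If something like that is the culprit, ungranted locks are visible in the standard pg_locks catalog view while the import runs. A minimal check (stock catalog views only, nothing assumed):

-- Any lock requests currently waiting?
SELECT locktype, relation::regclass, mode, pid
FROM pg_locks
WHERE NOT granted;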
Re: [PERFORM] One long transaction or multiple short transactions?
Sounds like a locking problem, but assuming you aren't Sherlock Holmes and simply want to get the thing working as soon as possible:

Stick a fast SSD in there (whether you stay on a VM or go physical). If you have enough I/O, you may be able to solve the problem with brute force. SSDs are a lot cheaper than your time.

Suggest you forward this to your operators: a talk I gave about optimising multi-threaded work in postgres: http://graemebell.net/foss4gcomo.pdf (Slides: "Input/Output" in the middle of the talk, and also the slides at the end labelled "For Techies").

Graeme Bell

p.s. You mentioned a VM. Consider making the machine physical rather than virtual. You'll get a performance boost and remove the risk of DB corruption from untrustworthy VM fsyncs. One day there will be a power cut or an O/S crash during these writes, and with a VM you have a reasonable chance of nuking your DB, because virtualised storage often doesn't honour fsync (for performance reasons) - yet fsync is fundamental to the correct operation of PG.

> On 08 Oct 2015, at 01:40, Carlo wrote:
>
> I am told 32 cores on a LINUX VM. The operators have tried limiting the
> number of threads. They feel that the number of connections is optimal.
> However, under the same conditions they noticed a sizable boost in
> performance if the same import was split into two successive imports which
> had shorter transactions.
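As a footnote to the quoted observation: "two successive imports with shorter transactions" in its simplest form is just the sketch below. Table and file names are invented for illustration (and server-side COPY reads paths on the database server itself):

BEGIN;
COPY import_table FROM '/data/import_part1.csv' WITH (FORMAT csv);
COMMIT;

BEGIN;
COPY import_table FROM '/data/import_part2.csv' WITH (FORMAT csv);
COMMIT;

Shorter transactions mean locks (and anything a trigger takes) are held for less time apiece.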
Re: [PERFORM] large object write performance
>> First the database was on a partition where compression was enabled; I
>> changed it to an uncompressed one to see if it made a difference, thinking
>> maybe the cpu couldn't handle the load.
>
> It made little difference in my case.
>
> My regular gmirror partition seems faster:
>
> dd bs=8k count=25600 if=/dev/zero of=./test
> 25600+0 records in
> 25600+0 records out
> 209715200 bytes transferred in 1.513112 secs (138598612 bytes/sec)
>
> The zfs compressed partition also goes faster:
>
> dd bs=8k count=25600 if=/dev/zero of=./test
> 25600+0 records in
> 25600+0 records out
> 209715200 bytes transferred in 0.979065 secs (214199479 bytes/sec)
>
> but this one didn't really go that fast in my test (maybe 10%)

Please can you run iozone and look for low random write performance with small (4k) blocks?

http://www.slashroot.in/linux-file-system-read-write-performance-test

Also, please can you CC the list in your replies to my on-list emails?

Graeme Bell
Re: [PERFORM] large object write performance
Seems a bit slow.

1. Can you share the script (the portion that does the file transfer) with the list? Maybe you're doing something unusual there by mistake. Similarly, the settings you're using for scp.

2. What's the network like? For example, what if the underlying network is only capable of 10MB/s peak, and scp is using compression and the files are highly compressible? Have you tried storing zip or gzip'd versions of the files in postgres? (That's probably a good idea anyway.)

3. ZFS performance can depend on available memory and use of caches (memory + L2ARC for reading, ZIL cache for writing). Maybe put an Intel SSD in there (or a pair of them) and use it as a ZIL cache.

4. Use dd to measure the write performance of ZFS doing a local write on the machine. What speed do you get?

5. Transfer a zip'd file over the network using scp. What speed do you get?

6. Is your postgres running all the time, or do you start it before this test? Perhaps check whether any background tasks are running when you use postgres - autovacuum, autoanalyze etc.

Graeme Bell

> On 08 Oct 2015, at 11:17, Bram Van Steenlandt wrote:
>
> Hi,
>
> I use postgresql often but I'm not very familiar with how it works
> internally.
>
> I've made a small script to backup files from different computers to a
> postgresql database. Sort of a versioning networked backup system.
> It works with large objects (oid in table, linked to large object), which I
> import using psycopg.
>
> It works well but slow.
>
> The database (9.2.9) on the server (freebsd10) runs on a zfs mirror.
> If I copy a file to the mirror using scp I get 37MB/sec.
> My script achieves something like 7 or 8MB/sec on large (+100MB) files.
>
> I've never used postgresql for something like this; is there something I
> can do to speed things up? It's not a huge problem as it's only the initial
> run that takes a while (after that, most files are already in the db).
> Still it would be nice if it were a little faster.
> cpu is mostly idle on the server, filesystem is running 100%.
> This is a separate postgresql server (I've used freebsd profiles to have 2
> postgresql servers running) so I can change this setup so it will work
> better for this application.
>
> I've read different suggestions online but I'm unsure which is best; they
> all speak of files which are only a few Kb, not 100MB or bigger.
>
> ps. english is not my native language
>
> thx
> Bram
Re: [PERFORM] large object write performance
> Like this?
>
> gmirror (iozone -s 4 -a /dev/mirror/gm0s1e) = 806376 (faster drives)
> zfs uncompressed (iozone -s 4 -a /datapool/data) = 650136
> zfs compressed (iozone -s 4 -a /datapool/data) = 676345

If you can get the complete tables (as in the images on the blog post) - with random performance compared to sequential etc., at different block sizes - that would be very interesting.
Re: [PERFORM] large object write performance
> On 08 Oct 2015, at 11:17, Bram Van Steenlandt wrote:
>
> The database (9.2.9) on the server (freebsd10) runs on a zfs mirror.
> If I copy a file to the mirror using scp I get 37MB/sec.
> My script achieves something like 7 or 8MB/sec on large (+100MB) files.

This may help - a great blog article about ZFS with postgres, and how you can use zfs compression to boost i/o performance substantially. If your machine is making a lot of smaller writes in postgres (as opposed to the presumably large writes made by scp), then this may alleviate things a bit.

https://www.citusdata.com/blog/64-zfs-compression

Graeme Bell

p.s. Apologies for top-posting on my previous message.
Re: [PERFORM] large object write performance
> On 08 Oct 2015, at 13:50, Bram Van Steenlandt wrote:
>
>>> 1. The part is "fobj = lobject(db.db,0,"r",0,fpath)", I don't think
>>> there is anything there

Re: lobject

http://initd.org/psycopg/docs/usage.html#large-objects

"Psycopg large object support *efficient* import/export with file system files using the lo_import() and lo_export() libpq functions."

See the *. lobject seems to default to string handling in Python, and that's going to be slow. Try using lo_import / lo_export instead?

Graeme Bell
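p.s. For completeness: lo_import / lo_export also exist as server-side SQL functions, which sidestep the client driver entirely. They read and write paths on the *database server's* filesystem (paths and the OID below are placeholders for illustration):

-- Import a file into a new large object; returns the new object's OID.
SELECT lo_import('/srv/backups/somefile.bin');

-- Export a large object back out to a server-side file
-- (12345 stands in for an OID returned by the call above).
SELECT lo_export(12345, '/tmp/somefile.bin');

-- Client-side equivalent, from psql:
-- \lo_import /local/path/somefile.bin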
Re: [PERFORM] large object write performance
>> http://initd.org/psycopg/docs/usage.html#large-objects
>>
>> "Psycopg large object support *efficient* import/export with file system
>> files using the lo_import() and lo_export() libpq functions."
>>
>> See the *.
>
> I was under the impression they meant that the lobject was using lo_import
> and lo_export.
> I can't seem to find how to use lo_import and lo_export; I searched google
> and came to the conclusion the lobject was the way to go.
>
> >>> x.lo_import()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: 'psycopg2._psyco

Bram,

I recommend posting this as a question on a python/psycopg mailing list for advice. You are probably not the first person to encounter it.

Graeme Bell
[PERFORM] Another parallel postgres project...
I previously posted about par_psql, but I recently found another PG parallelism project which can do a few extra things that par_psql can't:

https://github.com/moat/pmpp

pmpp: Poor Man's Parallel Processing.

Corey Huinker had the idea of using dblink async as a foundation for distributing queries. This allows parallelisation at the query level and across multiple dbs simultaneously. Nice idea!

Graeme Bell
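p.s. For anyone curious about the underlying mechanism, this is roughly what "dblink async as a foundation" looks like done by hand - a minimal sketch using the stock dblink functions (the connection string, table and queries are placeholders; this is not pmpp's actual API, see the repo for that):

CREATE EXTENSION IF NOT EXISTS dblink;

SELECT dblink_connect('worker1', 'dbname=mydb');
SELECT dblink_connect('worker2', 'dbname=mydb');

-- Send both halves of the workload without waiting for results...
SELECT dblink_send_query('worker1',
  'SELECT count(*) FROM big_table WHERE id % 2 = 0');
SELECT dblink_send_query('worker2',
  'SELECT count(*) FROM big_table WHERE id % 2 = 1');

-- ...then collect them; the two queries ran concurrently.
SELECT * FROM dblink_get_result('worker1') AS t(n bigint);
SELECT * FROM dblink_get_result('worker2') AS t(n bigint);

SELECT dblink_disconnect('worker1');
SELECT dblink_disconnect('worker2');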
Re: [PERFORM] incredible surprise news from intel/micron right now...
On 28 Jul 2015, at 22:29, Graeme B. Bell wrote:

> Entering production, availability 2016
> 1000x faster than nand flash/ssd, e.g. dram-latency
> 10x denser than dram
> 1000x the write endurance of nand
> Priced between flash and dram
> Manufactured by intel/micron
> Non-volatile

http://www.anandtech.com/show/9541/intel-announces-optane-storage-brand-for-3d-xpoint-products

Some new information (for anyone putting thought into 2016 DB hardware purchases). Throughput seems to be good: 7x better IOPS than one of the best enterprise PCIe SSDs on the market at queue depth 1, and 5x better as queue depth gets higher.

Graeme.
Re: [PERFORM] Any ideas how can I speed up this query?
QUERY

> SELECT COUNT(*) FROM occurrences
> WHERE (lat >= -27.91550355958 AND lat <= -27.015680440420002
>   AND lng >= 152.13307044728307 AND lng <= 153.03137355271693
>   AND category_id = 1
>   AND (ST_Intersects(
>     ST_Buffer(ST_PointFromText('POINT(152.58 -27.465592)')::geography, 5)::geography,
>     location::geography)));
>
> How can I assist the planner in providing better row estimates for the
> Bitmap Heap Scan section?

By googling this phrase from your EXPLAIN - "Rows Removed by Index Recheck: 748669" - you can find this explanation:

http://stackoverflow.com/questions/26418715/postgresql-rows-removed-by-index

"The inner Bitmap Index Scan node is producing a bitmap, putting 1 in all the places where records that match your search key are found, and 0 otherwise. As your table is quite big, the size of the bitmap is getting bigger; then the memory available for this kind of operation, configured via work_mem, becomes too small to keep the whole bitmap. When short of memory, the inner node will start producing 1 not for records, but rather for blocks that are known to contain matching records. This means that the outer node Bitmap Heap Scan has to read all records from such blocks and re-check them. Obviously there'll be some non-matching ones, and their number is what you see as Rows Removed by Index Recheck."

Therefore, try substantially increasing your work_mem (use SET, so that it applies on a per-session basis, not globally) so that you don't have to read in all the rows to re-check them.

This is why googling phrases from your explain before list-posting is always a good idea :-)

BTW - what are your statistics set to? If you have a huge table, it can be worth raising them from the default:

http://www.postgresql.org/docs/9.4/static/planner-stats.html

ALTER TABLE ... SET STATISTICS - try raising this to 1000.

POSTGRESQL VERSION INFO

For postgis-related questions, remember to also include the postgis version.

Hope this helps and good luck,

Graeme Bell.
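p.s. Concretely, that means something like the following (the values are illustrative starting points, not tuned recommendations):

-- Per-session bump so the bitmap stays exact (one bit per row) instead of
-- degrading to lossy (one bit per block).
SET work_mem = '256MB';

-- Raise the per-column statistics target on the filtered columns,
-- then re-gather stats. 1000 is the figure suggested above.
ALTER TABLE occurrences ALTER COLUMN lat SET STATISTICS 1000;
ALTER TABLE occurrences ALTER COLUMN lng SET STATISTICS 1000;
ANALYZE occurrences;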
[PERFORM] autofreeze/vacuuming - avoiding the random performance hit
Some of you may have had annoying problems in the past with autofreeze or autovacuum running at unexpected moments and dropping the performance of your server randomly.

On our SSD-RAID10 based system, we found a 20GB table finished its vacuum freeze in about 100 seconds. There were no noticeable interruptions to our services; maybe a tiny little bit of extra latency on the web maps - very hard to tell if it was real or imagination.

If auto-stuff in postgresql has been a pain point for you in the past, I can confirm that SSD drives are a nice solution (and also for any other autovacuum/analyze type stuff), since they can handle incoming random IO very nicely while also making very fast progress with the housekeeping work.

Graeme Bell
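p.s. If you want to reproduce that measurement on one of your own tables, run the freeze by hand and time it (the table name is a placeholder; \timing is a psql meta-command, VERBOSE just adds progress detail):

\timing on
VACUUM (FREEZE, VERBOSE) my_20gb_table;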
Re: [PERFORM] autofreeze/vacuuming - avoiding the random performance hit
Entire database.

People have talked about using SSDs for data/indices and spinning disks for WAL. However, I find having everything on the same disks is good for 3 reasons:

1. The SSD is simply vastly faster than the disks. That means if a huge amount of WAL is being written out (e.g. tons of data inserted), WAL isn't lagging at all. Anyone arguing that WAL suits spinning disks because they write fast sequentially should acknowledge that SSDs also write fast sequentially - considerably faster.

2. By having extra 'fsync' events, IO is less bumpy. Every time wal is written out, all your buffers are getting flushed out (in principle), which helps to avoid huge IO spikes.

3. Simpler setup: fewer volumes to worry about in linux, and fewer disk types to manage. For example, we only need spare SSDs in the hotspare bay and on the shelf. Even a single HDD for wal requires a mirrored HDD, plus a hotspare (that's 3 bays gone from e.g. 8), plus some more on the shelf... all to get worse performance.

Our DBs have been a total dream since I put SSDs everywhere. It got rid of every throughput/latency/io spike problem. The only thing I'd do differently today is that I'd buy intel ssds instead of the ones we chose, and preferably NVMe direct-connect with software raid in place of hardware raid and sata.

Graeme Bell.

On 28 Jul 2015, at 17:51, Wei Shan wrote:

> Did you put your entire database on SSD or just the WAL/indexes?
>
> On 28 July 2015 at 23:39, Graeme B. Bell wrote: [...]
>
> Regards,
> Ang Wei Shan
[PERFORM] incredible surprise news from intel/micron right now...
Entering production, availability 2016:

- 1000x faster than nand flash/ssd, e.g. dram-latency
- 10x denser than dram
- 1000x the write endurance of nand
- Priced between flash and dram
- Manufactured by intel/micron
- Non-volatile

Guess what's going in my 2016 db servers :-)

Please, don't be vapourware...

http://hothardware.com/news/intel-and-micron-jointly-drop-disruptive-game-changing-3d-xpoint-cross-point-memory-1000x-faster-than-nand
[PERFORM] parallelisation provides postgres performance (script example + ppt slides)
Hi all,

1. For those that don't like par_psql (http://github.com/gbb/par_psql), here's an alternative approach that uses the Gnu Parallel command to organise parallelism for queries that usually take days to run. Short script and GIS-focused, but it may give you a few ideas about how to parallelise your own code with Gnu Parallel.

https://github.com/gbb/fast_map_intersection

2. Also, I gave a talk at FOSS4G Como about these tools, and how to get better performance from your DB with parallelisation. May be helpful to people who are new to parallelisation / multi-core work with postgres.

http://graemebell.net/foss4gcomo.pdf

Graeme Bell.

p.s. (This version of the slides still has a few typos, which will be fixed soon when I get the source ppts back from my colleague's laptop.)
Re: [PERFORM] hyperthreadin low performance (and some discussion about benchmarking)
On 23 Jul 2015, at 13:37, domenico febbo wrote:

> is the problem also in PostgreSQL 9.4.x?
> I'm going to buy a production server with 4 sockets E7-4850, 12 cores
> each, so 12*4 = 48 cores (and 96 threads using HT).
>
> What do you suggest? Using or not using HT?
>
> BR

1. If you have enough money to buy a 4-socket E7, then you certainly have enough money to pay someone (maybe yourself) for the 30 minutes of work needed to run a benchmark on the machine with and without hyperthreading and compare them. I mean literally: run pgbench, reboot, turn on/off HT, run pgbench. Then you'll know what works best for your configuration. Don't be lazy about this; it's as important as the money you're throwing at the hardware.

2. Keep in mind most of the numbers people throw around are pgbench numbers. Pgbench is representative of some workloads (e.g. bank transactions) and less representative of others (mixed query types, GIS work, scientific work, heavy IO, interaction with other applications/libraries...). Are you using the server for other tasks besides postgres, for example? I find I get better performance with HT when I'm using postgres with GDAL on the same server - probably because the HT cores are being asked to do two different types of things, which is where HT shines.

3. IMPORTANT: it doesn't matter how pgbench performs for other people on other computers, or what they think is best. What matters is how YOUR normal workload performs on YOUR computer. The best way to find that out is to put together a simple simulated workload that looks like your intended use of the system, and leave it running. If it's for an important system, look at all aspects of performance: transactions per second, I/O stalls, latency... If you can't do that, pgbench can be used instead (a sketch follows at the end of this mail).

Finally, a serious point. The lack of diversity in postgres benchmarking is quite amazing, to my mind, and is probably at the root of the eternal disagreements about optimal settings, as well as the existence of long-standing hidden scaling/performance bugs (or weird kernel interactions). pgbench is useful, but really... let's make some more tools (or share links, if you know of them). Since contribution > gripe, here is my own (first, tiny) contribution, which I mentioned earlier in the month: https://github.com/gbb/t

As a point of contrast, take a look at how computer game players measure the performance of graphics cards and disk drives in their product reviews:

http://www.guru3d.com/articles-pages/radeon-r9-290-review-benchmarks,32.html

32 pages of data and discussion to test the performance of a single model (among thousands of possibilities and millions of configurations)! And this article is ordinary, run-of-the-mill stuff in the gaming scene - literally the first link I hit in Google. Has anyone ever, in the history of these lists, posted so much diverse and structured evidence in support of their beliefs about a postgres setting?

Gaming reviewers use a multitude of real-world games, synthetic benchmarks, theoretical estimates... As someone with a foot in both worlds, it is quite amusing to see that game players address benchmarking and optimisation of performance far more seriously, scientifically (and successfully) than most professional database admins.
Many graphics card reviews care very much about reproducibility/repeatable results, surrounding test conditions (very detailed information about other components used in the test, software versioning), warmup effects, benchmark quirks, performance at different scales/settings, and so on... Writing 'I saw some post where someone said they got a better result from XYZ' would certainly not be good enough in that community.

Graeme Bell.
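p.s. Re: point 3 above - a custom pgbench script is an easy starting point for a simulated workload. A hypothetical sketch against pgbench's standard tables, using the 9.x-era \setrandom syntax (adjust the statements to resemble your real queries; the mix here is invented for illustration):

-- workload.sql  -  run with e.g.: pgbench -f workload.sql -c 48 -T 600 mydb
\setrandom aid 1 1000000
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = :aid;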
Re: [PERFORM] Hmmm... why does pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
>> No, of course it doesn't. It appears that you didn't look at the repo or
>> read my previous mail before you wrote this.
>
> FFS, I *ran* some of the tests and reported on results. With you in CC.

Just checked back. So you did. I'm sorry - I made the mistake I accused you of. But... why then did you say I hadn't provided him with individual functions, when you've seen the repo yourself? I don't understand. You knew they were there.

> What I mean is that I don't just run random code from some random github
> repository.

Sure, but surely that's not an issue when the SQL functions are also separately provided and clearly labelled in the repo? Do you feel there is a difference in the trustworthiness of isolated files containing an SQL function presented in a github repo, versus SQL functions presented in an email? I am not sure I can agree with that idea; I think they are both just SQL functions. The difference is that one also offers you a bit more if you want to check/try it.

>> I do not wish to antagonise you either, so please go and look at the repo
>> before you write the next reply.
>
> Over and out.

Seems there has been a misunderstanding here, and I feel I'm still missing something in what you're saying. Sorry, Andres. Let's just forget this. I don't think we disagree especially on this, and I am not looking to make an enemy here.

Also, thanks for running the benchmarks to get some numbers.

Graeme.
Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
On 09 Jul 2015, at 15:22, Thomas Kellerer wrote:

> Graeme B. Bell schrieb am 09.07.2015 um 11:44:
>> I don't recall seeing a clear statement telling me I should mark pl/pgsql
>> functions nonvolatile wherever possible or throw all performance and
>> scalability out the window.
>
> From: http://www.postgresql.org/docs/current/static/xfunc-volatility.html
>
> "For best optimization results, you should label your functions with the
> strictest volatility category that is valid for them."

Hi Thomas,

Thank you very much for the link. However, the point I was making wasn't that no such sentence exists anywhere. My point was that I've read the docs more than anyone else in my institute and I was completely unaware of this. It is also quite vague - if you hand that to a younger programmer in particular, how do they implement it in practice? When is it important to do it? If this one factor silently breaks multiprocessor scaling of pl/pgsql, and multiprocessing is the biggest trend in CPU processing of the last decade (compare server CPUs of 2005 with 2015), then why is this information not up front and clear?

A second point to keep in mind is that optimization and parallelisation/scalability are not always the same thing.

For example, in one project I took a bunch of looped parallel UPDATEs on a set of 50 tables and rewrote them so as to run the loop all at once inside a pl/pgsql function. Crudely, I took out the table-level for loop and put it at row level instead. I expected they'd execute much faster if the UPDATEs were using data still in cache; also, I would be updating without writing out WAL entries to disk repeatedly.

It turned out the update per row ran much faster - as expected - when I used one table; but when I ran it in parallel on many tables, the performance was even worse than when I started. If you look at the benchmarks, you'll see that performance drops through the floor at 8-16 cores. I think that was when I first noticed this bug/feature.

[If anyone is curious, the way I solved that one in the end was to pre-calculate every possible way the tables might be updated after N loops of updates using Python, and import that as a lookup table into PG. It turned out that although we had 10's of GB of data per table, there were only about 100,00 different types of situation, and only e.g. 80 iterations to consider. Then I ran a single set of UPDATEs with no pl/pgsql. It was something like a 1x performance improvement.]

Graeme.
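p.s. For readers who want the shape of that lookup-table trick: instead of iterating state transitions in pl/pgsql, join each row to a precomputed table of outcomes and update in one set-based statement. A minimal sketch, with all table and column names invented for illustration:

-- 'outcomes' holds the precomputed result for every
-- (situation, iteration-count) pair; one UPDATE replaces the loop.
UPDATE target_table t
SET state = o.final_state
FROM outcomes o
WHERE o.situation = t.situation
  AND o.n_iterations = 80;   -- e.g. the 80 iterations mentioned above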
Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
>> 3. I don't disagree that the benchmark code is objectively 'bad' in the
>> sense that it is missing an important optimisation.
>
> Particularly with regards to documentation, a patch improving things is
> much more likely to improve the situation than griping. Also, conversation
> on this list gets recorded for posterity, and google is remarkably good at
> matching people looking for problems with solutions. So, even in absence
> of a patch, perhaps we've made the lives of future head-scratchers a
> little bit easier with this discussion.

I agree that patch > gripe, and about the google aspect. But nonetheless, a well-intentioned gripe is > ignorance of a problem.

As mentioned earlier, I'm sick just now and will be back in hospital again tomorrow (Monday), so a patch may be a little much to ask from me here :-) It's a bit much even keeping up with the posts on the thread so far. I might try to fix the documentation a bit later, though as someone with no experience in marking up volatility on pl/pgsql functions, I doubt my efforts would be that great. I also have other OSS project contributions that need some attention first.

Re: the google effect. Are these mailing list archives mirrored anywhere, incidentally? For example, I notice we just lost http://reddit.com/r/amd at the weekend; all the discussion of the last few years on that forum is out of reach.

Graeme Bell
Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
On 09 Jul 2015, at 17:42, Merlin Moncure wrote:

> The community maintains its own mailing list archives on postgresql.org.
> Short of an array of tactical nuclear strikes this is going to be
> preserved.

Good to know. I've seen a lot of dead software projects throughout my life. But still - we will have to pray that Kim Jong Un never decides to become a MySQL contributor... :)

Graeme.
Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
On 08 Jul 2015, at 22:27, Andres Freund wrote:

> On 2015-07-08 13:46:53 -0500, Merlin Moncure wrote:
>> On Wed, Jul 8, 2015 at 12:48 PM, Craig James wrote:
>>> Well, right, which is why I mentioned "even with dozens of clients".
>>> Shouldn't that scale to at least all of the CPUs in use if the function
>>> is CPU intensive (which it is)?
>>
>> only in the absence of inter-process locking and cache line bouncing.
>
> And additionally memory bandwidth (shared between everything, even in the
> numa case), cross socket/bus bandwidth (absolutely performance critical in
> multi-socket configurations), cache capacity (shared between cores, and
> sometimes even sockets!).

1. Note for future readers: it's also worth noting that, depending on the operation and on your hardware, you may have fewer CPU cores than you think to parallelise upon.

1a. For example, AMD CPUs list the number of integer cores (e.g. 16), but there are actually only half as many cores available for floating point work (8). So if your functions need to use floating point, your scaling will suffer badly on FP functions.

https://en.wikipedia.org/wiki/Bulldozer_(microarchitecture)

"In terms of hardware complexity and functionality, this module is equal to a dual-core processor in its integer power, and to a single-core processor in its floating-point power: for each two integer cores, there is one floating-point core."

1b. Or, if you have hyper-threading enabled on an Intel CPU, you may think you have e.g. 8 cores, but if all the threads are running the same type of operation repeatedly, it won't be possible for the hyper-threading to work well and you'll only get 4 in practice - maybe less, due to overheads. Or, if your work is continually going to main memory for data (i.e. limited by the memory bus), it will run at 4-core speed, because the cores have to share the same memory bus. Hyper-threading depends on the 2 logical cores being asked to perform two different types of task at once (each having relatively low demands on memory).

"When execution resources would not be used by the current task in a processor without hyper-threading, and especially when the processor is stalled, a hyper-threading equipped processor can use those execution resources to execute another scheduled task."

https://en.wikipedia.org/wiki/Hyper-threading
https://en.wikipedia.org/wiki/Superscalar

2. Keep in mind also when benchmarking that it's normal to see a small drop-off when you hit the maximum number of cores for your system. After all, the O/S, the benchmark program and anything else you have running will need a core or two.
Re: [PERFORM] Hmmm... why does pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
On 08 Jul 2015, at 13:20, Andres Freund wrote:

> On 2015-07-08 11:13:04 +0000, Graeme B. Bell wrote:
>> I'm guessing you are maybe pressed for time at the moment, because I
>> already clearly included this in the last email, as well as the links to
>> the alternative benchmarks with the same problem which I referred to in
>> both of my last emails - they are also trivial to drop into pgbench
>> (cut/paste).
>
> You realize that you want something here, not Merlin, right?

Hi Andres,

My email was saying it's not helpful for anyone on the list for him to keep asking me to give him X and me to keep sending it. Do you disagree with that idea? I tried to phrase my request politely, but perhaps I failed. If you have suggestions for better ways to say "I already sent it, twice" more politely in this situation, I'd welcome them off-list.

He asked me to disclose the function body I was testing. I did that, *and* also disclosed the entire approach to the benchmark, in a way that made it trivial for him or others to replicate the situation I'd found. I'm pretty sure you should not be discouraging this kind of thing in bug/performance reports.

I get your point that when you're asking other people to look at something with you, you shouldn't antagonise them. I didn't intend it as antagonising, and Merlin hasn't mailed me anything to say he was antagonised. I'm quite sure he's capable of defending himself, or communicating with me himself if he does feel antagonised by something. I hope we can end the discussion of that here? Merlin, if you were antagonised, sorry - I did not mean to antagonise you. I just wanted to make it clear that I'd sent you what you asked for (and more), and that I was surprised you hadn't noticed it.

>> To clear up the issue I built a little test harness around your comment
>> below. http://github.com/gbb/t
>
> Well, that requires reviewing the source code of the run script and such.

No, of course it doesn't. It appears that you didn't look at the repo or read my previous mail before you wrote this. I do not wish to antagonise you either, so please go and look at the repo before you write the next reply.

http://github.com/gbb/t

Just pick any function you like; there are 6 there, and 3 of them demonstrate 2 different problems. All of it is clearly documented. When you open up the repo, there are the tests:

https://github.com/gbb/t/tree/master/tests

You don't need to review any code from the run script. The functions are there as isolated files, and what they are intended to demonstrate is clearly described with text and graphics. I could see your point if I had mailed out some giant script with a bunch of SQL calls embedded in its guts, but that's the opposite of what I did here. Did you find it difficult to navigate the repo structure (2 folders, a few files)? If so, please let me know off-list what was difficult and I will see if I can improve it.

> I think we shouldn't discuss this on two threads (-performance, -bugs),
> that makes it hard to follow. Given Tom's more detailed answer I think the
> -bugs thread already contains more pertinent information.

I don't necessarily disagree with this idea, but... How many people concerned with performance are following the -bugs list? How much space is there for discussion of this on -bugs? Since the only working solutions for this performance problem so far are all user-side rather than committer-side, why would you want to restrict that information to a committer-side list?
It has developed this way because I noticed it as a performance issue first, then decided to report it as a potential bug. Perhaps it would be useful to keep the discussion separated into the committer-side aspects (how to fix this at the server level) and the user-side aspects (what you can do to improve performance right now). I will defer to general opinion on this in my follow-up posts.

Graeme.
Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
On 09 Jul 2015, at 05:38, Tom Lane wrote:

> If you write your is_prime function purely in plpgsql, and don't bother to
> mark it nonvolatile, *it will not scale*. [...] much for properly written
> plpgsql; but there's an awful lot of bad plpgsql code out there, and it
> can make a huge difference for that.

Hi Tom,

I object to phrases like 'don't bother to mark it' and 'bad plpgsql' here. That is putting the blame on programmers. Clearly, if there is no end of code out there that isn't right in this regard, there's something very wrong in the project documentation.

1. I have been writing pl/pgsql on and off for a couple of years now, and I've read quite a bit of the postgres documentation, but I don't recall seeing a clear statement telling me I should mark pl/pgsql functions non-volatile wherever possible or throw all performance and scalability out the window. I'm sure there may be a line hidden somewhere in the docs, but judging from the impact it has in practice, this seems like a very fundamental concept that should be repeatedly and clearly marked in the docs.

2. Furthermore, I have never come across anything in the documentation that made it clear to me that any pl/pgsql function I write will, by default, be taking out locks for every single statement in the code. I've written code in, I dunno, maybe 15-20 different languages in my life, and I can't think of another language offhand that does that by default. From the reactions on this thread to this benchmark and the par_psql benchmarks, it doesn't seem that it was even immediately obvious to many postgres enthusiasts and developers.

3. I don't disagree that the benchmark code is objectively 'bad' in the sense that it is missing an important optimisation. But I really don't think it helps to frame this as laziness, or 'bad' in any other sense of the word, e.g. 'clumsy'.

Let's look at the postgresql documentation for some examples of 'bad' and lazy code:

http://www.postgresql.org/docs/9.3/static/plpgsql-structure.html
http://www.postgresql.org/docs/9.3/static/plpgsql-declarations.html

There are about 13 functions on those pages. How many of them are marked non-volatile or immutable where it would be appropriate? Zero.

Or this one: http://www.postgresql.org/docs/9.3/static/plpgsql-control-structures.html - zero.

Or this one: http://www.postgresql.org/docs/9.3/static/plpgsql-cursors.html#PLPGSQL-CURSOR-USING - zero.

The reason 90% of people out there are 'not bothering' and 'writing bad code' is because **99% of the postgresql documentation teaches them to do it that way**. So when you talk about other people 'not bothering' to do things - who is really at fault here for what you see as endemic 'bad' or 'lazy' code? Is it the new postgres programmers, or the people that taught them with bad examples consistently throughout the *entire project documentation*, starting from the very first example?

I think I'm going to raise this as a documentation bug.

Graeme.
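p.s. For readers landing here from a search, the fix being discussed looks like this. A minimal sketch - the is_prime body is a placeholder, and the whole point is the volatility label after the function body:

-- Without a label, the default is VOLATILE: the executor assumes possible
-- side effects on every call, which is what wrecks multi-core scaling here.
CREATE OR REPLACE FUNCTION is_prime(n bigint) RETURNS boolean AS $$
DECLARE
    i bigint;
BEGIN
    IF n < 2 THEN RETURN false; END IF;
    i := 2;
    WHILE i * i <= n LOOP
        IF n % i = 0 THEN RETURN false; END IF;
        i := i + 1;
    END LOOP;
    RETURN true;
END;
$$ LANGUAGE plpgsql
IMMUTABLE;   -- same input always gives the same output, no side effects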
Re: [PERFORM] [BUGS] BUG #13493: pl/pgsql doesn't scale with cpus (PG9.3, 9.4)
This is a reply to Andres's post on the #13495 documentation thread in -bugs. I am responding to it here because it relates to #13493 only.

Andres wrote, re: #13493:

>> This issue is absolutely critical for performance and scalability of
>> code,
>
> Pft. In most cases it doesn't actually matter that much because the
> contained queries are the expensive stuff. It's just when you do lots of
> very short and cheap things that it has such a big effect. Usually the
> effect on the planner is bigger.

Hi Andres,

'Pft' is kinda rude - I wouldn't comment on it normally, but seeing as you just lectured me on -performance about something you perceived as impolite (just as you lectured me about not spreading things onto multiple threads), can you please try to set a good example? You don't encourage new contributors into open source communities this way.

Getting to the point. I think the gap between our viewpoints comes from the fact that I (and others here at my institute) have a bunch of pl/pgsql code with for loops and calculations, which we see as 'code'. Thinking of all the users I know myself, I know there are plenty of GIS people out there using for loops and pgsql to simulate models on data in the DB, and I expect the same is true among e.g. older scientists with DB datasets. Whereas it sounds like you and Tom see pl/pgsql as 'glue' and don't see any problem.

As I have never seen statistics on pl/pgsql use-cases among users at large, I don't know what happens everywhere else outside of GIS-world and pgdev-world. Have you any references/data you can share on that? I would be interested to know, because I don't want to overclaim the importance of these bugs, or any other bugs, in future. In this case, #13493 wrecked the code for estimates on a 20 million euro national roadbuilding project here and cost me a few weeks of my life, but for all I know you're totally right about the general importance to the world at large.

Though keep in mind: this isn't only about scaling up one program. It's a db-level problem. If you have a large GIS DB server with many users, long-running queries etc. on large amounts of data, then you only need e.g. 2-3 people to be running some code with for-loops or a long series of calculations in pl/pgsql, and everything will fall apart in pgsql-land.

Last point. When I wrote 'absolutely critical' I was under the impression this bug could have some serious impact on postgis/pgrouting. Since I wanted to double-check what you said about 'expensive stuff' vs 'short/cheap stuff', I ran some benchmarks to check a few functions. You are right that only short, looped things are affected, e.g. for loops with calculations and so on. I didn't see any trouble with the calls I made to postgis inside or outside of pgsql. This confirms/replicates your findings. Updated numbers/tests will be posted to github shortly.

Regards,
Graeme Bell
Re: [PERFORM] Hmmm... why does pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
On 07 Jul 2015, at 22:52, Merlin Moncure wrote:

> On Tue, Jul 7, 2015 at 3:33 PM, Graeme B. Bell wrote:
>> Hi Merlin,
>>
>> Long story short - thanks for the reply, but you're not measuring
>> anything about the parallelism of code running in a pl/pgsql environment
>> here. You're just measuring whether postgres can parallelise entering
>> that environment and getting back out. Don't get me wrong - it's great
>> that this scales well, because it affects situations where you have lots
>> of calls to trivial functions. However, it's not the problem I'm talking
>> about. I mean 'real' pl/pgsql functions, e.g. things that you might find
>> in postgis or similar.
>
> Maybe so. But it will be a lot easier for me (and others on this list) if
> you submit a self-contained test case that runs via pgbench.

Hi Merlin,

I'm guessing you are maybe pressed for time at the moment, because I already clearly included this in the last email, as well as the links to the alternative benchmarks with the same problem which I referred to in both of my last emails - they are also trivial to drop into pgbench (cut/paste).

e.g. did you see these parts of my previous email?

"To clear up the issue I built a little test harness around your comment below. http://github.com/gbb/t"

"Just pick any function you like; there are 6 there, and 3 of them demonstrate 2 different problems. All of it is clearly documented."

I haven't used perf with pgbench before, and I can't run any code today. If you're interested in this but short on time, maybe you can glance at the repo above and just add 'perf' at the appropriate point in the rbuild wrapper.

Graeme.
Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
On 07/07/2015 08:05 PM, Craig James wrote:

> No ideas, but I ran into the same thing. I have a set of C/C++ functions
> that put some chemistry calculations into Postgres as extensions (things
> like "calculate the molecular weight of this molecule"). As SQL functions,
> the whole thing bogged down, and we never got the scalability we needed.
> On our 8-CPU setup, we couldn't get more than 2 CPUs busy at the same
> time, even with dozens of clients.

Hi all,

The sample code / results were put up last night at http://github.com/gbb/t

Craig's problem sounds similar to my own, assuming he means running C indirectly via SQL vs running C more directly: lots of parallel connections to postgres, but a maximum of 2 CPUs of scaling (and it gets worse as you try to run more things).

Tom Lane has posted an interesting comment over on the bugs list which identifies a likely source of at least one of the problems, maybe both. It seems to be linked to internal locking inside postgres (which makes sense, given the results - both problems feel 'lock-y'). Also, he mentions a workaround for some functions that scales to 8-way, apparently.

http://www.postgresql.org/message-id/31265.1436317...@sss.pgh.pa.us

I think it's potentially a big problem for CPU-intensive postgres libraries like pgrouting, or perhaps the postgis raster functions, things like that. I don't know how well their functions are marked for e.g. immutability. Are there any postgis devs on this list?

Graeme Bell
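p.s. For anyone wanting to check how a library's functions are marked: volatility is recorded in the standard pg_proc catalog ('v' = volatile, 's' = stable, 'i' = immutable). The 'st\_%' pattern below assumes postgis's usual st_ naming; adjust for other libraries:

SELECT provolatile, count(*)
FROM pg_proc
WHERE proname LIKE 'st\_%'
GROUP BY provolatile;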
Re: [PERFORM] New server: SSD/RAID recommendations?
Hi Wes,

1. The first interesting thing is that prior to my mentioning this problem to C_ a year or two back, the power loss protection was advertised everywhere as simply that, without qualifiers about 'not inflight data'. Check out the marketing of the M500 for the first year or so and try to find an example where they say 'but inflight data isn't protected!'.

2. The second (and more important) interesting thing is that this is irrelevant! Fsync'd data is BY DEFINITION not data in flight. Fsync means "this data is secure on the disk". However, the drives corrupt it.

Postgres's sanity depends on a reliable fsync. That's why we see posts on the performance list saying 'fsync=off makes your postgres faster, but really, don't do it in production'. We are talking about internal DB corruption here, not just a crash and a few lost transactions.

These drives return from fsync while data is still in volatile cache. That's breaking the spec, and it's why they are not OK for postgres by themselves. This is not about 'in-flight' data; it's about fsync'd WAL log data.

Graeme.

On 07 Jul 2015, at 16:15, Wes Vaske (wvaske) wrote:

> The M500/M550/M600 are consumer class drives that don't have power
> protection for all inflight data* (like the Samsung 8x0 series and the
> Intel 3x0 & 5x0 series). The M500DC has full power protection for inflight
> data and is an enterprise-class drive (like the Samsung 845DC or Intel
> S3500 & S3700 series). So any drive without the capacitors to protect
> inflight data will suffer from data loss if you're using disk write cache
> and you pull the power.
>
> *Big addendum: There are two issues on powerloss that will mess with
> Postgres: Data Loss and Data Corruption. The micron consumer drives will
> have power loss protection against Data Corruption, and the enterprise
> drives will have power loss protection against BOTH.
>
> https://www.micron.com/~/media/documents/products/white-paper/wp_ssd_power_loss_protection.pdf
>
> The Data Corruption problem is only an issue in non-SLC NAND, but it's
> industry wide. And even though some drives will protect against that, the
> protection of inflight data that's been fsync'd is more important and
> should disqualify *any* consumer drives from *any* company from
> consideration for use with Postgres.
>
> Wes Vaske | Senior Storage Solutions Engineer
> Micron Technology
>
> -----Original Message-----
> From: Graeme B. Bell
> Sent: Tuesday, July 07, 2015 8:26 AM
> To: Merlin Moncure
> Cc: Wes Vaske (wvaske); Craig James; pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] New server: SSD/RAID recommendations?
>
> As I have warned elsewhere: the M500/M550 from $SOME_COMPANY is NOT
> SUITABLE for postgres unless you have a RAID controller with BBU to
> protect yourself. The M500/M550 are NOT plug-pull safe, despite the 'power
> loss protection' claimed on the packaging. Not all fsync'd data is
> preserved in the event of a power loss, which completely undermines
> postgres's sanity.
>
> I would be extremely skeptical about the M500DC given the name and
> manufacturer. I went to quite a lot of trouble to provide $SOME_COMPANY's
> engineers with the full details of this fault after extensive testing (we
> have e.g. 20-25 of these disks) on multiple machines and controllers, at
> their request. Result: they stopped replying to me, and soon after I saw
> their PR reps talking about how 'power loss protection isn't about
> protecting all data during a power loss'.
> The only safe way to use an M500/M550 with postgres is:
>
> a) disable the disk cache, which will cripple performance to about 3-5% of
> normal, or
> b) use a battery-backed or cap-backed RAID controller, which will
> generally hurt performance by limiting you to the peak performance of the
> flash on the raid controller.
>
> If you are buying such a drive, I strongly recommend buying only one and
> doing extensive plug-pull testing before committing to several. For
> myself, my time is valuable enough that it will be cheaper to buy intel in
> future.
>
> Graeme.
>
> On 07 Jul 2015, at 15:12, Merlin Moncure wrote:
>
>> On Thu, Jul 2, 2015 at 1:00 PM, Wes Vaske (wvaske) wrote:
>>> Storage Review has a pretty good process and reviewed the M500DC when it
>>> released last year:
>>> http://www.storagereview.com/micron_m500dc_enterprise_ssd_review
>>>
>>> The only database-specific info we have available is for Cassandra and
>>> MSSQL:
>>> http://www.micron.com/~/media/documents/products/technical-marketing-brief/cassandra_and_m500dc_enterprise_ssd_tech_brief.pdf
>>> http://www.micron.com/~/media/documents/products/technical-marketing-brief/sql_server_2014_and_m500dc_raid_configuration_tech_brief.pdf
>>> (some of that info might be relevant)
>>>
>>> In terms of endurance, the M500DC is rated to 2 Drive Writes Per Day
>>> (DWPD) for 5 years. For comparison:
>>> Micron M500DC (20nm) - 2 DWPD
>>> Intel S3500 (20nm
Re: [PERFORM] New server: SSD/RAID recommendations?
> Why would you think that you don't need RAID for ZFS? Reason I'm asking is
> because we are moving to ZFS on FreeBSD for our future projects.

Because you have zraid. :-)

https://blogs.oracle.com/bonwick/entry/raid_z

General points:

1. It's my understanding that ZFS is designed to talk to the hardware directly, so it would be bad to hide the physical layer from ZFS unless you had to. After all, I don't think they implemented a raid-like system inside ZFS just for the fun of it.

2. You have zraid built in and easy to manage within ZFS - and well tested, compared to NewRaidController (TM) - so why add another layer of management to your disk storage?

3. You reintroduce the raid write hole.

4. There might be some argument for hardware raid (on an existing system), but with software raid (the point I was addressing) it makes little sense at all.

5. If you're on hardware raid and your controller dies, you're screwed in several ways. It's harder to get a new raid controller than a new pc. Your chances of recovery are lower than with zfs. IMHO it is more scary to recover from a failed raid controller, too.

6. Recovery is faster if the disks aren't full, since ZFS recovers only what is actually there. This might not seem a big deal, but chances are it would save you 50% of your downtime in a crisis.

However, I think with Linux you might want to use RAID for the boot disk; I don't know if linux can boot from ZFS yet. I would (and do) use FreeBSD with zfs.

Graeme.

On 07 Jul 2015, at 18:56, Wei Shan wrote:

> Hi Graeme,
>
> Why would you think that you don't need RAID for ZFS? Reason I'm asking is
> because we are moving to ZFS on FreeBSD for our future projects.
>
> Regards,
> Wei Shan
Re: [PERFORM] New server: SSD/RAID recommendations?
> RAID controllers are completely unnecessary for SSD as they currently
> exist.

Agreed. The best solution is not to buy cheap disks and not to buy RAID controllers now, imho.

In my own situation, I had a tight budget, high performance demands, and a newish machine with a RAID controller and HDDs in it as a starting point. So it was more a question of 'what can you do with a free raid controller and not much money' back in 2013. And it has worked very well. Still, I had hoped for a bit more from the cheaper SSDs; I'd hoped to use fastpath on the controller and bypass the cache.

The way NVMe prices are going though, I wouldn't do it again if I were doing it this year. I'd just go direct to nvme and trash the raid controller. These sammy and intel nvmes are basically enterprise hardware at consumer prices. Heck, I'll probably put one in my next gaming PC.

Re: software raid. I agree, but once you accept that software raid is now pretty much superior to hardware raid, you start looking at ZFS and thinking 'why the heck am I even using software raid?'

G
Re: [PERFORM] New server: SSD/RAID recommendations?
The comment on HDDs is true, and it gave me another thought.

These new 'shingled' HDDs (the 8TB ones) rely on rewriting all the data on tracks that overlap your data, any time you change the data. Result: disks 8-20x slower during writes, after they fill up.

Do they have power loss protection for the data being rewritten during re-shingling? You could have data committed at position X and accidentally nuke data at position Y.

[I know that using a shingled disk sounds crazy (it sounds crazy to me), but you can bet there are people who just want to max out the disk bays in their server...]

Graeme.

On 07 Jul 2015, at 19:28, Michael Nolan wrote:

> On Tue, Jul 7, 2015 at 10:59 AM, Heikki Linnakangas wrote:
>
>> On 07/07/2015 05:15 PM, Wes Vaske (wvaske) wrote:
>>> The M500/M550/M600 are consumer class drives that don't have power
>>> protection for all inflight data [...]
>>
>> Wow, I would be pretty angry if I installed a SSD in my desktop, and it
>> lost a file that I saved just before pulling the power plug.
>
> That can (and does) happen with spinning disks, too.
>
>>> [...] And even though some drives will protect against that, the
>>> protection of inflight data that's been fsync'd is more important and
>>> should disqualify *any* consumer drives from *any* company from
>>> consideration for use with Postgres.
>>
>> So it lies about fsync()... The next question is, does it nevertheless
>> enforce the correct ordering of persisting fsync'd data? If you write to
>> file A and fsync it, then write to another file B and fsync it too, is it
>> guaranteed that if B is persisted, A is as well? Because if it isn't, you
>> can end up with filesystem (or database) corruption anyway.
>>
>> - Heikki
>
> The sad fact is that MANY drives (ssd as well as spinning) lie about their
> fsync status.
>
> Mike Nolan
Re: [PERFORM] New server: SSD/RAID recommendations?
Yikes. I would not be able to sleep tonight if it were not for the BBU cache in front of these disks... diskchecker.pl consistently reported several examples of corruption post-power-loss (usually 10-30) on unprotected M500s/M550s, so I think it's pretty much open to debate what types of madness and corruption you'll find if you look close enough. G On 07 Jul 2015, at 16:59, Heikki Linnakangas hlinn...@iki.fi wrote: So it lies about fsync()... The next question is, does it nevertheless enforce the correct ordering of persisting fsync'd data? If you write to file A and fsync it, then write to another file B and fsync it too, is it guaranteed that if B is persisted, A is as well? Because if it isn't, you can end up with filesystem (or database) corruption anyway. - Heikki -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
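For anyone who hasn't run it, the plug-pull procedure with diskchecker.pl looks roughly like this (hostnames, paths and sizes are placeholders; check the script's own usage text rather than trusting a sketch from memory):

  # on a second machine that stays powered:
  diskchecker.pl -l
  # on the machine under test, keep writing until you pull the plug:
  diskchecker.pl -s otherhost create /mnt/ssd/test_file 500
  # after reboot, see whether every acknowledged write actually survived:
  diskchecker.pl -s otherhost verify /mnt/ssd/test_file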
Re: [PERFORM] New server: SSD/RAID recommendations?
On 07 Jul 2015, at 19:47, Scott Marlowe scott.marl...@gmail.com wrote: [I know that using a shingled disk sounds crazy (it sounds crazy to me) but you can bet there are people that just want to max out the disk bays in their server... ] Let's just say no online backup companies are using those disks. :) I'm not so sure. Literally the most famous online backup company is using them (or was planning to): https://www.backblaze.com/blog/6-tb-hard-drive-face-off/ But I think that a massive read-only archive really is the only use for these things. I hope they go out of fashion, soon. But I was thinking more of the 'small company postgres server' or 'charitable organisation postgres server'. Someone is going to make this mistake, you can bet. Probably not someone on THIS list, of course... The biggest current production spinners in use that I know of are 4TB, non-shingled. I think we may have some 6TB WD reds around here. I'll need to look around. G -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] New server: SSD/RAID recommendations?
Cache flushing isn't an atomic operation though. Even if the ordering is right, you are likely to have a partial fsync on the disk when the lights go out - isn't your FS still corrupt? On 07 Jul 2015, at 21:53, Heikki Linnakangas hlinn...@iki.fi wrote: On 07/07/2015 09:01 PM, Wes Vaske (wvaske) wrote: Right, to be precise, the problem isn't that the drive lies about fsync(). It lies about FLUSH CACHE instead. Search/replace fsync() with FLUSH CACHE, and the same question remains: When the drive breaks its promise wrt. FLUSH CACHE, does it nevertheless guarantee that the order the data is eventually flushed to disk is consistent with the order in which the data and FLUSH CACHE were sent to the drive? That's an important distinction, because it makes the difference between "the most recent data the application saved might be lost even though the FLUSH CACHE command returned" and "your filesystem is corrupt". -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] New server: SSD/RAID recommendations?
That is a very good question, which I have raised elsewhere on the postgresql lists previously. In practice: I have *never* managed to make diskchecker fail with the BBU enabled in front of the drives, and I spent days trying with plug pulls, till I reached the point where as a statistical event it just can't be that likely at all. That's not to say it can't ever happen, just that I've taken all reasonable measures that I can to find out on the time and money budget I had available. In theory: It may be the fact that the BBU makes the drives run at about half speed, so that the capacitors go a good bit further to empty the cache; after all, without the BBU in the way, the drive manages to save everything but the last fragment of writes. But I also suspect that the controller itself may be replaying the last set of writes from around the time of power loss. Anyway I'm 50/50 on those two explanations. Any other thoughts welcome. This raises another interesting question. Does anyone here have a document explaining how their BBU cache works EXACTLY (at cache / sata level) on their server? Because I haven't been able to find any for mine (Dell PERC H710/H710P). Can anyone tell me with godlike authority and precision, what exactly happens inside that BBU post-power failure? There is rather too much magic involved for me to be happy. G On 07 Jul 2015, at 18:27, Vitalii Tymchyshyn v...@tym.im wrote: Hi. How would BBU cache help you if it lies about fsync? I suppose any RAID controller removes data from BBU cache after it was fsynced by the drive. As I know, there is no other magic command for the drive to tell the controller that the data is safe now and can be removed from BBU cache. On Tue, 7 Jul 2015 at 11:59, Graeme B. Bell graeme.b...@nibio.no wrote: Yikes. I would not be able to sleep tonight if it were not for the BBU cache in front of these disks... diskchecker.pl consistently reported several examples of corruption post-power-loss (usually 10-30) on unprotected M500s/M550s, so I think it's pretty much open to debate what types of madness and corruption you'll find if you look close enough. G On 07 Jul 2015, at 16:59, Heikki Linnakangas hlinn...@iki.fi wrote: So it lies about fsync()... The next question is, does it nevertheless enforce the correct ordering of persisting fsync'd data? If you write to file A and fsync it, then write to another file B and fsync it too, is it guaranteed that if B is persisted, A is as well? Because if it isn't, you can end up with filesystem (or database) corruption anyway. - Heikki -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] New server: SSD/RAID recommendations?
This raises another interesting question. Does anyone here have a document explaining how their BBU cache works EXACTLY (at cache / sata level) on their server? Because I haven't been able to find any for mine (Dell PERC H710/H710P). Can anyone tell me with godlike authority and precision, what exactly happens inside that BBU post-power failure? (and if you have that manual - how can you know it's accurate? that the implementation matches the manual and is free of bugs? because my M500s didn't match the packaging and neither did a H710 we bought - Dell had advertised features in some marketing material that were only present on the H710P) And I see UBER (unrecoverable bit error) rates for SSDs and HDDs, but has anyone ever seen them for the flash-based cache on their raid controller? Sleep well, friends. Graeme. On 07 Jul 2015, at 18:54, Graeme B. Bell graeme.b...@nibio.no wrote: That is a very good question, which I have raised elsewhere on the postgresql lists previously. In practice: I have *never* managed to make diskchecker fail with the BBU enabled in front of the drives, and I spent days trying with plug pulls, till I reached the point where as a statistical event it just can't be that likely at all. That's not to say it can't ever happen, just that I've taken all reasonable measures that I can to find out on the time and money budget I had available. In theory: It may be the fact that the BBU makes the drives run at about half speed, so that the capacitors go a good bit further to empty the cache; after all, without the BBU in the way, the drive manages to save everything but the last fragment of writes. But I also suspect that the controller itself may be replaying the last set of writes from around the time of power loss. Anyway I'm 50/50 on those two explanations. Any other thoughts welcome. This raises another interesting question. Does anyone here have a document explaining how their BBU cache works EXACTLY (at cache / sata level) on their server? Because I haven't been able to find any for mine (Dell PERC H710/H710P). Can anyone tell me with godlike authority and precision, what exactly happens inside that BBU post-power failure? There is rather too much magic involved for me to be happy. G On 07 Jul 2015, at 18:27, Vitalii Tymchyshyn v...@tym.im wrote: Hi. How would BBU cache help you if it lies about fsync? I suppose any RAID controller removes data from BBU cache after it was fsynced by the drive. As I know, there is no other magic command for the drive to tell the controller that the data is safe now and can be removed from BBU cache. On Tue, 7 Jul 2015 at 11:59, Graeme B. Bell graeme.b...@nibio.no wrote: Yikes. I would not be able to sleep tonight if it were not for the BBU cache in front of these disks... diskchecker.pl consistently reported several examples of corruption post-power-loss (usually 10-30) on unprotected M500s/M550s, so I think it's pretty much open to debate what types of madness and corruption you'll find if you look close enough. G On 07 Jul 2015, at 16:59, Heikki Linnakangas hlinn...@iki.fi wrote: So it lies about fsync()... The next question is, does it nevertheless enforce the correct ordering of persisting fsync'd data? If you write to file A and fsync it, then write to another file B and fsync it too, is it guaranteed that if B is persisted, A is as well? Because if it isn't, you can end up with filesystem (or database) corruption anyway. 
- Heikki -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Hmmm... why does pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
Hi Merlin, Long story short - thanks for the reply, but you're not measuring anything about the parallelism of code running in a pl/pgsql environment here. You're just measuring whether postgres can parallelise entering that environment and getting back out. Don't get me wrong - it's great that this scales well because it affects situations where you have lots of calls to trivial functions. However it's not the problem I'm talking about. I mean 'real' pl/pgsql functions, e.g. things that you might find in postgis or similar. If you re-read my previous email or look at par_psql (http://parpsql.com) and look at the benchmarks there, you'll maybe see more about what I'm talking about. To clear up the issue I built a little test harness around your comment below. If anyone was wondering whether it's par_psql itself that causes the bad scaling in postgres, the answer is clearly no. :-) What I found this evening is that there are several problems here. I did some testing using a machine with 16 physical cores and lots of memory/IO. - Using a table as a source of input, rather than a fixed parameter (e.g. 'select col1... ' vs. 'select 3'), hurts scaling. Please note I am not talking about poor performance, I am talking about poor scaling of performance to multicore. There should be no reason for this when only read-locks are being taken on the table, and no reason for this when it is combined with e.g. a bunch of pl/pgsql work in a function. However the impact of this problem is only seen above 8 cores, where performance crashes. - Using pl/pgsql itself intensively (e.g. anything non-trivial) causes horrifically bad scaling above 2 cores on the systems I've tested, and performance crashes very hard soon after. This matches what I've seen elsewhere in big projects and in par_psql's tests. Of course, it could be some wacky postgresql.conf setting (I doubt it here), so I'd be glad if others could give it a try. If you're bored, set the time to 5s and run it; from testing I can tell you it shouldn't alter the results. The repo will be up in around 30 minutes time on http://github.com/gbb/t, and I'm going to submit it as a bug to the pg bugs list. Graeme. On 06 Jul 2015, at 18:40, Merlin Moncure mmonc...@gmail.com wrote: On Fri, Jul 3, 2015 at 9:48 AM, Graeme B. Bell graeme.b...@nibio.no wrote: Hi everyone, I've written a new open source tool for easily parallelising SQL scripts in postgres. [obligatory plug: https://github.com/gbb/par_psql ] Using it, I'm seeing a problem that I've also seen in other postgres projects involving high degrees of parallelisation in the last 12 months. Basically: - I have machines here with up to 16 CPU cores and 128GB memory, very fast SSDs and controller etc, carefully configured kernel/postgresql.conf for high performance. - Ordinary queries parallelise nearly perfectly (e.g. SELECT some_stuff ...), e.g. almost up to 16x performance improvement. - Non-DB stuff like GDAL, python etc. parallelise nearly perfectly. - HOWEVER calls to CPU-intensive user-defined pl/pgsql functions (e.g. SELECT myfunction(some_stuff)) do not parallelise well, even when they are independently defined functions, or accessing tables in a read-only way. They hit a limit of 2.5x performance improvement relative to single-CPU performance (pg9.4) and merely 2x performance (pg9.3) regardless of how many CPU cores I throw at them. This is about 6 times slower than I'm expecting. I can't see what would be locking. It seems like it's the pl/pgsql environment itself that is somehow locking or incurring some huge frictional costs. 
Whether I use independently defined functions, independent source tables, independent output tables, makes no difference whatsoever, so it doesn't feel 'lock-related'. It also doesn't seem to be WAL/synchronisation related, as the machines I'm using can hit absurdly high pgbench rates, and I'm using unlogged tables for output. Take a quick peek here: https://github.com/gbb/par_psql/blob/master/BENCHMARKS.md I'm wondering what I'm missing here. Any ideas? I'm not necessarily seeing your results. via pgbench, mmoncure@mernix2 11:34 AM ~$ ~/pgdev/bin/pgbench -n -T 60 -f b.sql transaction type: Custom query scaling factor: 1 query mode: simple number of clients: 1 number of threads: 1 duration: 60 s number of transactions actually processed: 658833 latency average: 0.091 ms tps = 10980.538470 (including connections establishing) tps = 10980.994547 (excluding connections establishing) mmoncure@mernix2 11:35 AM ~$ ~/pgdev/bin/pgbench -n -T 60 -c4 -j4 -f b.sql transaction type: Custom query scaling factor: 1 query mode: simple number of clients: 4 number of threads: 4 duration: 60 s number of transactions actually processed: 2847631 latency average: 0.084 ms tps = 47460.430447 (including connections establishing) tps = 47463.702074 (excluding connections establishing) b.sql: select f
Re: [PERFORM] New server: SSD/RAID recommendations?
1. Does the Samsung NVMe have *complete* power loss protection though, for all fsync'd data? I am very badly burned by my experiences with Crucial SSDs and their 'power loss protection' which doesn't actually ensure all fsync'd data gets into flash. It certainly looks pretty with all those capacitors on top in the photos, but we need some plug pull tests to be sure. 2. Apologies for the typo in the previous post, raidz5 should have been raidz1. 3. Also, something to think about when you start having single disk solutions (or non-ZFS raid, for that matter). SSDs are so unlike HDDs. The samsung nvme has a UBER (uncorrectable bit error rate) measured at 1 in 10^17. That's one bit gone bad in 12500 TB, a good number. Chances are the drive fails before you hit a bit error, and if not, ZFS would catch it. Whereas current HDDs are at the 1 in 10^14 level. That means an error every 12TB, by the specs. That means, every time you fill your cheap 6-8TB seagate drive, it has likely corrupted some of your data *even if it performed according to the spec*. (That's also why RAID5 isn't viable for rebuilding large arrays, incidentally). Graeme Bell On 07 Jul 2015, at 12:56, Mkrtchyan, Tigran tigran.mkrtch...@desy.de wrote: - Original Message - From: Graeme B. Bell graeme.b...@nibio.no To: Mkrtchyan, Tigran tigran.mkrtch...@desy.de Cc: Graeme B. Bell graeme.b...@nibio.no, Steve Crawford scrawf...@pinpointresearch.com, Wes Vaske (wvaske) wva...@micron.com, pgsql-performance pgsql-performance@postgresql.org Sent: Tuesday, July 7, 2015 12:38:10 PM Subject: Re: [PERFORM] New server: SSD/RAID recommendations? I am unsure about the performance side, but ZFS is generally very attractive to me. Key advantages: 1) Checksumming and automatic fixing-of-broken-things on every file (not just postgres pages, but your scripts, O/S, program files). 2) Built-in lightweight compression (doesn't help with TOAST tables, in fact may slow them down, but helpful for other things). This may actually be a net negative for pg so maybe turn it off. 3) ZRAID mirroring or ZRAID5/6. If you have trouble persuading someone that it's safe to replace a RAID array with a single drive... you can use a couple of NVMe SSDs with ZFS mirror or zraid, and get the same availability you'd get from a RAID controller. Slightly better, arguably, since they claim to have fixed the raid write-hole problem. 4) Filesystem snapshotting. Despite the costs of checksumming etc., I suspect ZRAID running on a fast CPU with multiple NVMe drives will outperform quite a lot of the alternatives, with great data integrity guarantees. We are planning to have a test setup as well. For now I have a single NVMe SSD on my test system: # lspci | grep NVM 85:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller 171X (rev 03) # mount | grep nvm /dev/nvme0n1p1 on /var/lib/pgsql/9.5 type ext4 (rw,noatime,nodiratime,data=ordered) and quite happy with it. We have a write heavy workload on it to see when it will break. Postgres performs very well. About x2.5 faster than with regular disks with a single client and almost linear with multiple clients (picture attached. On Y, number of high level op/s our application does; on X, number of clients). The setup has been in use for the last 3 months. Looks promising but for production we need to have disk size twice as big as on the test system. Until today, I was planning to use a RAID10 with a HW controller... Related to ZFS. We use ZFSonlinux and behaviour is not as good as with solaris. 
Let's re-phrase it: performance is unpredictable. We run RAIDZ2 with 30x3TB disks. Tigran. Haven't built one yet. Hope to, later this year. Steve, I would love to know more about how you're getting on with your NVMe disk in postgres! Graeme. On 07 Jul 2015, at 12:28, Mkrtchyan, Tigran tigran.mkrtch...@desy.de wrote: Thanks for the Info. So if RAID controllers are not an option, what should one use to build big databases? LVM with xfs? BtrFs? Zfs? Tigran. - Original Message - From: Graeme B. Bell graeme.b...@nibio.no To: Steve Crawford scrawf...@pinpointresearch.com Cc: Wes Vaske (wvaske) wva...@micron.com, pgsql-performance pgsql-performance@postgresql.org Sent: Tuesday, July 7, 2015 12:22:00 PM Subject: Re: [PERFORM] New server: SSD/RAID recommendations? Completely agree with Steve. 1. Intel NVMe looks like the best bet if you have modern enough hardware for NVMe. Otherwise e.g. S3700 mentioned elsewhere. 2. RAID controllers. We have e.g. 10-12 of these here and e.g. 25-30 SSDs, among various machines. This might give people an idea about where the risk lies in the path from disk to CPU. We've had 2 RAID card failures in the last 12 months that nuked the array with days of downtime, and 2 problems with batteries suddenly becoming useless or suddenly reporting wildly varying
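The UBER arithmetic above is easy to check; assuming the spec sheets mean one unrecoverable error per 10^17 (NVMe) and per 10^14 (consumer HDD) bits read:

  echo '10^17 / 8 / 10^12' | bc             # -> 12500 TB read per expected bad bit (NVMe)
  echo 'scale=1; 10^14 / 8 / 10^12' | bc    # -> 12.5 TB read per expected bad bit (HDD)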
Re: [PERFORM] New server: SSD/RAID recommendations?
Hi Karl, Great post, thanks. Though I don't think it's against conventional wisdom to aggregate writes into larger blocks rather than rely on 4k performance on ssds :-) 128kb blocks + compression certainly makes sense. But it might make less sense I suppose if you had some incredibly high rate of churn in your rows. But for the work we do here, we could use 16MB blocks for all the difference it would make. (Tip to others: don't do that. 128kb blocks are already enough to max out the IO bus to most ssds) Do you have your WAL log on a compressed zfs fs? Graeme Bell On 07 Jul 2015, at 13:28, Karl Denninger k...@denninger.net wrote: Lz4 compression and standard 128kb block size have shown to be materially faster here than using 8kb blocks and no compression, both with rotating disks and SSDs. This is workload dependent in my experience but in the applications we put Postgres to there is a very material improvement in throughput using compression and the larger blocksize, which is counter-intuitive and also opposite the conventional wisdom. For best throughput we use mirrored vdev sets. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
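If you want to see what the large-recordsize-plus-lz4 combination is actually buying on a given system, ZFS reports it directly (dataset names here are placeholders):

  zfs get recordsize,compression,compressratio tank/pgdata tank/pgwal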
Re: [PERFORM] New server: SSD/RAID recommendations?
I am unsure about the performance side but, ZFS is generally very attractive to me. Key advantages: 1) Checksumming and automatic fixing-of-broken-things on every file (not just postgres pages, but your scripts, O/S, program files). 2) Built-in lightweight compression (doesn't help with TOAST tables, in fact may slow them down, but helpful for other things). This may actually be a net negative for pg so maybe turn it off. 3) ZRAID mirroring or ZRAID5/6. If you have trouble persuading someone that it's safe to replace a RAID array with a single drive... you can use a couple of NVMe SSDs with ZFS mirror or zraid, and get the same availability you'd get from a RAID controller. Slightly better, arguably, since they claim to have fixed the raid write-hole problem. 4) filesystem snapshotting Despite the costs of checksumming etc., I suspect ZRAID running on a fast CPU with multiple NVMe drives will outperform quite a lot of the alternatives, with great data integrity guarantees. Haven't built one yet. Hope to, later this year. Steve, I would love to know more about how you're getting on with your NVMe disk in postgres! Graeme. On 07 Jul 2015, at 12:28, Mkrtchyan, Tigran tigran.mkrtch...@desy.de wrote: Thanks for the Info. So if RAID controllers are not an option, what one should use to build big databases? LVM with xfs? BtrFs? Zfs? Tigran. - Original Message - From: Graeme B. Bell graeme.b...@nibio.no To: Steve Crawford scrawf...@pinpointresearch.com Cc: Wes Vaske (wvaske) wva...@micron.com, pgsql-performance pgsql-performance@postgresql.org Sent: Tuesday, July 7, 2015 12:22:00 PM Subject: Re: [PERFORM] New server: SSD/RAID recommendations? Completely agree with Steve. 1. Intel NVMe looks like the best bet if you have modern enough hardware for NVMe. Otherwise e.g. S3700 mentioned elsewhere. 2. RAID controllers. We have e.g. 10-12 of these here and e.g. 25-30 SSDs, among various machines. This might give people idea about where the risk lies in the path from disk to CPU. We've had 2 RAID card failures in the last 12 months that nuked the array with days of downtime, and 2 problems with batteries suddenly becoming useless or suddenly reporting wildly varying temperatures/overheating. There may have been other RAID problems I don't know about. Our IT dept were replacing Seagate HDDs last year at a rate of 2-3 per week (I guess they have 100-200 disks?). We also have about 25-30 Hitachi/HGST HDDs. So by my estimates: 30% annual problem rate with RAID controllers 30-50% failure rate with Seagate HDDs (backblaze saw similar results) 0% failure rate with HGST HDDs. 0% failure in our SSDs. (to be fair, our one samsung SSD apparently has a bug in TRIM under linux, which I'll need to investigate to see if we have been affected by). also, RAID controllers aren't free - not just the money but also the management of them (ever tried writing a complex install script that interacts work with MegaCLI? It can be done but it's not much fun.). Just take a look at the MegaCLI manual and ask yourself... is this even worth it (if you have a good MTBF on an enterprise SSD). RAID was meant to be about ensuring availability of data. I have trouble believing that these days Graeme Bell On 06 Jul 2015, at 18:56, Steve Crawford scrawf...@pinpointresearch.com wrote: 2. We don't typically have redundant electronic components in our servers. Sure, we have dual power supplies and dual NICs (though generally to handle external failures) and ECC-RAM but no hot-backup CPU or redundant RAM banks and...no backup RAID card. 
Intel Enterprise SSD already have power-fail protection so I don't need a RAID card to give me BBU. Given the MTBF of good enterprise SSD I'm left to wonder if placing a RAID card in front merely adds a new point of failure and scheduled-downtime-inducing hands-on maintenance (I'm looking at you, RAID backup battery). -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] New server: SSD/RAID recommendations?
Completely agree with Steve. 1. Intel NVMe looks like the best bet if you have modern enough hardware for NVMe. Otherwise e.g. S3700 mentioned elsewhere. 2. RAID controllers. We have e.g. 10-12 of these here and e.g. 25-30 SSDs, among various machines. This might give people idea about where the risk lies in the path from disk to CPU. We've had 2 RAID card failures in the last 12 months that nuked the array with days of downtime, and 2 problems with batteries suddenly becoming useless or suddenly reporting wildly varying temperatures/overheating. There may have been other RAID problems I don't know about. Our IT dept were replacing Seagate HDDs last year at a rate of 2-3 per week (I guess they have 100-200 disks?). We also have about 25-30 Hitachi/HGST HDDs. So by my estimates: 30% annual problem rate with RAID controllers 30-50% failure rate with Seagate HDDs (backblaze saw similar results) 0% failure rate with HGST HDDs. 0% failure in our SSDs. (to be fair, our one samsung SSD apparently has a bug in TRIM under linux, which I'll need to investigate to see if we have been affected by). also, RAID controllers aren't free - not just the money but also the management of them (ever tried writing a complex install script that interacts work with MegaCLI? It can be done but it's not much fun.). Just take a look at the MegaCLI manual and ask yourself... is this even worth it (if you have a good MTBF on an enterprise SSD). RAID was meant to be about ensuring availability of data. I have trouble believing that these days Graeme Bell On 06 Jul 2015, at 18:56, Steve Crawford scrawf...@pinpointresearch.com wrote: 2. We don't typically have redundant electronic components in our servers. Sure, we have dual power supplies and dual NICs (though generally to handle external failures) and ECC-RAM but no hot-backup CPU or redundant RAM banks and...no backup RAID card. Intel Enterprise SSD already have power-fail protection so I don't need a RAID card to give me BBU. Given the MTBF of good enterprise SSD I'm left to wonder if placing a RAID card in front merely adds a new point of failure and scheduled-downtime-inducing hands-on maintenance (I'm looking at you, RAID backup battery). -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
Hi everyone, I've written a new open source tool for easily parallelising SQL scripts in postgres. [obligatory plug: https://github.com/gbb/par_psql ] Using it, I'm seeing a problem I've seen in other postgres projects involving parallelisation in the last 12 months. Basically: - I have machines here with up to 16 CPUs and 128GB memory, very fast SSDs and controller etc, carefully configured kernel/postgresql.conf for high performance. - Ordinary queries parallelise nearly perfectly (e.g. SELECT some_stuff ...), e.g. almost up to 16x performance improvement. - Calls to CPU-intensive user-defined pl/pgsql functions (e.g. SELECT myfunction(some_stuff)) do not parallelise well, even when they are independent or accessing tables in a read-only way. They hit a limit at 2.5x performance improvement relative to single-CPU performance (pg9.4) and 2x performance (pg9.3). This is about 6 times slower than I'm expecting. - Can't see what would be locking. It seems like it's the pl/pgsql environment itself that is somehow locking or incurring some huge frictional costs. Whether I use independently defined functions, independent source tables, independent output tables, makes no difference whatsoever, so it doesn't feel 'locky'. It also doesn't seem to be WAL/synchronisation related, as the machines I'm using can hit absurdly high pgbench rates, and I'm using unlogged tables. Curious? Take a quick peek here: https://github.com/gbb/par_psql/blob/master/BENCHMARKS.md Wondering what I'm missing here. Any ideas? Graeme. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
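If anyone wants to reproduce the shape of this result on their own hardware, here is a minimal sketch: a deliberately CPU-bound pl/pgsql function hammered by pgbench (the function and parameters are illustrative stand-ins, not the actual par_psql benchmark):

  psql <<'EOF'
  CREATE OR REPLACE FUNCTION burn_cpu(n int) RETURNS bigint AS $$
  DECLARE s bigint := 0; i int;
  BEGIN
    FOR i IN 1..n LOOP s := s + i; END LOOP;  -- pure pl/pgsql work, no I/O
    RETURN s;
  END $$ LANGUAGE plpgsql;
  EOF
  echo 'SELECT burn_cpu(200000);' > burn.sql
  # if pl/pgsql scaled perfectly, the second run would show ~16x the TPS of the first
  pgbench -n -T 60 -c 1 -j 1 -f burn.sql
  pgbench -n -T 60 -c 16 -j 16 -f burn.sql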
Re: [PERFORM] New server: SSD/RAID recommendations?
Thanks, this is very useful to know about the 730. When you say 'tested it with plug-pulls', you were using diskchecker.pl, right? Graeme. On 07 Jul 2015, at 14:39, Karl Denninger k...@denninger.net wrote: Incidentally while there are people who have questioned the 730 series power loss protection I've tested it with plug-pulls and in addition it watchdogs its internal power loss capacitors -- from the smartctl -a display of one of them on an in-service machine here: 175 Power_Loss_Cap_Test 0x0033 100 100 010 Pre-fail Always - 643 (4 6868) -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] New server: SSD/RAID recommendations?
As I have warned elsewhere, The M500/M550 from $SOME_COMPANY is NOT SUITABLE for postgres unless you have a RAID controller with BBU to protect yourself. The M500/M550 are NOT plug-pull safe despite the 'power loss protection' claimed on the packaging. Not all fsync'd data is preserved in the event of a power loss, which completely undermines postgres's sanity. I would be extremely skeptical about the M500DC given the name and manufacturer. I went to quite a lot of trouble to provide $SOME_COMPANY's engineers with the full details of this fault after extensive testing (we have e.g. 20-25 of these disks) on multiple machines and controllers, at their request. Result: they stopped replying to me, and soon after I saw their PR reps talking about how 'power loss protection isn't about protecting all data during a power loss'. The only safe way to use an M500/M550 with postgres is: a) disable the disk cache, which will cripple performance to about 3-5% of normal. b) use a battery backed or cap-backed RAID controller, which will generally hurt performance, by limiting you to the peak performance of the flash on the raid controller. If you are buying such a drive, I strongly recommend buying only one and doing extensive plug pull testing before committing to several. For myself, my time is valuable enough that it will be cheaper to buy intel in future. Graeme. On 07 Jul 2015, at 15:12, Merlin Moncure mmonc...@gmail.com wrote: On Thu, Jul 2, 2015 at 1:00 PM, Wes Vaske (wvaske) wva...@micron.com wrote: Storage Review has a pretty good process and reviewed the M500DC when it released last year. http://www.storagereview.com/micron_m500dc_enterprise_ssd_review The only database-specific info we have available are for Cassandra and MSSQL: http://www.micron.com/~/media/documents/products/technical-marketing-brief/cassandra_and_m500dc_enterprise_ssd_tech_brief.pdf http://www.micron.com/~/media/documents/products/technical-marketing-brief/sql_server_2014_and_m500dc_raid_configuration_tech_brief.pdf (some of that info might be relevant) In terms of endurance, the M500DC is rated to 2 Drive Writes Per Day (DWPD) for 5-years. For comparison: Micron M500DC (20nm) – 2 DWPD Intel S3500 (20nm) – 0.3 DWPD Intel S3510 (16nm) – 0.3 DWPD Intel S3710 (20nm) – 10 DWPD They’re all great drives, the question is how write-intensive is the workload. Intel added a new product, the 3610, that is rated for 3 DWPD. Pricing looks to be around 1.20$/GB. merlin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
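For reference, option (a) is usually done with hdparm on SATA drives (device name is a placeholder; expect exactly the write-performance collapse described above):

  hdparm -W 0 /dev/sdb   # turn off the drive's volatile write cache
  hdparm -W /dev/sdb     # confirm: should report 'write-caching = 0 (off)'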
[PERFORM] Hmmm... why does pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
Hi everyone, I've written a new open source tool for easily parallelising SQL scripts in postgres. [obligatory plug: https://github.com/gbb/par_psql ] Using it, I'm seeing a problem that I've also seen in other postgres projects involving high degrees of parallelisation in the last 12 months. Basically: - I have machines here with up to 16 CPU cores and 128GB memory, very fast SSDs and controller etc, carefully configured kernel/postgresql.conf for high performance. - Ordinary queries parallelise nearly perfectly (e.g. SELECT some_stuff ...), e.g. almost up to 16x performance improvement. - Non-DB stuff like GDAL, python etc. parallelise nearly perfectly. - HOWEVER calls to CPU-intensive user-defined pl/pgsql functions (e.g. SELECT myfunction(some_stuff)) do not parallelise well, even when they are independently defined functions, or accessing tables in a read-only way. They hit a limit of 2.5x performance improvement relative to single-CPU performance (pg9.4) and merely 2x performance (pg9.3) regardless of how many CPU cores I throw at them. This is about 6 times slower than I'm expecting. I can't see what would be locking. It seems like it's the pl/pgsql environment itself that is somehow locking or incurring some huge frictional costs. Whether I use independently defined functions, independent source tables, independent output tables, makes no difference whatsoever, so it doesn't feel 'lock-related'. It also doesn't seem to be WAL/synchronisation related, as the machines I'm using can hit absurdly high pgbench rates, and I'm using unlogged tables for output. Take a quick peek here: https://github.com/gbb/par_psql/blob/master/BENCHMARKS.md I'm wondering what I'm missing here. Any ideas? Graeme. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Need more IOPS? This should get you drooling... (5xnvme drives)
I previously mentioned on the list that nvme drives are going to be a very big thing this year for DB performance. This video shows what happens if you get an 'enthusiast'-class motherboard and 5 of the 400GB intel 750 drives. https://www.youtube.com/watch?v=-hE8Vg1qPSw Total transfer speed: 10.3 GB/second. Total IOPS: 2 million (!) + nice power loss protection (Intel) + lower latency too - about 20µs vs 100µs for SATA3 (http://www.anandtech.com/show/7843/testing-sata-express-with-asus/4) + substantially lower CPU use per I/O (http://www.anandtech.com/show/8104/intel-ssd-dc-p3700-review-the-pcie-ssd-transition-begins-with-nvme/5) You're probably wondering 'how much' though? $400 per drive! Peanuts. Assuming for the moment you're working in RAID0 or with tablespaces, and just want raw speed: $2400 total for 2 TB of storage, including a good quality motherboard, with 2 million capacitor-backed IOPS and 10GB/second bulk transfers. These drives are going to utterly wreck the profit margins on high-end DB hardware. Graeme Bell p.s. No, I don't have shares in Intel, but maybe I should... -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
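To make the 'RAID0 or tablespaces' assumption concrete, two hedged sketches (device paths, mount points and tablespace names are placeholders):

  # stripe all five drives into one block device with md:
  mdadm --create /dev/md0 --level=0 --raid-devices=5 \
        /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1
  # or keep them as separate filesystems and spread data across per-drive tablespaces:
  psql -c "CREATE TABLESPACE nvme0 LOCATION '/mnt/nvme0/pgts';"  # directory must exist, owned by postgres

Remember RAID0 multiplies the failure rate by five; it fits the 'just want raw speed' caveat above, nothing more.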
Re: [PERFORM] Need more IOPS? This should get you drooling... (5xnvme drives)
Images/data here: http://www.pcper.com/reviews/Storage/Five-Intel-SSD-750s-Tested-Two-Million-IOPS-and-10-GBsec-Achievement-Unlocked On 04 Jun 2015, at 13:07, Graeme Bell g...@skogoglandskap.no wrote: I previously mentioned on the list that nvme drives are going to be a very big thing this year for DB performance. This video shows what happens if you get an 'enthusiast'-class motherboard and 5 of the 400GB intel 750 drives. https://www.youtube.com/watch?v=-hE8Vg1qPSw Total transfer speed: 10.3 GB/second. Total IOPS: 2 million (!) + nice power loss protection (Intel) + lower latency too - about 20µs vs 100µs for SATA3 (http://www.anandtech.com/show/7843/testing-sata-express-with-asus/4) + substantially lower CPU use per I/O (http://www.anandtech.com/show/8104/intel-ssd-dc-p3700-review-the-pcie-ssd-transition-begins-with-nvme/5) You're probably wondering 'how much' though? $400 per drive! Peanuts. Assuming for the moment you're working in RAID0 or with tablespaces, and just want raw speed: $2400 total for 2 TB of storage, including a good quality motherboard, with 2 million capacitor-backed IOPS and 10GB/second bulk transfers. These drives are going to utterly wreck the profit margins on high-end DB hardware. Graeme Bell p.s. No, I don't have shares in Intel, but maybe I should... -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Need more IOPS? This should get you drooling... (5xnvme drives)
Note also - these disks are close to the performance of memory from a few generations ago (e.g. 10GB/second bulk transfers). They also have bigger/faster versions of the drives, 1.2TB each. I suspect that 5 of those would feel somewhat similar to having 6TB of memory in your db server ... :-) [better in fact, since writes are fast too] Graeme. On 04 Jun 2015, at 13:29, Dorian Hoxha dorian.ho...@gmail.com wrote: This looks great when you want in-memory (something like unlogged tables) and you also want replication. (meaning, I don't know of an alternative to get replication with unlogged than to just get faster drives + logged tables?) On Thu, Jun 4, 2015 at 1:23 PM, Graeme B. Bell g...@skogoglandskap.no wrote: Images/data here: http://www.pcper.com/reviews/Storage/Five-Intel-SSD-750s-Tested-Two-Million-IOPS-and-10-GBsec-Achievement-Unlocked On 04 Jun 2015, at 13:07, Graeme Bell g...@skogoglandskap.no wrote: I previously mentioned on the list that nvme drives are going to be a very big thing this year for DB performance. This video shows what happens if you get an 'enthusiast'-class motherboard and 5 of the 400GB intel 750 drives. https://www.youtube.com/watch?v=-hE8Vg1qPSw Total transfer speed: 10.3 GB/second. Total IOPS: 2 million (!) + nice power loss protection (Intel) + lower latency too - about 20µs vs 100µs for SATA3 (http://www.anandtech.com/show/7843/testing-sata-express-with-asus/4) + substantially lower CPU use per I/O (http://www.anandtech.com/show/8104/intel-ssd-dc-p3700-review-the-pcie-ssd-transition-begins-with-nvme/5) You're probably wondering 'how much' though? $400 per drive! Peanuts. Assuming for the moment you're working in RAID0 or with tablespaces, and just want raw speed: $2400 total for 2 TB of storage, including a good quality motherboard, with 2 million capacitor-backed IOPS and 10GB/second bulk transfers. These drives are going to utterly wreck the profit margins on high-end DB hardware. Graeme Bell p.s. No, I don't have shares in Intel, but maybe I should... -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Postgres is using 100% CPU
I believe yes / 0 are the default settings for synchronous commit and commit_delay. ** (Interestingly the manual pages do not specify.) ** Sorry - I've just spotted the settings in the text of the manual. The starred statement is incorrect: the manual pages do specify the defaults, and they are indeed on / 0. (http://www.postgresql.org/docs/9.4/static/runtime-config-wal.html) Graeme. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Postgres is using 100% CPU
On Sun, May 31, 2015 at 7:53 PM, Yves Dorfsman y...@zioup.com wrote: That's the thing, even on an old laptop with a slow IDE disk, 273 individual inserts should not take more than a second. I think that would depend on settings such as synchronous_commit, commit_delay, or whether 2-phase commit is being used. If synchronous commit is enabled and commit_delay was not used (e.g. 0), and you have a client synchronously making individual inserts to the DB (1 transaction each), then surely you have delays due to waiting for each transaction to commit synchronously to WAL on disk? I believe yes / 0 are the default settings for synchronous commit and commit_delay. (Interestingly the manual pages do not specify.) Assuming a 5400RPM laptop drive (which is a typical drive - some laptop drives run 5000RPM), and assuming you are writing a sequential log to disk (with very short gaps between entries being added, e.g. no seek time, only rotational latency) will mean 5400 transactions per minute, 1 write per rotation. That's a maximum 90 transactions per second synchronised to WAL. It would take just over 3 seconds. Ashik, try altering your postgresql.conf to say 'commit_delay=100' or 'synchronous_commit=off'. Let us know if that fixes the problem. Read up on the options before you change them. Graeme Bell -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
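Two hedged ways out of the one-commit-per-rotation trap, matching the suggestions above (table name and values are illustrative):

  # 1. batch the inserts into a single transaction: one WAL flush instead of 273
  psql <<'EOF'
  BEGIN;
  INSERT INTO t VALUES (1);
  -- ...the remaining inserts go here...
  COMMIT;
  EOF
  # 2. or relax durability in postgresql.conf (read the docs on what you give up first):
  #    synchronous_commit = off
  #    commit_delay = 100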
Re: [PERFORM] Some performance testing?
Josh, there seems to be an inconsistency in your blog. You say 3.10.X is safe, but the graph you show with the poor performance seems to be from 3.13.X which as I understand it is a later kernel. Can you clarify which 3.X kernels are good to use and which are not? Sorry to cut in - So far we've found kernel 3.18 to be excellent for postgres 9.3 performance (pgbench + our own queries run much faster than with the 2.6.32-504 centos 6 kernel, and we haven't encountered random stalls or slowness). We use elrepo to get prebuilt rpms of the latest mainline stable kernel (kernel-ml). http://elrepo.org/tiki/kernel-ml Graeme Bell -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
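On CentOS 6 the elrepo route is roughly as follows (exact release rpm names change over time; follow the current instructions on elrepo.org):

  # after installing the elrepo release rpm for your distro:
  yum --enablerepo=elrepo-kernel install kernel-ml
  # set the new kernel as the default boot entry in /boot/grub/grub.conf, then reboot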
Re: [PERFORM] Some performance testing?
ext4 settings: nobarrier, noatime+nodiratime, stripe/stride aligned correctly with the RAID10 layout. Some other useful things to know -- H710P: readahead disabled, writeback cache enabled, direct IO enabled -- OS/filesystem: linux readahead enabled (16384), nr_requests=975, NOOP scheduler, non-NUMA -- pg: io_concurrency on, async commit (*** see below!). All settings were kept identical on the server before and after the kernel change, so this performance increase can be entirely attributed to the newer kernel and its synergies with our configuration. 3.18 contains about 5-10 years of linux kernel development vs. 2.6 kernels (except where backported). I have conducted quite a lot of plug-pull testing with diskchecker.pl, and rather a lot of testing of scheduling/IO/RAID controller/etc parameters. The OS/RAID controller/file system settings are as fast as I've been able to achieve without compromising database integrity (please note: this server can run async_commit because of the work we use it for, but we do not use that setting on our other main production servers). Our local DBs run extremely nicely for all our normal queries, which involve quite a mix of random small IO and full-table operations on e.g. 20GB+ tables, so they're not optimised for pgbench specifically. Graeme Bell On 09 Apr 2015, at 13:56, Przemysław Deć przemyslaw@linuxpolska.pl wrote: Wow, that's a huge performance gain. And it was on ext4? -- Przemysław Deć - Senior Solutions Architect, RHCSA, RHCJA, PostgreSQL Professional Certification, Linux Polska Sp. z o.o. (www.linuxpolska.pl) 2015-04-09 13:01 GMT+02:00 Graeme B. Bell g...@skogoglandskap.no: From a measurement I took back when we did the upgrade: performance with 2.6: (pgbench, size 100, 32 clients) 48 651 transactions per second (read only) 6 504 transactions per second (read-write) performance with 3.18 (pgbench, size 100, 32 clients) 129 303 transactions per second (read only) 16 895 transactions (read-write) So that looks like 2.6x improvement to reads and writes. That was an 8 core xeon server with H710P and 4x crucial M550 SSDs in RAID, pg9.3. Graeme Bell On 09 Apr 2015, at 12:39, Przemysław Deć przemyslaw@linuxpolska.pl wrote: Can you say how much faster it was? Przemek Deć 2015-04-09 11:04 GMT+02:00 Graeme B. Bell g...@skogoglandskap.no: Josh, there seems to be an inconsistency in your blog. You say 3.10.X is safe, but the graph you show with the poor performance seems to be from 3.13.X which as I understand it is a later kernel. Can you clarify which 3.X kernels are good to use and which are not? Sorry to cut in - So far we've found kernel 3.18 to be excellent for postgres 9.3 performance (pgbench + our own queries run much faster than with the 2.6.32-504 centos 6 kernel, and we haven't encountered random stalls or slowness). We use elrepo to get prebuilt rpms of the latest mainline stable kernel (kernel-ml). 
http://elrepo.org/tiki/kernel-ml Graeme Bell -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
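For reference, the OS-level settings in the list above map onto commands like these (device name and mount point are placeholders; the values are the ones quoted above):

  echo noop > /sys/block/sda/queue/scheduler
  echo 975 > /sys/block/sda/queue/nr_requests
  blockdev --setra 16384 /dev/sda
  mount -o remount,noatime,nodiratime,nobarrier /data   # ext4; nobarrier only with trustworthy cache protection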
Re: [PERFORM] Some performance testing?
From a measurement I took back when we did the upgrade: performance with 2.6: (pgbench, size 100, 32 clients) 48 651 transactions per second (read only) 6 504 transactions per second (read-write) performance with 3.18 (pgbench, size 100, 32 clients) 129 303 transactions per second (read only) 16 895 transactions (read-write) So that looks like 2.6x improvement to reads and writes. That was an 8 core xeon server with H710P and 4x crucial M550 SSDs in RAID, pg9.3. Graeme Bell On 09 Apr 2015, at 12:39, Przemysław Deć przemyslaw@linuxpolska.pl wrote: Can you say how much faster it was? Przemek Deć 2015-04-09 11:04 GMT+02:00 Graeme B. Bell g...@skogoglandskap.no: Josh, there seems to be an inconsistency in your blog. You say 3.10.X is safe, but the graph you show with the poor performance seems to be from 3.13.X which as I understand it is a later kernel. Can you clarify which 3.X kernels are good to use and which are not? Sorry to cut in - So far we've found kernel 3.18 to be excellent for postgres 9.3 performance (pgbench + our own queries run much faster than with the 2.6.32-504 centos 6 kernel, and we haven't encountered random stalls or slowness). We use elrepo to get prebuilt rpms of the latest mainline stable kernel (kernel-ml). http://elrepo.org/tiki/kernel-ml Graeme Bell -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] NVMe or AHCI PCI-express? A comment for people benchmarking...
A tangent to the performance testing thread here, but an important issue that you will see come up in your work this year or next. PCIe SSD may include AHCI PCI SSD or NVMe PCI SSD. AHCI = old style, basically it's faster than SATA3 but quite similar in terms of how the operating system sees the flash device. NVMe = new style, requires a very new motherboard, operating system drivers, but extremely fast and low latency, very high IOPS. For example, Macbooks have PCIe SSDs in them, but not NVMe (currently). The difference is very important since NVMe offers multiples of performance in terms of everything we love: lower latency, higher IOPS, lower CPU overhead and higher throughput. http://www.anandtech.com/show/7843/testing-sata-express-with-asus/4 scroll down to the App to SSD IO Read Latency graph. Look at the two bottom lines. So I'd suggest it's probably worth noting in any benchmark if you are using NVMe and if so which driver version, since development is ongoing. On the topic of PCIe NVMe SSDs, some interesting reading: - http://www.tweaktown.com/reviews/6773/samsung-xs1715-1-6tb-2-5-inch-nvme-pcie-enterprise-ssd-review/index.html it can deliver 750,000 random read IOPS and 115,000 write IOPS - or any of these nice toys... http://imagescdn.tweaktown.com/content/6/7/6773_11777_samsung_xs1715_1_6tb_2_5_inch_nvme_pcie_enterprise_ssd_review.png all with capacitor backing (which you should plug-pull test, of course). Graeme. I currently have access to a matched pair of 20-core, 128GB RAM servers with SSD-PCI storage, for about 2 weeks before they go into production. Are there any performance tests people would like to see me run on these? Otherwise, I'll just do some pgbench and DVDStore. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
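A quick way to tell which kind of PCIe SSD a box actually has (output details vary by lspci version):

  lspci -k | grep -A 3 -i 'Non-Volatile'
  # 'Kernel driver in use: nvme' -> a true NVMe device;
  # an AHCI-style PCIe SSD appears under the standard SATA/AHCI controller instead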
Re: [PERFORM] Configuration tips for very large database
Hi Nico, No one has mentioned the elephant in the room, but a database can be very I/O intensive and you may not be getting the performance you need from your virtual disk running on your VMware disk subsystem. What do IOmeter or other disk performance evaluation software report? Regards, Ken Anecdatum: Moving from a contended VMware hard-disk based filesystem running over the network, to a bare metal RAID10 SSD, resulted in many DB operations running 20-30x faster. Table sizes circa 10-20G, millions of rows. Graeme. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Survey: Max TPS you've ever seen
1. O/S Under O/S, don't forget to mention linux kernel version. We saw a MASSIVE increase in TPS (I think it was a doubling? Don't have the data to hand right now) on our multicore RHEL6 servers, when moving from a stock RHEL6 kernel to an ELREPO 3.18 series kernel. That's what 10 years of kernel development will do for you. - 16 SSD + 4 PCIe SSD storage Similarly, it's useful to specify - exactly which drives were being used during the test (PCIe and SATA SSDs perform pretty differently!). Similarly if you're using e.g. a dell server with an SSD cache in front of the disks, remember to mention it. - Also exactly which PCI interface, now that there are different types of PCI attached SSD becoming available (traditional PCIe SSD vs NVMe) with substantially different performance and overheads. (Performance junkies: Check out NVMe if you haven't heard of it) http://www.thessdreview.com/daily-news/latest-buzz/marvell-displays-88ss1094-nvme-ssd-controller-2-9gbs/ http://www.thessdreview.com/daily-news/latest-buzz/memblaze-pmc-collaborate-pblaze4-pcie-ssd-hyperscale-data-centers-3-2gbs-reads-85-iops/ - Which firmware (some ssds exhibit notable performance changes with firmware) - which filesystem and filesystem options (try benchmarking with a fresh ext4 filesystem and nobarriers - then compare against a mostly full filesystem with barriers on an SSD. You should see quite a difference) - which RAID controller. (Good luck if you're using an H710 with modern SSDs for example... the controller's write cache is the choke point for performance) - readahead settings (We *tripled* our read performance on large tables/transfers by changing this from the default value in linux up to around 16MB) - filesystem queue depth and scheduler ( e.g. shallow/deep queues on ssds and e.g. cfq vs. noop schedulers on ssds) - if anything else is running on the same server/filesystem (e.g. background db activity, web servers etc, operating system sharing the same disk) - even things like raid stripe size and filesystem block size can have a small impact if you're going for absolute maximum TPS. However honestly all of this is probably dwarfed by the question of what you're doing with your database. If what you do doesn't actually look like pgbench activity (e.g. your server is mostly burning clock cycles on running ancient legacy pl/sql code) then you're taking the wrong benchmark if you use pgbench. (Also, another note for performance junkies - some interesting news from the gaming world - spending extra money on 'fast memory' is probably a waste in the current generation of computers) http://www.anandtech.com/show/7364/memory-scaling-on-haswell/3 Graeme Bell On 11 Feb 2015, at 01:31, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote: On 10/02/15 10:29, Gavin Flower wrote: On 10/02/15 08:30, Luis Antonio Dias de Sá Junior wrote: Hi, A survey: with pgbench using TPS-B, what is the maximum TPS you've ever seen? For me: 12000 TPS. -- Luis Antonio Dias de Sá Junior Important to specify: 1. O/S 2. version of PostgreSQL 3. PostgreSQL configuration 4. hardware configuration 5. anything else that might affect performance I suspect that Linux will outperform Microsoft on the same hardware, and optimum configuration for both O/S's... 
Yes, exactly - and also the pgbench parameters: - scale - number of clients - number of threads - statement options (prepared or simple etc) - length of test We've managed to get 40 000 to 60 000 TPS on some pretty serious hardware: - 60 core, 1 TB ram - 16 SSD + 4 PCIe SSD storage - Ubuntu 14.04 - Postgres 9.4 (beta and rc) ...with Postgres parameters customized: - checkpoint_segments 1920 - checkpoint_completion_target 0.8 - wal_buffers 256MB - wal_sync_method open_datasync - shared_buffers 10GB - max_connections 600 - effective_io_concurrency 10 ..and finally pgbench parameters - scale 2000 - clients 32, 64, 128, 256 (best results at 32 and 64 generally) - threads = 1/2 client number - prepared option - 10 minute test run time Points to note, we did *not* disable fsync or prevent buffers being actually written (common dirty tricks in benchmarks). However, as others have remarked - raw numbers mean little. Pgbench is very useful for testing how tuning configurations are helping (or not) for a particular hardware and software setup, but is less useful for answering the question 'how many TPS can postgres do'... Regards Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
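For anyone wanting to run a comparable test, those pgbench parameters translate into something like this (database name is a placeholder; vary -c/-j across the client counts listed above):

  createdb bench
  pgbench -i -s 2000 bench
  pgbench -M prepared -c 64 -j 32 -T 600 bench   # 64 clients, threads = half the clients, 10 min run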
Re: [PERFORM] Survey: Max TPS you've ever seen
I'd suggest you run it on a large ramdisk with fsync turned off on a 32 core computer and see what you get; that will be a good indication of a maximum. Keep in mind though that 'postgres' with fsync (vs. without) is such a different creature that the comparison isn't meaningful. Similarly 'postgres' on volatile backing store vs. non-volatile isn't really a meaningful comparison. There's also a question here about the 't' in TPS. If you have no fsync and volatile storage, are you really doing 'transactions'? Depending on the definition you take, a transaction may have some sense of 'reliability' or atomicity which isn't reflected well in a ramdisk/no-fsync benchmark. It's probably not ideal to fill a mailing list with numbers that have no meaning attached to them, so why not set up a little web database or Google doc to record max TPS and how it was achieved? For example, imagine I tell you that the highest I've achieved is 124 tps. How does it help you if I say that? Graeme Bell On 10 Feb 2015, at 11:48, Luis Antonio Dias de Sá Junior luisjunior...@gmail.com wrote: No problem with this. If anyone wants to specify more details. But I want to know how far postgres can go. No matter OS or other variables. Gavin, you got more than 12000 TPS? 2015-02-09 19:29 GMT-02:00 Gavin Flower gavinflo...@archidevsys.co.nz: On 10/02/15 08:30, Luis Antonio Dias de Sá Junior wrote: Hi, A survey: with pgbench using TPS-B, what is the maximum TPS you've ever seen? For me: 12000 TPS. -- Luis Antonio Dias de Sá Junior Important to specify: 1. O/S 2. version of PostgreSQL 3. PostgreSQL configuration 4. hardware configuration 5. anything else that might affect performance I suspect that Linux will outperform Microsoft on the same hardware, and optimum configuration for both O/S's... Cheers, Gavin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance -- Luis Antonio Dias de Sá Junior -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
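For completeness, the ramdisk ceiling test being suggested looks roughly like this (sizes and paths are placeholders; per the caveats above, the resulting number says nothing about durable transactions):

  mount -t tmpfs -o size=32g tmpfs /mnt/pgram
  initdb -D /mnt/pgram/data
  pg_ctl -D /mnt/pgram/data -o '-c fsync=off -c synchronous_commit=off' start
  pgbench -i -s 100 postgres && pgbench -c 32 -j 32 -T 60 postgres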
Re: [PERFORM] 8xIntel S3500 SSD in RAID10 on Dell H710p
> I have a beast of a Dell server with the following specifications:
> • 4x Xeon E5-4657LV2 (48 cores total)
> • 196GB RAM
> • 2x SCSI 900GB in RAID1 (for the OS)
> • 8x Intel S3500 SSD 240GB in RAID10
> • H710p RAID controller, 1GB cache
> CentOS 6.6; the RAID10 SSDs use XFS (mkfs.xfs -i size=512 /dev/sdb).

Things to check:

- disk cache settings (EnDskCache - for SSD it should be on, or you're going to lose 90% of your performance)

- OS settings, e.g.

    echo noop > /sys/block/sda/queue/scheduler
    echo 975 > /sys/block/sda/queue/nr_requests
    blockdev --setra 16384 /dev/sdb

- OS kernel version

We use H710Ps with SSDs as well, and these settings make a measurable difference to our performance here (though we measure more than just pgbench, since it's a poor proxy for our use cases).

Also:

- SSDs - is the filesystem aligned and the block size chosen correctly (you don't want to be forced to read 2 blocks of SSD to get every data block)? RAID stripe size? May make a small difference.

- are the SSDs all sitting on different SATA channels? You don't want them to be forced to share one channel's worth of bandwidth. The H710P has 8 SATA channels I think (?) and you mention 10 devices above.

Graeme Bell.

On 10 Dec 2014, at 00:28, Strahinja Kustudić strahin...@nordeus.com wrote:

> I have a beast of a Dell server with the following specifications:
> • 4x Xeon E5-4657LV2 (48 cores total)
> • 196GB RAM
> • 2x SCSI 900GB in RAID1 (for the OS)
> • 8x Intel S3500 SSD 240GB in RAID10
> • H710p RAID controller, 1GB cache
>
> CentOS 6.6; the RAID10 SSDs use XFS (mkfs.xfs -i size=512 /dev/sdb). Here are some relevant postgresql.conf settings:
>
> shared_buffers = 8GB
> work_mem = 64MB
> maintenance_work_mem = 1GB
> synchronous_commit = off
> checkpoint_segments = 256
> checkpoint_timeout = 10min
> checkpoint_completion_target = 0.9
> seq_page_cost = 1.0
> effective_cache_size = 100GB
>
> I ran some fast pgbench tests with 4, 6 and 8 drives in RAID10 and here are the results:
>
> time /usr/pgsql-9.1/bin/pgbench -U postgres -i -s 12000 pgbench   # 292GB DB
>   4 drives: 105 min    6 drives: 98 min    8 drives: 94 min
>
> /usr/pgsql-9.1/bin/pgbench -U postgres -c 96 -T 600 -N pgbench    # Write test
>   4 drives: 6567       6 drives: 7427      8 drives: 8073
>
> /usr/pgsql-9.1/bin/pgbench -U postgres -c 96 -T 600 pgbench       # Read/Write test
>   4 drives: 3651       6 drives: 5474      8 drives: 7203
>
> /usr/pgsql-9.1/bin/pgbench -U postgres -c 96 -T 600 -S pgbench    # Read test
>   4 drives: 17628      6 drives: 25482     8 drives: 28698
>
> A few notes:
> • I ran these tests only once, so take these numbers with a grain of salt. I didn't have time to run them more times, because I had to test how the server works with our app, and it takes a considerable amount of time to run them all.
> • I wanted to use a bigger scale factor, but there is a bug in pgbench with big scale factors.
> • Postgres 9.1 was chosen, since the app which will run on this server uses 9.1.
> • These tests are with the H710p controller set to write-back (WB) and with adaptive read ahead (ADRA). I ran a few tests with write-through (WT) and no read ahead (NORA), but the results were worse.
> • All tests were run using 96 clients as recommended on the pgbench wiki page, but I'm sure I would get better results with 48 clients (1 for each core), which I tried with the R/W test and got 7986 on 8 drives, which is almost 800 TPS better than with 96 clients.
>
> Since our app is tied closely to Postgres performance, I'm currently trying to optimize it. Do you have any suggestions for which Postgres/system settings I could try to tweak to increase performance? I have a feeling I could get more performance out of this system.
> Regards, Strahinja

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
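[Editor's note: the scheduler/queue settings above are lost on reboot when typed interactively. A sketch of persisting them at boot; the device names and values are assumptions carried over from the post, not recommendations.]

    #!/bin/bash
    # Hypothetical /etc/rc.local fragment; adjust device names to your system.
    for dev in sda sdb; do
        echo noop > /sys/block/$dev/queue/scheduler      # bypass the elevator on SSD/RAID volumes
        echo 975  > /sys/block/$dev/queue/nr_requests    # deeper request queue
    done
    blockdev --setra 16384 /dev/sdb                      # large readahead on the data volume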
Re: [PERFORM] Tuning the configuration
> I don't understand the logic behind using drives, which are best for random io, for sequential io workloads.

Because they are also best for sequential IO. I get 1.3-1.4 GB/second from 4 SSDs in RAID, or 500 MB/s for single disk systems, even with cheap models. Are you getting more than that from high-end spinning rust?

Graeme.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
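[Editor's note: a quick way to sanity-check sequential write throughput on your own hardware, in the same style as the dd tests quoted earlier in this digest. The path is an assumption, and dd over /dev/zero is only a crude proxy for real database IO.]

    # Write ~2GB sequentially on the filesystem under test and report throughput.
    dd bs=8k count=256000 if=/dev/zero of=/mnt/ssd/test conv=fdatasync
    rm /mnt/ssd/test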
Re: [PERFORM] Tuning the configuration
> Very much agree with this. Just because SSD is fast doesn't make it suited for certain things, and a streaming sequential 100% write workload is one of them. I've worked with everything from local disk to high-end SAN, and even at the high end we've always put any DB logs on spinning disk. RAID1 is generally sufficient. SSD is king for read-heavy random I/O workloads.

1. Here we found sustained serial writes were faster to SSD than to disk, by a factor of 3, both in RAID and single disk configurations.

2. Also, something to watch out for is extended stalls due to synchronous write activity / clearing out of cache, when a lot of data has been building up in write caches. By placing the WAL on the same disk as the ordinary database, you avoid having too much dirty cache building up, because the WAL forces the disk to flush more often. So you can trade off some DB filesystem performance here to avoid blocking / IO lag spikes.

3. There's also the question of disk bays. When you have extra disks for OS, for logs, etc., in some situations you're using up disks that could be used to extend your main database filesystem, particularly when those disks also need to be protected by the appropriate RAID mirrors and RAID hotspares. It can be cheaper to put the logs on SSD than to have 1 extra HDD + its RAID1 mirror + its hotspare + a possible shelf spare, plus pay for a bigger chassis to have 3 more disk bays.

4. Finally there's the issue of simplicity. If you get a fast SSD and run OS/logs/DB off a single RAID volume, there's less chance for error when some unlucky person has to do an emergency fix/rebuild later than if they have to check disk caching policy etc. across a range of devices and ensure different parts of the filesystem are mounted in all the right places. It makes documentation easier.

Graeme Bell

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Planning for Scalability
Hi Roberto,

Hardware etc. is a solution; but you have not yet characterised the problem. You should investigate whether the events are mostly...

- reads
- writes
- computationally intensive
- memory intensive
- I/O intensive
- network I/O intensive
- independent? (e.g. does it matter if you split the database in two?)

You should also find out if the current server comfortably supports 3 million events per day, or if you already have problems there that need to be addressed. Whereas if it handles 3 million with plenty of spare I/O, memory, CPU and network bandwidth, then maybe it will handle 5 million without changing anything.

Once you've gathered this information (using tools like pg_stat_statements, top, iotop, ... and by thinking about what the tables are doing), look at it and see if the answer is obvious. If not, think about what is confusing for a while, and then write your thoughts and data as a new question to the list.

Graeme.

On 03 Oct 2014, at 10:55, Roberto Grandi roberto.gra...@trovaprezzi.it wrote:

> Dear Pg people,
>
> I would ask for your help considering this scaling issue. We are planning to move from a 3 million events/day instance of postgres (8 CPU, 65 GB ram) to 5 million events/day.
>
> What do you suggest in order to plan this switch? Add a separate server? Increase RAM? Use SSD?
>
> Any real help will be really precious and appreciated.
>
> Roberto

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
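[Editor's note: a sketch of the kind of characterisation suggested above. Assumes the pg_stat_statements extension is installed and loaded via shared_preload_libraries; the database name is hypothetical, and the column names follow the 9.x version of the extension.]

    # Top 10 statements by total time spent: a starting point for seeing where the load goes.
    psql mydb -c "
      SELECT calls, total_time, rows, left(query, 60) AS query
      FROM pg_stat_statements
      ORDER BY total_time DESC
      LIMIT 10;"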
Re: [PERFORM] Yet another abort-early plan disaster on 9.3
> The existing cost estimation code effectively assumes that they're perfectly uniformly distributed; which is a good average-case assumption but can be horribly wrong in the worst case.

Sorry, just an outsider jumping in with a quick comment.

Every year or two the core count goes up. Can/should/does postgres ever attempt two strategies in parallel, in cases where strategy A is generally good but strategy B prevents bad worst-case behaviour? Kind of like a Schrödinger's Cat approach to scheduling. What problems would it raise?

Graeme.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Yet another abort-early plan disaster on 9.3
Thanks for your replies, everyone.

> You can't run two plans and have them both returning rows to the client,

That wasn't what I had in mind. I can envisage cases where the worst-case behaviour of one plan results in zero rows by the time the alternative plan has generated the complete result, never mind a single row (e.g. anything with LIMIT in it could fall into that category). Maybe it's enough to alleviate the problems caused by planning heuristics known to have bad worst-case performance that is hard to avoid with a single-threaded approach?

Providing we're not modifying data in the query, and providing we kill the 'loser' thread when either (the first result / all results) come in, maybe there's value in letting them race and picking the best plan retrospectively.

I guess it's going into another topic, but I wonder what % of DBs/queries look like this:

- little or no I/O thrash (e.g. tuples mostly in memory already, or the DB configured with a relatively low 'random_page_cost')
- ordered results, or the whole result set being produced at once
- SELECTs only

In my own work (national scale GIS) this is what most of our queries and query environments look like.

Graeme

On 30 Sep 2014, at 18:32, Tom Lane t...@sss.pgh.pa.us wrote:

> Graeme B. Bell g...@skogoglandskap.no writes:
>> Every year or two the core count goes up. Can/should/does postgres ever attempt two strategies in parallel, in cases where strategy A is generally good but strategy B prevents bad worst case behaviour? Kind of like a Schrödinger's Cat approach to scheduling. What problems would it raise?
>
> You can't run two plans and have them both returning rows to the client, or performing inserts/updates/deletes as the case may be.
>
> regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Very slow postgreSQL 9.3.4 query
Hi,

Two things:

- Make sure you are creating a GIST index on your geometry column in postgis.
- Try using st_intersects rather than &&. I've noticed that && isn't using indices correctly in some situations, e.g. function indices for st_transform'd geo columns.

Graeme

On 26 Sep 2014, at 18:17, Burgess, Freddie fburg...@radiantblue.com wrote:

> Workflow description:
>
> 1.) User draws a polygon around an area of interest, via UI.
> 2.) UI responds with how many sensors reside within the area of the polygon.
> 3.) Hibernate generates the count query detailed in the attachment.
>
> Performance data is included in the attachment, via EXPLAIN PLAN; the query takes approx 6 minutes to return the count to the UI. The amount of data processed is also included in the attachment: a 185 million row partition.
>
> Hardware: VM, 80GB memory, 8 CPU Xeon, Linux 2.6.32-431.3.1.el6.x86-64, 40TB disk. Database size: 8TB. PostgreSQL 9.3.4 with POSTGIS 2.1.1, Red Hat 4.4.7-4, 64 bit, streaming replication.
>
> Postgresql.conf:
> max_connections = 100
> shared_buffers = 32GB
> work_mem = 16MB
> maintenance_work_mem = 1GB
> seq_page_cost = 1.0
> random_page_cost = 2.0
> cpu_tuple_cost = 0.03
> effective_cache_size = 48GB
>
> From: Graeme B. Bell [g...@skogoglandskap.no]
> Sent: Friday, September 26, 2014 9:55 AM
> To: Burgess, Freddie
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Very slow postgreSQL 9.3.4 query
>
> A good way to start would be to introduce the query - describe what it is meant to do, give some performance data (your measurements of time taken, amount of data being processed, hardware used etc).
>
> Graeme.
>
> On 26 Sep 2014, at 15:04, Burgess, Freddie fburg...@radiantblue.com wrote:
>
>> Help, please can anyone offer suggestions on how to speed this query up.
>>
>> thanks
>>
>> Poor Pref query.txt

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
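[Editor's note: a minimal sketch of the two suggestions above. The table name, column name and polygon are hypothetical; ST_Intersects performs an index-assisted bounding-box test internally, which is why the GiST index matters.]

    # Hypothetical schema; run against your own table and geometry column.
    psql mydb <<'SQL'
    CREATE INDEX sensors_geom_gist ON sensors USING GIST (geom);
    ANALYZE sensors;
    -- count sensors inside the drawn polygon (placeholder WKT below)
    SELECT count(*)
    FROM sensors
    WHERE ST_Intersects(geom, ST_GeomFromText('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))', 4326));
    SQL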
Re: [PERFORM] Turn off Hyperthreading! WAS: 60 core performance with 9.3
> HT off is common knowledge for better benchmarking results

It's wise to use the qualifier 'for better benchmarking results'. It's worth keeping in mind here that a benchmark is not the same as normal production use. For example, where I work we do lots of long-running queries in parallel over a big range of datasets, rather than many short-term transactions as fast as possible. Our biggest DB server is also used for GDAL work and R at the same time*. Pretty far from pgbench; not everyone is constrained by locks.

I suppose that if your code is basically N copies of the same function, hyper-threading isn't likely to help much, because it was introduced to allow different parts of the processor to be used in parallel when you're running heterogeneous code. But if you're hammering just one part of the CPU... well, adding another layer of logical complexity for your CPU to manage probably isn't going to do much good.

Should HT be on or off when you're running 64 very mixed types of long-term queries, which variously involve either heavy use of real-number calculations or e.g. logic/string handling, and different data sets? It's a much more complex question than simply maxing out your pgbench scores.

I don't have the data now unfortunately, but I remember seeing a benefit for HT on our 4 core E3 when running GDAL/PostGIS work in parallel last year. It's not surprising though; the GDAL calls are almost certainly using different functions of the processor compared to postgres, and there should be very little lock contention.

In light of this interesting data I'm now leaning towards proposing HT off for our mapservers (which receive short, similar requests over and over), but for the heterogeneous servers, I think I'll keep it on for now.

Graeme.

* unrelated: there's also a huge advantage for us in keeping these different programs running on the same machine, since we found we can get much better transfer rates through unix sockets than with TCP over the network.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
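[Editor's note: for anyone wanting to experiment, a sketch of inspecting and toggling hyperthread siblings on Linux without a BIOS visit. The CPU numbering is machine-specific and assumed here; this is an illustration, not a recommendation either way.]

    # Show how logical CPUs map to physical cores.
    lscpu -e=CPU,CORE,ONLINE

    # Take a hyperthread sibling offline (assuming cpu1 is the sibling of cpu0 on this box)...
    echo 0 > /sys/devices/system/cpu/cpu1/online

    # ...and bring it back after the test.
    echo 1 > /sys/devices/system/cpu/cpu1/online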
Re: [PERFORM] PGSQL 9.3 - Materialized View - multithreading
On 04 Apr 2014, at 18:29, Nicolas Paris nipari...@gmail.com wrote:

> Hello,
>
> My question is about multiprocess and materialized views. http://www.postgresql.org/docs/9.3/static/sql-creatematerializedview.html
>
> I (will) have something like 3600 materialised views, and I would like to know the way to refresh them in a multithreaded way (understand: 8 CPU cores -> 8 refresh processes at the same time).

Hi Nick,

Out-of-DB solution:

1. Produce a text file which contains the 3600 refresh commands you want to run in parallel. You can do that with select and format() if you don't have a list already.

2. I'm going to simulate your 3600 'refresh' commands here with some select and sleep statements that finish at unknown times. (In BASH):

    for i in {1..3600} ; do echo "echo \"select pg_sleep(1+random()::int*10); select $i\" | psql mydb" ; done > 3600commands

3. Install Gnu Parallel and type:

    parallel < 3600commands

4. Parallel will automatically work out the appropriate number of cores/threads for your CPUs, or you can control it manually with -j. It will also give you a live progress report if you use --progress. E.g. this command balances 8 jobs at a time, prints a dynamic progress report and dumps stdout to /dev/null:

    parallel -j 8 --progress < 3600commands > /dev/null

5. If you want to make debugging easier, use the parameter --tag to tag output for each command.

Of course it would be much more elegant if someone implemented something like Gnu Parallel inside postgres or psql ... :-)

Hope this helps, have a nice day,

Graeme.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
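[Editor's note: for step 1, a sketch of generating the real refresh commands with format(), as suggested above. Assumes PostgreSQL 9.3's pg_matviews catalog view and a database named mydb.]

    # Build one 'REFRESH MATERIALIZED VIEW ...' shell command per view, then run 8 at a time.
    psql -At mydb -c "SELECT format('psql mydb -c %L',
                                    'REFRESH MATERIALIZED VIEW ' || quote_ident(schemaname) || '.' || quote_ident(matviewname))
                      FROM pg_matviews;" > 3600commands
    parallel -j 8 --progress < 3600commands > /dev/null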
Re: [PERFORM] PGSQL 9.3 - Materialized View - multithreading
Hi again Nick. Glad it helped.

Generally, I would expect that doing all the A's first, then all the B's, and so on, would be fastest, since you can re-use the data from cache.

Concurrency when reading isn't generally a problem. Lots of things can read at the same time and it will be nice and fast. It's concurrent writes or concurrent read/write of the same data item that causes problems with locking. That shouldn't be happening here, judging by your description.

If possible, try to make sure nothing is modifying those source tables A/B/C/D/E/F when you are doing your view refresh.

Graeme.

On 07 Apr 2014, at 14:49, Nicolas Paris nipari...@gmail.com wrote:

> Hello,
>
> Thanks for this clear explanation! Then I have a sub-question: suppose I have 3600 materialised views, say 600 mat views from each of 6 main tables (A,B,C,D,E,F are repeated 600 times with some differences). Is it faster to:
> 1) parallel refresh 600 times A, then 600 times B, etc., OR
> 2) parallel refresh 600 times A,B,C,D,E,F
>
> I guess 1) is faster because there are 600 accesses to the same table, already loaded in memory? But does parallel access to the same table imply concurrency and bad performance?
>
> Thanks
>
> Nicolas PARIS

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
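[Editor's note: a small sketch of the "all the A's first" ordering above, assuming (hypothetically) that the view names in the command file begin with their base table, so a plain sort groups them.]

    # If views are named e.g. a_001 ... f_600, sorting groups same-table refreshes together.
    sort 3600commands > 3600commands.grouped
    parallel -j 8 --progress < 3600commands.grouped > /dev/null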
Re: [PERFORM] PGSQL 9.3 - Materialized View - multithreading
- http://wiki.postgresql.org/wiki/Performance_Optimization
- run it on the most powerful machine you can find
- get some more memory
- get a big (512GB-1TB) SSD drive
- avoid recalculating the same things over and over. If your views have many similar elements, then calculate those first into a partial result, then build the final views from the partial result.
- make sure your source tables are fully indexed and have good statistics
- run all the views once with \timing and keep track of how long they took. Fix the slow ones.

G

On 07 Apr 2014, at 15:56, Nicolas Paris nipari...@gmail.com wrote:

> Excellent. Maybe a last sub-question:
>
> Those 3600 mat views do have indexes. I guess I will get better performance by dropping the indexes first, then refreshing, then re-creating the indexes. Are there other ways to improve performance (like mat view storage parameters), because this routine will run at night and needs to finish quickly.
>
> Thanks
>
> Nicolas PARIS

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
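[Editor's note: on the drop-then-recreate idea in Nicolas's question, a sketch of capturing index definitions before dropping them, so they can be replayed after the refresh. pg_indexes holds the CREATE INDEX statements; the view, schema and index names here are hypothetical.]

    # Save the CREATE INDEX statements for one mat view, so they can be replayed afterwards.
    psql -At mydb -c "SELECT indexdef || ';' FROM pg_indexes
                      WHERE schemaname = 'public' AND tablename = 'matview_a_001';" > recreate_indexes.sql
    psql mydb -c "DROP INDEX public.matview_a_001_idx;"            # hypothetical index name
    psql mydb -c "REFRESH MATERIALIZED VIEW public.matview_a_001;"
    psql mydb -f recreate_indexes.sql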
Re: [PERFORM] pg_dump vs pg_basebackup
Postgresql rsync backups require the DB to be shut down during the 'second' rsync:

1. rsync the DB onto the backup filesystem (produces an e.g. 95-99.99% consistent DB on the backup filesystem)
2. shut down the DB
3. rsync the shut-down DB onto the backup filesystem (synchronises the last few files to make the DB consistent, and is usually very fast)
4. start the DB up again

Is there any way to notify postgres to pause transactions (and note that they should be restarted), and flush out write buffers etc., instead of doing a full shutdown? E.g. so that the second rsync call would bring the backup filesystem's representation of the DB into a recoverable state without needing to shut down the production DB completely.

G

On 25 Mar 2014, at 16:29, Ilya Kosmodemiansky ilya.kosmodemian...@postgresql-consulting.com wrote:

> Joshua,
>
> On Tue, Mar 25, 2014 at 4:22 PM, Joshua D. Drake j...@commandprompt.com wrote:
>> The advantage is that you can create backups that don't have to be restored, just started. You can also use the differential portions of rsync to do it multiple times a day without much issue.
>
> Are you sure that it is a nice idea on a database with a heavy write workload? And also I'm not sure that differential backups using rsync will be recoverable, if that is what you actually meant.
>
>> JD
>> --
>> Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc Political Correctness is for cowards.
>
> --
> Ilya Kosmodemiansky, PostgreSQL-Consulting.com tel. +14084142500 cell. +4915144336040 i...@postgresql-consulting.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
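[Editor's note: a sketch of the four-step procedure above as a script. The paths and cluster layout are assumptions; the only downtime is the window between stop and start while the second, small rsync runs.]

    #!/bin/bash
    # Assumed locations; adjust to your cluster layout.
    DATA=/var/lib/pgsql/data
    BACKUP=/mnt/backup/pgdata

    rsync -a "$DATA/" "$BACKUP/"      # 1. warm pass: copies most data while the DB is live
    pg_ctl -D "$DATA" stop -m fast    # 2. shut down the DB
    rsync -a "$DATA/" "$BACKUP/"      # 3. cold pass: syncs the few changed files, usually quick
    pg_ctl -D "$DATA" start           # 4. bring the DB back up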