Mark,

I increased the read-ahead to 16384 and it doesn't improve performance. My RAID 0 uses a stripe size of 256k, the maximum size supported by the controller. Thanks!
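For anyone following along, the device read-ahead on Linux can be checked and changed with blockdev. This is only a sketch, and /dev/sdb below is a placeholder for the actual array device; the values are in 512-byte sectors, so the default 128 is 64KiB and 16384 is 8MiB:

    # Show the current read-ahead (in 512-byte sectors)
    sudo blockdev --getra /dev/sdb

    # Raise it to 16384 sectors (8MiB); this does not survive a reboot,
    # so persist it via a udev rule or an init script if it helps
    sudo blockdev --setra 16384 /dev/sdb

    # The same setting expressed in KiB
    cat /sys/block/sdb/queue/read_ahead_kb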
Charles

On Sat, Jul 15, 2017 at 1:02 AM, Mark Kirkwood <mark.kirkw...@catalyst.net.nz> wrote:

> Ah yes - that seems more sensible (but still slower than I would expect
> for a 5-disk RAID 0). You should be able to get something like 5 * (single
> disk speed), i.e. about 500MB/s.
>
> Might be worth increasing device read ahead (more than you have already).
> Some of these so-called 'smart' RAID cards need to be hit over the head
> before they will perform. E.g. I believe you have it set to 128 - I'd try
> 4096 or even 16384 (in the past I've used those settings on some extremely
> stupid cards that refused to max out their disks' known speeds).
>
> Also worth investigating is RAID stripe size - for DW work it makes sense
> for it to be reasonably big (256K to 1M), which again will help speed in
> sequential scans.
>
> Cheers
>
> Mark
>
> P.S. I used to work for Greenplum, so this type of problem came up a lot
> :-) . The best cards were the LSI and Areca!
>
> On 15/07/17 02:09, Charles Nadeau wrote:
>
>> Mark,
>>
>> First I must say that I changed my disk configuration from 4 disks in
>> RAID 10 to 5 disks in RAID 0 because I almost ran out of disk space
>> during the last ingest of data.
>> Here is the result of the test you asked for. It was done with a cold
>> cache:
>>
>> flows=# \timing
>> Timing is on.
>> flows=# explain select count(*) from flows;
>>                                            QUERY PLAN
>> -----------------------------------------------------------------------------------------------
>>  Finalize Aggregate  (cost=17214914.09..17214914.09 rows=1 width=8)
>>    ->  Gather  (cost=17214914.07..17214914.09 rows=1 width=8)
>>          Workers Planned: 1
>>          ->  Partial Aggregate  (cost=17213914.07..17213914.07 rows=1 width=8)
>>                ->  Parallel Seq Scan on flows  (cost=0.00..17019464.49 rows=388899162 width=0)
>> (5 rows)
>>
>> Time: 171.835 ms
>> flows=# select pg_relation_size('flows');
>>  pg_relation_size
>> ------------------
>>      129865867264
>> (1 row)
>>
>> Time: 57.157 ms
>> flows=# select count(*) from flows;
>> LOG:  duration: 625546.522 ms  statement: select count(*) from flows;
>>    count
>> -----------
>>  589831190
>> (1 row)
>>
>> Time: 625546.662 ms
>>
>> The throughput reported by PostgreSQL is almost 198MB/s, and the
>> throughput as measured by dstat during the query execution was between
>> 25 and 299MB/s. It is much better than what I had before! The I/O wait
>> was about 12% all through the query. One thing I noticed is the
>> discrepancy between the read throughput reported by pg_activity and the
>> one reported by dstat: pg_activity always reports a value lower than
>> dstat.
>>
>> Besides the change of disk configuration, here is what contributed the
>> most to the improvement of the performance so far:
>>
>>  * Using huge pages
>>  * Increasing effective_io_concurrency to 256
>>  * Reducing random_page_cost from 22 to 4
>>  * Reducing min_parallel_relation_size to 512kB to have more workers
>>    when doing a parallel sequential scan of my biggest table
>>
>> Thanks for recommending this test, I now know what the real throughput
>> should be!
>>
>> Charles
>>
>> On Wed, Jul 12, 2017 at 4:11 AM, Mark Kirkwood
>> <mark.kirkw...@catalyst.net.nz <mailto:mark.kirkw...@catalyst.net.nz>>
>> wrote:
>>
>>     Hmm - how are you measuring that sequential scan speed of 4MB/s?
>>
>>     I'd recommend doing a very simple test, e.g. here's one on my
>>     workstation - 13 GB single table on 1 SATA drive - cold cache
>>     after reboot, sequential scan using Postgres 9.6.2:
>>
>>     bench=# EXPLAIN SELECT count(*) FROM pgbench_accounts;
>>                                       QUERY PLAN
>>     ------------------------------------------------------------------------------------
>>      Aggregate  (cost=2889345.00..2889345.01 rows=1 width=8)
>>        ->  Seq Scan on pgbench_accounts  (cost=0.00..2639345.00 rows=100000000 width=0)
>>     (2 rows)
>>
>>     bench=# SELECT pg_relation_size('pgbench_accounts');
>>      pg_relation_size
>>     ------------------
>>          13429514240
>>     (1 row)
>>
>>     bench=# SELECT count(*) FROM pgbench_accounts;
>>        count
>>     -----------
>>      100000000
>>     (1 row)
>>
>>     Time: 118884.277 ms
>>
>>     So doing the math, seq read speed is about 110MB/s (i.e. 13 GB in
>>     120 sec). Sure enough, while I was running the query iostat showed:
>>
>>     Device: rrqm/s wrqm/s    r/s  w/s  rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
>>     sda       0.00   0.00 926.00 0.00 114.89  0.00   254.10     1.90  2.03    2.03    0.00  1.08 100.00
>>
>>     So it might be useful for us to see something like that from your
>>     system - note you need to check you have really flushed the cache,
>>     and that no other apps are using the db.
>>
>>     regards
>>
>>     Mark
>>
>>     On 12/07/17 00:46, Charles Nadeau wrote:
>>
>>         After reducing random_page_cost to 4 and testing more, I can
>>         report that the aggregate read throughput for parallel
>>         sequential scan is about 90MB/s. However the throughput for
>>         sequential scan is still around 4MB/s.
>>
>> --
>> Charles Nadeau Ph.D.
>> http://charlesnadeau.blogspot.com/
>

--
Charles Nadeau Ph.D.
http://charlesnadeau.blogspot.com/
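For completeness, here is a minimal sketch of the kind of raw-throughput check suggested above: flush the OS cache, read straight off the array to see whether the controller can deliver anything close to 5 * single-disk speed, then time the cold-cache scan in Postgres. The device name, read size and service name below are placeholders rather than values from this thread:

    # Flush dirty pages, then drop the OS page cache
    sync
    echo 3 | sudo tee /proc/sys/vm/drop_caches

    # Read 8GiB sequentially off the array, bypassing the page cache;
    # a healthy 5-disk RAID 0 should land somewhere near 500MB/s
    sudo dd if=/dev/sdb of=/dev/null bs=1M count=8192 iflag=direct

    # Restart Postgres so shared_buffers is cold too (service name varies),
    # then time the sequential scan; pg_relation_size('flows') divided by
    # the elapsed seconds gives the effective MB/s
    sudo systemctl restart postgresql
    time psql -d flows -c "SELECT count(*) FROM flows;"

If dd itself tops out well below 500MB/s, the bottleneck is the controller or array configuration rather than anything on the PostgreSQL side.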