Re: [PERFORM] Very poor read performance, query independent

2017-08-19 Thread Mark Kirkwood

Nice!

Pleased that the general idea worked well for you!

I'm also relieved that you did not follow my recommendation exactly - 
I've been trialling a Samsung 960 Evo (256GB) and an Intel 600p (256GB) and 
I've stumbled across a serious disadvantage of (consumer) M.2 drives 
using TLC NAND - terrible sustained write performance! While these guys 
can happily do ~2GB/s reads, their write performance is only 'burst 
capable'. They have small SLC NAND 'write caches' that do ~1GB/s for a 
*limited time* (10-20s), and after that you get ~200 MB/s! Ouch - my old 
Crucial 550 can do 350 MB/s sustained writes (so two of them in RAID0 
are doing 700 MB/s for hours).
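For anyone wanting to reproduce this, a sustained fio write job along the
following lines will show the SLC-cache cliff. This is only a sketch - the
target path is hypothetical, and writing this much data wears the drive:

fio --name=sustained-write --filename=/mnt/m2test/fio.dat --size=64G \
    --rw=write --bs=1M --ioengine=libaio --iodepth=32 --direct=1 \
    --time_based --runtime=300 \
    --write_bw_log=sustained-write --log_avg_msec=1000
# The bandwidth-vs-time log (sustained-write_bw*.log) typically shows
# 10-20s at the 'burst' rate, then a drop to the native TLC write rate.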


Bigger capacity drives can do better - but overall I'm not that 
impressed with the current trend of using TLC NAND.


regards

Mark


On 21/07/17 00:50, Charles Nadeau wrote:

Mark,

I received yesterday a second server having 16 drive bays. Just for a 
quick trial, I used 2 old 60GB SSDs (a Kingston V300 and an ADATA SP900) 
to build a RAID0. To my surprise, my very picky RAID controller (HP 
P410i) recognised them without a fuss (although as SATA II drives at 
3Gb/s). A quick fio benchmark gives me 22000 random 4k read IOPS, more 
than my 5 146GB 10k SAS disks in RAID0. I moved my most frequently 
used index to this array and will try to do some benchmarks.
Knowing that SSDs based on the SandForce-2281 controller are recognised by 
my server, I may buy a pair of bigger/newer ones to put my tables on.
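For reference, a quick random-read fio job of that kind looks roughly like
this - a sketch with a hypothetical device name; --readonly guards against
touching live data:

fio --name=randread --filename=/dev/sdd --readonly --rw=randread --bs=4k \
    --ioengine=libaio --iodepth=32 --numjobs=4 --group_reporting \
    --direct=1 --time_based --runtime=60
# reports aggregate 4k random read IOPS for the device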


Thanks!

Charles

On Sat, Jul 15, 2017 at 1:57 AM, Mark Kirkwood wrote:


Thinking about this a bit more - if somewhat more blazing
performance is needed, then this could be achieved via losing the
RAID card and spinning disks altogether and buying 1 of the NVME
or SATA solid state products: e.g

- Samsung 960 Pro or Evo 2 TB (approx 1 or 2 GB/s seq scan speeds
and 200K IOPS)

- Intel S3610 or similar 1.2 TB (500 MB/s seq scan and 30K IOPS)


The Samsung needs an M.2 port on the mobo (but most should have
'em - and if not PCIe X4 adapter cards are quite cheap). The Intel
is a bit more expensive compared to the Samsung, and is slower but
has a longer lifetime. However for your workload the Sammy is
probably fine.

regards

Mark

On 15/07/17 11:09, Mark Kirkwood wrote:

Ah yes - that seems more sensible (but still slower than I
would expect for 5 disks RAID 0).









--
Charles Nadeau Ph.D.
http://charlesnadeau.blogspot.com/






Re: [PERFORM] Very poor read performance, query independent

2017-07-25 Thread Charles Nadeau
All,

Here is a list of what I did based on the suggestions made after my initial
post:
*Reduced max_parallel_workers to 4: higher values make the workers wait for
data, as the RAID0 array can't deliver high enough IOPS.
*Reduced random_page_cost to 1: forcing the use of indexes makes queries
faster despite the low random throughput.
*Increased shared_buffers to 66GB and effective_cache_size to 53GB: with the
new server having 144GB of RAM, increasing shared_buffers allows PostgreSQL
to keep a lot of data in memory, reducing the need to go to disk.
*Reduced min_parallel_relation_size to 512kB to have more workers when doing
parallel sequential scans.
*Increased /sys/block/sd[ac]/queue/read_ahead_kb to 16384 for my arrays
using HDDs.
*Reused old SSDs (compatible with my RAID controller, to my surprise) to
hold my most used indexes and tables.
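Condensed into configuration form, the changes above look roughly like this
(a sketch, not a drop-in config; the device names match the arrays above,
and note that on 9.6 the per-query knob is max_parallel_workers_per_gather -
a max_parallel_workers GUC only exists from PostgreSQL 10):

# postgresql.conf
shared_buffers = 66GB
effective_cache_size = 53GB
random_page_cost = 1
min_parallel_relation_size = 512kB   # renamed min_parallel_table_scan_size in PG 10
max_parallel_workers_per_gather = 4

# read-ahead for the HDD arrays (sysfs setting, not persistent across reboots)
for d in sda sdc; do echo 16384 | sudo tee /sys/block/$d/queue/read_ahead_kb; done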

Thanks to everybody who made suggestions. I now know more about Postgresql
tuning.

Charles

On Mon, Jul 10, 2017 at 4:03 PM, Charles Nadeau wrote:

> I’m running PostgreSQL 9.6.3 on Ubuntu 16.10 (kernel 4.4.0-85-generic).
> Hardware is:
>
> *2x Intel Xeon E5550
>
> *72GB RAM
>
> *Hardware RAID10 (4 x 146GB SAS 10k) P410i controller with 1GB FBWC (80%
> read/20% write) for Postgresql data only:
>
> Logical Drive: 3
>
> Size: 273.4 GB
>
> Fault Tolerance: 1+0
>
> Heads: 255
>
> Sectors Per Track: 32
>
> Cylinders: 65535
>
> Strip Size: 128 KB
>
> Full Stripe Size: 256 KB
>
> Status: OK
>
> Caching: Enabled
>
> Unique Identifier: 600508B1001037383941424344450A00
>
> Disk Name: /dev/sdc
>
> Mount Points: /mnt/data 273.4 GB
>
> OS Status: LOCKED
>
> Logical Drive Label: A00A194750123456789ABCDE516F
>
> Mirror Group 0:
>
> physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SAS, 146 GB, OK)
>
> physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 146 GB, OK)
>
> Mirror Group 1:
>
> physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SAS, 146 GB, OK)
>
> physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SAS, 146 GB, OK)
>
> Drive Type: Data
>
> Formatted with ext4 with: sudo mkfs.ext4 -E stride=32,stripe-width=64 -v
> /dev/sdc1.
>
> Mounted in /etc/fstab with this line: 
> "UUID=99fef4ae-51dc-4365-9210-0b153b1cbbd0
> /mnt/data ext4 rw,nodiratime,user_xattr,noatime,nobarrier,errors=remount-ro
> 0 1"
>
> Postgresql is the only application running on this server.
>
>
> Postgresql is used as a mini data warehouse to generate reports and do
> statistical analysis. It is used by at most 2 users and fresh data is added
> every 10 days. The database has 16 tables: one is 224GB big and the rest
> are between 16kB and 470MB big.
>
>
> My configuration is:
>
>
> name | current_setting | source
>
> --------------------------------+----------------------------------+--------------------
>
> application_name | psql | client
>
> autovacuum_vacuum_scale_factor | 0 | configuration file
>
> autovacuum_vacuum_threshold | 2000 | configuration file
>
> checkpoint_completion_target | 0.9 | configuration file
>
> checkpoint_timeout | 30min | configuration file
>
> client_encoding | UTF8 | client
>
> client_min_messages | log | configuration file
>
> cluster_name | 9.6/main | configuration file
>
> cpu_index_tuple_cost | 0.001 | configuration file
>
> cpu_operator_cost | 0.0005 | configuration file
>
> cpu_tuple_cost | 0.003 | configuration file
>
> DateStyle | ISO, YMD | configuration file
>
> default_statistics_target | 100 | configuration file
>
> default_text_search_config | pg_catalog.english | configuration file
>
> dynamic_shared_memory_type | posix | configuration file
>
> effective_cache_size | 22GB | configuration file
>
> effective_io_concurrency | 4 | configuration file
>
> external_pid_file | /var/run/postgresql/9.6-main.pid | configuration file
>
> lc_messages | C | configuration file
>
> lc_monetary | en_CA.UTF-8 | configuration file
>
> lc_numeric | en_CA.UTF-8 | configuration file
>
> lc_time | en_CA.UTF-8 | configuration file
>
> listen_addresses | * | configuration file
>
> lock_timeout | 100s | configuration file
>
> log_autovacuum_min_duration | 0 | configuration file
>
> log_checkpoints | on | configuration file
>
> log_connections | on | configuration file
>
> log_destination | csvlog | configuration file
>
> log_directory | /mnt/bigzilla/data/toburn/hp/postgresql/pg_log |
> configuration file
>
> log_disconnections | on | configuration file
>
> log_error_verbosity | default | configuration file
>
> log_file_mode | 0600 | configuration file
>
> log_filename | postgresql-%Y-%m-%d_%H%M%S.log | configuration file
>
> log_line_prefix | user=%u,db=%d,app=%aclient=%h | configuration file
>
> log_lock_waits | on | configuration file
>
> log_min_duration_statement | 0 | configuration file
>
> log_min_error_statement | debug1 | configuration file
>
> log_min_messages | debug1 | configuration file
>
> log_rotation_size | 1GB | configuration file
>
> log_temp_files | 0 | configuration file
>
> log_timezone | localtime | configuration file
>
> logging_collector | on | 

Re: [PERFORM] Very poor read performance, query independent

2017-07-20 Thread Charles Nadeau
Mark,

I received yesterday a second server having 16 drive bays. Just for a
quick trial, I used 2 old 60GB SSDs (a Kingston V300 and an ADATA SP900) to
build a RAID0. To my surprise, my very picky RAID controller (HP P410i)
recognised them without a fuss (although as SATA II drives at 3Gb/s). A quick
fio benchmark gives me 22000 random 4k read IOPS, more than my 5 146GB 10k
SAS disks in RAID0. I moved my most frequently used index to this array
and will try to do some benchmarks.
Knowing that SSDs based on the SandForce-2281 controller are recognised by my
server, I may buy a pair of bigger/newer ones to put my tables on.

Thanks!

Charles

On Sat, Jul 15, 2017 at 1:57 AM, Mark Kirkwood <
mark.kirkw...@catalyst.net.nz> wrote:

> Thinking about this a bit more - if somewhat more blazing performance is
> needed, then this could be achieved via losing the RAID card and spinning
> disks altogether and buying 1 of the NVME or SATA solid state products: e.g
>
> - Samsung 960 Pro or Evo 2 TB (approx 1 or 2 GB/s seq scan speeds and 200K
> IOPS)
>
> - Intel S3610 or similar 1.2 TB (500 MB/s seq scan and 30K IOPS)
>
>
> The Samsung needs an M.2 port on the mobo (but most should have 'em - and
> if not PCIe X4 adapter cards are quite cheap). The Intel is a bit more
> expensive compared to the Samsung, and is slower but has a longer lifetime.
> However for your workload the Sammy is probably fine.
>
> regards
>
> Mark
>
> On 15/07/17 11:09, Mark Kirkwood wrote:
>
>> Ah yes - that seems more sensible (but still slower than I would expect
>> for 5 disks RAID 0).
>>
>
>
>



-- 
Charles Nadeau Ph.D.
http://charlesnadeau.blogspot.com/


Re: [PERFORM] Very poor read performance, query independent

2017-07-19 Thread Charles Nadeau
Justin,

Thanks for the extensive reading list, very instructive.

After reading
https://blog.jcole.us/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/
I was thinking that it could be a NUMA/THP-related problem.
Turning off THP solved the "swap storm" problem. Some queries are even 40%
faster with THP off.
Also turning off KSM improved performance by another 5%.
I was seriously worried about this issue, as we received another server
with 144GB of RAM today.
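For reference, the runtime switches involved are along these lines - a
sketch; the THP path varies between distributions, and none of this persists
across reboots without an init script or sysctl entry:

# disable transparent huge pages (path on mainline kernels)
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag
# stop KSM and un-merge already-shared pages (per Justin's suggestion below)
echo 2 | sudo tee /sys/kernel/mm/ksm/run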

I will try to post a little summary of all the suggestions I received via
this thread later this week/early next week.

Thanks!

Charles

On Tue, Jul 18, 2017 at 8:01 PM, Justin Pryzby  wrote:

> On Tue, Jul 18, 2017 at 02:13:58PM -0300, Claudio Freire wrote:
> > On Tue, Jul 18, 2017 at 1:01 PM, Claudio Freire 
> wrote:
> > > On Tue, Jul 18, 2017 at 6:20 AM, Charles Nadeau
> > >  wrote:
> > >> Claudio,
> > >>
> > >> At one moment
> > >> during the query, there is a write storm to the swap drive (a bit
> like this
> > >> case:
> > >> https://www.postgresql.org/message-id/AANLkTi%3Diw4fC2RgTxhw0aGpyXANhOT%3DXBnjLU1_v6PdA%40mail.gmail.com).
> > >> I can hardly explain it as there is plenty of memory on this server.
> > >
> > > That sounds a lot like NUMA zone_reclaim issues:
> > >
> > > https://www.postgresql.org/message-id/500616cb.3070...@2ndquadrant.com
> >
> > I realize you have zone_reclaim_mode set to 0. Still, the symptoms are
> > eerily similar.
>
> Did you look at disabling KSM and/or THP ?
>
> sudo sh -c 'echo 2 >/sys/kernel/mm/ksm/run'
>
> https://www.postgresql.org/message-id/20170524155855.GH31097%40telsasoft.com
> https://www.postgresql.org/message-id/CANQNgOrD02f8mR3Y8Pi=zfsol14rqnqa8hwz1r4rsndlr1b...@mail.gmail.com
> https://www.postgresql.org/message-id/CAHyXU0y9hviyKWvQZxX5UWfH9M2LYvwvAOPQ_DUPva2b71t12g%40mail.gmail.com
> https://www.postgresql.org/message-id/20130716195834.8fe5c79249cb2ff0d4270...@yahoo.es
> https://www.postgresql.org/message-id/CAE_gQfW3dBiELcOppYN6v%3D8%2B%2BpEeywD7iXGw-OT3doB8SXO4_A%40mail.gmail.com
> https://www.postgresql.org/message-id/flat/1436268563235-5856914.post%40n5.nabble.com#1436268563235-5856914.p...@n5.nabble.com
> https://www.postgresql.org/message-id/CAL_0b1tJOZCx3Lo3Eve1RqGaT%2BJJ_q7w4pkj87wfwwxbtug...@mail.gmail.com
> https://www.postgresql.org/message-id/556e2068.7070...@vuole.me
> https://www.postgresql.org/message-id/1415981309.90631.YahooMailNeo%40web133205.mail.ir2.yahoo.com
> https://www.postgresql.org/message-id/CAHyXU0yXYpCXN4%3D81ZDRQu-oGzrcq2qNAXDpyz4oiQPPAGk4ew%40mail.gmail.com
> https://www.pythian.com/blog/performance-tuning-hugepages-in-linux/
> http://structureddata.org/2012/06/18/linux-6-transparent-huge-pages-and-hadoop-workloads/
>
> Justin
>



-- 
Charles Nadeau Ph.D.
http://charlesnadeau.blogspot.com/


Re: [PERFORM] Very poor read performance, query independent

2017-07-18 Thread Scott Marlowe
On Tue, Jul 18, 2017 at 3:20 AM, Charles Nadeau
 wrote:
> Claudio,
>
> Find attached the iostat measured while redoing the query above
> (iostat1.txt). sda holds my temp directory (noop i/o scheduler), sdb the
> swap partition (cfq i/o scheduler) only and sdc (5 disks RAID0, noop i/o
> scheduler) holds the data. I didn't pay attention to the load caused by 12
> parallel scans as I thought the RAID card would be smart enough to
> re-arrange the read requests optimally regardless of the load. At one moment
> during the query, there is a write storm to the swap drive (a bit like this
> case:
> https://www.postgresql.org/message-id/AANLkTi%3Diw4fC2RgTxhw0aGpyXANhOT%3DXBnjLU1_v6PdA%40mail.gmail.com).

My experience from that case (and a few more) has led me to believe
that Linux database servers with plenty of memory should have their
swap turned off. The Linux kernel works hard to swap out little-used
memory to make more space for caching active data.

Problem is that whatever decides to swap stuff out gets stupid when
presented with 512GB of RAM and starts swapping out things like sys V
shared_buffers etc.

Here's the thing: either your memory is big enough to buffer your
whole data set, in which case nothing should get swapped out to make
room for caching, or your dataset is much bigger than memory, in which
case making more room gains very little if it comes at the cost of
waiting for stuff you need to get read back in.

Linux servers should also have zone reclaim turned off, and THP disabled.

Try running "sudo swapoff -a" and see if it gets rid of your swap storms.
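In command form, checking and applying that looks something like this (a
sketch; vm.zone_reclaim_mode is the standard sysctl, and swapoff can take a
while if much memory is already swapped out):

sysctl vm.zone_reclaim_mode    # want 0
swapon --summary               # lists the active swap devices
sudo swapoff -a                # drop swap entirely
sudo sysctl -w vm.zone_reclaim_mode=0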




Re: [PERFORM] Very poor read performance, query independent

2017-07-18 Thread Justin Pryzby
On Tue, Jul 18, 2017 at 02:13:58PM -0300, Claudio Freire wrote:
> On Tue, Jul 18, 2017 at 1:01 PM, Claudio Freire  
> wrote:
> > On Tue, Jul 18, 2017 at 6:20 AM, Charles Nadeau
> >  wrote:
> >> Claudio,
> >>
> >> At one moment
> >> during the query, there is a write storm to the swap drive (a bit like this
> >> case:
> >> https://www.postgresql.org/message-id/AANLkTi%3Diw4fC2RgTxhw0aGpyXANhOT%3DXBnjLU1_v6PdA%40mail.gmail.com).
> >> I can hardly explain it as there is plenty of memory on this server.
> >
> > That sounds a lot like NUMA zone_reclaim issues:
> >
> > https://www.postgresql.org/message-id/500616cb.3070...@2ndquadrant.com
> 
> I realize you have zone_reclaim_mode set to 0. Still, the symptoms are
> eerily similar.

Did you look at disabling KSM and/or THP ?

sudo sh -c 'echo 2 >/sys/kernel/mm/ksm/run'

https://www.postgresql.org/message-id/20170524155855.GH31097%40telsasoft.com
https://www.postgresql.org/message-id/CANQNgOrD02f8mR3Y8Pi=zfsol14rqnqa8hwz1r4rsndlr1b...@mail.gmail.com
https://www.postgresql.org/message-id/CAHyXU0y9hviyKWvQZxX5UWfH9M2LYvwvAOPQ_DUPva2b71t12g%40mail.gmail.com
https://www.postgresql.org/message-id/20130716195834.8fe5c79249cb2ff0d4270...@yahoo.es
https://www.postgresql.org/message-id/CAE_gQfW3dBiELcOppYN6v%3D8%2B%2BpEeywD7iXGw-OT3doB8SXO4_A%40mail.gmail.com
https://www.postgresql.org/message-id/flat/1436268563235-5856914.post%40n5.nabble.com#1436268563235-5856914.p...@n5.nabble.com
https://www.postgresql.org/message-id/cal_0b1tjozcx3lo3eve1rqgat%2bjj_q7w4pkj87wfwwxbtug...@mail.gmail.com
https://www.postgresql.org/message-id/556e2068.7070...@vuole.me
https://www.postgresql.org/message-id/1415981309.90631.YahooMailNeo%40web133205.mail.ir2.yahoo.com
https://www.postgresql.org/message-id/CAHyXU0yXYpCXN4%3D81ZDRQu-oGzrcq2qNAXDpyz4oiQPPAGk4ew%40mail.gmail.com
https://www.pythian.com/blog/performance-tuning-hugepages-in-linux/
http://structureddata.org/2012/06/18/linux-6-transparent-huge-pages-and-hadoop-workloads/

Justin




Re: [PERFORM] Very poor read performance, query independent

2017-07-18 Thread Claudio Freire
On Tue, Jul 18, 2017 at 1:01 PM, Claudio Freire  wrote:
> On Tue, Jul 18, 2017 at 6:20 AM, Charles Nadeau
>  wrote:
>> Claudio,
>>
>> At one moment
>> during the query, there is a write storm to the swap drive (a bit like this
>> case:
>> https://www.postgresql.org/message-id/AANLkTi%3Diw4fC2RgTxhw0aGpyXANhOT%3DXBnjLU1_v6PdA%40mail.gmail.com).
>> I can hardly explain it as there is plenty of memory on this server.
>
> That sounds a lot like NUMA zone_reclaim issues:
>
> https://www.postgresql.org/message-id/500616cb.3070...@2ndquadrant.com

I realize you have zone_reclaim_mode set to 0. Still, the symptoms are
eerily similar.




Re: [PERFORM] Very poor read performance, query independent

2017-07-18 Thread Claudio Freire
On Tue, Jul 18, 2017 at 6:20 AM, Charles Nadeau
 wrote:
> Claudio,
>
> At one moment
> during the query, there is a write storm to the swap drive (a bit like this
> case:
> https://www.postgresql.org/message-id/AANLkTi%3Diw4fC2RgTxhw0aGpyXANhOT%3DXBnjLU1_v6PdA%40mail.gmail.com).
> I can hardly explain it as there is plenty of memory on this server.

That sounds a lot like NUMA zone_reclaim issues:

https://www.postgresql.org/message-id/500616cb.3070...@2ndquadrant.com




Re: [PERFORM] Very poor read performance, query independent

2017-07-18 Thread Charles Nadeau
Claudio,

Find attached the iostat measured while redoing the query above
(iostat1.txt). sda holds my temp directory (noop I/O scheduler), sdb holds
only the swap partition (cfq I/O scheduler) and sdc (5-disk RAID0, noop I/O
scheduler) holds the data. I didn't pay attention to the load caused by 12
parallel scans, as I thought the RAID card would be smart enough to
rearrange the read requests optimally regardless of the load. At one
moment during the query, there is a write storm to the swap drive (a bit
like this case:
https://www.postgresql.org/message-id/AANLkTi%3Diw4fC2RgTxhw0aGpyXANhOT%3DXBnjLU1_v6PdA%40mail.gmail.com).
I can hardly explain it, as there is plenty of memory on this server. The
execution time of the query was 4801.1s (about 1h20min).
I reduced max_parallel_workers_per_gather to 2 and max_parallel_workers to
3, restarted PostgreSQL, then ran the query again while running iostat
(iostat2.txt): the query ran much faster, 1992.8s (about 33min) instead of
4801.1s (about 1h20min), and the swap storm is gone! You were right about
max_parallel_workers_per_gather!!
For the last test, I changed the scheduler on sdc to deadline (iostat3.txt),
keeping max_parallel_workers_per_gather=2 and max_parallel_workers=3, then
restarted PostgreSQL. The execution time is almost the same: 1938.7s vs.
1992.8s for the noop scheduler.
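For anyone repeating this, switching the scheduler is a single sysfs write -
a sketch; it takes effect immediately for the device but does not survive a
reboot:

cat /sys/block/sdc/queue/scheduler            # e.g. "[noop] deadline cfq"
echo deadline | sudo tee /sys/block/sdc/queue/scheduler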

Thanks a lot for the suggestion, I'll keep my number of workers low to make
sure I maximize my array usage.

Charles

On Mon, Jul 17, 2017 at 10:56 PM, Claudio Freire wrote:

> On Fri, Jul 14, 2017 at 12:34 PM, Charles Nadeau
>  wrote:
> >  Workers Planned: 12
> >  Workers Launched: 12
> >  Buffers: shared hit=728798037
> read=82974833
> >  ->  Hash Semi Join
> > (cost=30059688.07..47951761.31 rows=220376 width=20) (actual
> > time=1268845.181..2007864.725 rows=7057357 loops=13)
> >Hash Cond: (flows.dstaddr =
> > flows_1.dstaddr)
> >Buffers: shared hit=728795193
> > read=82974833
> >->  Nested Loop
> > (cost=0.03..17891246.86 rows=220376 width=20) (actual
> time=0.207..723790.283
> > rows=37910370 loops=13)
> >  Buffers: shared
> hit=590692229
> > read=14991777
> >  ->  Parallel Seq Scan on
> flows
> > (cost=0.00..16018049.14 rows=55094048 width=20) (actual
> > time=0.152..566179.117 rows=45371630 loops=13)
> >Buffers: shared
> > hit=860990 read=14991777
> >  ->  Index Only Scan using
> > mynetworks_ipaddr_idx on mynetworks  (cost=0.03..0.03 rows=1 width=8)
> > (actual time=0.002..0.002 rows=1 loops=589831190)
> >Index Cond: (ipaddr
> >>=
> > (flows.srcaddr)::ip4r)
> >Heap Fetches: 0
> >Buffers: shared
> > hit=589831203
>
> 12 workers on a parallel sequential scan on a RAID-10 volume of
> rotating disks may not be a good idea.
>
> Have you measured average request size and average wait times with iostat?
>
> Run "iostat -x -m -d 60" while running the query and copy a few
> relevant lines (or attach the whole thing). I suspect 12 parallel
> sequential scans are degrading your array's performance to random I/O
> performance, and that explains the 10MB/s very well (a rotating disk
> will give you about 3-4MB/s at random I/O, and you've got 2 mirrors on
> that array).
>
> You could try setting the max_parallel_workers_per_gather to 2, which
> should be the optimum allocation for your I/O layout.
>
> You might also want to test switching to the deadline scheduler. While
> the controller may get more aggregate throughput rearranging your I/O
> requests, high I/O latency will severely reduce postgres' ability to
> saturate the I/O system itself, and deadline tends to minimize
> latency. I've had good results in the past using deadline, but take
> this suggestion with a grain of salt, YMMV.
>



-- 
Charles Nadeau Ph.D.
http://charlesnadeau.blogspot.com/
charles@hpdl380g6:~$ iostat -x -m -d 60
Linux 4.4.0-86-generic (hpdl380g6)  2017-07-18  _x86_64_(16 CPU)

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               1.45    15.24    3.85    4.91     0.68     0.47   269.64     3.15  351.06  501.22  233.27   4.84   4.24
sdb               1.52   635.12   67.38   10.80     0.27     2.54    73.58     7.66   97.90   28.43  531.24   2.26  17.65
sdc               0.08     0.01   47.90    0.01    15.62     0.00   667.83     0.97   20.30   20.30    6.89   3.05  14.62

Device: 

Re: [PERFORM] Very poor read performance, query independent

2017-07-17 Thread Claudio Freire
On Fri, Jul 14, 2017 at 12:34 PM, Charles Nadeau
 wrote:
>  Workers Planned: 12
>  Workers Launched: 12
>  Buffers: shared hit=728798037 read=82974833
>  ->  Hash Semi Join
> (cost=30059688.07..47951761.31 rows=220376 width=20) (actual
> time=1268845.181..2007864.725 rows=7057357 loops=13)
>Hash Cond: (flows.dstaddr =
> flows_1.dstaddr)
>Buffers: shared hit=728795193
> read=82974833
>->  Nested Loop
> (cost=0.03..17891246.86 rows=220376 width=20) (actual time=0.207..723790.283
> rows=37910370 loops=13)
>  Buffers: shared hit=590692229
> read=14991777
>  ->  Parallel Seq Scan on flows
> (cost=0.00..16018049.14 rows=55094048 width=20) (actual
> time=0.152..566179.117 rows=45371630 loops=13)
>Buffers: shared
> hit=860990 read=14991777
>  ->  Index Only Scan using
> mynetworks_ipaddr_idx on mynetworks  (cost=0.03..0.03 rows=1 width=8)
> (actual time=0.002..0.002 rows=1 loops=589831190)
>Index Cond: (ipaddr >>=
> (flows.srcaddr)::ip4r)
>Heap Fetches: 0
>Buffers: shared
> hit=589831203

12 workers on a parallel sequential scan on a RAID-10 volume of
rotating disks may not be a good idea.

Have you measured average request size and average wait times with iostat?

Run "iostat -x -m -d 60" while running the query and copy a few
relevant lines (or attach the whole thing). I suspect 12 parallel
sequential scans are degrading your array's performance to random I/O
performance, and that explains the 10MB/s very well (a rotating disk
will give you about 3-4MB/s at random I/O, and you've got 2 mirrors on
that array).

You could try setting the max_parallel_workers_per_gather to 2, which
should be the optimum allocation for your I/O layout.

You might also want to test switching to the deadline scheduler. While
the controller may get more aggregate throughput rearranging your I/O
requests, high I/O latency will severely reduce postgres' ability to
saturate the I/O system itself, and deadline tends to minimize
latency. I've had good results in the past using deadline, but take
this suggestion with a grain of salt, YMMV.




Re: [PERFORM] Very poor read performance, query independent

2017-07-17 Thread Charles Nadeau
Igor,

The 1st clause of the WHERE statement won't select addresses the same way
as the one I wrote using the extension for IPv4 and IPv6 data types.

flowstimestamp is a view:
flows=# \d+ flowstimestamp
View "public.flowstimestamp"
   Column|   Type   | Modifiers | Storage | Description
-+--+---+-+-
 flow_id | bigint   |   | plain   |
 unix_secs   | bigint   |   | plain   |
 unix_nsecs  | bigint   |   | plain   |
 sysuptime   | bigint   |   | plain   |
 exaddr  | ip4  |   | plain   |
 dpkts   | integer  |   | plain   |
 doctets | bigint   |   | plain   |
 first   | bigint   |   | plain   |
 last| bigint   |   | plain   |
 engine_type | smallint |   | plain   |
 engine_id   | smallint |   | plain   |
 srcaddr | ip4  |   | plain   |
 dstaddr | ip4  |   | plain   |
 nexthop | ip4  |   | plain   |
 input   | integer  |   | plain   |
 output  | integer  |   | plain   |
 srcport | integer  |   | plain   |
 dstport | integer  |   | plain   |
 prot| smallint |   | plain   |
 tos | smallint |   | plain   |
 tcp_flags   | smallint |   | plain   |
 src_mask| smallint |   | plain   |
 dst_mask| smallint |   | plain   |
 src_as  | integer  |   | plain   |
 dst_as  | integer  |   | plain   |
 timestamp   | timestamp with time zone |   | plain   |
View definition:
 SELECT flows.flow_id,
flows.unix_secs,
flows.unix_nsecs,
flows.sysuptime,
flows.exaddr,
flows.dpkts,
flows.doctets,
flows.first,
flows.last,
flows.engine_type,
flows.engine_id,
flows.srcaddr,
flows.dstaddr,
flows.nexthop,
flows.input,
flows.output,
flows.srcport,
flows.dstport,
flows.prot,
flows.tos,
flows.tcp_flags,
flows.src_mask,
flows.dst_mask,
flows.src_as,
flows.dst_as,
to_timestamp((flows.unix_secs + flows.unix_nsecs / 10)::double
precision) AS "timestamp"
   FROM flows;

And it can use the indexes of flows:
Indexes:
"flows_pkey" PRIMARY KEY, btree (flow_id)
"flows_dstaddr_dstport" btree (dstaddr, dstport)
"flows_srcaddr_dstaddr_idx" btree (srcaddr, dstaddr)
"flows_srcaddr_srcport" btree (srcaddr, srcport)
"flows_srcport_dstport_idx" btree (srcport, dstport)

Thanks!

Charles

On Fri, Jul 14, 2017 at 10:18 PM, Igor Neyman <iney...@perceptron.com>
wrote:

>
>
>
>
> *From:* pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-
> ow...@postgresql.org] *On Behalf Of *Igor Neyman
> *Sent:* Friday, July 14, 2017 3:13 PM
> *To:* Charles Nadeau <charles.nad...@gmail.com>
>
> *Cc:* Jeff Janes <jeff.ja...@gmail.com>; pgsql-performance@postgresql.org
> *Subject:* Re: [PERFORM] Very poor read performance, query independent
>
>
>
> *From:* Charles Nadeau [mailto:charles.nad...@gmail.com
> <charles.nad...@gmail.com>]
> *Sent:* Friday, July 14, 2017 11:35 AM
> *To:* Igor Neyman <iney...@perceptron.com>
> *Cc:* Jeff Janes <jeff.ja...@gmail.com>; pgsql-performance@postgresql.org
> *Subject:* Re: [PERFORM] Very poor read performance, query independent
>
>
>
> Igor,
>
>
>
> Initially temp_buffer was left to its default value (8MB). Watching the
> content of the directory that stores the temporary files, I found that I
> need at most 21GB of temporary files space. Should I set temp_buffer to
> 21GB?
>
> Here is the explain you requested with work_mem set to 6GB:
>
> flows=# set work_mem='6GB';
>
> SET
>
> flows=# explain (analyze, buffers) SELECT DISTINCT
>
>srcaddr,
>
>dstaddr,
>
>dstport,
>
>COUNT(*) AS conversation,
>
>SUM(doctets) / 1024 / 1024 AS mbytes
>
> FROM
>
>flowscompact,
>
>mynetworks
>
> WHERE
>
>mynetworks.ipaddr >>= flowscompact.srcaddr
>
>AND dstaddr IN
>
>(
>
>   SELECT
>
>  dstaddr
>
>   FROM
>
>  dstexterne
>
>)
>
> GROUP BY
>
>srcaddr,
>
>dstaddr,
>
>dstport
>
> ORD

Re: [PERFORM] Very poor read performance, query independent

2017-07-16 Thread Charles Nadeau
Scott,

The temp tablespace is on a disk of its own.
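For reference, a dedicated temp tablespace can be set up along these lines -
a sketch; the directory and tablespace name are hypothetical, and the
directory must already exist and be owned by the postgres user:

CREATE TABLESPACE temp_ssd LOCATION '/mnt/ssdtemp/pgsql';
ALTER SYSTEM SET temp_tablespaces = 'temp_ssd';
SELECT pg_reload_conf();  -- temp_tablespaces takes effect on reload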
Thanks!

Charles

On Sat, Jul 15, 2017 at 7:58 PM, Scott Marlowe wrote:

> On Sat, Jul 15, 2017 at 11:53 AM, Charles Nadeau
>  wrote:
> > Mark,
> >
> > I increased the read ahead to 16384 and it doesn't improve performance. My
> > RAID 0 uses a stripe size of 256k, the maximum size supported by the
> > controller.
>
> Are your queries still spilling to disk for sorts? If this is the
> case, and they're just too big to fit in memory, then you need to move
> your temp space, where sorts happen, onto another disk array that
> isn't your poor overworked raid-10 array. Contention between sorts and
> reads can kill performance quick, esp on spinning rust.
>



-- 
Charles Nadeau Ph.D.
http://charlesnadeau.blogspot.com/


Re: [PERFORM] Very poor read performance, query independent

2017-07-16 Thread Charles Nadeau
Igor,

I set work_mem to 12GB, restarted PostgreSQL, repeated the same "explain
(analyze, buffers)..." as above, and the read throughput was very low, at
most 10MB/s. All the sorting operations are now done in memory.
I lowered work_mem back to 6GB, restarted PostgreSQL, repeated the same
"explain (analyze, buffers)..." as above, and the read throughput was very
low, at most 10MB/s. The 1st sorting operation is still done in memory; the
second one spills to disk. I think I need about 4GB to do all sorts in memory.
One thing I remember from Friday's "explain (analyze, buffers)...": I set
temp_buffers and work_mem to 6GB as I read both your messages one after the
other. So I decided to try again: I then set work_mem=6GB,
temp_buffers=6GB, restarted PostgreSQL, repeated the same "explain (analyze,
buffers)..." as above, and the read throughput was very low again, at most
10MB/s. The 1st sorting operation is still done in memory, the second one
spills to disk.
For the last test, I brought work_mem and temp_buffers back to their
original values. The read throughput is still low.
In all cases, about 20 minutes after the query starts, it starts writing to
disk furiously. Here are the peak values as reported by dstat:
work_mem=12GB, temp_buffers=8MB: peak of 393MB/s
work_mem=6GB, temp_buffers=8MB: peak of 579MB/s
work_mem=6GB, temp_buffers=6GB: peak of 418MB/s
work_mem=1468006kB, temp_buffers=8MB: peak of 61MB/s
Also, at peak write, the server almost ran out of memory: the cache almost
went to 0 and it started swapping.

This query is a bit extreme in terms of sorting. Maybe I should try to
benchmark while counting all the records of my biggest table, like Mark
Kirkwood suggested. I'll do some more tests and post the results back to
the mailing list.
Thanks!

Charles

On Fri, Jul 14, 2017 at 9:13 PM, Igor Neyman <iney...@perceptron.com> wrote:

> *From:* Charles Nadeau [mailto:charles.nad...@gmail.com]
> *Sent:* Friday, July 14, 2017 11:35 AM
> *To:* Igor Neyman <iney...@perceptron.com>
> *Cc:* Jeff Janes <jeff.ja...@gmail.com>; pgsql-performance@postgresql.org
> *Subject:* Re: [PERFORM] Very poor read performance, query independent
>
>
>
> Igor,
>
>
>
> Initially temp_buffer was left to its default value (8MB). Watching the
> content of the directory that stores the temporary files, I found that I
> need at most 21GB of temporary files space. Should I set temp_buffer to
> 21GB?
>
> Here is the explain you requested with work_mem set to 6GB:
>
> flows=# set work_mem='6GB';
>
> SET
>
> flows=# explain (analyze, buffers) SELECT DISTINCT
>
>srcaddr,
>
>dstaddr,
>
>dstport,
>
>COUNT(*) AS conversation,
>
>SUM(doctets) / 1024 / 1024 AS mbytes
>
> FROM
>
>flowscompact,
>
>mynetworks
>
> WHERE
>
>mynetworks.ipaddr >>= flowscompact.srcaddr
>
>AND dstaddr IN
>
>(
>
>   SELECT
>
>  dstaddr
>
>   FROM
>
>  dstexterne
>
>)
>
> GROUP BY
>
>srcaddr,
>
>dstaddr,
>
>dstport
>
> ORDER BY
>
>mbytes DESC LIMIT 50;
>
>
>  QUERY PLAN
>
>
> 
> 
> 
>
>  Limit  (cost=48135680.07..48135680.22 rows=50 width=52) (actual
> time=2227678.196..2227678.223 rows=50 loops=1)
>
>Buffers: shared hit=728798038 read=82974833, temp read=381154
> written=381154
>
>->  Unique  (cost=48135680.07..48143613.62 rows=2644514 width=52)
> (actual time=2227678.194..2227678.217 rows=50 loops=1)
>
>  Buffers: shared hit=728798038 read=82974833, temp read=381154
> written=381154
>
>  ->  Sort  (cost=48135680.07..48137002.33 rows=2644514 width=52)
> (actual time=2227678.192..2227678.202 rows=50 loops=1)
>
>Sort Key: (((sum(flows.doctets) / '1024'::numeric) /
> '1024'::numeric)) DESC, flows.srcaddr, flows.dstaddr, flows.dstport,
> (count(*))
>
>Sort Method: quicksort  Memory: 654395kB
>
>Buffers: shared hit=728798038 read=82974833, temp
> read=381154 written=381154
>
>->  GroupAggregate  (cost=48059426.65..48079260.50
> rows=2644514 width=52) (actual time=2167909.030..2211446.192 rows=5859671
> loops=1)
>
>  Group Key: flows.srcaddr, flows.dstaddr, flows.dstport
>
>  Buffers: shared hit=728798038 read=82974833, temp
> read=381154 written=381154
>
>  ->  Sort  (cost=48059426.65..48060748.90
> rows=2644514 width=20) (actual time=2167896.815..218

Re: [PERFORM] Very poor read performance, query independent

2017-07-15 Thread Mark Kirkwood
Right, that is a bit of a show stopper for those SSDs (the Intel needs 
SATA 6Gb/s and the Sammys need PCIe 3.0 to perform to their rated specs).


regards

Mark


On 16/07/17 04:12, Charles Nadeau wrote:

Mark,

The server is a HP DL380 G6. It doesn't really work with SATA drives. And 
when you find one that is compatible, it is only used at 3Gb/s with a 
maximum of 5 IOPS (a well-known characteristic of the HP P410i SAS 
RAID controller). I am looking at getting a Kingston Digital HyperX 
Predator that I could use in one of the PCIe 2.0 x4 slots. However I am 
worried about "thermal runaway", i.e. when the server can't get a 
temperature reading from a PCIe card, it spins the fans at full speed 
to protect the server against high temperature. The machine being next 
to my desk, I worry about the deafening noise it will create.

Thanks!

Charles

On Sat, Jul 15, 2017 at 1:57 AM, Mark Kirkwood wrote:


Thinking about this a bit more - if somewhat more blazing
performance is needed, then this could be achieved via losing the
RAID card and spinning disks altogether and buying 1 of the NVME
or SATA solid state products: e.g

- Samsung 960 Pro or Evo 2 TB (approx 1 or 2 GB/s seq scan speeds
and 200K IOPS)

- Intel S3610 or similar 1.2 TB (500 MB/s seq scan and 30K IOPS)


The Samsung needs an M.2 port on the mobo (but most should have
'em - and if not PCIe X4 adapter cards are quite cheap). The Intel
is a bit more expensive compared to the Samsung, and is slower but
has a longer lifetime. However for your workload the Sammy is
probably fine.

regards

Mark

On 15/07/17 11:09, Mark Kirkwood wrote:

Ah yes - that seems more sensible (but still slower than I
would expect for 5 disks RAID 0).









--
Charles Nadeau Ph.D.
http://charlesnadeau.blogspot.com/






Re: [PERFORM] Very poor read performance, query independent

2017-07-15 Thread Scott Marlowe
On Sat, Jul 15, 2017 at 11:53 AM, Charles Nadeau
 wrote:
> Mark,
>
> I increased the read ahead to 16384 and it doesn't improve performance. My
> RAID 0 uses a stripe size of 256k, the maximum size supported by the
> controller.

Are your queries still spilling to disk for sorts? If this is the
case, and they're just too big to fit in memory, then you need to move
your temp space, where sorts happen, onto another disk array that
isn't your poor overworked raid-10 array. Contention between sorts and
reads can kill performance quick, esp on spinning rust.




Re: [PERFORM] Very poor read performance, query independent

2017-07-15 Thread Charles Nadeau
Mark,

I increased the read ahead to 16384 and it doesn't improve performance. My
RAID 0 uses a stripe size of 256k, the maximum size supported by the
controller.
Thanks!

Charles

On Sat, Jul 15, 2017 at 1:02 AM, Mark Kirkwood <
mark.kirkw...@catalyst.net.nz> wrote:

> Ah yes - that seems more sensible (but still slower than I would expect
> for 5 disks RAID 0). You should be able to get something like 5 * (single
> disk speed) i.e about 500MB/s.
>
> Might be worth increasing device read ahead (more than you have already).
> Some of these so-called 'smart' RAID cards need to be hit over the head
> before they will perform. E.g: I believe you have it set to 128 - I'd try
> 4096 or even 16384 (In the past I've used those settings on some extremely
> stupid cards that refused to max out their disks known speeds).
>
> Also worth investigating is RAID stripe size - for DW work it makes sense
> for it to be reasonably big (256K to 1M), which again will help speed up
> sequential scans.
>
> Cheers
>
> Mark
>
> P.s I used to work for Greenplum, so this type of problem came up a lot
> :-) . The best cards were the LSI and Areca!
>
>
>
> On 15/07/17 02:09, Charles Nadeau wrote:
>
>> Mark,
>>
>> First I must say that I changed my disks configuration from 4 disks in
>> RAID 10 to 5 disks in RAID 0 because I almost ran out of disk space during
>> the last ingest of data.
>> Here is the result test you asked. It was done with a cold cache:
>>
>> flows=# \timing
>> Timing is on.
>> flows=# explain select count(*) from flows;
>>   QUERY PLAN
>> 
>> ---
>>  Finalize Aggregate  (cost=17214914.09..17214914.09 rows=1 width=8)
>>->  Gather  (cost=17214914.07..17214914.09 rows=1 width=8)
>>  Workers Planned: 1
>>  ->  Partial Aggregate  (cost=17213914.07..17213914.07
>> rows=1 width=8)
>>->  Parallel Seq Scan on flows
>>  (cost=0.00..17019464.49 rows=388899162 width=0)
>> (5 rows)
>>
>> Time: 171.835 ms
>> flows=# select pg_relation_size('flows');
>>  pg_relation_size
>> --
>>  129865867264
>> (1 row)
>>
>> Time: 57.157 ms
>> flows=# select count(*) from flows;
>> LOG:  duration: 625546.522 ms  statement: select count(*) from flows;
>>count
>> ---
>>  589831190
>> (1 row)
>>
>> Time: 625546.662 ms
>>
>> The throughput reported by Postgresql is almost 198MB/s, and the
>> throughput as measured by dstat during the query execution was between 25
>> and 299MB/s. It is much better than what I had before! The i/o wait was
>> about 12% all through the query. One thing I noticed is the discrepancy
>> between the read throughput reported by pg_activity and the one reported by
>> dstat: pg_activity always reports a value lower than dstat.
>>
>> Besides the change of disks configuration, here is what contributed the
>> most to the improvement of the performance so far:
>>
>> Using Hugepage
>> Increasing effective_io_concurrency to 256
>> Reducing random_page_cost from 22 to 4
>> Reducing min_parallel_relation_size to 512kB to have more workers
>> when doing sequential parallel scan of my biggest table
>>
>>
>> Thanks for recommending this test, I now know what the real throughput
>> should be!
>>
>> Charles
>>
>> On Wed, Jul 12, 2017 at 4:11 AM, Mark Kirkwood <
>> mark.kirkw...@catalyst.net.nz> wrote:
>>
>> Hmm - how are you measuring that sequential scan speed of 4MB/s?
>> I'd recommend doing a very simple test e.g, here's one on my
>> workstation - 13 GB single table on 1 SATA drive - cold cache
>> after reboot, sequential scan using Postgres 9.6.2:
>>
>> bench=#  EXPLAIN SELECT count(*) FROM pgbench_accounts;
>>  QUERY PLAN
>> 
>> 
>>  Aggregate  (cost=2889345.00..2889345.01 rows=1 width=8)
>>->  Seq Scan on pgbench_accounts (cost=0.00..2639345.00
>> rows=1 width=0)
>> (2 rows)
>>
>>
>> bench=#  SELECT pg_relation_size('pgbench_accounts');
>>  pg_relation_size
>> --
>>   13429514240
>> (1 row)
>>
>> bench=# SELECT count(*) FROM pgbench_accounts;
>>count
>> ---
>>  1
>> (1 row)
>>
>> Time: 118884.277 ms
>>
>>
>> So doing the math seq read speed is about 110MB/s (i.e 13 GB in
>> 120 sec). Sure enough, while I was running the query iostat showed:
>>
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00  926.00    0.00   114.89     0.00   254.10     1.90    2.03    2.03    0.00   1.08 100.00
>>
>>
>> 

Re: [PERFORM] Very poor read performance, query independent

2017-07-15 Thread Charles Nadeau
Mark,

The server is a HP DL380 G6. It doesn't really work with SATA drives. And
when you find one that is compatible, it is only used at 3Gb/s with a
maximum of 5 IOPS (a well-known characteristic of the HP P410i SAS RAID
controller). I am looking at getting a Kingston Digital HyperX Predator
that I could use in one of the PCIe 2.0 x4 slots. However I am worried about
"thermal runaway", i.e. when the server can't get a temperature reading
from a PCIe card, it spins the fans at full speed to protect the server
against high temperature. The machine being next to my desk, I worry about
the deafening noise it will create.
Thanks!

Charles

On Sat, Jul 15, 2017 at 1:57 AM, Mark Kirkwood <
mark.kirkw...@catalyst.net.nz> wrote:

> Thinking about this a bit more - if somewhat more blazing performance is
> needed, then this could be achieved via losing the RAID card and spinning
> disks altogether and buying 1 of the NVME or SATA solid state products: e.g
>
> - Samsung 960 Pro or Evo 2 TB (approx 1 or 2 GB/s seq scan speeds and 200K
> IOPS)
>
> - Intel S3610 or similar 1.2 TB (500 MB/s seq scan and 30K IOPS)
>
>
> The Samsung needs an M.2 port on the mobo (but most should have 'em - and
> if not PCIe X4 adapter cards are quite cheap). The Intel is a bit more
> expensive compared to the Samsung, and is slower but has a longer lifetime.
> However for your workload the Sammy is probably fine.
>
> regards
>
> Mark
>
> On 15/07/17 11:09, Mark Kirkwood wrote:
>
>> Ah yes - that seems more sensible (but still slower than I would expect
>> for 5 disks RAID 0).
>>
>
>
>



-- 
Charles Nadeau Ph.D.
http://charlesnadeau.blogspot.com/


Re: [PERFORM] Very poor read performance, query independent

2017-07-14 Thread Mark Kirkwood
Thinking about this a bit more - if somewhat more blazing performance is 
needed, then this could be achieved via losing the RAID card and 
spinning disks altogether and buying 1 of the NVME or SATA solid state 
products: e.g


- Samsung 960 Pro or Evo 2 TB (approx 1 or 2 GB/s seq scan speeds and 
200K IOPS)


- Intel S3610 or similar 1.2 TB (500 MB/s seq scan and 30K IOPS)


The Samsung needs an M.2 port on the mobo (but most should have 'em - 
and if not PCIe X4 adapter cards are quite cheap). The Intel is a bit 
more expensive compared to the Samsung, and is slower but has a longer 
lifetime. However for your workload the Sammy is probably fine.


regards

Mark

On 15/07/17 11:09, Mark Kirkwood wrote:
Ah yes - that seems more sensible (but still slower than I would 
expect for 5 disks RAID 0).






Re: [PERFORM] Very poor read performance, query independent

2017-07-14 Thread Mark Kirkwood
Ah yes - that seems more sensible (but still slower than I would expect 
for 5 disks RAID 0). You should be able to get something like 5 * 
(single disk speed), i.e. about 500MB/s.


Might be worth increasing device read ahead (more than you have 
already). Some of these so-called 'smart' RAID cards need to be hit over 
the head before they will perform. E.g. I believe you have it set to 128 
- I'd try 4096 or even 16384 (in the past I've used those settings on 
some extremely stupid cards that refused to max out their disks' known 
speeds).
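For reference, read-ahead can be inspected and changed like this (a sketch,
assuming the array is /dev/sdc; blockdev counts in 512-byte sectors, the
sysfs file in KB, and neither survives a reboot):

sudo blockdev --getra /dev/sdc           # current read-ahead, in 512-byte sectors
sudo blockdev --setra 16384 /dev/sdc     # 16384 sectors = 8MB
cat /sys/block/sdc/queue/read_ahead_kb   # the same knob, expressed in KB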


Also worth investigating is RAID stripe size - for DW work it makes 
sense for it to be reasonably big (256K to 1M), which again will help 
speed up sequential scans.


Cheers

Mark


On 15/07/17 02:09, Charles Nadeau wrote:

Mark,

First I must say that I changed my disks configuration from 4 disks in 
RAID 10 to 5 disks in RAID 0 because I almost ran out of disk space 
during the last ingest of data.

Here is the result test you asked. It was done with a cold cache:

flows=# \timing
Timing is on.
flows=# explain select count(*) from flows;
  QUERY PLAN

---
 Finalize Aggregate  (cost=17214914.09..17214914.09 rows=1 width=8)
   ->  Gather  (cost=17214914.07..17214914.09 rows=1 width=8)
 Workers Planned: 1
 ->  Partial Aggregate  (cost=17213914.07..17213914.07
rows=1 width=8)
   ->  Parallel Seq Scan on flows
 (cost=0.00..17019464.49 rows=388899162 width=0)
(5 rows)

Time: 171.835 ms
flows=# select pg_relation_size('flows');
 pg_relation_size
--
 129865867264
(1 row)

Time: 57.157 ms
flows=# select count(*) from flows;
LOG:  duration: 625546.522 ms  statement: select count(*) from flows;
   count
---
 589831190
(1 row)

Time: 625546.662 ms

The throughput reported by Postgresql is almost 198MB/s, and the 
throughput as measured by dstat during the query execution was between 
25 and 299MB/s. It is much better than what I had before! The i/o wait 
was about 12% all through the query. One thing I noticed is the 
discrepancy between the read throughput reported by pg_activity and 
the one reported by dstat: pg_activity always reports a value lower 
than dstat.


Besides the change of disks configuration, here is what contributed 
the most to the improvement of the performance so far:


Using Hugepages (see the sketch after this list)
Increasing effective_io_concurrency to 256
Reducing random_page_cost from 22 to 4
Reducing min_parallel_relation_size to 512kB to have more workers
when doing sequential parallel scan of my biggest table
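A minimal sketch of the huge pages part, assuming 2MB pages on x86_64 - the
page count is hypothetical and should be sized to cover shared_buffers:

# /etc/sysctl.conf
vm.nr_hugepages = 17000   # hypothetical; reserve a bit more than shared_buffers
# postgresql.conf
huge_pages = on           # 'on' fails at startup instead of silently falling back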


Thanks for recommending this test, I now know what the real throughput 
should be!


Charles

On Wed, Jul 12, 2017 at 4:11 AM, Mark Kirkwood wrote:


Hmm - how are you measuring that sequential scan speed of 4MB/s?
I'd recommend doing a very simple test e.g, here's one on my
workstation - 13 GB single table on 1 SATA drive - cold cache
after reboot, sequential scan using Postgres 9.6.2:

bench=#  EXPLAIN SELECT count(*) FROM pgbench_accounts;
 QUERY PLAN


 Aggregate  (cost=2889345.00..2889345.01 rows=1 width=8)
   ->  Seq Scan on pgbench_accounts (cost=0.00..2639345.00
rows=1 width=0)
(2 rows)


bench=#  SELECT pg_relation_size('pgbench_accounts');
 pg_relation_size
--
  13429514240
(1 row)

bench=# SELECT count(*) FROM pgbench_accounts;
   count
---
 1
(1 row)

Time: 118884.277 ms


So doing the math seq read speed is about 110MB/s (i.e 13 GB in
120 sec). Sure enough, while I was running the query iostat showed:

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00  926.00    0.00   114.89     0.00   254.10     1.90    2.03    2.03    0.00   1.08 100.00



So might be useful for us to see something like that from your
system - note you need to check you really have flushed the cache,
and that no other apps are using the db.

regards

Mark


On 12/07/17 00:46, Charles Nadeau wrote:

After reducing random_page_cost to 4 and testing more, I can
report that the aggregate read throughput for parallel
sequential scan is about 90MB/s. However the throughput for
sequential scan is still around 4MB/s.





--
Charles Nadeau Ph.D.
http://charlesnadeau.blogspot.com/






Re: [PERFORM] Very poor read performance, query independent

2017-07-14 Thread Igor Neyman


From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Igor Neyman
Sent: Friday, July 14, 2017 3:13 PM
To: Charles Nadeau <charles.nad...@gmail.com>
Cc: Jeff Janes <jeff.ja...@gmail.com>; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Very poor read performance, query independent

From: Charles Nadeau [mailto:charles.nad...@gmail.com]
Sent: Friday, July 14, 2017 11:35 AM
To: Igor Neyman <iney...@perceptron.com<mailto:iney...@perceptron.com>>
Cc: Jeff Janes <jeff.ja...@gmail.com<mailto:jeff.ja...@gmail.com>>; 
pgsql-performance@postgresql.org<mailto:pgsql-performance@postgresql.org>
Subject: Re: [PERFORM] Very poor read performance, query independent

Igor,

Initially temp_buffer was left to its default value (8MB). Watching the content 
of the directory that stores the temporary files, I found that I need at most 
21GB of temporary files space. Should I set temp_buffer to 21GB?
Here is the explain you requested with work_mem set to 6GB:
flows=# set work_mem='6GB';
SET
flows=# explain (analyze, buffers) SELECT DISTINCT
   srcaddr,
   dstaddr,
   dstport,
   COUNT(*) AS conversation,
   SUM(doctets) / 1024 / 1024 AS mbytes
FROM
   flowscompact,
   mynetworks
WHERE
   mynetworks.ipaddr >>= flowscompact.srcaddr
   AND dstaddr IN
   (
  SELECT
 dstaddr
  FROM
 dstexterne
   )
GROUP BY
   srcaddr,
   dstaddr,
   dstport
ORDER BY
   mbytes DESC LIMIT 50;

   QUERY PLAN

 Limit  (cost=48135680.07..48135680.22 rows=50 width=52) (actual 
time=2227678.196..2227678.223 rows=50 loops=1)
   Buffers: shared hit=728798038 read=82974833, temp read=381154 written=381154
   ->  Unique  (cost=48135680.07..48143613.62 rows=2644514 width=52) (actual 
time=2227678.194..2227678.217 rows=50 loops=1)
 Buffers: shared hit=728798038 read=82974833, temp read=381154 
written=381154
 ->  Sort  (cost=48135680.07..48137002.33 rows=2644514 width=52) 
(actual time=2227678.192..2227678.202 rows=50 loops=1)
   Sort Key: (((sum(flows.doctets) / '1024'::numeric) / 
'1024'::numeric)) DESC, flows.srcaddr, flows.dstaddr, flows.dstport, (count(*))
   Sort Method: quicksort  Memory: 654395kB
   Buffers: shared hit=728798038 read=82974833, temp read=381154 
written=381154
   ->  GroupAggregate  (cost=48059426.65..48079260.50 rows=2644514 
width=52) (actual time=2167909.030..2211446.192 rows=5859671 loops=1)
 Group Key: flows.srcaddr, flows.dstaddr, flows.dstport
 Buffers: shared hit=728798038 read=82974833, temp 
read=381154 written=381154
 ->  Sort  (cost=48059426.65..48060748.90 rows=2644514 
width=20) (actual time=2167896.815..2189107.205 rows=91745640 loops=1)
   Sort Key: flows.srcaddr, flows.dstaddr, flows.dstport
   Sort Method: external merge  Disk: 3049216kB
   Buffers: shared hit=728798038 read=82974833, temp 
read=381154 written=381154
   ->  Gather  (cost=30060688.07..48003007.07 
rows=2644514 width=20) (actual time=1268989.000..1991357.232 rows=91745640 
loops=1)
 Workers Planned: 12
 Workers Launched: 12
 Buffers: shared hit=728798037 read=82974833
 ->  Hash Semi Join  
(cost=30059688.07..47951761.31 rows=220376 width=20) (actual 
time=1268845.181..2007864.725 rows=7057357 loops=13)
   Hash Cond: (flows.dstaddr = 
flows_1.dstaddr)
   Buffers: shared hit=728795193 
read=82974833
   ->  Nested Loop  (cost=0.03..17891246.86 
rows=220376 width=20) (actual time=0.207..723790.283 rows=37910370 loops=13)
 Buffers: shared hit=590692229 
read=14991777
 ->  Parallel Seq Scan on flows  
(cost=0.00..16018049.14 rows=55094048 width=20) (actual time=0.152..566179.117 
rows=45371630 loops=13)
   Buffers: shared hit=860990 
read=14991777
 ->  Index Only Scan using 
mynetworks_ipaddr_idx on mynetworks  (cost=0.03..0.03 rows=1 width=8) (actual 
time=0.002..0.002 rows=1 loops=589831190)
   Index Cond: (ipaddr >>= 
(flows.srcaddr)::ip4r)
   Heap Fetches: 0

Re: [PERFORM] Very poor read performance, query independent

2017-07-14 Thread Igor Neyman
From: Charles Nadeau [mailto:charles.nad...@gmail.com]
Sent: Friday, July 14, 2017 11:35 AM
To: Igor Neyman <iney...@perceptron.com>
Cc: Jeff Janes <jeff.ja...@gmail.com>; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Very poor read performance, query independent

Igor,

Initially temp_buffer was left to its default value (8MB). Watching the content 
of the directory that stores the temporary files, I found that I need at most 
21GB of temporary files space. Should I set temp_buffer to 21GB?
Here is the explain you requested with work_mem set to 6GB:
flows=# set work_mem='6GB';
SET
flows=# explain (analyze, buffers) SELECT DISTINCT
   srcaddr,
   dstaddr,
   dstport,
   COUNT(*) AS conversation,
   SUM(doctets) / 1024 / 1024 AS mbytes
FROM
   flowscompact,
   mynetworks
WHERE
   mynetworks.ipaddr >>= flowscompact.srcaddr
   AND dstaddr IN
   (
  SELECT
 dstaddr
  FROM
 dstexterne
   )
GROUP BY
   srcaddr,
   dstaddr,
   dstport
ORDER BY
   mbytes DESC LIMIT 50;

   QUERY PLAN

 Limit  (cost=48135680.07..48135680.22 rows=50 width=52) (actual 
time=2227678.196..2227678.223 rows=50 loops=1)
   Buffers: shared hit=728798038 read=82974833, temp read=381154 written=381154
   ->  Unique  (cost=48135680.07..48143613.62 rows=2644514 width=52) (actual 
time=2227678.194..2227678.217 rows=50 loops=1)
 Buffers: shared hit=728798038 read=82974833, temp read=381154 
written=381154
 ->  Sort  (cost=48135680.07..48137002.33 rows=2644514 width=52) 
(actual time=2227678.192..2227678.202 rows=50 loops=1)
   Sort Key: (((sum(flows.doctets) / '1024'::numeric) / 
'1024'::numeric)) DESC, flows.srcaddr, flows.dstaddr, flows.dstport, (count(*))
   Sort Method: quicksort  Memory: 654395kB
   Buffers: shared hit=728798038 read=82974833, temp read=381154 
written=381154
   ->  GroupAggregate  (cost=48059426.65..48079260.50 rows=2644514 
width=52) (actual time=2167909.030..2211446.192 rows=5859671 loops=1)
 Group Key: flows.srcaddr, flows.dstaddr, flows.dstport
 Buffers: shared hit=728798038 read=82974833, temp 
read=381154 written=381154
 ->  Sort  (cost=48059426.65..48060748.90 rows=2644514 
width=20) (actual time=2167896.815..2189107.205 rows=91745640 loops=1)
   Sort Key: flows.srcaddr, flows.dstaddr, flows.dstport
   Sort Method: external merge  Disk: 3049216kB
   Buffers: shared hit=728798038 read=82974833, temp 
read=381154 written=381154
   ->  Gather  (cost=30060688.07..48003007.07 
rows=2644514 width=20) (actual time=1268989.000..1991357.232 rows=91745640 
loops=1)
 Workers Planned: 12
 Workers Launched: 12
 Buffers: shared hit=728798037 read=82974833
 ->  Hash Semi Join  
(cost=30059688.07..47951761.31 rows=220376 width=20) (actual 
time=1268845.181..2007864.725 rows=7057357 loops=13)
   Hash Cond: (flows.dstaddr = 
flows_1.dstaddr)
   Buffers: shared hit=728795193 
read=82974833
   ->  Nested Loop  (cost=0.03..17891246.86 
rows=220376 width=20) (actual time=0.207..723790.283 rows=37910370 loops=13)
 Buffers: shared hit=590692229 
read=14991777
 ->  Parallel Seq Scan on flows  
(cost=0.00..16018049.14 rows=55094048 width=20) (actual time=0.152..566179.117 
rows=45371630 loops=13)
   Buffers: shared hit=860990 
read=14991777
 ->  Index Only Scan using 
mynetworks_ipaddr_idx on mynetworks  (cost=0.03..0.03 rows=1 width=8) (actual 
time=0.002..0.002 rows=1 loops=589831190)
   Index Cond: (ipaddr >>= 
(flows.srcaddr)::ip4r)
   Heap Fetches: 0
   Buffers: shared hit=589831203
   ->  Hash  (cost=30059641.47..30059641.47 
rows=13305 width=4) (actual time=1268811.101..1268811.101 rows=3803508 loops=13)
 Buckets: 4194304 (originally 
16384)  Batches: 1 (originally 1)  Memory Usage: 166486kB
 Buffers: shared hit=138102964 
read=67983056
 -

Re: [PERFORM] Very poor read performance, query independent

2017-07-14 Thread Charles Nadeau
.00..29729327.92 rows=660467447 width=4) (actual
time=0.389..1201072.707 rows=125838232 loops=13)
 Join Filter:
(mynetworks_1.ipaddr >> (flows_1.dstaddr)::ip4r)
 Rows Removed by
Join Filter: 503353617
 Buffers: shared
hit=138102964 read=67983056
 ->  Seq Scan on
flows flows_1  (cost=0.00..17836152.73 rows=661128576 width=4) (actual
time=0.322..343152.274 rows=589831190 loops=13)
   Buffers:
shared hit=138102915 read=67983056
 ->  Materialize
 (cost=0.00..1.02 rows=4 width=8) (actual time=0.000..0.000 rows=2
loops=7667805470)
   Buffers:
shared hit=13
   ->  Seq Scan
on mynetworks mynetworks_1  (cost=0.00..1.01 rows=4 width=8) (actual
time=0.006..0.007 rows=4 loops=13)

 Buffers: shared hit=13
 Planning time: 0.941 ms
 Execution time: 2228345.171 ms
(48 rows)


With a work_mem at 6GB, I noticed that for the first 20 minutes the query
was running, the i/o wait was much lower, hovering around 3%; then it jumped
to 45% until almost the end of the query.
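
For reference, work_mem can be set per session, so different sort budgets
are easy to compare without touching postgresql.conf; a minimal sketch (the
6GB value is simply the one mentioned above):

flows=# SET work_mem = '6GB';
SET
flows=# EXPLAIN (ANALYZE, BUFFERS) SELECT ... ;  -- re-run the query above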

flowscompact and dstexterne are actually views. I use views to simplify
query writing and to "abstract" queries that are used often in other
queries. flowscompact is a view built on the table flows (having about 590
million rows); it keeps only the most often used fields.

flows=# \d+ flowscompact;
View "public.flowscompact"
  Column   |   Type   | Modifiers | Storage | Description
---+--+---+-+-
 flow_id   | bigint   |   | plain   |
 sysuptime | bigint   |   | plain   |
 exaddr| ip4  |   | plain   |
 dpkts | integer  |   | plain   |
 doctets   | bigint   |   | plain   |
 first | bigint   |   | plain   |
 last  | bigint   |   | plain   |
 srcaddr   | ip4  |   | plain   |
 dstaddr   | ip4  |   | plain   |
 srcport   | integer  |   | plain   |
 dstport   | integer  |   | plain   |
 prot  | smallint |   | plain   |
 tos   | smallint |   | plain   |
 tcp_flags | smallint |   | plain   |
 timestamp | timestamp with time zone |   | plain   |
View definition:
 SELECT flowstimestamp.flow_id,
flowstimestamp.sysuptime,
flowstimestamp.exaddr,
flowstimestamp.dpkts,
flowstimestamp.doctets,
flowstimestamp.first,
flowstimestamp.last,
flowstimestamp.srcaddr,
flowstimestamp.dstaddr,
flowstimestamp.srcport,
flowstimestamp.dstport,
flowstimestamp.prot,
flowstimestamp.tos,
flowstimestamp.tcp_flags,
flowstimestamp."timestamp"
   FROM flowstimestamp;

mynetworks is a table having one column and 4 rows; it contains a list of
our networks:

flows=# select * from mynetworks;
 ipaddr

 192.168.0.0/24
 10.112.12.0/30
 10.112.12.4/30
 10.112.12.8/30
(4 rows)
flows=# \d+ mynetworks;
Table "public.mynetworks"
 Column | Type | Modifiers | Storage | Stats target | Description
+--+---+-+--+-
 ipaddr | ip4r |   | plain   |  |
Indexes:
"mynetworks_ipaddr_idx" gist (ipaddr)

dstexterne is a view listing all the destination IPv4 addresses not inside
our network; it has one column and 3.8 million rows.

flows=# \d+ dstexterne;
  View "public.dstexterne"
 Column  | Type | Modifiers | Storage | Description
-+--+---+-+-
 dstaddr | ip4  |   | plain   |
View definition:
 SELECT DISTINCT flowscompact.dstaddr
   FROM flowscompact
 LEFT JOIN mynetworks ON mynetworks.ipaddr >> flowscompact.dstaddr::ip4r
  WHERE mynetworks.ipaddr IS NULL;
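
The same anti-join can also be written with NOT EXISTS; PostgreSQL plans
both forms as an anti-join (as the plans above show), but the NOT EXISTS
spelling is arguably clearer and worth comparing:

SELECT DISTINCT flowscompact.dstaddr
  FROM flowscompact
 WHERE NOT EXISTS (
   SELECT 1
 FROM mynetworks
WHERE mynetworks.ipaddr >> flowscompact.dstaddr::ip4r);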

Thanks!

Charles


Re: [PERFORM] Very poor read performance, query independent

2017-07-14 Thread Charles Nadeau
Mark,

First I must say that I changed my disk configuration from 4 disks in RAID
10 to 5 disks in RAID 0 because I almost ran out of disk space during the
last ingest of data.
Here is the result of the test you asked for. It was done with a cold cache:

flows=# \timing
Timing is on.
flows=# explain select count(*) from flows;
  QUERY PLAN


---
 Finalize Aggregate  (cost=17214914.09..17214914.09 rows=1 width=8)
   ->  Gather  (cost=17214914.07..17214914.09 rows=1 width=8)
 Workers Planned: 1
 ->  Partial Aggregate  (cost=17213914.07..17213914.07 rows=1
width=8)
   ->  Parallel Seq Scan on flows  (cost=0.00..17019464.49
rows=388899162 width=0)
(5 rows)

Time: 171.835 ms
flows=# select pg_relation_size('flows');
 pg_relation_size
--
 129865867264
(1 row)

Time: 57.157 ms
flows=# select count(*) from flows;
LOG:  duration: 625546.522 ms  statement: select count(*) from flows;
   count
---
 589831190
(1 row)

Time: 625546.662 ms

The throughput reported by Postgresql is almost 198MB/s, and the throughput
as measured by dstat during the query execution was between 25 and 299MB/s.
It is much better than what I had before! The i/o wait was about 12% all
through the query. One thing I noticed is the discrepancy between the read
throughput reported by pg_activity and the one reported by dstat:
pg_activity always reports a value lower than dstat.
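
For reference, the arithmetic behind the 198MB/s figure:

129865867264 bytes / 1048576 = ~123850 MiB   (pg_relation_size above)
123850 MiB / 625.5 s = ~198 MiB/s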

Besides the change of disk configuration, here is what contributed the most
to the improvement of the performance so far (collected as a configuration
excerpt after the list):

Using huge pages
Increasing effective_io_concurrency to 256
Reducing random_page_cost from 22 to 4
Reducing min_parallel_relation_size to 512kB to have more workers when
doing sequential parallel scan of my biggest table
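
Collected as a postgresql.conf excerpt (the values are the ones listed
above; huge pages also need vm.nr_hugepages sized at the OS level):

huge_pages = try
effective_io_concurrency = 256
random_page_cost = 4
min_parallel_relation_size = 512kB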


Thanks for recommending this test, I now know what the real throughput
should be!

Charles

On Wed, Jul 12, 2017 at 4:11 AM, Mark Kirkwood <
mark.kirkw...@catalyst.net.nz> wrote:

> Hmm - how are you measuring that sequential scan speed of 4MB/s? I'd
> recommend doing a very simple test e.g, here's one on my workstation - 13
> GB single table on 1 SATA drive - cold cache after reboot, sequential scan
> using Postgres 9.6.2:
>
> bench=#  EXPLAIN SELECT count(*) FROM pgbench_accounts;
>  QUERY PLAN
> 
> 
>  Aggregate  (cost=2889345.00..2889345.01 rows=1 width=8)
>->  Seq Scan on pgbench_accounts (cost=0.00..2639345.00 rows=1
> width=0)
> (2 rows)
>
>
> bench=#  SELECT pg_relation_size('pgbench_accounts');
>  pg_relation_size
> --
>   13429514240
> (1 row)
>
> bench=# SELECT count(*) FROM pgbench_accounts;
>count
> ---
>  1
> (1 row)
>
> Time: 118884.277 ms
>
>
> So doing the math seq read speed is about 110MB/s (i.e. 13 GB in 120 sec).
> Sure enough, while I was running the query iostat showed:
>
> Device: rrqm/s   wrqm/s r/s w/srMB/s wMB/s avgrq-sz
> avgqu-sz   await r_await w_await  svctm  %util
> sda   0.00 0.00  926.000.00 114.89 0.00   254.10
>1.902.032.030.00   1.08 100.00
>
>
> So might be useful for us to see something like that from your system -
> note you need to check you really have flushed the cache, and that no other
> apps are using the db.
>
> regards
>
> Mark
>
>
> On 12/07/17 00:46, Charles Nadeau wrote:
>
>> After reducing random_page_cost to 4 and testing more, I can report that
>> the aggregate read throughput for parallel sequential scan is about 90MB/s.
>> However the throughput for sequential scan is still around 4MB/s.
>>
>>
>


-- 
Charles Nadeau Ph.D.
http://charlesnadeau.blogspot.com/


Re: [PERFORM] Very poor read performance, query independent

2017-07-12 Thread Jeff Janes
On Wed, Jul 12, 2017 at 3:04 AM, Charles Nadeau 
wrote:

> Jeff,
>
> Here are the 2 EXPLAINs for one of my simplest query:
>


It looks like dstexterne and flowcompact are both views over flow.  Can you
share the definition of those views?

I think the iowait > 12.5% is due to the parallel query execution.  But
then the question is, why is it only 25% when you have 10-fold parallelism?

It certainly looks like you are doing more than 4MB/s there, so maybe
something is wrong with the instrumentation, or how you are interpreting
it.

Although it is still less than perhaps it could do.  To put a baseline on
what you can expect out of parallel seq scans, can you do something like:

explain (analyze, buffers) select avg(doctets) from flow;

Cheers,

Jeff


Re: [PERFORM] Very poor read performance, query independent

2017-07-12 Thread Igor Neyman

From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Charles Nadeau
Sent: Wednesday, July 12, 2017 6:05 AM
To: Jeff Janes <jeff.ja...@gmail.com>
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Very poor read performance, query independent


flows=# explain (analyze, buffers) SELECT DISTINCT
flows-#srcaddr,
flows-#dstaddr,
flows-#dstport,
flows-#COUNT(*) AS conversation,
flows-#SUM(doctets) / 1024 / 1024 AS mbytes
flows-# FROM
flows-#flowscompact,
flows-#mynetworks
flows-# WHERE
flows-#mynetworks.ipaddr >>= flowscompact.srcaddr
flows-#AND dstaddr IN
flows-#(
flows(#   SELECT
flows(#  dstaddr
flows(#   FROM
flows(#  dstexterne
flows(#)
flows-# GROUP BY
flows-#srcaddr,
flows-#dstaddr,
flows-#dstport
flows-# ORDER BY
flows-#mbytes DESC LIMIT 50;
LOG:  temporary file: path 
"pg_tblspc/36238/PG_9.6_201608131/pgsql_tmp/pgsql_tmp14573.6", size 1073741824
LOG:  temporary file: path 
"pg_tblspc/36238/PG_9.6_201608131/pgsql_tmp/pgsql_tmp14573.7", size 1073741824
LOG:  temporary file: path 
"pg_tblspc/36238/PG_9.6_201608131/pgsql_tmp/pgsql_tmp14573.8", size 639696896
LOG:  duration: 2765020.327 ms  statement: explain (analyze, buffers) SELECT 
DISTINCT
   srcaddr,
   dstaddr,
   dstport,
   COUNT(*) AS conversation,
   SUM(doctets) / 1024 / 1024 AS mbytes
FROM
   flowscompact,
   mynetworks
WHERE
   mynetworks.ipaddr >>= flowscompact.srcaddr
   AND dstaddr IN
   (
  SELECT
 dstaddr
  FROM
 dstexterne
   )
GROUP BY
   srcaddr,
   dstaddr,
   dstport
ORDER BY
   mbytes DESC LIMIT 50;

  QUERY PLAN
--
 Limit  (cost=37762321.83..37762321.98 rows=50 width=52) (actual 
time=2764548.863..2764548.891 rows=50 loops=1)
   Buffers: shared hit=1116590560 read=15851133, temp read=340244 written=340244
   I/O Timings: read=5323746.860
   ->  Unique  (cost=37762321.83..37769053.57 rows=2243913 width=52) (actual 
time=2764548.861..2764548.882 rows=50 loops=1)
 Buffers: shared hit=1116590560 read=15851133, temp read=340244 
written=340244
 I/O Timings: read=5323746.860
 ->  Sort  (cost=37762321.83..37763443.79 rows=2243913 width=52) 
(actual time=2764548.859..2764548.872 rows=50 loops=1)
   Sort Key: (((sum(flows.doctets) / '1024'::numeric) / 
'1024'::numeric)) DESC, flows.srcaddr, flows.dstaddr, flows.dstport, (count(*))
   Sort Method: quicksort  Memory: 563150kB
   Buffers: shared hit=1116590560 read=15851133, temp read=340244 
written=340244
   I/O Timings: read=5323746.860
   ->  GroupAggregate  (cost=37698151.34..37714980.68 rows=2243913 
width=52) (actual time=2696721.610..2752109.551 rows=4691734 loops=1)
 Group Key: flows.srcaddr, flows.dstaddr, flows.dstport
 Buffers: shared hit=1116590560 read=15851133, temp 
read=340244 written=340244
 I/O Timings: read=5323746.860
 ->  Sort  (cost=37698151.34..37699273.29 rows=2243913 
width=20) (actual time=2696711.428..2732781.705 rows=81896988 loops=1)
   Sort Key: flows.srcaddr, flows.dstaddr, flows.dstport
   Sort Method: external merge  Disk: 2721856kB
   Buffers: shared hit=1116590560 read=15851133, temp 
read=340244 written=340244
   I/O Timings: read=5323746.860
   ->  Gather  (cost=19463936.00..37650810.19 
rows=2243913 width=20) (actual time=1777219.713..2590530.887 rows=81896988 
loops=1)
 Workers Planned: 9
 Workers Launched: 9
 Buffers: shared hit=1116590559 read=15851133
 I/O Timings: read=5323746.860
 ->  Hash Semi Join  
(cost=19462936.00..37622883.23 rows=249324 width=20) (actual 
time=1847579.360..2602039.780 rows=8189699 loops=10)
   Hash Cond: (flows.dstaddr = 
flows_1.dstaddr)
   Buffers: shared hit=1116588309 
read=15851133
   I/O Timings: read=5323746.860
   ->  Nested Loop  (cost=0.03..18159012.30 
rows=249324 width=20) (actual time=1.562..736556.583 rows=45499045 loops=10)
 Buffers: shared hit=996551813 
read=15851133
 

Re: [PERFORM] Very poor read performance, query independent

2017-07-12 Thread bricklen
On Wed, Jul 12, 2017 at 12:30 AM, Charles Nadeau 
wrote:

>
> I use noop as the scheduler because it is better to let the RAID
> controller re-arrange the IO operation before they reach the disk. Read
> ahead is set to 128:
>
> charles@hpdl380g6:~$ cat /sys/block/sdc/queue/read_ahead_kb
> 128
> charles@hpdl380g6:~$ cat /sys/block/sdc/queue/scheduler
> [noop] deadline cfq
>
>
>
Perhaps pg_test_fsync (
https://www.postgresql.org/docs/9.6/static/pgtestfsync.html) and
pg_test_timing will help shed some light here, or at the very least give
some numbers to compare against.
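
Both tools ship with the server; a sketch of running them (the binary path
follows Ubuntu's packaging and is an assumption, as is the test file
location, which should sit on the array under test):

/usr/lib/postgresql/9.6/bin/pg_test_fsync -f /mnt/data/pgtest.out
/usr/lib/postgresql/9.6/bin/pg_test_timing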


Re: [PERFORM] Very poor read performance, query independent

2017-07-12 Thread Rick Otten
On Wed, Jul 12, 2017 at 9:38 AM, Charles Nadeau 
wrote:

> Rick,
>
> Should the number of pages always be correlated to the VmPeak of the
> postmaster, or could it be set to reflect shared_buffers or another setting?
> Thanks!
>
>
The documentation implies that you may need to adjust its size when you
change the shared_buffers setting.
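
The sizing recipe from the documentation page linked earlier goes roughly
like this (the PID and sizes are the documentation's example values, not
this server's):

$ head -1 $PGDATA/postmaster.pid
4170
$ grep ^VmPeak /proc/4170/status
VmPeak:  6490428 kB
$ grep ^Hugepagesize /proc/meminfo
Hugepagesize:       2048 kB
# 6490428 kB / 2048 kB is roughly 3170 pages, so:
$ sysctl -w vm.nr_hugepages=3170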

I usually check it every now and then (I haven't built a formal monitor
yet) to see if all of the huge pages are free/used, and if it looks like
they are all getting consumed, consider bumping it higher.  If there are
lots free, you are probably fine.

cat /proc/meminfo | grep -i "^huge"

--

Also regarding my note on effective_io_concurrency, which I'm not sure you
tried tweaking yet.

With file system and hardware caching between you and your spindles, your
best setting for effective_io_concurrency may be much higher than the
actual number of spindles.   It is worth experimenting with.   If you can,
try several values.  You can use pgbench to put consistent workloads on
your database for measurement purposes.
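
A sketch of such an experiment, assuming a file slow_query.sql holding one
representative query (effective_io_concurrency mainly helps bitmap heap
scans, so pick a query that uses one):

for c in 4 16 64 128 256; do
    echo "effective_io_concurrency = $c"
    psql -d flows -c "ALTER SYSTEM SET effective_io_concurrency = $c" \
         -c "SELECT pg_reload_conf()"
    pgbench -n -T 60 -f slow_query.sql flows | grep -E 'latency|tps'
done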


Charles
>
> On Mon, Jul 10, 2017 at 5:25 PM, Rick Otten 
> wrote:
>
>> Although probably not the root cause, at the least I would set up
>> hugepages  ( https://www.postgresql.org/docs/9.6/static/kernel-resourc
>> es.html#LINUX-HUGE-PAGES ), and bump effective_io_concurrency up quite a
>> bit as well (256 ?).
>>
>>


Re: [PERFORM] Very poor read performance, query independent

2017-07-12 Thread Igor Neyman

From: Charles Nadeau [mailto:charles.nad...@gmail.com]
Sent: Wednesday, July 12, 2017 3:21 AM
To: Igor Neyman <iney...@perceptron.com>
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Very poor read performance, query independent

Igor,

I set shared_buffers to 24 GB and effective_cache_size to 64GB and I can see 
that the queries are faster due to the fact that the indexes are used more 
often. Knowing I have 72GB of RAM and the server is exclusively dedicated to 
Postgresql, what could be the maximum value for effective_cache_size?
Thanks!

Charles

64GB for effective_cache_size should be good enough; adding a couple more
GB wouldn't change much.

Igor


Re: [PERFORM] Very poor read performance, query independent

2017-07-12 Thread Charles Nadeau
Rick,

Should the number of pages always be correlated to the VmPeak of the
postmaster, or could it be set to reflect shared_buffers or another setting?
Thanks!

Charles

On Mon, Jul 10, 2017 at 5:25 PM, Rick Otten 
wrote:

> Although probably not the root cause, at the least I would set up
> hugepages  ( https://www.postgresql.org/docs/9.6/static/kernel-
> resources.html#LINUX-HUGE-PAGES ), and bump effective_io_concurrency up
> quite a bit as well (256 ?).
>
>
> On Mon, Jul 10, 2017 at 10:03 AM, Charles Nadeau  > wrote:
>
>> I’m running PostgreSQL 9.6.3 on Ubuntu 16.10 (kernel 4.4.0-85-generic).
>> Hardware is:
>>
>> *2x Intel Xeon E5550
>>
>> *72GB RAM
>>
>> *Hardware RAID10 (4 x 146GB SAS 10k) P410i controller with 1GB FBWC (80%
>> read/20% write) for Postgresql data only:
>>
>> Logical Drive: 3
>>
>> Size: 273.4 GB
>>
>> Fault Tolerance: 1+0
>>
>> Heads: 255
>>
>> Sectors Per Track: 32
>>
>> Cylinders: 65535
>>
>> Strip Size: 128 KB
>>
>> Full Stripe Size: 256 KB
>>
>> Status: OK
>>
>> Caching: Enabled
>>
>> Unique Identifier: 600508B1001037383941424344450A00
>>
>> Disk Name: /dev/sdc
>>
>> Mount Points: /mnt/data 273.4 GB
>>
>> OS Status: LOCKED
>>
>> Logical Drive Label: A00A194750123456789ABCDE516F
>>
>> Mirror Group 0:
>>
>> physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SAS, 146 GB, OK)
>>
>> physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 146 GB, OK)
>>
>> Mirror Group 1:
>>
>> physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SAS, 146 GB, OK)
>>
>> physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SAS, 146 GB, OK)
>>
>> Drive Type: Data
>>
>> Formatted with ext4 with: sudo mkfs.ext4 -E stride=32,stripe_width=64 -v
>> /dev/sdc1.
>>
>> Mounted in /etc/fstab with this line: 
>> "UUID=99fef4ae-51dc-4365-9210-0b153b1cbbd0
>> /mnt/data ext4 rw,nodiratime,user_xattr,noatime,nobarrier,errors=remount-ro
>> 0 1"
>>
>> Postgresql is the only application running on this server.
>>
>>
>> Postgresql is used as a mini data warehouse to generate reports and do
>> statistical analysis. It is used by at most 2 users and fresh data is added
>> every 10 days. The database has 16 tables: one is 224GB big and the rest
>> are between 16kB and 470MB big.
>>
>>
>> My configuration is:
>>
>>
>> name | current_setting | source
>>
>> -+--
>> --+--
>>
>> application_name | psql | client
>>
>> autovacuum_vacuum_scale_factor | 0 | configuration file
>>
>> autovacuum_vacuum_threshold | 2000 | configuration file
>>
>> checkpoint_completion_target | 0.9 | configuration file
>>
>> checkpoint_timeout | 30min | configuration file
>>
>> client_encoding | UTF8 | client
>>
>> client_min_messages | log | configuration file
>>
>> cluster_name | 9.6/main | configuration file
>>
>> cpu_index_tuple_cost | 0.001 | configuration file
>>
>> cpu_operator_cost | 0.0005 | configuration file
>>
>> cpu_tuple_cost | 0.003 | configuration file
>>
>> DateStyle | ISO, YMD | configuration file
>>
>> default_statistics_target | 100 | configuration file
>>
>> default_text_search_config | pg_catalog.english | configuration file
>>
>> dynamic_shared_memory_type | posix | configuration file
>>
>> effective_cache_size | 22GB | configuration file
>>
>> effective_io_concurrency | 4 | configuration file
>>
>> external_pid_file | /var/run/postgresql/9.6-main.pid | configuration file
>>
>> lc_messages | C | configuration file
>>
>> lc_monetary | en_CA.UTF-8 | configuration file
>>
>> lc_numeric | en_CA.UTF-8 | configuration file
>>
>> lc_time | en_CA.UTF-8 | configuration file
>>
>> listen_addresses | * | configuration file
>>
>> lock_timeout | 100s | configuration file
>>
>> log_autovacuum_min_duration | 0 | configuration file
>>
>> log_checkpoints | on | configuration file
>>
>> log_connections | on | configuration file
>>
>> log_destination | csvlog | configuration file
>>
>> log_directory | /mnt/bigzilla/data/toburn/hp/postgresql/pg_log |
>> configuration file
>>
>> log_disconnections | on | configuration file
>>
>> log_error_verbosity | default | configuration file
>>
>> log_file_mode | 0600 | configuration file
>>
>> log_filename | postgresql-%Y-%m-%d_%H%M%S.log | configuration file
>>
>> log_line_prefix | user=%u,db=%d,app=%aclient=%h | configuration file
>>
>> log_lock_waits | on | configuration file
>>
>> log_min_duration_statement | 0 | configuration file
>>
>> log_min_error_statement | debug1 | configuration file
>>
>> log_min_messages | debug1 | configuration file
>>
>> log_rotation_size | 1GB | configuration file
>>
>> log_temp_files | 0 | configuration file
>>
>> log_timezone | localtime | configuration file
>>
>> logging_collector | on | configuration file
>>
>> maintenance_work_mem | 3GB | configuration file
>>
>> max_connections | 10 | configuration file
>>
>> max_locks_per_transaction | 256 | configuration file
>>
>> max_parallel_workers_per_gather | 14 | configuration file
>>
>> max_stack_depth | 2MB | 

Re: [PERFORM] Very poor read performance, query independent

2017-07-12 Thread Charles Nadeau
Jeff,

Here are the 2 EXPLAINs for one of my simplest query:

flows=# SET track_io_timing = on;
LOG:  duration: 24.101 ms  statement: SET track_io_timing = on;
SET
flows=# explain (analyze, timing off) SELECT DISTINCT
flows-#srcaddr,
flows-#dstaddr,
flows-#dstport,
flows-#COUNT(*) AS conversation,
flows-#SUM(doctets) / 1024 / 1024 AS mbytes
flows-# FROM
flows-#flowscompact,
flows-#mynetworks
flows-# WHERE
flows-#mynetworks.ipaddr >>= flowscompact.srcaddr
flows-#AND dstaddr IN
flows-#(
flows(#   SELECT
flows(#  dstaddr
flows(#   FROM
flows(#  dstexterne
flows(#)
flows-# GROUP BY
flows-#srcaddr,
flows-#dstaddr,
flows-#dstport
flows-# ORDER BY
flows-#mbytes DESC LIMIT 50;
LOG:  temporary file: path
"pg_tblspc/36238/PG_9.6_201608131/pgsql_tmp/pgsql_tmp14573.3", size
1073741824
LOG:  temporary file: path
"pg_tblspc/36238/PG_9.6_201608131/pgsql_tmp/pgsql_tmp14573.4", size
1073741824
LOG:  temporary file: path
"pg_tblspc/36238/PG_9.6_201608131/pgsql_tmp/pgsql_tmp14573.5", size
639696896
LOG:  duration: 2632108.352 ms  statement: explain (analyze, timing off)
SELECT DISTINCT
   srcaddr,
   dstaddr,
   dstport,
   COUNT(*) AS conversation,
   SUM(doctets) / 1024 / 1024 AS mbytes
FROM
   flowscompact,
   mynetworks
WHERE
   mynetworks.ipaddr >>= flowscompact.srcaddr
   AND dstaddr IN
   (
  SELECT
 dstaddr
  FROM
 dstexterne
   )
GROUP BY
   srcaddr,
   dstaddr,
   dstport
ORDER BY
   mbytes DESC LIMIT 50;

  QUERY PLAN

---
 Limit  (cost=37762321.83..37762321.98 rows=50 width=52) (actual rows=50
loops=1)
   ->  Unique  (cost=37762321.83..37769053.57 rows=2243913 width=52)
(actual rows=50 loops=1)
 ->  Sort  (cost=37762321.83..37763443.79 rows=2243913 width=52)
(actual rows=50 loops=1)
   Sort Key: (((sum(flows.doctets) / '1024'::numeric) /
'1024'::numeric)) DESC, flows.srcaddr, flows.dstaddr, flows.dstport,
(count(*))
   Sort Method: quicksort  Memory: 563150kB
   ->  GroupAggregate  (cost=37698151.34..37714980.68
rows=2243913 width=52) (actual rows=4691734 loops=1)
 Group Key: flows.srcaddr, flows.dstaddr, flows.dstport
 ->  Sort  (cost=37698151.34..37699273.29 rows=2243913
width=20) (actual rows=81896988 loops=1)
   Sort Key: flows.srcaddr, flows.dstaddr,
flows.dstport
   Sort Method: external merge  Disk: 2721856kB
   ->  Gather  (cost=19463936.00..37650810.19
rows=2243913 width=20) (actual rows=81896988 loops=1)
 Workers Planned: 9
 Workers Launched: 9
 ->  Hash Semi Join
 (cost=19462936.00..37622883.23 rows=249324 width=20) (actual rows=8189699
loops=10)
   Hash Cond: (flows.dstaddr =
flows_1.dstaddr)
   ->  Nested Loop
 (cost=0.03..18159012.30 rows=249324 width=20) (actual rows=45499045
loops=10)
 ->  Parallel Seq Scan on flows
 (cost=0.00..16039759.79 rows=62330930 width=20) (actual rows=54155970
loops=10)
 ->  Index Only Scan using
mynetworks_ipaddr_idx on mynetworks  (cost=0.03..0.03 rows=1 width=8)
(actual rows=1 loops=541559704)
   Index Cond: (ipaddr >>=
(flows.srcaddr)::ip4r)
   Heap Fetches: 48679396
   ->  Hash
 (cost=19462896.74..19462896.74 rows=11210 width=4) (actual rows=3099798
loops=10)
 Buckets: 4194304 (originally
16384)  Batches: 1 (originally 1)  Memory Usage: 141746kB
 ->  HashAggregate
 (cost=19462829.48..19462863.11 rows=11210 width=4) (actual rows=3099798
loops=10)
   Group Key:
flows_1.dstaddr
   ->  Nested Loop Anti
Join  (cost=0.12..19182620.78 rows=560417390 width=4) (actual
rows=113420172 loops=10)
 Join Filter:
(mynetworks_1.ipaddr >> (flows_1.dstaddr)::ip4r)
 Rows Removed by
Join Filter: 453681377
 ->  Index Only
Scan using flows_srcaddr_dstaddr_idx on flows flows_1
 (cost=0.12..9091067.70 rows=560978368 width=4) (actual rows=541559704
loops=10)
   Heap
Fetches: 91

Re: [PERFORM] Very poor read performance, query independent

2017-07-12 Thread Charles Nadeau
Joshua,

I use noop as the scheduler because it is better to let the RAID controller
re-arrange the IO operation before they reach the disk. Read ahead is set
to 128:

charles@hpdl380g6:~$ cat /sys/block/sdc/queue/read_ahead_kb
128
charles@hpdl380g6:~$ cat /sys/block/sdc/queue/scheduler
[noop] deadline cfq

Thanks!

Charles

On Wed, Jul 12, 2017 at 1:42 AM, Joshua D. Drake 
wrote:

> On 07/11/2017 04:15 PM, Merlin Moncure wrote:
>
>> On Mon, Jul 10, 2017 at 9:03 AM, Charles Nadeau
>>  wrote:
>>
>>> I’m running PostgreSQL 9.6.3 on Ubuntu 16.10 (kernel 4.4.0-85-generic).
>>> Hardware is:
>>>
>>> *2x Intel Xeon E5550
>>>
>>> *72GB RAM
>>>
>>> *Hardware RAID10 (4 x 146GB SAS 10k) P410i controller with 1GB FBWC (80%
>>> read/20% write) for Postgresql data only:
>>>
>>> The problem I have is very poor read. When I benchmark my array with fio
>>> I
>>> get random reads of about 200MB/s and 1100IOPS and sequential reads of
>>> about
>>> 286MB/s and 21000 IOPS. But when I watch my queries using pg_activity, I
>>> get
>>> at best 4MB/s. Also using dstat I can see that iowait time is at about
>>> 25%.
>>> This problem is not query-dependent.
>>>
>>
>> Stop right there. 1100 iops * 8kb = ~8mb/sec raw which might
>> reasonably translate to 4mb/sec to the client. 200mb/sec random
>> read/sec on spinning media is simply not plausible;
>>
>
> Sure it is, if he had more than 4 disks ;) but he also isn't going to get
> 1100 IOPS from 4 10k disks. The average 10k disk is going to get around 130
> IOPS. If he only has 4 then there is no way he is getting 1100 IOPS.
>
> Using the above specs (4x146GB) the best he can reasonably hope for from
> the drives themselves is about 50MB/s add in the 1GB FWBC and that is how
> he is getting those high numbers for IOPS but that is because of caching.
>
> He may need to adjust his readahead as well as his kernel scheduler. At a
> minimum he should be able to saturate the drives without issue.
>
> JD
>
>
>
> --
> Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
>
> PostgreSQL Centered full stack support, consulting and development.
> Advocate: @amplifypostgres || Learn: https://pgconf.us
> * Unless otherwise stated, opinions are my own.   *
>



-- 
Charles Nadeau Ph.D.
http://charlesnadeau.blogspot.com/


Re: [PERFORM] Very poor read performance, query independent

2017-07-12 Thread Charles Nadeau
Igor,

I set shared_buffers to 24 GB and effective_cache_size to 64GB and I can
see that the queries are faster due to the fact that the indexes are used
more often. Knowing I have 72GB of RAM and the server is exclusively
dedicated to Postgresql, what could be the maximum value for
effective_cache_size?
Thanks!

Charles

On Tue, Jul 11, 2017 at 5:16 PM, Igor Neyman <iney...@perceptron.com> wrote:

>
>
> *From:* pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-
> ow...@postgresql.org] *On Behalf Of *Igor Neyman
> *Sent:* Tuesday, July 11, 2017 10:34 AM
> *To:* Charles Nadeau <charles.nad...@gmail.com>
> *Cc:* pgsql-performance@postgresql.org
> *Subject:* Re: [PERFORM] Very poor read performance, query independent
>
>
>
> *From:* Charles Nadeau [mailto:charles.nad...@gmail.com
> <charles.nad...@gmail.com>]
> *Sent:* Tuesday, July 11, 2017 6:43 AM
> *To:* Igor Neyman <iney...@perceptron.com>
> *Cc:* Andreas Kretschmer <andr...@a-kretschmer.de>;
> pgsql-performance@postgresql.org
> *Subject:* Re: [PERFORM] Very poor read performance, query independent
>
>
>
> Igor,
>
>
>
> I reduced the value of random_page_cost to 4 but the read speed remains
> low.
>
> Regarding effective_cache_size and shared_buffer, do you mean they should
> be both equal to 64GB?
>
> Thanks for suggestions!
>
>
>
> Charles
>
>
>
> No, they should not be equal.
>
> From the docs:
>
>
>
> effective_cache_size (integer)
>
> Sets the planner's assumption about the effective size of the disk cache
> that is available to a single query. This is factored into estimates of the
> cost of using an index; a higher value makes it more likely index scans
> will be used, a lower value makes it more likely sequential scans will be
> used. When setting this parameter you should consider both PostgreSQL's
> shared buffers and the portion of the kernel's disk cache that will be used
> for PostgreSQL data files. Also, take into account the expected number of
> concurrent queries on different tables, since they will have to share the
> available space. This parameter has no effect on the size of shared memory
> allocated by PostgreSQL, nor does it reserve kernel disk cache; it is used
> only for estimation purposes. The system also does not assume data remains
> in the disk cache between queries. The default is 4 gigabytes (4GB).
>
> So, I’d set shared_buffers at 24GB and effective_cache_size at 64GB.
>
>
>
> Regards,
>
> Igor
>
>
>
> Also, maybe it’s time to look at execution plans (explain analyze) of
> specific slow queries, instead of trying to solve the problem “in general”.
>
>
>
> Igor
>
>
>



-- 
Charles Nadeau Ph.D.
http://charlesnadeau.blogspot.com/


Re: [PERFORM] Very poor read performance, query independent

2017-07-11 Thread Mark Kirkwood
Hmm - how are you measuring that sequential scan speed of 4MB/s? I'd 
recommend doing a very simple test e.g, here's one on my workstation - 
13 GB single table on 1 SATA drive - cold cache after reboot, sequential 
scan using Postgres 9.6.2:


bench=#  EXPLAIN SELECT count(*) FROM pgbench_accounts;
 QUERY PLAN

 Aggregate  (cost=2889345.00..2889345.01 rows=1 width=8)
   ->  Seq Scan on pgbench_accounts (cost=0.00..2639345.00 
rows=1 width=0)

(2 rows)


bench=#  SELECT pg_relation_size('pgbench_accounts');
 pg_relation_size
--
  13429514240
(1 row)

bench=# SELECT count(*) FROM pgbench_accounts;
   count
---
 1
(1 row)

Time: 118884.277 ms


So doing the math seq read speed is about 110MB/s (i.e. 13 GB in 120
sec). Sure enough, while I was running the query iostat showed:


Device: rrqm/s   wrqm/s r/s w/srMB/s wMB/s avgrq-sz 
avgqu-sz   await r_await w_await  svctm  %util
sda   0.00 0.00  926.000.00 114.89 0.00   
254.10 1.902.032.030.00   1.08 100.00



So might be useful for us to see something like that from your system - 
note you need to check you really have flushed the cache, and that no 
other apps are using the db.


regards

Mark

On 12/07/17 00:46, Charles Nadeau wrote:
After reducing random_page_cost to 4 and testing more, I can report 
that the aggregate read throughput for parallel sequential scan is 
about 90MB/s. However the throughput for sequential scan is still 
around 4MB/s.






--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very poor read performance, query independent

2017-07-11 Thread Jeff Janes
On Tue, Jul 11, 2017 at 4:42 PM, Joshua D. Drake 
wrote:

> On 07/11/2017 04:15 PM, Merlin Moncure wrote:
>
>> On Mon, Jul 10, 2017 at 9:03 AM, Charles Nadeau
>>  wrote:
>>
>>> I’m running PostgreSQL 9.6.3 on Ubuntu 16.10 (kernel 4.4.0-85-generic).
>>> Hardware is:
>>>
>>> *2x Intel Xeon E5550
>>>
>>> *72GB RAM
>>>
>>> *Hardware RAID10 (4 x 146GB SAS 10k) P410i controller with 1GB FBWC (80%
>>> read/20% write) for Postgresql data only:
>>>
>>> The problem I have is very poor read. When I benchmark my array with fio
>>> I
>>> get random reads of about 200MB/s and 1100IOPS and sequential reads of
>>> about
>>> 286MB/s and 21000 IOPS. But when I watch my queries using pg_activity, I
>>> get
>>> at best 4MB/s. Also using dstat I can see that iowait time is at about
>>> 25%.
>>> This problem is not query-dependent.
>>>
>>
>> Stop right there. 1100 iops * 8kb = ~8mb/sec raw which might
>> reasonably translate to 4mb/sec to the client. 200mb/sec random
>> read/sec on spinning media is simply not plausible;
>>
>
> Sure it is, if he had more than 4 disks ;)


Or more to the point here, if each random read is 4MB long.  Which makes it
more like sequential reads, randomly-piecewise, rather than random reads.


> but he also isn't going to get 1100 IOPS from 4 10k disks. The average 10k
> disk is going to get around 130 IOPS. If he only has 4 then there is no
> way he is getting 1100 IOPS.
>

I wouldn't be sure.  He is using an iodepth of 256 in his benchmark.  It
wouldn't be all that outrageous for a disk to be able to find 3 or 4
sectors per revolution that it can read, when it has that many to choose
from.

 Cheers,

Jeff


Re: [PERFORM] Very poor read performance, query independent

2017-07-11 Thread Jeff Janes
On Tue, Jul 11, 2017 at 4:02 AM, Charles Nadeau 
wrote:

> Jeff,
>
> I used fio in a quick benchmarking script inspired by https://smcleod.net/
> benchmarking-io/:
>
> #!/bin/bash
> #Random throughput
> echo "Random throughput"
> sync
> fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1
> --name=test --filename=test --bs=4M --iodepth=256 --size=10G
> --readwrite=randread --ramp_time=4
> #Random IOPS
> echo "Random IOPS"
> sync
> fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1
> --name=test --filename=test --bs=4k --iodepth=256 --size=4G
> --readwrite=randread --ramp_time=4
> #Sequential throughput
> echo "Sequential throughput"
> sync
> fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1
> --name=test --filename=test --bs=4M --iodepth=256 --size=10G
> --readwrite=read --ramp_time=4
> #Sequential IOPS
> echo "Sequential IOPS"
> sync
> fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1
> --name=test --filename=test --bs=4k --iodepth=256 --size=4G
> --readwrite=read --ramp_time=4
>
>
I don't think any of those are directly relevant to PostgreSQL, as it
doesn't use direct IO, doesn't use libaio, and is rarely going to get
anywhere near 256 iodepth.  So the best they can do is put a theoretical
ceiling on the performance.  Also, random IO with a 4MB stride doesn't make
any sense from a PostgreSQL perspective.
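
A sketch of a more Postgres-like random read test, reusing the flags from
the script above but with 8kB blocks and synchronous, depth-1 reads
(--direct=1 is kept so the array rather than the page cache is measured):

fio --randrepeat=1 --ioengine=psync --direct=1 --gtod_reduce=1 --name=test
--filename=test --bs=8k --size=4G --readwrite=randread --ramp_time=4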



>
> Performing the test you suggested, I get 128.5MB/s. Monitoring the test, I
> find that the throughput is constant from start to finish and that the
> iowait is also constant at 5%:
>

I would have expected it to do better than that.  Maybe you should
increase the kernel readahead setting; I've found the default to be much
too small.  But it doesn't make much difference to you, as you appear to be
doing random IO in your queries, not sequential.
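
For example, to try a 2MB readahead on the array (read_ahead_kb is in kB,
while blockdev --setra counts 512-byte sectors, so these two are
equivalent):

echo 2048 | sudo tee /sys/block/sdc/queue/read_ahead_kb
sudo blockdev --setra 4096 /dev/sdc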


> Could you suggest another way to benchmark random reads?
>

Your 1100 IOPS times 8kb block size gives about 8MB/s of throughput, which
is close to what you report.  So I think I'd would instead focus on tuning
your actual queries.  You say the problem is not query-dependent, but I
think that that just means all the queries you looked at are similar.  If
you looked at a query that can't use indexes, like count(unindexed_column)
from biggest_table; you would find it doing much more IO than 4MB/s.

Can you pick the simplest query you actually care about, and post both an
"explain (analyze, timing off)" and an "explain (analyze, buffers)" for it?
 (Preferably turning "track_io_timing" on first).

One other question I had: you said you had "2x Intel Xeon E5550", which
should be 8 CPUs (or 16, if the hyperthreads
are reported as separate CPUs).  But you also said: "Also using dstat I can
see that iowait time is at about 25%".  Usually if there is only one thing
going on on the server, then IOWAIT won't be more than the reciprocal of
the number of CPUs.  Is the server busy doing other stuff at the same time
you are benchmarking it?
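
(Spelled out: one process blocked on IO contributes at most 1/8 = 12.5%
iowait on 8 CPUs, or 1/16 = ~6% on 16, so a steady 25% suggests at least
two IO-bound processes.)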

Cheers,

Jeff


Re: [PERFORM] Very poor read performance, query independent

2017-07-11 Thread Joshua D. Drake

On 07/11/2017 04:15 PM, Merlin Moncure wrote:

On Mon, Jul 10, 2017 at 9:03 AM, Charles Nadeau
 wrote:

I’m running PostgreSQL 9.6.3 on Ubuntu 16.10 (kernel 4.4.0-85-generic).
Hardware is:

*2x Intel Xeon E5550

*72GB RAM

*Hardware RAID10 (4 x 146GB SAS 10k) P410i controller with 1GB FBWC (80%
read/20% write) for Postgresql data only:

The problem I have is very poor read. When I benchmark my array with fio I
get random reads of about 200MB/s and 1100IOPS and sequential reads of about
286MB/s and 21000 IOPS. But when I watch my queries using pg_activity, I get
at best 4MB/s. Also using dstat I can see that iowait time is at about 25%.
This problem is not query-dependent.


Stop right there. 1100 iops * 8kb = ~8mb/sec raw which might
reasonably translate to 4mb/sec to the client. 200mb/sec random
read/sec on spinning media is simply not plausible;


Sure it is, if he had more than 4 disks ;) but he also isn't going to 
get 1100 IOPS from 4 10k disks. The average 10k disk is going to get 
around 130 IOPS. If he only has 4 then there is no way he is getting
1100 IOPS.


Using the above specs (4x146GB) the best he can reasonably hope for from 
the drives themselves is about 50MB/s add in the 1GB FWBC and that is 
how he is getting those high numbers for IOPS but that is because of 
caching.


He may need to adjust his readahead as well as his kernel scheduler. At 
a minimum he should be able to saturate the drives without issue.


JD



--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
* Unless otherwise stated, opinions are my own.   *


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very poor read performance, query independent

2017-07-11 Thread Merlin Moncure
On Mon, Jul 10, 2017 at 9:03 AM, Charles Nadeau
 wrote:
> I’m running PostgreSQL 9.6.3 on Ubuntu 16.10 (kernel 4.4.0-85-generic).
> Hardware is:
>
> *2x Intel Xeon E5550
>
> *72GB RAM
>
> *Hardware RAID10 (4 x 146GB SAS 10k) P410i controller with 1GB FBWC (80%
> read/20% write) for Postgresql data only:
>
> The problem I have is very poor read. When I benchmark my array with fio I
> get random reads of about 200MB/s and 1100IOPS and sequential reads of about
> 286MB/s and 21000 IOPS. But when I watch my queries using pg_activity, I get
> at best 4MB/s. Also using dstat I can see that iowait time is at about 25%.
> This problem is not query-dependent.

Stop right there. 1100 iops * 8kb = ~8mb/sec raw which might
reasonably translate to 4mb/sec to the client. 200mb/sec random
read/sec on spinning media is simply not plausible; your benchmark is
lying to you.   Random reads on spinning media are absolutely going to
be storage bound.

merlin


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very poor read performance, query independent

2017-07-11 Thread Igor Neyman

From: Charles Nadeau [mailto:charles.nad...@gmail.com]
Sent: Tuesday, July 11, 2017 11:25 AM
To: Igor Neyman <iney...@perceptron.com>
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Very poor read performance, query independent


Attention: This email was sent from someone outside of Perceptron. Always 
exercise caution when opening attachments or clicking links from unknown 
senders or when receiving unexpected emails.

Igor,

The sum of effective_cache_size and shared_buffers will be higher than the
physical memory I have. Is that OK?
Thanks!

Charles

Yes, that’s normal.

shared_buffers is the maximum that Postgres is allowed to allocate, while
effective_cache_size is just a number that the optimizer takes into account
when creating an execution plan.

Igor



Re: [PERFORM] Very poor read performance, query independent

2017-07-11 Thread Charles Nadeau
Igor,

The sum of effective_cache_size and shared_buffers will be higher than the
physical memory I have. Is that OK?
Thanks!

Charles

On Tue, Jul 11, 2017 at 4:34 PM, Igor Neyman <iney...@perceptron.com> wrote:

>
>
> *From:* Charles Nadeau [mailto:charles.nad...@gmail.com]
> *Sent:* Tuesday, July 11, 2017 6:43 AM
> *To:* Igor Neyman <iney...@perceptron.com>
> *Cc:* Andreas Kretschmer <andr...@a-kretschmer.de>;
> pgsql-performance@postgresql.org
> *Subject:* Re: [PERFORM] Very poor read performance, query independent
>
>
>
> Igor,
>
>
>
> I reduced the value of random_page_cost to 4 but the read speed remains
> low.
>
> Regarding effective_cache_size and shared_buffer, do you mean they should
> be both equal to 64GB?
>
> Thanks for suggestions!
>
>
>
> Charles
>
>
>
> No, they should not be equal.
>
> From the docs:
>
>
>
> effective_cache_size (integer)
>
> Sets the planner's assumption about the effective size of the disk cache
> that is available to a single query. This is factored into estimates of the
> cost of using an index; a higher value makes it more likely index scans
> will be used, a lower value makes it more likely sequential scans will be
> used. When setting this parameter you should consider both PostgreSQL's
> shared buffers and the portion of the kernel's disk cache that will be used
> for PostgreSQL data files. Also, take into account the expected number of
> concurrent queries on different tables, since they will have to share the
> available space. This parameter has no effect on the size of shared memory
> allocated by PostgreSQL, nor does it reserve kernel disk cache; it is used
> only for estimation purposes. The system also does not assume data remains
> in the disk cache between queries. The default is 4 gigabytes (4GB).
>
> So, I’d set shared_buffers at 24GB and effective_cache_size at 64GB.
>
>
>
> Regards,
>
> Igor
>
>
>



-- 
Charles Nadeau Ph.D.
http://charlesnadeau.blogspot.com/


Re: [PERFORM] Very poor read performance, query independent

2017-07-11 Thread Igor Neyman

From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Igor Neyman
Sent: Tuesday, July 11, 2017 10:34 AM
To: Charles Nadeau <charles.nad...@gmail.com>
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Very poor read performance, query independent

From: Charles Nadeau [mailto:charles.nad...@gmail.com]
Sent: Tuesday, July 11, 2017 6:43 AM
To: Igor Neyman <iney...@perceptron.com<mailto:iney...@perceptron.com>>
Cc: Andreas Kretschmer 
<andr...@a-kretschmer.de<mailto:andr...@a-kretschmer.de>>; 
pgsql-performance@postgresql.org<mailto:pgsql-performance@postgresql.org>
Subject: Re: [PERFORM] Very poor read performance, query independent

Igor,

I reduced the value of random_page_cost to 4 but the read speed remains low.
Regarding effective_cache_size and shared_buffer, do you mean they should be 
both equal to 64GB?
Thanks for suggestions!

Charles

No, they should not be equal.
From the docs:

effective_cache_size (integer)
Sets the planner's assumption about the effective size of the disk cache that 
is available to a single query. This is factored into estimates of the cost of 
using an index; a higher value makes it more likely index scans will be used, a 
lower value makes it more likely sequential scans will be used. When setting 
this parameter you should consider both PostgreSQL's shared buffers and the 
portion of the kernel's disk cache that will be used for PostgreSQL data files. 
Also, take into account the expected number of concurrent queries on different 
tables, since they will have to share the available space. This parameter has 
no effect on the size of shared memory allocated by PostgreSQL, nor does it 
reserve kernel disk cache; it is used only for estimation purposes. The system 
also does not assume data remains in the disk cache between queries. The 
default is 4 gigabytes (4GB).
So, I’d set shared_buffers at 24GB and effective_cache_size at 64GB.
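
Applied with ALTER SYSTEM, that would look like the sketch below; note that
effective_cache_size takes effect on reload, while shared_buffers needs a
server restart:

ALTER SYSTEM SET shared_buffers = '24GB';
ALTER SYSTEM SET effective_cache_size = '64GB';
SELECT pg_reload_conf();  -- then restart for shared_buffers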

Regards,
Igor

Also, maybe it’s time to look at execution plans (explain analyze) of specific 
slow queries, instead of trying to solve the problem “in general”.

Igor



Re: [PERFORM] Very poor read performance, query independent

2017-07-11 Thread Charles Nadeau
After reducing random_page_cost to 4 and testing more, I can report that
the aggregate read throughput for parallel sequential scan is about 90MB/s.
However the throughput for sequential scan is still around 4MB/s.

One more question: if a query uses more than one table, can more than one
table be read through a parallel sequential scan? I have many queries
joining  tables and I noticed that there was never more than one table read
in parallel.
Thanks!

Charles

On Mon, Jul 10, 2017 at 8:35 PM, Igor Neyman <iney...@perceptron.com> wrote:

>
>
> *From:* pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-
> ow...@postgresql.org] *On Behalf Of *Charles Nadeau
> *Sent:* Monday, July 10, 2017 11:48 AM
> *To:* Andreas Kretschmer <andr...@a-kretschmer.de>
> *Cc:* pgsql-performance@postgresql.org
> *Subject:* Re: [PERFORM] Very poor read performance, query independent
>
>
>
> Andreas,
>
>
>
> Because the ratio between the Sequential IOPS and Random IOPS is about 29.
> Taking into account that part of the data is in RAM, I obtained an
> "effective" ratio of about 22.
>
> Thanks!
>
>
>
> Charles
>
>
>
> On Mon, Jul 10, 2017 at 5:35 PM, Andreas Kretschmer <
> andr...@a-kretschmer.de> wrote:
>
>
>
> Am 10.07.2017 um 16:03 schrieb Charles Nadeau:
>
> random_page_cost | 22
>
>
>
> why such a high value for random_page_cost?
>
> Regards, Andreas
>
> --
> 2ndQuadrant - The PostgreSQL Support Company.
> www.2ndQuadrant.com
>
>
> --
>
> Charles Nadeau Ph.D.
> http://charlesnadeau.blogspot.com/
>
>
>
>
>
> Considering RAM size of 72 GB and your database size of ~225GB, and also
> the fact that Postgres is the only app running on the server, probably 1/3
> of your database resides in memory, so random_page_cost = 22 looks
> extremely high, probably it completely precludes index usage in your
> queries.
>
>
>
> You should try this setting at least at its default value:
> random_page_cost =4, and probably go even lower.
>
> Also, effective_cache_size should be at least as big as your shared_buffers.
> Having 72GB RAM, effective_cache_size should be set around 64GB (again
> considering that Postgres is the only app running on the server).
>
>
>
> Regards,
>
> Igor Neyman
>
>
>
>
>
>
>
>
>



-- 
Charles Nadeau Ph.D.
http://charlesnadeau.blogspot.com/


Re: [PERFORM] Very poor read performance, query independent

2017-07-11 Thread Charles Nadeau
Jeff,

I used fio in a quick benchmarking script inspired by
https://smcleod.net/benchmarking-io/:

#!/bin/bash
#Random throughput
echo "Random throughput"
sync
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test
--filename=test --bs=4M --iodepth=256 --size=10G --readwrite=randread
--ramp_time=4
#Random IOPS
echo "Random IOPS"
sync
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test
--filename=test --bs=4k --iodepth=256 --size=4G --readwrite=randread
--ramp_time=4
#Sequential throughput
echo "Sequential throughput"
sync
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test
--filename=test --bs=4M --iodepth=256 --size=10G --readwrite=read
--ramp_time=4
#Sequential IOPS
echo "Sequential IOPS"
sync
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test
--filename=test --bs=4k --iodepth=256 --size=4G --readwrite=read
--ramp_time=4

Performing the test you suggested, I get 128.5MB/s. Monitoring the test, I
find that the throughput is constant from start to finish and that the
iowait is also constant at 5%:

charles@hpdl380g6:~$ sudo sh -c 'time cat /mnt/data/postgresql/base/16385/*
| wc -c'
[sudo] password for charles:
1.62user 179.94system 29:50.79elapsed 10%CPU (0avgtext+0avgdata
1920maxresident)k
448026264inputs+0outputs (0major+117minor)pagefaults 0swaps
241297594904


After making the huge pages changes suggested by Rick Otten (above), I
found slightly better results (135.7MB/s):

charles@hpdl380g6:~$ sudo sh -c 'time cat /mnt/data/postgresql/base/16385/*
| wc -c'
[sudo] password for charles:
0.86user 130.84system 28:15.78elapsed 7%CPU (0avgtext+0avgdata
1820maxresident)k
471286792inputs+0outputs (1major+118minor)pagefaults 0swaps
241297594904
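
(Both rates follow from the byte count and elapsed times above:
241297594904 B / 1790.79 s = ~128.5 MiB/s, and
241297594904 B / 1695.78 s = ~135.7 MiB/s.)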


Could you suggest another way to benchmark random reads?

Thanks for your help!

Charles

On Mon, Jul 10, 2017 at 9:24 PM, Jeff Janes  wrote:

> On Mon, Jul 10, 2017 at 7:03 AM, Charles Nadeau 
> wrote:
>
>>
>> The problem I have is very poor read. When I benchmark my array with fio
>> I get random reads of about 200MB/s and 1100IOPS and sequential reads of
>> about 286MB/s and 21000 IOPS.
>>
>
>
> That doesn't seem right.  Sequential is only 43% faster?  What job file
> are giving to fio?
>
> What do you get if you do something simpler, like:
>
> time cat ~/$PGDATA/base/16402/*|wc -c
>
> replacing 16402 with whatever your biggest database is.
>
> Cheers,
>
> Jeff
>



-- 
Charles Nadeau Ph.D.
http://charlesnadeau.blogspot.com/


Re: [PERFORM] Very poor read performance, query independent

2017-07-11 Thread Charles Nadeau
Igor,

I reduced the value of random_page_cost to 4 but the read speed remains low.
Regarding effective_cache_size and shared_buffer, do you mean they should
be both equal to 64GB?
Thanks for suggestions!

Charles

On Mon, Jul 10, 2017 at 8:35 PM, Igor Neyman <iney...@perceptron.com> wrote:

>
>
> *From:* pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-
> ow...@postgresql.org] *On Behalf Of *Charles Nadeau
> *Sent:* Monday, July 10, 2017 11:48 AM
> *To:* Andreas Kretschmer <andr...@a-kretschmer.de>
> *Cc:* pgsql-performance@postgresql.org
> *Subject:* Re: [PERFORM] Very poor read performance, query independent
>
>
>
> Andreas,
>
>
>
> Because the ratio between the Sequential IOPS and Random IOPS is about 29.
> Taking into account that part of the data is in RAM, I obtained an
> "effective" ratio of about 22.
>
> Thanks!
>
>
>
> Charles
>
>
>
> On Mon, Jul 10, 2017 at 5:35 PM, Andreas Kretschmer <
> andr...@a-kretschmer.de> wrote:
>
>
>
> Am 10.07.2017 um 16:03 schrieb Charles Nadeau:
>
> random_page_cost | 22
>
>
>
> why such a high value for random_page_cost?
>
> Regards, Andreas
>
> --
> 2ndQuadrant - The PostgreSQL Support Company.
> www.2ndQuadrant.com
>
>
> --
>
> Charles Nadeau Ph.D.
> http://charlesnadeau.blogspot.com/
>
>
>
>
>
> Considering RAM size of 72 GB and your database size of ~225GB, and also
> the fact that Postgres is the only app running on the server, probably 1/3
> of your database resides in memory, so random_page_cost = 22 looks
> extremely high, probably it completely precludes index usage in your
> queries.
>
>
>
> You should try this setting at least at its default value:
> random_page_cost =4, and probably go even lower.
>
> Also, effective_cache_size should be at least as big as your shared_buffers.
> Having 72GB RAM, effective_cache_size should be set around 64GB (again
> considering that Postgres is the only app running on the server).
>
>
>
> Regards,
>
> Igor Neyman
>
>
>
>
>
>
>
>
>



-- 
Charles Nadeau Ph.D.
http://charlesnadeau.blogspot.com/


Re: [PERFORM] Very poor read performance, query independent

2017-07-11 Thread Charles Nadeau
Rick,

I applied the change you recommended but it didn't speed up the reads.
One thing I forgot to mention earlier is that the speed of the backup made
with COPY operations seems almost normal: I see read speeds of up to 85MB/s.
Thanks for your help!

Charles

On Mon, Jul 10, 2017 at 5:25 PM, Rick Otten 
wrote:

> Although probably not the root cause, at the least I would set up
> hugepages  ( https://www.postgresql.org/docs/9.6/static/kernel-
> resources.html#LINUX-HUGE-PAGES ), and bump effective_io_concurrency up
> quite a bit as well (256 ?).
>
>
> On Mon, Jul 10, 2017 at 10:03 AM, Charles Nadeau  > wrote:
>
>> I’m running PostgreSQL 9.6.3 on Ubuntu 16.10 (kernel 4.4.0-85-generic).
>> Hardware is:
>>
>> *2x Intel Xeon E5550
>>
>> *72GB RAM
>>
>> *Hardware RAID10 (4 x 146GB SAS 10k) P410i controller with 1GB FBWC (80%
>> read/20% write) for Postgresql data only:
>>
>> Logical Drive: 3
>>
>> Size: 273.4 GB
>>
>> Fault Tolerance: 1+0
>>
>> Heads: 255
>>
>> Sectors Per Track: 32
>>
>> Cylinders: 65535
>>
>> Strip Size: 128 KB
>>
>> Full Stripe Size: 256 KB
>>
>> Status: OK
>>
>> Caching: Enabled
>>
>> Unique Identifier: 600508B1001037383941424344450A00
>>
>> Disk Name: /dev/sdc
>>
>> Mount Points: /mnt/data 273.4 GB
>>
>> OS Status: LOCKED
>>
>> Logical Drive Label: A00A194750123456789ABCDE516F
>>
>> Mirror Group 0:
>>
>> physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SAS, 146 GB, OK)
>>
>> physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 146 GB, OK)
>>
>> Mirror Group 1:
>>
>> physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SAS, 146 GB, OK)
>>
>> physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SAS, 146 GB, OK)
>>
>> Drive Type: Data
>>
>> Formatted with ext4 with: sudo mkfs.ext4 -E stride=32,stripe_width=64 -v
>> /dev/sdc1.
>>
>> Mounted in /etc/fstab with this line: 
>> "UUID=99fef4ae-51dc-4365-9210-0b153b1cbbd0
>> /mnt/data ext4 rw,nodiratime,user_xattr,noatime,nobarrier,errors=remount-ro
>> 0 1"
>>
>> Postgresql is the only application running on this server.
>>
>>
>> Postgresql is used as a mini data warehouse to generate reports and do
>> statistical analysis. It is used by at most 2 users and fresh data is added
>> every 10 days. The database has 16 tables: one is 224GB big and the rest
>> are between 16kB and 470MB big.
>>
>>
>> My configuration is:
>>
>>
>> name | current_setting | source
>>
>> ---------------------------------+----------------------------------+--------------------
>>
>> application_name | psql | client
>>
>> autovacuum_vacuum_scale_factor | 0 | configuration file
>>
>> autovacuum_vacuum_threshold | 2000 | configuration file
>>
>> checkpoint_completion_target | 0.9 | configuration file
>>
>> checkpoint_timeout | 30min | configuration file
>>
>> client_encoding | UTF8 | client
>>
>> client_min_messages | log | configuration file
>>
>> cluster_name | 9.6/main | configuration file
>>
>> cpu_index_tuple_cost | 0.001 | configuration file
>>
>> cpu_operator_cost | 0.0005 | configuration file
>>
>> cpu_tuple_cost | 0.003 | configuration file
>>
>> DateStyle | ISO, YMD | configuration file
>>
>> default_statistics_target | 100 | configuration file
>>
>> default_text_search_config | pg_catalog.english | configuration file
>>
>> dynamic_shared_memory_type | posix | configuration file
>>
>> effective_cache_size | 22GB | configuration file
>>
>> effective_io_concurrency | 4 | configuration file
>>
>> external_pid_file | /var/run/postgresql/9.6-main.pid | configuration file
>>
>> lc_messages | C | configuration file
>>
>> lc_monetary | en_CA.UTF-8 | configuration file
>>
>> lc_numeric | en_CA.UTF-8 | configuration file
>>
>> lc_time | en_CA.UTF-8 | configuration file
>>
>> listen_addresses | * | configuration file
>>
>> lock_timeout | 100s | configuration file
>>
>> log_autovacuum_min_duration | 0 | configuration file
>>
>> log_checkpoints | on | configuration file
>>
>> log_connections | on | configuration file
>>
>> log_destination | csvlog | configuration file
>>
>> log_directory | /mnt/bigzilla/data/toburn/hp/postgresql/pg_log |
>> configuration file
>>
>> log_disconnections | on | configuration file
>>
>> log_error_verbosity | default | configuration file
>>
>> log_file_mode | 0600 | configuration file
>>
>> log_filename | postgresql-%Y-%m-%d_%H%M%S.log | configuration file
>>
>> log_line_prefix | user=%u,db=%d,app=%aclient=%h | configuration file
>>
>> log_lock_waits | on | configuration file
>>
>> log_min_duration_statement | 0 | configuration file
>>
>> log_min_error_statement | debug1 | configuration file
>>
>> log_min_messages | debug1 | configuration file
>>
>> log_rotation_size | 1GB | configuration file
>>
>> log_temp_files | 0 | configuration file
>>
>> log_timezone | localtime | configuration file
>>
>> logging_collector | on | configuration file
>>
>> maintenance_work_mem | 3GB | configuration file
>>
>> max_connections | 10 | configuration file
>>
>> max_locks_per_transaction | 256 | configuration file
>>
>> 

Re: [PERFORM] Very poor read performance, query independent

2017-07-10 Thread Jeff Janes
On Mon, Jul 10, 2017 at 7:03 AM, Charles Nadeau wrote:

>
> The problem I have is very poor read performance. When I benchmark my array
> with fio I get random reads of about 200MB/s and 1100 IOPS and sequential
> reads of about 286MB/s and 21000 IOPS.
>


That doesn't seem right.  Sequential is only 43% faster?  What job file are
you giving to fio?
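
For comparison, a job file along these lines is the kind of thing I'd
expect for that test (a sketch; the filename and sizes are placeholders,
not necessarily what you ran):

# illustrative seq-vs-random read comparison; point filename at the array
[global]
ioengine=libaio
direct=1
time_based
runtime=60
size=8g
filename=/mnt/data/fio.test

[seqread]
rw=read
bs=1m
stonewall

[randread]
rw=randread
bs=4k
stonewall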

What do you get if you do something simpler, like:

time cat ~/$PGDATA/base/16402/*|wc -c

replacing 16402 with whatever your biggest database is.
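
To list your databases by size and find the OID of the biggest one:

SELECT oid, datname, pg_size_pretty(pg_database_size(oid)) AS size
FROM pg_database
ORDER BY pg_database_size(oid) DESC;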

Cheers,

Jeff


Re: [PERFORM] Very poor read performance, query independent

2017-07-10 Thread Igor Neyman

From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Charles Nadeau
Sent: Monday, July 10, 2017 11:48 AM
To: Andreas Kretschmer <andr...@a-kretschmer.de>
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Very poor read performance, query independent

Andreas,

Because the ratio between the Sequential IOPS and Random IOPS is about 29. 
Taking into account that part of the data is in RAM, I obtained an "effective" 
ratio of about 22.
Thanks!

Charles

On Mon, Jul 10, 2017 at 5:35 PM, Andreas Kretschmer
<andr...@a-kretschmer.de> wrote:


On 10.07.2017 at 16:03, Charles Nadeau wrote:
random_page_cost | 22


why such a high value for random_page_cost?

Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com


--
Charles Nadeau Ph.D.
http://charlesnadeau.blogspot.com/


Considering RAM size of 72 GB and your database size of ~225GB, and also the 
fact that Postgres is the only app running on the server, probably 1/3 of your 
database resides in memory, so random_page_cost = 22 looks extremely high; it 
probably completely precludes index usage in your queries.

You should try this setting at least at its default value: random_page_cost = 4, 
and probably go even lower.
Also, effective_cache_size should be at least as big as your shared_buffers. 
With 72GB of RAM, effective_cache_size should be set to around 64GB (again 
considering that Postgres is the only app running on the server).
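
Both settings are plain GUCs, so you can experiment per session before
touching postgresql.conf. A sketch:

-- try per session first
SET random_page_cost = 4;
SET effective_cache_size = '64GB';
-- persist once the plans look right (9.4+), then reload
ALTER SYSTEM SET random_page_cost = 4;
ALTER SYSTEM SET effective_cache_size = '64GB';
SELECT pg_reload_conf();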

Regards,
Igor Neyman






Re: [PERFORM] Very poor read performance, query independent

2017-07-10 Thread Charles Nadeau
Andreas,

Because the ratio between the Sequential IOPS and Random IOPS is about 29.
Taking into account that part of the data is in RAM, I obtained an
"effective" ratio of about 22 (a weighted average: if roughly a quarter of
reads come from RAM at a cost near 1, then 0.75 × 29 + 0.25 × 1 ≈ 22; the
cached fraction is only an estimate).
Thanks!

Charles

On Mon, Jul 10, 2017 at 5:35 PM, Andreas Kretschmer wrote:

>
>
> On 10.07.2017 at 16:03, Charles Nadeau wrote:
>
>> random_page_cost | 22
>>
>
>
> why such a high value for random_page_cost?
>
> Regards, Andreas
>
> --
> 2ndQuadrant - The PostgreSQL Support Company.
> www.2ndQuadrant.com
>
>



-- 
Charles Nadeau Ph.D.
http://charlesnadeau.blogspot.com/


Re: [PERFORM] Very poor read performance, query independent

2017-07-10 Thread Andreas Kretschmer



On 10.07.2017 at 16:03, Charles Nadeau wrote:
random_page_cost | 22 



why such a high value for random_page_cost?

Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com





Re: [PERFORM] Very poor read performance, query independent

2017-07-10 Thread Rick Otten
Although probably not the root cause, at the least I would set up hugepages
(https://www.postgresql.org/docs/9.6/static/kernel-resources.html#LINUX-HUGE-PAGES),
and bump effective_io_concurrency up quite a bit as well (256?).
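
Roughly like this (the huge page count is only an example, sized here for
a ~34GB shared_buffers with 2MB pages; adjust to taste):

# /etc/sysctl.conf (apply with "sysctl -p"); 18000 x 2MB pages ~ 36GB
vm.nr_hugepages = 18000

# postgresql.conf (huge_pages needs a server restart)
huge_pages = try
effective_io_concurrency = 256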


On Mon, Jul 10, 2017 at 10:03 AM, Charles Nadeau wrote:

> I’m running PostgreSQL 9.6.3 on Ubuntu 16.10 (kernel 4.4.0-85-generic).
> Hardware is:
>
> *2x Intel Xeon E5550
>
> *72GB RAM
>
> *Hardware RAID10 (4 x 146GB SAS 10k) P410i controller with 1GB FBWC (80%
> read/20% write) for Postgresql data only:
>
> Logical Drive: 3
>
> Size: 273.4 GB
>
> Fault Tolerance: 1+0
>
> Heads: 255
>
> Sectors Per Track: 32
>
> Cylinders: 65535
>
> Strip Size: 128 KB
>
> Full Stripe Size: 256 KB
>
> Status: OK
>
> Caching: Enabled
>
> Unique Identifier: 600508B1001037383941424344450A00
>
> Disk Name: /dev/sdc
>
> Mount Points: /mnt/data 273.4 GB
>
> OS Status: LOCKED
>
> Logical Drive Label: A00A194750123456789ABCDE516F
>
> Mirror Group 0:
>
> physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SAS, 146 GB, OK)
>
> physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 146 GB, OK)
>
> Mirror Group 1:
>
> physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SAS, 146 GB, OK)
>
> physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SAS, 146 GB, OK)
>
> Drive Type: Data
>
> Formatted with ext4 with: sudo mkfs.ext4 -E stride=32,stripe_width=64 -v
> /dev/sdc1.
>
> Mounted in /etc/fstab with this line: 
> "UUID=99fef4ae-51dc-4365-9210-0b153b1cbbd0
> /mnt/data ext4 rw,nodiratime,user_xattr,noatime,nobarrier,errors=remount-ro
> 0 1"
>
> Postgresql is the only application running on this server.
>
>
> Postgresql is used as a mini data warehouse to generate reports and do
> statistical analysis. It is used by at most 2 users and fresh data is added
> every 10 days. The database has 16 tables: one is 224GB big and the rest
> are between 16kB and 470MB big.
>
>
> My configuration is:
>
>
> name | current_setting | source
>
> ---------------------------------+----------------------------------+--------------------
>
> application_name | psql | client
>
> autovacuum_vacuum_scale_factor | 0 | configuration file
>
> autovacuum_vacuum_threshold | 2000 | configuration file
>
> checkpoint_completion_target | 0.9 | configuration file
>
> checkpoint_timeout | 30min | configuration file
>
> client_encoding | UTF8 | client
>
> client_min_messages | log | configuration file
>
> cluster_name | 9.6/main | configuration file
>
> cpu_index_tuple_cost | 0.001 | configuration file
>
> cpu_operator_cost | 0.0005 | configuration file
>
> cpu_tuple_cost | 0.003 | configuration file
>
> DateStyle | ISO, YMD | configuration file
>
> default_statistics_target | 100 | configuration file
>
> default_text_search_config | pg_catalog.english | configuration file
>
> dynamic_shared_memory_type | posix | configuration file
>
> effective_cache_size | 22GB | configuration file
>
> effective_io_concurrency | 4 | configuration file
>
> external_pid_file | /var/run/postgresql/9.6-main.pid | configuration file
>
> lc_messages | C | configuration file
>
> lc_monetary | en_CA.UTF-8 | configuration file
>
> lc_numeric | en_CA.UTF-8 | configuration file
>
> lc_time | en_CA.UTF-8 | configuration file
>
> listen_addresses | * | configuration file
>
> lock_timeout | 100s | configuration file
>
> log_autovacuum_min_duration | 0 | configuration file
>
> log_checkpoints | on | configuration file
>
> log_connections | on | configuration file
>
> log_destination | csvlog | configuration file
>
> log_directory | /mnt/bigzilla/data/toburn/hp/postgresql/pg_log |
> configuration file
>
> log_disconnections | on | configuration file
>
> log_error_verbosity | default | configuration file
>
> log_file_mode | 0600 | configuration file
>
> log_filename | postgresql-%Y-%m-%d_%H%M%S.log | configuration file
>
> log_line_prefix | user=%u,db=%d,app=%aclient=%h | configuration file
>
> log_lock_waits | on | configuration file
>
> log_min_duration_statement | 0 | configuration file
>
> log_min_error_statement | debug1 | configuration file
>
> log_min_messages | debug1 | configuration file
>
> log_rotation_size | 1GB | configuration file
>
> log_temp_files | 0 | configuration file
>
> log_timezone | localtime | configuration file
>
> logging_collector | on | configuration file
>
> maintenance_work_mem | 3GB | configuration file
>
> max_connections | 10 | configuration file
>
> max_locks_per_transaction | 256 | configuration file
>
> max_parallel_workers_per_gather | 14 | configuration file
>
> max_stack_depth | 2MB | environment variable
>
> max_wal_size | 4GB | configuration file
>
> max_worker_processes | 14 | configuration file
>
> min_wal_size | 2GB | configuration file
>
> parallel_setup_cost | 1000 | configuration file
>
> parallel_tuple_cost | 0.012 | configuration file
>
> port | 5432 | configuration file
>
> random_page_cost | 22 | configuration file
>
> seq_page_cost | 1 | configuration file
>
> shared_buffers | 34GB | configuration file
>
>