Re: PostgreSQL Read IOPS limit per connection

2019-01-10 Thread Mark Hogg
Hello,

I am happy to hear that you have received all the help you needed.

Please feel free to contact us for professional assistance any time you may
need it in the future.

You are most welcome!


Regards,


Mark Avinash Hogg

Director of Business Development

2ndQuadrant

+1(647) 770 9821 Cell

www.2ndquadrant.com

mark.h...@2ndquadrant.com


On Wed, 9 Jan 2019 at 19:20, Merlin Moncure (via Accelo) wrote:

> On Wed, Jan 9, 2019 at 3:52 PM Haroldo Kerry  wrote:
>
>> @Justin  @Merlin @ Jeff,
>> Thanks so much for your time and insights, they improved our
>> understanding of the underpinnings of PostgreSQL and allowed us to deal
>> with the issues we were facing.
>> Using parallel query on our PG 9.6 greatly improved query performance -
>> it turns out that a lot of our real-world queries can benefit from
>> parallel query; we saw about 4x improvements after turning it on, and now
>> we see much higher storage IOPS thanks to the multiple workers.
>> In our tests effective_io_concurrency did not show as large an effect as
>> in the link you sent; I'll take another look at it - maybe we are doing
>> something wrong, or the fact that the SSDs are on the SAN and not local
>> affects the results.
>> In the process we also learned that changing the default Linux I/O
>> scheduler from CFQ to Deadline worked wonders for our Dell SC2020 SAN
>> storage setup: we used to see latency peaks of 6,000 milliseconds during
>> busy periods (yes, 6 seconds); we now see 80 milliseconds, an almost
>> 100-fold improvement.
>>
>
> The link I sent used a contrived query to force a type of scan that
> benefits from that setting; it's a very situational benefit.  It would be
> interesting if you couldn't reproduce it using the same mechanic.
>
> merlin
>
>>


Re: PostgreSQL Read IOPS limit per connection

2019-01-09 Thread Merlin Moncure
On Wed, Jan 9, 2019 at 3:52 PM Haroldo Kerry  wrote:

> @Justin  @Merlin @ Jeff,
> Thanks so much for your time and insights, they improved our understanding
> of the underpinnings of PostgreSQL and allowed us to deal with the issues
> we were facing.
> Using parallel query on our PG 9.6 greatly improved query performance -
> it turns out that a lot of our real-world queries can benefit from parallel
> query; we saw about 4x improvements after turning it on, and now we see
> much higher storage IOPS thanks to the multiple workers.
> In our tests effective_io_concurrency did not show as large an effect as in
> the link you sent; I'll take another look at it - maybe we are doing
> something wrong, or the fact that the SSDs are on the SAN and not local
> affects the results.
> In the process we also learned that changing the default Linux I/O
> scheduler from CFQ to Deadline worked wonders for our Dell SC2020 SAN
> storage setup: we used to see latency peaks of 6,000 milliseconds during
> busy periods (yes, 6 seconds); we now see 80 milliseconds, an almost
> 100-fold improvement.
>

The link I sent used a contrived query to force a type of scan that
benefits from that setting; it's a very situational benefit.  It would be
interesting if you couldn't reproduce it using the same mechanic.

merlin

>


Re: PostgreSQL Read IOPS limit per connection

2019-01-09 Thread Haroldo Kerry
@Justin  @Merlin @ Jeff,
Thanks so much for your time and insights, they improved our understanding
of the underpinnings of PostgreSQL and allowed us to deal with the issues we
were facing.
Using parallel query on our PG 9.6 greatly improved query performance -
it turns out that a lot of our real-world queries can benefit from parallel
query; we saw about 4x improvements after turning it on, and now we see
much higher storage IOPS thanks to the multiple workers.
In our tests effective_io_concurrency did not show as large an effect as in
the link you sent; I'll take another look at it - maybe we are doing
something wrong, or the fact that the SSDs are on the SAN and not local
affects the results.
In the process we also learned that changing the default Linux I/O
scheduler from CFQ to Deadline worked wonders for our Dell SC2020 SAN
storage setup: we used to see latency peaks of 6,000 milliseconds during
busy periods (yes, 6 seconds); we now see 80 milliseconds, an almost
100-fold improvement.
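[For reference, a minimal sketch of the scheduler switch described above; the
device name sdb is an assumption, and this applies to pre-multiqueue kernels
where the legacy cfq/deadline schedulers exist:]

```shell
# Show which scheduler is active for the device backing the data directory
# (the bracketed entry is the current one), then switch it to deadline.
cat /sys/block/sdb/queue/scheduler      # e.g. "noop [cfq] deadline"
echo deadline > /sys/block/sdb/queue/scheduler

# The change above does not survive a reboot; to persist it, set
# elevator=deadline on the kernel command line or add a udev rule.
```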


Best regards,
Haroldo Kerry



On Wed, Jan 9, 2019 at 5:14 PM Merlin Moncure  wrote:

> On Thu, Dec 27, 2018 at 7:29 PM Justin Pryzby 
> wrote:
> >
> > On Thu, Dec 27, 2018 at 08:20:23PM -0500, Jeff Janes wrote:
> > > Also, you would want to use the newest version of PostgreSQL, as 9.6
> > > doesn't have parallel query, which is much more generally applicable
> than
> > > effective_io_concurrency is.
>
> effective_io_concurrency only applies to certain queries.   When it
> does apply it can work wonders. See:
>
> https://www.postgresql.org/message-id/CAHyXU0yiVvfQAnR9cyH=HWh1WbLRsioe=mzRJTHwtr=2azs...@mail.gmail.com
>  for  an example of how it can benefit.
>
> Parallel query is not going to help single-threaded pgbench results;
> you are going to be entirely latency bound (network from pgbench to
> postgres, then postgres to storage).  On my Dell crapbox I was getting
> 2200tps, so you have some point of slowness relative to me, probably
> not the disk itself.
>
> Getting faster performance is an age-old problem; you need to
> aggregate specific requests into more general ones, move the
> controlling logic into the database itself, or use various other
> strategies.  Lowering latency is a hardware problem that can force
> trade-offs (like: don't use a SAN) and has specific boundaries that
> are not easy to bust through.
>
> merlin
>
>

-- 

Haroldo Kerry

CTO/COO

Rua do Rócio, 220, 7° andar, conjunto 72

São Paulo – SP / CEP 04552-000

hke...@callix.com.br

www.callix.com.br


Re: PostgreSQL Read IOPS limit per connection

2018-12-27 Thread Jeff Janes
>
>
> *Performance issue:*
>
> I’m trying to figure out if PostgreSQL (PG) has some inherent limit on
> IOPS per connection.
>
> Running pgbench with multiple clients (-c 30) we are able to see 20K+
> IOPS, which is what we expect. But if we use just one client, we get 1200
> IOPS, avg disk queue size around 1:
>

The default transaction done by pgbench simply has no opportunity for
dispatching multiple IO requests per connection.  It is just a series of
single-row lookups and single-row updates or inserts.  You will have to use
a different benchmark if you want to exercise this area, probably
something analytics-heavy.
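[A sketch of the comparison under discussion; the database name testdb and the
run parameters are illustrative, not from the thread:]

```shell
# One client: each request must complete before the next is issued, so
# throughput is bounded by per-request round-trip latency.
pgbench -c 1 -j 1 -T 60 testdb

# 30 clients: up to 30 requests in flight at once, so the storage sees a
# much deeper queue and can deliver far more IOPS.
pgbench -c 30 -j 4 -T 60 testdb
```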

Also, you would want to use the newest version of PostgreSQL: parallel query
in 9.6 is far more limited than in later releases, and parallel query is much
more generally applicable than effective_io_concurrency is.

> One of the issues I’m trying to solve is related to extracting data from a
> large table, which uses a full table scan. We see the same 1200 IOPS limit
> of pgbench when we SELECT on this table using just one connection. If there
> is a limitation per connection, I might set up the application to have
> several connections, and then issue SELECTs for different sections of the
> table, and later join the data, but it looks cumbersome, especially if the
> DB can extract data using more IOPS.
>
The kernel should detect a sequential read in progress and invoke
readahead.  That should be able to keep the CPU quite busy with data for
any decent IO system.  Are you sure IO is even the bottleneck for your
query?

Perhaps your kernel readahead settings need to be tuned.  Also, you may
benefit from parallel query features implemented in newer versions of
PostgreSQL.  In any event, the default transactions of pgbench are not
going to be useful for benchmarking what you care about.
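[A sketch of the readahead check mentioned above; the device name sdb is an
assumption, and the value that helps will depend on the storage:]

```shell
# Inspect the current readahead window for the device, in 512-byte
# sectors; the kernel default is often 256 (i.e. 128 kB).
blockdev --getra /dev/sdb

# Raise it to 4096 sectors (2 MB); larger windows mainly help big
# sequential scans, at the cost of wasted reads on random workloads.
blockdev --setra 4096 /dev/sdb
```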

Cheers,

Jeff


Re: PostgreSQL Read IOPS limit per connection

2018-12-27 Thread Haroldo Kerry
Justin,
Thanks for the quick response, I'll check it out.

Happy holidays,
Haroldo Kerry

On Thu, Dec 27, 2018 at 2:55 PM Justin Pryzby  wrote:

> On Thu, Dec 27, 2018 at 02:44:55PM -0200, Haroldo Kerry wrote:
> > PostgreSQL 9.6.10 on x86_64-pc-linux-gnu (Debian 9.6.10-1.pgdg80+1),
>
> > Connected to SAN: Dell Compellent SC2020, with 7 x Samsung PM1633 SSDs
> >
> https://www.samsung.com/us/labs/pdfs/collateral/pm1633-prodoverview-2015.pdf
> ,
> > RAID10+RAID5 configuration, 8GB Cache, read-write battery backed cache
> > enabled, connected via dedicated iSCSI switches and dedicated Ethernet
> > ports, in link aggregation mode (2x1Gbps max bandwidth).
>
> > I’m trying to figure out if PostgreSQL (PG) has some inherent limit on
> IOPS
> > per connection.
>
> postgres uses one server backend per client.
>
> > We tried to increase effective_io_concurrency from 1 to 30, to no effect
> > on multiple tests.
>
>
> https://www.postgresql.org/docs/current/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-ASYNC-BEHAVIOR
> => "Currently, this setting only affects bitmap heap scans."
>
> > Is this some inherent limitation of PG or am I misunderstanding
> something?
>
> It is a historic limitation, but nowadays there's parallel query, which
> uses secondary "backend worker" processes.
>
> It's supported in v9.6 but much more versatile in v10 and v11.
>
> Justin
>




Re: PostgreSQL Read IOPS limit per connection

2018-12-27 Thread Justin Pryzby
On Thu, Dec 27, 2018 at 02:44:55PM -0200, Haroldo Kerry wrote:
> PostgreSQL 9.6.10 on x86_64-pc-linux-gnu (Debian 9.6.10-1.pgdg80+1),

> Connected to SAN: Dell Compellent SC2020, with 7 x Samsung PM1633 SSDs
> https://www.samsung.com/us/labs/pdfs/collateral/pm1633-prodoverview-2015.pdf,
> RAID10+RAID5 configuration, 8GB Cache, read-write battery backed cache
> enabled, connected via dedicated iSCSI switches and dedicated Ethernet
> ports, in link aggregation mode (2x1Gbps max bandwidth).

> I’m trying to figure out if PostgreSQL (PG) has some inherent limit on IOPS
> per connection.

postgres uses one server backend per client.

> We tried to increase effective_io_concurrency from 1 to 30, to no effect on
> multiple tests.

https://www.postgresql.org/docs/current/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-ASYNC-BEHAVIOR
=> "Currently, this setting only affects bitmap heap scans."
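[A sketch of how one might confirm the setting is actually being exercised;
the table and column names are made up. The plan must contain a Bitmap Heap
Scan node for effective_io_concurrency to have any effect:]

```shell
psql testdb <<'SQL'
SET effective_io_concurrency = 30;
-- Only Bitmap Heap Scan nodes issue the prefetch requests governed by
-- this setting, so check the plan before drawing conclusions.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM big_table WHERE indexed_col BETWEEN 1000 AND 50000;
SQL
```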

> Is this some inherent limitation of PG or am I misunderstanding something?

It is a historic limitation, but nowadays there's parallel query, which uses
secondary "backend worker" processes.

It's supported in v9.6 but much more versatile in v10 and v11.
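[On 9.6, parallel query is off by default; a sketch of enabling it for a
session (testdb, big_table, and the worker count are illustrative):]

```shell
psql testdb <<'SQL'
-- In 9.6 max_parallel_workers_per_gather defaults to 0 (disabled);
-- from v10 on it defaults to 2.
SET max_parallel_workers_per_gather = 4;
-- A parallel plan shows a Gather node with Parallel Seq Scan workers.
EXPLAIN SELECT count(*) FROM big_table;
SQL
```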

Justin