Thanks for the clarification, guys. It will be super useful. After trying this I'll post the results!
Merry Christmas!

On 25 Dec 2017, 00:59, "Danylo Hlynskyi" <abcz2.upr...@gmail.com> wrote:

> I had an opportunity to perform an insertion of 700MM rows into Aurora
> PostgreSQL, for which Performance Insights are available. It turns out
> there are two stages of insert slowdown - the first happens when the max
> WAL buffers limit is reached, the second around 1 hour after.
>
> The first stage cuts insert performance in half, and the WALWrite lock is
> the main bottleneck. I think the WAL just can't sync the change log that
> fast, so it waits while older log entries are flushed. This creates both
> read and write IO.
>
> The second stage is unique to Aurora/RDS and is characterized by excessive
> read data locks and total read IO. I couldn't figure out why it reads so
> much in a write-only workload, and AWS support hasn't answered yet.
>
> So, for you: try to throttle inserts so the WAL is never overfilled and
> you don't experience WALWrite locks, and then increase wal_buffers to the
> maximum.
>
> On 24 Dec 2017, 21:51, "Jean Baro" <jfb...@gmail.com> wrote:
>
> Hi there,
>
> We are testing a new application to try to find performance issues.
>
> AWS RDS m4.large, 500GB storage (SSD).
>
> One table only, called Messages:
>
> Uuid
> Country (ISO)
> Role (text)
> User id (text)
> GroupId (integer)
> Channel (text)
> Title (text)
> Payload (JSON, up to 20kb)
> Starts_in (UTC)
> Expires_in (UTC)
> Seen (boolean)
> Deleted (boolean)
> LastUpdate (UTC)
> Created_by (UTC)
> Created_in (UTC)
>
> Indexes:
>
> UUID (PK)
> UserID + Country (main index)
> LastUpdate
> GroupID
>
> We inserted 160MM rows, around 2KB each. No partitioning.
>
> Inserts started at around 3,000 per second but, as expected, slowed down
> as the number of rows increased. In the end we got around 500 inserts per
> second.
>
> Queries by UserID + Country took less than 2 seconds, but while the batch
> insert was running the queries took over 20 seconds!
>
> We had 20 Lambdas getting messages from SQS and bulk inserting them into
> PostgreSQL.
>
> Insert performance is important, but we would slow it down if needed to
> ensure flatter query performance (below 2 seconds). Each query (UserID +
> Country) returns around 100 different messages, which are filtered and
> ordered by the synchronous Lambda function. So we don't do any special
> filtering, sorting, ordering or full-text search in Postgres. In some
> ways we use it more like a glorified file system. :)
>
> We are going to limit the number of Lambda workers to 1 or 2, and then
> run some queries concurrently to see how much query performance is
> affected. We aim for at least 50 queries per second (returning 100
> messages each) under 2 seconds, even while there are millions of messages
> on SQS being inserted into PG.
>
> We haven't done any performance tuning in the DB.
>
> With all that said, the question is:
>
> What can be done to ensure good query performance (UserID + Country)
> even while the bulk insert is running (low priority)?
>
> We are limited to AWS RDS at the moment.
>
> Cheers
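In case it helps anyone reading along, the Messages table described above
would look roughly like this (a sketch only - the column types are
assumptions based on the descriptions, e.g. the UTC columns read as
timestamptz and the JSON payload as jsonb; all names are made up):

    CREATE TABLE messages (
        uuid        uuid PRIMARY KEY,   -- UUID (PK)
        country     text,               -- ISO country code
        role        text,
        user_id     text,
        group_id    integer,
        channel     text,
        title       text,
        payload     jsonb,              -- up to ~20kb per row
        starts_in   timestamptz,
        expires_in  timestamptz,
        seen        boolean,
        deleted     boolean,
        last_update timestamptz,
        created_by  timestamptz,
        created_in  timestamptz
    );

    -- The three secondary indexes described above:
    CREATE INDEX messages_user_country_idx ON messages (user_id, country);  -- main index
    CREATE INDEX messages_last_update_idx  ON messages (last_update);
    CREATE INDEX messages_group_id_idx     ON messages (group_id);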
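The main lookup then constrains both columns of the composite index, so it
can be served by a single index scan (values are illustrative):

    SELECT *
      FROM messages
     WHERE user_id = 'user-123'   -- hypothetical user id
       AND country = 'BR';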
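The bulk load from the Lambdas presumably batches many rows into a single
statement; a multi-row INSERT sketch with made-up values (COPY would be the
other common option):

    INSERT INTO messages (uuid, country, user_id, payload, created_in)
    VALUES
        ('6b29fc40-ca47-1067-b31d-00dd010662da', 'BR', 'user-1', '{"title": "hi"}', now()),
        ('6ba7b810-9dad-11d1-80b4-00c04fd430c8', 'US', 'user-2', '{"title": "yo"}', now());

One statement per batch cuts round trips and per-statement overhead
compared to inserting one row at a time.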
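And for the wal_buffers advice above: on RDS those settings are changed
through the DB parameter group (ALTER SYSTEM isn't available there), but
the values currently in effect can be checked from any session:

    SHOW wal_buffers;          -- the buffer Danylo suggests raising
    SHOW max_wal_size;         -- how much WAL may accumulate between checkpoints
    SHOW checkpoint_timeout;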