Thanks for the clarification, guys. It will be super useful. After trying this I'll post the results!
Merry Christmas!

On 25 Dec 2017, 00:59, "Danylo Hlynskyi" <abcz2.upr...@gmail.com> wrote:

> I had an opportunity to perform an insertion of 700MM rows into Aurora
> PostgreSQL, for which Performance Insights are available. It turns out
> there are two stages of insert slowdown - the first happens when the max
> WAL buffers limit is reached, the second around 1 hour after.
>
> The first stage cuts insert performance in half, and the WALWrite lock is
> the main bottleneck. I think the WAL just can't sync the change log that
> fast, so it waits while older log entries are flushed. This creates both
> read and write IO.
>
> The second stage is unique to Aurora/RDS and is characterized by excessive
> read data locks and total read IO. I couldn't figure out why it reads so
> much in a write-only workload, and AWS support hasn't answered yet.
>
> So, for you: try to throttle inserts so the WAL is never overfilled and
> you don't experience WALWrite locks, and then increase wal_buffers to the
> maximum.
>
> On 24 Dec 2017, 21:51, "Jean Baro" <jfb...@gmail.com> wrote:
>
> Hi there,
>
> We are testing a new application to try to find performance issues.
>
> AWS RDS m4.large, 500GB storage (SSD).
>
> One table only, called Messages:
>
> Uuid
> Country (ISO)
> Role (text)
> User id (text)
> GroupId (integer)
> Channel (text)
> Title (text)
> Payload (JSON, up to 20kb)
> Starts_in (UTC)
> Expires_in (UTC)
> Seen (boolean)
> Deleted (boolean)
> LastUpdate (UTC)
> Created_by (UTC)
> Created_in (UTC)
>
> Indexes:
>
> UUID (PK)
> UserID + Country (main index)
> LastUpdate
> GroupID
>
> We inserted 160MM rows, around 2KB each. No partitioning.
>
> Inserts started at around 3,000 per second but, as expected, slowed down
> as the number of rows increased. In the end we got around 500 inserts per
> second.
>
> Queries by UserID + Country took less than 2 seconds, but while the batch
> insert was running the queries took over 20 seconds!
>
> We had 20 Lambdas getting messages from SQS and bulk inserting them into
> PostgreSQL.
>
> Insert performance is important, but we would slow it down if needed to
> ensure flatter query performance (below 2 seconds). Each query (UserID +
> Country) returns around 100 different messages, which are filtered and
> ordered by the synchronous Lambda function. So we don't do any special
> filtering, sorting, ordering or full-text search in Postgres. In some
> ways we use it more like a glorified file system. :)
>
> We are going to limit the number of Lambda workers to 1 or 2, and then
> run some queries concurrently to see how much query performance is
> affected. We aim for at least 50 queries per second (returning 100
> messages each) under 2 seconds, even while there are millions of messages
> on SQS being inserted into PG.
>
> We haven't done any performance tuning in the DB.
>
> With all that said, the question is:
>
> What can be done to ensure good query performance (UserID + Country)
> even while the bulk insert is running (low priority)?
>
> We are limited to AWS RDS at the moment.
>
> Cheers
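In case it helps anyone reading along, the Messages table described above
would look roughly like this (a sketch only - the column types are
assumptions based on the descriptions, e.g. the UTC columns read as
timestamptz and the JSON payload as jsonb; all names are made up):

    CREATE TABLE messages (
        uuid        uuid PRIMARY KEY,   -- UUID (PK)
        country     text,               -- ISO country code
        role        text,
        user_id     text,
        group_id    integer,
        channel     text,
        title       text,
        payload     jsonb,              -- up to ~20kb per row
        starts_in   timestamptz,
        expires_in  timestamptz,
        seen        boolean,
        deleted     boolean,
        last_update timestamptz,
        created_by  timestamptz,
        created_in  timestamptz
    );

    -- The three secondary indexes described above:
    CREATE INDEX messages_user_country_idx ON messages (user_id, country);  -- main index
    CREATE INDEX messages_last_update_idx  ON messages (last_update);
    CREATE INDEX messages_group_id_idx     ON messages (group_id);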
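The main lookup then constrains both columns of the composite index, so it
can be served by a single index scan (values are illustrative):

    SELECT *
      FROM messages
     WHERE user_id = 'user-123'   -- hypothetical user id
       AND country = 'BR';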
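The bulk load from the Lambdas presumably batches many rows into a single
statement; a multi-row INSERT sketch with made-up values (COPY would be the
other common option):

    INSERT INTO messages (uuid, country, user_id, payload, created_in)
    VALUES
        ('6b29fc40-ca47-1067-b31d-00dd010662da', 'BR', 'user-1', '{"title": "hi"}', now()),
        ('6ba7b810-9dad-11d1-80b4-00c04fd430c8', 'US', 'user-2', '{"title": "yo"}', now());

One statement per batch cuts round trips and per-statement overhead
compared to inserting one row at a time.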
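And for the wal_buffers advice above: on RDS those settings are changed
through the DB parameter group (ALTER SYSTEM isn't available there), but
the values currently in effect can be checked from any session:

    SHOW wal_buffers;          -- the buffer Danylo suggests raising
    SHOW max_wal_size;         -- how much WAL may accumulate between checkpoints
    SHOW checkpoint_timeout;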