Batch insert heavily affecting query performance.

2017-12-24 Thread Jean Baro
Hi there,

We are testing a new application to try to find performance issues.

AWS RDS m4.large 500GB storage (SSD)

One table only, called Messages:

Uuid
Country  (ISO)
Role (Text)
User id  (Text)
GroupId (integer)
Channel (text)
Title (Text)
Payload (JSON, up to 20kb)
Starts_in (UTC)
Expires_in (UTC)
Seen (boolean)
Deleted (boolean)
LastUpdate (UTC)
Created_by (UTC)
Created_in (UTC)

Indexes:

UUID (PK)
UserID + Country (main index)
LastUpdate
GroupID
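
Roughly, in DDL terms, the schema and indexes look something like this (types and names are approximate, not our exact script):

create table messages (
    uuid        uuid primary key,
    country     text,          -- ISO code
    role        text,
    user_id     text,
    group_id    integer,
    channel     text,
    title       text,
    payload     jsonb,         -- JSON payload, up to ~20 kB
    starts_in   timestamptz,
    expires_in  timestamptz,
    seen        boolean,
    deleted     boolean,
    last_update timestamptz,
    created_by  timestamptz,
    created_in  timestamptz
);

create index messages_user_country_idx on messages (user_id, country);  -- main index
create index messages_last_update_idx  on messages (last_update);
create index messages_group_id_idx     on messages (group_id);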


We inserted 160MM rows, around 2KB each. No partitioning.

Insert started at around 3,000 inserts per second but, as expected, slowed down
as the number of rows increased. In the end we were getting around 500 inserts
per second.

Queries by UserID + Country took less than 2 seconds, but while the batch
insert was running they took over 20 seconds!

We had 20 Lambdas getting messages from SQS and bulk inserting them into
PostgreSQL.

Insert performance is important, but we would slow it down if needed to keep
query performance flat (below 2 seconds). Each query (userId + country) returns
around 100 different messages, which are filtered and ordered by the
synchronous Lambda function, so we don't do any special filtering, sorting,
ordering or full-text search in Postgres. In some ways we use it more like a
glorified file system. :)
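
A typical lookup is essentially just the following (the literal values are placeholders):

select *
  from messages
 where user_id = 'some-user-id'
   and country = 'BR';
-- returns ~100 rows; filtering and ordering happen in the Lambda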

We are going to limit the number of Lambda workers to 1 or 2 and then run some
queries concurrently to see whether query performance is still affected too
much. We aim for at least 50 queries per second (returning 100 messages each)
in under 2 seconds, even while millions of messages from SQS are being inserted
into PG.

We haven't done any performance tuning in the DB.

With all that said, the question is:

What can be done to ensure good query performance (UserID + country) even while
the (low-priority) bulk insert is running?

We are limited to AWS RDS at the moment.

Cheers


Re: Batch insert heavily affecting query performance.

2017-12-24 Thread michael...@sqlexec.com
Are the inserts being done through one connection or multiple connections 
concurrently?

Sent from my iPhone

Re: Batch insert heavily affecting query performance.

2017-12-24 Thread Jean Baro
Multiple connections, but we are going to test it with only one. Would it
make any difference?

Thanks



Re: Batch insert heavily affecting query performance.

2017-12-24 Thread MichaelDBA
Yes, it does make a difference! When you do it over one connection you should
see a big performance gain. Delays in granting relation extension locks
(locktype=extend) can happen when many concurrent connections try to insert
into the same table at the same time. Each insert that needs a new page takes
an extension lock (an 8 kB extension of the table), which blocks other writers.
Normally these extension locks come and go so fast that you hardly ever see
them in pg_locks, except when many concurrent connections are doing a lot of
inserts into the same table. The following query will show whether this is the
case:


select * from pg_locks where granted = false and locktype = 'extend';
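
For illustration, a sketch of what the single-connection approach could look like from the application side: many rows per statement (or COPY) over one connection, rather than single-row inserts from many concurrent Lambdas. The values below are placeholders and the column names are taken from the schema in the original post:

insert into messages (uuid, user_id, country, payload, created_in)
values
  ('11111111-1111-1111-1111-111111111111', 'user-1', 'BR', '{"title": "a"}', now()),
  ('22222222-2222-2222-2222-222222222222', 'user-2', 'BR', '{"title": "b"}', now()),
  ('33333333-3333-3333-3333-333333333333', 'user-3', 'AR', '{"title": "c"}', now());
-- one statement carrying many rows over a single connection,
-- instead of one single-row insert per concurrent connection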



Re: Batch insert heavily affecting query performance.

2017-12-24 Thread Danylo Hlynskyi
I had an opportunity to insert 700MM rows into Aurora PostgreSQL, for which
Performance Insights is available. It turns out there are two stages of insert
slowdown: the first happens when the WAL buffers limit is reached, the second
around an hour later.

The first stage cuts insert performance roughly in half, and the WALWrite lock
is the main bottleneck. I think the WAL simply can't be synced that fast, so
inserts wait while older log entries are flushed. This creates both read and
write IO.

The second stage is unique to Aurora/RDS and is characterized by excessive read
data locks and heavy read IO. I couldn't figure out why it reads so much during
a write-only workload, and AWS support hasn't answered yet.

So, in your case, try throttling the inserts so the WAL is never overfilled and
you don't hit WALWrite locks, and increase wal_buffers to the maximum.
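
For reference, a quick way to check the current value (wal_buffers can only be changed with a restart, and on RDS it is set through the DB parameter group rather than postgresql.conf):

show wal_buffers;

select name, setting, unit, context
  from pg_settings
 where name = 'wal_buffers';
-- context 'postmaster' means the change only takes effect after a restart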



Re: Batch insert heavily affecting query performance.

2017-12-24 Thread Jean Baro
Thanks for the clarification, guys.

It will be super useful. After trying this I'll post the results!

Merry Christmas!
