Thanks Rick,

We are now partitioning the DB (one table) into 100 sets of data.

As soon as we finish this new experiment we will provide a better EXPLAIN
as you suggested. :)

Em 27 de dez de 2017 13:38, "Rick Otten" <>

On Wed, Dec 27, 2017 at 10:13 AM, Jean Baro <> wrote:

> Hello,
> We are still seeing queries  (by UserID + UserCountry) taking over 2
> seconds, even when there is no batch insert going on at the same time.
> Each query returns from 100 to 200 messagens, which would be a 400kb pay
> load, which is super tiny.
> I don't know what else I can do with the limitations (m4.large), 167MM
> rows, almost 500GB database and 29GB of indexes (all indexes).
> I am probably to optimistic, but I was expecting queries (up to 50 queries
> per second) to return  (99th) under 500ms or even less, as the index is
> simple, there is no aggregation or join involves.

> Any suggestion?

Although you aren't querying by it, if your id column is actually a UUID,
as a best practice I strongly recommend switching the column type to uuid.
If you do query by the primary key, a uuid query will be much faster than a
char or varchar column query.

You'll need to submit a more complete explain plan than what you have below.
  Try using:
       explain (analyze, costs, verbose, buffers) select ...

> The table structure:
> CREATE TABLE public.card
> (
>     id character(36) NOT NULL,
>     user_id character varying(40) NOT NULL,
>     user_country character(2) NOT NULL,
>     user_channel character varying(40),
>     user_role character varying(40),
>     created_by_system_key character(36) NOT NULL,
>     created_by_username character varying(40),
>     created_at timestamp with time zone NOT NULL,
>     last_modified_at timestamp with time zone NOT NULL,
>     date_start timestamp with time zone NOT NULL,
>     date_end timestamp with time zone NOT NULL,
>     payload json NOT NULL,
>     tags character varying(500),
>     menu character varying(50),
>     deleted boolean NOT NULL,
>     campaign character varying(500) NOT NULL,
>     correlation_id character varying(50),
>     PRIMARY KEY (id)
> );
> CREATE INDEX idx_user_country
>     ON public.card USING btree
>     (user_id COLLATE pg_catalog."default", user_country COLLATE
> pg_catalog."default");
> CREATE INDEX idx_last_modified_at
>     ON public.card USING btree
>     (last_modified_at ASC NULLS LAST);
> CREATE INDEX idx_campaign
>     ON public.card USING btree
>     (campaign ASC NULLS LAST)
> 'Index Scan using idx_user_country on card  (cost=0.57..1854.66 rows=460
> width=922)'
> '  Index Cond: (((user_id)::text = '4684'::text) AND (user_country =
> 'BR'::bpchar))'
> Em 25 de dez de 2017 01:10, "Jean Baro" <> escreveu:
>> Thanks for the clarification guys.
>> It will be super useful. After trying this I'll post the results!
>> Merry Christmas!
>> Em 25 de dez de 2017 00:59, "Danylo Hlynskyi" <>
>> escreveu:
>>> I had an opportunity to perform insertion of 700MM rows into Aurora
>>> Postgresql, for which performance insights are available. Turns out, that
>>> there are two stages of insert slowdown - first happens when max WAL
>>> buffers limit reached, second happens around 1 hour after.
>>> The first stage cuts insert performance twice, and WALWrite lock is main
>>> bottleneck. I think WAL just can't sync changes log that fast, so it waits
>>> while older log entries are flushed. This creates both read and write IO.
>>> The second stage is unique to Aurora/RDS and is characterized by
>>> excessive read data locks and total read IO. I couldn't figure out why does
>>> it read so much in a write only process, and AWS support didn't answer yet.
>>> So, for you, try to throttle inserts so WAL is never overfilled and you
>>> don't experience WALWrite locks, and then increase wal buffers to max.
>>> 24 груд. 2017 р. 21:51 "Jean Baro" <> пише:
>>> Hi there,
>>> We are testing a new application to try to find performance issues.
>>> AWS RDS m4.large 500GB storage (SSD)
>>> One table only, called Messages:
>>> Uuid
>>> Country  (ISO)
>>> Role (Text)
>>> User id  (Text)
>>> GroupId (integer)
>>> Channel (text)
>>> Title (Text)
>>> Payload (JSON, up to 20kb)
>>> Starts_in (UTC)
>>> Expires_in (UTC)
>>> Seen (boolean)
>>> Deleted (boolean)
>>> LastUpdate (UTC)
>>> Created_by (UTC)
>>> Created_in (UTC)
>>> Indexes:
>>> UUID (PK)
>>> UserID + Country (main index)
>>> LastUpdate
>>> GroupID
>>> We inserted 160MM rows, around 2KB each. No partitioning.
>>> Insert started at around  3.000 inserts per second, but (as expected)
>>> started to slow down as the number of rows increased.  In the end we got
>>> around 500 inserts per second.
>>> Queries by Userd_ID + Country took less than 2 seconds, but while the
>>> batch insert was running the queries took over 20 seconds!!!
>>> We had 20 Lambda getting messages from SQS and bulk inserting them into
>>> Postgresql.
>>> The insert performance is important, but we would slow it down if needed
>>> in order to ensure a more flat query performance. (Below 2 seconds). Each
>>> query (userId + country) returns around 100 diferent messages, which are
>>> filtered and order by the synchronous Lambda function. So we don't do any
>>> special filtering, sorting, ordering or full text search in Postgres. In
>>> some ways we use it more like a glorified file system. :)
>>> We are going to limit the number of lambda workers to 1 or 2, and then
>>> run some queries concurrently to see if the query performance is not affect
>>> too much. We aim to get at least 50 queries per second (returning 100
>>> messages each) under 2 seconds, even when there is millions of messages on
>>> SQS being inserted into PG.
>>> We haven't done any performance tuning in the DB.
>>> With all that said, the question is:
>>> What can be done to ensure good query performance (UserID+ country) even
>>> when the bulk insert is running (low priority).
>>> We are limited to use AWS RDS at the moment.
>>> Cheers

Reply via email to