Why not store the metadata in pg and the payload in S3?
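Something along these lines (purely illustrative; the table and column names below are mine, not from your design): keep the small, queryable columns in Postgres and store the large Body JSON as an S3 object referenced by key.

  -- Illustrative sketch only: metadata stays relational, payload lives in S3.
  CREATE TABLE messages_meta (
      message_id   uuid PRIMARY KEY,
      user_country text NOT NULL,
      user_id      text NOT NULL,
      title        text,
      created      timestamptz NOT NULL DEFAULT now(),
      body_s3_key  text NOT NULL  -- e.g. 'messages/<message_id>.json' in your bucket
  );

The Lambda would then read the (small) rows from Postgres and fetch the JSON bodies from S3 only for the messages it actually needs, which keeps the table and its TOAST traffic small and cheap to scan.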
On Mon, Nov 27, 2017 at 11:58 AM Jean Baro <jfb...@gmail.com> wrote:

> Hi there,
>
> We are creating a new DB which will behave much like a file system; I mean,
> there will be no complex queries or joins running in the DB. The idea is to
> grab the WHOLE set of messages for a particular user and then filter, order,
> combine or full-text search in the function itself (AWS Lambda). The maximum
> number of messages is limited to 1,000 per user, so we expect Postgres to
> deliver excellent performance for this scenario.
>
> As I am not really familiar with PG (9.6, or 10 in case RDS releases it
> before February), I would like to share what we are planning to do for this
> DB. If you guys could share your thoughts, that would be great! :)
>
> Table structure:
>
> · MessageID (UUID) – PK
> · UserCountry (ISO)
> · UserRole (TEXT 15)
> · UserID (TEXT 30) – FK (although there is no constraint)
> · LifeCycle (DATE RANGE? Or 2 timestamps with TZ, start_date and end_date?)
> · Channel (TEXT 15)
> · Tags (TEXT 2000)
> · Menu (TEXT 200)
> · Icon (TEXT 500) – URL to an image which will be used as an icon
> · Title (TEXT 150)
> · Body (JSON – up to 10K) – Metadata describing all the data specific to a
>   type of message. The JSON changes according to the type of message. We
>   assume most messages will use less than 1K for this field.
> · Delete (BOOLEAN) – soft delete
> · Created (Timestamp with TZ)
> · CreatedBy (TEXT 50)
>
> Only 1 table:
>
> · Messages
>
> 3 indexes:
>
> · MessageID PK (UUID)
> · Main fetch key (UserCountry + UserID)
> · End_date (to locate old messages that can be moved to another DB, which
>   will hold the old messages)
>
> Sizing and worst-case scenario:
>
> · 500MM messages in the main DB
> · 4K queries per second (by UserID) – max time of 500ms per query. Simple
>   SELECTs, with no ORDER BY or GROUP BY; just grab all the messages for a
>   particular user, max 1,000 messages per user.
> · 1K inserts per second on average (so that in 1 hour we can insert around
>   3MM messages)
> · 1K deletes per second on average (so that in 1 hour we can remove around
>   3MM messages)
>
> My questions are:
>
> - Can we use any kind of compression in PostgreSQL which would result in
>   reduced IO and disk size?
> - We are not relying on any kind of table partitioning; is that the best
>   approach for this scenario?
> - Is PG on RDS capable of delivering this type of performance while
>   requiring low maintenance?
> - What about autovacuum? Any suggestions on how to tune it for such a
>   workload (we will insert and delete millions of rows every day)?
>
> P.S.: We are going to test all this, but if we don't get the performance we
> are expecting, all optimization tips from you guys will be really
> appreciated. :)
>
> Thanks

--
Regards,
/Aaron
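For reference, a rough SQL sketch of the schema described in the quoted post. All type choices, the two-timestamp modelling of LifeCycle, and the autovacuum settings are assumptions for illustration, not decisions from the original post.

  -- Sketch of the proposed Messages table; types inferred from the sizes above.
  CREATE TABLE messages (
      message_id   uuid PRIMARY KEY,
      user_country text NOT NULL,         -- ISO country code
      user_role    varchar(15),
      user_id      varchar(30) NOT NULL,  -- FK by convention only, no constraint
      start_date   timestamptz,           -- LifeCycle modelled as two timestamps
      end_date     timestamptz,
      channel      varchar(15),
      tags         varchar(2000),
      menu         varchar(200),
      icon         varchar(500),
      title        varchar(150),
      body         jsonb,                 -- up to ~10K, usually under 1K
      deleted      boolean NOT NULL DEFAULT false,  -- soft delete
      created      timestamptz NOT NULL DEFAULT now(),
      created_by   varchar(50)
  );

  -- The two secondary indexes described above.
  CREATE INDEX messages_user_fetch_idx ON messages (user_country, user_id);
  CREATE INDEX messages_end_date_idx   ON messages (end_date);

  -- The main fetch path: all messages for one user, filtered and ordered
  -- later in the Lambda. Values here are placeholders.
  SELECT *
    FROM messages
   WHERE user_country = 'BR'
     AND user_id = 'some-user-id';

  -- Per-table autovacuum tuning (illustrative values only), so heavy
  -- insert/delete churn does not accumulate millions of dead rows.
  ALTER TABLE messages SET (
      autovacuum_vacuum_scale_factor = 0.01,
      autovacuum_vacuum_cost_delay   = 0
  );

The end_date index would be what an archival job scans when moving old messages out to the secondary DB.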