It's hard to give generic recommendations for something that really depends
on your specific needs, but here is one attempt:

> using HikariCP for connection pooling.


For better scaling, look into PgBouncer, which has very fast "transaction"
and "statement" pooling modes.

> ... manage 10-12 TB of data in a production environment, considering
> typical transaction loads.


Yes, 10 TB is very doable.

> We are considering splitting database "C" into two new databases: "C1" to
> exclusively house the "acc" schema, and "C2" for the remaining schemas. Is
> this a recommended approach for managing growth, and what are the potential
> pros and cons?


If they are logically connected, then keep them in the same database. Having
to go across databases (or across clusters) adds a lot of complexity for
little gain.
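
Just to illustrate that complexity: if "acc" lived only in "C1" but
something in "C2" still needed it, you would be looking at postgres_fdw or
similar (a sketch; the host, user, and password are made up), and foreign
keys between the two databases stop being possible at all:

  -- Run in "C2": make the acc schema over in "C1" reachable again
  CREATE EXTENSION postgres_fdw;

  CREATE SERVER c1_server FOREIGN DATA WRAPPER postgres_fdw
      OPTIONS (host 'db.example.com', port '5432', dbname 'c1');

  CREATE USER MAPPING FOR app_user SERVER c1_server
      OPTIONS (user 'app_user', password 'app_password');

  CREATE SCHEMA acc;
  IMPORT FOREIGN SCHEMA acc FROM SERVER c1_server INTO acc;

Every query touching acc then goes through the FDW layer, and you no longer
get a single consistent snapshot across the two databases.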


> ...or could both "C1" and "C2" reside on the same database server?


They could, but you would be sharing all the resources anyway, so you don't
gain much.


> is there a general "limit" or best practice for the maximum amount of data
> a single database server should handle (e.g., 10 TB) and similarly general
> limit per database?


No limit per se; it really depends on whether you start seeing effects on
your measured performance. There are lots of indirect things to keep in mind
as well: the time it takes to make backups, autovacuum effort, and the time
to spin up replicas. These days 10 TB is not considered particularly huge,
but it really depends on your workload. Don't worry about limits per
database: it's all about the total cluster size; which database things live
in is mostly a housekeeping detail.
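
If you do want to keep an eye on the numbers, per-database and total cluster
sizes are easy to pull with plain catalog queries:

  -- Size of each database, plus the cluster total
  SELECT datname, pg_size_pretty(pg_database_size(datname)) AS size
    FROM pg_database
   ORDER BY pg_database_size(datname) DESC;

  SELECT pg_size_pretty(sum(pg_database_size(datname))) AS cluster_total
    FROM pg_database;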

> Beyond standard practices like indexing and partitioning, what other best
> practices should we consider implementing to ensure optimal performance and
> manageability with such a large dataset?


This is probably the vaguest question in the email. Obvious things are to
make sure you are doing heavy monitoring, both at the OS level and PG
level, particularly via log_min_duration_statement and pg_stat_statements.
Keep a close eye on bloat. Keep indexes to a minimum and make them all
justify their worth. Use partial and functional indexes. Make sure your
backups are solid (use pgbackrest). Test your restores regularly. Use
pgbouncer. Send simple selects to the read replicas. Automate everything
you can. Be paranoid. Assume the application is going to do everything
wrong and try to destroy your database. Get a seasoned PG DBA who will know
how to do all this and what else to look for (the mailing lists are good,
but mostly reactive and asynchronous, as you are now discovering).
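
To make a couple of those concrete, here is roughly what the logging and
pg_stat_statements side looks like (the 500 ms threshold is arbitrary, the
"orders" table in the partial index is hypothetical, the column names assume
PG 13+, and pg_stat_statements must also be in shared_preload_libraries,
which requires a restart):

  -- Log any statement slower than 500 ms
  ALTER SYSTEM SET log_min_duration_statement = '500ms';
  SELECT pg_reload_conf();

  CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

  -- Ten most expensive queries by total execution time
  SELECT calls, round(total_exec_time) AS total_ms,
         round(mean_exec_time) AS mean_ms, query
    FROM pg_stat_statements
   ORDER BY total_exec_time DESC
   LIMIT 10;

  -- A partial index only covers the rows you actually query
  CREATE INDEX orders_open_idx ON orders (customer_id)
   WHERE status = 'open';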

> Hardware Configuration Recommendations: Based on our projected data growth
> and desired performance, what hardware configurations (e.g., RAM, CPU,
> storage I/O, storage type like NVMe) would you recommend for future
> database servers to efficiently handle 10-12 TB?


Maybe someone else can attempt specifics, but it's too open-ended a question
for me. Storage should be fast but, above all, stable and reliable. More RAM
is always good. More cores are always good. Postgres scales well vertically.
Offload as much work as possible (including backups) to the replicas.
10-12 TB is a little meaningless except with regard to backups: what matters
is how much of that 10 TB is actively being used.
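
One rough proxy for "actively used" is the buffer cache hit ratio per
database (the pg_buffercache extension gives a more detailed view if you
need one):

  -- Consistently low hit ratios suggest the working set does not fit in RAM
  SELECT datname, blks_hit, blks_read,
         round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2)
             AS hit_pct
    FROM pg_stat_database
   WHERE datname IS NOT NULL
   ORDER BY blks_read DESC;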

> Open-Source Horizontal Scaling Solutions: Are there any open-source
> horizontal scaling solutions for PostgreSQL (other than Citus Data) that
> the community recommends or has experience with for managing extremely
> large datasets?


Citus is very good for certain datasets, but it can be overkill for many
situations. Don't overlook streaming replication plus pgpool/haproxy as a
good start for basic horizontal scaling.
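
If you go the streaming replication route, pg_stat_replication on the
primary is the quickest way to confirm that the replicas taking reads are
keeping up (functions assume PG 10 or later):

  -- Replica state and replay lag as seen from the primary
  SELECT application_name, state, sync_state,
         pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(),
                                        replay_lsn)) AS replay_lag
    FROM pg_stat_replication;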

The more specific your future questions are, the better the replies you will
get. Database size on its own is not a very useful metric. Some more useful
things to measure would be WAL rate, transaction rate, active data size (i.e.
shared_buffers analysis), number of active connections, and which queries are
the most expensive.
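
Most of those can be sampled with plain catalog queries; take two snapshots
a few minutes apart and diff the counters to get rates (a sketch; the
pg_stat_statements query above covers the expensive-query part):

  -- Raw counters: sample twice and subtract to get WAL and transaction rates
  -- (use pg_wal_lsn_diff() on the two wal_position values)
  SELECT now() AS sampled_at,
         pg_current_wal_lsn() AS wal_position,
         sum(xact_commit + xact_rollback) AS total_transactions,
         (SELECT count(*) FROM pg_stat_activity
           WHERE state = 'active') AS active_connections
    FROM pg_stat_database;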

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support
