On 6/5/2020 4:45 AM, Greg Landrum wrote:
> Having said that, the team behind ZINC used to use the RDKit cartridge
> with PostgreSQL as the backend for ZINC. They had the database sharded
> across multiple instances and managed to get the fingerprint indices
> working there. I don't remember the substructure search performance
> being terrible, but it wasn't great either. They have since switched to
> a specialized system (Arthor, from NextMove Software), which offers
> significantly better performance.
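(For anyone who hasn't used the cartridge: a substructure query against
it looks roughly like the sketch below. The connection string, table,
and column names are made up for illustration; the @> operator and the
GiST index are the cartridge's documented mechanism.)

    import psycopg2

    # Hypothetical connection and schema: a table mols(id, m) where m
    # is the cartridge's mol type. Substitute your own.
    conn = psycopg2.connect("dbname=zinc")
    cur = conn.cursor()

    # The GiST fingerprint index is what makes substructure search
    # feasible at all on a large table.
    cur.execute("CREATE INDEX IF NOT EXISTS molidx ON mols USING gist (m)")
    conn.commit()

    # @> is the cartridge's substructure-containment operator.
    cur.execute(
        "SELECT id FROM mols WHERE m @> %s::qmol LIMIT 100",
        ("c1ccc2ccccc2c1",),  # naphthalene as the query pattern
    )
    print(cur.fetchall())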
Generally speaking, a database of a billion rows needs hardware capable
of running it. Buy a server with 1 TB of RAM, 64 cores, and a couple of
U.2 NVMe drives, and see how Postgres runs on that.
Then you need to look at the database itself: e.g., a query against an
indexed billion-row table may be fine, but inserting the
billion-and-first row will not be.
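One standard mitigation, sketched here under the same hypothetical
schema as above: skip per-row index maintenance for bulk loads by
dropping the fingerprint index, COPYing the new rows in, and rebuilding
the index once at the end.

    # Bulk-load pattern: one index build instead of a billion
    # incremental updates. File name and schema are hypothetical.
    cur.execute("DROP INDEX IF EXISTS molidx")
    with open("new_mols.csv") as f:
        # Each CSV row: id,SMILES -- the cartridge parses the SMILES
        # text into its mol type on input.
        cur.copy_expert("COPY mols (id, m) FROM STDIN WITH CSV", f)
    cur.execute("CREATE INDEX molidx ON mols USING gist (m)")
    conn.commit()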
If you want to scale to these kinds of volumes, you need to do some
work. (And much of the point of NoSQL/Hadoop "cloud" workflows is that
if you can parallelize what you're doing across multiple machines, at
some data size they will start to outperform a centralized fast search
engine.)
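To make that concrete with the sharded setup Greg mentions: fanning one
substructure query out to N PostgreSQL instances and merging the hits is
only a few lines. The shard DSNs below are placeholders.

    from concurrent.futures import ThreadPoolExecutor
    import psycopg2

    SHARDS = [  # placeholder DSNs, one per shard
        "dbname=zinc host=shard1",
        "dbname=zinc host=shard2",
    ]

    def search_shard(dsn, smarts, limit=100):
        # Run the same indexed substructure query on one shard.
        with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
            cur.execute(
                "SELECT id FROM mols WHERE m @> %s::qmol LIMIT %s",
                (smarts, limit),
            )
            return [row[0] for row in cur.fetchall()]

    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        parts = pool.map(lambda d: search_shard(d, "c1ccc2ccccc2c1"), SHARDS)
        hits = [h for part in parts for h in part]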
Dima