Re: [Rdkit-discuss] Scalability of Postgres cartridge

2020-06-10 Thread Ivan Tubert-Brohman
Thank you, everyone, for the suggestions. For now I don't have immediate plans to adopt the cartridge, but it's good to know these things when the time comes. Best, Ivan On Mon, Jun 8, 2020 at 6:49 PM Finnerty, Jim via Rdkit-discuss <rdkit-discuss@lists.sourceforge.net> wrote: > If you have a

Re: [Rdkit-discuss] Scalability of Postgres cartridge

2020-06-08 Thread Finnerty, Jim via Rdkit-discuss
If you have a billion-molecule data source and would like to try an at-scale test, I'd be willing to help out with provisioning the hardware, looking at the efficiency of the plans, etc., using rdkit with Aurora PostgreSQL. If I understand how the rdkit GiST index filtering mechanism works for

Re: [Rdkit-discuss] Scalability of Postgres cartridge

2020-06-05 Thread dmaziuk via Rdkit-discuss
On 6/5/2020 4:45 AM, Greg Landrum wrote: Having said that, the team behind ZINC used to use the RDKit cartridge with PostgreSQL as the backend for ZINC. They had the database sharded across multiple instances and managed to get the fingerprint indices to work there. I don't remember the

Re: [Rdkit-discuss] Scalability of Postgres cartridge

2020-06-05 Thread Greg Landrum
Hi Ivan, I have not pushed the cartridge towards storing billions of molecules. I did a blog post looking at performance with 10 million rows (http://rdkit.blogspot.com/2020/01/some-thoughts-on-performance-of-rdkit.html) but, as I mentioned there, I probably wouldn't choose a relational database

[Rdkit-discuss] Scalability of Postgres cartridge

2020-06-04 Thread Ivan Tubert-Brohman
Hi, I've never tried the RDKit PostgreSQL cartridge, but I'm curious about it. In particular, I wonder how far people have pushed it in terms of database size. The documentation gives examples with several million rows; has anyone tried it with a couple billion rows? How fast are substructure