Hi Ivan,

I have not pushed the cartridge towards storing billions of molecules. I
did a blog post looking at performance with 10 million rows
(http://rdkit.blogspot.com/2020/01/some-thoughts-on-performance-of-rdkit.html),
but, as I mentioned there, I probably wouldn't choose a relational database
for the billion-molecule case: you're unlikely to have multiple linked
tables with data there, so there's not much point in using a relational DB.
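As a rough, back-of-the-envelope illustration of the storage side of the question, here is a small Python sketch that counts only the raw fingerprint bits. It assumes one fixed-length bit fingerprint per row. The lengths used (512, 1024, 2048 bits) are assumed values for illustration, not the cartridge's actual settings. It also ignores the molecules themselves, the index structures and PostgreSQL's per-row overhead, so real storage would be larger.

```python
# Back-of-the-envelope estimate of RAW fingerprint storage only.
# Assumptions (hypothetical): one fixed-length bit fingerprint per row;
# molecule storage, index structures and PostgreSQL row overhead are ignored.

def raw_fingerprint_bytes(n_rows: int, fp_bits: int) -> int:
    """Bytes needed to store one fp_bits-long bit fingerprint per row."""
    bytes_per_fp = (fp_bits + 7) // 8  # round up to whole bytes
    return n_rows * bytes_per_fp


if __name__ == "__main__":
    n_rows = 2_000_000_000  # "a couple billion rows"
    for fp_bits in (512, 1024, 2048):  # assumed lengths, not cartridge defaults
        total = raw_fingerprint_bytes(n_rows, fp_bits)
        print(f"{fp_bits}-bit fingerprints: {total / 1e9:.0f} GB")
```

For 2 billion rows this gives 128, 256 and 512 GB for the three assumed lengths, before any of the overheads listed above are added.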

Having said that, the team behind ZINC used to use the RDKit cartridge with
PostgreSQL as the backend for ZINC. They had the database sharded
across multiple instances and managed to get the fingerprint indices to
work there. I don't remember the substructure search performance being
terrible, but it wasn't great either. They have since switched to a
specialized system (Arthor from NextMove Software), which offers
significantly better performance.

Best,
-greg



On Thu, Jun 4, 2020 at 2:17 PM Ivan Tubert-Brohman <
ivan.tubert-broh...@schrodinger.com> wrote:

> Hi,
>
> I've never tried the RDKit PostgreSQL cartridge but I'm curious about it.
> In particular, I wonder how far people have pushed it in terms of
> database size. The documentation gives examples with several million rows;
> has anyone tried it with a couple of billion rows? How fast are substructure
> queries with databases of that size? How much storage is needed once the
> fingerprints etc. are accounted for?
>
> Best regards,
> Ivan
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>