Dear Theo,

It might be useful to describe your specific application scenario a bit more to provide some context. What do you want to do, and what would "efficient" look like?

One advantage of using InChIKeys is that they have a fixed length and can therefore be stored and indexed efficiently in a database. So if, for example, you want to (frequently) compare one compound against a large compound collection, it might be a good idea to pre-compute these identifiers, store them in a database and do the lookup there. BTW: this is something the RDKit Cartridge can do for you: https://www.rdkit.org/docs/Cartridge.html#substructure-and-exact-structure-search
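
To make the pre-compute-and-lookup idea a bit more concrete, here is a minimal sketch. It uses plain SQLite from the Python standard library rather than the RDKit Cartridge, and the table name, column names and helper functions are made up for illustration. The InChIKeys are hard-coded here; in a real setup you would generate them once per compound, e.g. with RDKit's Chem.MolToInchiKey(mol).

```python
import sqlite3

# Hypothetical pre-computed collection of (name, InChIKey) pairs.
# Assumption: in practice the keys are generated once, e.g. with
# RDKit's Chem.MolToInchiKey(mol), instead of being hard-coded.
COLLECTION = [
    ("water", "XLYOFNOQVPJJNP-UHFFFAOYSA-N"),
    ("ethanol", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"),
    ("benzene", "UHOVQNZJYSORNB-UHFFFAOYSA-N"),
    ("aspirin", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"),
]


def build_index(rows):
    """Store the pre-computed keys in an in-memory table with an index."""
    conn = sqlite3.connect(":memory:")
    # A standard InChIKey is always 27 characters long.
    conn.execute(
        "CREATE TABLE compounds (name TEXT, inchikey CHAR(27) NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_inchikey ON compounds (inchikey)")
    conn.executemany("INSERT INTO compounds VALUES (?, ?)", rows)
    conn.commit()
    return conn


def find_by_inchikey(conn, key):
    """Return the names of all stored compounds with exactly this key."""
    cur = conn.execute(
        "SELECT name FROM compounds WHERE inchikey = ?", (key,)
    )
    return [row[0] for row in cur.fetchall()]


if __name__ == "__main__":
    conn = build_index(COLLECTION)
    print(find_by_inchikey(conn, "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"))
```

The query compound's key is computed once, and the identity check then becomes an indexed string match instead of a structure comparison against every entry.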

In other scenarios, other approaches might work better.

Best wishes,
Nils

On 05.10.2021 at 10:06, theozh wrote:
Dear Giovanni,

thank you for your explanations and advice. I just wanted to rule out that I had missed a very basic function for checking identity.

You suggest using InChIKeys (with a very low probability of two different molecules sharing the same InChIKey). What, then, would be the disadvantage of using InChI strings instead of InChIKeys? Computation time and power?

The response I got on StackOverflow was that the substructure approach was a little faster than the canonical SMILES approach. I would assume that a simple string comparison within a fixed set of structures is much faster than calculating the canonical SMILES again and again for each search.

So, I will check the InChI approach and compare it with the other approaches.

Thanks,
Theo.


_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
