Dear Theo,
it would help if you described your specific application scenario in a
bit more detail to provide some context. What do you want to do, and
what would "efficient" look like?
One advantage of using InChIKeys is that they have a fixed length and
can therefore be stored and indexed efficiently in a database. So if
you want to (frequently) compare one compound against a large compound
collection, it might be a good idea to pre-compute these identifiers,
store them in a database, and do the lookup there. BTW, this is
something the RDKit Cartridge can do for you:
https://www.rdkit.org/docs/Cartridge.html#substructure-and-exact-structure-search
In other scenarios, other approaches might work better.
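
To illustrate the pre-compute-and-look-up idea, here is a minimal sketch.
It uses an in-memory SQLite table instead of the RDKit Cartridge, purely
to keep the example self-contained. The table layout and the helper names
(rdkit_inchikey, build_index, find_matches) are my own assumptions for
illustration, not part of RDKit, and the RDKit call requires a build with
InChI support.

```python
import sqlite3


def rdkit_inchikey(smiles):
    """Compute an InChIKey with RDKit (imported lazily, needs InChI support)."""
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return Chem.MolToInchiKey(mol)


def build_index(smiles_list, key_fn=rdkit_inchikey):
    """Pre-compute one key per compound and store it in an indexed table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE compounds (inchikey TEXT NOT NULL, smiles TEXT NOT NULL)"
    )
    # The index is what makes the later exact-match lookup cheap.
    conn.execute("CREATE INDEX idx_inchikey ON compounds (inchikey)")
    conn.executemany(
        "INSERT INTO compounds VALUES (?, ?)",
        [(key_fn(s), s) for s in smiles_list],
    )
    conn.commit()
    return conn


def find_matches(conn, smiles, key_fn=rdkit_inchikey):
    """Compute the key for the query once, then do an indexed lookup."""
    rows = conn.execute(
        "SELECT smiles FROM compounds WHERE inchikey = ?",
        (key_fn(smiles),),
    ).fetchall()
    return [row[0] for row in rows]
```

With RDKit installed, you would build the index once with something like
conn = build_index(["CCO", "c1ccccc1"]) and then call
find_matches(conn, "OCC") for each query. The key_fn parameter is only
there so that a different identifier (e.g. the full InChI string) can be
swapped in for comparison.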
Best wishes,
Nils
On 05.10.2021 at 10:06, theozh wrote:
Dear Giovanni,
thank you for your explanations and advice. I just wanted to rule out
that I had missed a very basic function for checking identity.
You are suggesting using InChI-Keys (with the very low probability of
two different molecules having the same InChI-key).
Then, what would be the disadvantage of using InChI strings instead of
InChI-keys? Computation time & power?
The response I got from StackOverflow was that the substructure approach
was a little faster than the Canonical SMILES approach.
I would assume that a simple string comparison within a fixed set of
structures is much faster than calculating the Canonical SMILES again
and again for each search.
So, I will check the InChI approach and compare it with the other
approaches.
Thanks,
Theo.
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss