[Rdkit-discuss] Non-redundant database of molecules

Wandré Wed, 13 Sep 2017 03:14:58 -0700

Hi,

My name is Wandré and I'm from Brazil.
I'm trying to build a large database of molecules, and I want to eliminate all
redundant molecules before inserting them into the database.
I would like to know the best way to identify a molecule uniquely in RDKit.
Is SMILES ("Chem.MolToSmiles(mol,isomericSmiles=True)") enough, or will I need
to compare every molecule against the others, one by one, before inserting it
(using Tanimoto)?
That could be hard because my database will hold many millions of
molecules. Is one-by-one comparison before each insert really the only answer?
Checking whether a SMILES has already been inserted is easy (a text
comparison), but comparing molecular fingerprints...
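
For illustration, the text-comparison route described above could be sketched as follows. This is only a sketch under assumptions: the helper names `rdkit_canonical_smiles` and `unique_molecules` are hypothetical, and it treats two inputs as the same molecule when RDKit produces the same canonical isomeric SMILES for both. Tautomers, salts, and different charge states would still count as distinct, and the strings are only comparable when generated by the same RDKit version.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple


def rdkit_canonical_smiles(smiles: str) -> Optional[str]:
    """Return RDKit's canonical isomeric SMILES, or None if parsing fails."""
    # Imported lazily so the de-duplication logic below has no hard dependency.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None for input it cannot parse
        return None
    return Chem.MolToSmiles(mol, isomericSmiles=True)


def unique_molecules(
    smiles_iter: Iterable[str],
    key: Callable[[str], Optional[str]] = rdkit_canonical_smiles,
) -> Iterator[Tuple[str, str]]:
    """Yield (input, key) for the first occurrence of each distinct key.

    Inputs whose key is None (unparseable) are skipped.
    """
    seen = set()  # keys already emitted; a set gives O(1) membership tests
    for smi in smiles_iter:
        k = key(smi)
        if k is None or k in seen:
            continue
        seen.add(k)
        yield smi, k
```

With RDKit installed, `list(unique_molecules(["CCO", "OCC", "c1ccccc1"]))` should keep one ethanol entry and one benzene entry, since "CCO" and "OCC" describe the same molecule. Each new molecule costs one set lookup instead of a pass over everything already stored.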


If I really do need to compare molecular fingerprints, how can I store this
data in PostgreSQL without using the cartridge? I would generate the
fingerprint (Atompair, for example), store it in the database, and compare
it against all the stored fingerprints, one by one, before inserting a new
molecule. This fingerprint (Atompair) has a large number of features, so
storing it in a relational database is expensive.
Is this possible?
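
One possible way to avoid the one-by-one comparison, sketched here under assumptions: put the canonical SMILES in a uniquely constrained column and let the database reject duplicates, with the fingerprint stored alongside as an opaque byte string for later similarity searches. The table and function names are hypothetical. The example uses Python's built-in sqlite3 module purely as a stand-in so it runs anywhere. In PostgreSQL (9.5 or later) the same `ON CONFLICT ... DO NOTHING` clause exists, the fingerprint column would be `BYTEA` rather than `BLOB`, and a driver such as psycopg2 would use `%s` placeholders instead of `?`.

```python
import sqlite3

# Hypothetical schema: the primary key on canonical_smiles enforces uniqueness.
SCHEMA_SQL = (
    "CREATE TABLE IF NOT EXISTS molecules ("
    " canonical_smiles TEXT PRIMARY KEY,"
    " fingerprint BLOB)"
)

# ON CONFLICT ... DO NOTHING turns a duplicate insert into a no-op.
INSERT_SQL = (
    "INSERT INTO molecules (canonical_smiles, fingerprint) "
    "VALUES (?, ?) "
    "ON CONFLICT (canonical_smiles) DO NOTHING"
)


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the molecules table if it does not exist yet."""
    conn.execute(SCHEMA_SQL)


def insert_if_new(conn: sqlite3.Connection, canonical_smiles: str,
                  fingerprint: bytes) -> bool:
    """Insert the molecule unless its canonical SMILES is already stored.

    Returns True when a row was added, False when it was a duplicate.
    `fingerprint` is any serialized fingerprint, stored as raw bytes.
    """
    cur = conn.execute(INSERT_SQL, (canonical_smiles, fingerprint))
    return cur.rowcount == 1
```

The duplicate check then relies on the index behind the primary key, so the cost per insert does not grow with a scan over all stored fingerprints.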

Thanks!

--
Wandré Nunes de Pinho Veloso
Assistant Professor - Unifei - Campus Avançado de Itabira-MG
PhD student in Bioinformatics - Universidade Federal de Minas Gerais - UFMG
Researcher at INSILICO - Grupo Interdisciplinar em Simulação e
Inteligência Computacional - UNIFEI
Member of the Assinaturas Biológicas research group at FIOCRUZ
Member of the Bioinformática Estrutural research group at UFMG
Laboratório de Bioinformática e Sistemas - LBS, DCC, UFMG


_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
