Let the database do the work for you. Create a canonical SMILES column
and/or an InChI column and declare them UNIQUE. As you insert new
rows, PostgreSQL will raise a unique-violation error if a row with the
same SMILES or InChI already exists.
The ON CONFLICT clause of INSERT (available since PostgreSQL 9.5) lets you
handle that case:
https://www.postgresql.org/docs/9.5/static/sql-insert.html#SQL-ON-CONFLICT
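
A minimal sketch of the idea, under some assumptions: it uses Python's
built-in sqlite3 module as a stand-in so it runs without a server (the
bundled SQLite must be 3.24 or newer for ON CONFLICT), the table name
"molecule" and the helper insert_unique are hypothetical, and the strings
passed in are assumed to be canonical SMILES already, e.g. from
Chem.MolToSmiles(mol, isomericSmiles=True). The same INSERT statement
should be accepted by PostgreSQL 9.5+, with the driver's own placeholder
style (%s in psycopg2) instead of "?".

```python
import sqlite3


def insert_unique(conn, canonical_smiles):
    """Insert one canonical SMILES; return True if a new row was stored.

    Hypothetical helper: the UNIQUE constraint detects the duplicate and
    ON CONFLICT ... DO NOTHING turns the violation into a silent skip.
    """
    cur = conn.execute(
        "INSERT INTO molecule (smiles) VALUES (?) "
        "ON CONFLICT (smiles) DO NOTHING",
        (canonical_smiles,),
    )
    # rowcount is 1 when the row was inserted, 0 when it was skipped.
    return cur.rowcount == 1


# In-memory stand-in database; the UNIQUE constraint does the deduplication.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE molecule ("
    " id INTEGER PRIMARY KEY,"
    " smiles TEXT NOT NULL UNIQUE)"
)

# The third string repeats the first, so it should be skipped.
results = [insert_unique(conn, s) for s in ["CCO", "c1ccccc1", "CCO"]]
row_count = conn.execute("SELECT COUNT(*) FROM molecule").fetchone()[0]
```

The unique index makes each duplicate check an index lookup rather than a
comparison against every stored molecule, so no pairwise Tanimoto pass is
needed for exact-duplicate removal.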

TJ O'Donnell

On Wed, Sep 13, 2017 at 3:13 AM, Wandré <wandrevel...@gmail.com> wrote:

> Hi,
>
> My name is Wandré and I'm from Brazil.
> I'm trying to build a large database of molecules, and I want to eliminate
> all redundant molecules before inserting them into the database.
> I want to know the best way to identify a molecule in RDKit.
> Is it the SMILES ("Chem.MolToSmiles(mol,isomericSmiles=True)"), or will I
> need to compare all molecules, one by one, before inserting them into the
> database (using Tanimoto)?
> That could be hard because my database will have many millions of
> molecules. Is comparing one by one before each insert the only answer?
> Checking whether a SMILES has already been inserted is easy (a text
> comparison), but comparing molecular fingerprints...
>
> If I really do need to compare fingerprints, how can I store this
> data in PostgreSQL without using the cartridge? I would generate the
> fingerprint (AtomPair, for example), store it in the database, and compare
> all the fingerprints, one by one, before inserting a new molecule. This
> fingerprint (AtomPair) has a lot of features, so storing it in a relational
> database is expensive.
> Is it possible?
>
> Thanks!
>
> --
> Wandré Nunes de Pinho Veloso
> Professor Assistente - Unifei - Campus Avançado de Itabira-MG
> Doutorando em Bioinformática - Universidade Federal de Minas Gerais - UFMG
> Pesquisador do INSILICO - Grupo Interdisciplinar em Simulação e
> Inteligência Computacional - UNIFEI
> Membro do Grupo de Pesquisa Assinaturas Biológicas da FIOCRUZ
> Membro do Grupo de Pesquisa Bioinformática Estrutural da UFMG
> Laboratório de Bioinformática e Sistemas - LBS, DCC, UFMG
>
> ------------------------------------------------------------
> ------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>