Canonical SMILES is only a rough proxy for "unique molecule", because
different tautomeric forms of the same compound usually get different
canonical SMILES. InChI (or Standard InChI) handles this much better,
although it is not perfect either.

The "perfect solution" also depends on how uniqueness or redundancy of
molecules is defined for the purposes of the database.
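
Whichever identifier you settle on, the unique-column idea from TJ's
message quoted below can be sketched roughly as follows. This is only an
illustration under some assumptions: it uses Python's built-in sqlite3
module as a stand-in for PostgreSQL, the helper names make_db and
insert_molecule are made up for this example, and the InChI strings are
hard-coded here where in practice they would be computed with RDKit
(for example with Chem.MolToInchi).

```python
import sqlite3

# Standard InChI strings, hard-coded for the sketch. In a real pipeline
# these would be generated from each molecule (e.g. with RDKit).
ETHANOL_INCHI = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
METHANOL_INCHI = "InChI=1S/CH4O/c1-2/h2H,1H3"


def make_db():
    """Create an in-memory table whose InChI column is declared UNIQUE."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE molecules ("
        " id INTEGER PRIMARY KEY,"
        " smiles TEXT NOT NULL,"
        " inchi TEXT NOT NULL UNIQUE)"
    )
    return conn


def insert_molecule(conn, smiles, inchi):
    """Try to insert one molecule; return True only if a new row was stored.

    INSERT OR IGNORE is SQLite syntax. The PostgreSQL counterpart is
    INSERT ... ON CONFLICT DO NOTHING (available since 9.5).
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO molecules (smiles, inchi) VALUES (?, ?)",
        (smiles, inchi),
    )
    return cur.rowcount == 1


if __name__ == "__main__":
    conn = make_db()
    # "CCO" and "OCC" are two ways of writing ethanol; they share one InChI,
    # so the second insert is rejected by the UNIQUE constraint.
    for smiles, inchi in [
        ("CCO", ETHANOL_INCHI),
        ("OCC", ETHANOL_INCHI),
        ("CO", METHANOL_INCHI),
    ]:
        stored = insert_molecule(conn, smiles, inchi)
        print(smiles, "stored" if stored else "duplicate, skipped")
```

The point of the design is that the duplicate check is an index lookup
inside the database, so there is no need to compare a new molecule
against every existing row yourself. What counts as a duplicate is then
decided entirely by which identifier you put in the unique column.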


On Wed, Sep 13, 2017 at 4:56 PM, TJ O'Donnell <t...@acm.org> wrote:

> Let the database do the work for you. Create a canonical SMILES column
> and/or an InChI column and declare them unique. As you insert new
> rows, PostgreSQL will tell you if a row with the same SMILES or InChI
> already exists.
> Here's some help on how to handle that:
> https://www.postgresql.org/docs/9.5/static/sql-insert.html#SQL-ON-CONFLICT
>
> TJ O'Donnell
>
> On Wed, Sep 13, 2017 at 3:13 AM, Wandré <wandrevel...@gmail.com> wrote:
>
>> Hi,
>>
>> My name is Wandré and I'm from Brazil.
>> I'm trying to build a large database of molecules, and I want to
>> eliminate all redundant molecules before inserting them into the database.
>> I want to know the best way to identify a molecule in RDKit.
>> Is it SMILES ("Chem.MolToSmiles(mol,isomericSmiles=True)"), or will I need
>> to compare all molecules, one by one, before inserting them into the
>> database (using Tanimoto)?
>> That could be hard because my database will hold many millions of
>> molecules. Is comparing one by one before each insert the only answer?
>> Checking whether a SMILES has already been inserted is easy (a text
>> comparison), but comparing molecular fingerprints...
>>
>> If I really need to compare molecular fingerprints, how can I store
>> this data in PostgreSQL without using the cartridge? I would generate the
>> fingerprint (AtomPair, for example), store it in the database, and
>> compare it against all stored fingerprints, one by one, before inserting
>> a new molecule.
>> This fingerprint (AtomPair) has a lot of features, so storing it in a
>> relational database is expensive.
>> Is it possible?
>>
>> Thanks!
>>
>> --
>> Wandré Nunes de Pinho Veloso
>> Assistant Professor - Unifei - Campus Avançado de Itabira-MG
>> PhD candidate in Bioinformatics - Universidade Federal de Minas Gerais - UFMG
>> Researcher at INSILICO - Grupo Interdisciplinar em Simulação e
>> Inteligência Computacional - UNIFEI
>> Member of the research group Assinaturas Biológicas, FIOCRUZ
>> Member of the research group Bioinformática Estrutural, UFMG
>> Laboratório de Bioinformática e Sistemas - LBS, DCC, UFMG
>>
>> ------------------------------------------------------------
>> ------------------
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>>
>