Re: [Rdkit-discuss] Non-redundant database of molecules (Wandré)

2017-09-13 Thread Markus Sitzmann
Unless you deliberately do something else, SMILES *calculated* by RDKit
from any input are canonical per se. BUT that only holds when you compare
them with other SMILES also calculated by RDKit: you cannot compare SMILES
between software packages, even if they are canonical within the domain of
each package.
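
A minimal sketch of the point above, assuming RDKit is installed. The
helper name `canonical_smiles` is hypothetical, and `isomericSmiles=True`
is passed explicitly so the result does not depend on the default of the
installed RDKit version:

```python
from rdkit import Chem


def canonical_smiles(smiles):
    """Return RDKit's canonical isomeric SMILES, or None if parsing fails."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return Chem.MolToSmiles(mol, isomericSmiles=True)


# Three different ways of writing ethanol.
inputs = ["OCC", "C(O)C", "CCO"]
unique = {canonical_smiles(s) for s in inputs}
print(unique)  # a set holding a single string
```

The resulting strings are only comparable with other strings produced by
RDKit, ideally by the same RDKit version.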

On Wed, Sep 13, 2017 at 9:16 PM, Wandré  wrote:

> Why not use the InChI function in RDKit?
> Canonical SMILES cannot be generated by RDKit, correct?
>
> --
> Wandré Nunes de Pinho Veloso
> Professor Assistente - Unifei - Campus Avançado de Itabira-MG
> Doutorando em Bioinformática - Universidade Federal de Minas Gerais - UFMG
> Pesquisador do INSILICO - Grupo Interdisciplinar em Simulação e
> Inteligência Computacional - UNIFEI
> Membro do Grupo de Pesquisa Assinaturas Biológicas da FIOCRUZ
> Membro do Grupo de Pesquisa Bioinformática Estrutural da UFMG
> Laboratório de Bioinformática e Sistemas - LBS, DCC, UFMG
>
> 2017-09-13 15:57 GMT-03:00 Chris Swain :
>
>> Hi,
>>
>> I'd use a text-based representation of the structure, such as InChIKey or
>> canonical SMILES; it then becomes an easy task to do the comparison in
>> Python.
>>
>> I wrote a script to do this in Vortex but it should be easy to modify.
>> https://www.macinchem.org/reviews/vortex/tut28/scripting_vortex28.php
>>
>>
>> Cheers
>>
>> Chris
>>
>>
>>
>> Message: 1
>> Date: Wed, 13 Sep 2017 07:13:56 -0300
>> From: Wandré
>> To: rdkit-discuss@lists.sourceforge.net
>> Subject: [Rdkit-discuss] Non-redundant database of molecules
>> Message-ID:
>> 
>> Content-Type: text/plain; charset="utf-8"
>>
>> Hi,
>>
>> My name is Wandré and I'm from Brazil.
>> I'm trying to build a large database of molecules, and I want to eliminate
>> all redundant molecules before inserting them into the database.
>> I want to know the best method to identify a molecule in RDKit.
>> Is it SMILES ("Chem.MolToSmiles(mol,isomericSmiles=True)"), or will I need
>> to compare all molecules, one by one, before inserting them into the
>> database (using Tanimoto)?
>> This could be hard because my database will hold many millions of
>> molecules, so is comparing one by one before each insert the only answer?
>> Checking whether a SMILES has already been inserted is easy (a text
>> comparison), but comparing molecular fingerprints...
>>
>> If I really need to compare molecular fingerprints, how can I store this
>> data in PostgreSQL without using the cartridge? I would generate the
>> fingerprint (Atompair, for example), store it in the database, and compare
>> all the fingerprints, one by one, before inserting a new molecule. This
>> fingerprint (Atompair) has a lot of features, so storing it in a relational
>> database is expensive.
>> Is it possible?
>>
>> Thanks!
>>
>> Subject: Digest Footer
>>
>> ___
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>>
>> --
>>
>> End of Rdkit-discuss Digest, Vol 119, Issue 20

Re: [Rdkit-discuss] Non-redundant database of molecules (Wandré)

2017-09-13 Thread Wandré
Why not use the InChI function in RDKit?
Canonical SMILES cannot be generated by RDKit, correct?
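
For reference, a minimal sketch of the InChI route, assuming an RDKit build
with InChI support compiled in. `inchi_and_key` is a hypothetical helper
name, not part of RDKit:

```python
from rdkit import Chem


def inchi_and_key(smiles):
    """Return (InChI, InChIKey) for a SMILES string, or None if parsing fails."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    inchi = Chem.MolToInchi(mol)
    return inchi, Chem.InchiToInchiKey(inchi)


# Two different SMILES strings for the same molecule (ethanol).
a = inchi_and_key("OCC")
b = inchi_and_key("C(O)C")
print(a == b)  # True: both inputs yield the same InChI and InChIKey
```

The InChIKey is a fixed-length, 27-character string, which makes it
convenient as a text key.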

--
Wandré Nunes de Pinho Veloso
Professor Assistente - Unifei - Campus Avançado de Itabira-MG
Doutorando em Bioinformática - Universidade Federal de Minas Gerais - UFMG
Pesquisador do INSILICO - Grupo Interdisciplinar em Simulação e
Inteligência Computacional - UNIFEI
Membro do Grupo de Pesquisa Assinaturas Biológicas da FIOCRUZ
Membro do Grupo de Pesquisa Bioinformática Estrutural da UFMG
Laboratório de Bioinformática e Sistemas - LBS, DCC, UFMG



Re: [Rdkit-discuss] Non-redundant database of molecules (Wandré)

2017-09-13 Thread Chris Swain
Hi,

I'd use a text-based representation of the structure, such as InChIKey or
canonical SMILES; it then becomes an easy task to do the comparison in Python.

I wrote a script to do this in Vortex but it should be easy to modify.
https://www.macinchem.org/reviews/vortex/tut28/scripting_vortex28.php 
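
The text-key comparison suggested above could be sketched in plain Python
with RDKit as follows. This is not the Vortex script linked above; `dedupe`
is a hypothetical helper, and an RDKit build with InChI support is assumed:

```python
from rdkit import Chem


def dedupe(smiles_list):
    """Keep the first record seen for each InChIKey; skip unparsable input."""
    seen = {}
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue
        key = Chem.InchiToInchiKey(Chem.MolToInchi(mol))
        # setdefault stores smi only if this key has not been seen before.
        seen.setdefault(key, smi)
    return seen


records = ["OCC", "CCO", "c1ccccc1", "C1=CC=CC=C1", "CC(=O)O"]
unique = dedupe(records)
print(len(unique))  # 3: ethanol, benzene, acetic acid
```

Each lookup is a dictionary probe rather than a pairwise comparison against
every stored molecule. In a relational database the same idea would
presumably be a unique index on the key column.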



Cheers

Chris