[Rdkit-discuss] Substructure search using RDKit PostgreSQL cartridge

2018-05-29 Thread Alfredo Quevedo

Dear users,

I am trying to perform a substructure search using SMILES notation against 
the ChEMBL database that I have already loaded into my PostgreSQL database. 
Here are two sample molecules in SMILES format, as read by the 
RDKit cartridge into the database:


Molecule 1: CCc1ccc(-n2nc3ccc(NC(=O)c4ccc5c(c4)OCO5)cc3n2)cc1

Molecule 2: COc1ncc(-c2ccc(N(Cc3ccsc3)C(=O)Cn3nnc4c43)cc2)cn1


Both molecules contain a triazole scaffold, and I am trying to select 
both compounds from the whole database using the following SMILES 
generated by RDKit for a triazole: 'c1c[nH]nn1'


My problem is that the search matches only molecule 1 and not 
molecule 2. What might the problem be? Since I am searching a database 
of compounds previously processed with the RDKit cartridge, shouldn't the 
substructure match?
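
One way to narrow this down outside the database is to repeat the match in Python. The following is a minimal sketch, not a diagnosis: the helper name has_substructure is hypothetical, and it assumes the cartridge and the Python toolkit apply the same matching rules. The helper returns None when a SMILES fails to parse, which is worth ruling out for strings copied out of an email.

```python
from rdkit import Chem


def has_substructure(smiles, query_smiles):
    """Return True/False for a substructure match, or None if a SMILES fails to parse."""
    mol = Chem.MolFromSmiles(smiles)
    query = Chem.MolFromSmiles(query_smiles)
    if mol is None or query is None:
        return None
    return mol.HasSubstructMatch(query)


triazole = 'c1c[nH]nn1'
samples = {
    'Molecule 1': 'CCc1ccc(-n2nc3ccc(NC(=O)c4ccc5c(c4)OCO5)cc3n2)cc1',
    'Molecule 2': 'COc1ncc(-c2ccc(N(Cc3ccsc3)C(=O)Cn3nnc4c43)cc2)cn1',
}

# Print the outcome for each sample exactly as it is written in this message.
for name, smi in samples.items():
    print(name, has_substructure(smi, triazole))
```

If both molecules parse and match here but not in the database, the difference is more likely in how the query is written on the SQL side than in the stored structures.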


thanks in advance for the help

regards

Alfredo


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Molecule does not have substructure match with its fragments

2018-05-29 Thread Greg Landrum
Hi Larissa,

A '*' atom in a SMILES is translated into an atom with atomic number zero.
In a normal substructure match this will only match other atoms of atomic
number zero.

If you want to turn it into a query feature, the easiest way is with the
function Chem.AdjustQueryProperties(). Here's an example:

In [2]: m = Chem.MolFromSmiles('CCO')

In [3]: q = Chem.MolFromSmiles('CC*')

In [4]: m.HasSubstructMatch(q)
Out[4]: False

In [7]: aps = Chem.AdjustQueryParameters()

In [8]: aps.adjustDegree=False

In [9]: aps.adjustRingCount=False

In [10]: nq = Chem.AdjustQueryProperties(q,aps)

In [11]: m.HasSubstructMatch(nq)
Out[11]: True
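
Applied to the Recap fragment from the original question, the same adjustment might look like the sketch below. The target molecule is a made-up stand-in for the poster's SDF entry, which is not available here: it is simply the fragment with both attachment points capped by methyl groups.

```python
from rdkit import Chem

# Hypothetical target: the fragment with both '[*]' replaced by methyl groups.
mol = Chem.MolFromSmiles('CNc1ccc(OC)cc1C(C)=O')

# The Recap fragment quoted in the question, dummy atoms included.
frag = Chem.MolFromSmiles('[*]Nc1ccc(OC)cc1C([*])=O')

# Same settings as in the example above: do not constrain degree or ring count.
params = Chem.AdjustQueryParameters()
params.adjustDegree = False
params.adjustRingCount = False
query = Chem.AdjustQueryProperties(frag, params)

print(mol.HasSubstructMatch(frag))   # unadjusted fragment
print(mol.HasSubstructMatch(query))  # adjusted query
```

The adjusted query keeps the dummy atoms in place, so there is no need to edit the fragment SMILES by hand.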


There's a bit more information on the options available in this RDKit blog
post:
http://rdkit.blogspot.com/2016/07/tuning-substructure-queries-ii.html

I hope this helps,
-greg


On Thu, May 24, 2018 at 2:22 PM Larissa Pusch wrote:

> Hello,
>
> I am running rdkit version 2017.09.3.
> I read an sdf with
>
> supplier = SDMolSupplier('try/try.sdf')
>
> I then took the first mol from supplier and named it mol. I then ran
>
> fragmented = Recap.RecapDecompose(mol, minFragmentSize=3)
>
> and looped through its children with:
>
> fragmented_children = fragmented.GetAllChildren()
> fragmented_children_smiles = []
> for key in list(fragmented_children):
>     fragmented_children_smiles.append(fragmented_children[key].smiles)
> smile = fragmented_children_smiles[0]
>
> Now, smile is '[*]Nc1ccc(OC)cc1C([*])=O'. In theory, smile should of
> course be a substructure of mol. But if I check it like this:
>
> smile = MolFromSmiles(smile)
> if mol.HasSubstructMatch(smile):
>     print('match smile')
>
> nothing gets printed. Apparently this is because of the [*]: if I delete
> them, there is a match. But why are they there in the first place? Why does
> HasSubstructMatch not work when they are included? And, most importantly,
> can I solve this problem without going through the code and deleting every
> '[*]'? First of all, I do not know whether the SMILES would still make sense
> if I did that. Also, there are some structures like '([*])', and '()' is of
> course not valid, so deleting them for a large number of SMILES would be
> really bothersome...
>
> Thank you for your help!
> Regards,
> Larissa Pusch
>


Re: [Rdkit-discuss] MCS search

2018-05-29 Thread Greg Landrum
Hi Colin,

On Mon, May 28, 2018 at 2:48 PM Colin Bournez wrote:

> Hi everybody,
>
> Here is my piece of code to run an MCS search:
>
> from rdkit import Chem
> from rdkit.Chem import rdFMCS
> mol1 = Chem.MolFromSmiles('CN1CCN(Cc2ccc(C=O)cc2)CC1')
> mol2 = Chem.MolFromSmiles('Cc1ccc(C(=O)Nc2ccc(C)c(Nc3ncccn3)c2)cc1')
> mols = [mol1,mol2]
> res = rdFMCS.FindMCS(mols, completeRingsOnly=True, matchValences=True,
> ringMatchesRingOnly=True, )
>
> Chem.MolFromSmarts(res.smartsString)
>
> My question is why the result, written as SMILES, is:
>
> 'CC1:C:C:C(:C:C:1)CN'
> instead of:
> 'Cc1ccc(C=O)cc1'
>
>
So here's what I get:

In [1]: from rdkit import Chem
   ...: from rdkit.Chem import rdFMCS
   ...: mol1 = Chem.MolFromSmiles('CN1CCN(Cc2ccc(C=O)cc2)CC1')
   ...: mol2 = Chem.MolFromSmiles('Cc1ccc(C(=O)Nc2ccc(C)c(Nc3ncccn3)c2)cc1')
   ...: mols = [mol1,mol2]
   ...: res = rdFMCS.FindMCS(mols, completeRingsOnly=True,
   ...:                      matchValences=True, ringMatchesRingOnly=True)
   ...:

In [2]: res.smartsString
Out[2]: '[#7]-[#6]-[#6]1:[#6]:[#6]:[#6](:[#6]:[#6]:1)-[#6]'

In [3]: Chem.MolToSmiles(Chem.MolFromSmarts(res.smartsString))
Out[3]: 'CC1:C:C:C(CN):C:C:1'

You see "C" in the SMILES because the SMARTS doesn't contain any
information about the aromaticity of the atoms ([#6] matches both aromatic
and aliphatic carbon). One can argue about whether that's correct or not,
but that is certainly what the code currently does. The aromatic bonds do
ensure that the matched substructures have to be aromatic.
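
Both points can be checked with a small sketch. Benzene and cyclohexane are stand-in molecules chosen for illustration, not taken from the thread: [#6] matches carbon whether it is aromatic or aliphatic, while an aromatic bond (':') in the pattern only matches aromatic bonds.

```python
from rdkit import Chem

benzene = Chem.MolFromSmiles('c1ccccc1')
cyclohexane = Chem.MolFromSmiles('C1CCCCC1')

# [#6] constrains only the atomic number, not aromaticity.
any_carbon = Chem.MolFromSmarts('[#6]')
# The ':' bond requires an aromatic bond between the two carbons.
aromatic_pair = Chem.MolFromSmarts('[#6]:[#6]')

print(benzene.HasSubstructMatch(any_carbon), cyclohexane.HasSubstructMatch(any_carbon))
# prints: True True
print(benzene.HasSubstructMatch(aromatic_pair), cyclohexane.HasSubstructMatch(aromatic_pair))
# prints: True False
```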

> The N is supposed to be in a ring in mol1, so it should not be 'cut'
> with the option completeRingsOnly=True.
> If anyone has any suggestions?
>

Though this is confusing, it is what the code is currently expected to do.
The completeRingsOnly argument is applied to ring bonds, not ring atoms.
It's worth considering changing this behavior, but it's unlikely to happen
quickly.
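
To see the distinction on the molecules from this thread, the sketch below maps the MCS back onto mol1 and reports, for each matched atom, whether it is a ring atom and how many of its ring bonds are part of the MCS. The variable names are illustrative, and the exact output may differ between RDKit versions.

```python
from rdkit import Chem
from rdkit.Chem import rdFMCS

mol1 = Chem.MolFromSmiles('CN1CCN(Cc2ccc(C=O)cc2)CC1')
mol2 = Chem.MolFromSmiles('Cc1ccc(C(=O)Nc2ccc(C)c(Nc3ncccn3)c2)cc1')
res = rdFMCS.FindMCS([mol1, mol2], completeRingsOnly=True,
                     matchValences=True, ringMatchesRingOnly=True)

patt = Chem.MolFromSmarts(res.smartsString)
# match[i] is the index in mol1 of pattern atom i.
match = mol1.GetSubstructMatch(patt)

# Collect the bonds of mol1 that correspond to bonds of the MCS pattern.
mcs_bonds = set()
for qbond in patt.GetBonds():
    bond = mol1.GetBondBetweenAtoms(match[qbond.GetBeginAtomIdx()],
                                    match[qbond.GetEndAtomIdx()])
    mcs_bonds.add(bond.GetIdx())

# A ring atom with zero ring bonds in the MCS was kept without its ring.
for idx in match:
    atom = mol1.GetAtomWithIdx(idx)
    ring_bonds_in_mcs = [b.GetIdx() for b in atom.GetBonds()
                         if b.IsInRing() and b.GetIdx() in mcs_bonds]
    print(atom.GetSymbol(), idx, atom.IsInRing(), len(ring_bonds_in_mcs))
```

A ring atom that reports zero ring bonds is the situation described in the question: the atom is in the MCS, but none of the bonds of its ring are.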

-greg



> Thanks,
>
> --
> *Colin Bournez*
> PhD Student, Structural Bioinformatics & Chemoinformatics
> Institut de Chimie Organique et Analytique (ICOA), UMR CNRS-Université
> d'Orléans 7311
> Rue de Chartres, 45067 Orléans, France
> T. +33 238 494 577
>