Re: [Rdkit-discuss] Preserving hydrogens necessary for imine cis/trans stereochemistry?

2017-05-17 Thread Greg Landrum
There isn't currently any easy way to do this. One workaround that could
make the manual approach a bit less painful would be to find those Hs,
convert them into Ds, call RemoveHs(), and then convert the Ds back into Hs.
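
A rough sketch of that workaround, under a few assumptions: the helper name
remove_hs_keep_imine_h, the SMARTS used to pick out "those Hs" (here only
hydrogens on an imine nitrogen), and the temporary 'keepH' atom property are
illustrative choices, not part of the RDKit API. The sketch relies on
RemoveHs() leaving isotope-labelled hydrogens in place.

```python
from rdkit import Chem

# Assumption: "those Hs" are the hydrogens on an imine nitrogen (H-N=C).
IMINE_H = Chem.MolFromSmarts('[#1]-[#7]=[#6]')


def remove_hs_keep_imine_h(mol):
    """Remove explicit Hs, but keep the ones attached to imine nitrogens."""
    work = Chem.Mol(mol)  # work on a copy so the input stays untouched
    for match in work.GetSubstructMatches(IMINE_H):
        h_atom = work.GetAtomWithIdx(match[0])  # first query atom is the H
        if h_atom.GetIsotope() == 0:
            h_atom.SetIsotope(2)                # H -> D
            h_atom.SetBoolProp('keepH', True)   # remember which Ds we made
    stripped = Chem.RemoveHs(work)  # isotope-labelled Hs are not removed
    for atom in stripped.GetAtoms():
        if atom.HasProp('keepH'):
            atom.SetIsotope(0)                  # D -> H
            atom.ClearProp('keepH')
    return stripped
```

For a molecule read with removeHs=False, calling
Chem.MolToSmiles(remove_hs_keep_imine_h(mol), True) should then keep the [H]
on the imine nitrogens while dropping the other hydrogens.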

It seems like it does make sense to modify the RemoveHs() function:
https://github.com/rdkit/rdkit/issues/1419
As mentioned in the issue report, this will lead to this oddity:

>>> Chem.CanonSmiles('C/C(F)=N/[H]')
'[H]/N=C(/C)F'
>>> Chem.CanonSmiles('C/C(F)=N[H]')
'CC(=N)F'

I can't think of a precedent for that in the RDKit, but I think it's better
to add that oddity than to modify all of the code managing cis/trans double
bonds to be able to handle using the implicit H.

Does anyone have objections to this approach to fixing the problem?

As an aside, while looking into this I noticed that RemoveHs() also removes
H atoms that have atom map information. This probably shouldn't happen
either. That's now tracked here: https://github.com/rdkit/rdkit/issues/1420

-greg


On Thu, May 18, 2017 at 2:54 AM, Brian Cole  wrote:

> Is there a recommended way in RDKit to preserve hydrogens necessary for
> representing cis/trans stereochemistry of imines?
>
> For example, given the attached SDF I need to maintain explicit hydrogens
> in the output SMILES string to maintain the imine cis/trans
> stereochemistry.
>
> from rdkit import Chem
>
> # Read the first molecule with its explicit hydrogens kept
> # (binary mode so the supplier also works under Python 3)
> mol = next(Chem.ForwardSDMolSupplier(open('ZINC23714507.sdf', 'rb'),
>                                      removeHs=False))
> print(Chem.MolToSmiles(mol, True))
>
> Yields the correct but ugly SMILES:
> [H]/N=C(/[H])C([H])([H])N(C([H])([H])/C([H])=N\[H])S(=O)(=O)c1c([H])c([H])c(C(=O)N([H])c2sc3c(c2C(=O)N([H])C([H])([H])[H])C([H])([H])C([H])([H])[N+]([H])(C([H])([H])C([H])([H])C([H])([H])[H])C3([H])[H])c([H])c1[H]
>
> RemoveHs is too heavy a hammer as it removes my cis/trans stereo:
>
> print(Chem.MolToSmiles(Chem.RemoveHs(mol), True))
>
> CCC[NH+]1CCc2c(sc(NC(=O)c3ccc(S(=O)(=O)N(CC=N)CC=N)cc3)c2C(=O)NC)C1
>
>
> I can write the explicit loop myself to only remove hydrogens that are not
> part of the stereochemistry, but it seems like this might be functionality
> buried somewhere in the RDKit.
>
> Thanks,
> Brian
>
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>


[Rdkit-discuss] Preserving hydrogens necessary for imine cis/trans stereochemistry?

2017-05-17 Thread Brian Cole
Is there a recommended way in RDKit to preserve hydrogens necessary for
representing cis/trans stereochemistry of imines?

For example, given the attached SDF I need to maintain explicit hydrogens
in the output SMILES string to maintain the imine cis/trans
stereochemistry.

from rdkit import Chem

# Read the first molecule with its explicit hydrogens kept
# (binary mode so the supplier also works under Python 3)
mol = next(Chem.ForwardSDMolSupplier(open('ZINC23714507.sdf', 'rb'),
                                     removeHs=False))
print(Chem.MolToSmiles(mol, True))

Yields the correct but ugly SMILES:
[H]/N=C(/[H])C([H])([H])N(C([H])([H])/C([H])=N\[H])S(=O)(=O)c1c([H])c([H])c(C(=O)N([H])c2sc3c(c2C(=O)N([H])C([H])([H])[H])C([H])([H])C([H])([H])[N+]([H])(C([H])([H])C([H])([H])C([H])([H])[H])C3([H])[H])c([H])c1[H]

RemoveHs is too heavy a hammer as it removes my cis/trans stereo:

print(Chem.MolToSmiles(Chem.RemoveHs(mol), True))

CCC[NH+]1CCc2c(sc(NC(=O)c3ccc(S(=O)(=O)N(CC=N)CC=N)cc3)c2C(=O)NC)C1


I can write the explicit loop myself to only remove hydrogens that are not
part of the stereochemistry, but it seems like this might be functionality
buried somewhere in the RDKit.

Thanks,
Brian


ZINC23714507.sdf
Description: Binary data


Re: [Rdkit-discuss] How to match any halogen of a structure with any halogen of a substructure?

2017-05-17 Thread Brian Kelley
Dear All,
  In case it helps, there is a wealth of functional groups already in RDKit
available here:

https://github.com/rdkit/rdkit/blob/master/Data/Functional_Group_Hierarchy.txt

For instance, the functional group halogen pattern we use is a bit more
complicated:

[$([F,Cl,Br,I]-!@[#6]);!$([F,Cl,Br,I]-!@C-!@[F,Cl,Br,I]);!$([F,Cl,Br,I]-[C,S](=[O,S,N]))]

That file can (1) help you write your own patterns and (2) be used (from
Python) as follows:


from __future__ import print_function
from rdkit import Chem
from rdkit.Chem import FilterCatalog

# Map of functional-group name -> query molecule
queryDefs = FilterCatalog.GetFlattenedFunctionalGroupHierarchy()
smiles = "ClC1=CC(C2NCCOC2)=C(C=CC=C3)C3=C1"
mol = Chem.MolFromSmiles(smiles)
# Report, in alphabetical order, which groups match the molecule
items = sorted(queryDefs.items())
for name, pat in items:
    print("%s\t%s" % (name, mol.HasSubstructMatch(pat)))


AcidChloride False
AcidChloride.Aliphatic False
AcidChloride.Aromatic False
Alcohol False
Alcohol.Aliphatic False
Alcohol.Aromatic False
Aldehyde False
Aldehyde.Aliphatic False
Aldehyde.Aromatic False
Amine True
Amine.Aliphatic True
Amine.Aromatic False
Amine.Cyclic True
Amine.Primary False
Amine.Primary.Aliphatic False
Amine.Primary.Aromatic False
Amine.Secondary True
Amine.Secondary.Aliphatic True
Amine.Secondary.Aromatic False
Amine.Tertiary False
Amine.Tertiary.Aliphatic False
Amine.Tertiary.Aromatic False
Azide False
Azide.Aliphatic False
Azide.Aromatic False
BoronicAcid False
BoronicAcid.Aliphatic False
BoronicAcid.Aromatic False
CarboxylicAcid False
CarboxylicAcid.Aliphatic False
CarboxylicAcid.AlphaAmino False
CarboxylicAcid.Aromatic False
Halogen True
Halogen.Aliphatic False
Halogen.Aromatic True
Halogen.Bromine False
Halogen.Bromine.Aliphatic False
Halogen.Bromine.Aromatic False
Halogen.Bromine.BromoKetone False
Halogen.NotFluorine True
Halogen.NotFluorine.Aliphatic False
Halogen.NotFluorine.Aromatic True
Isocyanate False
Isocyanate.Aliphatic False
Isocyanate.Aromatic False
Nitro False
Nitro.Aliphatic False
Nitro.Aromatic False
SulfonylChloride False
SulfonylChloride.Aliphatic False
SulfonylChloride.Aromatic False
TerminalAlkyne False
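
The halogen pattern quoted earlier in this message can also be compiled and
applied on its own. A minimal sketch: the helper name halogen_atoms and the
extra test molecule are illustrative assumptions, while the SMARTS is copied
from the message as written.

```python
from rdkit import Chem

# Halogen pattern copied from the message: a halogen on a carbon via a
# non-ring single bond, excluding geminal halides and acyl/sulfonyl-type
# halides.
HALOGEN = Chem.MolFromSmarts(
    '[$([F,Cl,Br,I]-!@[#6]);'
    '!$([F,Cl,Br,I]-!@C-!@[F,Cl,Br,I]);'
    '!$([F,Cl,Br,I]-[C,S](=[O,S,N]))]'
)


def halogen_atoms(smiles):
    """Return the element symbols of the atoms hit by the halogen pattern."""
    mol = Chem.MolFromSmiles(smiles)
    return [mol.GetAtomWithIdx(idx).GetSymbol()
            for (idx,) in mol.GetSubstructMatches(HALOGEN)]


# Aryl chloride from the original question
print(halogen_atoms('ClC1=CC(C2NCCOC2)=C(C=CC=C3)C3=C1'))
# Acid chloride: excluded by the last clause of the pattern
print(halogen_atoms('CC(=O)Cl'))
```

Writing the exclusions as recursive SMARTS keeps the match to a single atom,
so each hit is a one-element tuple holding the halogen's atom index.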


Cheers,
 Brian

On Wed, May 17, 2017 at 9:20 AM, Alexis Parenty <
alexis.parenty.h...@gmail.com> wrote:

> Hi Michal, thanks for your response.
> I think I made a typo somewhere in my previous code since it now works
> fine, even without the Kekule notation... Sorry about the confusion...
> Best,
>
> Alexis
>
> On 17 May 2017 at 13:59, Michal Krompiec 
> wrote:
>
>> Hi Alexis,
>> Try aromatic form instead of Kekule notation.
>> Best,
>> Michal
>>
>> On 17 May 2017 at 12:55, Alexis Parenty 
>> wrote:
>>
>>> Hi everyone,
>>>
>>> I am looking for substructure match between a smarts and a smiles, but I
>>> want any heteroatom from the smarts to match any heteroatom from a smiles:
>>>
>>>
>>> [image: Inline images 1]
>>>
>>>
>>>
>>>
>>>
>>> The following does not return what I would expect:
>>>
>>> from rdkit import Chem
>>>
>>> smarts1 = "[F,Cl,Br,I]C1=CC(C2[N,O,S]CC[N,O,S]C2)=CC=C1"
>>> smiles2 = "ClC1=CC(C2NCCOC2)=C(C=CC=C3)C3=C1"
>>>
>>> mol1 = Chem.MolFromSmarts(smarts1)
>>> mol2 = Chem.MolFromSmiles(smiles2)
>>> print("mol1 is a substructure of mol2: {}".format(mol2.HasSubstructMatch(mol1)))
>>> print("mol2 is a substructure of mol1: {}".format(mol1.HasSubstructMatch(mol2)))
>>>
>>> => mol1 is a substructure of mol2: False
>>> => mol2 is a substructure of mol1: False
>>>
>>> How could I do that?
>>>
>>>
>>>
>>> Thanks,
>>>
>>>
>>>
>>> Alexis
>>>
>>>



Re: [Rdkit-discuss] How to match any halogen of a structure with any halogen of a substructure?

2017-05-17 Thread Alexis Parenty
Hi Michal, thanks for your response.
I think I made a typo somewhere in my previous code since it now works
fine, even without the Kekule notation... Sorry about the confusion...
Best,

Alexis

On 17 May 2017 at 13:59, Michal Krompiec  wrote:

> Hi Alexis,
> Try aromatic form instead of Kekule notation.
> Best,
> Michal
>
> On 17 May 2017 at 12:55, Alexis Parenty 
> wrote:
>
>> Hi everyone,
>>
>> I am looking for substructure match between a smarts and a smiles, but I
>> want any heteroatom from the smarts to match any heteroatom from a smiles:
>>
>>
>> [image: Inline images 1]
>>
>>
>>
>>
>>
>> The following does not return what I would expect:
>>
>> from rdkit import Chem
>>
>> smarts1 = "[F,Cl,Br,I]C1=CC(C2[N,O,S]CC[N,O,S]C2)=CC=C1"
>> smiles2 = "ClC1=CC(C2NCCOC2)=C(C=CC=C3)C3=C1"
>>
>> mol1 = Chem.MolFromSmarts(smarts1)
>> mol2 = Chem.MolFromSmiles(smiles2)
>> print("mol1 is a substructure of mol2: {}".format(mol2.HasSubstructMatch(mol1)))
>> print("mol2 is a substructure of mol1: {}".format(mol1.HasSubstructMatch(mol2)))
>>
>> => mol1 is a substructure of mol2: False
>> => mol2 is a substructure of mol1: False
>>
>> How could I do that?
>>
>>
>>
>> Thanks,
>>
>>
>>
>> Alexis
>>
>>


Re: [Rdkit-discuss] How to match any halogen of a structure with any halogen of a substructure?

2017-05-17 Thread Michal Krompiec
Hi Alexis,
Try aromatic form instead of Kekule notation.
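
To illustrate: in SMARTS an uppercase C only matches aliphatic carbon, while
RDKit perceives the naphthalene ring of the SMILES as aromatic, so a
Kekule-style query cannot match it. A minimal sketch, in which the aromatic
rewrite of the query (aromatic_smarts) is an assumed transcription of the
SMARTS from the original question:

```python
from rdkit import Chem

smiles = 'ClC1=CC(C2NCCOC2)=C(C=CC=C3)C3=C1'
kekule_smarts = '[F,Cl,Br,I]C1=CC(C2[N,O,S]CC[N,O,S]C2)=CC=C1'
# Assumed aromatic rewrite of the query: ring carbons written in lowercase.
aromatic_smarts = '[F,Cl,Br,I]c1cc(C2[N,O,S]CC[N,O,S]C2)ccc1'

mol = Chem.MolFromSmiles(smiles)  # the fused ring is aromatised on parsing
kekule_hit = mol.HasSubstructMatch(Chem.MolFromSmarts(kekule_smarts))
aromatic_hit = mol.HasSubstructMatch(Chem.MolFromSmarts(aromatic_smarts))
print("Kekule query matches:   {}".format(kekule_hit))
print("aromatic query matches: {}".format(aromatic_hit))
```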
Best,
Michal

On 17 May 2017 at 12:55, Alexis Parenty 
wrote:

> Hi everyone,
>
> I am looking for substructure match between a smarts and a smiles, but I
> want any heteroatom from the smarts to match any heteroatom from a smiles:
>
>
> [image: Inline images 1]
>
>
>
>
>
> The following does not return what I would expect:
>
> from rdkit import Chem
>
> smarts1 = "[F,Cl,Br,I]C1=CC(C2[N,O,S]CC[N,O,S]C2)=CC=C1"
> smiles2 = "ClC1=CC(C2NCCOC2)=C(C=CC=C3)C3=C1"
>
> mol1 = Chem.MolFromSmarts(smarts1)
> mol2 = Chem.MolFromSmiles(smiles2)
> print("mol1 is a substructure of mol2: {}".format(mol2.HasSubstructMatch(mol1)))
> print("mol2 is a substructure of mol1: {}".format(mol1.HasSubstructMatch(mol2)))
>
> => mol1 is a substructure of mol2: False
> => mol2 is a substructure of mol1: False
>
> How could I do that?
>
>
>
> Thanks,
>
>
>
> Alexis
>
>


[Rdkit-discuss] How to match any halogen of a structure with any halogen of a substructure?

2017-05-17 Thread Alexis Parenty
Hi everyone,

I am looking for substructure match between a smarts and a smiles, but I
want any heteroatom from the smarts to match any heteroatom from a smiles:


[image: Inline images 1]





The following does not return what I would expect:

from rdkit import Chem

smarts1 = "[F,Cl,Br,I]C1=CC(C2[N,O,S]CC[N,O,S]C2)=CC=C1"
smiles2 = "ClC1=CC(C2NCCOC2)=C(C=CC=C3)C3=C1"

mol1 = Chem.MolFromSmarts(smarts1)
mol2 = Chem.MolFromSmiles(smiles2)
print("mol1 is a substructure of mol2: {}".format(mol2.HasSubstructMatch(mol1)))
print("mol2 is a substructure of mol1: {}".format(mol1.HasSubstructMatch(mol2)))

=> mol1 is a substructure of mol2: False
=> mol2 is a substructure of mol1: False

How could I do that?



Thanks,



Alexis

