Re: [Rdkit-discuss] UFF geometry optimization of lanthanide complexes

2015-09-15 Thread Michal Krompiec
Hi Greg,
Thanks for your reply. I'm aware that complexes of this kind are completely
outside the scope of this code; I was just hoping it might still work here.
Indeed, RDKit does not parse this molecule from SMILES, but I was able
to smuggle it through as a MOL file (in KNIME). Anyway, as only six-coordinate
Eu (Eu6+3) is defined in UFF, this is not the way forward for my purpose.
Best wishes,
Michal

On 15 September 2015 at 10:54, Greg Landrum wrote:

> Hi Michal,
>
> The problem here, I think, is that organometallic complexes like this one
> involve bond types that are not well represented by SMILES, which really
> assumes that a Lewis dot structure including shared electron pairs for all
> bonds can be drawn. This is decidedly not the case here, where the RDKit
> can't even read the molecule in properly:
> In [39]: m = Chem.MolFromSmiles('[Eu]1234567OC(=CC(=[O]1)C)C.C(C=C(O2)C)(=[O]3)C.C(C=C(O4)C)(=[O]5)C.C1=[N]6C2=C(C=C1)C=CC1=C2[N]7=CC=C1')
> [06:23:05] Explicit valence for atom # 5 O, 3, is greater than permitted
>
> The UFF parameters that are available for Eu are for Eu6+3: an
> octahedrally coordinated Eu+3. Your complex has 8 connections to the Eu, so
> it wouldn't be covered by the UFF parameters anyway.
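The eight connections and the six-coordinate Eu6+3 type can be checked directly from the SMILES. A minimal sketch, assuming a current RDKit Python install: parsing with sanitize=False skips the valence checks that reject the molecule, so the bonds to Eu can still be counted. UFF_EU_COORDINATION is a hypothetical constant introduced here, set from the statement above that Eu6+3 is octahedral.

```python
from rdkit import Chem

# SMILES from the thread (created with MarvinSketch)
smi = ("[Eu]1234567OC(=CC(=[O]1)C)C.C(C=C(O2)C)(=[O]3)C."
       "C(C=C(O4)C)(=[O]5)C.C1=[N]6C2=C(C=C1)C=CC1=C2[N]7=CC=C1")

# sanitize=False skips the valence checks (e.g. the three-bonded carbonyl O),
# so the connectivity can be inspected even though sanitization would fail
mol = Chem.MolFromSmiles(smi, sanitize=False)

eu = next(a for a in mol.GetAtoms() if a.GetSymbol() == "Eu")
# Element symbols of the donor atoms bonded to Eu
donors = sorted(n.GetSymbol() for n in eu.GetNeighbors())

# Hypothetical constant: Eu6+3 is octahedral (six-coordinate), per the thread
UFF_EU_COORDINATION = 6

print(eu.GetDegree(), donors)
print("matches Eu6+3 coordination:", eu.GetDegree() == UFF_EU_COORDINATION)
```

The three acac ligands contribute six oxygens and phen contributes two nitrogens, which is why the count exceeds the six connections the Eu6+3 type describes.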
>
> I just did a bit of playing around to see if I could construct a sample
> molecule with a six-coordinate Eu+3 and make that work. I failed. This may
> be an RDKit bug but I'm not quite sure. The corners of the code for dealing
> with non-organic molecules are dark, dusty, and not particularly well
> tested.
>
> Best,
> -greg
>
> On Mon, Sep 14, 2015 at 2:26 PM, Michal Krompiec <michal.kromp...@gmail.com> wrote:
>
>> Hello,
>> I was trying to generate 3D coordinates for a europium complex,
>> [Eu(acac)3(phen)], with UFF, using the RDKit nodes in KNIME (UFF is
>> parametrized for lanthanides). While coordinate generation seems
>> to produce an almost sensible structure:
>> [image: structure after 3D coordinate generation]
>>
>> the subsequent geometry optimization does not: it moves the Eu atom far
>> outside the coordination sphere:
>> [image: structure after UFF geometry optimization]
>>
>> Is this caused by incorrectly specified bond types, or is UFF just not
>> supposed to work with this type of molecule at all? The molecule
>> is defined by the following SMILES (created with MarvinSketch):
>>
>> [Eu]1234567OC(=CC(=[O]1)C)C.C(C=C(O2)C)(=[O]3)C.C(C=C(O4)C)(=[O]5)C.C1=[N]6C2=C(C=C1)C=CC1=C2[N]7=CC=C1
>> The same result is obtained with La instead of Eu.
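For reference, the two KNIME steps (coordinate generation, then UFF optimization) might look like this in plain RDKit Python. This is a sketch that assumes the KNIME nodes wrap EmbedMolecule and UFFOptimizeMolecule, which the thread does not confirm. It adds a UFFHasAllMoleculeParams check so atoms UFF cannot type at all are reported up front; that check would not necessarily flag an Eu atom with the wrong coordination number. The function name embed_and_optimize_uff is hypothetical, and acetylacetone stands in for a molecule RDKit can sanitize.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def embed_and_optimize_uff(smiles, seed=42):
    """Hypothetical helper: embed in 3D, then UFF-optimize, with sanity checks."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("RDKit could not parse/sanitize: " + smiles)
    mol = Chem.AddHs(mol)  # UFF needs explicit hydrogens
    if not AllChem.UFFHasAllMoleculeParams(mol):
        raise ValueError("UFF has no parameters for at least one atom")
    if AllChem.EmbedMolecule(mol, randomSeed=seed) != 0:
        raise RuntimeError("3D embedding failed")
    # 0 = converged, 1 = more iterations needed, -1 = force field setup failed
    status = AllChem.UFFOptimizeMolecule(mol, maxIters=1000)
    return mol, status


mol, status = embed_and_optimize_uff("CC(=O)CC(C)=O")  # acetylacetone, keto form
print(mol.GetNumAtoms(), status)
```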
>> Best wishes,
>> Michal
>>
>>
>>
>> --
>>
>> ___
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>>
>


[Rdkit-discuss] possible SMARTS translating mistake?

2015-09-15 Thread Bodle, Christopher R
All,

I am writing filtering code in Python that searches my hit list (SMILES)
for substructure matches to my filter lists (SMARTS). My current
filter lists were copied from Rajarshi Guha's blog at
http://blog.rguha.net/?p=850.
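For the use case described above, the SMARTS can be used directly as substructure queries against the hit list without turning them into molecules first. A minimal sketch: the hit IDs and the two test SMILES are hypothetical example data, and the filter dictionary holds only the pyrrole_A(118) pattern discussed in this message. Because several of these filters contain explicit [#1] atoms, which only match explicit hydrogens, the hits get Chem.AddHs before matching.

```python
from rdkit import Chem

# Hypothetical inputs: hit list (id -> SMILES) and filters (name -> SMARTS)
hits = {
    "hit_1": "Cc1ccc(C)n1-c1ccccc1",  # 2,5-dimethyl-1-phenylpyrrole
    "hit_2": "c1ccccc1O",             # phenol
}
filters = {
    "pyrrole_A(118)": "n2(-[#6]:1:[!#1]:[#6]:[#6]:[#6]:[#6]:1)c(cc(c2-[#6;X4])-[#1])-[#6;X4]",
}

# Parse each SMARTS once and keep it as a query molecule (no SMILES round trip)
patterns = {name: Chem.MolFromSmarts(sma) for name, sma in filters.items()}


def flag_hits(hits, patterns):
    """Return, for each hit, the names of the filters it matches."""
    flagged = {}
    for hit_id, smi in hits.items():
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue  # skip SMILES that RDKit cannot parse
        # [#1] in the SMARTS only matches explicit H atoms, so add them
        mol = Chem.AddHs(mol)
        flagged[hit_id] = [name for name, patt in patterns.items()
                           if mol.HasSubstructMatch(patt)]
    return flagged


print(flag_hits(hits, patterns))
```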

While doing this, I ran into the following SMARTS string from the
p_l150 collection, filter pyrrole_A(118):


n2(-[#6]:1:[!#1]:[#6]:[#6]:[#6]:[#6]:1)c(cc(c2-[#6;X4])-[#1])-[#6;X4]

The problem area is the [!#1] atom in the first ring. Although this should
be interpreted as 'not H', the molecule generated by Chem.MolFromSmarts is
rendered with a hydrogen in this position, in the middle of an aromatic
ring. That causes a valence error, so I can't standardize the mol for
filtering purposes.

I confirmed this by making the following edit to the SMARTS string:
n2(-[#6]:1:[!#6]:[#6]:[#6]:[#6]:[#6]:1)c(cc(c2-[#6;X4])-[#1])-[#6;X4]

This results in a carbon in the position of the hydrogen from the original
SMARTS. Is this a problem with the SMARTS translator, or is there something
that I am missing?
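The pattern can be inspected without drawing or standardizing it. A minimal sketch (the two test SMILES, an N-phenyl and an N-(2-pyridyl) 2,5-dimethylpyrrole, are made up for illustration): RDKit stores [!#1] as a query atom, MolToSmarts writes it back as a query, and the same pattern matches both a carbon and a nitrogen in that ring position once explicit hydrogens are added. SMILES has no notation for "any atom except H", so converting the query molecule to SMILES cannot preserve the filter's meaning.

```python
from rdkit import Chem

sma = "n2(-[#6]:1:[!#1]:[#6]:[#6]:[#6]:[#6]:1)c(cc(c2-[#6;X4])-[#1])-[#6;X4]"
patt = Chem.MolFromSmarts(sma)

# Atom index 2 is the [!#1] atom; it is a query, not a concrete element
q = patt.GetAtomWithIdx(2)
print(q.GetSmarts())

# MolToSmarts keeps the query expression intact
print(Chem.MolToSmarts(patt))

# The [!#1] position matches a ring carbon (phenyl) and a ring nitrogen (2-pyridyl)
for smi in ("Cc1ccc(C)n1-c1ccccc1", "Cc1ccc(C)n1-c1ncccc1"):
    mol = Chem.AddHs(Chem.MolFromSmiles(smi))  # explicit H needed for [#1]
    print(smi, mol.HasSubstructMatch(patt))
```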

I believe this happens quite frequently. When I run standardization code
over the p_l150 filters (55 compounds) using:

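from rdkit import Chem
from molvs import Standardizer, standardize_smiles  # assumed source: MolVS

# p_l150: pandas DataFrame with a 'filter' column (filter name) and a
# 'mol file' column holding the Mol objects built with Chem.MolFromSmarts
p_l150['standardized mol'] = None
s = Standardizer()  # one instance is enough for the whole loop
for i in p_l150.index:
    mf = p_l150.loc[i, 'mol file']
    try:
        smi = Chem.MolToSmiles(mf)      # query mol -> SMILES
        m2 = standardize_smiles(smi)    # SMILES-level standardization
        m3 = Chem.MolFromSmiles(m2)
        p_l150.loc[i, 'standardized mol'] = s.standardize(m3)
    except Exception as e:
        print(p_l150.loc[i, 'filter'], e)
p_l150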

This returns 11 errors, 8 of which are valence errors (7 of those involve hydrogens):