Re: [Rdkit-discuss] Chembience

2018-10-31 Thread Markus Sitzmann
Hello,

I have released Chembience 0.2.6. It switches Python from 3.6 to 3.7 and
updates RDKit to 2018.09.1. The Docker images of all previous releases are
still available from Docker Hub.

https://github.com/chembience/chembience/releases/tag/v0.2.6

https://twitter.com/markussitzmann/status/105216581521409

Markus

On Tue, Apr 24, 2018 at 10:44 AM Markus Sitzmann 
wrote:

> Hello,
>
> since it includes RDKit as one of its major components I am happy to
> announce the first release of my new open-source project Chembience:
>
> A Docker-based, cloudable platform for the development of
> chemoinformatics-centric web applications and microservices.
>
> https://github.com/chembience/chembience
>
> (unfortunately it is still on RDKit 2017.09_3, I failed releasing it
> before 2018.03 :-) ).
>
> Best,
> Markus
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Sometimes one sanitization is not enough?

2018-10-31 Thread Ivan Tubert-Brohman
Hi Greg,

Thanks for the detailed explanation. You are right that this is not a real
molecule; it came from applying a user-supplied reaction SMARTS. (The
reaction SMARTS was not the best-written perhaps, but that's
tangential...). I normally sanitize the products and skip those that fail
the sanitization, but in this case I was surprised when the sanitized
molecule caused issues later while trying to compute descriptors.

I look forward to a fix, but in the meantime maybe I'll consider running
SanitizeMol twice. :-)
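
A minimal sketch of that workaround, assuming the products are available as SMILES strings. The helper names survives_two_sanitizations and keep_stable_products are made up for illustration and are not part of RDKit. It relies only on SanitizeMol raising a ValueError (or a subclass of it) on failure, as in the traceback quoted below.

```python
from rdkit import Chem


def survives_two_sanitizations(mol):
    """Return True only if two consecutive SanitizeMol calls both succeed.

    Note that SanitizeMol modifies mol in place.
    """
    for _ in range(2):
        try:
            Chem.SanitizeMol(mol)
        except ValueError:
            # Sanitization failures surface as ValueError (or a subclass).
            return False
    return True


def keep_stable_products(product_smiles):
    """Keep only the SMILES that parse and pass sanitization twice."""
    kept = []
    for smi in product_smiles:
        mol = Chem.MolFromSmiles(smi, sanitize=False)
        if mol is not None and survives_two_sanitizations(mol):
            kept.append(smi)
    return kept


print(keep_stable_products(['c1ccccc1', 'C1=N(C)C=CN1']))
```

The second input has a neutral four-valent nitrogen, so it is dropped, while benzene is kept.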

Best,
Ivan


On Wed, Oct 31, 2018 at 2:41 AM Greg Landrum wrote:

> Hi Ivan,
>
> Short answer: I would not normally expect a second sanitization to fail if
> the first succeeds, but your input SMILES is very odd and triggers a bug.
>
> This is an interesting edge case for the sanitization code because it
> includes a weird mix of aromatic and aliphatic atoms and bonds. I do hope
> this came out of some computational process and isn't a "real" molecule.
> You almost couldn't have picked a better example to highlight the situation
> that's causing the problem here. Some form of congratulations is in order.
> :-)
>
> Here's an explanation of what's going on with your molecule, C1=n(C)-c=Cn1:
> The fundamental problem is that atom 1 (the first nitrogen) has a valence
> of 4 and is neutral...
> If you wrote the SMILES as C1=N(C)C=CN1, which is what the sanitization
> process produces, I don't think you'd be surprised that the RDKit
> sanitization fails (and your second call to sanitize does fail).
>
> To understand why it passes the first time, you need to understand the
> flow of the sanitization process, described here:
> https://www.rdkit.org/docs/RDKit_Book.html#molecular-sanitization
> Step 3, updatePropertyCache(), is the part that reports valency errors.
> There's a special case in this code for aromatic atoms that allows atoms
> like the N in Cn1cccc1 to pass sanitization even though they are formally
> four-valent (2x1.5 for the aromatic bonds +1 for the C). Your molecule is
> triggering that special case because atom 1 is aromatic in the input
> SMILES. Incorrect aromatic rings that get through this step normally end up
> getting caught later when the molecule is kekulized (step 5). In your case
> there are no aromatic bonds to kekulize, so no error is thrown. The
> aromaticity perception (step 6) does not consider the ring to be aromatic,
> so the final molecule is the equivalent of C1=N(C)C=CN1.
>
> It ought to be possible to clear this up in the sanitization code relatively
> easily; I just need to think about it a bit and do a bunch of testing.
>
> -greg
>
> On Tue, Oct 30, 2018 at 10:02 PM Ivan Tubert-Brohman <
> ivan.tubert-broh...@schrodinger.com> wrote:
>
>> Hi,
>>
>> I was surprised to see that a (dubious) structure that goes through
>> SanitizeMol OK can fail a subsequent sanitization call:
>>
>> from rdkit import Chem
>>
>> print("Start")
>> mol = Chem.MolFromSmiles('C1=n(C)-c=Cn1', sanitize=False)
>> print("Before first sanitization")
>> Chem.SanitizeMol(mol)
>> print("Before second sanitization")
>> Chem.SanitizeMol(mol)
>> print("Done")
>>
>>
>> The output is:
>>
>> Start
>> Before first sanitization
>> Before second sanitization
>> [16:54:20] Explicit valence for atom # 1 N, 4, is greater than permitted
>> Traceback (most recent call last):
>>   File "./san.py", line 9, in <module>
>>     Chem.SanitizeMol(mol)
>> ValueError: Sanitization error: Explicit valence for atom # 1 N, 4, is
>> greater than permitted
>>
>>
>> Is this an unavoidable aspect of the way SanitizeMol works, since it does
>> several operations (Kekulize, check valencies, set aromaticity, conjugation
>> and hybridization) in a certain order, or should this be considered a bug?
>>
>> Best,
>> Ivan


Re: [Rdkit-discuss] Adding certain atomic indexes to functional groups

2018-10-31 Thread Noki Lee
Thanks, I'll check it out.

So far, it works pretty well!

By the way, I'm curious about RWMol. What does it do, and what is the
difference between GetMol() and EditableMol()?

On Tue, Oct 30, 2018 at 4:36 PM Greg Landrum wrote:

> Hi,
>
> I think this gist does what you're looking for:
> https://gist.github.com/greglandrum/fd488309268cb085be218f26178e13b8
>
> -greg
>
>
> On Tue, Oct 30, 2018 at 7:20 AM Noki Lee wrote:
>
>> Hi rdkit-discuss,
>>
>> I'm struggling to add functional groups to a core molecule.
>>
>> What I want is a new function like the one below.
>>
>> Calling func(smiles1, smiles2, smiles3, **) should return a new SMILES
>> for the combined molecule (smiles1 + smiles2 + smiles3), with bonds
>> connecting the atom index pairs (i1, j) and (i2, k).
>> The SMILES strings can be arbitrary, but we know the atom index order of
>> each isolated SMILES.
>>
>> For example, func('c1ccccc1','C(=O)O','Cl', {3:0}, {5:0})
>> => 'c1cc(C(=O)O)cc(Cl)c1' (This might be wrong, but the result will be in
>> SMILES format)
>>
>> The part I'm struggling with is keeping the indexes of the core molecule
>> persistent.
>>
>
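
One possible way to build the function requested in the quoted message is sketched here. It is not the code from the gist linked above. The name attach_groups and its argument layout (a list of (group_smiles, core_idx, group_idx) tuples) are assumptions made for illustration. The sketch relies on Chem.CombineMols placing the atoms of its first argument first, so the core keeps its atom indexes in the combined molecule, and it always joins fragments with a single bond.

```python
from rdkit import Chem


def attach_groups(core_smiles, attachments):
    """Attach fragments to a core and return the canonical SMILES.

    attachments is a list of (group_smiles, core_idx, group_idx) tuples:
    core_idx is an atom index in the core, group_idx an atom index in
    the fragment. Hypothetical helper, not an RDKit function.
    """
    combined = Chem.MolFromSmiles(core_smiles)
    new_bonds = []
    for group_smiles, core_idx, group_idx in attachments:
        group = Chem.MolFromSmiles(group_smiles)
        # Atoms of the fragment are appended after the existing atoms,
        # so existing indexes (including the core's) do not change.
        offset = combined.GetNumAtoms()
        combined = Chem.CombineMols(combined, group)
        new_bonds.append((core_idx, offset + group_idx))

    rw = Chem.RWMol(combined)  # editable copy of the combined molecule
    for begin_idx, end_idx in new_bonds:
        rw.AddBond(begin_idx, end_idx, Chem.BondType.SINGLE)

    mol = rw.GetMol()
    Chem.SanitizeMol(mol)  # recomputes implicit hydrogens after the edits
    return Chem.MolToSmiles(mol)


print(attach_groups('c1ccccc1', [('C(=O)O', 3, 0), ('Cl', 5, 0)]))
```

This prints a canonical SMILES for 3-chlorobenzoic acid. The atom order in the output string is chosen by the canonicalization, so the persistent core indexes apply to the molecule object, not to the positions of atoms in the returned SMILES.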


Re: [Rdkit-discuss] Sometimes one sanitization is not enough?

2018-10-31 Thread Greg Landrum
Hi Ivan,

Short answer: I would not normally expect a second sanitization to fail if
the first succeeds, but your input SMILES is very odd and triggers a bug.

This is an interesting edge case for the sanitization code because it
includes a weird mix of aromatic and aliphatic atoms and bonds. I do hope
this came out of some computational process and isn't a "real" molecule.
You almost couldn't have picked a better example to highlight the situation
that's causing the problem here. Some form of congratulations is in order.
:-)

Here's an explanation of what's going on with your molecule, C1=n(C)-c=Cn1:
The fundamental problem is that atom 1 (the first nitrogen) has a valence
of 4 and is neutral...
If you wrote the SMILES as C1=N(C)C=CN1, which is what the sanitization
process produces, I don't think you'd be surprised that the RDKit
sanitization fails (and your second call to sanitize does fail).
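
The valence of 4 can be checked by hand with a small sketch like the following. The helper bond_order_sum is a made-up name, and summing bond orders ignores implicit hydrogens, so it is only a rough stand-in for RDKit's own valence calculation.

```python
from rdkit import Chem


def bond_order_sum(smiles, atom_idx):
    """Sum the bond orders around one atom of an unsanitized molecule."""
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    atom = mol.GetAtomWithIdx(atom_idx)
    # GetBondTypeAsDouble gives 1.0, 2.0, 3.0, or 1.5 for aromatic bonds.
    return sum(bond.GetBondTypeAsDouble() for bond in atom.GetBonds())


# Atom 1 is the first nitrogen: one double bond plus two single bonds.
print(bond_order_sum('C1=N(C)C=CN1', 1))  # 4.0
```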

To understand why it passes the first time, you need to understand the flow
of the sanitization process, described here:
https://www.rdkit.org/docs/RDKit_Book.html#molecular-sanitization
Step 3, updatePropertyCache(), is the part that reports valency errors.
There's a special case in this code for aromatic atoms that allows atoms
like the N in Cn1cccc1 to pass sanitization even though they are formally
four-valent (2x1.5 for the aromatic bonds +1 for the C). Your molecule is
triggering that special case because atom 1 is aromatic in the input
SMILES. Incorrect aromatic rings that get through this step normally end up
getting caught later when the molecule is kekulized (step 5). In your case
there are no aromatic bonds to kekulize, so no error is thrown. The
aromaticity perception (step 6) does not consider the ring to be aromatic,
so the final molecule is the equivalent of C1=N(C)C=CN1.
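
To see which step rejects a given input, SanitizeMol can be called with catchErrors=True, which returns the failing operation instead of raising. The sketch below is only an illustration. The helper name first_failing_step is an assumption, and what the original odd SMILES returns may depend on the RDKit version, so it is not shown.

```python
from rdkit import Chem


def first_failing_step(smiles):
    """Return the SanitizeFlags value of the first failing operation.

    SANITIZE_NONE means every sanitization step succeeded.
    """
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    return Chem.SanitizeMol(mol, catchErrors=True)


# N-methylpyrrole: the aromatic N has a bond-order sum of 4 but is
# accepted by the special case for aromatic atoms.
print(first_failing_step('Cn1cccc1'))

# The kekulized equivalent of the odd molecule: the neutral four-valent
# N is rejected by the valence check in updatePropertyCache().
print(first_failing_step('C1=N(C)C=CN1'))
```

The first call reports SANITIZE_NONE and the second reports SANITIZE_PROPERTIES, the flag that covers the updatePropertyCache() step.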

It ought to be possible to clear this up in the sanitization code relatively
easily; I just need to think about it a bit and do a bunch of testing.

-greg

On Tue, Oct 30, 2018 at 10:02 PM Ivan Tubert-Brohman <
ivan.tubert-broh...@schrodinger.com> wrote:

> Hi,
>
> I was surprised to see that a (dubious) structure that goes through
> SanitizeMol OK can fail a subsequent sanitization call:
>
> from rdkit import Chem
>
> print("Start")
> mol = Chem.MolFromSmiles('C1=n(C)-c=Cn1', sanitize=False)
> print("Before first sanitization")
> Chem.SanitizeMol(mol)
> print("Before second sanitization")
> Chem.SanitizeMol(mol)
> print("Done")
>
>
> The output is:
>
> Start
> Before first sanitization
> Before second sanitization
> [16:54:20] Explicit valence for atom # 1 N, 4, is greater than permitted
> Traceback (most recent call last):
>   File "./san.py", line 9, in <module>
>     Chem.SanitizeMol(mol)
> ValueError: Sanitization error: Explicit valence for atom # 1 N, 4, is
> greater than permitted
>
>
> Is this an unavoidable aspect of the way SanitizeMol works, since it does
> several operations (Kekulize, check valencies, set aromaticity, conjugation
> and hybridization) in a certain order, or should this be considered a bug?
>
> Best,
> Ivan