Re: [Rdkit-discuss] Beta of the 2017.09 release available

2017-10-02 Thread Ivan Tubert-Brohman
I can reproduce this with an older version, but the problem is that you
have RemoveHs instead of removeHs.

On Mon, Oct 2, 2017 at 7:20 AM, Guillaume GODIN <
guillaume.go...@firmenich.com> wrote:

> Dear Greg,
>
>
> I don't know if it's related but I have this issue on my mac version since
> this morning:
>
>
> ---ArgumentError
>  Traceback (most recent call 
> last) in ()> 1 suppl = 
> Chem.SDMolSupplier("/Users/GVALMTGG/Github/neemp/examples/set01.sdf",RemoveHs=False)
>   2 for mol in suppl:  3 for atom in mol.GetAtoms():  4   
>   print str(atom.GetSymbol())+ ":" +str(Atomtype(atom,mol))  5
> ArgumentError: Python argument types in
> SDMolSupplier.__init__(SDMolSupplier, str)
> did not match C++ signature:
> __init__(_object*, std::__1::basic_string std::__1::char_traits, std::__1::allocator > fileName, bool 
> sanitize=True, bool removeHs=True, bool strictParsing=True)
> __init__(_object*)
>
>
>
> Best regards,
>
>
> *Dr. Guillaume GODIN*
> Principal Scientist
> Chemoinformatic & Datamining
> Innovation
> CORPORATE R DIVISION
> DIRECT LINE +41 (0)22 780 3645 <+41%2022%20780%2036%2045>
> MOBILE  +41 (0)79 536 1039 <+41%2079%20536%2010%2039>
> Firmenich SA
> RUE DES JEUNES 1 | CASE POSTALE 239 | CH-1211 GENEVE 8
>
> --
> *De :* Greg Landrum 
> *Envoyé :* vendredi 29 septembre 2017 09:30
> *À :* RDKit Discuss; RDKit Developers List
> *Objet :* [Rdkit-discuss] Beta of the 2017.09 release available
>
> Dear all,
>
> This morning I tagged the beta of the 2017.09 RDKit release in github:
> https://github.com/rdkit/rdkit/releases/tag/Release_2017_09_1b1
>
> I will try to get some conda builds up over the next day or so. These will
> use the beta label so that they do not install by default; you'll need to
> run "conda install" as follows:
>
> conda install -c rdkit/label/beta rdkit python=3.6
>
> Be sure to confirm that it's installing the right version when you are
> prompted (if there's no build available, it will pick the current
> production release instead).
>
> You can check to see if a build is available for your platform/python
> version here:
> https://anaconda.org/rdkit/rdkit/files?version=2017.09.1.b1
>
> Note that I've done the conda linux builds using a Centos7 system - in the
> past I've used Centos6 - if this causes compatibility problems or if you
> think that this is a bad idea please let me know and I will switch back to
> Centos6 for the builds.
>
> The relevant section of the release notes is below.
>
> As usual, if you have time to try out the new release I would love
> feedback. If nothing major comes up, I plan to do the actual release a week
> from today.
>
> Best,
> -greg
>
>
> # Release_2017.09.1
> (Changes relative to Release_2017.03.1)
>
> ## Important
> - The fix for bug #1567 changes the way fragment SMILES are canonicalized.
>   MolFragmentToSmiles() and canonicalizeFragment() will now often return
>   different results
>
> ## Acknowledgements:
> Brian Cole, Peter Gedeck, Guillaume Godin, Malitha Kabir, Tuomo
> Kalliokoski,
> Brian Kelley, Noel O'Boyle, Matthew O'Meara, Pavel Polishchuk, Cameron Pye,
> Christian Ribeaud, Stephen Roughley, Patrick Savery, Roger Sayle,
> Nadine Schneider, Matt Swain, Paolo Tosco, Alain Vaucher, Sam Webb,
> 'phenethyl', 'xiaotaw'
>
> ## Highlights:
> - The new R-Group decomposition code provides a flexible and powerful tool
> for
>   building R-group tables or datasets look in $RDBASE/Docs/Notebooks for
>   example notebooks showing how to use this.
> - Drawing of chemical reactions has been greatly improved and is now done
> using
>   the C++ rendering code.
> - The MaxMinPicker is dramatically faster.
> - New descriptors: the QED descriptor has been added as have a large
> collection
>   of new 3D descriptors and implementations of the USR and USRCAT
> fingerprints.
>
>
>
> ## New Features and Enhancements:
>   - Bring back USR and USRCAT descriptors
>  (github pull #1417 from greglandrum)
>   - Generate a warning for conflicting bond directions
>  (github issue #1423 from greglandrum)
>   - expose and test GetDrawCoords()
>  (github pull #1427 from greglandrum)
>   - Improvement suggestions for SaltRemover
>  (github issue #1431 from ribeaud)
>   - Remove obsolete scripts from Scripts dir
>  (github pull #1440 from greglandrum)
>   - Support drawing reactions from C++
>  (github pull #1444 from greglandrum)
>   - QED code with unit test file
>  (github pull #1445 from gedeck)
>   - Add support for other datatypes to  ConvertToNumpyArray
>  (github issue #1447 from pyeguy)
>   - - updated FindCairo.cmake
>  (github pull #1455 from ptosco)
>   - - fixes PgSQL CMakeLists.txt to enable conda build on Windows
>  (github pull #1457 from ptosco)
>   - Some cleanups to make Travis builds faster
>  (github pull #1464 from 

[Rdkit-discuss] useQueryQueryMatches with recursive SMARTS

2018-03-07 Thread Ivan Tubert-Brohman
Hi,

Is it reasonable to expect that a SMARTS should match itself when
useQueryQueryMatches=True?

  query = Chem.MolFromSmarts('[C;!$(C=O)]Cl')
  query.HasSubstructMatch(query, useQueryQueryMatches=True)

The above returns False. Without useQueryQueryMatches, it returns True, but
I think I need useQueryQueryMatches for my application because queries are
not generally valid molecules, and in some cases HasSubstructMatch ends up
raising exceptions (for example when the SMARTS specifies hydrogen counts).

If I remove the recursive part of the SMARTS above, I get a match, so I
suspect that the query-query search doesn't play well with recursive SMARTS.

Best,
Ivan
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] useQueryQueryMatches with recursive SMARTS

2018-03-09 Thread Ivan Tubert-Brohman
Hi Greg,

Thank you. After more thought I decided that query-to-query matching is not
the best way to go for my task, regardless of what it does with recursive
SMARTS, so don't lose sleep about it on my behalf. :-) Still, it's good to
know about this behavior for future reference.

Best,
Ivan


On Fri, Mar 9, 2018 at 12:13 AM, Greg Landrum <greg.land...@gmail.com>
wrote:

> Hi Ivan,
>
> On Wed, Mar 7, 2018 at 8:58 PM, Ivan Tubert-Brohman <ivan.tubert-brohman@
> schrodinger.com> wrote:
>
>>
>> Is it reasonable to expect that a SMARTS should match itself when
>> useQueryQueryMatches=True?
>>
>
> Not my favorite kind of answer to give, but... "it depends"
>
>
>>   query = Chem.MolFromSmarts('[C;!$(C=O)]Cl')
>>   query.HasSubstructMatch(query, useQueryQueryMatches=True)
>>
>> The above returns False. Without useQueryQueryMatches, it returns True,
>> but I think I need useQueryQueryMatches for my application because
>> queries are not generally valid molecules, and in some cases
>> HasSubstructMatch ends up raising exceptions (for example when the
>> SMARTS specifies hydrogen counts).
>>
>> If I remove the recursive part of the SMARTS above, I get a match, so I
>> suspect that the query-query search doesn't play well with recursive SMARTS.
>>
>
> That is the expected, though not particularly nice, behavior. The code is
> currently quite conservative about when it says that two queries match each
> other: if it sees a recursive query it just says "no".
> There may be a straightforward fix for this, but in general making the
> query-query matcher more flexible is one of those "it would be so cool, but
> it's s much work" projects, so I'm not going to make any promises.
>
> Best,
> -greg
>
>
>
>
>> Best,
>> Ivan
>>
>> 
>> --
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>> ___
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>>
>
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Sometimes one sanitization is not enough?

2018-10-31 Thread Ivan Tubert-Brohman
Hi Greg,

Thanks for the detailed explanation. You are right that this is not a real
molecule; it came from applying a user-supplied reaction SMARTS. (The
reaction SMARTS was not the best-written perhaps, but that's
tangential...). I normally sanitize the products and skip those that fail
the sanitization, but in this case I was surprised when the sanitized
molecule caused issues later while trying to compute descriptors.

I look forward to a fix, but in the meantime maybe I'll consider running
SanitzeMol twice. :-)

Best,
Ivan


On Wed, Oct 31, 2018 at 2:41 AM Greg Landrum  wrote:

> Hi Ivan,
>
> Short answer: I would not normally expect a second sanitization to fail if
> the first succeeds, but your input SMILES is very odd and triggers a bug.
>
> This is an interesting edge case for the sanitization code because it
> includes a weird mix of aromatic and aliphatic atoms and bonds, I do hope
> this came out of some computational process and isn't a "real" molecule.
> You almost couldn't have picked a better example to highlight the situation
> that's causing the problem here. Some form of congratulations are in order.
> :-)
>
> Here's an explanation of what's going on with your molecule C1=n(C)-c=Cn1
> The fundamental problem is that atom 1 (the first nitrogen) has a valence
> of 4 and is neutral...
> If you wrote the SMILES as C1=N(C)C=CN1, which is what the sanitization
> process produces, I don't think you'd be surprised that the RDKit
> sanitization fails (and your second call to sanitize does fail).
>
> To understand why it passes the first time, you need to understand the
> flow of the sanitization process, described here;
> https://www.rdkit.org/docs/RDKit_Book.html#molecular-sanitization
> Step 3, updatePropertyCache(), is the part that reports valency errors.
> There's a special case in this code for aromatic atoms that allows atoms
> like the N in Cn11 to pass sanitization even though they are formally
> four-valent (2x1.5 for the aromatic bonds +1 for the C). Your molecule is
> triggering that special case because atom 1 is aromatic in the input
> SMILES. Incorrect aromatic rings that get through this step normally end up
> getting caught later when the molecule is kekulized (step 5). In your case
> there are no aromatic bonds to kekulize, so no error is thrown. The
> aromaticity perception (step 6) does not consider the ring to be aromatic,
> so the final molecule is the equivalent of C1=N(C)C=CN1.
>
> It ought to be possible to clear this in the sanitization code relatively
> easily; I just need to think about it a bit and do a bunch of testing.
>
> -greg
>
>
>
>
>
>
>
>
> On Tue, Oct 30, 2018 at 10:02 PM Ivan Tubert-Brohman <
> ivan.tubert-broh...@schrodinger.com> wrote:
>
>> Hi,
>>
>> I was surprised to see that a (dubious) structure that goes through
>> SanitizeMol OK can fail a subsequent sanitization call:
>>
>> print("Start")
>> mol = Chem.MolFromSmiles('C1=n(C)-c=Cn1', sanitize=False)
>> print("Before first sanitization")
>> Chem.SanitizeMol(mol)
>> print("Before second sanitization")
>> Chem.SanitizeMol(mol)
>> print("Done")
>>
>>
>> The output is:
>>
>> Start
>> Before first sanitization
>> Before second sanitization
>> [16:54:20] Explicit valence for atom # 1 N, 4, is greater than permitted
>> Traceback (most recent call last):
>>   File "./san.py", line 9, in 
>> Chem.SanitizeMol(mol)
>> ValueError: Sanitization error: Explicit valence for atom # 1 N, 4, is
>> greater than permitted
>>
>>
>> Is this an unavoidable aspect of the way SanitizeMol works, since it does
>> several operations (Kekulize, check valencies, set aromaticity, conjugation
>> and hybridization) in a certain order, or should this be considered a bug?
>>
>> Best,
>> Ivan
>> ___
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] Sometimes one sanitization is not enough?

2018-10-30 Thread Ivan Tubert-Brohman
Hi,

I was surprised to see that a (dubious) structure that goes through
SanitizeMol OK can fail a subsequent sanitization call:

print("Start")
mol = Chem.MolFromSmiles('C1=n(C)-c=Cn1', sanitize=False)
print("Before first sanitization")
Chem.SanitizeMol(mol)
print("Before second sanitization")
Chem.SanitizeMol(mol)
print("Done")


The output is:

Start
Before first sanitization
Before second sanitization
[16:54:20] Explicit valence for atom # 1 N, 4, is greater than permitted
Traceback (most recent call last):
  File "./san.py", line 9, in 
Chem.SanitizeMol(mol)
ValueError: Sanitization error: Explicit valence for atom # 1 N, 4, is
greater than permitted


Is this an unavoidable aspect of the way SanitizeMol works, since it does
several operations (Kekulize, check valencies, set aromaticity, conjugation
and hybridization) in a certain order, or should this be considered a bug?

Best,
Ivan
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] number of significant digits in molblock?

2018-10-05 Thread Ivan Tubert-Brohman
Hi Michal,

The old SDF format (aka V2000 CTAB) is column-based, as things often were
in the era of Fortran 77 and punch cards. Not only the precision but also
the exact position of each value on the line is specified! Here's what the
spec says:

The Atom Block is made up of atom lines, one line per atom with the
following format:

x.y.z. aaaddcccssshhhbbbvvvHHHrrriiimmmnnneee

which explains why you see four digits after the decimal point. Also note
that in a huge blow to readability, no spaces are required between the
coordinates; if you have coordinates with five digits before the decimal
point, the numbers run into each other, and if you have even more digits,
the number doesn't even fit! There are also limits in the number of atoms
for similar reasons. But I digress...

In the newer "V3000", the atom line is not column-based, which I believe
gives more freedom to implementers to decide the precision of the
coordinates. You can force RDKit to write in this format by calling
SetForceV3000(True) on your writer object. I tried it and I get 5 digits
after the decimal point instead of 4, so at least that's a start. Looking
at the RDKit code (function GetV3000MolFileAtomLine), it just writes the
coordinates without setting the precision, so what you get is the default
stringstream conversion. Here's where one could in principle adjust this
precision, but there's clearly no API to do so at the moment.

Hope this helps,
Ivan


On Fri, Oct 5, 2018 at 5:44 AM Michal Krompiec 
wrote:

> Hello,
> Is it possible to control the number of significant digits of XYZ
> coordinates? I am modifying coordinates of my molecules
> using SetAtomPosition but when I save them into an SDF it seems that the
> precision is limited to 4 digits after the decimal point (I'd like 10
> instead...).
> Best wishes,
> Michal
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Count rings in bicyclic compounds

2018-12-05 Thread Ivan Tubert-Brohman
Hi Baptiste,

RDKit focuses on "simple rings". As far as I know, it has no builtin
function to return all possible cycles in a molecule.

For a molecule with a "basis set" of N rings, there can be up to 2^N-1 ring
systems, which can be obtained by taking all possible subsets (aka the
powerset) of rings and fusing them. Below is an implementation based on
fusing simple rings. Another possibility would be to write an exhaustive
ring search (DFS or BFS) of the molecular graph and report all cycles that
are found, instead of only the simple ones.

*Warning*: do not run this code on fullerenes or similar molecules unless
you are prepared to wait for a long, long time!

def all_bond_rings(mol):
"""
Generate all ring systems for a molecule. A Ring is a set of bond
indexes.

:type mol: rdkit.Chem.Mol
:rtype: set of int
"""
ring_info = mol.GetRingInfo()
rings = [set(r) for r in ring_info.BondRings()]

# Truncate nrings to the basis set size because RDKit returns redundant
# rings (e.g., 6 instead of 5 for cubane).
nfrags = len(Chem.GetMolFrags(mol))
nrings = mol.GetNumBonds() - mol.GetNumAtoms() + nfrags
del rings[nrings:]

for i in range(1, len(rings)+1):
for comb in itertools.combinations(rings, i):
fused = fuse(comb)
if fused:
yield fused

def fuse(rings):
"""
Return the ring system that results from fusing the given rings, if the
rings are fusable into a single ring system; otherwise None.

:type rings: list of set of int
:rtype: set of in
"""
pending = list(rings)
fused = set(pending.pop())
while pending:
for i in range(len(pending)):
ring = pending[i]
if fused & ring: # rings are fused
fused ^= ring
del pending[i]
break
else:
# None of the pending rings were fusable!
return None
return fused

Hope this helps,
Ivan



On Wed, Dec 5, 2018 at 5:54 AM Baptiste CANAULT 
wrote:

> Hi RDKiters,
>
> I would like to identify all cycles present in a molecular
> structure. However, when the molecules correspond to bicyclic compounds,
> the ring count does not correspond to the number actually observed in the
> structure. Simple example:
>
> >>> m = Chem.MolFromSmiles('C1CC2CCC1O2')
> >>> r = m.GetRingInfo()
> >>> r.NumRings()
> 2
>
> In reality, this molecular structure has 3 cycles with the cyclohexan. Am
> I completely wrong and is there a trick to identify all the cycles present
> in a structure?
>
> Thanks in advance,
>
> Best regards,
>
> Baptiste
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Finding out the origin of product atoms after applying a reaction

2018-09-17 Thread Ivan Tubert-Brohman
Hi Connor,

Thank you for your suggestions! I think the isotope hack will work for me
for now, but for the longer term it would be nicer to have the official
version of RDKit provide sufficient atom mapping information, so I'll
consider that as well.

Ivan

On Mon, Sep 17, 2018 at 11:52 AM, Connor Coley  wrote:

> Hi Ivan,
>
> This is something I ran into a couple years ago - it's a pretty easy fix.
>
> One approach is to update the source with a few lines to copy over the
> atom map numbers from the reactants to the products as a new field. You can
> see the necessary changes to the code in my forked version here:
> https://github.com/rdkit/rdkit/commit/0a8393fbf89e486ed67f2a44f9a7ea
> 8d9f2efd95
>
> Another approach is more hacky but might be good enough for your use case.
> If your reactions don't involve isotopic changes or require specific
> isotopes, you can set unique isotope numbers for every reacting atom. Those
> will be preserved in the products so you can get the atom-atom mapping
> after running the reaction.
>
> Connor
>
> On Mon, Sep 17, 2018 at 10:36 AM Ivan Tubert-Brohman  schrodinger.com> wrote:
>
>> I'd like to know where each atom in a reaction product came from, but as
>> far as I can tell, RDKit doesn't provide enough information. Here's what I
>> found out empirically so far.
>>
>> There are four kinds of product atoms:
>>
>> 1. New atoms: atoms are defined in the product template without a mapping
>> number. These can't be mapped to reactant atoms, so there's no issue.
>>
>> 2. Unmatched atoms: atoms that were not matched by any atom from the
>> reactant template but were carried over to the product because they were
>> connected (directly or indirectly) to a mapped reactant atom. These have
>> the "react_atom_idx" property, which holds the atom index of the atom in
>> the reactant molecule. This is useful as long as the reactant side has only
>> one molecule; when there are multiple molecules, it is not clear which
>> reactant molecule a product atom came from.
>>
>> 3. Mapped dummy atoms: atoms which are defined as dummies in the product
>> template and have a mapping number (e.g., [*:1]). These also get a
>> "react_atom_idx" property, as well as an "old_mapno" property which holds
>> the mapping number (1 for [*:1])
>>
>> 4. Mapped non-dummy atoms: atoms which are NOT defined as dummies in the
>> product template and have a mapping number (e.g., [C:1]).  These have 
>> "old_mapno",
>> but no "react_atom_idx".
>>
>> One thing I tried was to add my own unique atom identifier property to
>> all reactant atoms before applying the reaction, but that didn't help
>> because mapped atoms (types 3 and 4) don't preserve user properties (I
>> guess these atoms are seen as "replacements" for the atoms they mapped,
>> rather than "copies"?)
>>
>> I'd be willing to hack the RDKit source code if necessary and contribute
>> my changes, but before starting such a project I'd like to hear if it's
>> reasonable.
>>
>> One possibility would be to preserve user properties for mapped atoms.
>>
>> Alternatively we could add react_atom_idx to mapped non-dummy atoms, and
>> also add a new "react_mol_idx" property?
>>
>> Best,
>> Ivan
>> ___
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] Finding out the origin of product atoms after applying a reaction

2018-09-17 Thread Ivan Tubert-Brohman
I'd like to know where each atom in a reaction product came from, but as
far as I can tell, RDKit doesn't provide enough information. Here's what I
found out empirically so far.

There are four kinds of product atoms:

1. New atoms: atoms are defined in the product template without a mapping
number. These can't be mapped to reactant atoms, so there's no issue.

2. Unmatched atoms: atoms that were not matched by any atom from the
reactant template but were carried over to the product because they were
connected (directly or indirectly) to a mapped reactant atom. These have
the "react_atom_idx" property, which holds the atom index of the atom in
the reactant molecule. This is useful as long as the reactant side has only
one molecule; when there are multiple molecules, it is not clear which
reactant molecule a product atom came from.

3. Mapped dummy atoms: atoms which are defined as dummies in the product
template and have a mapping number (e.g., [*:1]). These also get a
"react_atom_idx" property, as well as an "old_mapno" property which holds
the mapping number (1 for [*:1])

4. Mapped non-dummy atoms: atoms which are NOT defined as dummies in the
product template and have a mapping number (e.g., [C:1]).  These have
"old_mapno",
but no "react_atom_idx".

One thing I tried was to add my own unique atom identifier property to all
reactant atoms before applying the reaction, but that didn't help because
mapped atoms (types 3 and 4) don't preserve user properties (I guess these
atoms are seen as "replacements" for the atoms they mapped, rather than
"copies"?)

I'd be willing to hack the RDKit source code if necessary and contribute my
changes, but before starting such a project I'd like to hear if it's
reasonable.

One possibility would be to preserve user properties for mapped atoms.

Alternatively we could add react_atom_idx to mapped non-dummy atoms, and
also add a new "react_mol_idx" property?

Best,
Ivan
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] problem when doing Chem.MolFromSmiles()

2019-03-13 Thread Ivan Tubert-Brohman
The problem is this line:

>
core_smiles_2='C1=C/C2=C/c3ccc4n3[Zn]n3/c(cc/c3=C/C3=N/C(=C\4)C=C3)=C\C1=N2'

Python is interpreting the \4 as an escape sequence. You either need to
double the backslash or use an "r string" to protect the backslash from
being interpreted that way. That is, either of these should be fine:

>
core_smiles_2=r'C1=C/C2=C/c3ccc4n3[Zn]n3/c(cc/c3=C/C3=N/C(=C\4)C=C3)=C\C1=N2'
>
core_smiles_2='C1=C/C2=C/c3ccc4n3[Zn]n3/c(cc/c3=C/C3=N/C(=C\\4)C=C3)=C\C1=N2'

Ivan



On Wed, Mar 13, 2019 at 6:54 AM Chencheng Fan 
wrote:

> Hello everyone,
>
> I meet a problem that I don’t understand how it happens and how to solve
> it.
>
> First, I generate an rdkit_object from reading a mol file(Porphyrin
> molecule). Then generate Smiles from the rdkit_object and save the Smiles
> string in a parameter. Then use the parameter to generate rdkit_object. The
> code works well until now. However, if I directly try to generate
> rdkit_object from the Smiles string, it will return SMILES Parse
> Error:syntax error for input.
>
> I appreciate very much for your help!
>
> Have a good day!
>
> Best wishes
> Cheng
>
> The mol file is:
>
>
> Code is:
>
> from __future__ import print_function
> import rdkit
> from rdkit import Chem
> from rdkit.Chem import AllChem
>
>
> Corefile='Porphyrin.mol'
> core_rdkit_object=Chem.MolFromMolFile(Corefile)
> core_smiles=Chem.MolToSmiles(core_rdkit_object)
> print('core_smiles',core_smiles)
> core=Chem.MolFromSmiles(core_smiles)
> print('rdkit_object',core)
> coreh=Chem.AddHs(core)
> AllChem.EmbedMolecule(coreh)
> print(Chem.MolToMolBlock(coreh))
>
>
> core_smiles_2='C1=C/C2=C/c3ccc4n3[Zn]n3/c(cc/c3=C/C3=N/C(=C\4)C=C3)=C\C1=N2'
> core=Chem.MolFromSmiles(core_smiles_2)
> print(core)
> coreh=Chem.AddHs(core)
> AllChem.EmbedMolecule(coreh)
> print(Chem.MolToMolBlock(coreh))
>
>
> Error is:
>
> core_smiles
>  C1=C/C2=C/c3ccc4n3[Zn]n3/c(cc/c3=C/C3=N/C(=C\4)C=C3)=C\C1=N2
> rdkit_object.   
> [11:50:20] UFFTYPER: Unrecognized atom type: Zn1+2 (9)
> [11:50:20] SMILES Parse Error: syntax error for input:
> 'C1=C/C2=C/c3ccc4n3[Zn]n3/c(cc/c3=C/C3=N/C(=C)C=C3)=C\C1=N2'
> None
> Traceback (most recent call last):
>  File "test-generate_core.py", line 21, in 
>coreh=Chem.AddHs(core)
> Boost.Python.ArgumentError: Python argument types in
>rdkit.Chem.rdmolops.AddHs(NoneType)
> did not match C++ signature:
>AddHs(RDKit::ROMol mol, bool explicitOnly=False, bool addCoords=False,
> boost::python::api::object onlyOnAtoms=None, bool addResidueInfo=False)
>
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Reaction SMARTS

2019-02-06 Thread Ivan Tubert-Brohman
Hi Jean-Marc,

Try the reaction smarts '[C:1]([OH:2])=[N:3]>>[C:1](=[OH0:2])[NH:3]'. The
only difference is the addition of "H0" to product atom :2. The problem is
that the hydrogen count from the reactant atom gets copied over unless
specified otherwise.

Hope this helps,
Ivan


On Wed, Feb 6, 2019 at 9:37 AM Jean-Marc Nuzillard <
jm.nuzill...@univ-reims.fr> wrote:

> Dear All,
>
> I need to convert iminol functional groups into amides.
>
> Being new to reaction SMARTS I wrote the following code:
>
> *
>
> from rdkit import Chem
> from rdkit.Chem import AllChem
> #from rdkit.Chem.Draw import IPythonConsole
>
> sm1 = 'CNC(C)=O'
> m1 = Chem.MolFromSmiles(sm1)
> #m1
> ## m1 is an amide
>
> inchi = Chem.MolToInchi(m1)
> m2 = Chem.MolFromInchi(inchi)
> #m2
> ## m2 is an iminol, as expected from InChI
>
> rxn =
> AllChem.ReactionFromSmarts('[C:1]([OH:2])=[N:3]>>[C:1](=[O:2])[NH:3]')
> ps = rxn.RunReactants((m2,))
> len(ps)
> ## 1
>
> m3 = ps[0][0]
> #m3
> ## m3 is wrong and cannot be sanitized
> print(Chem.MolToSmiles(m3))
> ## CNC(C)=[OH]
>
> rxn =
> AllChem.ReactionFromSmarts('[H:4][0:1][C:2]=[N:3]>>[O:1]=[C:2][N:3][H:4]')
> ps = rxn.RunReactants((m2,))
> len(ps)
> ## 0
> 
>
> The first reaction SMARTS I tried
> ('[C:1]([OH:2])=[N:3]>>[C:1](=[O:2])[NH:3]')
> gives a result but with a trivalent neutral oxygen atom.
> Reading more about SMIRKS theory, I tried:
> '[H:4][0:1][C:2]=[N:3]>>[O:1]=[C:2][N:3][H:4]' without any success.
>
> Could someone indicate me the correct iminol->amide reaction SMARTS?
>
> Best regards,
>
> Jean-Marc
>
> --
> Jean-Marc Nuzillard
> Directeur de Recherches au CNRS
>
> Institut de Chimie Moléculaire de Reims
> CNRS UMR 7312
> Moulin de la Housse
> CPCBAI, Bâtiment 18
> BP 1039
> 51687 REIMS Cedex 2
> France
>
> Tel : 03 26 91 82 10
> Fax : 03 26 91 31 66
> http://www.univ-reims.fr/ICMR
> http://eos.univ-reims.fr/LSD/CSNteam.html
>
> http://www.univ-reims.fr/LSD/
> http://www.univ-reims.fr/LSD/JmnSoft/
>
>
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] Hydrogens involved in "stereochemistry" are not removed by RemoveHs()

2019-11-06 Thread Ivan Tubert-Brohman
Hi,

For reasons to complicated to get into here, I ended up with a molecule
containing a =CH2 in which one of the hydrogens was explicit and had E/Z
stereo info. For example, consider [H]/C=C/F.

I was surprised that RemoveHs() refused to remove the hydrogen, although
later I found that that's the documented behavior, and generally it makes
sense as a way to prevent the loss of stereochemical information.

For example, compare these two:

In [7]: Chem.MolToSmiles(Chem.RemoveHs(Chem.MolFromSmiles('[H]/C=C/F')))
Out[7]: '[H]/C=C/F'

In [8]: Chem.MolToSmiles(Chem.RemoveHs(Chem.MolFromSmiles('[H]C=C/F')))
Out[8]: 'C=CF'

A chemist would say that these two are obviously the same molecule, and
arguably the second representation is better, because a double bond ending
in =CH2 can't have geometric isomers. Maybe it's unreasonable to expect
RDKit to make that kind of inference, but still I wonder, what would be a
good automated way to get from [H]/C=C/F to C=CF?

One idea is to add a "=CH2 cleanup" step, perhaps implemented by applying
this reaction:

[H][C:1]=[*:2]>>[CH2:1]=[*:2]

but perhaps there's a better way?

Best,
Ivan
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Incorrect Aromaticity?

2019-10-30 Thread Ivan Tubert-Brohman
It is aromatic according to the RDKit aromaticity model described here:
https://www.rdkit.org/docs/RDKit_Book.html#aromaticity

The O and N each contribute 2 electrons. Each of the carbons shared with
the 6-member ring contribute one electron. The carbonyl is sp2 and
contributes zero electrons. The total is 6, which satisfies the 4n+2 rule.

Ivan

On Wed, Oct 30, 2019 at 3:00 PM Hao  wrote:

> Hello,
>
> It seems like RDKit is making my molecule aromatic when I don't think it
> should it. Here's the original smiles: c1ccc2c(c1)OC(N2)=O. A snippet of
> the workflow:
> [image: image.png]
> As you can see it makes the 5 membered ring aromatic. My chemistry isn't
> strong, so if someone can elucidate what I'm seeing, that would be
> very helpful.
>
> Thanks!
> Hao
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] RDKit Num Rotors descriptors?

2019-10-15 Thread Ivan Tubert-Brohman
This is from lipinski.cpp:

  if (strict == NonStrict) {
std::string pattern = "[!$(*#*)&!D1]-&!@[!$(*#*)&!D1]";
pattern_flyweight m(pattern);
return m.get().countMatches(mol);
  }
  else if (strict==Strict) {
std::string strict_pattern =

"[!$(*#*)&!D1&!$(C(F)(F)F)&!$(C(Cl)(Cl)Cl)&!$(C(Br)(Br)Br)&!$(C([CH3])("

"[CH3])[CH3])&!$([CD3](=[N,O,S])-!@[#7,O,S!D1])&!$([#7,O,S!D1]-!@[CD3]="

"[N,O,S])&!$([CD3](=[N+])-!@[#7!D1])&!$([#7!D1]-!@[CD3]=[N+])]-!@[!$(*#"

"*)&!D1&!$(C(F)(F)F)&!$(C(Cl)(Cl)Cl)&!$(C(Br)(Br)Br)&!$(C([CH3])([CH3])"
"[CH3])]";
pattern_flyweight m(strict_pattern);
return m.get().countMatches(mol);
  } else {

I suspect that it's taking the "Strict" brach, where the SMARTS pattern is
more complex and actually has a rule that excludes tert-butyl (the
"&!$(C([CH3])([CH3])[CH3])]" at the end).

Ivan


On Tue, Oct 15, 2019 at 1:06 PM Geoffrey Hutchison <
geoff.hutchi...@gmail.com> wrote:

> I'm using a script in RDKit to grab a bunch of descriptors for QSAR / ML.
> I have a particular question about the # of rotatable bonds:
>
> Consider:
>
> from rdkit import Chem
> from rdkit.Chem import Descriptors
> example = Chem.MolFromSmiles("CN(C(=O)OC(C)(C)C)C(=S)NC(=O)OC(C)(C)C")
> Descriptors.NumRotatableBonds(example)
> 0
>
> example2 = Chem.MolFromSmiles("CC(C)(C)CCC(C)(C)C")
> Descriptors.NumRotatableBonds(example2)
> 1
>
>
> Can someone explain why a terminal butyl isn't a rotatable bond? Based on
> mailing list questions, I thought NumRotatableBonds was supposed to return
> matches for the SMARTS "[!$(*#*)&!D1]-&!@[!$(*#*)&!D1]"  (e.g. what's
> defined in the Lipinski code).
>
> Thanks,
> -Geoff
>
>
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] RDKit Num Rotors descriptors?

2019-10-15 Thread Ivan Tubert-Brohman
PS: I found that if you want to use the non-strict version, you can do this:

from rdkit.Chem import rdMolDescriptors
rdMolDescriptors.CalcNumRotatableBonds(mol, 0)

(where 0 means NonStrict, 1 means Strict, and -1 means Default.)

I tested this with a molecule containing t-butyl and got different results
for strict and non-strict, as expected, and the default was the same as
strict.


On Tue, Oct 15, 2019 at 1:57 PM Ivan Tubert-Brohman <
ivan.tubert-broh...@schrodinger.com> wrote:

> This is from lipinski.cpp:
>
>   if (strict == NonStrict) {
> std::string pattern = "[!$(*#*)&!D1]-&!@[!$(*#*)&!D1]";
> pattern_flyweight m(pattern);
> return m.get().countMatches(mol);
>   }
>   else if (strict==Strict) {
> std::string strict_pattern =
>
> "[!$(*#*)&!D1&!$(C(F)(F)F)&!$(C(Cl)(Cl)Cl)&!$(C(Br)(Br)Br)&!$(C([CH3])("
>
> "[CH3])[CH3])&!$([CD3](=[N,O,S])-!@[#7,O,S!D1])&!$([#7,O,S!D1]-!@[CD3]="
>
> "[N,O,S])&!$([CD3](=[N+])-!@[#7!D1])&!$([#7!D1]-!@[CD3]=[N+])]-!@[!$(*#"
>
> "*)&!D1&!$(C(F)(F)F)&!$(C(Cl)(Cl)Cl)&!$(C(Br)(Br)Br)&!$(C([CH3])([CH3])"
> "[CH3])]";
> pattern_flyweight m(strict_pattern);
> return m.get().countMatches(mol);
>   } else {
>
> I suspect that it's taking the "Strict" brach, where the SMARTS pattern is
> more complex and actually has a rule that excludes tert-butyl (the
> "&!$(C([CH3])([CH3])[CH3])]" at the end).
>
> Ivan
>
>
> On Tue, Oct 15, 2019 at 1:06 PM Geoffrey Hutchison <
> geoff.hutchi...@gmail.com> wrote:
>
>> I'm using a script in RDKit to grab a bunch of descriptors for QSAR / ML.
>> I have a particular question about the # of rotatable bonds:
>>
>> Consider:
>>
>> from rdkit import Chem
>> from rdkit.Chem import Descriptors
>> example = Chem.MolFromSmiles("CN(C(=O)OC(C)(C)C)C(=S)NC(=O)OC(C)(C)C")
>> Descriptors.NumRotatableBonds(example)
>> 0
>>
>> example2 = Chem.MolFromSmiles("CC(C)(C)CCC(C)(C)C")
>> Descriptors.NumRotatableBonds(example2)
>> 1
>>
>>
>> Can someone explain why a terminal butyl isn't a rotatable bond? Based on
>> mailing list questions, I thought NumRotatableBonds was supposed to return
>> matches for the SMARTS "[!$(*#*)&!D1]-&!@[!$(*#*)&!D1]"  (e.g. what's
>> defined in the Lipinski code).
>>
>> Thanks,
>> -Geoff
>>
>>
>>
>> ___
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Hydrogens involved in "stereochemistry" are not removed by RemoveHs()

2019-11-20 Thread Ivan Tubert-Brohman
Thank you, Greg and Andrew, for your replies, and I'm glad to hear that
this is something that can be fixed within RDKit. I had almost forgotten I
had sent this email... :-)

Best,
Ivan

On Wed, Nov 20, 2019 at 12:17 AM Greg Landrum 
wrote:

> Hi Ivan,
>
> I agree that there is a bug here, but I think the problem is actually that
> the double bond is being assigned stereochemistry at all in this case.
>
> In [2]: m = Chem.MolFromSmiles('[H]/C=C/F')
>
>
>
> In [3]: m.Debug()
>
>
> Atoms:
> 0 1 H chg: 0  deg: 1 exp: 1 imp: 0 hyb: 1 arom?: 0 chi: 0
> 1 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 0 chi: 0
> 2 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 0 chi: 0
> 3 9 F chg: 0  deg: 1 exp: 1 imp: 0 hyb: 4 arom?: 0 chi: 0
> Bonds:
> 0 0->1 order: 1 dir: 4 conj?: 0 aromatic?: 0
> 1 1->2 order: 2 stereo: 3 stereoAts: (0 3) conj?: 0 aromatic?: 0
> 2 2->3 order: 1 dir: 4 conj?: 0 aromatic?: 0
>
>
> Given that the two substituents on the first C are the same, the double
> bond shouldn't be marked as STEREOE at all.
>
> I'll get this fixed.
> -greg
>
>
>
> On Wed, Nov 6, 2019 at 4:34 PM Ivan Tubert-Brohman <
> ivan.tubert-broh...@schrodinger.com> wrote:
>
>> Hi,
>>
>> For reasons to complicated to get into here, I ended up with a molecule
>> containing a =CH2 in which one of the hydrogens was explicit and had E/Z
>> stereo info. For example, consider [H]/C=C/F.
>>
>> I was surprised that RemoveHs() refused to remove the hydrogen, although
>> later I found that that's the documented behavior, and generally it makes
>> sense as a way to prevent the loss of stereochemical information.
>>
>> For example, compare these two:
>>
>> In [7]: Chem.MolToSmiles(Chem.RemoveHs(Chem.MolFromSmiles('[H]/C=C/F')))
>> Out[7]: '[H]/C=C/F'
>>
>> In [8]: Chem.MolToSmiles(Chem.RemoveHs(Chem.MolFromSmiles('[H]C=C/F')))
>> Out[8]: 'C=CF'
>>
>> A chemist would say that these two are obviously the same molecule, and
>> arguably the second representation is better, because a double bond ending
>> in =CH2 can't have geometric isomers. Maybe it's unreasonable to expect
>> RDKit to make that kind of inference, but still I wonder, what would be a
>> good automated way to get from [H]/C=C/F to C=CF?
>>
>> One idea is to add a "=CH2 cleanup" step, perhaps implemented by applying
>> this reaction:
>>
>> [H][C:1]=[*:2]>>[CH2:1]=[*:2]
>>
>> but perhaps there's a better way?
>>
>> Best,
>> Ivan
>> ___
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] distinguishing macrocyclic molecules

2019-10-09 Thread Ivan Tubert-Brohman
Hi David,

Thanks for the tip! I just found it in the documentation; the syntax is
[r{12-20}]. See
http://rdkit.org/docs_temp/RDKit_Book.html#smarts-support-and-extensions

Note that this doesn't suffer from the hard-coded limitation I mentioned,
and you can even specify open ranges such as [r{12-}].

Ivan


On Wed, Oct 9, 2019 at 12:35 PM David Cosgrove 
wrote:

> Hi Ivan,
> There is an RDKit extension to SMARTS that allows something like [r12-20].
> I can’t check the exact syntax at the moment. You might want to check that
> atoms are not in smaller rings as well, so as not to pull up things like
> anthracene which might not be something you’d want to class as a macrocycle.
> Cheers,
> Dave
>
> On Wed, 9 Oct 2019 at 14:39, Ivan Tubert-Brohman <
> ivan.tubert-broh...@schrodinger.com> wrote:
>
>> Hi Thomas,
>>
>> I don't know of an RDKit function that directly recognizes macrocycles,
>> but you could find the size of the largest ring this way:
>>
>> ri = mol.GetRingInfo()
>> largest_ring_size = max((len(r) for r in ri.AtomRings()), default=0)
>> if largest_ring_size > 12:
>> ...
>>
>> You can also find if a molecule has a ring of a certain size using
>> SMARTS, but only for rings up to size 20 at the moment (this is an
>> RDKit-specific limit). For example, if you are happy with finding rings of
>> size 12-20, you could use SMARTS [r12,r13,r14,r15,r16,r17,r18,r19,r20].
>> It's ugly but can be handy if you already have SMARTS-based tools to reuse.
>>
>> Ivan
>>
>> On Wed, Oct 9, 2019 at 7:25 AM Thomas Evangelidis 
>> wrote:
>>
>>> Greetings,
>>>
>>> Is there an automatic way to distinguish the macrocyclic molecules
>>> within a large chemical library using RDKit? For example, according to this
>>> definition: Macrocycles are ring structures composed of at least twelve
>>> atoms in the central cyclic framework [1,2,3]. Maybe someone here has a
>>> better definition. Could anyone give me some hints on how to program this?
>>>
>>> I thank you in advance.
>>> Thomas
>>>
>>> 1. Yudin AK (2015) Macrocycles: lessons from the distant past, recent
>>> developments, and future directions. Chem Sci 6:30–49.
>>> 2. Marsault E, Peterson ML (2011) Macrocycles are great cycles:
>>> applications, opportunities, and challenges of synthetic macrocycles in
>>> drug discovery. J Med Chem 54:1961–2004.
>>> 3. Heinis C (2014) Drug discovery: tools and rules for macrocycles. Nat
>>> Chem Biol 10:696–698.
>>>
>>>
>>> --
>>>
>>> ==
>>>
>>> Dr. Thomas Evangelidis
>>>
>>> Research Scientist
>>>
>>> IOCB - Institute of Organic Chemistry and Biochemistry of the Czech
>>> Academy of Sciences <https://www.uochb.cz/web/structure/31.html?lang=en>
>>> , Prague, Czech Republic
>>>   &
>>> CEITEC - Central European Institute of Technology
>>> <https://www.ceitec.eu/>, Brno, Czech Republic
>>>
>>> email: teva...@gmail.com, Twitter: tevangelidis
>>> <https://twitter.com/tevangelidis>, LinkedIn: Thomas Evangelidis
>>> <https://www.linkedin.com/in/thomas-evangelidis-495b45125/>
>>>
>>> website: https://sites.google.com/site/thomasevangelidishomepage/
>>>
>>>
>>>
>>> ___
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>
>> ___
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
> --
> David Cosgrove
> Freelance computational chemistry and chemoinformatics developer
> http://cozchemix.co.uk
>
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Sanitize molecule with explicit Hydrogens to catch an error

2020-05-11 Thread Ivan Tubert-Brohman
Hi Pablo,

SMILES by definition has implicit hydrogens (enough to satisfy the typicial
valence) for atoms that are not within brackets.

It doesn't matter if you write C, C[H], [H]C[H], or [H]C([H])([H])[H]; they
are all methane. The number of hydrogens that are returned by
GetNumImplicitHs() and GetNumExplicitHs() on the carbon atom do vary,
though. But if you convert back to SMILES, you get "C" in all cases (with
the default options).

If you want an atom with no implicit hydrogens, you have to put it in
brackets. [C] is a lone carbon atom. [CH3] or [H][C]([H])[H] is a methyl
radical.

But you still won't get sanitization errors for such species, because
radicals are legitimate species after all, and I imagine RDKit doesn't want
to guess whether you like radicals or not. So I suspect you'll have to
write your own "sanity" function which uses whatever definition of sanity
you need. For example, you could loop over the atoms and check the value of
GetTotalValence() and compare it with the expected valence for that
element. Or perhaps you could make use of GetNumRadicalElectrons().

Hope this helps,
Ivan

On Mon, May 11, 2020 at 9:37 AM Pablo Ramos 
wrote:

> Dear all,
>
>
>
> I am trying to catch an error every time that a smiles associated to a mol
> object does not exist. To do this, I want to use sanitize function: if the
> smiles is incorrect I will get my error.
>
> My smiles with *explicit hydrogens* is the next one: [H]C([H])O
>
>
>
> I want it to provide an error since valences do not match the ones
> specified for Carbon and Oxygen beingHydrogens already explicit: C_val = 4
> ; O_val = 2
>
>
>
> However, sanitizing this object creates automatically the missing Hydrogens 
> providing a valid smiles: [H]OC([H])([H])[H] and therefore it assumes the 
> smiles is correct.
>
>
>
> Is there any way to specify that my Hydrogens are already explicit during
> sanitazion so I get my error message?
>
>
>
> Thank you so much J
>
>
>
>
>
> Best regards,
>
>
>
> *Pablo Ramos*
>
> Ph.D. at Covestro Deutschland AG
>
>
>
>
>
> covestro.com 
>
> *Telephone*
>
> +49 214 6009 7356
>
>
>
> Covestro Deutschland AG
>
> COVDEAG-Chief Commer-PUR-R
>
> B103, R164
>
> 51365 Leverkusen, Germany
>
> *pablo.ra...@covestro.com *
>
>
>
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] multiple SMARTS that match only if in the same fragment

2020-03-07 Thread Ivan Tubert-Brohman
Hi Curt,

According to
https://www.rdkit.org/docs/RDKit_Book.html#smarts-support-and-extensions ,
it's not supported:

Here’s the (hopefully complete) list of SMARTS features that are *not*
>  supported:
>
>- Non-tetrahedral chiral classes
>
>
>- the @? operator
>
>
>- explicit atomic masses (though isotope queries are supported)
>
>
>- component level grouping requiring matches in different components,
>i.e. (C).(C)
>
> OK, the way it's worded it sounds like (C.C) might be supported (since
that would be requiring matches in the same component), but as you've seen,
it isn't supported either...

Ivan


On Sat, Mar 7, 2020 at 4:58 PM Curt Fischer 
wrote:

> Hi rdkit fiends!
>
> The [Daylight SMARTS example page](
> https://daylight.com/dayhtml_tutorials/languages/smarts/smarts_examples.html)
> gives several examples for "multiple group" smarts, including these strings:
>
> ([Cl!$(Cl~c)].[c!$(c~Cl)])
> ([Cl]).([c])
> ([Cl].[c])
> [NX3;H2,H1;!$(NC=O)].[NX3;H2,H1;!$(NC=O)]
>
> In general, I cannot get these to be parsed by Chem.MolFromSmarts().
>
> For example,  Chem.MolFromSmarts('([Cl!$(Cl~c)].[c!$(c~Cl)])') gives me
> this error message:
>
> ```
> [13:01:41] SMARTS Parse Error: syntax error while parsing:
> ([Cl!$(Cl~c)_100].[c!$(c~Cl)_101])
> [13:01:41] SMARTS Parse Error: Failed parsing SMARTS
> '([Cl!$(Cl~c)_100].[c!$(c~Cl)_101])' for input: '([Cl!$(Cl~c)].[c!$(c~Cl)])'
> ```
> My understanding of SMARTS is that the outermost parentheses in this
> SMARTS string are required to force the chlorine and the aromatic carbon to
> be somewhere in the same covalently connected fragment.  E.g. this pattern
> *should* hit benzyl chloride ClCc1c1 but should *not* hit the
> hydrochloride salt of aniline Cl.Nc1c1.
>
> What am I getting wrong?  Is there a way to write rdkit-parsable SMARTS
> that achieves this?  (I want to filter our molecules that contain more than
> one of certain moieties, while allowing molecules that have one (or zero)
> such moieties.  But salts or covalently disconnected fragments that each
> contain one instance of the moiety should be fine.)
>
> Details on my setup:
>
> - RDKit Version: 2019.09.3
> - Operating system: macOS 10.15.2
> - Python version (if relevant): 3.6
> - Are you using conda? yes
> - If you are using conda, which channel did you install the rdkit from?
> `conda-forge`
> - If you are not using conda: how did you install the RDKit?
>
> Curt
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Display Molecules within IPython Console

2020-09-01 Thread Ivan Tubert-Brohman
Hi Vin,

If you are running the IPython console on a terminal emulator that supports
graphics, you could display the molecule by printing out the necessary
terminal escape codes followed by the image buffer. The solution is
terminal-specific; here's an example that works using the Kitty terminal:

from base64 import standard_b64encode
import io

def serialize_gr_command(cmd, payload=None):
   cmd = ','.join('{}={}'.format(k, v) for k, v in cmd.items())
   ans = []
   w = ans.append
   w(b'\033_G'), w(cmd.encode('ascii'))
   if payload:
  w(b';')
  w(payload)
   w(b'\033\\')
   return b''.join(ans)

def write_chunked(cmd, data):
   data = standard_b64encode(data)
   while data:
  chunk, data = data[:4096], data[4096:]
  m = 1 if data else 0
  cmd['m'] = m
  sys.stdout.buffer.write(serialize_gr_command(cmd, chunk))
  sys.stdout.flush()
  cmd.clear()

def cat_mol(mol):
img = Chem.Draw.MolToImage(mol)
buf = io.BytesIO()
img.save(buf, format='png')
write_chunked({'a': 'T', 'f': 100}, buf.getvalue())

And then you can do something like this:
[image: image.png]

I'm sure the above could be adapted to work with iterm2 or other terminals
that support graphics, but that's left as an exercise to the reader.

An alternative I often use when graphics are not available is to render the
structure as ASCII art, but that may be an acquired taste. The above would
look like this:

[image: image.png]

I'll leave the code that does that for some other day.

Best,
Ivan

On Tue, Sep 1, 2020 at 9:57 AM Scalfani, Vincent  wrote:

> Hello,
>
> Is it possible to display a molecule image directly in an IPython console
> (not a Jupyter Notebook)? Or maybe I need to send the image file directly
> to my image viewer? I would like to be able to quickly view the molecules
> without using a Jupyter Notebook or having to save the PNGs. For example:
>
> In [6]: from rdkit import Chem
>...: from rdkit.Chem.Draw import IPythonConsole
>...: from rdkit.Chem import Draw
>
>
> In [7]: m = Chem.MolFromSmiles('c1ncncc1C(=O)[O-]')
>
>
> In [8]: m
>
> Out[8]: 
>
> In [9]: Chem.Draw.MolToImage(m)
>
> Out[9]:  0x...>
>
>
> Thanks for your help.
>
> Vin
>
> ---
>
> Vincent F. Scalfani
> The University of Alabama
>
> 
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] difference between _CalcMolWt vs CalcExactMW?

2020-10-15 Thread Ivan Tubert-Brohman
Hi Steven,

MolWt uses naturally occurring average atomic weights, the ones you find in
a typical periodic table. For example, Cl = 35.453.

ExactMolWt uses the weight of a specific isotope (the most naturally
abundant isotope unless the structure specifies a different one for an
atom). These are the atomic weights you find in an isotope chart, not a
regular periodic table. For example, 35Cl = 34.96885268.

Hope this helps,
Ivan

On Thu, Oct 15, 2020 at 4:29 PM Steven Pak 
wrote:

> Hello,
>
> I have a couple of questions.
> 1. I was wondering what are the detailed calculation methods on
> both _CalcMolWt (python: from the rdMolDescriptors module) vs CalcExactMW
> (CPP)? Could you help me direct to literature on how these are calculated?
>
> 2. _CalcMolWt can be found in the QED.py code, which calculates MW for the
> calculation of QED score. I am personally trying to create a CPP version of
> the QED score but I don't see a _CalcMolWt function for the CPP verison.
> All I can find in the API documentation is the CalcExactMW function. Do you
> know what is the equivalent CPP MW calculator that is found in the QED.py
> code of RDKit?
>
> Thanks for your help
> Steven.
>
>
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] proper technical term for generating virtual compounds with rdkit and smarts

2020-09-25 Thread Ivan Tubert-Brohman
We use "reaction-based enumeration" to distinguish it from "R-group
enumeration". Both are types of virtual library enumeration.

R-group enumeration allows you to attach any R-group anywhere. It is simple
and fast but you can easily create implausible (or hard to synthesize)
molecules if you are not careful.

Reaction-based enumeration at least guarantees that you already have
reactants with the right functional groups in the right places, and if you
make your reactions sophisticated enough you can even check for other
disqualifying conditions such as interfering functional groups.

Of course in a context where it is not immediately obvious that you are
talking about doing it on a computer, I suppose something along the lines
of "virtual reaction-based enumeration" might make sense.

Best,
Ivan


On Fri, Sep 25, 2020 at 4:26 AM Thomas Strunz  wrote:

> Hi Brian,
>
> commercial tools usually use the term "reaction-based enumeration" or
> "reaction-based library design".
>
> Best Regards,
>
> Thomas
>
> --
> *Von:* Bennion, Brian via Rdkit-discuss <
> rdkit-discuss@lists.sourceforge.net>
> *Gesendet:* Freitag, 25. September 2020 07:19
> *An:* RDKit Discuss 
> *Betreff:* [Rdkit-discuss] proper technical term for generating virtual
> compounds with rdkit and smarts
>
> hello
>
> I have a paper in review and is intended for a large audience that has
> synthetic chemists, biologist and comp chem.
> One reviewer had issues with the term in-silico syntheses.
> I used rdkit and smarts reactions to generate large libraries of compounds
> for our research project.  Is there a better term to use?  I feel "chemical
> enumeration" is just as foreign.
>
> The abstract is below.
>
> The current standard treatment for organophosphate poisoning primarily
> relies on the use of small molecule-based oximes that can efficiently
> restore acetylcholinesterase (AChE) activity.  Despite their efficacy in
> reactivating AChE, the action of drugs like 2-pralidoxime (2-PAM) is
> primarily limited to the peripheral nervous system (PNS) and, thus,
> provides no protection to the central nervous system (CNS).  This lack of
> action in the CNS stems from the ionic nature of the drugs; they cannot
> cross the blood-brain barrier (BBB) to access to any nerve
> agent-inhibited AChE therein.  In this report, we present a small
> molecule oxime, called LLNL-02, that can diffuse across the BBB for
> reactivation of nerve agent-inhibited AChE in the CNS.  Our 
> candidate-development
> approach utilizes a combination of parallel chemical and in - silico
> syntheses, computational modeling, and a battery of detailed in vitro and in
> vivo assessments that have identified LLNL-02 as a top CNS-active candidate
> against nerve agent poisoning.   Additional experiments to determine acute
> and chronic  toxicity as required for regulatory approval are ongoing.
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Missing atom indices in the last structure

2020-09-22 Thread Ivan Tubert-Brohman
Hi Norwid,

The inner loop over mols here:

for i in smiles_list:

for mol in mols:
for atom in mol.GetAtoms():
atom.SetAtomMapNum(atom.GetIdx())

mols.append(Chem.MolFromSmiles(i))

is not in the right place. First, because you'll go over the same mol
multiple times unnecessarily, but also, because it happens before appending
to mols, the mol corresponding to the last item in smiles_list won't be
labelled, as you noted. Either move the loop over mols out and after the
the loop over smiles_list, or better yet, get rid of the inner loop
altogether:

for i in smiles_list:
mol = Chem.MolFromSmiles(i)
for atom in mol.GetAtoms():
atom.SetAtomMapNum(atom.GetIdx())
mols.append(mol)

Best regards,
Ivan

On Tue, Sep 22, 2020 at 6:39 AM Norwid Behrnd via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:

> Starting with lists consisting of a compound identifier, an explicit space,
> and a SMILES string, I would like to generate illustrations about these
> structures including RDKit's atom indices.  What is puzzling to me is that
> consistently the last entry / compound list is converted into a structure
> representation where none of the atoms is labeled by the index.  But why?
> Is it possible that erred on putting the outer structure (looping over the
> molecules) together with the inner (attribute the atom indices)?
>
> Initially written in Python 3.8.6 backed by RDKit 2019.09.1 as available
> in Debian unstable, I post below the code exported from the Jupyter
> Notebook
> (IPython 7.18.1) which runs _as such_ from the CLI of python (the hard line
> limit is set to 80 chars/line):
>
>  8><  start script 
> from rdkit import Chem
> from rdkit.Chem.Draw import IPythonConsole
> from rdkit.Chem import Draw
>
> smiles = []
> input_file = str("list_1.csv")
>
>
> def draw_multiple_mol(smiles_list, mols_per_row=4, file_path=None):
> mols = []
> for i in smiles_list:
>
> for mol in mols:
> for atom in mol.GetAtoms():
> atom.SetAtomMapNum(atom.GetIdx())
>
> mols.append(Chem.MolFromSmiles(i))
> mols_per_row = min(len(smiles_list), mols_per_row)
>
> img = Draw.MolsToGridImage(mols,
>molsPerRow=4,
>subImgSize=(300, 300),
>useSVG=True)
> if file_path:
> with open(file_path, 'w') as f_handle:
> f_handle.write(img.data)
> return img
>
>
> with open(input_file, mode="r") as source:
> for line in source:
> smiles_entry = str(line).split()[1]
> smiles_entry = smiles_entry.strip()
> smiles.append(smiles_entry)
>
> draw_multiple_mol(smiles, file_path='output.svg')
>  8><  end script 
>
> The input files read -- both yielding an image with the missing annotation
> -- are the two of either file list_1.csv:
>
>  8><  start list_1.csv 
> compound_1 C1=CSC=C1
> compound_2 C1C=C(SC=1)C
> compound_3 C1C=C(OC=1)C=O
> compound_4 C1=C(C(=CC2)OC=2)OC=C1
> compound_5 C1(C(OC2)=CC=2)OC(=CC=1)C1OC(=CC=1)C
> compound_6 C1(C)=C(C)C(=C(O1)C)C
> compound_7 C1(Br)=C(C)C(=C(O1)Br)C
>  8><  end list_1.csv 
>
> or of the one of file list_2.csv:
>
>  8><  start list_2.csv 
> compound_1 C1=CSC=C1
> compound_2 C1=C(C(=CC2)SC=2)SC=C1
> compound_3 C2=CC1=C(C=CS1)S2
> compound_4 C3=CC1=C(C2=C(S1)C=CS2)S3
> compound_5 C3=CC1=C(C2=C(S1)C=C(S2)CC)S3
>  8><  end list_2.csv 
>
> Norwid
>
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Hybridization state

2020-05-26 Thread Ivan Tubert-Brohman
Hi Jean-Marc,

RDKit says that the oxygen is sp2 because it has a special rule that
considers the conjugation. Whether that is the "true" hybridization for the
oxygen could be a long debate; I sometimes hear that it's somewhere between
sp2 and sp3, perhaps not as close to sp2 as the nitrogen in amides, but
that's all fuzzy and in the end a cheminformatics toolkit needs to have a
clear rule.

As far as I can tell here's the relevant code:
https://github.com/greglandrum/rdkit/blob/d41752d558bf7200ab67b98cdd9e37f1bdd378de/Code/GraphMol/ConjugHybrid.cpp#L172

I'll quote the comment here (the "case 4" refers to the return value of
numBondsPlusLonePairs() for the atom).

case 4:
  // potentially SP3, but we'll set it down to SP2
  // if we have a conjugated bond (like the second O
  // in O=CO)
  // we'll also avoid setting the hybridization down to
  // SP2 in the case of an atom with degree higher than 3
  // (e.g. things like CP1(C)=CC=CN=C1C, where the P
  //   has norbs = 4, and a conjugated bond, but clearly should
  //   not be SP2)
  // This is Issue276

Hope this helps,
Ivan

On Tue, May 26, 2020 at 8:10 AM Jean-Marc Nuzillard <
jm.nuzill...@univ-reims.fr> wrote:

> Dear all,
>
> I recently arrived to this:
>
> >>> from rdkit import Chem
> >>> m = Chem.MolFromSmiles("C(=O)OC")
> >>> for x in m.GetAtoms():
> ... if x.GetSymbol() == 'O':
> ... print(repr(x.GetHybridization()))
> ...
> rdkit.Chem.rdchem.HybridizationType.SP2
> rdkit.Chem.rdchem.HybridizationType.SP2
> >>>
>
> even though an oxygen atom is really SP2 but the other one is SP3.
> I use version 2020.03.1 of RDKit.
>
> Best regards,
>
> Jean-Marc
>
> --
>
> Dr. Jean-Marc Nuzillard
> Institute of Molecular Chemistry, CNRS UMR 7312
> Faculté des Sciences Exactes et Naturelles, Bâtiment 18
> BP 1039
> 51687 REIMS Cedex 2
> France
>
> Tel : 33 3 26 91 82 10
> Fax : 33 3 26 91 31 
> 66http://www.univ-reims.fr/icmrhttp://eos.univ-reims.fr/LSD/CSNteam.html
>
> ORCID: 
> -0002-5120-2556http://www.univ-reims.fr/LSD/http://www.univ-reims.fr/LSD/JmnSoft/
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Scalability of Postgres cartridge

2020-06-10 Thread Ivan Tubert-Brohman
Thank you everyone for the suggestions. For now I don't have immediate
plans to adopt the cartridge but it's good to know these things when the
time comes.

Best,
Ivan

On Mon, Jun 8, 2020 at 6:49 PM Finnerty, Jim via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:

> If you have a billion molecule data source and would like to try an
> at-scale test, I'd be willing to help out with provisioning the hardware,
> looking at the efficiency of the plans, etc., using rdkit with Aurora
> PostgreSQL.
>
> If I understand how the rdkit GIST index filtering mechanism works for a
> given similarity metric, a parallel GIST index scan ought to be able to
> scale almost linearly scale with the number of cores, provided that the
> RDBMS is built on a scalable storage subsystem.
>
> If so, the largest instance size that's currently supported has 96 cores,
> so we can do a fairly high degree of parallelism.
>
> On 6/5/20, 1:07 PM, "dmaziuk via Rdkit-discuss" <
> rdkit-discuss@lists.sourceforge.net> wrote:
>
> CAUTION: This email originated from outside of the organization. Do
> not click links or open attachments unless you can confirm the sender and
> know the content is safe.
>
>
>
> On 6/5/2020 4:45 AM, Greg Landrum wrote:
>
> > Having said that, the team behind ZINC used to use the RDKit
> cartridge with
> > PostgreSQL as the backend for ZINC. They had the database sharded
> > across multiple instances and managed to get the fingerprint indices
> to
> > work there. I don't remember the substructure search performance
> being
> > terrible, but it wasn't great either. They have since switched to a
> > specialized system (Arthor from NextMove software), which offers
> > significantly better performance.
>
> Generally speaking a database of a billion rows needs hardware capable
> of running it. Buy a server with 1TB RAM and 64 cores and a couple of
> U.2 NVME drives and see how Postgres runs on that.
>
> Then you need to look at the database, e.g. query in an indexed
> billion-row table could be OK but inserting a billion-first row will
> not be.
>
> If you want to scale to these kinds of volumes, you need to do some
> work.
>
> (And much of the point of no-sql hadoop "cloud" workflows is that if
> you
> can parallelize what you're doing to multiple machines, at some data
> size they will start outperforming a centralized fast search engine.)
>
> Dima
>
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] Scalability of Postgres cartridge

2020-06-04 Thread Ivan Tubert-Brohman
Hi,

I've never tried the RDKit PostgreSQL cartridge but I'm curious about it.
In particular I wonder how far have people pushed it in terms of
database size. The documentation gives examples with several million rows;
has anyone tried it with a couple billion rows? How fast are substructure
queries with databases of that size? How much storage is needed after
accounting for the fingerprints etc.

Best regards,
Ivan
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] substructure matching

2020-07-21 Thread Ivan Tubert-Brohman
Hi Quoc-Tuan,

I can't reproduce your observations; I get True in both cases. Which
version of RDKit are you using?

One thing to note is that you are parsing a SMARTS with MolFromSmiles. I
wouldn't recommend that in general, although it appears that in this case
RDKit is lenient enough to accept "~"  in a SMILES and turn it into a
QueryBond. What happens if you use MolFromSmarts instead?

Ivan

On Tue, Jul 21, 2020 at 8:12 AM Quoc-Tuan DO 
wrote:

> Hello,
>
> I am not very familiar with smiles/smarts and find the following results
> quite puzzling:
>
> >>> patt =
> Chem.MolFromSmiles('c1ccc(cc1)C~C2NC~Cc3c23.c1ccc(cc1)C~C2NC~Cc3c23')
>
> >>> mol =
> Chem.MolFromSmiles('COc1ccc2cc1Oc1ccc(cc1)CC1N(C)CCc3c1c1Oc4cc5C(C2)NCCc5cc4Oc1c(c3)OC')
>
> >>> print mol.HasSubstructMatch(patt)
>
> False
>
> >>> mol =
> Chem.MolFromSmiles('COc1ccc7cc1Oc2ccc(cc2)CC3N(C)CCc4c3cc(c(c4)OC)Oc5ccc6c(c5)CCNC6C7')
>
> >>> print mol.HasSubstructMatch(patt)
>
> True
>
> It seems that a presence of an extra Ph* - O - *Ph makes the difference
> but I am not sure why. How should the smarts be to have positive results
> for both smiles ?
>
> Thank you in advance for your help.
>
> Best regards,
>
> Quoc-Tuan
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] sanitization converts "I(=O)(=O)[O-]" into "[O-][I+2]([O-])[O-]"

2021-01-22 Thread Ivan Tubert-Brohman
I think there was some confusion between left and right in the original
message, but RDKit prefers the representation that preserves the octet at
the expense of having more formal charges:

In [9]: mol = Chem.MolFromSmiles('O=I(=O)([O-])')


In [10]: Chem.MolToSmiles(mol)

Out[10]: '[O-][I+2]([O-])[O-]'

I don't think there's right and wrong here; it's just a matter of a toolkit
picking a canonical convention.

Carboxylates are different in that the popular representation (C(=O)[O-])
doesn't break the octet rule. But another interesting case is nitro groups:

In [11]: mol = Chem.MolFromSmiles('CN(=O)=O')


In [12]: Chem.MolToSmiles(mol)

Out[12]: 'C[N+](=O)[O-]'

Ivan

On Thu, Jan 21, 2021 at 9:06 PM Peter S. Shenkin  wrote:

> It seems to me offhand RDKit's choice is analogous the way carboxylates
> are generally notated:
>
> R-C(=O)O- rather than R-C+(O-)O- .
>
> Both are legitimate and in fact equivalent upon application of
> chemical knowledge, but do you prefer the second representation for
> carboxylates?
>
> -P.
>
>
>
>
>
> On Thu, Jan 21, 2021 at 2:06 PM Jason Biggs  wrote:
>
>> The RDKit will always convert iodate from the form on the left, with an
>> expanded octet on iodine and a single negative charge, into the form on the
>> right with all single bonds and a charge on every atom (image here
>> https://i.stack.imgur.com/hq3St.png).  This happens no matter how I
>> import the molecule, from SMILES or from a file.  The only way to avoid it
>> is to skip sanitization, which I'd rather avoid.
>>
>> Is this the desired behavior?
>>
>> Thanks
>> Jason
>>
>> [image: image.png]
>>
>>
>>
>>
>> ___
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Multiprocessing/Threading in Python/Rdkit

2021-06-12 Thread Ivan Tubert-Brohman
Hi Philipp,

This is an embarrassingly parallel problem (that's the actual technical
term, so no need to feel embarrassed. :-), meaning there's no need for
communication between threads or processes, which makes it really easy:
just split the search space, run a separate job for each fraction, and
combine the outputs at the end.

For example, if your building block file "A" has 100 structures and you
want to use 4 CPUs, just split it into four files of 25 structures, run a
job using each file, and concatenate the outputs when the four jobs are
done. IMHO there's no need for any fancy multiprocessing modules; running a
completely independent process gives you more flexibility (you can run each
subjob on a different machine, submit it to a queue, email it to your
cousin, or whatever...)

Ivan

On Fri, Jun 11, 2021 at 6:47 AM Philipp Otten 
wrote:

> Dear all lovely people,
> first of all, I'm rather new to to programming/python/Rdkit and probably
> my issue is quite easy to solve if you're more experienced.
> So I wrote a Python-Sycript simulating a reaction-workflow for a multi
> step synthesis with a lot of different building blocks. The program works
> as intended, but it takes a lot of time (few days) because it uses only one
> CPU thread. Therefore I thought about using more via
> multiprocessing/multithreading, but I couldn't get it to run. I tried a
> lot, but didn't even figure out exactly where to start. Maybe you guys can
> give me a hint in the right direction?
>
> First of all the "Synthesis-Class":
>
> from rdkit import Chem
> from rdkit.Chem import AllChem
>
>
> class Synthesis:
> """Combination of the single synthesis steps."""
>
> def __init__(self,
>  sdf_a, sdf_b, sdf_f, sdf_k,
>  s1_smarts, s2_smarts, s3_smarts, s4_smarts, s5_smarts,
>  ):
> """Initialize building block molecules"""
> self.sdf_a = sdf_a
> self.sdf_b = sdf_b
> self.sdf_f = sdf_f
> self.sdf_k = sdf_k
> self.s1_smarts = s1_smarts
> self.s2_smarts = s2_smarts
> self.s3_smarts = s3_smarts
> self.s4_smarts = s4_smarts
> self.s5_smarts = s5_smarts
>
> def react1(self, t1):
> """First step of the reaction"""
> rxn1 = AllChem.ReactionFromSmarts(self.s1_smarts)
> with open(t1, "w") as t1:
> with open(self.sdf_a, "r") as f_a:
> while True:
> line_a = f_a.readline()
>
> if not line_a:
> break
>
> line_a = Chem.MolFromSmiles(line_a)
>
> with open(self.sdf_b) as f_b:
> while True:
> line_b = f_b.readline()
>
> if not line_b:
> break
>
> line_b = Chem.MolFromSmiles(line_b)
>
> p1 = rxn1.RunReactants((line_a, line_b))
> p1 = [x for t in p1 for x in t]
> for x in p1:
> x = Chem.MolToSmiles(x)
> t1.write(x)
> t1.write("\n")
> f_a.close()
> f_b.close()
> t1.close()
>
> def react2(self, t1, t2):
> """Second step of the synthesis"""
> rxn2 = AllChem.ReactionFromSmarts(self.s2_smarts)
> with open(t2, "w") as t2:
> with open(t1, "r") as f_c:
> while True:
> line_c = f_c.readline()
> if not line_c:
> break
>
> line_c = Chem.MolFromSmiles(line_c)
>
> if line_c == None:
> pass
> else:
> p2 = rxn2.RunReactants((line_c,))
> p2 = [x for t in p2 for x in t]
> for x in p2:
> x = Chem.MolToSmiles(x)
> t2.write(x)
> t2.write("\n")
> f_c.close()
> t2.close()
>
> def react3 (self, t2, t3):
> """Third step of the synthesis"""
> rxn3 = AllChem.ReactionFromSmarts(self.s3_smarts)
> with open(t3, "w") as t3:
> with open(t2, "r") as f_d:
> while True:
> line_d = f_d.readline()
>
> if not line_d:
> break
>
> line_d = Chem.MolFromSmiles(line_d)
>
> p3 = rxn3.RunReactants((line_d,))
> p3 = [x for t in p3 for x in t]
> for x in p3:
> x = Chem.MolToSmiles(x)
> t3.write(x)
> t3.write("\n")
> f_d.close()
> 

Re: [Rdkit-discuss] Hydrogens not recognised as Dummy Atoms?

2021-07-08 Thread Ivan Tubert-Brohman
Hi Adelene,

You can't match an atom that doesn't exist as a node in the molecular
graph, so if you really want to match a hydrogen, you'll have to add
explicit hydrogens to your molecule:

molh = Chem.AddHs(mol)
molh.HasSubstructMatch(q1)

> True

However, if all you want to know is whether the oxygen is next to a
hydrogen, you can make the hydrogen count a property of the oxygen atom by
using SMARTS:

q3s = 'CCOCCOCCC[OH]'

q3 = Chem.MolFromSmarts(q3s)
mol.HasSubstructMatch(q3)
> True

Hope this helps,
Ivan
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Error in RDKit output for finding ring atoms!

2021-03-11 Thread Ivan Tubert-Brohman
Hi Goutam,

The ring atoms reported by RDKit in your example are correct; you just need
to consider that the atom indexes correspond to the position of each atom
in the SMILES string. How could RDKit guess the index that the atom might
have in a PDB file that's not even being read in your example?

I'm guessing maybe in your real use case you did read the PDB file. It is
possible that the atoms got renumbered, for example if the hydrogens were
deleted in the process.

Hope this helps,
Ivan

On Thu, Mar 11, 2021 at 11:35 AM Goutam Mukherjee 
wrote:

> Dear Members,
>
> I have found an error in RDKit output. I am not sure whether it is my
> mistake.
> I have a SMILES code of  a molecule:
> C[S+](CC[C@H](N)C([O-])=O)C[C@H]1O[C@H]([C@H](O)[C@
> @H]1O)N1C=NC2=C1N=CN=C2N
>
> the 3D coordinates of the molecule is attached here with.
>
> *I ran the following command:*
>
>
>
>
>
>
>
> *In [1]: from rdkit import ChemIn [2]: m =
> Chem.MolFromSmiles('C[S+](CC[C@H](N)C([O-])=O)C[C@H]1O[C@H]([C@H](O)[C@@H]1O)N1C=NC2=C1N=CN=C2N')In
> [3]: ri = m.GetRingInfo()In [4]: print(ri.AtomRings())((10, 11, 12, 13,
> 15), (18, 17, 21, 20, 19), (22, 23, 24, 25, 20, 21))*
>
> Here the atom rank does not correspond to the ring atom ranks.
> This molecule contains two five members and one six member ring
> The true atom rank would be
> *{(11, 13, 14, 16, 19), (22, 23, 24, 25, 26), (25, 26, 27, 28, 29, 30)}*
>
> Could anyone please give me a solution how I get a corect atom ranks which
> are part of a ring.
>
> Thanks and Best Regards,
> Goutam
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Substructure search racemic compounds only

2021-03-17 Thread Ivan Tubert-Brohman
Hi Lauren,

SMARTS doesn't have a direct way of saying an atom is non-racemic, but you
can express that idea using recursive SMARTS. For example,

In [46]: racemic =
Chem.MolFromSmiles('c12c1cncc2NC(=O)C(CCO2)c1cc(Cl)ccc12')

In [47]: chiral1 = Chem.MolFromSmiles('c12c1cncc2NC(=O)[C@H
](CCO2)c1cc(Cl)ccc12')

In [48]: chiral2 = Chem.MolFromSmiles('c12c1cncc2NC(=O)[C@
@H](CCO2)c1cc(Cl)ccc12')

In [49]: [m.HasSubstructMatch(Chem.MolFromSmarts('c12c1cncc2NC(=O)
[CH;!$([@])](CC)c1cc(Cl)ccc1'), useChirality=True) for m in [racemic,
chiral1, chiral2]]

Out[49]: [True, False, False]

Where the highlighted atom [CH;!$([@])] means "a carbon with a hydrogen AND
not a chiral atom".

Hope this helps,
Ivan

On Wed, Mar 17, 2021 at 6:18 AM Lauren Reid 
wrote:

> Hi,
>
> I would like to perform a substructure search in which a racemic chiral
> SMARTS finds only racemic compounds and not those that have specified
> stereochemistry, e.g these compounds from the COVID moonshot project:
>
> Does anyone know if there’s a way to specify this distinction in an rdkit
> substructure search?
>
> Thanks,
>
> Lauren
>
> Dr Lauren Reid
> Computational Chemist / Developer
> lauren.r...@medchemica.com
> www.medchemica.com
>
> Medchemica Ltd is a company registered in England and Wales with company
> number 8162245.
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] SMARTS representing a fragment (with "unbonded" bonds)

2021-03-05 Thread Ivan Tubert-Brohman
Hi Thomas,

I believe what you want can be done using recursive SMARTS and disconnected
SMARTS. For example,

In [7]: mol = Chem.MolFromSmiles('CCC=C')

In [8]:
mol.GetSubstructMatches(Chem.MolFromSmarts('[$(C-*)].CC.[$(C=*)]'))

Out[8]: ((0, 1, 2, 3),)

The recursive SMARTS let you match a single atom, but specifying its
context. [$(C=*)] means match any atom, as long as it's a carbon with a
double bond to any other atom. Importantly, the "any other atom" is not
"consumed", so it can still be matched elsewhere in the SMARTS.

The SMARTS above won't guarantee that there are no gaps, but you could
independently check that the number of atoms in the molecule equals the
number of atoms in the SMARTS.

Hope this helps,
Ivan



On Fri, Mar 5, 2021 at 7:36 AM Thomas  wrote:

> Is it possible to search for a fragment that is not a valid structure
> itself, but part of a structure?
>
> Problem: "Given a structure, and a decomposition of the structure,
> highlight each part with a different color"
> The decomposition is always in the form of 1 SMILES and n SMILES FRAGMENTS
> The "smiles fragments" are noted with an asterisk in the "connection
> bonds".
>
> For example:
> mol: CCC=C
> decomposition:  C*   CC*=C
>
> For a human it takes nothing to spot "who is who", but how would you
> approach it?
>
> - I cannot match the SMARTS "C=": it's not a valid SMARTS
> - I cannot match it without the broken bonds: I would lose the difference
> between C* and C=*
> - I cannot match it like it is: the asterisks will match the first atom of
> the other fragment. (Maybe is there a way to get which part matched with
> who? In that case I could remove the atom matching the asterisk...)
>
> Maybe there is an easy way to represent this pattern 'C=' in SMARTS, but
> the daylight manual is not clear about it. Or maybe I'm just too lazy to
> get it
>
> In other words: is it possible to write n SMARTS that together match the
> whole structure (all the atoms and all the bonds, with no overlapping and
> no gaps)? Because if the SMARTS must be a complete structure (without
> "unbonded" bonds), that's actually not possible.
> Thank you
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] using GetNumConjGrps and similar functions

2021-09-28 Thread Ivan Tubert-Brohman
Hi German,

GetNumConjGrps is not a function of the Chem module, but a method of the
ResonanceMolSupplier class. You have to create a resonance mol supplier
object first, for example:

>>> supp = Chem.ResonanceMolSupplier(mol)


>>> supp.GetNumConjGrps()

2

Hope this helps,
Ivan

On Tue, Sep 28, 2021 at 3:38 PM German Barcenas <
germanbarce...@u.boisestate.edu> wrote:

> Hello,
>
> I am trying to use:
> https://www.rdkit.org/docs/source/rdkit.Chem.rdchem.html#rdkit.Chem.rdchem.ResonanceMolSupplier.GetNumConjGrps
>
> But I keep getting this error message:
>
> module 'rdkit.Chem' has no attribute 'GetNumConjGrps
>
>
> I'd like to use this to find the conjugated group number as well as their 
> group size ( amount of bonds) to parse some structures.
>
>
> Here's a snippet of my code
>
>
> from rdkit import Chem
>
>
>
> conjchaintest='c1c1C=C-C(CC=C-C=Cc2ccc2)CC-C=CCC=C-C=C-C=C-C=C-C(c3c3)'
> m=Chem.MolFromSmiles(conjchaintest)
> originalSanitized=Chem.SanitizeMol(m,catchErrors=True)
> res=Chem.ResonanceMolSupplier(m)
> groups=Chem.GetNumConjGrps(ResonanceMolSupplier(m))
>
> And from here the error is given. I am using RDKit:2021.03.2. I'm not
> sure what the issue is here because the documentation says that
> GetNumConjGrps is a part of the Chem module. Am I calling it wrong?
>
> Thanks,
>
> --
> German Barcenas
> Ph.D Candidate | Materials Science and Engineering
> Materials Modeling and Theory Group
> Boise State University
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] GetSubstructMatch bug? + mol depiction issue

2021-11-04 Thread Ivan Tubert-Brohman
That does seem like a bug. You can also see it without involving
DeleteSubstructs, by starting from different SMILES representations of the
same molecule:

>>> m1 = Chem.MolFromSmiles('FC12C31C32F')


>>> m2 = Chem.MolFromSmiles('C12C31C32')


>>> m3 = Chem.MolFromSmiles('C1CC2C3C(C1)C23')

>>> Chem.MolToSmiles(m2) == Chem.MolToSmiles(m3)

True

>>> m1.GetSubstructMatch(m2)

(1, 2, 3, 4, 5, 6, 7)

>>> m1.GetSubstructMatch(m3)

()

Note that if you parse the problem SMILES as a SMARTS, you do get a match:

>>> m4 = Chem.MolFromSmarts('C1CC2C3C(C1)C23')


>>> m1.GetSubstructMatch(m4)

(4, 3, 2, 1, 6, 5, 7)

Another interesting bit is that while the Inchis of m2 and m3 are also the
same, the conversion produces a warning about stereochemistry:

>>> Chem.MolToInchi(m2) == Chem.MolToInchi(m3)

[18:26:48] WARNING: Omitted undefined stereo
[18:26:48] WARNING: Omitted undefined stereo
True

Ivan

On Wed, Nov 3, 2021 at 3:59 PM Ling Chan  wrote:

> Dear colleagues,
>
> I have a molecule "FC12C31C32F". I stripped it of the F's, and tried
> to do a GetSubstructMatch. It worked. But if I reconstruct the stripped
> molecule from a smiles string, it does not. Please see attached.
>
> I suppose some info is lost when you reconstruct the stripped core from a
> smiles string. But still, I would think it should match anyway.
>
> Another issue is that the 2D depiction has the left most carbons lying
> exactly on top of each other, creating a false impression. A better
> depiction would be like the second attached image. (Not sure if this is
> easy to fix though.)
>
> Thank you for you attention.
>
> Ling
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Question matching substructures from SMARTS with explicit hydrogens

2022-03-01 Thread Ivan Tubert-Brohman
A minor correction: [H] by itself *is* valid and means a hydrogen atom. The
Daylight docs say as much in section 4.1. But in other contexts it means a
hydrogen count, so to be safe, always using #1 to mean a hydrogen atom can
be a good practice.

If you are ever in doubt about how RDKit is interpreting a SMARTS,
I recommend making use of the DescribeQuery function which provides a tree
representation of a query atom or bond. For example (comments added):

>>> mol = Chem.MolFromSmarts('[H][N,H][N,#1]')


>>> print(mol.GetAtomWithIdx(0).DescribeQuery())  # [H]

AtomAtomicNum 1 = val  # [H] interpreted as a hydrogen atom

>>> print(mol.GetAtomWithIdx(1).DescribeQuery())  # [N,H]

AtomOr
  AtomType 7 = val
  AtomHCount 1 = val  # H interpreted as a hydrogen count
# Overall query atom means "an aliphatic nitrogen OR (any atom with one
hydrogen)!

>>> print(mol.GetAtomWithIdx(2).DescribeQuery())   # [N,#1]

AtomOr
  AtomType 7 = val
  AtomAtomicNum 1 = val  # "#1" is atomic number, therefore a hydrogen atom.
# Overall query atom means "an aliphatic nitrogen OR a hydrogen"

One non-obvious convention in the DescribeQuery output is that AtomType
implies aliphatic when the value is a normal atomic number, or aromatic if
the atomic number is offset by 1000. For example, [n] is "AtomType 1007".

Hope you find this approach useful in the future.

Ivan

On Tue, Mar 1, 2022 at 6:33 AM David Cosgrove 
wrote:

> Hi Adam
> There are a number of issues here.  The key one, I think, is a
> misunderstanding about the meaning of H in SMARTS.  It means "a single
> attached hydrogen", and is a qualifier for another atom, it cannot be used
> by itself.  So [*H] is valid, [H] isn't.  See the table at
> https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html.  If you
> want to refer to an explicit hydrogen, you have to use [#1].  However, that
> will only match an explicit hydrogen in the molecule, not an implicit one.
> Thus c[#1] doesn't match anything in c1c1.  If you have read in a
> molecule from a molfile, for example, that has explicit hydrogens then you
> will be ok.
>
> Further to that, your SMARTS strings, at least as they have appeared in
> gmail, which may have garbled them, are incorrect.  In S1, the brackets
> round [N,n,H] make it a substituent, so it will not match the indole
> nitrogen.  Also, it would probably be better as [N,n;H], which would be
> read as "(aliphatic nitrogen OR aromatic nitrogen) AND 1 attached
> hydrogen."  The [N,n,H] will match a methylated indole nitrogen which I
> imagine is not what you want. Similar remarks apply to S2.  A SMARTS that
> matches both 6CI and PCT
> is [C,c]1(Cl)[C,c][C,c;H][C,c]([C,c])[C,c;H][C,c]1, but that won't match
> the H atoms themselves if you want to use them in the overlay, and it also
> won't work in the aliphatic case of, for example, ClC1CCC(C)CC1 because
> there the carbon atoms have 2 attached hydrogens.   If you really do want
> it to match aliphatic cases as well, then you will need something
> like 
> [C,c]1(Cl)[$([CH2]),$([cH])][$([CH2]),$([cH])][C,c]([C,c])[$([CH2]),$([cH])][$([CH2]),$([cH])]1
> which is quite a mouthful.  The carbons at the 2,3,5 and 6 positions on the
> ring are specified as either [CH2] or [cH].
>
> Jupyter notebook can be really useful for debugging SMARTS patterns like
> this.  The one I used was variations of
> ```
> from rdkit import Chem
> from IPython.display import SVG
> mol = Chem.MolFromSmiles('C1=CC(=CC2=C1C=CN2C)Cl')
> qmol = Chem.MolFromSmarts('[C,c]1(Cl)[C,c][C,c][C,c]([C,c])[C,c][C,c]1')
> print(mol.GetSubstructMatches(qmol))
> mol
> ```
> which prints the numbers of the matching atoms and also draws the molecule
> with the match highlighted.
> Regards,
> Dave
>
>
> On Tue, Mar 1, 2022 at 1:43 AM Adam Moyer  wrote:
>
>> Hello,
>>
>> I have a baffling case where I am trying to match substructures on two
>> ligands for the goal of aligning them.
>>
>> I have two ligands; one is a 6-chloroindole (6CI) and the other is a
>> para-chloro toluene (PCT).
>>
>> I am attempting to use the following SMARTS (S1) to match
>> them: '[C,c]1(Cl)[C,c][C,c]*([N,n,H])*[C,c]([C,c,H])[C,c]([H])[C,c]1'.
>> For some reason S1 only finds a match in 6CI.
>>
>> When I use the following SMARTS (S2) I only match to PCT as expected:
>> '[C,c]1(Cl)[C,c][C,c]*([H])*[C,c]([C,c,H])[C,c]([H])[C,c]1'.
>>
>> How can S1 not match PCT? S1 is strictly a superset of S2 because I am
>> using the "or" operation. Do I have a misunderstanding of how explicit
>> hydrogens work in RDKit/SMARTS?
>>
>> Lastly when I use the last SMARTS (S3) I am able to match to both, but I
>> cannot use that smarts due to other requirements in my
>> project: '[C,c]1(Cl)[C,c][C,c][C,c]([C,c,H])[C,c]([H])[C,c]1'
>>
>> Thanks!
>> Adam
>> ___
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>
>
> --
> David Cosgrove
> Freelance computational 

Re: [Rdkit-discuss] how to report SDF records for which Chem.ForwardSDMolSupplier returns None?

2022-04-14 Thread Ivan Tubert-Brohman
How about splitting the file on lines consisting of "", and then
parsing each record? If the parsing fails, you can write out the bad record
for future inspection. (This addresses the basic use case, but not the
"even better" one.)

Here's a proof of concept:

from rdkit import Chem

def read_record(fh):
lines = []
for line in fh:
lines.append(line)
if line.rstrip() == '':
return ''.join(lines)

def read_records(fh):
while True:
rec = read_record(fh)
if rec is None:
return
yield rec

sup = Chem.SDMolSupplier()
with open('x.sdf') as fh:
for rec in read_records(fh):
sup.SetData(rec)
mol = next(sup)
if mol is None:
print("Bad record:\n", rec)
continue
print(mol.GetPropsAsDict())

I worry that this is not strictly correct, because what if the value of a
property happens to be ""? But apparently RDKit's own SDMolSupplier is
also confused by this (or maybe such values are forbidden by the file
format and/or there's some escape mechanism? I haven't checked), so I don't
feel nearly as bad about that.

Ivan

On Wed, Apr 13, 2022 at 4:29 PM Giovanni Tricarico <
giovanni.tricar...@glpg.com> wrote:

> Hello,
>
> I am using rdkit to read data from SD files.
>
>
>
> My goal is to extract both the molecules and their associated properties
> (which for our purposes are separate entities) from the SDF.
>
> [For 100% clarity: by ‘properties’ I don’t mean calculated properties or
> atom or bond properties, but the text properties that were saved in the SDF
> with each molecule, i.e. those that you get when you do
> mol.GetPropsAsDict() ].
>
>
>
> After several tests I found that Chem.ForwardSDMolSupplier does what I
> need.
>
>
>
> But there is an issue.
>
> When Chem.ForwardSDMolSupplier decides that a molecule is not OK, i.e.
> when it says it is None, the SDF record is lost:
>
> I cannot access its Props; I cannot save the failed SDF record for later
> inspection.
>
> [Or at least, I don’t know how to do it, hence this question].
>
> At most I can collect the indices of the records that fail.
>
>
>
> > Would anyone be able to suggest how to save to a text file (which an SDF
> essentially already is) the SDF records for which
> Chem.ForwardSDMolSupplier returns a None?
>
> > Even better, could the properties associated to the failed molecules be
> read independently? In theory the properties are in a separate part of the
> CTAB, so even when the atoms, bonds, etc. have a problem, the properties
> might still be OK.
>
>
>
> (Note: PandasTools.LoadSDF gives the same issue, it does not even store
> in the DataFrame the records for which the molecule is None, and in any
> case it cannot be used with the kind of SDF’s I am handling, as it uses an
> enormous amount of memory for the molecules – hence the decision to use
> Chem.ForwardSDMolSupplier and pickle the molecules as soon as they are
> read).
>
>
>
> Thanks
> This e-mail and its attachment(s) (if any) may contain confidential and/or
> proprietary information and is intended for its addressee(s) only. Any
> unauthorized use of the information contained herein (including, but not
> limited to, alteration, reproduction, communication, distribution or any
> other form of dissemination) is strictly prohibited. If you are not the
> intended addressee, please notify the originator promptly and delete this
> e-mail and its attachment(s) (if any) subsequently. Neither Galapagos nor
> any of its affiliates shall be liable for direct, special, indirect or
> consequential damages arising from alteration of the contents of this
> message (by a third party) or as a result of a virus being passed on.
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] SMARTS pattern

2022-06-07 Thread Ivan Tubert-Brohman
Hi Eduardo,

I believe the problem is that r6 means "in *smallest* SSSR ring of size
", where "smallest" in this context means that, for example, for an atom
at the ring fusion between a 5-member ring and a 6-member ring, r5 would
match that atom but r6 wouldn't.

Perhaps using x3 instead (means "number of ring bonds") would work for your
purposes?

Hope this helps,
Ivan


On Tue, Jun 7, 2022 at 1:22 PM Eduardo Mayo 
wrote:

> Greetings!!
>
> I hope this email finds you well.
>
> I need a SMARTS pattern that matches this molecule fragment
> [image: image.png]
> The first pattern I used was:
> [*;R2]~1~[*;R2]~[*;R2]~[*;R2]~[*;R2]~[*;R2]~1
>
> However, it also matches this fragment. This is not the expected behavior
> but it agrees with the pattern, so I tried adding the ring size constrain.
> [image: image.png]
> Now the pattern I am using is this:
> [*;R2r6]~1~[*;R2r6]~[*;R2r6]~[*;R2r6]~[*;R2r6]~[*;R2r6]~1
>
> It worked quite well but now it fail to find matches in this molecule
> [image: image.png]
>
> Does anyone know what I am doing wrong??
>
> Code:
> ---
>
> m1 = Chem.MolFromSmiles(
> "c1ccc2cc3c(ccc4c5c5c5cc6c7cc8c(cc7c6cc5c34)c3cccnc38)cc2c1")
> m2 = Chem.MolFromSmiles(
> "b12c1c1c(c3ccc4ccc4c3c3c4c5cc[nH]c5c4c13)c1ncc3c3c21")
> m3 = Chem.MolFromSmiles(
> "b1ccbc2c1c1ccoc1c1c2c2ccsc2c2[nH]c3ncc4c(c3c21)=c1n1=4")
>
> p = Chem.MolFromSmarts("[*;R2]~1~[*;R2]~[*;R2]~[*;R2]~[*;R2]~[*;R2]~1")
> for m, expected_value in zip([m1,m2,m3],[1,2,2]):
> print(len(m.GetSubstructMatches(p)) == expected_value)
>
>
> p = Chem.MolFromSmarts(
> "[*;R2r6]~1~[*;R2r6]~[*;R2r6]~[*;R2r6]~[*;R2r6]~[*;R2r6]~1")
> for m, expected_value in zip([m1,m2,m3],[1,2,2]):
> print(len(m.GetSubstructMatches(p)) == expected_value)
>
> All the best,
> Eduardo
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] SMARTS pattern

2022-06-07 Thread Ivan Tubert-Brohman
On Tue, Jun 7, 2022 at 1:39 PM Ivan Tubert-Brohman <
ivan.tubert-broh...@schrodinger.com> wrote:

> Perhaps using x3 instead (means "number of ring bonds") would work for
> your purposes?
>

Nevermind, x3 won't exclude the fused 4-atom rings from your first example.
I'll let you know if I think of some other way. :-)
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] CalcNumAtoms import error

2022-07-14 Thread Ivan Tubert-Brohman
Hi Chris,

Please try a more recent version of RDKit. I believe this function was
added in the 2021.09 release.

Hope this helps,
Ivan


On Thu, Jul 14, 2022 at 7:04 AM Chris Swain via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:

> Hi,
>
> If I try
>
> from rdkit.Chem.rdMolDescriptors import CalcNumAtoms
>
> I get
>
> cannot import name 'CalcNumAtoms' from 'rdkit.Chem.rdMolDescriptors'
>
> I can import a range of other descriptors fine
>
> Using Python 3.7.6 and RDKit 2020.09.1
>
> Cheers
>
> Chris
>
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Enumerate Torsion angles

2022-10-19 Thread Ivan Tubert-Brohman
Hi Rohit,

Could you attach a complete example? I took the script from the email you
refer to, only edited the line that says mol = Chem.MolFromSmiles('CC')
to make it say mol = Chem.MolFromSmiles('CC'), and when I run it I get nine
torsions:

(2, 0, 1, 5)
(2, 0, 1, 6)
(2, 0, 1, 7)
(3, 0, 1, 5)
(3, 0, 1, 6)
(3, 0, 1, 7)
(4, 0, 1, 5)
(4, 0, 1, 6)
(4, 0, 1, 7)

Ivan

On Wed, Oct 19, 2022 at 3:36 AM Rohit Modee 
wrote:

> Hi,
>
> I am using enumeratetorsion angle function from this post
> https://sourceforge.net/p/rdkit/mailman/message/34554615/
> Re: [Rdkit-discuss] detect dihedral angles in a conformation
> 
> Re: [Rdkit-discuss] detect dihedral angles in a conformation Open-Source
> Cheminformatics and Machine Learning
> sourceforge.net
>
>
> The problem is if we provide Ethane or propane, it misses a few dihedrals.
>
> below for ethane, it misses atoms 6 and 5. No dihedrals for them.
>
>
> Similarly, for propane, it misses atom 11 and 12 dihedrals.
>
> What I expect it to have is say 11-3-2-1 and 12-3-2-1.
>
>
> Is there a way to get those missing dihedrals.
>
> Regards,
> Rohit Modee
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] reactions with benzene rings

2022-09-23 Thread Ivan Tubert-Brohman
Hi Fernando,

What happens is that atoms on the left hand side of the reaction template
get deleted unless they have mapping numbers (and everything else they were
attached to that becomes unreachable from the mapped atoms is gone as
well). Atoms on the right hand side without mapping numbers are new atoms.

In other words, your reaction as written will delete the five atoms that
complete the aromatic ring (and everything connected to them), and five new
atoms will be added in their place.

The solution is either to add mapping numbers to these five atoms, or make
sure they are not matched by the SMARTS (you can use recursive smarts if
you need to ensure that c:3 is part of a 6-aromatic-carbon ring without
actually matching the atoms.)

Hope this helps,
Ivan

On Fri, Sep 23, 2022 at 2:26 PM Fernando Schimidt 
wrote:

> Hello guys!
>
> I'm trying to understand the applications of chemical reactions using
> RDKit and the SMARTS concepts. I am testing, as an example, oxidation of
> aromatic rings:
>
> rxn = AllChem.ReactionFromSmarts(
> '[#6:1]-[#8:2]-[c:3]1c1>>[#8:2]-[c:3]1-cc-cc-c1')
>
>
>
> If I understand correctly, the above reaction is limited to always using
> benzene as a reagent. When I use naphthalene or anthracene as a reagent,
> the product is always simple phenol. How do I make the reaction more
> general?
>
> Thanks for any help!
>
> Best regards
>
> Fernando
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Accessing CXSMILES information in the rdchem.Mol object

2022-11-08 Thread Ivan Tubert-Brohman
Hi Lauren,

The enhanced stereochemistry is available, not as atom properties, but as
"stereo groups" of the Mol object. For example,

>>> mol = Chem.MolFromSmiles('C[C@H]1CCCNC1 |&1:1,r|')

>>> for group in mol.GetStereoGroups():
print([group.GetGroupType(),
   [atom.GetIdx() for atom in group.GetAtoms()]])

[rdkit.Chem.rdchem.StereoGroupType.STEREO_AND, [1]]

So in this case you have a single "AND" group consisting of only one atom,
1.

Hope this helps,
Ivan

On Tue, Nov 8, 2022 at 11:46 AM Lauren Reid 
wrote:

> Hi RDKit users,
>
> I'm using RDKit to read in CXSMILES with enhanced stereochemistry
> information using:
>
> from rdkit import Chem
> parser_params = Chem.SmilesParserParams()
> parser_params.allowCXSMILES = True
> parser_params.strictCXSMILES = True
>
> mol = Chem.MolFromSmiles("C[C@H]1CCCNC1 |&1:1,r|", parser_params)
>
>
> I can’t find in the docs how I can access the enhanced stereochemistry
> information associated with the rdchem.Mol. Does anyone know if it is
> possible to iterate over the atoms in mol and receive the enhanced
> stereochemistry labels associated with each atom? E.g, Is there an atom
> property that stores the “&” label for atom 1?
>
> Thanks in advance for any help.
>
> Lauren
>
> Dr Lauren Reid
> Computational Chemist / Developer
> MedChemica Ltd
>
> Medchemica Ltd is a company registered in England and Wales with company
> number 8162245
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] reaction involving aromatic atoms

2023-06-21 Thread Ivan Tubert-Brohman
Hi Michal,

A key point to consider is that the default bond order in SMARTS is not
single, but "single or aromatic". If you really want to match single bonds
only, you can specify a single bond with "-".

However, it sounds as if you actually expect aromatic bonds to match as
well, since you expect Cc1c1C to become CC=CC=CC=CC; but only bonds
that would show up as "single" on a Kekule structure? You _could_ do that
if you kekulize the molecule first (Chem.Kekulize(mol)), but the problem
there is that there is more than one Kekule structure and you'll only get
one of them, where the bond that you were thinking of breaking in this
example might be double.

Regarding "aromatic" non-ring atoms on the products, I think sanitizing the
products might help. (See Chem.SanitizeMol).

Hope this helps,
Ivan

‪On Wed, Jun 21, 2023 at 5:30 AM ‫מיכל רוט‬‎ 
wrote:‬

> Hello,
> I am trying to run reactions using RunReactants.
> I want to define a general reaבtion involving braking bond in a chain of
> one-single bonded carbons: '[#6:1][#6:2][#6:3][#6:4] >>
> ([#6:1][#6:2].[#6:3][#6:4])'.
> When the reaction does not involve any aromatic carbons all works fine,
> bun when I hava a reactant with aromatic carbons I get strange things..
> For example - if I use the following structure as reactant - 'Cc1c1C'
> I expect to get after the reaction only one product - 'CC=CC=CC=CC'.
> But it seems that any chain in the aromatic ring recognized as one-single
> bonded carbons and I get many other fragments( - after converting the
> pruducts to smiles format using Chem.MolToSmiles):
> ['CccC',
>  'c(C)cC',
>  'C',
>  'Cc1c-1',
>  'c(C)CC',
>  '(C)c(C)C',
>  'C',
>  'Cc1c-1',
>  'c(C)CC',
>  '(C)c(C)C',
>  'CccC',
>  'c(C)cC',
>  'CcCC',
>  'C',
>  'Cc1c-1',
>  'ccc(C)c(C)cC',
>  'C(C)cC',
>  'cc(C)c(C)ccC',
>  'cc(C)c(C)ccC',
>  'C(C)cC',
>  'ccc(C)c(C)cC',
>  'C',
>  'Cc1c-1',
>  'CcCC']
>  More over, atoms are marked as aromatic eventhough they are not anymore.
>  When I try to change it back to mol object (using Chem.MolFromSmiles) I
> get an error message:
>  "non-ring atom 1 marked aromatic"
>
> I will be greatfull if anyone has an idea for solving this problem without
> specifing the aromatic ring in the reaction (I want to keep the reaction
> general so it could happen in any place in the molecule that has a chain of
> 3 single bonds).
>
> Thank you!
> Michal
>
> my code:
>
> def flatten(lst):  # take list of lists and convert them to one list.
> return sum(([x] if not isinstance(x, list) else flatten(x)
> for x in lst), [])
>
> def flat_children(children):
> return flatten(children)
>
> def mol_to_smile(MolList):  # convert mol to smiles
> smiles = []
> for x in range(len(MolList)):
> new_smile = Chem.MolToSmiles(MolList[x])
> new_smiles = split_smile(new_smile)
> for n_s in new_smiles:
> smiles.append(n_s)
> return smiles
>
> def split_smile(smile):
> """The function split two fragments in one smile into two smiles"""
> smile = smile.split(".")
> return(smile)
>
> smile = 'CC1=CC=CC=C1C'
> mol = Chem.MolFromSmiles(smile)
> rxn = AllChem.ReactionFromSmarts('[#6:1][#6:2][#6:3] >>
> ([#6:1][#6:2].[#6:3])')
> products = rxn.RunReactants((mol,))
> mols = []
> for product in products:
> mols.append([x for x in product]) # converts the products from tuples
> (products) to list (mols)
> mols = flat_children(mols)# return [Mol, Mol, ..]
> smiles = mol_to_smile(mols)
> print(smiles)
>
> for smile in smiles:
> m = Chem.MolFromSmiles(smile)
> img = Draw.MolToImage(m)
> img.show()
>
>
> output:
>
> ['CccC',
>  'c(C)cC',
>  'C',
>  'Cc1c-1',
>  'c(C)CC',
>  '(C)c(C)C',
>  'C',
>  'Cc1c-1',
>  'c(C)CC',
>  '(C)c(C)C',
>  'CccC',
>  'c(C)cC',
>  'CcCC',
>  'C',
>  'Cc1c-1',
>  'ccc(C)c(C)cC',
>  'C(C)cC',
>  'cc(C)c(C)ccC',
>  'cc(C)c(C)ccC',
>  'C(C)cC',
>  'ccc(C)c(C)cC',
>  'C',
>  'Cc1c-1',
>  'CcCC']
>
>  [12:06:11] non-ring atom 1 marked aromatic
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] C++ Molecular Weight

2023-05-15 Thread Ivan Tubert-Brohman
Hi Jarod,

Something like this should work:

#include 

#include 

#include 



#include 



int main()

{

auto mol = RDKit::SmilesToMol("CCO");

auto mw = RDKit::Descriptors::calcAMW(*mol);

std::cout << mw << "\n";

}

Hope this helps,
Ivan

On Mon, May 15, 2023 at 3:00 PM Jarod Younker 
wrote:

> Can one calculate average molecular weight on an ROMol or RWMol in the C++
> implementation of RDKit?  If yes, how?
>
> Sent from my iPhone
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss