Re: [Rdkit-discuss] mol properties in SDWriter

2023-09-29 Thread Dan Nealschneider
I'd also be curious how the index is causing you problems. All SD reading
code that I know about ignores those suffixes. If you're not using RDKit to
read the SD file, maybe it would be best to update whatever it is you *are*
using to parse the file.
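
If changing the reader really is not an option, the post-processing route mentioned in the quoted message could look roughly like the sketch below. It uses only the Python standard library. The function name and the regular expression are assumptions made for illustration (they are not RDKit API), the property values are made up, and the pattern only touches lines that start with ">", contain a <name> field, and end in a parenthesized integer.

```python
import re

# Matches an SD data header such as ">  <pKa>  (1)" and captures everything
# before the trailing "(N)" registry number. Assumption: the suffix is a
# parenthesized integer at the very end of the header line.
_HEADER_SUFFIX = re.compile(r"^(>.*<[^>]*>.*?)\s*\(\d+\)\s*$")


def strip_registry_suffix(sdf_text: str) -> str:
    """Return sdf_text with the "(N)" suffix removed from data header lines."""
    cleaned = []
    for line in sdf_text.split("\n"):
        match = _HEADER_SUFFIX.match(line)
        cleaned.append(match.group(1) if match else line)
    return "\n".join(cleaned)


# Made-up record tail, only to show the effect.
example = ">  <pKa>  (1)\n4.099\n\n>  <logP>  (1)\n1.2\n\n$$$$\n"
print(strip_registry_suffix(example))
```

Data lines and the connection table are passed through unchanged, so the output can be written straight back to a new .sdf file.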

dan nealschneider | senior staff developer

*he/him/his*



On Fri, Sep 29, 2023 at 1:08 AM Andrew Dalke 
wrote:

> On Sep 26, 2023, at 01:17, Ling Chan  wrote:
> > >(1)
> > 4.099
>   ..
> > Just wonder what was the rationale behind this extra "(1)" on the
> property field lines (pKa and logP in the above example)?
> >
> > And is there a way to get rid of these? I am not sure if this extra
> "(1)" is part of the standard sd format.
>
> RDKit uses the increasing value as a sort of per-file registry number.
>
> This follows the part of the standard which says "External registry
> numbers must be enclosed in parentheses."
>
> The relevant code is in Code/GraphMol/FileParsers/SDWriter.cpp :
>
>   if (d_molid >= 0) {
> (*dp_ostream) << "(" << d_molid + 1 << ") ";
>   }
>
> There is no way to suppress this output. Not only is there no direct way to
> change the d_molid, but d_molid cannot be negative as
> Code/GraphMol/FileParsers/MolWriters.h declares it as:
>
>   unsigned int d_molid;  // the number of the molecules we wrote so far
>
>
> Wim suggested a post-processing approach. Another is to write the SD data
> items yourself, that is, use MolToMolBlock() to generate the connection
> table/molfile as a string, then iterate through the properties and generate
> the data items.
>
>
> import sys
> from rdkit import Chem
>
> def MolToSDFRecord(
>         mol,
>         includeStereo: bool = True,
>         confId: int = -1,
>         kekulize: bool = True,
>         forceV3000: bool = False):
>     mol_block = Chem.MolToMolBlock(mol, includeStereo, confId, kekulize,
>                                    forceV3000)
>
>     lines = []
>     for prop_name in mol.GetPropNames():
>         if "\n" in prop_name or ">" in prop_name or "<" in prop_name:
>             sys.stderr.write(f"WARNING: Skipping property {prop_name!r} because the "
>                              "name includes an unsupported character.\n")
>             continue
>
>         prop_value = mol.GetProp(prop_name)
>         if "\n" in prop_value:
>             # A blank line would end the data item early, so skip such values.
>             if "\n\n" in prop_value or "\r\n\r\n" in prop_value:
>                 sys.stderr.write(f"WARNING: Skipping property {prop_name!r} because the "
>                                  "value includes an embedded blank line.\n")
>                 continue
>             # Drop one trailing newline; the data item adds its own.
>             if prop_value.endswith("\r\n"):
>                 prop_value = prop_value[:-2]
>             elif prop_value.endswith("\n"):
>                 prop_value = prop_value[:-1]
>
>         lines.append(f"> <{prop_name}>\n{prop_value}\n\n")
>
>     # "$$$$" terminates the SD record.
>     lines.append("$$$$\n")
>
>     return mol_block + "".join(lines)
>
> mol = Chem.MolFromSmiles("CCO")
> mol.SetProp("pKa","3.3\r\n")
> print(MolToSDFRecord(mol))
>
>
> Andrew
> da...@dalkescientific.com
>
>
>
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>


Re: [Rdkit-discuss] SMARTS: "NOT Hydrogen" wildcard?

2023-01-30 Thread Dan Nealschneider
Thomas -
Check out https://www.rdkit.org/docs/RDKit_Book.html#smarts-reference. It's
always hard to make a SMARTS expression that works for both implicit
hydrogen and explicit hydrogen ROMols. I think that you probably want:

[#6d1]


Which will match any carbon with non-hydrogen degree one, including
terminal methyls, and also =C and ≡C. If you also want the neighbor:

[!#1][#6d1]


If you want terminal methyls only:

[!#1][#6H3]
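
To see why explicit hydrogens change what a pattern matches, here is a toy model in plain Python. It is not RDKit: the molecule encoding and the function names are made up for illustration. It compares the count of all neighbors with the count of non-hydrogen neighbors for the two carbons of ethanol, with every hydrogen written out.

```python
# Ethanol with every hydrogen written out: CH3-CH2-OH.
symbols = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]
bonds = [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (1, 6), (1, 7), (2, 8)]


def neighbors(atom):
    """Indices of all atoms bonded to the given atom."""
    return [b if a == atom else a for a, b in bonds if atom in (a, b)]


def total_degree(atom):
    """Number of explicit neighbors, hydrogens included."""
    return len(neighbors(atom))


def heavy_degree(atom):
    """Number of non-hydrogen neighbors."""
    return sum(1 for n in neighbors(atom) if symbols[n] != "H")


for atom in (0, 1):
    print(symbols[atom], atom, total_degree(atom), heavy_degree(atom))
```

Once the hydrogens are explicit, the methyl carbon has four neighbors but still only one non-hydrogen neighbor, and that second number is what "non-hydrogen degree one" refers to.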


Good luck!

dan nealschneider | senior staff developer

*he/him/his*



On Mon, Jan 30, 2023 at 8:48 AM Greg Landrum  wrote:

> Hi Thomas,
>
> * in SMARTS just means "any atom".
> [!H], for historical reasons, means "an atom without a single Hydrogen"
> (i.e. it matches CH2 and CH3, but not CH)
> You want [!#1], that is "not hydrogen"
>
> -greg
>
>
> On Mon, Jan 30, 2023 at 5:40 PM Thomas  wrote:
>
>> I thought that the wildcard * would match any atom except hydrogen, but
>> that's true unless hydrogens are explicit in the molecule
>>
>> I have some patterns in the form of SMILES with wildcards and implicit
>> hydrogens. For example C* means "terminal carbons" only.
>> (" * "  stands for any atom except hydrogen)
>>
>> I want to transform this SMILES in SMARTS, if I just write:
>>
>> smarts = rdkit.MolFromSmarts('*C')
>>
>> the smarts I get matches any C with AT LEAST one non-hydrogen bond (not
>> EXACTLY one).
>>
>> If I add explicit hydrogens to the smarts (and to the molecules to be
>> tested)
>>
>> smartsH = rdkit.AddHs(smarts)
>> rdkit.MolToSmiles(smartsH)
>> '*C([H])([H])[H]'
>>
>> I get this pattern where the wildcard matches ANY atom including hydrogen
>> (it matches with the single carbon atom).
>>
>> Basically I am trying to get the SMARTS *C[H3] starting from the
>> respective SMILES *C. Is there a way?
>>
>> I've already tried to replace the * with a [!H] (NOT hydrogen) with no
>> luck.
>> Thanks to anyone :)
>> Thomas


Re: [Rdkit-discuss] What is the recommended 3D-sensitive file format to use with RDKit?

2022-06-16 Thread Dan Nealschneider
François-
I've been writing code for a couple of toolkits (rdk, schrodinger, and
canvas), and my personal opinion is that mol2 format should be avoided when
possible.

In mol2, as far as I can tell, formal charges and valence (and Kekulé form)
depend on a scheme of atom types that is large but not exhaustive. Every
time a tool reads or writes a mol2 file, it is guessing about the user's
intent and guessing what atom types are available to the rest of the user's
workflow. These problems are compounded for molecules with implicit
hydrogens: negative charges and implicit hydrogens seem difficult to
distinguish in a mol2 file. It also does not have official ways to
represent elements that are not included in the standard type system (like
silver or manganese), but many toolkits have local extensions to cover
these elements.

Some people use mol2 because it has per-atom properties, but so do .cif and
.mae.

I'm not speaking for Schrödinger, but this *is* what I tell our internal
developers.

dan nealschneider | senior staff developer

*he/him/his*



On Thu, Jun 16, 2022 at 7:42 AM Greg Landrum  wrote:

> Hi Francois,
>
> Yes, I would recommend SDF, the V3000 version if possible. The xyz format
> is problematic because it doesn’t have bonds. There is still a way to kind
> of work with that together with SMILES or sdf though:
>
> https://mattermodeling.stackexchange.com/questions/7234/how-to-input-3d-coordinates-from-xyz-file-and-connectivity-from-smiles-in-rdkit
>
> As for mol2: the RDKit has some support, but only for the corina version
> of mol2. I would avoid that format if at all possible
>
>
> -greg
>
> On Thu, 16 Jun 2022 at 13:58, Francois Berenger  wrote:
>
>> Hi all,
>>
>> I assume it's ".sdf".
>>
>> But, do we have good support for ".xyz" also?
>>
>> In addition, what about RDKit's support of ".mol2" these days?
>>
>> Regards,
>> F.
>>
>>


Re: [Rdkit-discuss] SGroup information in SD files

2022-05-24 Thread Dan Nealschneider
Thomas -
It looks like SubstanceGroups aren't currently creatable from Python - only
copyable:

from rdkit import Chem
m = Chem.MolFromSmiles('C')
Chem.SubstanceGroup(m, 'Dat')

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
----> 1 Chem.SubstanceGroup(m, 'Dat')

RuntimeError: This class cannot be instantiated from Python

I have some example code where we, for example, add bracket coordinates to
an existing sgroup in Python if that would be helpful.



dan nealschneider | senior staff developer

*he/him/his*



On Tue, May 24, 2022 at 1:42 PM  wrote:

> Hi,
>
>
>
> how would I get the SGroups information into a ROMol so that this is
> output when I write the ROMol to a SDFile?
>
> I have seen the SubstanceGroup class, but haven’t found any example how to
> use this in a python context.
>
>
>
> Thanks,
>
> Th.
>
> Kind regards,
> Dr. Thomas Fox
>
> Boehringer Ingelheim Pharma GmbH & Co. KG
> Research Sites
> Tel.: +49 (7351) 54-7585
> Fax: +49 (7351) 83-7585
> mailto:thomas@boehringer-ingelheim.com
> 
>


Re: [Rdkit-discuss] issue with V3000 SD files containing enhanced stereochemistry information

2022-04-04 Thread Dan Nealschneider
Giovanni -
Thanks for reporting this! I've heard a report like this, but in that
instance the reporter wasn't able to share the structure. Would you mind
creating an issue on the RDKit github issues page and attaching your
problematic structure? Please add it as an attachment - my hunch from those
other reports was that this is something fishy with whitespace characters.
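
For anyone who wants to check the whitespace hunch locally before filing the issue, a standard-library sketch along these lines can flag suspicious characters. The function name and the list of characters it looks for are assumptions about what "fishy" might mean here; nothing in it comes from RDKit.

```python
def report_odd_whitespace(text: str):
    """Yield (line_number, description) for lines with unusual whitespace."""
    suspects = {
        "\t": "tab",
        "\r": "carriage return",
        "\u00a0": "non-breaking space",
        "\ufeff": "byte-order mark",
    }
    for number, line in enumerate(text.split("\n"), start=1):
        for char, name in suspects.items():
            if char in line:
                yield number, name
        # Trailing spaces are easy to miss in an editor.
        if line != line.rstrip(" "):
            yield number, "trailing space"


# Made-up fragment with a Windows line ending and a trailing space.
sample = "M  V30 BEGIN CTAB\r\nM  V30 COUNTS 22 24 0 0 0 \nM  V30 END CTAB\n"
for number, problem in report_odd_whitespace(sample):
    print(number, problem)
```

When reading a real file for this check, open it with `open(path, newline="")` so that Python does not translate the line endings before you get to look at them.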

Thanks!


dan nealschneider | senior staff developer

*he/him/his*



On Mon, Apr 4, 2022 at 11:02 AM Giovanni Tricarico <
giovanni.tricar...@glpg.com> wrote:

> Hello,
>
> I am trying to process V3000 MolBlock’s from some SD files, and I seem to
> encounter issues when enhanced stereochemistry information is present,
> depending on the source of the SD file.
>
>
>
> To test that the molecule to SDF and back conversion within rdkit was
> working OK, I ran this code:
>
>
>
> import pandas as pd
> from rdkit import Chem
> from rdkit.Chem import Draw
> from rdkit.Chem import PandasTools
>
> # 1. convert to molecule a CXSMILES with encoded enhanced stereochemistry
> m = Chem.MolFromSmiles('O=C(NC[C@@H]1CC[C@H](C2=CC=CC=C2)O1)N[C@@H]1COC[C@@H]1O |&1:4,7,&2:16,20|')
>
> # check that the V3K molblock contains the enhanced stereochemistry information
> print(Chem.MolToV3KMolBlock(m))
>
> # 2. write the molecule to an SDF
> writer = Chem.SDWriter('m_with_enh_stereo.sdf')
> writer.SetForceV3000(True)
> writer.write(m)
> writer.close()
>
> # 3. read the molecule back into a list ms
> with Chem.SDMolSupplier('m_with_enh_stereo.sdf') as SDF:
>     ms = [m for m in SDF if m is not None]
>
> # check that the V3000 molblock is OK
> print(Chem.MolToV3KMolBlock(ms[0]))
>
>
>
> This worked well.
>
> The content of the SD file made by this script ('m_with_enh_stereo.sdf')
> was:
>
>
>  RDKit  2D
>
>   0  0  0  0  0  0  0  0  0  0999 V3000
> M  V30 BEGIN CTAB
> M  V30 COUNTS 22 24 0 0 0
> M  V30 BEGIN ATOM
> M  V30 1 O 7.414605 -6.052405 0.00 0
> M  V30 2 C 6.201079 -6.934083 0.00 0
> M  V30 3 N 4.830761 -6.323978 0.00 0
> M  V30 4 C 4.673969 -4.832195 0.00 0
> M  V30 5 C 3.303650 -4.222090 0.00 0
> M  V30 6 C 2.004612 -4.972090 0.00 0
> M  V30 7 C 0.889895 -3.968394 0.00 0
> M  V30 8 C 1.50 -2.598076 0.00 0
> M  V30 9 C 0.75 -1.299038 0.00 0
> M  V30 10 C 1.50 0.00 0.00 0
> M  V30 11 C 0.75 1.299038 0.00 0
> M  V30 12 C -0.75 1.299038 0.00 0
> M  V30 13 C -1.50 0.00 0.00 0
> M  V30 14 C -0.75 -1.299038 0.00 0
> M  V30 15 O 2.991783 -2.754869 0.00 0
> M  V30 16 N 6.357872 -8.425866 0.00 0
> M  V30 17 C 7.728190 -9.035971 0.00 0
> M  V30 18 C 9.027228 -8.285971 0.00 0
> M  V30 19 O 10.141946 -9.289667 0.00 0
> M  V30 20 C 9.531841 -10.659985 0.00 0
> M  V30 21 C 8.040058 -10.503192 0.00 0
> M  V30 22 O 7.036362 -11.617910 0.00 0
> M  V30 END ATOM
> M  V30 BEGIN BOND
> M  V30 1 2 1 2
> M  V30 2 1 2 3
> M  V30 3 1 3 4
> M  V30 4 1 5 4 CFG=3
> M  V30 5 1 5 6
> M  V30 6 1 6 7
> M  V30 7 1 8 7 CFG=3
> M  V30 8 1 8 9
> M  V30 9 2 9 10
> M  V30 10 1 10 11
> M  V30 11 2 11 12
> M  V30 12 1 12 13
> M  V30 13 2 13 14
> M  V30 14 1 8 15
> M  V30 15 1 2 16
> M  V30 16 1 17 16 CFG=3
> M  V30 17 1 17 18
> M  V30 18 1 18 19
> M  V30 19 1 19 20
> M  V30 20 1 20 21
> M  V30 21 1 21 22 CFG=3
> M  V30 22 1 15 5
> M  V30 23 1 21 17
> M  V30 24 1 14 9
> M  V30 END BOND
> M  V30 BEGIN COLLECTION
> M  V30 MDLV30/STERAC1 ATOMS=(2 5 8)
> M  V30 MDLV30/STERAC2 ATOMS=(2 17 21)
> M  V30 END COLLECTION
> M  V30 END CTAB
> M  END
> >  <_CXSMILES_Data>  (1)
> |&1:4,7,&2:16,20|
>
> $$$$
>
>
>
> Then I tried reading an SD file for the exact same molecule, made by some
> other software.
>
> The content of that SD file ('mol_with_enhanced_stereo_2_And_groups.sdf')
> was:
>
>
>
> 2 And groups, from CXSMILES
>   SciTegic04042214202D
>
>   0  0  0  0  0  0999 V3000
> M  V30 BEGIN CTAB
> M  V30 COUNTS 22 24 0 0 0
> M  V30 BEGIN A

Re: [Rdkit-discuss] Partial substructure match?

2020-11-19 Thread Dan Nealschneider
Gustavo -
That sounds like the "maximum common substructure" problem. Here's the
relevant section in RDKit's  "Getting started in Python"

https://www.rdkit.org/docs/GettingStartedInPython.html#maximum-common-substructure


*dan nealschneider* | lead developer


On Thu, Nov 19, 2020 at 8:50 AM Gustavo Seabra 
wrote:

> Hi all,
>
> Is it possible to search for *partial* substructure matches using RDKit?
>
> I'm aware of "HasSubstructMatch/ GetSubstructMatch", but my impression is
> that it only returns full matches (100%) of the required pattern in a
> structure.
>
> However, what I'd like to do is a bit different: Imagine I have one
> specific
> substructure (scaffold), and I'd like to search for molecules that have the
> full substructure *or part of it*, and maybe get the percentage of the
> substructure match? (100% = the full substructure is contained in the
> molecule). For example, if the pattern is a naphthalene and the molecule to
> search has a benzene, that would count as a 60% match.
>
> Is there a way to do that in RDKit?
>
> Thanks a lot!
> --
> Gustavo Seabra
>
>
>
>


Re: [Rdkit-discuss] sd file format question

2020-10-02 Thread Dan Nealschneider
In addition to Andrew's suggestions, I'd also recommend that you submit a
bug report to the maker of your other tool! They probably want to know
about this issue - I know I would if it's one of ours...
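
Until the other tool is fixed, the post-processing option Andrew describes below can be sketched without any toolkit. This is a minimal standard-library illustration, not his attached program: the function name is hypothetical, it assumes a well-formed V2000 record with fixed-width columns, and it silently skips charges that the old atom-block field cannot express.

```python
# Atom-block charge codes of the V2000 ctab format (0 means uncharged;
# code 4, the doublet radical, is deliberately never written here).
CHARGE_TO_CODE = {3: 1, 2: 2, 1: 3, -1: 5, -2: 6, -3: 7}


def copy_charges_to_atom_block(molblock: str) -> str:
    """Copy charges from "M  CHG" lines into the ccc field of the atom block."""
    lines = molblock.split("\n")
    num_atoms = int(lines[3][0:3])  # counts line: first three columns

    # "M  CHGnn8 aaa vvv ..." holds nn pairs of (atom number, charge).
    charges = {}
    for line in lines:
        if line.startswith("M  CHG"):
            count = int(line[6:9])
            for i in range(count):
                start = 9 + 8 * i
                atom_number = int(line[start:start + 4])
                charges[atom_number] = int(line[start + 4:start + 8])

    for atom_number, charge in charges.items():
        code = CHARGE_TO_CODE.get(charge)
        if code is None or not 1 <= atom_number <= num_atoms:
            continue  # outside the range the old field can express
        row = 3 + atom_number  # atom lines start right after the counts line
        lines[row] = lines[row][:36] + f"{code:3d}" + lines[row][39:]
    return "\n".join(lines)


# A one-atom record (a bare N+), built by hand for illustration.
demo = "\n".join([
    "demo", "", "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    f"{0:10.4f}{0:10.4f}{0:10.4f} {'N':<3}{0:2d}{0:3d}" + "  0" * 10,
    f"M  CHG{1:3d}{1:4d}{1:4d}",
    "M  END",
])
fixed = copy_charges_to_atom_block(demo)
print(fixed.split("\n")[4])
```

The "M  CHG" lines are left in place, so readers that prefer the properties block still see the same charges.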

*dan nealschneider* | lead developer


On Fri, Oct 2, 2020 at 2:26 PM Andrew Dalke 
wrote:

> Hi Markus,
>
> > On Oct 2, 2020, at 19:56, Markus Metz  wrote:
> > I have a question to the sd file format.
> > When I write charged molecules via rdkit I noticed that the charge
> definition in the atom block is not written.
> > The charge is written at the end of the entry.
> > So far this worked perfectly fine for me.
>
>
> The ctfile documentation I have from 2011 says this of the charge
> definition in the atom block:
>
>Wider range of values in M CHG and M RAD lines below. Retained
>for compatibility with older Ctabs, M CHG and M RAD lines take
>precedence.
>
> and
>
>With Ctab version V2000, the dd and ccc fields have been
>superseded by the M ISO, M CHG, and M RAD lines in the properties
>block, described below. For compatibility, all releases since ISIS 1.0:
>
> • Write appropriate values in both places if the values
>   are in the old range.
>
> • Use the atom block fields if there are no M  ISO, M  CHG, or
>   M  RAD lines in the properties block.
>
>Support for these atom block fields might be removed in future
>releases of Symyx software.
>
> Further, I looked into this when I wrote the blog post
> http://www.dalkescientific.com/writings/diary/archive/2020/09/25/mixing_text_and_chemistry_toolkits.html
> a couple of weeks ago, and found the 1992 JCICS paper "Description of
> Several Chemical Structure File Formats Used by Computer Programs Developed
> at Molecular Design Limited" by Dalby et al. has the "Wider range ...
> Retained for compatibility with older Ctabs" in it.
>
> So including the charge in the atom block as well as in the properties
> block is a 28+ year old backwards compatibility practice.
>
>
> > Now, I am using a program which reads the atom block charge info only.
> > Is there a way in rdkit to enable the charge written in the atom block?
>
> No. The code in Code/GraphMol/FileParsers/MolFileWriter.cpp has it
> hard-coded to 0.
>
> > Do you have any thoughts on this?
>
> The two I can think of are:
>   - post-processing to add it back in,
>   - pass it through another toolkit which adds the duplicated charge
> information
>
>
> I've attached a program for the first of these options. The command-line
> tool reads an SDF and generates a new SDF with the charges from the "M  CHG"
> lines copied into the atom block. Here's the --help:
>
> ===
> usage: set_atom_block_charges.py [-h] [--output FILENAME] [--roundtrip]
>                                  [--verify] [--no-set] [FILENAME]
>
> copy charge information from the 'M CHG' data line to the atom block
>
> positional arguments:
>   FILENAME              input filename (default: stdin)
>
> optional arguments:
>   -h, --help            show this help message and exit
>   --output FILENAME, -o FILENAME
>                         output file name (default: stdout)
>   --roundtrip           use RDKit to parse the record and regenerate the
>                         SDF record
>   --verify              ensure the input and output SMILES match
>   --no-set              don't set the charges (useful if you want to see
>                         the round-trip output)
> ===
>
> This depends on the latest commercial version of chemfp to identify records
> in an SDF and to help with the verification.
>
> While chemfp is not open source, the base license lets you use this
> functionality for in-house use. (See the file for installation details; the
> pre-compiled package only installs on Linux-based OSes.)
>
> Or, you can grab set_atom_block_charges() from the code (and some code it
> depends on) so you don't need chemfp at all.
>
> In the following, I round-trip the input through RDKit but don't set the
> atom block charges:
>
> % python set_atom_block_charges.py piperidine.sdf --roundtrip --no-set
> piperidine
>  RDKit  3D
>
>   6  6  0  0  1  0  0  0  0  0999 V2000
>-1.46500.7843   -0.9210 N   0  0  0  0  0  0  0  0  0  0  0  0
> 0.06010.7265   -0.6801 C   0  0  0  0  0  0  0  0  0  0  0  0
> 0.6663   -0.3976   -1.5418 C   0  0  0  0  0  0  0  0  0  0  0  0
>-0.0188   -1.7539   -1.2886 C   0  0  0  0  0  0  0  0  0  0  0  0
>-1.5436   -1.6645   -1.4884 C   0  0  0  0  0  0  0  0  0  0  0  0
>-2.1760   -0.5554   -0.6261 C   0  0  0  0  0  0  0  0  0  0  0  0
>   1  2  1  0
>   1  6  1  0
>   2  3

Re: [Rdkit-discuss] c++ atomic lifetime

2020-08-27 Thread Dan Nealschneider
In RDKit, atoms are owned by the molecule. When you ask for:

auto atom = mol->getAtomWithIdx(0);


You are asking for a pointer to memory internally owned by the
RDKit::ROMol. However:

auto mol = RDKit::SmilesToMol("");


creates a new molecule in memory, so it's your job to delete it. This is
documented here:
https://github.com/rdkit/rdkit/blob/e86e2c1d5d375c75cbd7e00871ecc1e0a29b3548/Code/GraphMol/SmilesParse/SmilesParse.h#L47.
I think that, in general, RDKit tends to document when you *do* need to
clean up after yourself.

I'd recommend one of these idioms:

ROMOL_SPTR mol1(RDKit::SmilesToMol("")); // a boost::shared_ptr; the ROMOL_SPTR typedef is declared in GraphMol/ROMol.h
std::unique_ptr<RDKit::ROMol> mol2(RDKit::SmilesToMol("")); // requires #include <memory>



*dan nealschneider* | lead developer


On Thu, Aug 27, 2020 at 1:33 PM dmaziuk via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:

> On 8/27/2020 3:06 PM, Nils Weskamp wrote:
> > To add to this: you are looking at the wonderful concept of an
> > "undefined behavior" in C/C++. There is no guarantee that your example
> > program will always show the same behaviour.
> >
> > In more recent versions of C++, you have access to "smart pointers" like
> > std::shared_ptr, which basically implement reference counting. Not sure
> > if this would help here.
>
> It's worse: with all the boost junk they pulled in the really recent
> versions, good luck figuring out which calls pass "smart" pointers and
> which don't.
>
> There are reasons why everyone's into Rust, and the efforts of C++
> Standards Committee are behind many of them.
>
> Dima
>
>


Re: [Rdkit-discuss] Count rings in bicyclic compounds

2020-03-17 Thread Dan Nealschneider
I'm not sure if this is a good candidate for the cookbook. As Ivan said,
it's a pretty dangerous function (exponentially complex) to run on
molecules of arbitrary size/complexity. You may want a path-based approach
with a cutoff for ring-size. (e.g. using
https://networkx.github.io/documentation/stable/reference/algorithms/generated/networkx.algorithms.shortest_paths.unweighted.single_target_shortest_path.html#networkx.algorithms.shortest_paths.unweighted.single_target_shortest_path
)
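
A rough sketch of what a path-based search with a ring-size cutoff could look like is shown here. It is a hand-rolled depth-first search in plain Python rather than the networkx function linked above, the function name and the adjacency-list input are assumptions for illustration, and it is still exponential in the worst case; the cutoff only bounds how deep each path may grow.

```python
def find_rings(adjacency, max_size):
    """Return every simple cycle with at most max_size atoms.

    Each ring is a frozenset of bonds, and each bond is a sorted
    (atom, atom) tuple, so the same ring found twice collapses to one entry.
    """
    rings = set()

    def extend(path):
        start, last = path[0], path[-1]
        for nbr in adjacency[last]:
            if nbr == start and len(path) >= 3:
                # Closed a cycle: record it by its bonds.
                bonds = zip(path, path[1:] + [start])
                rings.add(frozenset(tuple(sorted(b)) for b in bonds))
            elif nbr > start and nbr not in path and len(path) < max_size:
                # Only grow through atoms with a larger index than the start,
                # so each ring is explored from its lowest-numbered atom.
                extend(path + [nbr])

    for atom in adjacency:
        extend([atom])
    return rings


# The C1CC2CCC1O2 example from this thread as an adjacency list
# (atoms 0-5 are carbons in SMILES order, atom 6 is the oxygen).
adjacency = {
    0: [1, 5], 1: [0, 2], 2: [1, 3, 6], 3: [2, 4],
    4: [3, 5], 5: [4, 0, 6], 6: [5, 2],
}
print(sorted(len(ring) for ring in find_rings(adjacency, max_size=6)))
print(sorted(len(ring) for ring in find_rings(adjacency, max_size=5)))
```

With a cutoff of 6 the search reports the two five-membered rings plus the cyclohexane ring; with a cutoff of 5 only the two five-membered rings remain, which matches the basis-set count of bonds - atoms + fragments = 8 - 7 + 1 = 2.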

*dan nealschneider* | lead developer


On Tue, Mar 17, 2020 at 6:50 AM Scalfani, Vincent  wrote:

> Hello Ivan and all,
>
>
> I found this old thread about counting rings in bicyclics and I would like
> to add it to the RDKit Cookbook, however, I'm not able to get the fuse
> function to work. I get a 'Mol object is not iterable' error.
>
>
> Any help appreciated. Thanks.
>
>
> Vin
>
>
> --
> *From:* Ivan Tubert-Brohman 
> *Sent:* Wednesday, December 5, 2018 9:06 AM
> *To:* baptiste.cana...@gmail.com
> *Cc:* RDKit Discuss
> *Subject:* Re: [Rdkit-discuss] Count rings in bicyclic compounds
>
> Hi Baptiste,
>
> RDKit focuses on "simple rings". As far as I know, it has no builtin
> function to return all possible cycles in a molecule.
>
> For a molecule with a "basis set" of N rings, there can be up to 2^N-1
> ring systems, which can be obtained by taking all possible subsets (aka the
> powerset) of rings and fusing them. Below is an implementation based on
> fusing simple rings. Another possibility would be to write an exhaustive
> ring search (DFS or BFS) of the molecular graph and report all cycles that
> are found, instead of only the simple ones.
>
> *Warning*: do not run this code on fullerenes or similar molecules unless
> you are prepared to wait for a long, long time!
>
> import itertools
> from rdkit import Chem
>
> def all_bond_rings(mol):
>     """
>     Generate all ring systems for a molecule. A Ring is a set of bond
>     indexes.
>
>     :type mol: rdkit.Chem.Mol
>     :rtype: set of int
>     """
>     ring_info = mol.GetRingInfo()
>     rings = [set(r) for r in ring_info.BondRings()]
>
>     # Truncate nrings to the basis set size because RDKit returns redundant
>     # rings (e.g., 6 instead of 5 for cubane).
>     nfrags = len(Chem.GetMolFrags(mol))
>     nrings = mol.GetNumBonds() - mol.GetNumAtoms() + nfrags
>     del rings[nrings:]
>
>     for i in range(1, len(rings)+1):
>         for comb in itertools.combinations(rings, i):
>             fused = fuse(comb)
>             if fused:
>                 yield fused
>
> def fuse(rings):
>     """
>     Return the ring system that results from fusing the given rings, if the
>     rings are fusable into a single ring system; otherwise None.
>
>     :type rings: list of set of int
>     :rtype: set of int
>     """
>     pending = list(rings)
>     fused = set(pending.pop())
>     while pending:
>         for i in range(len(pending)):
>             ring = pending[i]
>             if fused & ring: # rings are fused
>                 fused ^= ring
>                 del pending[i]
>                 break
>         else:
>             # None of the pending rings were fusable!
>             return None
>     return fused
>
> Hope this helps,
> Ivan
>
>
>
> On Wed, Dec 5, 2018 at 5:54 AM Baptiste CANAULT <
> baptiste.cana...@gmail.com> wrote:
>
>> Hi RDKiters,
>>
>> I would like to identify all cycles present in a molecular
>> structure. However, when the molecules correspond to bicyclic compounds,
>> the ring count does not correspond to the number actually observed in the
>> structure. Simple example:
>>
>> >>> m = Chem.MolFromSmiles('C1CC2CCC1O2')
>> >>> r = m.GetRingInfo()
>> >>> r.NumRings()
>> 2
>>
>> In reality, this molecular structure has 3 cycles with the cyclohexane. Am
>> I completely wrong and is there a trick to identify all the cycles present
>> in a structure?
>>
>> Thanks in advance,
>>
>> Best regards,
>>
>> Baptiste


Re: [Rdkit-discuss] compiling C++examples from "Release_2019_09_2"

2020-01-02 Thread Dan Nealschneider
Looks like you're missing  libRDKitmaeparser.so.1,
libRDKitcoordgen.so.1, libRDKitRingDecomposerLib.so.1,
and libRDKitDataStructs.so.1. Are they in a directory pointed to by your
LD_LIBRARY_PATH?
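
A quick way to answer that question is a few lines of standard-library Python. The helper name below is made up for illustration, and it only looks at the directories listed in LD_LIBRARY_PATH; it knows nothing about rpath entries or the system linker cache, which the real linker also consults.

```python
import os


def missing_libraries(names, search_path):
    """Return the names that are not found in any directory of search_path."""
    directories = [d for d in search_path.split(os.pathsep) if d]
    return [
        name for name in names
        if not any(os.path.exists(os.path.join(d, name)) for d in directories)
    ]


# The four libraries named above.
wanted = [
    "libRDKitmaeparser.so.1",
    "libRDKitcoordgen.so.1",
    "libRDKitRingDecomposerLib.so.1",
    "libRDKitDataStructs.so.1",
]
print(missing_libraries(wanted, os.environ.get("LD_LIBRARY_PATH", "")))
```

An empty list means every file exists somewhere on the path; any name that is printed is worth looking for by hand.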

*dan nealschneider* | senior developer


On Thu, Jan 2, 2020 at 1:25 PM Rasmus "Termo" Lundsgaard <
termope...@gmail.com> wrote:

> I have compiled rdkit as suggested in the docs by using a conda
> environment for c++ and boost.
>
> I would like to move from python to cpp with my RDkit work, and I thought
> to start with the C++ examples in the Docs/Book, but I'm having some
> problems getting the minimal c++ examples to link with the current CMake
> files there.
>
> Attached is the output from the make command where I have only set it to
> make "example1.cpp" in CMakeLists.txt
>
> I guess the problem is the "/bin/ld: warning: libRDKitmaeparser.so.1,
> needed by /home/termo/HFlabs/rdkit/lib/libRDKitFileParsers.so, not found"
>
> I have set RDBASE and LD_LIBRARY_PATH, and as far as I can see with the
> "-Wl,-rpath,/home/termo/HFlabs/rdkit/lib" part in the linking command it
> should find the needed .so file (that is there).
>
> Any idea why it fails to find the .so files?
>
> /Rasmus


Re: [Rdkit-discuss] Incorrect Aromaticity?

2019-10-30 Thread Dan Nealschneider
You've specified that the ring is aromatic in your smiles input. Did you
mean "C1CCC2C(C1)OC(N2)=O"?

*dan nealschneider* | senior developer


On Wed, Oct 30, 2019 at 12:00 PM Hao  wrote:

> Hello,
>
> It seems like RDKit is making my molecule aromatic when I don't think it
> should. Here's the original smiles: c1ccc2c(c1)OC(N2)=O. A snippet of
> the workflow:
> [image: image.png]
> As you can see it makes the 5 membered ring aromatic. My chemistry isn't
> strong, so if someone can elucidate what I'm seeing, that would be
> very helpful.
>
> Thanks!
> Hao


Re: [Rdkit-discuss] Problem with getting hybridization from mol object

2019-10-22 Thread Dan Nealschneider
Navid-
You probably need to "sanitize" the mol:

rdkit.Chem.rdmolops.SanitizeMol(mol)

*dan nealschneider* | senior developer


On Tue, Oct 22, 2019 at 6:31 PM Navid Shervani-Tabar 
wrote:

> Hello,
>
> I am trying to load a dataset using a vector of atoms (e.g [6,6,7,6,6,8])
> and the corresponding adjacency matrix. I am using the following script to
> transform these into a mol object:
>
> def MolFromGraphs(node_list, adjacency_matrix):
>
>     # create empty editable mol object
>     mol = Chem.RWMol()
>
>     # add atoms to mol and keep track of index
>     node_to_idx = {}
>     for i in range(len(node_list)):
>         a = Chem.Atom(node_list[i].item())
>         molIdx = mol.AddAtom(a)
>         node_to_idx[i] = molIdx
>
>     # add bonds between adjacent atoms
>     for ix, row in enumerate(adjacency_matrix):
>         for iy, bond in enumerate(row):
>
>             # only traverse half the matrix
>             if iy <= ix:
>                 continue
>
>             # add relevant bond type (there are many more of these)
>             if bond == 0:
>                 continue
>             elif bond == 1:
>                 bond_type = Chem.rdchem.BondType.SINGLE
>                 mol.AddBond(node_to_idx[ix], node_to_idx[iy], bond_type)
>             elif bond == 2:
>                 bond_type = Chem.rdchem.BondType.DOUBLE
>                 mol.AddBond(node_to_idx[ix], node_to_idx[iy], bond_type)
>             elif bond == 3:
>                 bond_type = Chem.rdchem.BondType.TRIPLE
>                 mol.AddBond(node_to_idx[ix], node_to_idx[iy], bond_type)
>             elif bond == 1.5:
>                 bond_type = Chem.rdchem.BondType.AROMATIC
>                 mol.AddBond(node_to_idx[ix], node_to_idx[iy], bond_type)
>
>     # Convert RWMol to Mol object
>     mol = mol.GetMol()
>
>     return mol
>
>
> When I try to get the hybridization of atoms using the mol object
> generated from the function above, I get *UNSPECIFIED.*
>
> To make sure that this function works, I used *MolToSmiles *to generate a
> SMILES string from the generated mol object and it matched the actual
> SMILES from the dataset. Interestingly, when I regenerate the mol object
> from the SMILES that I already generated from the above function, I can get
> the hybridization from the new mol object with no problem. I was wondering
> if there is a flag or variable that I should set in the above function to
> be able to get hybridization from the generated mol object.
>
> Thanks!
> Navid
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Issue with Chirality

2019-09-27 Thread Dan Nealschneider
Guillaume-
You may want to add hydrogens before embedding. Not sure *specifically* why
it's required in this case - maybe the stereo center needs a hydrogen in
order to pop it into 3d?
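
A short sketch of that suggestion, assuming the same SMILES as in the quoted message; the random seed is only there to make the run reproducible and is not part of the original code:

```python
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles('CC[C@H](C)O')

# Add explicit hydrogens so the embedder sees every neighbour of the
# stereocenter.
mol_h = Chem.AddHs(mol)

# EmbedMolecule returns the id of the new conformer, or -1 on failure.
conf_id = AllChem.EmbedMolecule(mol_h, randomSeed=42)

if conf_id != -1:
    dist = Chem.Get3DDistanceMatrix(mol_h, confId=conf_id)
    print(dist.shape)
else:
    print('embedding failed')
```

Separately, the fallback in the quoted function would need to be written as np.zeros((n, n)): NumPy expects the shape as a single tuple, and np.zeros(n,n) is what raises the second exception in the traceback.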

*dan nealschneider* | senior developer
[image: Schrodinger Logo] <https://www.schrodinger.com/>


On Wed, Sep 25, 2019 at 1:45 AM Guillaume GODIN <
guillaume.go...@firmenich.com> wrote:

> Dear All,
>
>
>
> One question why this is not working ?
>
>
>
> def mol3D(mol):
>     tot = AllChem.EmbedMolecule(mol)
>     try:
>         X = AllChem.Get3DDistanceMatrix(mol)
>     except:
>         print('err')
>         print(tot)
>         n = mol.GetNumAtoms()
>         X = np.zeros(n,n)
>     return X
>
> mol = Chem.MolFromSmiles('CC[C@H](C)O')
> X = mol3D(mol)
>
>
>
> Result:
>
>
>
> err
>
> -1
>
> ---
>
> ValueError                                Traceback (most recent call last)
>
>  in mol3D(mol)
>
> *  3* try:
>
> > 4 X = AllChem.Get3DDistanceMatrix(mol)
>
> *  5* except:
>
>
>
> ValueError: Bad Conformer Id
>
>
>
> During handling of the above exception, another exception occurred:
>
>
>
> Thanks in advance,
>
>
>
> Guillaume


Re: [Rdkit-discuss] AssignStereochemistry confusion

2019-09-27 Thread Dan Nealschneider
RDKit stores absolute configuration around atoms using parity (which is
based on atom index), not CIP codes. This has oodles of benefits, for
instance you can represent absolute configuration of atoms in achiral
structures like 1,4-dimethylcyclohexane. So, yeah - AssignStereochemistry
is intended to produce different results with different atom sequences.

If you need Cahn/Ingold/Prelog-style R or S for display, maybe try:
from rdkit import Chem
print(Chem.FindMolChiralCenters(mol))

https://github.com/rdkit/rdkit/blob/2d8bb6c1687a2fa78f66fc61966544e5f44f157c/rdkit/Chem/__init__.py#L80
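
As a rough illustration, assuming a small test molecule and a reversed atom order chosen only for this sketch, the CIP label reported by FindMolChiralCenters can be compared before and after renumbering:

```python
from rdkit import Chem

mol = Chem.MolFromSmiles('CC[C@H](C)O')

# Reverse the atom order. RenumberAtoms takes, for each new position,
# the index of the atom in the original molecule.
new_order = list(range(mol.GetNumAtoms()))[::-1]
renumbered = Chem.RenumberAtoms(mol, new_order)

# FindMolChiralCenters returns (atom index, CIP label) pairs.
labels_original = [label for _, label in Chem.FindMolChiralCenters(mol)]
labels_renumbered = [
    label for _, label in Chem.FindMolChiralCenters(renumbered)
]

print(labels_original, labels_renumbered)
```

The atom indices in the returned pairs follow the numbering of each molecule, but the R/S label of the center should not depend on it.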



*dan nealschneider* | senior developer
[image: Schrodinger Logo] <https://www.schrodinger.com/>


On Wed, Sep 25, 2019 at 2:35 PM Zoltan Takacs  wrote:

> Dear RDkitters,
>
> I am playing around with the assignstereochemistry function and I am
> getting confused. If I change the numbering of the atoms but not their
> coordinates (for example switch to canonical numbering) I get a different
> answer. In fact it only seems to work right when I switch to canonical
> numbering with openbabel.
>
> What rules does the function follow? Is there any role played by the
> numbering of atoms? Do they have to be numbered in any special way for
> example the groups need to follow each other when assigning CIP priority?
>
> Thanks


Re: [Rdkit-discuss] GetAngleDeg alternative for the case of no conformation

2019-09-20 Thread Dan Nealschneider
Navid-

If there aren't any conformers, then atoms don't have coordinates. In that
case, I'm not sure that it makes sense to measure an angle between the
atoms. Maybe you want to generate 3D coordinates using AllChem.EmbedMolecule,
or a 2D depiction using rdDepictor.Compute2DCoords.
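
A minimal sketch of the 3D route, assuming ethanol as a stand-in molecule and a fixed random seed for reproducibility (both are choices made for this example, not taken from the thread):

```python
from rdkit import Chem
from rdkit.Chem import AllChem, rdMolTransforms

# Ethanol with explicit hydrogens; the heavy atoms keep indices
# 0 (C), 1 (C) and 2 (O).
mol = Chem.AddHs(Chem.MolFromSmiles('CCO'))

# EmbedMolecule returns -1 when no conformer could be generated.
conf_id = AllChem.EmbedMolecule(mol, randomSeed=7)
if conf_id == -1:
    raise RuntimeError('embedding failed, so there is no conformer to measure')

conf = mol.GetConformer(conf_id)
angle = rdMolTransforms.GetAngleDeg(conf, 0, 1, 2)  # C-C-O angle in degrees
print(round(angle, 1))
```

Checking the return value of EmbedMolecule before calling GetConformer keeps the failure visible for molecules where embedding does not succeed.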

*dan nealschneider* | senior developer
[image: Schrodinger Logo] <https://www.schrodinger.com/>


On Fri, Sep 20, 2019 at 11:16 AM Navid Shervani-Tabar 
wrote:

> Hello,
>
> Couple weeks ago, I asked if there is an RDKit based method that can give
> the distance between two atoms between the molecule. The solution that I
> got was:
>
> from rdkit.Chem import AllChem
>
> mol = Chem.MolFromSmiles('O=CC1OC12CC1OC12')
> AllChem.EmbedMolecule(mol)
> conf = mol.GetConformer()
>
> at1Coords = np.array(conf.GetAtomPosition(bond_i.GetBeginAtomIdx()))
> at2Coords = np.array(conf.GetAtomPosition(bond_i.GetEndAtomIdx()))
>
> dist = np.linalg.norm(at2Coords - at1Coords)
>
> This was good until I tried the same for molecules such as 'CC12CCC1CC2',
> where there are zero conformations and the method did not work. I found the
> alternative
>
> from rdkit.Chem import rdDistGeom as molDG
>
> bound_matrix = molDG.GetMoleculeBoundsMatrix(mol)
> bond_i = mol.GetBondWithIdx(idx)
> d = bound_matrix[bond_i.GetBeginAtomIdx(), bond_i.GetEndAtomIdx()]
>
> Now getting back to my current question, I am trying to find the angle
> between two atoms using the method
>
> angle = Chem.rdMolTransforms.GetAngleDeg(conf, atom[0], atom[1], atom[2])
>
> Again, this method does not work when there are zero conformations. I was
> wondering if there are any RDKit alternatives for this that do not use
> mol.GetConformer or my best bet is to use molDG.GetMoleculeBoundsMatrix
> and trigonometry to find them.
>
> Thanks,
> Navid


[Rdkit-discuss] Get SMARTS of subset of atoms?

2019-07-31 Thread Dan Nealschneider
I'm trying to get a SMARTS pattern for a known subset of atoms in an ROMol.
I haven't been able to find a way to do this directly. Is there a way to
generate a SMARTS from a subset? Or to extract a subset of atoms from an
ROMol as a new ROMol? I can provide a C++ std::vector<Atom*> or the indices,
or a Python list of atoms or atom indices.

The way that I get a list of atoms is that a user lassos them in a GUI.
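
One possible approach from Python, sketched under the assumption that the installed RDKit release provides Chem.MolFragmentToSmarts alongside Chem.MolFragmentToSmiles; the molecule and the selected indices below are hypothetical stand-ins for a lasso selection:

```python
from rdkit import Chem

mol = Chem.MolFromSmiles('CC(=O)Nc1ccccc1')

# Hypothetical selection: the carbonyl carbon, its oxygen and the nitrogen.
selected = [1, 2, 3]

# Write only the selected atoms (and the bonds between them).
smarts = Chem.MolFragmentToSmarts(mol, atomsToUse=selected)
smiles = Chem.MolFragmentToSmiles(mol, atomsToUse=selected)
print(smarts, smiles)

# Round-trip check: the pattern should match the atoms it came from.
pattern = Chem.MolFromSmarts(smarts)
match = mol.GetSubstructMatch(pattern)
print(match)
```

The resulting SMARTS describes only element and bond order for the selection, so it may need to be tightened by hand if the query should be more specific.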

- dan nealschneider

(né wandschneider)

Senior Developer
Schr*ö*dinger, Inc
Portland, OR


Re: [Rdkit-discuss] implicit conversion of smart pointer with version 2018_03_4

2018-12-11 Thread Dan Nealschneider
boost::shared_ptr atom = (*pCurrentROMol)[*atBegin]

should probably be:

boost::shared_ptr atom((*pCurrentROMol)[*atBegin]);

Although I'm skeptical of the correctness of this code, because
ROMol::operator[] returns a pointer to an atom owned by the ROMol. If you
take ownership of the atom with a shared_ptr and then allow the shared_ptr
to clean it up you're asking for trouble. Maybe you were trying to copy the
atom, and control ownership with the shared_ptr?

- dan nealschneider

(né wandschneider)

Senior Developer
Schr*ö*dinger, Inc
Portland, OR




On Tue, Dec 11, 2018 at 3:36 PM Yingfeng Wang  wrote:

> I am using the C++ library of RDKit on Mac. My C++ code works with
> RDKit_2017_09_3. However, after I switch to RDKit 2018_03_4, I got the
> following error when compiling my C++ source code.
>
> *Database.cpp:148:33: **error: **no viable conversion from 'const
> RDKit::Atom *' to*
>
> *  'boost::shared_ptr'*
>
> boost::shared_ptr atom = (*pCurrentROMol)[*atBegin];
>
> *^  ~~*
>
> */usr/local/Cellar/boost/1.68.0/include/boost/smart_ptr/shared_ptr.hpp:358:21:
> **note: *
>
>   candidate constructor not viable: no known conversion from
>
>   'const RDKit::Atom *' to 'boost::detail::sp_nullptr_t' (aka
> 'nullptr_t')
>
>   for 1st argument
>
> BOOST_CONSTEXPR shared_ptr( boost::detail::sp_nullptr_t )
> BOOST_SP_N...
>
> *^*
>
> */usr/local/Cellar/boost/1.68.0/include/boost/smart_ptr/shared_ptr.hpp:422:5:
> **note: *
>
>   candidate constructor not viable: no known conversion from
>
>   'const RDKit::Atom *' to 'const boost::shared_ptr &'
> for 1st
>
>   argument
>
> shared_ptr( shared_ptr const & r ) BOOST_SP_NOEXCEPT : px( r.px ),
> p...
>
> *^*
>
> I am using Clang on Mac. The version information is given as follows.
>
> g++ -v
>
> Configured with: --prefix=/Library/Developer/CommandLineTools/usr
> --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/c++/4.2.1
>
> Apple LLVM version 10.0.0 (clang-1000.10.44.4)
>
> Target: x86_64-apple-darwin18.2.0
>
> Thread model: posix
>
> I notice that "Starting with the 2018_03 release, the RDKit core C++ code
> is written in modern C++; for this release that means C++11. "
>
> Actually, I also use -std=c++11 when compiling my C++ source code. I also
> tested RDKit 2018_09_1 and got the similar error. I am wondering how to fix
> this problem.
>
> Thanks.
>
> Yingfeng


Re: [Rdkit-discuss] Double Bond Stereochemistry in the RDKit

2018-12-04 Thread Dan Nealschneider
I've done some in-memory translation of molecules to ROMols, and have used
#2 without major problems. I do remember needing to make sure that the
stereoatoms are in the correct order - that is, that the first stereoatom
is bonded to the beginAtom of the bond. In Python, this is something like:

bond = mol.GetBondBetweenAtoms(begin, end)
if bond.GetBeginAtomIdx() != begin:
    assert bond.GetBeginAtomIdx() == end
    stereoatom1, stereoatom2 = stereoatom2, stereoatom1
bond.SetStereoAtoms(stereoatom1, stereoatom2)
bond.SetStereo(stereo)
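
For a self-contained version of the same idea, here is a sketch that wraps the snippet in a hypothetical helper (set_double_bond_stereo is a name made up for this example) and applies it to 1,2-difluoroethene:

```python
from rdkit import Chem


def set_double_bond_stereo(mol, begin, end, stereoatom1, stereoatom2, stereo):
    """Hypothetical helper: stereoatom1 is bonded to begin, stereoatom2 to end."""
    bond = mol.GetBondBetweenAtoms(begin, end)
    if bond.GetBeginAtomIdx() != begin:
        # The bond is stored the other way round, so swap the references.
        assert bond.GetBeginAtomIdx() == end
        stereoatom1, stereoatom2 = stereoatom2, stereoatom1
    bond.SetStereoAtoms(stereoatom1, stereoatom2)
    bond.SetStereo(stereo)


cis = Chem.MolFromSmiles('FC=CF')
set_double_bond_stereo(cis, 1, 2, 0, 3, Chem.BondStereo.STEREOCIS)

# Same bond addressed from the other end, to exercise the swap.
trans = Chem.MolFromSmiles('FC=CF')
set_double_bond_stereo(trans, 2, 1, 3, 0, Chem.BondStereo.STEREOTRANS)

cis_smiles = Chem.MolToSmiles(cis)
trans_smiles = Chem.MolToSmiles(trans)
print(cis_smiles, trans_smiles)
```

Writing the molecules back out as SMILES is a quick way to confirm that the tags were picked up.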

- dan nealschneider

(né wandschneider)

Senior Developer
Schr*ö*dinger, Inc
Portland, OR




On Tue, Dec 4, 2018 at 7:02 AM Brian Cole  wrote:

> Hi Kovas,
>
> For your use-case #2 should suffice, "set STEREOCIS/STEREOTRANS tags +
> manually set stereo atoms". This is what the EnumerateStereoisomers code
> does:
> https://github.com/rdkit/rdkit/blob/master/rdkit/Chem/EnumerateStereoisomers.py#L38
>
> As to what is the 'ground truth', that is a more difficult question that I
> fear the answer may be 'none of them'. STEREOCIS/STEREOTRANS are rather
> recent additions to the RDKit API, while we strived to make sure
> STEREOCIS/STEREOTRANS are handled across the RDKit, there are probably looming bugs in
> untested parts of the RDKit that don't handle them properly. However, I
> think those other APIs should be fixed to handle them properly, so please
> do report any problems you spot into the github issue tracker.
>
> Cheers,
> Brian
>
>
>
> On Mon, Dec 3, 2018 at 7:00 PM Kovas Palunas 
> wrote:
>
>> Hi All,
>>
>>
>>
>> I’m looking for a bit more clarity regarding double bond stereochem in
>> RDKit.  Currently, my understanding is that there are 3 ways to currently
>> store this information:
>>
>>
>>
>>    1. STEREOE/STEREOZ tags + stereo atoms on either side of bond set by
>>       CIP ranks, as computed when calling MolFromSmiles to make a new
>>       molecule or AssignStereochemistry on an existing molecule
>>    2. Manually set STEREOCIS/STEREOTRANS tags + manually set stereo atoms
>>    3. ENDUPRIGHT/etc. single bond directionality tags, which are set
>>       when reading a molecule from smiles/inchi/mol file
>>
>>
>>
>> Is one of these methods the “ground truth” that is looked for by RDKit
>> functions that care about this info, like the substructure matching code or
>> the SMILES writing code?
>>
>>
>>
>> I am currently working on code that mutates molecules using a
>> predetermined list of changes to be made to the molecule.  I’d like to be
>> able to include bond stereochemistry changing/creation/destruction here,
>> and was thinking of doing so using the STEREOCIS/STEREOTRANS tags (and also
>> providing the reference stereo atoms).  Before I do this I want to make
>> sure that molecules with these tags will be handled correctly by other
>> RDKit functions downstream.  Would these tags be a good choice here?  Are
>> there any caveats I should keep in mind as I work with this information?
>>
>>
>>
>> Thanks!
>>
>>
>>
>> - Kovas


Re: [Rdkit-discuss] 2018_03_2 md5 checksum problems

2018-06-28 Thread Dan Nealschneider
Oh, good point (that the version is pinned). I guess that I saw an md5
mismatch which I addressed by submitting
https://github.com/rdkit/rdkit/pull/1904. That case caused a md5sum
mismatch because the reference md5sum was being compared to a different
library. It would be interesting to know which library is stored in
/data/kaushik/src/rdkit-Release_2018_03_2/External/CoordGen/master.tar.gz
(Is it actually maeparser, or is it coordgen?).


- dan nealschneider

(né wandschneider)

Senior Developer
Schr*ö*dinger, Inc
Portland, OR




On Thu, Jun 28, 2018 at 1:13 PM Kaushik Lakkaraju 
wrote:

> Thanks for helping out Greg and Dan.
>
> @Greg, rdkit version 2018_03_2.
>
> -Kaushik
>
>
> On Thu, Jun 28, 2018 at 3:53 PM, Greg Landrum 
> wrote:
>
>>
>>
>> On Thu, Jun 28, 2018 at 12:52 PM Greg Landrum 
>> wrote:
>>
>>> I don't think that should be a problem. The versions should already be
>>> pinned to a particular commit of both libraries in this case:
>>>
>>> https://github.com/rdkit/rdkit/blob/Release_2018_03_2/External/CoordGen/CMakeLists.txt
>>>
>>> I'm somewhat confused that you're seeing md5 errors at all since the
>>> 2018.03.2 version of the code doesn't have md5s for this package.
>>>
>>
>> Sorry, mis-spoke here. The coordgen download isn't checking an md5 but
>> the maeparser download is.
>>
>> I'm not at all sure where the problem is coming from though.
>>
>> -greg
>>
>>
>>
>>
>>> @Kaushik: which version of the RDKit source are you using?
>>>
>>> -greg
>>>
>>>
>>>
>>>
>>> On Thu, Jun 28, 2018 at 12:46 PM Dan Nealschneider <
>>> dan.nealschnei...@schrodinger.com> wrote:
>>>
>>>> I've noticed this as well. I think it's because maeparser has been
>>>> updated since it was pinned within RDKit. You can work around this by
>>>> downloading maeparser manually to External outside the make process.
>>>> This may also be required for coordgen. I'll also submit a PR to either
>>>> ignore the md5sum for these packages for now (this is what is done for
>>>> other fast moving packages) or to get the correct tagged version of these
>>>> libraries.
>>>>
>>>> - dan nealschneider
>>>>
>>>> (né wandschneider)
>>>>
>>>> Senior Developer
>>>> Schr*ö*dinger, Inc
>>>> Portland, OR
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Jun 25, 2018 at 2:19 PM Kaushik Lakkaraju 
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I am trying to install rdkit from source on Ubuntu 14.04 using the
>>>>> following commands. I have a previously compiled 2016 version on the same
>>>>> machine, so my paths for boost, python etc are already set.
>>>>>
>>>>> The following is my sequence of steps:
>>>>>
>>>>> 1) wget
>>>>> https://github.com/rdkit/rdkit/archive/Release_2018_03_2.tar.gz
>>>>>
>>>>> 2) tar -xvzf Release_2018_03_2.tar.gz
>>>>>
>>>>> 3) cd rdkit-Release_2018_03_2
>>>>>
>>>>> 4) mkdir build
>>>>>
>>>>> 5) cd build
>>>>>
>>>>> 6) cmake .. -DCMAKE_INSTALL_PREFIX=/data/kaushik/apps/rdkit-2018
>>>>>
>>>>> Upon doing so, I bump into configuration errors :
>>>>>
>>>>> -- Boost version: 1.59.0
>>>>>
>>>>> -- Found the following Boost libraries:
>>>>>
>>>>> --   python
>>>>>
>>>>> PYTHON Py_ENABLE_SHARED: 1
>>>>>
>>>>> PYTHON USING LINK LINE: -pthread -shared -Wl,-O1
>>>>> -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro
>>>>> -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes
>>>>> -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 
>>>>> -Wformat
>>>>> -Werror=format-security
>>>>>
>>>>> -- Could NOT find Eigen3 (missing:  EIGEN3_INCLUDE_DIR
>>>>> EIGEN3_VERSION_OK) (Required is at least version "2.91.0")
>>>>>
>>>>> Eigen3 not found, disabling the Descriptors3D build.
>>>>>
>>>>> -- Boost version: 1.59.0
>>>>>
>>>>> -- Found the following Boost libraries:
>>

Re: [Rdkit-discuss] 2018_03_2 md5 checksum problems

2018-06-28 Thread Dan Nealschneider
I've noticed this as well. I think it's because maeparser has been updated
since it was pinned within RDKit. You can work around this by downloading
maeparser manually to External outside the make process. This may also
be required for coordgen. I'll also submit a PR to either ignore the md5sum
for these packages for now (this is what is done for other fast moving
packages) or to get the correct tagged version of these libraries.
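
One quick check that may help narrow this down: the "found" checksum in the CMake error quoted below is the MD5 of zero bytes, which would be consistent with the download producing an empty file rather than a different version of the library. A small sketch using only the Python standard library (md5_of_file is a hypothetical helper name):

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


# MD5 of empty input; an empty downloaded file hashes to this value.
EMPTY_MD5 = hashlib.md5(b'').hexdigest()
print(EMPTY_MD5)
```

Running md5_of_file on the master.tar.gz named in the error and comparing the result with EMPTY_MD5 would show whether the file on disk is empty.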

- dan nealschneider

(né wandschneider)

Senior Developer
Schr*ö*dinger, Inc
Portland, OR




On Mon, Jun 25, 2018 at 2:19 PM Kaushik Lakkaraju 
wrote:

> Hi all,
>
> I am trying to install rdkit from source on Ubuntu 14.04 using the
> following commands. I have a previously compiled 2016 version on the same
> machine, so my paths for boost, python etc are already set.
>
> The following is my sequence of steps:
>
> 1) wget https://github.com/rdkit/rdkit/archive/Release_2018_03_2.tar.gz
>
> 2) tar -xvzf Release_2018_03_2.tar.gz
>
> 3) cd rdkit-Release_2018_03_2
>
> 4) mkdir build
>
> 5) cd build
>
> 6) cmake .. -DCMAKE_INSTALL_PREFIX=/data/kaushik/apps/rdkit-2018
>
> Upon doing so, I bump into configuration errors :
>
> -- Boost version: 1.59.0
>
> -- Found the following Boost libraries:
>
> --   python
>
> PYTHON Py_ENABLE_SHARED: 1
>
> PYTHON USING LINK LINE: -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions
> -Wl,-Bsymbolic-functions -Wl,-z,relro -fno-strict-aliasing -DNDEBUG -g
> -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g
> -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security
>
> -- Could NOT find Eigen3 (missing:  EIGEN3_INCLUDE_DIR EIGEN3_VERSION_OK)
> (Required is at least version "2.91.0")
>
> Eigen3 not found, disabling the Descriptors3D build.
>
> -- Boost version: 1.59.0
>
> -- Found the following Boost libraries:
>
> --   serialization
>
> == Using strict rotor definition
>
> Downloading
> https://codeload.github.com/schrodinger/maeparser/tar.gz/83368293dcc0eb07562dadfb7728b8d18d23a6cb.
> ..
>
> CMake Error at Code/cmake/Modules/RDKitUtils.cmake:194 (MESSAGE):
>
>   The md5 checksum for
>
>   /data/kaushik/src/rdkit-Release_2018_03_2/External/CoordGen/master.tar.gz
>
>   is incorrect; expected: 32c0c3b315bba49fbf4c41a07aa58528, found:
>
>   d41d8cd98f00b204e9800998ecf8427e
>
> Call Stack (most recent call first):
>
>   External/CoordGen/CMakeLists.txt:9 (downloadAndCheckMD5)
>
>
>
> -- Configuring incomplete, errors occurred!
>
>
> Have others seen this problem? Could somebody please help with proceeding
> forward with installation?
>
>
> Thanks,
>
> Kaushik


Re: [Rdkit-discuss] Fwd: What is correct treatment of bond stereochemistry defined by hydrogen

2018-04-06 Thread Dan Nealschneider
Thanks, Greg-


>
>> What is the correct treatment of bond stereochemistry at centers for
>> which a hydrogen is required in order to specify the bond stereochemistry?
>> For example, an imine with a hydrogen substituent (trivial example,
>> F/C=N/[H]).
>>
>
> In these cases the H cannot be implicit. The double bond stereochemistry
> is always defined relative to atoms bonded to the double-bonded atoms (more
> complex to write than it actually is) and there’s just no way to do this if
> either of those atoms is implicit.
>

Ok. It sounds like the correct treatment for my schrodinger/rdkit
translation layer is to leave these hydrogens explicit.


> I notice that when I use the smiles constructor, or if I read from an SDF
>> file using the SDMolSupplier, the C=N bond in the example shown above is
>> not recognized as having stereochemistry. However, if I use
>> removeHydrogens=False in the SDMolSupplier, the bond *is* recognized as
>> Z.
>>
>
> I need to confirm it (I’m on my phone at the moment), but I think this is
> a bug: removeHs() should not remove atoms that determine stereochemistry.
> This might be something I can get fixed before the next release.
>

Reading from SMILES in RDKit also loses this hydrogen:

Python 3.6.2 (default, Sep 26 2017, 17:33:28)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
>>> import rdkit.Chem
>>> rdkit.__version__
'2017.03.1'
>>> m = rdkit.Chem.MolFromSmiles('F/C=N/[H]')
>>> rdkit.Chem.MolToSmiles(m, isomericSmiles=True)
'N=CF'

Would it be useful for me to file a bug report?
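
For comparison, a sketch of keeping the hydrogen explicit when parsing the SMILES, assuming a release that exposes SmilesParserParams to Python (newer than the 2017.03.1 shown above); the behaviour may differ between versions:

```python
from rdkit import Chem

params = Chem.SmilesParserParams()
params.removeHs = False  # keep the explicit [H] written in the SMILES

mol = Chem.MolFromSmiles('F/C=N/[H]', params)

# Atoms are F (0), C (1), N (2) and the explicit H (3).
double_bond = mol.GetBondBetweenAtoms(1, 2)
print(mol.GetNumAtoms(), double_bond.GetStereo())
print(Chem.MolToSmiles(mol))
```

With the hydrogen kept as a real atom, the C=N bond has a neighbour on each side to define its stereochemistry against.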


[Rdkit-discuss] Fwd: What is correct treatment of bond stereochemistry defined by hydrogen

2018-04-05 Thread Dan Nealschneider
I'm working on a translation layer between Schrodinger structures and RDKit
mols. Schrodinger structures do not have implicit hydrogens, so I'm
struggling a bit to understand how best to treat potentially implicit
hydrogens!

What is the correct treatment of bond stereochemistry at centers for which
a hydrogen is required in order to specify the bond stereochemistry? For
example, an imine with a hydrogen substituent (trivial example, F/C=N/[H]).

I notice that when I use the smiles constructor, or if I read from an SDF
file using the SDMolSupplier, the C=N bond in the example shown above is
not recognized as having stereochemistry. However, if I use
removeHydrogens=False in the SDMolSupplier, the bond *is* recognized as Z.
Maybe that can be presented more clearly as code (here's an interactive
Python shell, I've also attached this as a script, as well as an SDF file).

Python 3.6.2 (default, Jul 21 2017, 13:21:26)
[GCC 4.9.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import rdkit
>>> print(rdkit.__version__)
2017.03.1
>>> from rdkit import Chem
>>> from rdkit.Chem import AllChem
>>> from rdkit.Chem import rdmolops
>>> def summarize(mol):
...  bond = mol.GetBondBetweenAtoms(0, 1)
...  atoms = list(bond.GetStereoAtoms())
...  atoms.insert(1, bond.GetEndAtom().GetIdx())
...  atoms.insert(1, bond.GetBeginAtom().GetIdx())
...  print(Chem.MolToSmiles(mol, isomericSmiles=True))
...  print(bond.GetStereo(), atoms)
...
>>> has_h = next(Chem.SDMolSupplier('cis_imine.sdf', removeHs=False))
>>> no_h = rdmolops.RemoveHs(has_h)
>>> has_h_again = rdmolops.AddHs(no_h)
>>> summarize(has_h)
[H]/N=C(/[H])F
STEREOZ [3, 0, 1, 2]
>>> summarize(no_h)
N=CF
STEREOZ [1, 0]
>>> summarize(has_h_again)
[H]N=C([H])F
STEREOZ [1, 0]
>>> AllChem.EmbedMolecule(has_h)
0
>>> AllChem.EmbedMolecule(no_h)
0
>>> AllChem.EmbedMolecule(has_h_again)
Fatal Python error: Segmentation fault

Current thread 0x7faa949d8740 (most recent call first):
  File "", line 1 in 
Segmentation fault

*At core, I have 2 questions:* Is RDKit able to represent stereochemistry
about this bond if the hydrogen is implicit? It's fine if not, I just want
to know. If RDKit can represent stereochemistry for bonds for which one
substituent is hydrogen, what different information do I need to provide
RDKit?

- dan nealschneider

(né wandschneider)

Senior Developer
Schr*ö*dinger, Inc
Portland, OR


cis_imine.sdf
Description: Binary data
"""
Demonstrate my questions about bonds whose stereochemistry is specified
based on a hydrogen, especially when that hydrogen is made implicit.

"""
import rdkit
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import rdmolops

has_h = next(Chem.SDMolSupplier('cis_imine.sdf', removeHs=False))
def summarize(mol, a0=0, a1=1):
    bond = mol.GetBondBetweenAtoms(a0, a1)
    atoms = list(bond.GetStereoAtoms())
    atoms.insert(1, bond.GetEndAtom().GetIdx())
    atoms.insert(1, bond.GetBeginAtom().GetIdx())
    print(Chem.MolToSmiles(mol, isomericSmiles=True))
    print(bond.GetStereo(), atoms)

no_h = rdmolops.RemoveHs(has_h)
has_h_again = rdmolops.AddHs(no_h)

print(rdkit.__version__)
summarize(has_h)
summarize(no_h)
summarize(has_h_again)
AllChem.EmbedMolecule(has_h)
AllChem.EmbedMolecule(no_h)
# This generates a SEGV in my hands. Totalview says it happened in
# _ZN5RDKit12DGeomHelpers14_getAtomStereoEPKNS_4BondEjj, but I
# can't find a getAtomStereo or 2DGeomHelpers in RDKit's github.
AllChem.EmbedMolecule(has_h_again)
