Re: [Rdkit-discuss] Beta of the 2017.09 release available

2017-10-02 Thread Ivan Tubert-Brohman
I can reproduce this with an older version, but the problem is that you have RemoveHs instead of removeHs. On Mon, Oct 2, 2017 at 7:20 AM, Guillaume GODIN < guillaume.go...@firmenich.com> wrote: > Dear Greg, > > > I don't know if it's related but I have this issue on my mac version since > this

[Rdkit-discuss] useQueryQueryMatches with recursive SMARTS

2018-03-07 Thread Ivan Tubert-Brohman
Hi, Is it reasonable to expect that a SMARTS should match itself when useQueryQueryMatches=True? query = Chem.MolFromSmarts('[C;!$(C=O)]Cl') query.HasSubstructMatch(query, useQueryQueryMatches=True) The above returns False. Without useQueryQueryMatches, it returns True, but I think I need

Re: [Rdkit-discuss] useQueryQueryMatches with recursive SMARTS

2018-03-09 Thread Ivan Tubert-Brohman
On Fri, Mar 9, 2018 at 12:13 AM, Greg Landrum <greg.land...@gmail.com> wrote: > Hi Ivan, > > On Wed, Mar 7, 2018 at 8:58 PM, Ivan Tubert-Brohman <ivan.tubert-brohman@ > schrodinger.com> wrote: > >> >> Is it reasonable to expect that a SMARTS should match i

Re: [Rdkit-discuss] Sometimes one sanitization is not enough?

2018-10-31 Thread Ivan Tubert-Brohman
no error is thrown. The > aromaticity perception (step 6) does not consider the ring to be aromatic, > so the final molecule is the equivalent of C1=N(C)C=CN1. > > It ought to be possible to clear this in the sanitization code relatively > easily; I just need to think about i

[Rdkit-discuss] Sometimes one sanitization is not enough?

2018-10-30 Thread Ivan Tubert-Brohman
Hi, I was surprised to see that a (dubious) structure that goes through SanitizeMol OK can fail a subsequent sanitization call: print("Start") mol = Chem.MolFromSmiles('C1=n(C)-c=Cn1', sanitize=False) print("Before first sanitization") Chem.SanitizeMol(mol) print("Before second sanitization")

Re: [Rdkit-discuss] number of significant digits in molblock?

2018-10-05 Thread Ivan Tubert-Brohman
Hi Michal, The old SDF format (aka V2000 CTAB) is column-based, as things often were in the era of Fortran 77 and punch cards. Not only the precision but also the exact position of each value on the line is specified! Here's what the spec says: The Atom Block is made up of atom lines, one line
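To make the column-based layout concrete, here is a minimal sketch of how a V2000 atom line could be assembled in Python. It assumes the usual CTfile layout (three 10-character coordinate fields with four decimals, a space, a 3-character atom symbol, then a 2-character field and eleven 3-character fields); the function name is hypothetical and this is not RDKit's writer.

```python
def format_v2000_atom_line(x, y, z, symbol):
    """Build one fixed-width V2000 atom line (illustrative helper, not an RDKit API)."""
    # Coordinates: three 10-character fields with exactly 4 decimals (Fortran F10.4).
    coords = f"{x:10.4f}{y:10.4f}{z:10.4f}"
    # One space, then the atom symbol left-justified in a 3-character field.
    line = f"{coords} {symbol:<3}"
    # Remaining fields, all zero in this sketch:
    # one 2-character field followed by eleven 3-character fields.
    line += f"{0:2d}" + "".join(f"{0:3d}" for _ in range(11))
    return line


print(format_v2000_atom_line(1.23456789, -0.5, 0.0, "C"))
```

Because each coordinate gets exactly four decimals, any extra digits in the input are rounded away when the line is written.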

Re: [Rdkit-discuss] Count rings in bicyclic compounds

2018-12-05 Thread Ivan Tubert-Brohman
Hi Baptiste, RDKit focuses on "simple rings". As far as I know, it has no builtin function to return all possible cycles in a molecule. For a molecule with a "basis set" of N rings, there can be up to 2^N-1 ring systems, which can be obtained by taking all possible subsets (aka the powerset) of
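The 2^N-1 count can be checked with a short sketch that enumerates every non-empty subset of a ring basis. The ring labels are placeholders, and the function name is made up for illustration; it only enumerates subsets and does not check whether a given combination corresponds to a real cycle, which is why the count is an upper bound.

```python
from itertools import chain, combinations


def nonempty_subsets(rings):
    """Return every non-empty subset of the given rings (the powerset minus the empty set)."""
    rings = list(rings)
    return list(
        chain.from_iterable(
            combinations(rings, k) for k in range(1, len(rings) + 1)
        )
    )


basis = ["ring_a", "ring_b", "ring_c"]  # placeholder labels for N = 3 basis rings
subsets = nonempty_subsets(basis)
print(len(subsets))  # 2**3 - 1 candidate ring combinations
```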

Re: [Rdkit-discuss] Finding out the origin of product atoms after applying a reaction

2018-09-17 Thread Ivan Tubert-Brohman
> isotopes, you can set unique isotope numbers for every reacting atom. Those > will be preserved in the products so you can get the atom-atom mapping > after running the reaction. > > Connor > > On Mon, Sep 17, 2018 at 10:36 AM Ivan Tubert-Brohman schrodinger.com> wrote: >

[Rdkit-discuss] Finding out the origin of product atoms after applying a reaction

2018-09-17 Thread Ivan Tubert-Brohman
I'd like to know where each atom in a reaction product came from, but as far as I can tell, RDKit doesn't provide enough information. Here's what I found out empirically so far. There are four kinds of product atoms: 1. New atoms: atoms are defined in the product template without a mapping

Re: [Rdkit-discuss] problem when doing Chem.MolFromSmiles()

2019-03-13 Thread Ivan Tubert-Brohman
The problem is this line: > core_smiles_2='C1=C/C2=C/c3ccc4n3[Zn]n3/c(cc/c3=C/C3=N/C(=C\4)C=C3)=C\C1=N2' Python is interpreting the \4 as an escape sequence. You either need to double the backslash or use an "r string" to protect the backslash from being interpreted that way. That is, either of
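A small sketch of the escape-sequence problem, using only a fragment of the SMILES above. In an ordinary string literal `\4` is an octal escape that becomes a single control character, so the backslash never reaches the SMILES parser; doubling the backslash or using a raw string keeps it.

```python
# Ordinary literal: "\4" is read as an octal escape, i.e. the character chr(4).
plain = 'C(=C\4)C'

# Two ways to keep the backslash in the string.
doubled = 'C(=C\\4)C'
raw = r'C(=C\4)C'

print(repr(plain))    # shows a control character instead of a backslash
print(repr(doubled))  # backslash preserved
print(doubled == raw)
```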

Re: [Rdkit-discuss] Reaction SMARTS

2019-02-06 Thread Ivan Tubert-Brohman
Hi Jean-Marc, Try the reaction smarts '[C:1]([OH:2])=[N:3]>>[C:1](=[OH0:2])[NH:3]'. The only difference is the addition of "H0" to product atom :2. The problem is that the hydrogen count from the reactant atom gets copied over unless specified otherwise. Hope this helps, Ivan On Wed, Feb 6,

[Rdkit-discuss] Hydrogens involved in "stereochemistry" are not removed by RemoveHs()

2019-11-06 Thread Ivan Tubert-Brohman
Hi, For reasons too complicated to get into here, I ended up with a molecule containing a =CH2 in which one of the hydrogens was explicit and had E/Z stereo info. For example, consider [H]/C=C/F. I was surprised that RemoveHs() refused to remove the hydrogen, although later I found that that's

Re: [Rdkit-discuss] Incorrect Aromaticity?

2019-10-30 Thread Ivan Tubert-Brohman
It is aromatic according to the RDKit aromaticity model described here: https://www.rdkit.org/docs/RDKit_Book.html#aromaticity The O and N each contribute 2 electrons. Each of the carbons shared with the 6-member ring contribute one electron. The carbonyl is sp2 and contributes zero electrons.
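The electron count described above can be tallied with a minimal sketch of the 4n+2 test. It assumes the ring in question is five-membered and consists of exactly the atoms listed (O, N, the two fused carbons, and the carbonyl carbon); the labels and the helper are hypothetical, and RDKit's actual aromaticity perception applies more rules than this.

```python
def satisfies_4n_plus_2(pi_electrons):
    """Return True if the total electron count has the form 4n + 2 (n >= 0)."""
    total = sum(pi_electrons)
    return total >= 2 and (total - 2) % 4 == 0


# Assumed per-atom contributions for the five-membered ring, following the text above.
contributions = {
    "O": 2,
    "N": 2,
    "fused C (first)": 1,
    "fused C (second)": 1,
    "carbonyl C": 0,
}

print(sum(contributions.values()))                 # total electrons in the ring
print(satisfies_4n_plus_2(contributions.values())) # 6 = 4*1 + 2
```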

Re: [Rdkit-discuss] RDKit Num Rotors descriptors?

2019-10-15 Thread Ivan Tubert-Brohman
This is from lipinski.cpp: if (strict == NonStrict) { std::string pattern = "[!$(*#*)&!D1]-&!@[!$(*#*)&!D1]"; pattern_flyweight m(pattern); return m.get().countMatches(mol); } else if (strict==Strict) { std::string strict_pattern =

Re: [Rdkit-discuss] RDKit Num Rotors descriptors?

2019-10-15 Thread Ivan Tubert-Brohman
results for strict and non-strict, as expected, and the default was the same as strict. On Tue, Oct 15, 2019 at 1:57 PM Ivan Tubert-Brohman < ivan.tubert-broh...@schrodinger.com> wrote: > This is from lipinski.cpp: > > if (strict == NonStrict) { > std::string patte

Re: [Rdkit-discuss] Hydrogens involved in "stereochemistry" are not removed by RemoveHs()

2019-11-20 Thread Ivan Tubert-Brohman
) conj?: 0 aromatic?: 0 > 2 2->3 order: 1 dir: 4 conj?: 0 aromatic?: 0 > > > Given that the two substituents on the first C are the same, the double > bond shouldn't be marked as STEREOE at all. > > I'll get this fixed. > -greg > > > > On Wed, Nov 6, 2019

Re: [Rdkit-discuss] distinguishing macrocyclic molecules

2019-10-09 Thread Ivan Tubert-Brohman
t to pull up things like > anthracene which might not be something you’d want to class as a macrocycle. > Cheers, > Dave > > On Wed, 9 Oct 2019 at 14:39, Ivan Tubert-Brohman < > ivan.tubert-broh...@schrodinger.com> wrote: > >> Hi Thomas, >> >> I don't know

Re: [Rdkit-discuss] Sanitize molecule with explicit Hydrogens to catch an error

2020-05-11 Thread Ivan Tubert-Brohman
Hi Pablo, SMILES by definition has implicit hydrogens (enough to satisfy the typical valence) for atoms that are not within brackets. It doesn't matter if you write C, C[H], [H]C[H], or [H]C([H])([H])[H]; they are all methane. The number of hydrogens that are returned by GetNumImplicitHs() and

Re: [Rdkit-discuss] multiple SMARTS that match only if in the same fragment

2020-03-07 Thread Ivan Tubert-Brohman
Hi Curt, According to https://www.rdkit.org/docs/RDKit_Book.html#smarts-support-and-extensions , it's not supported: Here’s the (hopefully complete) list of SMARTS features that are *not* > supported: > >- Non-tetrahedral chiral classes > > >- the @? operator > > >- explicit atomic

Re: [Rdkit-discuss] Display Molecules within IPython Console

2020-09-01 Thread Ivan Tubert-Brohman
Hi Vin, If you are running the IPython console on a terminal emulator that supports graphics, you could display the molecule by printing out the necessary terminal escape codes followed by the image buffer. The solution is terminal-specific; here's an example that works using the Kitty terminal:
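The original example is cut off above, so what follows is not the author's code but a minimal sketch of the general idea, assuming the Kitty graphics protocol as commonly documented: an `ESC _G` control sequence carrying base64-encoded PNG data in chunks of at most 4096 characters, with `m=1` on every chunk except the last. The function name is hypothetical.

```python
import base64


def kitty_png_escape(png_bytes, chunk_size=4096):
    """Wrap PNG bytes in Kitty graphics escape sequences (illustrative sketch)."""
    payload = base64.standard_b64encode(png_bytes).decode("ascii")
    # Split the base64 payload; chunk_size is a multiple of 4, as the protocol expects.
    chunks = [payload[i:i + chunk_size]
              for i in range(0, len(payload), chunk_size)] or [""]
    parts = []
    for i, chunk in enumerate(chunks):
        more = 1 if i < len(chunks) - 1 else 0
        # First chunk carries the action (a=T: transmit and display) and format (f=100: PNG).
        control = f"a=T,f=100,m={more}" if i == 0 else f"m={more}"
        parts.append(f"\x1b_G{control};{chunk}\x1b\\")
    return "".join(parts)
```

To display a molecule, one would presumably render it to PNG bytes first (for example by saving the image returned by RDKit's `Draw.MolToImage` into an `io.BytesIO` buffer) and write the resulting string to standard output.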

Re: [Rdkit-discuss] difference between _CalcMolWt vs CalcExactMW?

2020-10-15 Thread Ivan Tubert-Brohman
Hi Steven, MolWt uses naturally occurring average atomic weights, the ones you find in a typical periodic table. For example, Cl = 35.453. ExactMolWt uses the weight of a specific isotope (the most naturally abundant isotope unless the structure specifies a different one for an atom). These are
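The difference is easy to see with a hand calculation. This sketch uses chloromethane (CH3Cl) and small lookup tables typed in by hand; the masses are approximate literature values entered for illustration, not numbers read from RDKit.

```python
# Average atomic weights (natural isotope mixture), approximate values.
AVERAGE_WEIGHT = {"C": 12.011, "H": 1.008, "Cl": 35.453}

# Masses of the most abundant isotopes (12C, 1H, 35Cl), approximate values.
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503, "Cl": 34.96885268}


def total_mass(atom_counts, table):
    """Sum the per-element masses from `table`, weighted by atom counts."""
    return sum(table[element] * count for element, count in atom_counts.items())


chloromethane = {"C": 1, "H": 3, "Cl": 1}
average = total_mass(chloromethane, AVERAGE_WEIGHT)
exact = total_mass(chloromethane, MONOISOTOPIC_MASS)
print(f"average: {average:.3f}  monoisotopic: {exact:.5f}")
```

Most of the gap between the two numbers comes from chlorine, whose average weight reflects a mixture of 35Cl and 37Cl.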

Re: [Rdkit-discuss] proper technical term for generating virtual compounds with rdkit and smarts

2020-09-25 Thread Ivan Tubert-Brohman
We use "reaction-based enumeration" to distinguish it from "R-group enumeration". Both are types of virtual library enumeration. R-group enumeration allows you to attach any R-group anywhere. It is simple and fast but you can easily create implausible (or hard to synthesize) molecules if you are

Re: [Rdkit-discuss] Missing atom indices in the last structure

2020-09-22 Thread Ivan Tubert-Brohman
Hi Norwid, The inner loop over mols here: for i in smiles_list: for mol in mols: for atom in mol.GetAtoms(): atom.SetAtomMapNum(atom.GetIdx()) mols.append(Chem.MolFromSmiles(i)) is not in the right place. First, because you'll go over the same

Re: [Rdkit-discuss] Hybridization state

2020-05-26 Thread Ivan Tubert-Brohman
Hi Jean-Marc, RDKit says that the oxygen is sp2 because it has a special rule that considers the conjugation. Whether that is the "true" hybridization for the oxygen could be a long debate; I sometimes hear that it's somewhere between sp2 and sp3, perhaps not as close to sp2 as the nitrogen in

Re: [Rdkit-discuss] Scalability of Postgres cartridge

2020-06-10 Thread Ivan Tubert-Brohman
Thank you everyone for the suggestions. For now I don't have immediate plans to adopt the cartridge but it's good to know these things when the time comes. Best, Ivan On Mon, Jun 8, 2020 at 6:49 PM Finnerty, Jim via Rdkit-discuss < rdkit-discuss@lists.sourceforge.net> wrote: > If you have a

[Rdkit-discuss] Scalability of Postgres cartridge

2020-06-04 Thread Ivan Tubert-Brohman
Hi, I've never tried the RDKit PostgreSQL cartridge but I'm curious about it. In particular I wonder how far have people pushed it in terms of database size. The documentation gives examples with several million rows; has anyone tried it with a couple billion rows? How fast are substructure

Re: [Rdkit-discuss] substructure matching

2020-07-21 Thread Ivan Tubert-Brohman
Hi Quoc-Tuan, I can't reproduce your observations; I get True in both cases. Which version of RDKit are you using? One thing to note is that you are parsing a SMARTS with MolFromSmiles. I wouldn't recommend that in general, although it appears that in this case RDKit is lenient enough to accept

Re: [Rdkit-discuss] sanitization converts "I(=O)(=O)[O-]" into "[O-][I+2]([O-])[O-]"

2021-01-22 Thread Ivan Tubert-Brohman
I think there was some confusion between left and right in the original message, but RDKit prefers the representation that preserves the octet at the expense of having more formal charges: In [9]: mol = Chem.MolFromSmiles('O=I(=O)([O-])') In [10]: Chem.MolToSmiles(mol) Out[10]:

Re: [Rdkit-discuss] Multiprocessing/Threading in Python/Rdkit

2021-06-12 Thread Ivan Tubert-Brohman
Hi Philipp, This is an embarrassingly parallel problem (that's the actual technical term, so no need to feel embarrassed. :-), meaning there's no need for communication between threads or processes, which makes it really easy: just split the search space, run a separate job for each fraction, and
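The "split, run, combine" recipe can be sketched with the standard library alone. The per-chunk function below is a hypothetical stand-in (it counts even numbers) for whatever search is run on each fraction of the search space; none of the names come from the original thread.

```python
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into n_chunks contiguous pieces of nearly equal size."""
    items = list(items)
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def count_hits(chunk):
    """Hypothetical per-chunk job: count the even numbers in the chunk."""
    return sum(1 for value in chunk if value % 2 == 0)


def run_parallel(items, n_workers=4):
    """Run count_hits on each chunk in a separate process and add up the results."""
    chunks = split_into_chunks(items, n_workers)
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(count_hits, chunks))
```

When run as a script, `run_parallel(range(1000))` should be called under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. No communication between workers is needed; the only shared step is the final sum.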

Re: [Rdkit-discuss] Hydrogens not recognised as Dummy Atoms?

2021-07-08 Thread Ivan Tubert-Brohman
Hi Adelene, You can't match an atom that doesn't exist as a node in the molecular graph, so if you really want to match a hydrogen, you'll have to add explicit hydrogens to your molecule: molh = Chem.AddHs(mol) molh.HasSubstructMatch(q1) > True However, if all you want to know is whether the

Re: [Rdkit-discuss] Error in RDKit output for finding ring atoms!

2021-03-11 Thread Ivan Tubert-Brohman
Hi Goutam, The ring atoms reported by RDKit in your example are correct; you just need to consider that the atom indexes correspond to the position of each atom in the SMILES string. How could RDKit guess the index that the atom might have in a PDB file that's not even being read in your example?

Re: [Rdkit-discuss] Substructure search racemic compounds only

2021-03-17 Thread Ivan Tubert-Brohman
Hi Lauren, SMARTS doesn't have a direct way of saying an atom is non-racemic, but you can express that idea using recursive SMARTS. For example, In [46]: racemic = Chem.MolFromSmiles('c12c1cncc2NC(=O)C(CCO2)c1cc(Cl)ccc12') In [47]: chiral1 = Chem.MolFromSmiles('c12c1cncc2NC(=O)[C@H

Re: [Rdkit-discuss] SMARTS representing a fragment (with "unbonded" bonds)

2021-03-05 Thread Ivan Tubert-Brohman
Hi Thomas, I believe what you want can be done using recursive SMARTS and disconnected SMARTS. For example, In [7]: mol = Chem.MolFromSmiles('CCC=C') In [8]: mol.GetSubstructMatches(Chem.MolFromSmarts('[$(C-*)].CC.[$(C=*)]')) Out[8]: ((0, 1, 2, 3),) The recursive SMARTS let you match a single

Re: [Rdkit-discuss] using GetNumConjGrps and similar functions

2021-09-28 Thread Ivan Tubert-Brohman
Hi German, GetNumConjGrps is not a function of the Chem module, but a method of the ResonanceMolSupplier class. You have to create a resonance mol supplier object first, for example: >>> supp = Chem.ResonanceMolSupplier(mol) >>> supp.GetNumConjGrps() 2 Hope this helps, Ivan On Tue, Sep 28,

Re: [Rdkit-discuss] GetSubstructMatch bug? + mol depiction issue

2021-11-04 Thread Ivan Tubert-Brohman
That does seem like a bug. You can also see it without involving DeleteSubstructs, by starting from different SMILES representations of the same molecule: >>> m1 = Chem.MolFromSmiles('FC12C31C32F') >>> m2 = Chem.MolFromSmiles('C12C31C32') >>> m3 = Chem.MolFromSmiles('C1CC2C3C(C1)C23')

Re: [Rdkit-discuss] Question matching substructures from SMARTS with explicit hydrogens

2022-03-01 Thread Ivan Tubert-Brohman
A minor correction: [H] by itself *is* valid and means a hydrogen atom. The Daylight docs say as much in section 4.1. But in other contexts it means a hydrogen count, so to be safe, always using #1 to mean a hydrogen atom can be a good practice. If you are ever in doubt about how RDKit is

Re: [Rdkit-discuss] how to report SDF records for which Chem.ForwardSDMolSupplier returns None?

2022-04-14 Thread Ivan Tubert-Brohman
How about splitting the file on lines consisting of "$$$$", and then parsing each record? If the parsing fails, you can write out the bad record for future inspection. (This addresses the basic use case, but not the "even better" one.) Here's a proof of concept: from rdkit import Chem def
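Since the proof of concept is cut off above, here is an independent minimal sketch of the same idea rather than the author's code. It relies on the SDF convention that each record ends with a line containing only `$$$$`; the function names are hypothetical, and the parser is passed in as a callable so the splitting logic has no dependencies.

```python
def iter_sdf_records(lines):
    """Yield the text of each SDF record, including its terminating $$$$ line."""
    record = []
    for line in lines:
        record.append(line)
        if line.rstrip("\r\n") == "$$$$":
            yield "".join(record)
            record = []
    # Keep a trailing record that lacks the final delimiter, unless it is blank.
    if any(part.strip() for part in record):
        yield "".join(record)


def partition_records(lines, parse):
    """Parse each record; return (parsed results, raw text of records that failed)."""
    parsed, failed = [], []
    for text in iter_sdf_records(lines):
        try:
            result = parse(text)
        except Exception:
            result = None
        if result is None:
            failed.append(text)  # keep the raw record for later inspection
        else:
            parsed.append(result)
    return parsed, failed
```

With RDKit installed, `parse` could be `Chem.MolFromMolBlock`, which returns None when it cannot parse a record; the `failed` list can then be written to a separate file.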

Re: [Rdkit-discuss] SMARTS pattern

2022-06-07 Thread Ivan Tubert-Brohman
Hi Eduardo, I believe the problem is that r6 means "in *smallest* SSSR ring of size <n>", where "smallest" in this context means that, for example, for an atom at the ring fusion between a 5-member ring and a 6-member ring, r5 would match that atom but r6 wouldn't. Perhaps using x3 instead (means
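The "smallest ring" rule can be written down as a one-line predicate. This is a toy model of the semantics described above, with a hypothetical function name, not a call into RDKit: each atom is represented only by the list of sizes of the rings it belongs to.

```python
def matches_r(ring_sizes, n):
    """Model of the SMARTS r<n> primitive: true only if the atom's smallest ring has size n."""
    return bool(ring_sizes) and min(ring_sizes) == n


fusion_atom = [5, 6]   # atom shared by a 5-member and a 6-member ring
plain_atom = [6]       # atom that is only in the 6-member ring

print(matches_r(fusion_atom, 5))  # the smallest ring is the 5-member one
print(matches_r(fusion_atom, 6))  # fails even though the atom is in a 6-member ring
print(matches_r(plain_atom, 6))
```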

Re: [Rdkit-discuss] SMARTS pattern

2022-06-07 Thread Ivan Tubert-Brohman
On Tue, Jun 7, 2022 at 1:39 PM Ivan Tubert-Brohman < ivan.tubert-broh...@schrodinger.com> wrote: > Perhaps using x3 instead (means "number of ring bonds") would work for > your purposes? > Never mind, x3 won't exclude the fused 4-atom rings from your first example. I'l

Re: [Rdkit-discuss] CalcNumAtoms import error

2022-07-14 Thread Ivan Tubert-Brohman
Hi Chris, Please try a more recent version of RDKit. I believe this function was added in the 2021.09 release. Hope this helps, Ivan On Thu, Jul 14, 2022 at 7:04 AM Chris Swain via Rdkit-discuss < rdkit-discuss@lists.sourceforge.net> wrote: > Hi, > > If I try > > from

Re: [Rdkit-discuss] Enumerate Torsion angles

2022-10-19 Thread Ivan Tubert-Brohman
Hi Rohit, Could you attach a complete example? I took the script from the email you refer to, only edited the line that says mol = Chem.MolFromSmiles('CC') to make it say mol = Chem.MolFromSmiles('CC'), and when I run it I get nine torsions: (2, 0, 1, 5) (2, 0, 1, 6) (2, 0, 1, 7) (3, 0, 1,

Re: [Rdkit-discuss] reactions with benzene rings

2022-09-23 Thread Ivan Tubert-Brohman
Hi Fernando, What happens is that atoms on the left hand side of the reaction template get deleted unless they have mapping numbers (and everything else they were attached to that becomes unreachable from the mapped atoms is gone as well). Atoms on the right hand side without mapping numbers are

Re: [Rdkit-discuss] Accessing CXSMILES information in the rdchem.Mol object

2022-11-08 Thread Ivan Tubert-Brohman
Hi Lauren, The enhanced stereochemistry is available, not as atom properties, but as "stereo groups" of the Mol object. For example, >>> mol = Chem.MolFromSmiles('C[C@H]1CCCNC1 |&1:1,r|') >>> for group in mol.GetStereoGroups(): print([group.GetGroupType(), [atom.GetIdx()

Re: [Rdkit-discuss] reaction involving aromatic atoms

2023-06-21 Thread Ivan Tubert-Brohman
Hi Michal, A key point to consider is that the default bond order in SMARTS is not single, but "single or aromatic". If you really want to match single bonds only, you can specify a single bond with "-". However, it sounds as if you actually expect aromatic bonds to match as well, since you

Re: [Rdkit-discuss] C++ Molecular Weight

2023-05-15 Thread Ivan Tubert-Brohman
Hi Jarod, Something like this should work: #include <GraphMol/GraphMol.h> #include <GraphMol/SmilesParse/SmilesParse.h> #include <GraphMol/Descriptors/MolDescriptors.h> #include <iostream> int main() { auto mol = RDKit::SmilesToMol("CCO"); auto mw = RDKit::Descriptors::calcAMW(*mol); std::cout << mw << "\n"; } Hope this helps, Ivan On Mon, May 15, 2023 at 3:00 PM Jarod