Re: [Rdkit-discuss] Count rings in bicyclic compounds
I'm not sure if this is a good candidate for the cookbook. As Ivan said, it's a pretty dangerous function (exponentially complex) to run on molecules of arbitrary size/complexity. You may want a path-based approach with a cutoff for ring-size. (e.g. using https://networkx.github.io/documentation/stable/reference/algorithms/generated/networkx.algorithms.shortest_paths.unweighted.single_target_shortest_path.html#networkx.algorithms.shortest_paths.unweighted.single_target_shortest_path ) *dan nealschneider* | lead developer [image: Schrodinger Logo] <https://www.schrodinger.com/> On Tue, Mar 17, 2020 at 6:50 AM Scalfani, Vincent wrote: > Hello Ivan and all, > > > I found this old thread about counting rings in bicyclics and I would like > to add it to the RDKit Cookbook, however, I'm not able to get the fuse > function to work. I get a 'Mol object is not iterable' error. > > > Any help appreciated. Thanks. > > > Vin > > > -- > *From:* Ivan Tubert-Brohman > *Sent:* Wednesday, December 5, 2018 9:06 AM > *To:* baptiste.cana...@gmail.com > *Cc:* RDKit Discuss > *Subject:* Re: [Rdkit-discuss] Count rings in bicyclic compounds > > Hi Baptiste, > > RDKit focuses on "simple rings". As far as I know, it has no builtin > function to return all possible cycles in a molecule. > > For a molecule with a "basis set" of N rings, there can be up to 2^N-1 > ring systems, which can be obtained by taking all possible subsets (aka the > powerset) of rings and fusing them. Below is an implementation based on > fusing simple rings. Another possibility would be to write an exhaustive > ring search (DFS or BFS) of the molecular graph and report all cycles that > are found, instead of only the simple ones. > > *Warning*: do not run this code on fullerenes or similar molecules unless > you are prepared to wait for a long, long time! > > def all_bond_rings(mol): > """ > Generate all ring systems for a molecule. A Ring is a set of bond > indexes. > > :type mol: rdkit.Chem.Mol > :rtype: set of int > """ > ring_info = mol.GetRingInfo() > rings = [set(r) for r in ring_info.BondRings()] > > # Truncate nrings to the basis set size because RDKit returns redundant > # rings (e.g., 6 instead of 5 for cubane). > nfrags = len(Chem.GetMolFrags(mol)) > nrings = mol.GetNumBonds() - mol.GetNumAtoms() + nfrags > del rings[nrings:] > > for i in range(1, len(rings)+1): > for comb in itertools.combinations(rings, i): > fused = fuse(comb) > if fused: > yield fused > > def fuse(rings): > """ > Return the ring system that results from fusing the given rings, if the > rings are fusable into a single ring system; otherwise None. > > :type rings: list of set of int > :rtype: set of in > """ > pending = list(rings) > fused = set(pending.pop()) > while pending: > for i in range(len(pending)): > ring = pending[i] > if fused & ring: # rings are fused > fused ^= ring > del pending[i] > break > else: > # None of the pending rings were fusable! > return None > return fused > > Hope this helps, > Ivan > > > > On Wed, Dec 5, 2018 at 5:54 AM Baptiste CANAULT < > baptiste.cana...@gmail.com> wrote: > >> Hi RDKiters, >> >> I would like to identify all cycles present in a molecular >> structure. However, when the molecules correspond to bicyclic compounds, >> the ring count does not correspond to the number actually observed in the >> structure. Simple example: >> >> >>> m = Chem.MolFromSmiles('C1CC2CCC1O2') >> >>> r = m.GetRingInfo() >> >>> r.NumRings() >> 2 >> >> In reality, this molecular structure has 3 cycles with the cyclohexan. Am >> I completely wrong and is there a trick to identify all the cycles present >> in a structure? >> >> Thanks in advance, >> >> Best regards, >> >> Baptiste >> ___ >> Rdkit-discuss mailing list >> Rdkit-discuss@lists.sourceforge.net >> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss >> > ___ > Rdkit-discuss mailing list > Rdkit-discuss@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/rdkit-discuss > ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] Count rings in bicyclic compounds
Hello Ivan and all, I found this old thread about counting rings in bicyclics and I would like to add it to the RDKit Cookbook, however, I'm not able to get the fuse function to work. I get a 'Mol object is not iterable' error. Any help appreciated. Thanks. Vin From: Ivan Tubert-Brohman Sent: Wednesday, December 5, 2018 9:06 AM To: baptiste.cana...@gmail.com Cc: RDKit Discuss Subject: Re: [Rdkit-discuss] Count rings in bicyclic compounds Hi Baptiste, RDKit focuses on "simple rings". As far as I know, it has no builtin function to return all possible cycles in a molecule. For a molecule with a "basis set" of N rings, there can be up to 2^N-1 ring systems, which can be obtained by taking all possible subsets (aka the powerset) of rings and fusing them. Below is an implementation based on fusing simple rings. Another possibility would be to write an exhaustive ring search (DFS or BFS) of the molecular graph and report all cycles that are found, instead of only the simple ones. Warning: do not run this code on fullerenes or similar molecules unless you are prepared to wait for a long, long time! def all_bond_rings(mol): """ Generate all ring systems for a molecule. A Ring is a set of bond indexes. :type mol: rdkit.Chem.Mol :rtype: set of int """ ring_info = mol.GetRingInfo() rings = [set(r) for r in ring_info.BondRings()] # Truncate nrings to the basis set size because RDKit returns redundant # rings (e.g., 6 instead of 5 for cubane). nfrags = len(Chem.GetMolFrags(mol)) nrings = mol.GetNumBonds() - mol.GetNumAtoms() + nfrags del rings[nrings:] for i in range(1, len(rings)+1): for comb in itertools.combinations(rings, i): fused = fuse(comb) if fused: yield fused def fuse(rings): """ Return the ring system that results from fusing the given rings, if the rings are fusable into a single ring system; otherwise None. :type rings: list of set of int :rtype: set of in """ pending = list(rings) fused = set(pending.pop()) while pending: for i in range(len(pending)): ring = pending[i] if fused & ring: # rings are fused fused ^= ring del pending[i] break else: # None of the pending rings were fusable! return None return fused Hope this helps, Ivan On Wed, Dec 5, 2018 at 5:54 AM Baptiste CANAULT mailto:baptiste.cana...@gmail.com>> wrote: Hi RDKiters, I would like to identify all cycles present in a molecular structure. However, when the molecules correspond to bicyclic compounds, the ring count does not correspond to the number actually observed in the structure. Simple example: >>> m = Chem.MolFromSmiles('C1CC2CCC1O2') >>> r = m.GetRingInfo() >>> r.NumRings() 2 In reality, this molecular structure has 3 cycles with the cyclohexan. Am I completely wrong and is there a trick to identify all the cycles present in a structure? Thanks in advance, Best regards, Baptiste ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net<mailto:Rdkit-discuss@lists.sourceforge.net> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] Count rings in bicyclic compounds
Hi Baptiste, RDKit focuses on "simple rings". As far as I know, it has no builtin function to return all possible cycles in a molecule. For a molecule with a "basis set" of N rings, there can be up to 2^N-1 ring systems, which can be obtained by taking all possible subsets (aka the powerset) of rings and fusing them. Below is an implementation based on fusing simple rings. Another possibility would be to write an exhaustive ring search (DFS or BFS) of the molecular graph and report all cycles that are found, instead of only the simple ones. *Warning*: do not run this code on fullerenes or similar molecules unless you are prepared to wait for a long, long time! def all_bond_rings(mol): """ Generate all ring systems for a molecule. A Ring is a set of bond indexes. :type mol: rdkit.Chem.Mol :rtype: set of int """ ring_info = mol.GetRingInfo() rings = [set(r) for r in ring_info.BondRings()] # Truncate nrings to the basis set size because RDKit returns redundant # rings (e.g., 6 instead of 5 for cubane). nfrags = len(Chem.GetMolFrags(mol)) nrings = mol.GetNumBonds() - mol.GetNumAtoms() + nfrags del rings[nrings:] for i in range(1, len(rings)+1): for comb in itertools.combinations(rings, i): fused = fuse(comb) if fused: yield fused def fuse(rings): """ Return the ring system that results from fusing the given rings, if the rings are fusable into a single ring system; otherwise None. :type rings: list of set of int :rtype: set of in """ pending = list(rings) fused = set(pending.pop()) while pending: for i in range(len(pending)): ring = pending[i] if fused & ring: # rings are fused fused ^= ring del pending[i] break else: # None of the pending rings were fusable! return None return fused Hope this helps, Ivan On Wed, Dec 5, 2018 at 5:54 AM Baptiste CANAULT wrote: > Hi RDKiters, > > I would like to identify all cycles present in a molecular > structure. However, when the molecules correspond to bicyclic compounds, > the ring count does not correspond to the number actually observed in the > structure. Simple example: > > >>> m = Chem.MolFromSmiles('C1CC2CCC1O2') > >>> r = m.GetRingInfo() > >>> r.NumRings() > 2 > > In reality, this molecular structure has 3 cycles with the cyclohexan. Am > I completely wrong and is there a trick to identify all the cycles present > in a structure? > > Thanks in advance, > > Best regards, > > Baptiste > ___ > Rdkit-discuss mailing list > Rdkit-discuss@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/rdkit-discuss > ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss