Re: [Rdkit-discuss] Distinguishing bridgeheads from ring-fusions with SMARTS

2023-08-25 Thread Greg Landrum
If you're willing to live with the RDKit's definition of bridgehead (see
below), then there is built-in functionality you can use:

from rdkit import Chem
from rdkit.Chem import rdqueries
qa = rdqueries.IsBridgeheadQueryAtom()
mol = Chem.MolFromSmiles('C1CC2CCC1C2')
mol.GetAtomsMatchingQuery(qa)


That last call returns a sequence with the matching atoms.

The RDKit bridgehead definition:
  // at least three ring bonds, all ring bonds in a ring which shares at
  // least two bonds with another ring involving this atom
 is definitely not perfect, primarily because of the use of the ring
systems, but it's the best that we were able to come up with while keeping
things efficient.
There's some discussion here https://github.com/rdkit/rdkit/pull/6061 and
in the linked issue.
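
To make the quoted comment a bit more concrete, here is a minimal pure-Python sketch of one literal reading of it. This is not the RDKit implementation and may differ from it in details: the names `ring_bonds` and `bridgeheads_by_definition` are hypothetical, and the rings are written out by hand (in bonded order) as stand-ins for what the ring info would report.

```python
def ring_bonds(ring_atoms):
    """Turn a ring given as atom indices in bonded order into a set of bonds."""
    n = len(ring_atoms)
    return {frozenset((ring_atoms[i], ring_atoms[(i + 1) % n])) for i in range(n)}


def bridgeheads_by_definition(rings):
    """Literal reading of the comment (an assumption, not RDKit's code):
    an atom qualifies if it has at least three ring bonds and each of them
    lies in a ring that shares at least two bonds with another ring
    containing the atom."""
    bond_rings = [ring_bonds(r) for r in rings]
    found = []
    for atom in sorted({a for r in rings for a in r}):
        # indices of the rings that contain this atom
        mine = [i for i, r in enumerate(rings) if atom in r]
        # ring bonds that touch this atom
        touching = {b for i in mine for b in bond_rings[i] if atom in b}
        if len(touching) < 3:
            continue

        def covered(bond):
            return any(
                bond in bond_rings[i] and len(bond_rings[i] & bond_rings[j]) >= 2
                for i in mine for j in mine if i != j
            )

        if all(covered(b) for b in touching):
            found.append(atom)
    return found


# C1CC2CCC1C2: two five-membered rings sharing bonds 2-6 and 5-6
norbornane = [[0, 1, 2, 6, 5], [2, 3, 4, 5, 6]]
# C1CC2C1C2: a four- and a three-membered ring fused along bond 2-3
fused = [[0, 1, 2, 3], [2, 3, 4]]

print(bridgeheads_by_definition(norbornane))  # [2, 5]
print(bridgeheads_by_definition(fused))       # []
```

The result depends entirely on which rings are passed in, which is presumably where the dependence on ring perception mentioned above comes from.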

-greg



Re: [Rdkit-discuss] Distinguishing bridgeheads from ring-fusions with SMARTS

2023-08-25 Thread Wim Dehaen
greetings all,
i have thought about the problem some more, and in the end came to the
conclusion that looping through all rings really is necessary. In the gist
below you can see the adjusted code, making use of Pat Walters' method for
finding all rings. Apologies for the code being messy.
https://gist.github.com/dehaenw/41eb8e4c39c1158e88b36c6dfc2606d8
fortunately, this one manages to also detect these difficult cases, see
below:
i did not check how fast it is, but i guess it will be a fair bit slower.

best wishes,
wim

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Re: [Rdkit-discuss] Distinguishing bridgeheads from ring-fusions with SMARTS

2023-08-25 Thread Wim Dehaen
Dear Andreas,
that's a good find. i agree the breaking case can be considered bridgehead
structure, as it's essentially bicyclo-[3.2.1]-octane plus an extra bond. I
need to think about this some more, but it might be related to getting the
ringinfo as SSSR instead of exhaustively. The best solution may therefore
be to just prune non ring atoms from the graph, enumerate all rings and
check it really exhaustively.
FWIW: rdMolDescriptors.CalcNumBridgeheadAtoms(mol) returns 0 for mol =
Chem.MolFromSmiles("C1CC2C3C2C1C3") too, so this may be an rdkit bug on
this end.
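
A hand-worked check of that hypothesis, as a sketch only. The ring atom sets below are transcribed by hand from the SMILES rather than computed by RDKit, and treating the three smallest rings as the set the earlier function sees is an assumption.

```python
from itertools import combinations

# C1CC2C3C2C1C3: the three smallest rings (sizes 3, 4 and 5)
smallest = [{2, 3, 4}, {3, 4, 5, 6}, {0, 1, 2, 4, 5}]
# a second five-membered ring (2-3-6-5-4) that is not among those three
extra = {2, 3, 4, 5, 6}

# number of atoms shared by each pair of the smallest rings
pair_sizes = [len(a & b) for a, b in combinations(smallest, 2)]
print(pair_sizes)                   # [2, 2, 2]

# overlap between the five-membered ring in the list and the extra ring
print(sorted(smallest[2] & extra))  # [2, 4, 5]
```

No pair of the smallest rings shares more than two atoms, so a pairwise test restricted to them never reports a bridge here, while a ring outside that set does overlap in three atoms.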
best wishes
wim



Re: [Rdkit-discuss] Distinguishing bridgeheads from ring-fusions with SMARTS

2023-08-25 Thread S Joshua Swamidass
Perhaps using ring perception instead would work better?



Re: [Rdkit-discuss] Distinguishing bridgeheads from ring-fusions with SMARTS

2023-08-25 Thread Andreas Luttens
Dear Wim,

Thanks for your reply!

Apologies for the delay, finally got time to pick up this project again.

Your suggestion works great, though I have found some cases where it
breaks. For instance the molecule:

mol = Chem.MolFromSmiles("C1CC2C3C2C1C3")

It seems, in this case, a bridgehead atom is also a fused-ring atom. Maybe
these looped compounds have too complex a topology for this type of analysis.

I don't see a straightforward way to identify just the bridgehead atoms.

Best wishes,
Andreas



Re: [Rdkit-discuss] Distinguishing bridgeheads from ring-fusions with SMARTS

2022-12-03 Thread Wim Dehaen
Hi Andreas,
I don't have a good SMARTS pattern available for this but here is a
function that should return bridgehead idx and not include non bridgehead
fused ring atoms:

```
from rdkit import Chem

def return_bridgeheads_idx(mol):
    bh_list = []
    intersections = []
    # atom-index sets of the symmetrized SSSR rings
    sssr_idx = [set(x) for x in Chem.GetSymmSSSR(mol)]
    # atoms shared by each pair of rings
    for i, ring1 in enumerate(sssr_idx):
        for j, ring2 in enumerate(sssr_idx):
            if i > j:
                intersections += [ring1.intersection(ring2)]
    for iidx in intersections:
        if len(iidx) > 2:  # condition for bridgehead
            for idx in iidx:
                neighbors = [a.GetIdx() for a in
                             mol.GetAtomWithIdx(idx).GetNeighbors()]
                # keep shared atoms that have a neighbor outside the shared set
                bh_list += [idx for nidx in neighbors if nidx not in iidx]
    return tuple(set(bh_list))
```

Here are 6 test molecules:

```
mol1 = Chem.MolFromSmiles("C1CC2CCC1C2")
mol2 = Chem.MolFromSmiles("C1CC2C1C1CCC2C1")
mol3 = Chem.MolFromSmiles("N1(CC2)CCC2CC1")
mol4 = Chem.MolFromSmiles("C1CCC12C2")
mol5 = Chem.MolFromSmiles("C1CC2C1C2")
mol6 = Chem.MolFromSmiles("C1CCC(C(CCC3)C23)C12")
for mol in [mol1,mol2,mol3,mol4,mol5,mol6]:
    print(return_bridgeheads_idx(mol))
```

giving the expected answer:

(2, 5)
(4, 7)
(0, 5)
()
()
()
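
The size test on `iidx` rests on how many atoms two rings share: in simple cases one atom for a spiro junction, two for a plain fusion, and three or more when a bridge is present. A minimal sketch of that idea in plain Python, using hand-written ring atom sets as stand-ins for the ring sets the function builds; `classify_pair` and the spiro example are hypothetical and only for illustration.

```python
def classify_pair(ring1, ring2):
    """Label a pair of rings by the number of atoms they share."""
    shared = len(set(ring1) & set(ring2))
    if shared == 0:
        return "separate"
    if shared == 1:
        return "spiro"
    if shared == 2:
        return "fused"
    return "bridged"


# C1CC2CCC1C2: the two five-membered rings share atoms 2, 5 and 6
print(classify_pair({0, 1, 2, 5, 6}, {2, 3, 4, 5, 6}))  # bridged
# C1CC2C1C2: the four- and three-membered rings share atoms 2 and 3
print(classify_pair({0, 1, 2, 3}, {2, 3, 4}))           # fused
# hypothetical spiro pair sharing only atom 3
print(classify_pair({0, 1, 2, 3}, {3, 4, 5}))           # spiro
```

Only the "bridged" case passes the size test, which is why the fused and spiro test molecules give empty tuples.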

hope this is helpful!

best wishes
wim



[Rdkit-discuss] Distinguishing bridgeheads from ring-fusions with SMARTS

2022-12-02 Thread Andreas Luttens
Dear users,

I am trying to identify bridgehead atoms in multi-looped ring systems. The
issue I have is that it can sometimes be difficult to distinguish these
atoms from ring-fusion atoms. The pattern I used (see below) looks for
atoms that have three ring bonds (x3) but are not bonded to another atom
that also fits this description, in order to avoid ring-fusion atoms. The code
works, except for cases where bridgehead atoms are bonded to a ring-fusion
atom.

*PASS:*
from rdkit import Chem
pattern = Chem.MolFromSmarts("[$([x3]);!$([x3][x3])]")
rdkit_mol = Chem.MolFromSmiles("C1CC2CCC1C2")
print(rdkit_mol.GetSubstructMatches(pattern))
>>>((2,),(5,))

*FAIL:*
pattern = Chem.MolFromSmarts("[$([x3]);!$([x3][x3])]")
rdkit_mol = Chem.MolFromSmiles("C1CC2C1C1CCC2C1")
print(rdkit_mol.GetSubstructMatches(pattern))
>>>()

Any hint on what alternative pattern I could use to isolate true
bridgeheads would be greatly appreciated. Maybe other strategies are more
suitable to find these atoms?
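
To see why the FAIL case above returns nothing, the logic of the pattern can be replayed by hand. The sketch below is plain Python, not RDKit. It assumes every bond in these two skeletons is a ring bond, so the ring-bond count `x` equals the atom degree, and the bond lists are transcribed by hand from the SMILES. `match_pattern` is a hypothetical helper name.

```python
def match_pattern(bonds):
    """Emulate [$([x3]);!$([x3][x3])]: atoms with three ring bonds that have
    no neighbor with three ring bonds. Assumes all bonds are ring bonds."""
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    # atoms with exactly three (ring) bonds
    x3 = {a for a, nbrs in neighbors.items() if len(nbrs) == 3}
    # drop any x3 atom that is bonded to another x3 atom
    return sorted(a for a in x3 if not (neighbors[a] & x3))


# C1CC2CCC1C2
pass_case = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (5, 6), (6, 2)]
# C1CC2C1C1CCC2C1
fail_case = [(0, 1), (1, 2), (2, 3), (3, 0), (3, 4), (4, 5), (5, 6),
             (6, 7), (7, 2), (7, 8), (8, 4)]

print(match_pattern(pass_case))  # [2, 5]
print(match_pattern(fail_case))  # []
```

In the second molecule atoms 2, 3, 4 and 7 all have three ring bonds, and each of them is bonded to at least one of the others, so the negated part of the pattern rejects all four.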

Thanks in advance!

Best regards,
Andreas