Re: [Rdkit-discuss] One tautomer not included in list of enumerated tautomers

2024-02-05 Thread Wim Dehaen
hi lewis,
if i am not mistaken this is because the tautomer transfor "1,3 aromatic
heteroatom H shift" does not account for other chalcogens than oxygen, so
no selenium, tellurium or sulfur.
you can find the list of transforms here:
https://github.com/rdkit/rdkit/blob/8dae48b7a17fd984c69d04549e6d9b53690f5c52/Code/GraphMol/MolStandardize/TautomerCatalog/tautomerTransforms.in#L46
(poiting to the line with the relevant transform).
best wishes
wim

On Mon, Feb 5, 2024 at 3:26 AM Lewis Martin 
wrote:

> Hi all,
> I'm looking at scoring tautomers, and using the 'tautobase' dataset used
> by Weider et al* at:
>
> https://github.com/choderalab/neutromeratio/blob/master/data/b3lyp_tautobase_subset.txt
>
> This dataset has pairs of tautomers with experimental logK values to
> determine the preferred tautomer.
>
> In at least one case, depending on which tautomer you use as the 'entry'
> point, the enumerated tautomers by RDKit either do or don't include both of
> the pair of input molecules. *I'm hoping there's a way to uniquely
> recover the full set of possible tautomers from using any input tautomer. *
>
> Here's a code example:
>
> from rdkit import Chem
>>
> from rdkit.Chem import Draw
>
> from rdkit.Chem.Draw import IPythonConsole
>> IPythonConsole.drawOptions.addStereoAnnotation = True
>> from rdkit.Chem.MolStandardize import rdMolStandardize
>>
>> #same result if you don't do any of these params.
>
> tautomer_params = Chem.MolStandardize.rdMolStandardize.CleanupParameters()
>> tautomer_params.tautomerRemoveSp3Stereo = False
>> tautomer_params.tautomerRemoveBondStereo = False
>> tautomer_params.tautomerRemoveIsotopicHs = False
>> tautomer_params.tautomerReassignStereo = False
>> tautomer_params.doCanonical = True
>>
>> enumerator = rdMolStandardize.TautomerEnumerator(tautomer_params)
>>
>> smi1 = 'Sc1cc2c2cn1'
>> smi2 = 'S=c1cc2c2c[nH]1'
>> mol1 = Chem.MolFromSmiles(smi1)
>> mol2 = Chem.MolFromSmiles(smi2)
>>
>> #choose mol1 or mol2 to be source of tautomers:
>> #choose mol1, and look at the tautomers. Note that mol2 isn't present!
>> tauts = [Chem.MolFromSmiles(Chem.MolToSmiles(m)) for m in
>> enumerator.Enumerate(mol1)]
>>
>> Draw.MolsToGridImage([mol1, mol2]+tauts, legends=['mol1', 'mol2 (not
>> present in tauts!)'] + [f'taut{i}' for i in range(len(tauts))],
>>  molsPerRow=4)
>>
>
> And a picture of this in a notebook for an at-a-glance view:
> https://gist.github.com/ljmartin/4a9d9eb684df3e11e59fc6502a4b7b03
>
> Does anyone know a way to recover "mol2" within tautomers of "mol1"?
>
> Thank you!
> Lewis
>
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Aromatic atoms

2023-11-05 Thread Wim Dehaen
how about:
len(list(mol.GetAromaticAtoms()))

best wishes
wim


On Sun, 5 Nov 2023, 08:41 Chris Swain via Rdkit-discuss, <
rdkit-discuss@lists.sourceforge.net> wrote:

> Hi,
>
> Perhaps I’m missing something obvious, but is there a way to calculate the
> number of aromatic atoms in a molecule?
>
> Cheers
>
> Chris
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] mol properties in SDWriter

2023-09-25 Thread Wim Dehaen
Why there is a counter between parentheses there, I don't know, but in case
there's no option to remove it, you might just manually remove it using a
regex to remove anything between parentheses on a line that starts with >
for example:

from rdkit import Chem
import re
from io import StringIO
m = Chem.MolFromSmiles("CCC")
m.SetProp("pKa","3.3")
sio = StringIO()
with Chem.SDWriter(sio) as o:
o.write(m)
sio.seek(0)
with open("temp3.sdf", "w") as f:
for line in sio.readlines():
f.write(re.sub(r'^>(.*?)\((.*?)\)', r'>\1', line))

best wishes
wim

On Tue, Sep 26, 2023 at 1:20 AM Ling Chan  wrote:

> Dear Colleagues,
>
> I noticed that when writing out molecules using SDWriter() , the
> properties fields are followed by something like "(1)" , "(2)". I mean, the
> sdf looks like:
>
> propane
>  RDKit  3D
>
>   3  2  0  0  0  0  0  0  0  0999 V2000
> 0.0.0. C   0  0  0  0  0  0  0  0  0  0  0  0
> 1.42800.0. C   0  0  0  0  0  0  0  0  0  0  0  0
> 1.90401.3000   -0.3480 C   0  0  0  0  0  0  0  0  0  0  0  0
>   1  2  1  0
>   2  3  1  0
> M  END
> >(1)
> 4.099
>
> >(1)
> 2
>
> 
>
> Just wonder what was the rationale behind this extra "(1)" on the property
> field lines (pKa and logP in the above example)?
>
> And is there a way to get rid of these? I am not sure if this extra "(1)"
> is part of the standard sd format.
>
> Thank you!
>
> Regards,
> Ling
>
>
> ---
>
> To create an sdf, you can do something like:
>
> >>> from rdkit import Chem
> >>> m = Chem.MolFromSmiles("CCC")
> >>> m.SetProp("pKa","3.3")
> >>> with Chem.SDWriter("temp3.sdf") as o:
> ...   o.write(m)
>
> Or use Chem.SDMolSupplier() to get mols from another sdf.
>
>
>
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Distinguishing bridgeheads from ring-fusions with SMARTS

2023-08-25 Thread Wim Dehaen
greetings all,
i have thought about the problem some more, and in the end came to the
conclusion that looping through all rings really is necessary. In the gist
below you can see the adjusted code, making use of Pat Walters' method
<https://sourceforge.net/p/rdkit/mailman/message/30387811/> for finding all
rings. Apologies for the code being messy.
https://gist.github.com/dehaenw/41eb8e4c39c1158e88b36c6dfc2606d8
fortunately, this one manages to also detect these difficult cases, see
below:
i did not check how fast it is, but i guess it will be a fair bit slower.

best wishes,
wim

On Fri, Aug 25, 2023 at 8:28 PM Wim Dehaen  wrote:

> Dear Andreas,
> that's a good find. i agree the breaking case can be considered bridgehead
> structure, as it's essentially bicyclo-[3.2.1]-octane plus an extra bond. I
> need to think about this some more, but it might be related to getting the
> ringinfo as SSSR instead of exhaustively. The best solution may therefore
> be to just prune non ring atoms from the graph, enumerate all rings and
> check it really exhaustively.
> FWIW: rdMolDescriptors.CalcNumBridgeheadAtoms(mol) returns 0 for mol =
> Chem.MolFromSmiles("C1CC2C3C2C1C3") too, so this may be an rdkit bug on
> this end.
> best wishes
> wim
>
> On Fri, Aug 25, 2023 at 5:20 PM Andreas Luttens 
> wrote:
>
>> Dear Wim,
>>
>> Thanks for your reply!
>>
>> Apologies for the delay, finally got time to pick up this project again.
>>
>> Your suggestion works great, though I have found some cases where it
>> breaks. For instance the molecule:
>>
>> mol = Chem.MolFromSmiles("C1CC2C3C2C1C3")
>>
>> It seems, in this case, a bridgehead atom is also a fused-ring atom.
>> Maybe these looped compounds have too complex topology for this type of
>> analysis.
>>
>> I don't see a straight way forward to identify just the bridgehead atoms.
>>
>> Best wishes,
>> Andreas
>>
>> On Sat, Dec 3, 2022 at 12:53 PM Wim Dehaen  wrote:
>>
>>> Hi Andreas,
>>> I don't have a good SMARTS pattern available for this but here is a
>>> function that should return bridgehead idx and not include non bridgehead
>>> fused ring atoms:
>>>
>>> ```
>>> def return_bridgeheads_idx(mol):
>>> bh_list=[]
>>> intersections=[]
>>> sssr_idx = [set(x) for x in list(Chem.GetSymmSSSR(mol))]
>>> for i,ring1 in enumerate(sssr_idx):
>>> for j,ring2 in enumerate(sssr_idx):
>>> if i>j:
>>> intersections+=[ring1.intersection(ring2)]
>>> for iidx in intersections:
>>> if len(iidx)>2: #condition for bridgehead
>>> for idx in iidx:
>>> neighbors = [a.GetIdx() for a in
>>> mol.GetAtomWithIdx(idx).GetNeighbors()]
>>> bh_list+=[idx for nidx in neighbors if nidx not in iidx]
>>> return tuple(set(bh_list))
>>> ```
>>>
>>> Here are 6 test molecules:
>>>
>>> ```
>>> mol1 = Chem.MolFromSmiles("C1CC2CCC1C2")
>>> mol2 = Chem.MolFromSmiles("C1CC2C1C1CCC2C1")
>>> mol3 = Chem.MolFromSmiles("N1(CC2)CCC2CC1")
>>> mol4 = Chem.MolFromSmiles("C1CCC12C2")
>>> mol5 = Chem.MolFromSmiles("C1CC2C1C2")
>>> mol6 = Chem.MolFromSmiles("C1CCC(C(CCC3)C23)C12")
>>> for mol in [mol1,mol2,mol3,mol4,mol5,mol6]:
>>> print(return_bridgeheads_idx(mol))
>>> ```
>>>
>>> giving the expected answer:
>>>
>>> (2, 5)
>>> (4, 7)
>>> (0, 5)
>>> ()
>>> ()
>>> ()
>>>
>>> hope this is helpful!
>>>
>>> best wishes
>>> wim
>>>
>>> On Sat, Dec 3, 2022 at 8:34 AM Andreas Luttens <
>>> andreas.lutt...@gmail.com> wrote:
>>>
>>>> Dear users,
>>>>
>>>> I am trying to identify bridgehead atoms in multi-looped ring systems.
>>>> The issue I have is that it can be sometimes difficult to distinguish these
>>>> atoms from ring-fusion atoms. The pattern I used (see below) looks for
>>>> atoms that are part of three rings but cannot be bonded to an atom that
>>>> also fits this description, in order to avoid ring-fusion atoms. The code
>>>> works, except for cases where bridgehead atoms are bonded to a ring-fusion
>>>> atom.
>>>>
>>>> *PASS:*
>>>> pattern = Chem.MolFromSmarts("[$([x3]);!$([x3][x3])]")
>>>> rdkit_

Re: [Rdkit-discuss] Distinguishing bridgeheads from ring-fusions with SMARTS

2023-08-25 Thread Wim Dehaen
Dear Andreas,
that's a good find. i agree the breaking case can be considered bridgehead
structure, as it's essentially bicyclo-[3.2.1]-octane plus an extra bond. I
need to think about this some more, but it might be related to getting the
ringinfo as SSSR instead of exhaustively. The best solution may therefore
be to just prune non ring atoms from the graph, enumerate all rings and
check it really exhaustively.
FWIW: rdMolDescriptors.CalcNumBridgeheadAtoms(mol) returns 0 for mol =
Chem.MolFromSmiles("C1CC2C3C2C1C3") too, so this may be an rdkit bug on
this end.
best wishes
wim

On Fri, Aug 25, 2023 at 5:20 PM Andreas Luttens 
wrote:

> Dear Wim,
>
> Thanks for your reply!
>
> Apologies for the delay, finally got time to pick up this project again.
>
> Your suggestion works great, though I have found some cases where it
> breaks. For instance the molecule:
>
> mol = Chem.MolFromSmiles("C1CC2C3C2C1C3")
>
> It seems, in this case, a bridgehead atom is also a fused-ring atom. Maybe
> these looped compounds have too complex topology for this type of analysis.
>
> I don't see a straight way forward to identify just the bridgehead atoms.
>
> Best wishes,
> Andreas
>
> On Sat, Dec 3, 2022 at 12:53 PM Wim Dehaen  wrote:
>
>> Hi Andreas,
>> I don't have a good SMARTS pattern available for this but here is a
>> function that should return bridgehead idx and not include non bridgehead
>> fused ring atoms:
>>
>> ```
>> def return_bridgeheads_idx(mol):
>> bh_list=[]
>> intersections=[]
>> sssr_idx = [set(x) for x in list(Chem.GetSymmSSSR(mol))]
>> for i,ring1 in enumerate(sssr_idx):
>> for j,ring2 in enumerate(sssr_idx):
>> if i>j:
>> intersections+=[ring1.intersection(ring2)]
>> for iidx in intersections:
>> if len(iidx)>2: #condition for bridgehead
>> for idx in iidx:
>> neighbors = [a.GetIdx() for a in
>> mol.GetAtomWithIdx(idx).GetNeighbors()]
>> bh_list+=[idx for nidx in neighbors if nidx not in iidx]
>> return tuple(set(bh_list))
>> ```
>>
>> Here are 6 test molecules:
>>
>> ```
>> mol1 = Chem.MolFromSmiles("C1CC2CCC1C2")
>> mol2 = Chem.MolFromSmiles("C1CC2C1C1CCC2C1")
>> mol3 = Chem.MolFromSmiles("N1(CC2)CCC2CC1")
>> mol4 = Chem.MolFromSmiles("C1CCC12C2")
>> mol5 = Chem.MolFromSmiles("C1CC2C1C2")
>> mol6 = Chem.MolFromSmiles("C1CCC(C(CCC3)C23)C12")
>> for mol in [mol1,mol2,mol3,mol4,mol5,mol6]:
>> print(return_bridgeheads_idx(mol))
>> ```
>>
>> giving the expected answer:
>>
>> (2, 5)
>> (4, 7)
>> (0, 5)
>> ()
>> ()
>> ()
>>
>> hope this is helpful!
>>
>> best wishes
>> wim
>>
>> On Sat, Dec 3, 2022 at 8:34 AM Andreas Luttens 
>> wrote:
>>
>>> Dear users,
>>>
>>> I am trying to identify bridgehead atoms in multi-looped ring systems.
>>> The issue I have is that it can be sometimes difficult to distinguish these
>>> atoms from ring-fusion atoms. The pattern I used (see below) looks for
>>> atoms that are part of three rings but cannot be bonded to an atom that
>>> also fits this description, in order to avoid ring-fusion atoms. The code
>>> works, except for cases where bridgehead atoms are bonded to a ring-fusion
>>> atom.
>>>
>>> *PASS:*
>>> pattern = Chem.MolFromSmarts("[$([x3]);!$([x3][x3])]")
>>> rdkit_mol = Chem.MolFromSmiles("C1CC2CCC1C2")
>>> print(rdkit_mol.GetSubstructMatches(pattern))
>>> >>>((2,),(5,))
>>>
>>> *FAIL:*
>>> pattern = Chem.MolFromSmarts("[$([x3]);!$([x3][x3])]")
>>> rdkit_mol = Chem.MolFromSmiles("C1CC2C1C1CCC2C1")
>>> print(rdkit_mol.GetSubstructMatches(pattern))
>>> >>>()
>>>
>>> Any hint on what alternative pattern I could use to isolate true
>>> bridgeheads would be greatly appreciated. Maybe other strategies are more
>>> suitable to find these atoms?
>>>
>>> Thanks in advance!
>>>
>>> Best regards,
>>> Andreas
>>> ___
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>
>>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Two functional groups in a single SMARTS pattern

2023-08-24 Thread Wim Dehaen
Dear Andreas,
the issue is with your aldehyde/ketone smarts. it looks for an explicit
aldehyde H that is not there. When the input smi is NCC(=O)C the
substructure matches.

An alternative smarts you can use that will match aldehyde but not esters
and amides:
[#7H2].[#6][C;!$(C-O);!$(C-N)](=[O])

best wishes
wim

On Thu, Aug 24, 2023, 2:09 PM Andreas Luttens 
wrote:

> Dear community,
>
> I'm looking for a way to identify molecules that have two functional
> groups, for instance a primary amine and a ketone.
>
> How can I construct a single SMARTS pattern that encompasses two
> functional groups?
>
> I've tried the following pattern:
> [#7H2].[#6][#6](=[O])[#6,#1]
>
> This should ideally hit compounds that have both a primary amine and a
> ketone/aldehyde. For instance, the molecule:
> mol = Chem.MolFromSmiles("NCC(=O)")
>
> However, I get no matches using this SMARTS pattern. Is there anyway to
> enforce the identification with a single pattern?
>
> Thanks in advance!
>
> Best regards,
> Andreas
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Varying ring size substructure match.

2023-08-20 Thread Wim Dehaen
Hi,
i'm not sure if i understand the question perfectly, so apologies if the
below is behind the point. i think in general, for analysis like this it is
better to make use of rdkit's SSSR functionality and then use the ring
information in the way required for your purpose. this tends to be much
more flexible and natural.

however, here is a smarts pattern that matches both naphthalene and
azulene, as both of them are aromatic and are are a ten-membered ring plus
a single additional closure
```patt=Chem.MolFromSmarts("c1c1")
n=Chem.MolFromSmiles("c12c12")
a=Chem.MolFromSmiles("c121c2")
print(n.HasSubstructMatch(patt),a.HasSubstructMatch(patt))```
this results in True True

another way is to explicitly enumerate the possible ringsizes you are
willing to consider in the SMARTS:
[r3,r4,r5,r6,r7,r8,r9,r10]1[r3,r4,r5,r6,r7,r8,r9,r10][r3,r4,r5,r6,r7,r8,r9,r10][r3,r4,r5,r6,r7,r8,r9,r10][r3,r4,r5,r6,r7,r8,r9,r10][r3,r4,r5,r6,r7,r8,r9,r10][r3,r4,r5,r6,r7,r8,r9,r10][r3,r4,r5,r6,r7,r8,r9,r10][r3,r4,r5,r6,r7,r8,r9,r10][r3,r4,r5,r6,r7,r8,r9,r10]1
as you can see this is much more ugly, but it's able to capture cases such
as c1cc2c1c3c2c1c3cc1.

best wishes
wim

On Sun, Aug 20, 2023 at 5:34 PM Eduardo Mayo 
wrote:

> Hello,
>
> I hope you are all doing well. I'm looking for a smart pattern that can
> match rings of different sizes at the same time. The intention is to match
> something like naphthalene and azulene with the same pattern. Is that
> possible?
>
> Best,
> Eduardo
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Multiple products with runReactants in C++

2023-08-06 Thread Wim Dehaen
You need to run the reaction twice (or more generally you can rerun
runreactants on the products iteratively until their are no more new
products, so examples like phloroglucinol to 1,3,5-trifluorobenzene will
work too.)
best wishes
wim

On Sun, Aug 6, 2023 at 4:50 PM Andreas Luttens 
wrote:

> Dear RDKit community,
>
> I'm trying to convert phenolic hydroxyls into fluorines with runReactants.
> In my example below, I attempt turning resorcinol into 3-fluorophenol and
> 1,3-difluorobenzene.
>
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
>
> int main()
> {
> RDKit::ChemicalReaction *rxn = RDKit::RxnSmartsToChemicalReaction("
> [c:1][OH]>>[c:1][F]");
> rxn->initReactantMatchers();
>
> RDKit::ROMol *mol_to_convert(RDKit::SmilesToMol("c1cc(cc(c1)O)O"));
>
> RDKit::MOL_SPTR_VECT reacts;
>
> reacts.push_back(RDKit::ROMOL_SPTR(mol_to_convert));
>
> std::vector converted_molecules;
>
> converted_molecules = rxn->runReactants(reacts,1);
>
> for (const auto& product_vector : converted_molecules)
> {
> for (const auto& product : product_vector)
> {
> std::string converted_smiles = RDKit::MolToSmiles(*product);
> std::cout << "Converted smiles: " << converted_smiles << std::endl;
> }
> }
> return 0;
> }
>
> No matter what *unsigned int* I choose for the number of products, I get
> a single molecule (3-fluorophenol) as output. Resorcinol has two hydroxyls
> that can turn into fluorines, why is 1,3-difluorobenzene not listed as a
> product?
>
> Thanks in advance!
>
> Best regards,
> Andreas
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Question of substructure "neighborhood"

2023-06-30 Thread Wim Dehaen
Hi Joey,
I think the most straightforward way to do this is to use GetNeighbors() on
all atoms. See below for an example:

from rdkit import Chem
mol=Chem.MolFromSmiles("O1COc2c1ccc(CC(NC)C)c2")
substruct=Chem.MolFromSmarts("c1c1")
a=mol.GetSubstructMatch(substruct)
print("substructure benzene can be found at atoms with idx",a)
extended_idx=set([])
for idx in a:
for n in mol.GetAtomWithIdx(idx).GetNeighbors():
nidx=n.GetIdx()
if nidx not in a and nidx not in extended_idx:
extended_idx.add(nidx)
print("neighboring atoms of substructure benzene are at idx",extended_idx)
print("idx of atoms in entire extended substructure
are",extended_idx.union(a))

if you'd like to include hydrogens in the extending of the substructure as
well you can add explicit hydrogens using mol=Chem.AddHs(mol)

best wishes,
wim

On Fri, Jun 30, 2023 at 11:07 PM Storer, Joey (J) via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:

> Dear RDKit experts,
>
>
>
> Substructure search is working well these days.  RDKit is wonderful.
>
>
>
> For subsequent QM calcs., I would like to get the “next atom over” or the
> “one-atom-neighborhood” surrounding a substructure.
>
>
>
> The result would be something bigger than the original substructure with
> open valence capped by Hydrogen.
>
>
>
> Thanks for your thoughts,
>
> Joey Storer
>
> Dow Inc.
>
> General Business
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] REACTIVE SMARTS yields product with explicit valence greater than 5 for carbon

2023-05-30 Thread Wim Dehaen
I think the problem is that you specify exact, explicit hydrogen count on
the product end of your reaction SMARTS. Instead you can use this
(rxnSMARTS bolded):

rxn=AllChem.ReactionFromSmarts("
*[F][Zr@]([Cl])([Br])[CH2:1][C:2]([C:3])[C!H3:4].[CH2:5]=[CH1:6][C:7]>>[F][Zr@@]([Cl])([Br])[*:5][*@:6]([*:7])[*:1][*:2]([*:3])[*:4]*
")
products = rxn.RunReactants((Chem.MolFromSmiles("CCC[C@@H](C)C[Zr@
@](F)(Cl)Br"),Chem.MolFromSmiles("C=CC")))
display(products[0][0])
print(Chem.MolToSmiles(products[0][0]))

this gives

CCC[C@@H](C)C[C@@H](C)C[Zr@](F)(Cl)Br

which is a parsable valid SMILES.

The rxnSMARTS used above is also altered to give the (I think) intended
stereochemistry behavior: retain whatever configuration is in starting
product 1 and the newly introduce stereocenter (carbon 6) always give a
predefined configuration.

Some clarifying explanations of how reaction SMARTS are handled in RDkit
can be found in the rdkit book here.
https://www.rdkit.org/docs/RDKit_Book.html#chemical-reaction-handling

best wishes
wim


On Tue, May 30, 2023 at 8:04 PM Jarod Younker 
wrote:

> The following REACTIVE SMARTS:
>
> [F][Zr@]([Cl])([Br])[CH2:1][C@H:2]([C:3])[C!H3:4].[CH2:5]=[CH1:6][C:7]
> >>[F][Zr@@]([Cl])([Br])[CH2:5][C@H:6]([C:7])[CH2:1][C@H:2]([C:3])[C!H3:4]
>
> When reacting on the following two SMILES strings:
>
> CCC[C@@H](C)C[Zr@@](F)(Cl)Br and C=CC
>
> Gives the following SMILES product which cannot be sanitized ("explicit
> valence for atom # 10 C, 5, is greater than permitted"):
>
> CC[CH3][C@@H](C)C[C@@H](C)C[Zr@](F)(Cl)Br
>
> I don’t understand why?  Could it be a bug?  If not, what do I have wrong?
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Virtual hydrogens for metals (smiles and smarts)

2023-05-18 Thread Wim Dehaen
[Zr;H1] this smarts pattern should match an Zirconium with hcount of
exactly one. see below for a demonstration:

m=Chem.MolFromSmiles("[ZrH]CC")
pat=Chem.MolFromSmarts("[Zr;H1]")
len(m.GetSubstructMatches(pat))

1

hope this helps,
wim


On Fri, May 19, 2023 at 12:33 AM Jarod Younker 
wrote:

> I’ve got the following two smiles strings:
>
> [Zr]C
> [ZrH]
>
> The smarts string [Zr][CH3] matches [Zr]C. What’s the smarts for [ZrH]?
> I’ve tried [Zr][H].  It does not match.
>
> Sent from my iPhone
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] how to get indexes and atoms with H from smiles

2023-05-09 Thread Wim Dehaen
Hi,
I think if you simply need H and the H count appended it is by far the
easiest by just appending it to the symbol string. See the codeblock below:

def get_symbol_with_Hs(a):
symbol=a.GetSymbol()
charge=a.GetFormalCharge()
hcount=a.GetTotalNumHs()
if hcount > 0:
symbol+="H"
if hcount > 1:
symbol+=str(hcount)
if charge==1:
symbol+="+"
if charge==-1:
symbol+="-"
if charge > 1:
symbol+=f"(+{charge})"
if charge < -1:
symbol+=f"(-{charge})"
return symbol

mol=Chem.MolFromSmiles('c1c(C(N)=O)1')
for i, atom in enumerate(mol.GetAtoms()):
atom.SetProp('molAtomMapNumber',str(i))
print(i,get_symbol_with_Hs(atom))

-
another way I would recommend is using smiles and explicit hydrogens (i.e.
bracketed) instead. For your use case I would imagine this as follows:
from rdkit import Chem
mol=Chem.MolFromSmiles('c1c(C(N)=O)1')
mol=Chem.AddHs(mol)
rwmol=Chem.RWMol(mol)
for b in list(rwmol.GetBonds()):
ba=b.GetBeginAtom()
ea=b.GetEndAtom()
if ba.GetAtomicNum()!=1 and ea.GetAtomicNum()!=1:
rwmol.RemoveBond(ba.GetIdx(),ea.GetIdx())
frags=Chem.GetMolFrags(rwmol, asMols=True,sanitizeFrags=False)
for i,f in enumerate(frags):
print(i,Chem.MolToSmiles(f))

this would output

0 [H]c
1 [H]c
2 [H]c
3 [H]c
4 [H]c
5 c
6 C
7 [H]N[H]
8 O


i hope that helps.

best wishes
wim

On Tue, May 9, 2023 at 7:58 AM Haijun Feng  wrote:

>
> 
>
> Hi All,
>
> I am trying to add atom numbers in smiles as belows,
>
> from rdkit import Chem
> mol=Chem.MolFromSmiles('c1c(C(N)=O)1')
> for i, atom in enumerate(mol.GetAtoms()):
>   atom.SetProp('molAtomMapNumber',str(i))
> smi=Chem.MolToSmiles(mol)
> print(smi)
>
> the output is: [cH:0]1[cH:1][cH:2][cH:3][cH:4][c:5]1C:6=[O:8]
>
> then I want to split the smiles into atoms, I did it like this:
>
> from rdkit import Chem
> mol=Chem.MolFromSmiles('c1c(C(N)=O)1')
> for i, atom in enumerate(mol.GetAtoms()):
>   atom.SetProp('molAtomMapNumber',str(i))
>   print(i,atom.GetSymbol())
>
> the output is:
>
> 0 C
> 1 C
> 2 C
> 3 C
> 4 C
> 5 C
> 6 C
> 7 N
> 8 O
>
> *But what I do want is something like this (with fragments instead of
> atoms): *
>
>
>
>
>
>
> *0 cH1 CH...7 NH28 O  *
>
> Can anyone help me figure out how to get each atom with H from the smiles
> as above. Thanks so much!
>
>
> best,
>
>
> Hal
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Molfile from smiles

2023-05-02 Thread Wim Dehaen
Hi all,
unfortunately I can't offer a "fix" but I can offer these minor comments:
-it seems like the SMILES has some parsing error. You can make uses of
RDKits extension for dative bonds in SMILES ("->") and replace the SMILES
with the below, which will parse, and give (what i assume is) the intended
structure:
"C1=CC2=[N](C=C1)->[Ir]134(C5=CC=CC=C25)C2=CC=CC=C2C2=[N]->1C=CC=C2.C1=CC(C2=CC=CC=C32)=[N]->4C=C1"
-more fundamentally, i think the reason this molecule is hard to render is
because, as a hexavalent iridium complex it is more fundamentally
3-dimensional and therefore tougher to sketch. you can see here on
wikipedia Ir(ppy)3 even when manually sketched looks a bit funny:
https://upload.wikimedia.org/wikipedia/commons/c/c8/Ir%28ppy%29_Schematic.png
-in general, organometallic species have various limitations when it comes
to their handling by cheminformatics packages. for this reason, some care
is needed when dealing with species like this to make sure you won't have
issues down the line. an overview of some rdkit related ones see this
presentation by prof jan jensen:
https://raw.githubusercontent.com/rdkit/UGM_2020/master/Presentations/JanJensen.pdf

Finally, if i embed the molecule and then display its 2D projection, it
actually looks pretty good (despite a warning UFF doesnt recognize
iridium). See below:
[image: image.png]
This was generated using the following codeblock (in Python, not C++, sorry
for that):

mol =
Chem.MolFromSmiles("C1=CC2=[N](C=C1)->[Ir]134(C5=CC=CC=C25)C2=CC=CC=C2C2=[N]->1C=CC=C2.C1=CC(C2=CC=CC=C32)=[N]->4C=C1",sanitize=True)
mol = Chem.AddHs(mol)
AllChem.EmbedMolecule(mol,randomSeed=0xf00d)
mol = Chem.RemoveHs(mol)
display(mol)

best wishes
wim

On Tue, May 2, 2023 at 5:06 PM Santiago Fraga 
wrote:

> Thanks for your answer, Ling Chan.
> But I am already using that option with the C++ API.
>
> Regards
> Santiago
>
> 
>
> SANTIAGO FRAGA
> *Software Developer*
> santi...@mestrelab.com <+santi...@mestrelab.com>
>
> *MESTRELAB RESEARCH S.L.*
> PHONE *+34881976775*
> FAX *+34981941079*
> Feliciano Barrera, 9B-Bajo 15706
> Santiago de Compostela (SPAIN)
>
> Follow us:
> [image: Mestrelab Twitter]   [image:
> Mestrelab Linkedin] 
>  [image: Canal de YouTube Mestrelab]
>   [image:
> MestreBlog] 
>
>
>
> --
> *De:* Ling Chan 
> *Enviado:* martes, 2 de mayo de 2023 4:15
> *Para:* Santiago Fraga 
> *Cc:* RDKit Discuss 
> *Asunto:* Re: [Rdkit-discuss] Molfile from smiles
>
> Hello Santiago,
>
> In case you are still looking for an answer, somewhere in my notes I wrote
> the following.
>
> to get a better depiction of complicated topology, do this before
> rendering.
> from rdkit.Chem import rdDepictor
> rdDepictor.SetPreferCoordGen(True)
>
> Sometimes it helps. Good luck.
>
> Ling
>
>
>
> Santiago Fraga  於 2023年4月21日週五 上午2:17寫道:
>
> Good morning
>
>   I am trying to generate a molfile from smiles, using the RDKit
> C++ implementation.
>   But in some cases the result molfile is like the one in the
> attached image.
>
>   My code is something like this:
>
> string molecule =
> "C1=CC2=[N](C=C1)[Ir]134(C5=CC=CC=C25)C2=CC=CC=C2C2=[N]1C=CC=C2.C1=CC(C2=CC=CC=C32)=[N]4C=C1";
> RDKit::ROMol* mol = RDKit::SmilesToMol(molecule, 0, false, nullptr);
> mol->updatePropertyCache(false);
> RDDepict::preferCoordGen = true;
> RDDepict::compute2DCoords(*mol);
> string molfile = RDKit::MolToMolBlock(*mol, true, -1, false, true)
>
>
>How could I fix the molfile?
>
> Regards
> Santiago
>
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Unwanted explicit Hs

2023-04-29 Thread Wim Dehaen
THe reason for this is that it will prevent ambiguities due to nonstandard,
higher valences. Because of this, it is not possible to infer the implicit
hydrogen count, so it must be specified explicitly. For S and P the
standard valence would be 2 and 3 respectively, just like for O and N. But
S has nonstandard valences available: 4 and 6 as in sulfones and
sulfoxides. P can commonly have valence of 5, as in phosphoranes.
Your provided SMILES has a valence of at least 3, exceeding the standard
valence of 2. This creates and ambiguity, where the SMILES parser has to
decide whether the S has a valence of 4 or 6. Likewise, with the SMILES
"FP(F)(F)F" a roundtrip through rdkit will convert this into
"F[PH](F)(F)F", this means the notation is consistent with F[PH2](F)F and
distinguishable from FP(F)F. In general when higher valence states are not
possible rdkit will throw a valence error but there are some more examples
available. For example "CIC" will become C[IH]C.

best wishes
wim


On Sat, Apr 29, 2023 at 12:20 PM Thomas  wrote:

> I am not a chemist, so it can be a silly question, but I am interested in
> the logic behind it, also because other libraries (like OpenBabel) behave
> differently.
>
> Why sometimes RDKit writes hydrogens explicitly?
>
> mol = rdkit.MolFromSmiles('CCS=O', sanitize=False)
> rdkit.MolToSmiles(mol)
> 'CC[SH]=O'
>
> The input SMILES is intended as a pattern, not a molecule. I make a mol
> out of it only to get the canonical SMILES, that will be then used as
> SMARTS.
> Logically, I don't understand how the number of H attached to the S can be
> "guessed" by the library, still it cannot be left implicit.
>
> Furthermore, I have seen this behaviour only with S and P. I was wondering
> if it's a confined issue, or it can happen with any element.
> Thank you
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Deuterium/Tritium labels in Molfile

2023-04-11 Thread Wim Dehaen
Sorry for not reading your question properly. I am personally not aware of
a way to export molfiles in this way in rdkit, but I might just be unaware.
I think the easiest solution would be probably changing the molblock string
post hoc by reading the M  ISO line.

for example like this in python:
```
mol=Chem.MolFromSmiles("c1c21[3H].[2H]2")

def MolToMolfileDT(mol,path):
mb=Chem.MolToMolBlock(mol).split("\n")
iso=[x for x in mb[-3].split(" ") if len(x)>0]
if iso[1]=="ISO": #check if theres isotope info
for i in range(int(iso[2])):
isotope=int(iso[4+2*i])
idx=int(iso[3+2*i])+3
if isotope in [2,3]: #only D and T
mb[idx]=mb[idx].replace("H",{2:"D",3:"T"}[isotope])
#replace only H (to not have issues like [3Li])
with open(path, "w") as molfile:
molfile.write("\n".join(mb))
return

MolToMolfileDT(mol,"mol.mol")```

this returns:

 RDKit  2D

  8  8  0  0  0  0  0  0  0  0999 V2000
1.50000.0. C   0  0  0  0  0  0  0  0  0  0  0  0
0.7500   -1.29900. C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7500   -1.29900. C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.50000.0. C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.75001.29900. C   0  0  0  0  0  0  0  0  0  0  0  0
0.75001.29900. C   0  0  0  0  0  0  0  0  0  0  0  0
1.50002.59810. T   0  0  0  0  0  0  0  0  0  0  0  0
1.5000   -2.59810. D   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0
  2  3  1  0
  3  4  2  0
  4  5  1  0
  5  6  2  0
  6  7  1  0
  6  1  1  0
  8  2  1  0
M  ISO  2   7   3   8   2
M  END


best wishes
wim


On Tue, Apr 11, 2023 at 9:14 AM Santiago Fraga 
wrote:

> Many thanks for your examples, Wim.
> But I was checking the option to save the labels D and T in the molfile
> for the hydrogen isotopes,
> as other tools can do.
>
> Regards
> Santiago
>
> <http://www.mestrelab.com>
>
> SANTIAGO FRAGA
> *Software Developer*
> santi...@mestrelab.com <+santi...@mestrelab.com>
>
> *MESTRELAB RESEARCH S.L.*
> PHONE *+34881976775*
> FAX *+34981941079*
> Feliciano Barrera, 9B-Bajo 15706
> Santiago de Compostela (SPAIN)
>
> Follow us:
> [image: Mestrelab Twitter] <https://twitter.com/mestrelab>  [image:
> Mestrelab Linkedin] <https://www.linkedin.com/company/mestrelab-research>
>  [image: Canal de YouTube Mestrelab]
> <https://www.youtube.com/channel/UCf3MVnd3XZflv0acvTv14ww>  [image:
> MestreBlog] <http://mestrelab.com/blog/>
>
>
>
> --
> *De:* Wim Dehaen 
> *Enviado:* lunes, 10 de abril de 2023 18:07
> *Para:* Santiago Fraga 
> *Cc:* rdkit-discuss@lists.sourceforge.net <
> rdkit-discuss@lists.sourceforge.net>
> *Asunto:* Re: [Rdkit-discuss] Deuterium/Tritium labels in Molfile
>
> rdkit outputs a molfile with correct isotope labels for me using just:
>
> mol=Chem.MolFromSmiles("[3H]c1c1[2H]")
> Chem.MolToMolFile(mol,"test.mol")
>
> or labelling the atoms post hoc:
>
> mol=Chem.MolFromSmiles("c1c1")
> mol=Chem.AddHs(mol)
> mol.GetAtomWithIdx(6).SetIsotope(3)
> mol.GetAtomWithIdx(7).SetIsotope(2)
> mol=Chem.RemoveHs(mol)
> Chem.MolToMolFile(mol,"test2.mol")
>
> I hope this helps
>
> best wishes
> wim
>
>
> On Mon, Apr 10, 2023 at 4:43 PM Santiago Fraga 
> wrote:
>
>  Good afternoon!
>
>   I am a relatively new user of RDKit, and mainly the C++ API.
>
>   I am trying to save in a molfile the labels D and T for the hydrogen
> isotopes.
>   Like in the following molfile:
>
>   MJ230401
>
>   8  8  0  0  0  0  0  0  0  0999 V2000
>-0.35720.41250. C   0  0  0  0  0  0  0  0  0  0  0  0
>-1.07160.0. C   0  0  0  0  0  0  0  0  0  0  0  0
>-1.0716   -0.82500. C   0  0  0  0  0  0  0  0  0  0  0  0
>-0.3572   -1.23750. C   0  0  0  0  0  0  0  0  0  0  0  0
> 0.3572   -0.82500. C   0  0  0  0  0  0  0  0  0  0  0  0
> 0.35720.0. C   0  0  0  0  0  0  0  0  0  0  0  0
>-0.35721.23750. T   0  0  0  0  0  0  0  0  0  0  0  0
> 1.07170.41250. D   0  0  0  0  0  0  0  0  0  0  0  0
>   3  4  2  0  0  0  0
>   4  5  1  0  0  0  0
>   5  6  2  0  0  0  0
>   6  1  1  0  0  0  0
>   1  2  2  0  0  0  0
>   2  3  1  0  0  0  0
>   6  8  1  0  0  0  0
>   1  7  1  0  0  0  0
> M  END
>
> I am trying to set directly the labels in the hydrogen atoms:
>
> atom->setProp("a

Re: [Rdkit-discuss] Deuterium/Tritium labels in Molfile

2023-04-10 Thread Wim Dehaen
rdkit outputs a molfile with correct isotope labels for me using just:

mol=Chem.MolFromSmiles("[3H]c1c1[2H]")
Chem.MolToMolFile(mol,"test.mol")

or labelling the atoms post hoc:

mol=Chem.MolFromSmiles("c1c1")
mol=Chem.AddHs(mol)
mol.GetAtomWithIdx(6).SetIsotope(3)
mol.GetAtomWithIdx(7).SetIsotope(2)
mol=Chem.RemoveHs(mol)
Chem.MolToMolFile(mol,"test2.mol")

I hope this helps

best wishes
wim


On Mon, Apr 10, 2023 at 4:43 PM Santiago Fraga 
wrote:

>  Good afternoon!
>
>   I am a relatively new user of RDKit, and mainly the C++ API.
>
>   I am trying to save in a molfile the labels D and T for the hydrogen
> isotopes.
>   Like in the following molfile:
>
>   MJ230401
>
>   8  8  0  0  0  0  0  0  0  0999 V2000
>-0.35720.41250. C   0  0  0  0  0  0  0  0  0  0  0  0
>-1.07160.0. C   0  0  0  0  0  0  0  0  0  0  0  0
>-1.0716   -0.82500. C   0  0  0  0  0  0  0  0  0  0  0  0
>-0.3572   -1.23750. C   0  0  0  0  0  0  0  0  0  0  0  0
> 0.3572   -0.82500. C   0  0  0  0  0  0  0  0  0  0  0  0
> 0.35720.0. C   0  0  0  0  0  0  0  0  0  0  0  0
>-0.35721.23750. T   0  0  0  0  0  0  0  0  0  0  0  0
> 1.07170.41250. D   0  0  0  0  0  0  0  0  0  0  0  0
>   3  4  2  0  0  0  0
>   4  5  1  0  0  0  0
>   5  6  2  0  0  0  0
>   6  1  1  0  0  0  0
>   1  2  2  0  0  0  0
>   2  3  1  0  0  0  0
>   6  8  1  0  0  0  0
>   1  7  1  0  0  0  0
> M  END
>
> I am trying to set directly the labels in the hydrogen atoms:
>
> atom->setProp("atomLabel", "D");
> or
>   atom->setProp("_displayLabel", "D");
>
>But when the molfile is generated the labels are not transferred.
>It seems also that when reading a mofile including the labels, they
> are discarded.
>
>
> Many thanks in advance
> Santiago Fraga
>
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] problem when reading in a .sdf file w/ hydrogens already present and removeHs=False

2023-04-03 Thread Wim Dehaen
the sdf doesnt parse so well for me when pasted from the mail(and seems to
contain an unusual conformation) but have you tried to turn
includeNeighbors=True in this line:
numHs = a.GetTotalNumHs(includeNeighbors=True)

in case this fixes your issue, discussion why this flag is needed can be
found in this gtihub issue: https://github.com/rdkit/rdkit/issues/1357

best wishes
wim

On Mon, Apr 3, 2023 at 9:41 AM Francois Berenger  wrote:

> Dear list,
>
> This code:
> ---
> #!/usr/bin/env python3
>
> import argparse, sys
> from rdkit import Chem
>
> def debug_mol(m):
>  for a in m.GetAtoms():
>  i = a.GetIdx()
>  anum = a.GetAtomicNum()
>  numHs = a.GetTotalNumHs()
>  print('%d %d %d' % (i, anum, numHs))
>
> if __name__ == '__main__':
>  # CLI options parsing
>  parser = argparse.ArgumentParser(description = "test strange rdkit
> behavior")
>  parser.add_argument("-i", metavar = "input.sdf", dest = "input_fn",
>  help = "3D conformer input file ")
>  # parse CLI
> ---
>  if len(sys.argv) == 1:
>  # user has no clue of what to do -> usage
>  parser.print_help(sys.stderr)
>  sys.exit(1)
>  args = parser.parse_args()
>  input_fn = args.input_fn
>  # parse CLI end
> ---
>  for mol in Chem.SDMolSupplier(input_fn, removeHs=False):
>  debug_mol(mol)
> ---
>
> On this file:
> ---
> caffeine
>   OpenBabel10171811233D
>
>   24 25  0  0  0  0  0  0  0  0999 V2000
> -1.45372.78480.2699 C   0  0  0  0  0  0  0  0  0  0  0  0
> -1.01081.40830.1062 N   0  0  0  0  0  0  0  0  0  0  0  0
>  0.30151.13230.0489 C   0  0  0  0  0  0  0  0  0  0  0  0
>  1.10812.09200.1407 O   0  0  0  0  0  0  0  0  0  0  0  0
>  0.8161   -0.1286   -0.1033 C   0  0  0  0  0  0  0  0  0  0  0  0
> -0.0929   -1.1771   -0.2031 C   0  0  0  0  0  0  0  0  0  0  0  0
>  0.6111   -2.3242   -0.3462 N   0  0  0  0  0  0  0  0  0  0  0  0
>  1.9386   -2.0269   -0.3392 C   0  0  0  0  0  0  0  0  0  0  0  0
>  2.0299   -0.6962   -0.1913 N   0  0  0  0  0  0  0  0  0  0  0  0
>  3.27290.0261   -0.1349 C   0  0  0  0  0  0  0  0  0  0  0  0
> -1.4004   -0.8770   -0.1432 N   0  0  0  0  0  0  0  0  0  0  0  0
> -2.3540   -1.9596   -0.2459 C   0  0  0  0  0  0  0  0  0  0  0  0
> -1.86970.37710.0073 C   0  0  0  0  0  0  0  0  0  0  0  0
> -3.09740.65100.0627 O   0  0  0  0  0  0  0  0  0  0  0  0
> -0.68843.31910.8569 H   0  0  0  0  0  0  0  0  0  0  0  0
> -1.50243.2204   -0.7549 H   0  0  0  0  0  0  0  0  0  0  0  0
> -2.46902.83500.7286 H   0  0  0  0  0  0  0  0  0  0  0  0
>  2.7299   -2.7636   -0.4379 H   0  0  0  0  0  0  0  0  0  0  0  0
>  3.47830.41860. H   0  0  0  0  0  0  0  0  0  0  0  0
>  4.1200   -0.5981   -0.4606 H   0  0  0  0  0  0  0  0  0  0  0  0
>  3.27000.9110   -0.8337 H   0  0  0  0  0  0  0  0  0  0  0  0
> -1.8812   -2.88340.1466 H   0  0  0  0  0  0  0  0  0  0  0  0
> -2.6277   -2.0396   -1.3222 H   0  0  0  0  0  0  0  0  0  0  0  0
> -3.2286   -1.70140.3855 H   0  0  0  0  0  0  0  0  0  0  0  0
>1  2  1  0  0  0  0
>1 15  1  0  0  0  0
>1 16  1  0  0  0  0
>1 17  1  0  0  0  0
>2  3  1  0  0  0  0
>3  4  2  0  0  0  0
>3  5  1  0  0  0  0
>5  6  2  0  0  0  0
>6  7  1  0  0  0  0
>6 11  1  0  0  0  0
>7  8  2  0  0  0  0
>8  9  1  0  0  0  0
>8 18  1  0  0  0  0
>9 10  1  0  0  0  0
>9  5  1  0  0  0  0
>   10 19  1  0  0  0  0
>   10 20  1  0  0  0  0
>   10 21  1  0  0  0  0
>   11 12  1  0  0  0  0
>   11 13  1  0  0  0  0
>   12 22  1  0  0  0  0
>   12 23  1  0  0  0  0
>   12 24  1  0  0  0  0
>   13 14  2  0  0  0  0
>   13  2  1  0  0  0  0
> M  END
> 
> ---
>
> Tells me that a.GetTotalNumHs() is always 0:
> ---
> 0 6 0
> 1 7 0
> 2 6 0
> 3 8 0
> 4 6 0
> 5 6 0
> 6 7 0
> 7 6 0
> 8 7 0
> 9 6 0
> 10 7 0
> 11 6 0
> 12 6 0
> 13 8 0
> 14 1 0
> 15 1 0
> 16 1 0
> 17 1 0
> 18 1 0
> 19 1 0
> 20 1 0
> 21 1 0
> 22 1 0
> 23 1 0
> ---
>
> This is wrong: e.g. atom at index 0 (Carbon) should have 3 hydrogens.
> The involved bonds are 1 15, 1 16 and 1 17 in the sdf file.
> The total of Hs attached to heavy atoms should be 10.
>
> The rdkit I am using:
> ---
> # pip3 list rdkit | grep rdkit
> rdkit2022.9.5
> rdkit-pypi   2022.9.3
> ---
>
> Should I feel in a bug on github, or am I doing something stupid?
>
> If I leave the removeHs flag to its default value (of False), then the
> result
> becomes correct !
>
> Regards,
> F.
>
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Re: [Rdkit-discuss] MolToMolBlock problem

2023-02-22 Thread Wim Dehaen
Hi all,
No error on rdkit 2022.09.04 boost 1_78 (Ubuntu 20.04).
However I am able to reproduce the error on 2022.03.2, here it is:


Invariant Violation
no eligible neighbors for chiral center
Violation occurred on line 238 in file
/project/build/temp.linux-x86_64-cpython-39/rdkit/Code/GraphMol/FileParsers/MolFileStereochem.cpp
Failed Expression: nbrScores.size()



---RuntimeError
 Traceback (most recent call
last) in   3 for m1 in
Chem.SDMolSupplier("full.sdf", removeHs=False):  4
m2=Chem.RemoveHs(m1)> 5 print (Chem.MolToMolBlock(m2))
RuntimeError: Invariant Violation
no eligible neighbors for chiral center
Violation occurred on line 238 in file
Code/GraphMol/FileParsers/MolFileStereochem.cpp
Failed Expression: nbrScores.size()
RDKIT: 2022.03.2
BOOST: 1_75

In fact, just rendering m2 in the jupyter notebook using display() causes a
crash

RuntimeError: Invariant Violation
no eligible neighbors for chiral center
Violation occurred on line 238 in file
Code/GraphMol/FileParsers/MolFileStereochem.cpp
Failed Expression: nbrScores.size()
RDKIT: 2022.03.2
BOOST: 1_75

This seems to be related to sulfoxide stereochemistry assignment, see this
issue https://github.com/rdkit/rdkit/issues/2227
In any case, it seems like it is fixed in up-to-date rdkit.

However, it looks like there is still a subtle stereochemistry related
issue: if i convert m2 to SMILES I get:

C[C@@H]1[C@@H](C)[C@H](C=C[At])[S@](=O)[C@@H]1*[SH]*=O

I.e. the second sulfur (bolded) is not assigned stereochemistry, but the
first one is. Both are asymmetric.

best wishes
wim





On Wed, Feb 22, 2023 at 4:20 PM Greg Landrum  wrote:

> Hi Ling,
>
> I can't reproduce this problem on windows using the most recent version of
> the RDKit.
> Which version of the RDKit are you using and how did you install it?
>
> Please also share exactly what you see for an error message.
>
> -greg
>
>
> On Tue, Feb 21, 2023 at 7:03 AM Ling Chan  wrote:
>
>> Dear colleagues,
>>
>> Don't know if this is a bug, or if my input molecule is not good. I
>> suspect that it is the former.
>>
>> If you run the following on the file "full.sdf", it will crash at the
>> MolToMolBlock line.
>>
>> for m1 in Chem.SDMolSupplier(inputsdf, removeHs=False):
>> m2=Chem.RemoveHs(m1)
>> print (Chem.MolToMolBlock(m2))
>>
>> You can confirm that the problem is due to the stereo definition of the
>> double bond, since if you edit the bond line " 4 5 2 3" to " 4 5 2 0" it
>> will not crash.
>>
>> I tried to simplify the situation by boiling the molecule down to
>> "simple.sdf". Unfortunately it does not crash any more.
>>
>> Thanks.
>>
>> Ling
>>
>>
>>
>> ___
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Adding hydrogen to conformations works srangely.

2022-12-20 Thread Wim Dehaen
Hello,
I think the place the hydrogens get lost is during the "MolFromMolBlock"
operation. Try to add the flag *removeHs=False.*
best wishes,
wim

On Tue, Dec 20, 2022 at 10:56 AM Omar H94  wrote:

> Dear Petro,
>
> Try using: Chem.AddHs(mol, addCoords=True) to get hydrogen atoms with 3D
> coordinates.
>
> Regards,
> Omar
>
> On Tue, 20 Dec 2022 at 12:18 PM Khoroshyy Petro 
> wrote:
>
>> Hi, list.
>> I use the code below to generate conformers.
>> The thing is that If I generate conformers, they are without hydrogens,
>> and when I add hydrogens, they are all in the center of the ring
>> overlapping.
>> What am I doing wrong?
>> Thanks.
>> Petro.
>>
>> from rdkit import Chem
>> from rdkit import RDConfig
>> from rdkit.Chem import AllChem
>> from rdkit.Chem import rdMolAlign
>> from rdkit.Chem import rdShapeHelpers
>> from rdkit.Chem import rdMolDescriptors
>> from rdkit.Chem import PyMol
>> from rdkit.Chem import Draw
>> params = Chem.rdDistGeom.srETKDGv3()
>> params.randomSeed = 0xf00d
>> params.clearConfs = True
>> m  = Chem.MolFromSmiles(''C1CCC1'')
>> m2=Chem.AddHs(m)
>> cids = AllChem.EmbedMultipleConfs(m2, numConfs=10, params=params)
>> rmslist = []
>> AllChem.AlignMolConformers(m2, RMSlist=rmslist)
>> print(rmslist)
>> res = []
>> for i in range(5):
>>  mb = Chem.MolToMolBlock(m2, confId=i)
>>
>>  mb_h =  Chem.AddHs(Chem.MolFromMolBlock(mb))
>>  res.append(mb_h)
>>
>>
>> --
>> __
>> Petro Khoroshyy
>> ___
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Distinguishing bridgeheads from ring-fusions with SMARTS

2022-12-03 Thread Wim Dehaen
Hi Andreas,
I don't have a good SMARTS pattern available for this but here is a
function that should return bridgehead idx and not include non bridgehead
fused ring atoms:

```
def return_bridgeheads_idx(mol):
bh_list=[]
intersections=[]
sssr_idx = [set(x) for x in list(Chem.GetSymmSSSR(mol))]
for i,ring1 in enumerate(sssr_idx):
for j,ring2 in enumerate(sssr_idx):
if i>j:
intersections+=[ring1.intersection(ring2)]
for iidx in intersections:
if len(iidx)>2: #condition for bridgehead
for idx in iidx:
neighbors = [a.GetIdx() for a in
mol.GetAtomWithIdx(idx).GetNeighbors()]
bh_list+=[idx for nidx in neighbors if nidx not in iidx]
return tuple(set(bh_list))
```

Here are 6 test molecules:

```
mol1 = Chem.MolFromSmiles("C1CC2CCC1C2")
mol2 = Chem.MolFromSmiles("C1CC2C1C1CCC2C1")
mol3 = Chem.MolFromSmiles("N1(CC2)CCC2CC1")
mol4 = Chem.MolFromSmiles("C1CCC12C2")
mol5 = Chem.MolFromSmiles("C1CC2C1C2")
mol6 = Chem.MolFromSmiles("C1CCC(C(CCC3)C23)C12")
for mol in [mol1,mol2,mol3,mol4,mol5,mol6]:
print(return_bridgeheads_idx(mol))
```

giving the expected answer:

(2, 5)
(4, 7)
(0, 5)
()
()
()

hope this is helpful!

best wishes
wim

On Sat, Dec 3, 2022 at 8:34 AM Andreas Luttens 
wrote:

> Dear users,
>
> I am trying to identify bridgehead atoms in multi-looped ring systems. The
> issue I have is that it can be sometimes difficult to distinguish these
> atoms from ring-fusion atoms. The pattern I used (see below) looks for
> atoms that are part of three rings but cannot be bonded to an atom that
> also fits this description, in order to avoid ring-fusion atoms. The code
> works, except for cases where bridgehead atoms are bonded to a ring-fusion
> atom.
>
> *PASS:*
> pattern = Chem.MolFromSmarts("[$([x3]);!$([x3][x3])]")
> rdkit_mol = Chem.MolFromSmiles("C1CC2CCC1C2")
> print(rdkit_mol.GetSubstructMatches(pattern))
> >>>((2,),(5,))
>
> *FAIL:*
> pattern = Chem.MolFromSmarts("[$([x3]);!$([x3][x3])]")
> rdkit_mol = Chem.MolFromSmiles("C1CC2C1C1CCC2C1")
> print(rdkit_mol.GetSubstructMatches(pattern))
> >>>()
>
> Any hint on what alternative pattern I could use to isolate true
> bridgeheads would be greatly appreciated. Maybe other strategies are more
> suitable to find these atoms?
>
> Thanks in advance!
>
> Best regards,
> Andreas
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Cannot match reaction SMART with reactant

2022-11-23 Thread Wim Dehaen
Apologies for the double message. I just realized the SMARTS has bad
mappings and therefore would fail when there are any substituents on the
benzene ring.
Here's a working SMARTS:
reaction_smarts =
"[c:0]1[c:1][c:2][c:3]([NX3H2:4])[c:5]([NX3H2:6])[c:7]1>>[c:0]1[c:1][c:2][c:3][c:5]2[*:4]-[n]=[*:6][c:7]12"


On Wed, Nov 23, 2022 at 4:02 PM Wim Dehaen  wrote:

> Hello,
> I think the error is in the placement of the $ in the SMARTS query, I am
> not sure what is the purpose in there. if i run
>
> reaction_smarts =
>  
> '[cX3]1[cX3][cX3][cX3]([NX3H2:1])[cX3]([NX3H2:2])[cX3]1>>[cX3]1[cX3][cX3][cX3][cX3]2[nX3:1]=[nX3]-[nX3H:2][cX3]12'
> rxn = AllChem.ReactionFromSmarts(reaction_smarts)
> product = rxn.RunReactants((reactant,))
> Chem.SanitizeMol(product[0][0])
> print(Chem.MolToSmiles(product[0][0]))
> i get
>
> c1ccc2[nH]nnc2c1
>
>
> an alternative, somewhat shorter way to write this formation of a
> benzotriazole would be:
> reaction_smarts =
> 'c1ccc([NX3H2:1])c([NX3H2:2])[cX3]1>>c12[*:1]-[n]=[*:2]c12'
>
> best wishes
> wim
>
>
>
> On Wed, Nov 23, 2022 at 3:36 PM Alfredo Quevedo 
> wrote:
>
>> Dear users,
>>
>> I have been struggling for days trying to model a specific reaction but
>> cannot succeed in matching the corresponding SMARTS notation, I am
>> reproducing my code below,
>>
>> My molecule is:
>>
>> *reactant = 'c1ccc(N)c(N)c1'*
>>
>> I am trying to prepare a second ring by reacting the two primary amines.
>> The final molecule should be:
>>
>> *product = 'c12nn[nH]c12'*
>>
>> In this way, I am setting the SMARTS reaction pattern as:
>>
>> *reaction_smarts =
>> '[cX3]1[cX3][cX3][cX3]($[NX3H2:1])[cX3]($[NX3H2:2])[cX3]1>>[cX3]1[cX3][cX3][cX3][cX3]2$[nX3:1]=[nX3]-[nX3H:2]$[cX3]12'*
>>
>> And after running the reaction,
>>
>> *rxn = AllChem.ReactionFromSmarts(reaction_smarts)*
>>
>> *product = rxn.RunReactants((reactant,))*
>>
>> I am not getting no product at all,
>>
>> Any hint on what may be my problem would be greatly appreciated,
>>
>> Thank you very much in advance for the help,
>>
>> regards
>>
>> Alfredo
>> ___
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Cannot match reaction SMART with reactant

2022-11-23 Thread Wim Dehaen
Hello,
I think the error is in the placement of the $ in the SMARTS query, I am
not sure what is the purpose in there. if i run

reaction_smarts =
 
'[cX3]1[cX3][cX3][cX3]([NX3H2:1])[cX3]([NX3H2:2])[cX3]1>>[cX3]1[cX3][cX3][cX3][cX3]2[nX3:1]=[nX3]-[nX3H:2][cX3]12'
rxn = AllChem.ReactionFromSmarts(reaction_smarts)
product = rxn.RunReactants((reactant,))
Chem.SanitizeMol(product[0][0])
print(Chem.MolToSmiles(product[0][0]))
i get

c1ccc2[nH]nnc2c1


an alternative, somewhat shorter way to write this formation of a
benzotriazole would be:
reaction_smarts =
'c1ccc([NX3H2:1])c([NX3H2:2])[cX3]1>>c12[*:1]-[n]=[*:2]c12'

best wishes
wim



On Wed, Nov 23, 2022 at 3:36 PM Alfredo Quevedo 
wrote:

> Dear users,
>
> I have been struggling for days trying to model a specific reaction but
> cannot succeed in matching the corresponding SMARTS notation, I am
> reproducing my code below,
>
> My molecule is:
>
> *reactant = 'c1ccc(N)c(N)c1'*
>
> I am trying to prepare a second ring by reacting the two primary amines.
> The final molecule should be:
>
> *product = 'c12nn[nH]c12'*
>
> In this way, I am setting the SMARTS reaction pattern as:
>
> *reaction_smarts =
> '[cX3]1[cX3][cX3][cX3]($[NX3H2:1])[cX3]($[NX3H2:2])[cX3]1>>[cX3]1[cX3][cX3][cX3][cX3]2$[nX3:1]=[nX3]-[nX3H:2]$[cX3]12'*
>
> And after running the reaction,
>
> *rxn = AllChem.ReactionFromSmarts(reaction_smarts)*
>
> *product = rxn.RunReactants((reactant,))*
>
> I am not getting no product at all,
>
> Any hint on what may be my problem would be greatly appreciated,
>
> Thank you very much in advance for the help,
>
> regards
>
> Alfredo
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] [bug] ResonanceMolSupplier not working as expected

2022-11-14 Thread Wim Dehaen
one more thing: the issue with benzene in 2020 is (i think) expected
behavior and unrelated to the other issue: due to symmetry the resonance
structures are identical. if instead you break the symmetry and put in
o-xylene ("Cc1c1C"), 2020 version (and the fix proposed above) will
correctly give two discrete resonance structures:

CC1=CC=CC=C1C
CC1=C(C)C=CC=C1


best wishes
wim

On Mon, Nov 14, 2022 at 1:13 PM Wim Dehaen  wrote:

> Hello,
> I can reproduce the different behavior between a 2020.09 version and an up
> to date one on my end as well. I think it is related to this issue:
> https://github.com/rdkit/rdkit/issues/3973 during the kekulization some
> unintended normalization seems to be happening leading to the same kekule
> structures in the output
> for the moment here is a dirty but functional fix (changes in bold):
>
> from rdkit import Chem
> from rdkit.Chem import ResonanceMolSupplier, ResonanceFlags
>
> mol = Chem.MolFromSmiles("C1=CC2=C(C=C1)C=CC=C2")  # Naphthalene
> suppl = ResonanceMolSupplier(mol, Chem.KEKULE_ALL)
> for m in suppl:
>
> *for a in m.GetAtoms():a.SetIsAromatic(False)*
> print(Chem.MolToSmiles(m, kekuleSmiles=True))
> this returns
>
> C1=CC2=C(C=C1)C=CC=C2
> C1=CC2=CC=CC=C2C=C1
> C1=CC=C2C=CC=CC2=C1
>
>
>
> best wishes
> wim
>
> On Sun, Nov 13, 2022 at 12:56 AM Eduardo Mayo 
> wrote:
>
>> Dear all,
>>
>> I hope you all are doing well.
>>
>> I've been trying to enumerate all Kekule structures, and apparently it
>> should be done with  ResonanceMolSupplier. In this email
>> <https://sourceforge.net/p/rdkit/mailman/message/36034365/>  apparently,
>> this snippet works:
>>
>> from rdkit import Chem
>>> from rdkit.Chem import ResonanceMolSupplier, ResonanceFlags
>>
>>
>>
>> mol = Chem.MolFromSmiles("c1c1")
>>
>>
>>
>> suppl = ResonanceMolSupplier(mol, ResonanceFlags.KEKULE_ALL)
>>
>> for m in suppl:
>>> print(Chem.MolToSmiles(m, kekuleSmiles=True))
>>
>> [out] expected output
>>
>>> C1C=CC=CC=1
>>
>> C1=CC=CC=C1
>>
>> [out] rdkit.__version__ ==  '2020.03.6'
>>
>>> C1=CC=CC=C1
>>>
>> C1=CC=CC=C1
>>>
>> [out] rdkit.__version__ ==  '2022.03.5'
>>
>>> C1=CC=CC=C1
>>
>> C1=CC=CC=C1
>>>
>>
>> But I have tried it on Win, RDKit '2022.03.5' and it doesn't work. Also,
>> I tried with RDKit '2020.03.6', and it doesn't work for benzene or
>> naphthalene.
>>
>>> from rdkit import Chem
>>> from rdkit.Chem import ResonanceMolSupplier, ResonanceFlags
>>>
>>> mol = Chem.MolFromSmiles("C1=CC2=C(C=C1)C=CC=C2")  # Naphthalene
>>>
>>> suppl = ResonanceMolSupplier(mol, ResonanceFlags.KEKULE_ALL)
>>> for m in suppl:
>>> print(Chem.MolToSmiles(m, kekuleSmiles=True))
>>>
>> [out] rdkit.__version__ ==  '2020.03.6'
>>
>>> C1=CC2=C(C=C1)C=CC=C2
>>> C1=CC2=CC=CC=C2C=C1
>>> C1=CC=C2C=CC=CC2=C1
>>
>> [out] rdkit.__version__ ==  '2022.03.5'
>>
>>> C1=CC2=C(C=C1)C=CC=C2
>>> C1=CC2=C(C=C1)C=CC=C2
>>> C1=CC2=C(C=C1)C=CC=C2
>>
>>
>> Please let me know if you have any workaround to enumerate Kekule
>> structures.
>> All the best,
>>
>> Eduardo
>> ___
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] [bug] ResonanceMolSupplier not working as expected

2022-11-14 Thread Wim Dehaen
Hello,
I can reproduce the different behavior between a 2020.09 version and an up
to date one on my end as well. I think it is related to this issue:
https://github.com/rdkit/rdkit/issues/3973 during the kekulization some
unintended normalization seems to be happening leading to the same kekule
structures in the output
for the moment here is a dirty but functional fix (changes in bold):

from rdkit import Chem
from rdkit.Chem import ResonanceMolSupplier, ResonanceFlags

mol = Chem.MolFromSmiles("C1=CC2=C(C=C1)C=CC=C2")  # Naphthalene
suppl = ResonanceMolSupplier(mol, Chem.KEKULE_ALL)
for m in suppl:

*for a in m.GetAtoms():a.SetIsAromatic(False)*
print(Chem.MolToSmiles(m, kekuleSmiles=True))
this returns

C1=CC2=C(C=C1)C=CC=C2
C1=CC2=CC=CC=C2C=C1
C1=CC=C2C=CC=CC2=C1



best wishes
wim

On Sun, Nov 13, 2022 at 12:56 AM Eduardo Mayo 
wrote:

> Dear all,
>
> I hope you all are doing well.
>
> I've been trying to enumerate all Kekule structures, and apparently it
> should be done with  ResonanceMolSupplier. In this email
>   apparently,
> this snippet works:
>
> from rdkit import Chem
>> from rdkit.Chem import ResonanceMolSupplier, ResonanceFlags
>
>
>
> mol = Chem.MolFromSmiles("c1c1")
>
>
>
> suppl = ResonanceMolSupplier(mol, ResonanceFlags.KEKULE_ALL)
>
> for m in suppl:
>> print(Chem.MolToSmiles(m, kekuleSmiles=True))
>
> [out] expected output
>
>> C1C=CC=CC=1
>
> C1=CC=CC=C1
>
> [out] rdkit.__version__ ==  '2020.03.6'
>
>> C1=CC=CC=C1
>>
> C1=CC=CC=C1
>>
> [out] rdkit.__version__ ==  '2022.03.5'
>
>> C1=CC=CC=C1
>
> C1=CC=CC=C1
>>
>
> But I have tried it on Win, RDKit '2022.03.5' and it doesn't work. Also, I
> tried with RDKit '2020.03.6', and it doesn't work for benzene or
> naphthalene.
>
>> from rdkit import Chem
>> from rdkit.Chem import ResonanceMolSupplier, ResonanceFlags
>>
>> mol = Chem.MolFromSmiles("C1=CC2=C(C=C1)C=CC=C2")  # Naphthalene
>>
>> suppl = ResonanceMolSupplier(mol, ResonanceFlags.KEKULE_ALL)
>> for m in suppl:
>> print(Chem.MolToSmiles(m, kekuleSmiles=True))
>>
> [out] rdkit.__version__ ==  '2020.03.6'
>
>> C1=CC2=C(C=C1)C=CC=C2
>> C1=CC2=CC=CC=C2C=C1
>> C1=CC=C2C=CC=CC2=C1
>
> [out] rdkit.__version__ ==  '2022.03.5'
>
>> C1=CC2=C(C=C1)C=CC=C2
>> C1=CC2=C(C=C1)C=CC=C2
>> C1=CC2=C(C=C1)C=CC=C2
>
>
> Please let me know if you have any workaround to enumerate Kekule
> structures.
> All the best,
>
> Eduardo
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] SMARTS definition for basic nitrogen

2022-10-06 Thread Wim Dehaen
Hi,
For an even simpler definition that works "well enough" for many cases I
have been using
"[NX3,NX2,nX2;!$(NC=O);!$(NS=O)]"
this excludes amides, nitro,nitroso, sulfonamide and pyrrole nitrogens, but
include aliphatic amines, anilines, pyridines, hydroxylamines, imines etc
best wishes
wim

On Thu, Oct 6, 2022 at 12:12 PM Axel Pahl  wrote:

> Dear RDKitters,
>
> does someone have a good SMARTS definition for basic nitrogen (aliphatic
> and aromatic), which they would be able to share?
>
> My Google-fu is failing me.
>
> Many thanks in advance.
>
> Kind regards,
> Axel
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] SMARTS pattern

2022-06-07 Thread Wim Dehaen
The above solution with !r4 doesn't work because for sssr reasons these
atoms are considered to be in a 4 membered ring also if the 4 membered ring
is "exo" to the central 6 membered one. AFAIK there is no good way to do a
general ring size filter in an atom definition using SMARTS. Below is a
quite ugly, but working solution

def GetSubstructMatches_filtered(mol,pattern):
matches = mol.GetSubstructMatches(pattern)
filtered_matches = []
for match in matches:
if Chem.MolFragmentToSmiles(mol, atomsToUse=match).count("2") == 0:
filtered_matches.append(match)
return tuple(filtered_matches)

m1 =
Chem.MolFromSmiles("c1ccc2cc3c(ccc4c5c5c5cc6c7cc8c(cc7c6cc5c34)c3cccnc38)cc2c1")
m2 =
Chem.MolFromSmiles("b12c1c1c(c3ccc4ccc4c3c3c4c5cc[nH]c5c4c13)c1ncc3c3c21")
m3 =
Chem.MolFromSmiles("b1ccbc2c1c1ccoc1c1c2c2ccsc2c2[nH]c3ncc4c(c3c21)=c1n1=4")

p = Chem.MolFromSmarts("[*;R2]~1~[*;R2]~[*;R2]~[*;R2]~[*;R2]~[*;R2]~1")
for m, expected_value in zip([m1,m2,m3],[1,2,2]):
print(len(GetSubstructMatches_filtered(m,p)) == expected_value)



how does it work? the function GetSubstructMatches_filtered checks if there
is more than one ring in the substructure (by converting to substruct to
SMILES using atom indices from the GetSubstructMatches result and searching
for "2" in the string) and rejects it if so.
wim




On Tue, Jun 7, 2022 at 8:52 PM Geoffrey Hutchison 
wrote:

> Nevermind, x3 won't exclude the fused 4-atom rings from your first
> example. I'll let you know if I think of some other way. :-)
>
>
> I think you'd want something like this, perhaps - to exclude atoms in ring
> size 4?
>
> [*;R2!r4]~1~[*;R2!r4]~[*;R2!r4]~[*;R2!r4]~[*;R2!r4]~[*;R2!r4]~1
>
> I also don't know if you're trying to ensure that each of the atoms are
> aromatic, in which case, you'd want something like:
>
> [a;R2!r4]~1~[a;R2!r4]~[a;R2!r4]~[a;R2!r4]~[a;R2!r4]~[a;R2!r4]~1
>
> Hope that helps,
> -Geoff
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] RunReactants and chirality?

2022-01-24 Thread Wim Dehaen
Hi,
I think that is not expected behavior and perhaps it could be a consequence
of a faulty reaction definition.
using the following with a generic amide coupling:

rxn =
AllChem.ReactionFromSmarts('[C:1](=[O:2])[OH:3].[N:4]>>[C:1](=[O:2])[N:4].[OH:3]')
reactant1 = Chem.MolFromSmiles("C[C@@H](OC1=CC(C(=O)O)=CC=C1)C1=CC=CC=C1")
reactant2 = Chem.MolFromSmiles("NC1(CO)CC(F)(F)C1")
prod = rxn.RunReactants((reactant1,reactant2))[0][0]
display(prod)

i get the expected product:
[image: image.png]

C[C@@H](Oc1(C(=O)NC2(CO)CC(F)(F)C2)c1)c1c1


using your more specific reaction definition that includes the whole
reactant1 structure, the reaction does not work because the amine reactant
is defined as [NH3] in the reaction smarts, so ammonia only. if i replace
it with an amine generic
'C[C@@H](Oc1(C(=O)O)c1)c1c1.[N:1]>>C[C@@H](Oc1(C(=O)[N:1])c1)c1c1'
reaction smarts i also get the expected product that includes the stereo
center.
I think if your problem persists it would be good to paste a whole code
block that outputs the undesired product.
best wishes
wim


On Mon, Jan 24, 2022 at 11:31 AM James Wallace  wrote:

> I've been trying to use the RunReactants portion of the ChemicalReaction
> enumeration with chiral reagents, and a chiral reaction, but I don't seem
> to get chiral products out.
>
> For example, using a template reaction:
>
> C[C@@H](Oc1(C(=O)O)c1)c1c1.[NH3:1]>>C[C@
> @H](Oc1(C(=O)[NH2:1])c1)c1c1
>
> [image: image.png]
>
> Reactant 1: C[C@@H](OC1=CC(C(=O)O)=CC=C1)C1=CC=CC=C1
>
> [image: image.png]
>
> Reactant 2: NC1(CO)CC(F)(F)C1
>
> [image: image.png]
>
> Product: CC(Oc1(C(=O)NC2(CO)CC(F)(F)C2)c1)c1c1
>
> [image: image.png]
>
> Not that the methyl at the top of the drawing no longer has an R
> indication.
>
> Is there anything inherent to the RunReactants function that clears
> chirality, as it appears to be retained right up until the combination step
> to generate the product?
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] generating smiles using RDKit

2021-12-08 Thread Wim Dehaen
Hi all,
Just noticed why some of the SMILES in my script don't parse: I
accidentally put brackets at the terminal carbons, my bad. Here is a fixed
script also with updated molecular weights
https://gist.github.com/dehaenw/bb5704fc4d108eec8f8e999d6ab79118
I looked into the different total amount of smiles with Andrew's nice and
general implementation. Immediately from the missing compounds it was clear
there was an error in the way my script dealt with capping of the atoms. In
the condition that both the first and the last atom bear two Cl atoms,
inadvertently one of the two options would not get added. Now the script
outputs 442849 parsable smiles. After canonicalization the amount is
reduced to 440334. This number is consistent with Andrew's result.
best wishes
wim

On Wed, Dec 8, 2021 at 2:59 PM Gyro Funch  wrote:

> Hello Andrew,
>
> Thank you for developing and documenting this awesome script!
>
> I greatly appreciate you and the other helpful and generous folks on the
> mailing list who have taken the time to assist me. Feel free to take the
> rest of the day off.  ;-)
>
> Kind regards,
> Gyro
>
>
> On 2021-12-08 02:33 PM, Andrew Dalke wrote:
> > Hi Gyro,
> >
> >> On Dec 8, 2021, at 11:02, Gyro Funch  wrote:
> >>
> >> My work is in the area of toxicology and I am interested in generating
> SMILES for molecules referred to as 'short chain chlorinated paraffins'
> (SCCP).
> >>
> >> A general definition that is sometimes used is that an SCCP is given by
> the molecular formula
> >>
> >> C_{x} H_{2x-y+2} Cl_{y}
> >>
> >> where
> >>
> >> x = 10-13
> >> y = 3-12
> >>
> >> and the average chlorine content ranges from 40-70% by mass.
> >>
> >> -
> >>
> >> Can anyone provide guidance on how to generate the list of SMILES
> corresponding to the above rules?
> > Here's an alternate approach to the ones presented so far.
> >
> >https://gist.github.com/adalke/e62df8774032560fef750fa9c88b6516
> >
> > Like Wim's version, it also generates the SMILES as the syntax level,
> though by default it use RDKit to generate canonical SMILES output. (use
> --no-canonical to disable the canonicalization step, which is also faster.)
> >
> > Here it is with 4 carbons and 3 chlorines.
> >
> > % python sccp_smiles.py --C 4 --Cl 3
> > Content range not specified. Using --min-content 0.4 and --max-content
> 0.7.
> > (Cl)(Cl)Cl
> > CCC(Cl)C(Cl)Cl
> > CCC(Cl)(Cl)CCl
> > CC(Cl)CC(Cl)Cl
> > CC(Cl)C(Cl)CCl
> > CC(Cl)C(C)(Cl)Cl
> > CC(Cl)(Cl)CCCl
> > Cl(Cl)Cl
> > ClCCC(Cl)CCl
> >
> > The "--C" and "--Cl" are aliases for "--min-C" and "--min-Cl"; if the
> maximums are not specified then the maximum is set to the minimum.
> >
> > Here's a range using all the bells and whistles:
> >
> > % time python sccp_smiles.py --min-C 10 --max-C 13 --min-Cl 3 --max-Cl
> 12 --max-Cl-per-atom 2 --min-content 0.4 --max-content 0.7
> --no-canonicalize > example.smi
> > 2.030u 0.156s 0:02.44 89.3%   0+0k 0+0io 0pf+0w
> > % wc -l example.smi
> >440334 example.smi
> >
> > Wim reported 437001 for the same configuration. I haven't figured out if
> the difference is due simply to differences in the molecular weight values.
> >
> > I couldn't canonicalize and pin down the differences in part because
> Wim's output generates SMILES strings that RDKit cannot parse:
> >
> > % grep '^[(]' CSSP.smi | head -4
> > (Cl)C(Cl)(Cl)C(Cl)(Cl)
> > (Cl)C(Cl)(Cl)(Cl)C(Cl)(Cl)
> > (Cl)C(Cl)(Cl)(Cl)(Cl)C(Cl)(Cl)
> > (Cl)C(Cl)(Cl)CCC(Cl)CC(Cl)(Cl)
> >
>  from rdkit import Chem
>  Chem.CanonSmiles("(Cl)C(Cl)(Cl)CCC(Cl)CC(Cl)(Cl)")
> > [14:31:12] SMILES Parse Error: syntax error while parsing:
> (Cl)C(Cl)(Cl)CCC(Cl)CC(Cl)(Cl)
> >
> > My code isn't well tested, but perhaps enough to get you on the way.
> >
> > Cheers,
> >
> >
> >   Andrew
> >   da...@dalkescientific.com
> >
> >
>
>
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] generating smiles using RDKit

2021-12-08 Thread Wim Dehaen
Due to the nature of SCCP, which are just based on chlorination of
n-alkanes (so just linear), enumerating them via a more limited method than
the interesting linked preprint is also possible.
This can be done exhaustively on the string level
a given carbon in the chain will have either 0 or 1 or 2 chlorines (and in
all examples given just 0 or 1). Either cap (terminal carbon) of the
molecule can also have a third chlorine (so trichloromethyl cap). N alkanes
are linear so our SMILES only needs branching for Cl, which due to having a
valence of 1 is terminal.
therefore the following script can be used to generate them exhaustively
and even filter them for chlorine ratio without ever having to convert to a
mol

```
import rdkit
from rdkit import Chem
import itertools as it

allsmi=[]
maxvalence=2

def cl_ratio(Cl,C):
return Cl*35.5/(C*12+Cl*35.5+C*2-Cl+2) #ratio of chlorine vs total MW

for chainlength in [10,11,12,13]:
combinations = (list(i) for i in
it.product(list(range(maxvalence+1)),repeat=chainlength) if
tuple(reversed(i)) >= tuple(i)) #filter out mirror image molecules
for comb in combinations:
curr_smi = ""
for cl_count in comb:
curr_smi+="C" # increase chain with one carbon
for i in range(cl_count):
curr_smi+="(Cl)" # chlorinate as needed
if 0.4<=cl_ratio(sum(comb),chainlength)<=0.7:
allsmi.append(curr_smi) #add if it has the correct ratio of Cl
to MW
if maxvalence>1: # check for molecules that have a trichloromethyl
terminal cap
if comb[0]==2:
if 0.4<=cl_ratio(sum(comb)+1,chainlength)<=0.7:
allsmi.append("(Cl)"+curr_smi)
if comb[-1]==2:
if 0.4<=cl_ratio(sum(comb)+2,chainlength)<=0.7:
allsmi.append(curr_smi+"(Cl)")
else:
if comb[-1]==2:
if 0.4<=cl_ratio(sum(comb)+1,chainlength)<=0.7:
allsmi.append(curr_smi+"(Cl)")
with open('CSSP.smi', 'w') as f:
for smi in allsmi:
f.write("%s\n" % smi)
```

it takes about 5 seconds to run on a desktop and the output is a smi file
with 437001 smiles strings. In case you want to have max 1 Chlorine per
carbon the solution is a bit easier and faster. In the above script you can
change max valence to 1, outputting ~7K molecules.


best wishes

On Wed, Dec 8, 2021 at 11:29 AM Gyro Funch  wrote:

> Thank you very much for the pointer. I will investigate.
>
> Kind regards,
> Gyro
>
> On 2021-12-08 11:15 AM, Jan Halborg Jensen wrote:
>
> This package might do the trick:
> https://doi.org/10.26434/chemrxiv-2021-gt5lb
>
> On 8 Dec 2021, at 11.02, Gyro Funch  wrote:
>
> [You don't often get email from gyromagne...@gmail.com. Learn why this is
> important at http://aka.ms/LearnAboutSenderIdentification.]
>
> Hello,
>
> I am not a chemist, but have been using RDKit to generate descriptors
> and fingerprints for molecules with known SMILES. It is a very useful
> package!
>
> I have a problem on which I hope someone can provide some guidance.
>
> My work is in the area of toxicology and I am interested in generating
> SMILES for molecules referred to as 'short chain chlorinated paraffins'
> (SCCP).
>
> A general definition that is sometimes used is that an SCCP is given by
> the molecular formula
>
> C_{x} H_{2x-y+2} Cl_{y}
>
> where
>
> x = 10-13
> y = 3-12
>
> and the average chlorine content ranges from 40-70% by mass.
>
> -
>
> Can anyone provide guidance on how to generate the list of SMILES
> corresponding to the above rules?
>
> Thank you very much for your help!
>
> Kind regards,
> gyro
>
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
>
> https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.sourceforge.net%2Flists%2Flistinfo%2Frdkit-discussdata=04%7C01%7Cjhjensen%40chem.ku.dk%7C2b00aad0f20a4547f64708d9ba322450%7Ca3927f91cda14696af898c9f1ceffa91%7C0%7C0%7C637745546797073496%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=NSXDodEMYH9B9ak6mmB8ogTtApi8MYaWJ0pr9fJJElQ%3Dreserved=0
>
>
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Using EnumerateLibraryFromReaction without fragmenting reactants

2021-12-05 Thread Wim Dehaen
Hi,
I think there is an issue on the level of your reaction SMARTS, in fact i
cannot get it to work on your example molecules, as there is a methyl
[CH3:0] amine defined in the query which is not to be found in the
substrate. I imagine some more explicit mapping of what connects and what
breaks where might help. When i used a more generally formulated N+OC=O >
NC=O.O amide coupling type reaction smarts, i get the two expected products:
```
rxn =
AllChem.ReactionFromSmarts('[NX3;H2,H1:0].[O;H:1]-[C:2]=[O:3]>>[N:0]-[C:2]=[O:3].[O:1]')
mol1 = Chem.MolFromSmiles('NC(=O)Nc1ccc(C(=O)O)cc1')
mol2 = Chem.MolFromSmiles('OC(=O)C(C11)c1c1')
products = rxn.RunReactants((mol1,mol2))
for i in range(len(products)):
display(products[i][0])
```
giving the two expected products
[image: image.png]
as a side note: this kind of "amide coupling" is actually not like a
regular amide coupling because the nitrogen is far less basic because it is
part of this urea framework. A more strict reaction SMARTS that takes only
primary and secondary amines and not nitrogens connected to carbonyls would
be '[NX3;H2,H1;!$(NC=O):0].[O;H:1]-[C:2]=[O:3]>>[N:0]-[C:2]=[O:3].[O:1]'
best wishes
wim


On Sun, Dec 5, 2021 at 8:24 PM James Wallace  wrote:

> Here's the specific example that I was referring to:
>
> I used a reaction with this SMILES, making a ChemicalReaction out of it
> using the add reactant templates:
>
> [image: image.png]
> N[CH3:1].O=C(O)C(c1c1)C11>>O=C(N[CH3:1])C(c1c1)C11
>
> Taking two reagents to fit it:
> [image: image.png]
> OC(=O)C(C11)c1c1
>
> [image: image.png]
> NC(=O)Nc1ccc(C(=O)O)cc1
>
> Gives us products
>
> [image: image.png]
> O=C(NC(=O)C(c1c1)C11)Nc1ccc(C(=O)O)cc1
>
> [image: image.png]
> NC(=O)NC(=O)C(c1c1)C11
>
> [image: image.png]
> O=C(O)c1ccc(NC(=O)C(c2c2)C22)cc1
>
> In my case, I want to just see the result that uses the most of the
> reagent in each case, and not the ones where the reagent is fragmented.
>
> Is that possible directly, or do I need to filter my results post hoc?
>
> On Thu, 2 Dec 2021 at 20:04, Paolo Tosco 
> wrote:
>
>> Hi James,
>>
>> I am not quite sure I understand what you have done and what you'd like
>> to achieve.
>> Ideally, could you please post:
>>
>> * the reaction you are using
>> * some example reactants
>> * the desired product(s)
>> * the undesired product(s)
>>
>> Thanks, cheers
>> p.
>>
>> On Thu, Dec 2, 2021 at 6:03 PM James Wallace 
>> wrote:
>>
>>> Hi,
>>> I've been working with the EnumerateLibraryFromReaction function to
>>> generate some quick access molecule libraries, using standard lists of
>>> reagents.
>>>
>>> However, when I get the output back, I notice that I get results that
>>> chop up the reactants I use wherever they match the reaction rule, so for a
>>> rule looking for NH I would effectively get the yellow ringed area here
>>> used as a structure as well as the desired red one.
>>> [image: image.png]
>>> As well, since this contains acid character, it would also match there,
>>> but I accept that I can't necessarily remove that.
>>>
>>> Is there any way I can filter out the partial usages, or am I stuck with
>>> using the substructures as well?
>>>
>>> Thanks,
>>>
>>> James
>>> ___
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>
>> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss