Re: [Rdkit-discuss] One tautomer not included in list of enumerated tautomers

2024-02-05 Thread Diogo Martins
Hello,

I think it's a bug because the tautomers depend on how the input SMILES is
written. Both represent mol1:

Sc1ncc2c(c1)cccc2
Sc1cc2ccccc2cn1

However the resulting tautomers differ depending on which is used as input.

Best regards,
Diogo

On Mon, 5 Feb 2024 at 11:38, Lewis Martin  wrote:

> Thank you very much for the detective work, Wim! This is helpful.
>
> It looks like the _reverse_ transition is possible, though. If I start by
> generating tautomers of "mol2", then "mol1" is recovered, which indicates
> this is an allowed transform. Is it possible that one direction is allowed
> but not the reverse?
>
> Failing a solution there, does anyone know if it is possible to add SMIRKS
> to the allowed tautomers through the python interface?
> Thanks,
> Lewis
>
> On Mon, Feb 5, 2024 at 9:52 PM Wim Dehaen  wrote:
>
>> hi lewis,
>> if i am not mistaken this is because the tautomer transform "1,3 aromatic
>> heteroatom H shift" does not account for chalcogens other than oxygen, so
>> no selenium, tellurium or sulfur.
>> you can find the list of transforms here:
>> https://github.com/rdkit/rdkit/blob/8dae48b7a17fd984c69d04549e6d9b53690f5c52/Code/GraphMol/MolStandardize/TautomerCatalog/tautomerTransforms.in#L46
>> (pointing to the line with the relevant transform).
>> best wishes
>> wim
>>
>> On Mon, Feb 5, 2024 at 3:26 AM Lewis Martin 
>> wrote:
>>
>>> Hi all,
>>> I'm looking at scoring tautomers, and using the 'tautobase' dataset used
>>> by Wieder et al.* at:
>>>
>>> https://github.com/choderalab/neutromeratio/blob/master/data/b3lyp_tautobase_subset.txt
>>>
>>> This dataset has pairs of tautomers with experimental logK values to
>>> determine the preferred tautomer.
>>>
>>> In at least one case, depending on which tautomer you use as the 'entry'
>>> point, the tautomers enumerated by RDKit either do or don't include both
>>> molecules of the input pair. *I'm hoping there's a way to uniquely
>>> recover the full set of possible tautomers from any input tautomer.*
>>>
>>> Here's a code example:
>>>
>>> from rdkit import Chem
>>> from rdkit.Chem import Draw
>>> from rdkit.Chem.Draw import IPythonConsole
>>> from rdkit.Chem.MolStandardize import rdMolStandardize
>>>
>>> IPythonConsole.drawOptions.addStereoAnnotation = True
>>>
>>> # Same result if you don't set any of these params.
>>> tautomer_params = rdMolStandardize.CleanupParameters()
>>> tautomer_params.tautomerRemoveSp3Stereo = False
>>> tautomer_params.tautomerRemoveBondStereo = False
>>> tautomer_params.tautomerRemoveIsotopicHs = False
>>> tautomer_params.tautomerReassignStereo = False
>>> tautomer_params.doCanonical = True
>>>
>>> enumerator = rdMolStandardize.TautomerEnumerator(tautomer_params)
>>>
>>> smi1 = 'Sc1cc2ccccc2cn1'
>>> smi2 = 'S=c1cc2ccccc2c[nH]1'
>>> mol1 = Chem.MolFromSmiles(smi1)
>>> mol2 = Chem.MolFromSmiles(smi2)
>>>
>>> # Choose mol1 or mol2 to be the source of tautomers.
>>> # Choose mol1, and look at the tautomers. Note that mol2 isn't present!
>>> tauts = [Chem.MolFromSmiles(Chem.MolToSmiles(m))
>>>          for m in enumerator.Enumerate(mol1)]
>>>
>>> Draw.MolsToGridImage(
>>>     [mol1, mol2] + tauts,
>>>     legends=['mol1', 'mol2 (not present in tauts!)']
>>>             + [f'taut{i}' for i in range(len(tauts))],
>>>     molsPerRow=4)
>>>
>>> And a picture of this in a notebook for an at-a-glance view:
>>> https://gist.github.com/ljmartin/4a9d9eb684df3e11e59fc6502a4b7b03
>>>
>>> Does anyone know a way to recover "mol2" within tautomers of "mol1"?
>>>
>>> Thank you!
>>> Lewis
>>>
>>>
>>> ___
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>


Re: [Rdkit-discuss] Iterate through atoms in molecule including hydrogens in Python

2023-10-07 Thread Diogo Martins
Hi Jeremy,

Chem.AddHs returns a new molecule instead of modifying the one passed in,
so you need to reassign the variable:

mol = Chem.AddHs(mol)
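
As a minimal sketch of the full loop (the deuterium labelling is only an
illustrative assumption, since the question does not say which isotopes are
wanted, and the names mol_h and symbols are mine), iterating over the
molecule returned by AddHs reaches the hydrogens as well:

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("CCO")  # ethanol, heavy atoms only
mol_h = Chem.AddHs(mol)          # new molecule; `mol` itself is unchanged

# Illustrative labelling (an assumption): mark every hydrogen as deuterium.
for atom in mol_h.GetAtoms():
    if atom.GetAtomicNum() == 1:
        atom.SetIsotope(2)

symbols = [atom.GetSymbol() for atom in mol_h.GetAtoms()]
print(mol.GetNumAtoms(), mol_h.GetNumAtoms())  # prints: 3 9
```

Keeping the result under a separate name, as here, makes it easy to compare
against the hydrogen-free molecule; reassigning to mol works just as well.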

Best regards,
Diogo

On Sat, Oct 7, 2023 at 9:36 PM Jeremy Monat  wrote:

> In Python, I'd like to iterate through all the atoms in a molecule,
> including hydrogens, so I can assign an isotope to each atom. I haven't
> been able to include hydrogens in the iterable of atoms:
>
> from rdkit import Chem
>
> mol = Chem.MolFromSmiles("CCO") # Example molecule: Ethanol (C2H5OH)
>
> # Add explicit hydrogens
> Chem.AddHs(mol)
>
> for atom in mol.GetAtoms():
>     print(f"Atom Symbol: {atom.GetSymbol()}")
> Output:
> Atom Symbol: C
> Atom Symbol: C
> Atom Symbol: O
>
> Similarly, mol.GetAtomWithIdx() works only for indices 0 to 2, giving C,
> C, and O atoms but no hydrogens.
>
> Thanks,
> Jeremy
>  -- ~ -- ~ --
> Jeremy Monat, PhD
> LinkedIn: http://www.linkedin.com/in/jemonat
> Portfolio: https://bertiewooster.github.io
> GitHub: https://github.com/bertiewooster


Re: [Rdkit-discuss] molecular properties from MolFile, multiprocessing and SDMolSupplier

2022-07-12 Thread Diogo Martins
Hi Eduardo,

Regarding 1: a mol block ends with "M  END" and does not carry properties.
Properties belong to SDF files (also known as SD files), where they follow
the mol block. In https://www.rdkit.org/docs/GettingStartedInPython.html
the code under "An SDWriter can also be initialized using a file-like
object" produces an SDF string.
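
To make the layout concrete, here is a rough pure-Python sketch that splits
one SD record into its mol block and its data items. The function name
split_sd_record and the sample record are my own illustrations, not RDKit
API, and the parser assumes every data header has the form ">  <Tag>":

```python
def split_sd_record(record):
    """Split one SD record into (mol_block, properties)."""
    lines = record.splitlines()
    end = lines.index("M  END")            # last line of the mol block
    mol_block = "\n".join(lines[: end + 1])

    props = {}
    name = None
    for line in lines[end + 1:]:
        if line.startswith("$$$$"):        # record terminator
            break
        if line.startswith(">"):
            # Data header: the tag sits between the angle brackets.
            name = line[line.index("<") + 1 : line.rindex(">")]
            props[name] = []
        elif name is not None and line.strip():
            props[name].append(line)       # value line of the current item
        else:
            name = None                    # a blank line closes the item
    return mol_block, {key: "\n".join(val) for key, val in props.items()}


record = """
     RDKit          2D

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
>  <Name>
methane

>  <Formula>
CH4

$$$$
"""

mol_block, props = split_sd_record(record)
print(props)
```

In practice the RDKit readers and writers handle this for you; the sketch is
only meant to show where the properties live relative to "M  END".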

Best regards,
Diogo

On Tue, 12 Jul 2022 at 06:54, Eduardo Mayo 
wrote:

> Hi all,
>
> I hope you are doing well. I have some questions:
>
> 1. Is there any way to read and write molecular properties in MolBlocks?
>
> mol = Chem.MolFromSmiles("C")
> mol.SetProp("Name", "methane")
> mol.SetProp("Formula", "CH4")
> Chem.MolToMolBlock(mol)
>
> Expected behavior:
> ```
>      RDKit          2D
>
>   1  0  0  0  0  0  0  0  0  0999 V2000
>     0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
> M  END
> >  <Name>
> methane
>
> >  <Formula>
> CH4
> ```
>
> 2. Is there any multiprocessor implementation of PandasTools.LoadSDF or
> SDMolSupplier?
>
> 3. How does the MultithreadedSDMolSupplier work?
>
> All the best,
> Eduardo
>
>
>


Re: [Rdkit-discuss] Integrating xyz2mol into the RDKit

2022-07-07 Thread Diogo Martins
Hi Sreya,

I made some modifications to xyz2mol so that it works for more molecules,
which unfortunately also made it very slow. All of the changes are on GitHub.

The PR: https://github.com/jensengroup/xyz2mol/pull/25

My fork, on branch "penalize_charges", is a few commits ahead of the PR.

Best regards,
Diogo

On Thu, 30 Jun 2022 at 03:20, Sreya Gogineni  wrote:

> Hi everyone,
>
> For the next few months I'll be working on integrating 'xyz2mol', a
> program developed by Professor Jan Jensen's research group from the
> University of Copenhagen, into the RDKit. 'xyz2mol' (
> https://github.com/jensengroup/xyz2mol) can convert atomic coordinates
> presented as an xyz file into an RDKit molecule object with favorable bond
> ordering.
>
> The program will likely be integrated as a few different functions
> including a file parser that creates a mol object from an xyz file, a
> function that accepts a mol object and returns the object with atomic
> connectivity, and a function that accepts a mol object with defined
> connectivity and returns the object with bond ordering.
>
> Please feel free to check out my project proposal on Google Summer of Code
> (https://summerofcode.withgoogle.com/programs/2022/projects/ugO4HoEX),
> and I'd love to hear any thoughts or suggestions that anyone has.
>
> Thanks,
> Sreya Gogineni


Re: [Rdkit-discuss] Different 3D descriptors depending on mol reading method

2022-06-15 Thread Diogo Martins
Hi JSousa,

Adding "removeHs=False" when reading from SDF should fix it.
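
A small sketch of the effect, using a mol block round trip in place of your
SDF file (ethanol is just a stand-in molecule of my choosing):

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Build a 3D structure with explicit hydrogens, as a 3D-descriptor
# workflow typically would.
mol = Chem.AddHs(Chem.MolFromSmiles("CCO"))
AllChem.EmbedMolecule(mol, randomSeed=42)
block = Chem.MolToMolBlock(mol)

# By default the parser strips hydrogens; removeHs=False keeps them.
without_h = Chem.MolFromMolBlock(block)
with_h = Chem.MolFromMolBlock(block, removeHs=False)

print(without_h.GetNumAtoms(), with_h.GetNumAtoms())  # prints: 3 9
```

Chem.SDMolSupplier accepts the same removeHs keyword. Without it, the
molecules read from the SDF lose their hydrogens, so the 3D descriptors are
computed on a different set of atoms than in the SMILES route.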

Best regards,
Diogo

On Wed, 15 Jun 2022 at 01:24, J Sousa  wrote:

> Hi Greg,
>
> Including the randomSeed argument in all instances didn't change the
> situation:
> AllChem.EmbedMolecule(mol,useRandomCoords=True,randomSeed=42)
>
> The descriptors are still different using the same randomSeed=42. And they
> are quite different (not just the normal fluctuations from different
> conformers).
>
> Best,
> J
>
>
>
>
> On Wed, 15 Jun 2022 at 07:09, Greg Landrum wrote:
>
>> Hi,
>>
>> I guess the differences you are seeing are arising because you have
>> different conformers of the molecule.
>> The conformer generation process  in EmbedMolecule() uses a stochastic
>> procedure and if you want to be sure that you get the same results from
>> multiple runs you need to provide a random seed using the randomSeed
>> argument.
>>
>> Please give that a try and see if it helps,
>> -greg
>>
>>
>>
>>
>> On Tue, Jun 14, 2022 at 9:15 PM J Sousa  wrote:
>>
>>> I'm using RDKit to calculate 3D descriptors, but I get significantly
>>> different descriptors depending on whether I read molecules from a SMILES
>>> file (and clean/optimize the 3D structure before calculating the
>>> descriptors) or read the SDF file obtained from exactly the same SMILES
>>> file, using exactly the same code to optimize the structures.
>>>
>>> Scripts attached.
>>>
>>> Running smiltodesc_check.py produces descr_myfile.txt
>>>
>>> Running gen3D_check.py and then descr_from_sdf_check.py produces
>>> myfile_descr.txt
>>>
>>> But the two files are significantly different.
>>>
>>> Why aren't they the same? Which is wrong?
>>>
>>> JSousa


Re: [Rdkit-discuss] permutations of symmetric atoms

2022-04-19 Thread Diogo Martins
Thank you, Greg, this is exactly what I was looking for.

On Sat, 16 Apr 2022 at 20:54, Greg Landrum  wrote:

> Hi Diogo,
>
> The easiest way to do this is to use the substructure matching code with
> "uniquify=False" to find all the automorphisms between a molecule and
> itself:
> In [8]: m1 = Chem.MolFromSmiles('Oc1ccccc1')
>
> In [9]: list(m1.GetSubstructMatches(m1,uniquify=False))
> Out[9]: [(0, 1, 2, 3, 4, 5, 6), (0, 1, 6, 5, 4, 3, 2)]
>
> Here's another example:
> In [10]: m = Chem.MolFromSmiles('Oc1ccc(c2ccc(Cl)cc2)cc1')
>
> In [11]: list(m.GetSubstructMatches(m,uniquify=False))
> Out[11]:
> [(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13),
>  (0, 1, 2, 3, 4, 5, 11, 10, 8, 9, 7, 6, 12, 13),
>  (0, 1, 13, 12, 4, 5, 6, 7, 8, 9, 10, 11, 3, 2),
>  (0, 1, 13, 12, 4, 5, 11, 10, 8, 9, 7, 6, 3, 2)]
>
>
> I hope this helps,
> -greg
>
>
> On Sat, Apr 16, 2022 at 1:46 AM Diogo Martins 
> wrote:
>
>> Hello,
>>
>> I'd like to enumerate all possible permutations of symmetric atoms.
>> Consider the following code:
>>
>> phenol = Chem.MolFromSmiles("Oc1ccccc1")
>> equivalencies = list(Chem.CanonicalRankAtoms(phenol, breakTies=False))
>> print(equivalencies)
>> [0, 6, 4, 2, 1, 2, 4]
>>
>> Atoms that have the same value in list "equivalencies" are symmetric. For
>> phenol, the equivalent atoms correspond to a 180 degree rotation of the
>> aromatic ring over the axis containing the carbon-oxygen bond. The possible
>> permutations, expressed as atom indices, are:
>> [0, 1, 2, 3, 4, 5, 6]
>> [0, 1, 6, 5, 4, 3, 2]
>>
>> By permutations, I mean that it is possible to replace the coordinates of
>> the atoms and produce a realistic molecule.
>>
>> A brute force approach comes to mind, where one would enumerate all
>> possible combinations, and exclude those that change the molecular graph.
>> In the example above there are four possible combinations, because there
>> are two groups of two symmetric atoms. An example of an "invalid"
>> combination is swapping the third and seventh atoms without swapping the
>> fourth and sixth atoms:
>> [0, 1, 6, 3, 4, 5, 2]
>> This would be excluded as it breaks the bond between the third and fourth
>> atoms (among other bonds).
>>
>> Is there a method in the RDKit to enumerate the valid permutations?
>>
>> Thank you,
>> Diogo
>>


[Rdkit-discuss] permutations of symmetric atoms

2022-04-15 Thread Diogo Martins
Hello,

I'd like to enumerate all possible permutations of symmetric atoms.
Consider the following code:

phenol = Chem.MolFromSmiles("Oc1ccccc1")
equivalencies = list(Chem.CanonicalRankAtoms(phenol, breakTies=False))
print(equivalencies)
[0, 6, 4, 2, 1, 2, 4]

Atoms that have the same value in list "equivalencies" are symmetric. For
phenol, the equivalent atoms correspond to a 180 degree rotation of the
aromatic ring over the axis containing the carbon-oxygen bond. The possible
permutations, expressed as atom indices, are:
[0, 1, 2, 3, 4, 5, 6]
[0, 1, 6, 5, 4, 3, 2]

By permutations, I mean that it is possible to replace the coordinates of
the atoms and produce a realistic molecule.

A brute force approach comes to mind, where one would enumerate all
possible combinations, and exclude those that change the molecular graph.
In the example above there are four possible combinations, because there
are two groups of two symmetric atoms. An example of an "invalid"
combination is swapping the third and seventh atoms without swapping the
fourth and sixth atoms:
[0, 1, 6, 3, 4, 5, 2]
This would be excluded as it breaks the bond between the third and fourth
atoms (among other bonds).
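
The brute-force idea could be sketched roughly as follows, in pure Python.
The bond list is hard-coded by hand for phenol as an assumption (with the
RDKit one would read it from the molecule), and valid_permutations is a
hypothetical helper name:

```python
from itertools import permutations, product

# Heavy-atom graph of phenol; atom indices follow the SMILES "Oc1ccccc1".
bonds = {frozenset(b) for b in
         [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)]}
# Symmetry classes from above: atoms with equal rank are equivalent.
ranks = [0, 6, 4, 2, 1, 2, 4]


def valid_permutations(ranks, bonds):
    """Permute atoms within each symmetry class; keep graph-preserving maps."""
    classes = {}
    for idx, rank in enumerate(ranks):
        classes.setdefault(rank, []).append(idx)
    groups = list(classes.values())

    results = []
    # One candidate per combination of within-class permutations.
    for choice in product(*(permutations(g) for g in groups)):
        mapping = list(range(len(ranks)))
        for group, permuted in zip(groups, choice):
            for src, dst in zip(group, permuted):
                mapping[src] = dst
        # Valid only if every bond is mapped onto an existing bond.
        mapped = {frozenset(mapping[a] for a in bond) for bond in bonds}
        if mapped == bonds:
            results.append(mapping)
    return results


perms = valid_permutations(ranks, bonds)
print(perms)  # [[0, 1, 2, 3, 4, 5, 6], [0, 1, 6, 5, 4, 3, 2]]
```

Of the four candidates, two survive the bond check. The number of candidates
is the product of the factorials of the class sizes, so this gets expensive
quickly for highly symmetric molecules.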

Is there a method in the RDKit to enumerate the valid permutations?

Thank you,
Diogo